[![Github](https://img.shields.io/github/stars/lab-ml/nn?style=social)](https://github.com/lab-ml/nn)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/compressive/experiment.ipynb)                    

## [Counterfactual Regret Minimization (CFR)](https://nn.labml.ai/cfr/index.html) on Kuhn Poker

This experiment learns to play Kuhn Poker with the Counterfactual Regret Minimization (CFR) algorithm.
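
As background, CFR picks its strategy at each information set with regret matching: each action is played in proportion to its positive cumulative regret. The following is a minimal standalone sketch of that one step, not the `labml_nn` implementation. The function name `regret_matching`, the example regret values, and reading `p` as pass and `b` as bet are assumptions made for illustration.

```python
from typing import Dict


def regret_matching(regrets: Dict[str, float]) -> Dict[str, float]:
    """Turn cumulative regrets into action probabilities."""
    # Only positive regrets contribute to the strategy
    positive = {action: max(regret, 0.0) for action, regret in regrets.items()}
    total = sum(positive.values())
    if total > 0:
        return {action: regret / total for action, regret in positive.items()}
    # No action has positive regret: fall back to a uniform strategy
    return {action: 1.0 / len(regrets) for action in regrets}


# Hypothetical cumulative regrets for the two actions,
# assumed here to be 'p' (pass) and 'b' (bet)
strategy = regret_matching({'p': 1.0, 'b': 3.0})
print(strategy)
```

With these made-up regrets the bet action gets three times the probability of the pass action. When no regret is positive, the sketch plays both actions equally.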

Install the `labml-nn` package

In [1]:
!pip install labml-nn

Imports

In [16]:
from labml import experiment, analytics
from labml_nn.cfr.analytics import plot_infosets
from labml_nn.cfr.kuhn import Configs
from labml_nn.cfr.infoset_saver import InfoSetSaver

Create an experiment. We write tracking information only to `sqlite` (and the screen) to speed things up.
Because the algorithm iterates quickly and we track data on every iteration, writing to
other destinations such as TensorBoard can be relatively time-consuming.
SQLite is enough for our analytics.

In [4]:
experiment.create(name='kuhn_poker', writers={'sqlite', 'screen'})

Initialize configurations

In [5]:
conf = Configs()

Register the configurations with the experiment. No dictionary of overrides is passed here, so the default values are used.

In [6]:
experiment.configs(conf)

Register a saver for the information sets so that they can be saved and loaded with the experiment

In [10]:
experiment.add_model_savers({'info_sets': InfoSetSaver(conf.cfr.info_sets)})

Start the experiment and run the training loop.

In [11]:
# Start the experiment
with experiment.start():
    conf.cfr.iterate()

KeyboardInterrupt: 

In [14]:
inds = analytics.runs(experiment.get_uuid())

In [15]:
dir(inds)

['average_strategy_A_b',
 'average_strategy_A_p',
 'average_strategy_Ab_b',
 'average_strategy_Ab_p',
 'average_strategy_K_b',
 'average_strategy_K_p',
 'average_strategy_Kb_b',
 'average_strategy_Kb_p',
 'average_strategy_Q_b',
 'average_strategy_Q_p',
 'average_strategy_Qb_b',
 'average_strategy_Qb_p',
 'regret_A_b',
 'regret_A_p',
 'regret_Ab_b',
 'regret_Ab_p',
 'regret_K_b',
 'regret_K_p',
 'regret_Kb_b',
 'regret_Kb_p',
 'regret_Q_b',
 'regret_Q_p',
 'regret_Qb_b',
 'regret_Qb_p',
 'strategy_A_b',
 'strategy_A_p',
 'strategy_Ab_b',
 'strategy_Ab_p',
 'strategy_K_b',
 'strategy_K_p',
 'strategy_Kb_b',
 'strategy_Kb_p',
 'strategy_Q_b',
 'strategy_Q_p',
 'strategy_Qb_b',
 'strategy_Qb_p',
 'time_Track',
 'time_loop']
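
The indicator names listed above appear to follow the pattern `<quantity>_<info set>_<action>`, where the information set looks like a card (`A`, `K` or `Q`) optionally followed by the betting history, and the action is `p` or `b`. That reading is an assumption based only on the names themselves. The helper `split_indicator` below is hypothetical and uses plain Python, not the `labml` API, to group a few of the names by information set.

```python
from collections import defaultdict
from typing import Optional, Tuple

# Quantity prefixes seen in the listing above
QUANTITIES = ('average_strategy', 'regret', 'strategy')


def split_indicator(name: str) -> Optional[Tuple[str, str, str]]:
    """Split 'regret_Kb_p' into ('regret', 'Kb', 'p'); return None otherwise."""
    parts = name.rsplit('_', 2)
    if len(parts) != 3 or parts[0] not in QUANTITIES:
        return None  # e.g. 'time_loop' is not a per-info-set indicator
    return parts[0], parts[1], parts[2]


# A few names copied from the listing above
names = ['average_strategy_Q_b', 'regret_Kb_p', 'strategy_Kb_b', 'time_loop']

# Map each information set to the (quantity, action) pairs tracked for it
by_info_set = defaultdict(list)
for name in names:
    parsed = split_indicator(name)
    if parsed is None:
        continue
    quantity, info_set, action = parsed
    by_info_set[info_set].append((quantity, action))

for info_set, entries in sorted(by_info_set.items()):
    print(info_set, entries)
```

Splitting from the right keeps `average_strategy` intact even though it contains an underscore itself.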

In [17]:
plot_infosets(inds['average_strategy.*'], width=600, height=500).display()

In [18]:
analytics.scatter(inds.average_strategy_Q_b, inds.average_strategy_Kb_b,
                  width=400, height=400)