Open Pure CFR
Pure CFR is a time and memory-efficient variant of the Counterfactual Regret Minimization (CFR) algorithm algorithm for computing strategies for an extensive-form game with imperfect information, such as poker. Unlike Monte Carlo CFR (MCCFR), Pure CFR samples a pure strategy profile (exactly one action assigned probability 1 at every state) on each iteration and performs an update assuming the players and chance follow the sampled pure profile. In practice, strategies improve faster over time than the original CFR algorithm. In addition, while previous CFR algorithms require floating-point values (doubles), Pure CFR uses integer values (ints) and thus uses approximately half the memory of other common variants. Furthermore, Pure CFR retains the theoretical guarantees of MCCFR, and thus converges (probabilistically) to an equilibrium profile in two-player zero-sum games. More details regarding Pure CFR can be found in my PhD dissertation. Credit goes to Oskari Tammelin for inventing the algorithm.
About this Implementation
This implementation of Pure CFR computes poker strategies that are playable in the Annual Computer Poker Competition (ACPC). In addition to games played at the ACPC, this implementation can also be run on any game that can be defined under the project_acpc_server framework. The code runs on both Linux and Windows under Cygwin. The code has not been tested on a Mac, but feel free to give Mac a shot if you are feeling adventurous.
First, you must have both
gcc-g++ installed on your machine. Then, in your open-pure-cfr directory, simply run
make and wait for the code to finish compiling. Once complete, you should have three new programs in your open-pure-cfr directory:
This is the main program that generates strategies for games specified under the project_acpc_server framework. Sets of three files are generated during the run, a
.player file. Run
./pure_cfr with no additional arguments to display the usage information. We will now describe in detail each of the required and optional arguments before providing some examples of how to run
pure_cfr on a variety of games.
pure_cfr requires two arguments. The first argument must be a file that defines the game to be played. The games provided by the project_acpc_server code can be found in the
games/ subdirectory, along with a definition for Kuhn Poker. The second argument is a prefix that specifies where and what name the output files will be and have respectively.
After these two arguments are specified, a number of different options can be selected:
--config=<file>- Overwrites the two required arguments and the default options through values specified in
parameters.cpp::read_params( )for details on how to format this file.
--rng=<seed1:seed2:seed3:seed4|TIME>- Specifies the seeds to be used to initialize the random number generator, where
seed4are integer values. The random number generator is used to sample a pure strategy profile on each iteration from chance and the players. Alternatively, passing the option
--rng=TIMEinitializes the random number generator according to the current time.
--card-abs=<NULL|BLIND>- Specifies a card abstraction to be used. Only two card abstractions are currently implemented.
--card-abs=NULLspecifies no card abstraction (not even suit isomorphisms), while
--card-abs=BLINDspecifies that all hands fall into the same bucket. NULL is only feasible in toy games, like Kuhn Poker, that use very few cards, while BLIND essentially means that the players never look at the public or their private cards.
--action-abs=<NULL|FCPA>- Specifies an action abstraction to be used. This option should only be used for nolimit games.
--action-abs=NULLspecifies that all actions remain legal in the abstract game, while
--action-abs=FCPAspecifies that only fold, call, pot-sized raises, and all-ins are legal in the abstract game. NULL is only feasible in small nolimit games with low stack sizes.
--load-dump=<dump_prefix>- Loads the regrets and (if
--no-averageis not selected) average strategy from a previous run from the files prefixed by
dump_prefix. This prefix should be the full name of the files to be loaded, but without the
--threads=<num_threads>- Specifies the number of threads to use. Additional threads provide a near-linear speed-up in the algorithm, so use as many as you can afford.
--status=<dd:hh:mm:ss>- Prints status updates to
--checkpoint=<start_time[,mult_time[,add_time]]>- Specifies how frequently the program should dump the regrets and average strategy to disk, where
add_timeare specified using the
dd:hh:mm:ssformat. First, the program will dump after
start_timehas passed from the time the program started. Later dump times depend on whether
add_timeare provided. If
mult_timeis provided, the next dump will come after
mult_time, then again after
mult_time, and so on until the program terminates. If, in addition,
add_timeis provided, then the next dump will come after
add_time, then again after (
add_time, and so on. If
mult_timeis not specified, then the next dumps will occur at 2 *
start_time, then again after 3 *
start_time, and so on.
--max-walltime=<dd:hh:mm:ss>- Specifies when it is time to perform a final dump of regrets and average strategy to disk. After the final dump, the program is terminated.
--no-average- Specifies that no average strategy is to be computed. Currently, average strategy computation in games with more than two players is not supported, and so for such games, this option is mandatory.
Let's start with a very simple example that requires very little computing resources to run:
./pure_cfr games/kuhn.game test.kuhn --status=1 --max-walltime=30
pure_cfr on unabstracted Kuhn Poker for 30 seconds, with status updates printed every second. In all likelihood, the resulting average strategy should be an approximate Nash equilibrium profile. Once complete, you should have three files that look something like:
test.kuhn.iter-???.secs-60.regrets- This is a binary file that contains the accumulated regret for both players at every information set in the game.
test.kuhn.iter-???.secs-60.avg-strategy- This is another binary file that specifies the (sampled) average strategy for both players.
test.kuhn.iter-???.secs-60.player- This is a human-readable wrapper file that contains information about the command-line arguments specified and the prefix of the binary files. On repeated runs, you can instead pass in
--config=test.kuhn.iter-???.secs-60.playerrather than specifying the options on the command line. In addition, the
DO_AVERAGEline in this file specifies whether the average strategy (
DO_AVERAGE TRUE) or the current strategy (
DO_AVERAGE FALSE) is to be played and printed by
Our second example converges to an equilibrium for abstract heads-up limit Texas hold'em where neither player looks at the cards dealt:
./pure_cfr games/holdem.limit.2p.reverse_blinds.game test.holdem.2pl --rng=TIME --card-abs=BLIND --threads=4 --checkpoint=1:0:0 --max-walltime=1:0:0:0
Here, we specify the random number generator to be initialized by the current time, use 4 processors, dump regrets and average strategy to disk every hour, and terminate after 1 day.
Finally, we also provide an example of running
pure_cfr on abstract three-player nolimit Texas hold'em:
./pure_cfr games/holdem.nolimit.3p.game test.holdem.3pn --card-abs=BLIND --action-abs=FCPA --threads=4 --max-walltime=3:0:0:0 --no-average
In addition to the players not being able to see any cards, the players may also only take the actions fold, call, pot-sized raise, and all-in. The options specify 4 processors to be used and to terminate after 3 days of computation. In addition, no average strategy is computed (recall that this is currently required for games with more than two players). Thus, upon completion, only the
.player files are written to disk.
This program is a simple tool for displaying the outputted strategy in a human-readable format. Running
./print_player_strategy with no arguments displays the usage.
One argument is required and a second argument is optional. For the required argument,
print_player_strategy takes the filename of a
.player file generated from
pure_cfr. The optional argument
--max-round=<round> can be used to only print the strategy up to and including round
For example, we can print our Kuhn Poker strategy generated from the example above as follows:
In addition, you can also display the current strategy specified by the regrets. To do so, in the
.player file, change the line
DO_AVERAGE TRUE to
DO_AVERAGE FALSE and run
print_player_strategy again. This strategy may not be an approximate equilibrium.
As a second example, to print out the pre-flop strategy of our heads-up limit hold'em profile generated above after one hour, we would run
./print_player_strategy test.holdem.2pl.iter-???.secs-3600.player --max-round=1
This program allows a strategy profile generated by
pure_cfr to be played through the ACPC protocol. It is based on
example_player from the project_acpc_server framework that communicates back and forth with a dealer program. Similar to the examples provided by the project_acpc_server package, a bash script or something similar is required to run an agent. For example, to run a heads-up limit hold'em profile generated after 1 hour, write a bash script that looks something like this:
#!/bin/bash cd path/to/open-pure-cfr/ ./pure_cfr_player test.holdem.2pl.iter-???.secs-3600.player $1 $2
(Note that ACPC submissions in the past require the
cd path/to/open-pure-cfr/ line to be removed). Save this to a file called, say,
test.holdem.2pl.iter-???.secs-3600.sh and make it executable via
chmod +x test.holdem.2pl.iter-???.secs-3600.sh. This bash script can then be used to play through the ACPC protocol. For more information on running a match through the ACPC framework, please see the documentation included with the project_acpc_server framework.
When multiple threads are specified for
pure_cfr through the
--threads option, these threads act independently on the regrets and average strategy in shared memory. Each thread runs independent iterations through the entire tree visited by the sampled pure strategy profile and no safety precautions are taken to avoid the threads from conflicting with one another. This means that if two threads happen to update the regret at the same location at the same time, one of the updates will be overwritten. Because the chances of this occurring in a large game tree are slim, and because billions of iterations are typically required before competent play is reached, a few iterations of lost updates are not a big concern.
As mentioned in the opening of this README, Pure CFR stores regrets and the average strategy using integer values rather than floating-point values. In this implementation, each regret entry is stored as an
int and each average strategy entry is stored as an
int32_t. One exception to this is that each average strategy entry in the preflop round is stored as an
int64_t. The reason 64-bit ints are used in the preflop instead of 32-bit ints is because the preflop entries are updated (incremented) most frequently of all the average strategy entries and will be the first to overflow. I found cases where overflow occurred with 32-bit ints in the preflop long before the strategy had finished improving, and so 64-bit ints are now used to prevent early overflow. Since the preflop round is also the smallest, the increase in memory usage in very minor.
First of all, thanks to Oskari Tammelin for sharing his ideas for Pure CFR. I must also thank Neil Burch and Michael Johanson of the Computer Poker Research Group as many of the optimization tricks in this implementation were learned from them.
A demo of Pure CFR and other variants are provided by Oskari Tammelin.
Feel free to email me at email@example.com with any questions about this project.