### Counterfactual Regret Minimization (CFR) and its application to Kuhn Poker


Source consulted: https://modelai.gettysburg.edu/2013/cfr/cfr.pdf


Kuhn Poker is a simple three-card poker game created by Harold E. Kuhn. Each of the two players antes 1 chip into the pot before the deal. Three cards (usually K, Q, and J) are shuffled, and one card is dealt to each player; in the original game, each card is held as private information. We implement the game as described in the paper above, with an additional change inspired by the [ReBeL](https://arxiv.org/abs/2007.13544) paper: players do not hold any private information. Instead, a referee observes both players' cards and makes decisions according to the strategies the players announce at the beginning of the game. Each player then updates their belief based on how the other player plays, and simultaneously updates their own strategy. Further details can be found in the ReBeL paper.
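The belief update described above can be illustrated with a small Bayes-rule sketch. The sketch rests on several assumptions that do not come from the notebook or the ReBeL paper:

- an announced strategy at a decision point is a map from card to the probability of `'B'`;
- the belief is tracked over a single player's card only, ignoring the correlation between the two dealt cards;
- `update_belief` is a hypothetical helper.

```python
from fractions import Fraction


def update_belief(belief: dict, bet_prob: dict, action: str) -> dict:
    """Bayes-rule posterior over the acting player's card after observing `action`.

    belief:   prior probability of each card the acting player might hold
    bet_prob: the acting player's announced probability of 'B' for each card
    action:   'B' (bet/call) or 'C' (check/fold)
    """
    # Likelihood of the observed action for each possible card
    likelihood = {card: bet_prob[card] if action == "B" else 1 - bet_prob[card]
                  for card in belief}
    unnormalized = {card: belief[card] * likelihood[card] for card in belief}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the observed action has zero probability under the announced strategy")
    return {card: p / total for card, p in unnormalized.items()}


# Public prior: player 1 is equally likely to hold any of the three cards
prior = {"J": Fraction(1, 3), "Q": Fraction(1, 3), "K": Fraction(1, 3)}
# Hypothetical announced first-action bet probabilities for player 1
announced = {"J": Fraction(1, 3), "Q": Fraction(0), "K": Fraction(1)}

after_bet = update_belief(prior, announced, "B")    # J: 1/4, Q: 0, K: 3/4
after_check = update_belief(prior, announced, "C")  # J: 2/5, Q: 3/5, K: 0
print(after_bet, after_check)
```

A bet shifts the belief toward the cards that bet more often under the announced strategy. A check rules out any card that always bets.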

In [48]:
```python
## source: https://ai.plainenglish.io/building-a-poker-ai-part-6-beating-kuhn-poker-with-cfr-using-python-1b4172a6ab2d

from typing import List, Dict
import random
import numpy as np

Actions = ['B', 'C']  # 'B' = bet/call, 'C' = check/fold


class InformationSet():
    """Regret and strategy accumulators for one information set (a card plus the betting history)."""

    def __init__(self):
        self.cumulative_regrets = np.zeros(shape=len(Actions))
        self.strategy_sum = np.zeros(shape=len(Actions))
        self.num_actions = len(Actions)

    def normalize(self, strategy: np.ndarray) -> np.ndarray:
        """Normalize a strategy (in place when its sum is positive).
        If there are no positive entries, use a uniform random strategy."""
        if sum(strategy) > 0:
            strategy /= sum(strategy)
        else:
            strategy = np.array([1.0 / self.num_actions] * self.num_actions)
        return strategy

    def get_strategy(self, reach_probability: float) -> np.ndarray:
        """Return the regret-matching strategy and add it, weighted by the
        acting player's reach probability, to the running strategy sum."""
        strategy = np.maximum(0, self.cumulative_regrets)  # keep only positive regrets
        strategy = self.normalize(strategy)

        self.strategy_sum += reach_probability * strategy
        return strategy

    def get_average_strategy(self) -> np.ndarray:
        """Reach-weighted average of all strategies played so far. In vanilla CFR,
        this average (not the current strategy) converges to a Nash equilibrium."""
        return self.normalize(self.strategy_sum.copy())


class KuhnPoker():
    @staticmethod
    def is_terminal(history: str) -> bool:
        # BC/CBC: a bet was folded to; BB/CBB: a bet was called; CC: both players checked
        return history in ['BC', 'BB', 'CC', 'CBB', 'CBC']

    @staticmethod
    def get_payoff(history: str, cards: List[str]) -> int:
        """Get the payoff in a terminal history for the 'active' player,
        i.e. the player who would act next (index len(history) % 2)."""
        if history in ['BC', 'CBC']:
            # The previous player folded to a bet: the active player wins the opponent's ante
            return +1
        else:  # CC or BB or CBB: showdown
            payoff = 2 if 'B' in history else 1  # a called bet doubles the stake
            active_player = len(history) % 2
            player_card = cards[active_player]
            opponent_card = cards[(active_player + 1) % 2]
            # With J < Q < K, holding K or facing J means the active player has the higher card
            if player_card == 'K' or opponent_card == 'J':
                return payoff
            else:
                return -payoff


class KuhnCFRTrainer():
    def __init__(self):
        self.infoset_map: Dict[str, InformationSet] = {}
        # Running strategy for the CFR-AVG modification below. It is a single vector
        # shared by the whole trainer, not one vector per information set.
        self.current_average_strategy = np.array([0.5, 0.5])

    def get_information_set(self, card_and_history: str) -> InformationSet:
        """Return the information set for this card + history, creating it if needed."""
        if card_and_history not in self.infoset_map:
            self.infoset_map[card_and_history] = InformationSet()
        return self.infoset_map[card_and_history]

    def cfr(self, cards: List[str], history: str, reach_probabilities: np.ndarray, active_player: int) -> float:
        """Recursive CFR traversal for one sampled deal; returns the node value for active_player."""
        if KuhnPoker.is_terminal(history):
            return KuhnPoker.get_payoff(history, cards)

        my_card = cards[active_player]
        info_set = self.get_information_set(my_card + history)

        strategy = info_set.get_strategy(reach_probabilities[active_player])

        ####### CFR-AVG modification as per ReBeL #############
        # Blend the regret-matching strategy with the stored running strategy, then store
        # the result. Because current_average_strategy is shared, this mixes strategies
        # across unrelated information sets. The blended strategy drives the traversal and
        # regret update below; strategy_sum (above) accumulates the unblended strategy.
        strategy = (self.current_average_strategy + strategy) / 2
        self.current_average_strategy = strategy
        ########################################################
        opponent = (active_player + 1) % 2
        counterfactual_values = np.zeros(len(Actions))

        for ix, action in enumerate(Actions):
            action_probability = strategy[ix]

            # compute new reach probabilities after this action
            new_reach_probabilities = reach_probabilities.copy()
            new_reach_probabilities[active_player] *= action_probability

            # recursively call cfr method, next player to act is the opponent;
            # negate because the game is zero-sum and values are from the acting player's view
            counterfactual_values[ix] = -self.cfr(cards, history + action, new_reach_probabilities, opponent)

        # Value of the current game state is just counterfactual values weighted by action probabilities
        node_value = counterfactual_values.dot(strategy)
        for ix, action in enumerate(Actions):
            # Regret: how much better this action did than the current mix,
            # weighted by the opponent's reach probability
            info_set.cumulative_regrets[ix] += reach_probabilities[opponent] * (counterfactual_values[ix] - node_value)

        return node_value

    def train(self, num_iterations: int) -> float:
        """Run chance-sampled CFR; returns the sum of the first player's sampled root values."""
        util = 0
        kuhn_cards = ['J', 'Q', 'K']
        for _ in range(num_iterations):
            cards = random.sample(kuhn_cards, 2)  # chance sampling: deal one card to each player
            history = ''
            reach_probabilities = np.ones(2)
            util += self.cfr(cards, history, reach_probabilities, 0)
        return util
```
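The two core updates inside the trainer are regret matching (`InformationSet.get_strategy`) and the counterfactual regret update (the last loop in `KuhnCFRTrainer.cfr`). Below they are pulled out as standalone helpers so they can be checked on small numbers. Both helper names are hypothetical, and the sketch assumes two actions as in the notebook. It leaves out the CFR-AVG blending.

```python
import numpy as np


def regret_matching(cumulative_regrets: np.ndarray) -> np.ndarray:
    """Play each action in proportion to its positive cumulative regret;
    fall back to the uniform strategy when no regret is positive."""
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cumulative_regrets), 1.0 / len(cumulative_regrets))


def regret_increment(cfv: np.ndarray, strategy: np.ndarray, opponent_reach: float) -> np.ndarray:
    """Counterfactual regret added at one visit: each action's value minus the
    node value (strategy-weighted average), scaled by the opponent's reach."""
    node_value = cfv @ strategy
    return opponent_reach * (cfv - node_value)


print(regret_matching(np.array([3.0, 1.0])))    # bet 0.75, pass 0.25
print(regret_matching(np.array([2.0, -1.0])))   # negative regret gets probability 0
print(regret_matching(np.array([-1.0, -2.0])))  # no positive regret -> uniform
# Node value is 0.75*1 + 0.25*(-1) = 0.5, so the increments are 0.5*(1-0.5) and 0.5*(-1-0.5)
print(regret_increment(np.array([1.0, -1.0]), np.array([0.75, 0.25]), 0.5))
```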

In [46]:
```python
num_iterations = 1000
cfr_trainer = KuhnCFRTrainer()
util = cfr_trainer.train(num_iterations)

print(f"\nRunning Kuhn Poker chance sampling CFR for {num_iterations} iterations")
print(f"\nExpected average game value (for player 1): {(-1./18):.3f}")
print(f"Computed average game value               : {(util / num_iterations):.3f}\n")

print("We expect the bet frequency for a Jack to be between 0 and 1/3")
print("The bet frequency of a King should be three times the one for a Jack\n")

print(f"History  Bet  Pass")
for name, info_set in sorted(cfr_trainer.infoset_map.items(), key=lambda s: len(s[0])):
    print(f"{name:3}:    {info_set.get_average_strategy()}")
```


```
Running Kuhn Poker chance sampling CFR for 1000 iterations

Expected average game value (for player 1): -0.056
Computed average game value               : -0.061

We expect the bet frequency for a Jack to be between 0 and 1/3
The bet frequency of a King should be three times the one for a Jack

History  Bet  Pass
J  :    [0.19339867 0.80660133]
Q  :    [0.00998004 0.99001996]
K  :    [0.66291148 0.33708852]
QB :    [0.30502981 0.69497019]
QC :    [0.02380952 0.97619048]
JB :    [0.00149254 0.99850746]
JC :    [0.34916518 0.65083482]
KB :    [0.99854227 0.00145773]
KC :    [0.99562682 0.00437318]
JCB:    [9.39219577e-04 9.99060780e-01]
QCB:    [0.6053928 0.3946072]
KCB:    [0.99779272 0.00220728]
```
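The expected game value of -1/18 printed above can be checked exactly. The sketch below enumerates all six deals and walks the betting tree with exact `Fraction` arithmetic. It then evaluates Kuhn's known equilibrium family. In that family, player 1:

- bets J with probability alpha in [0, 1/3];
- always checks Q;
- bets K with probability 3*alpha;
- after checking, calls a bet with Q with probability alpha + 1/3.

The function names are hypothetical and not part of the notebook. Strategies are assumed to be dictionaries keyed like `infoset_map` (card + history), giving the probability of `'B'`.

```python
from fractions import Fraction
from itertools import permutations

TERMINAL = {"BC", "BB", "CC", "CBB", "CBC"}
RANK = {"J": 0, "Q": 1, "K": 2}


def payoff_p1(history: str, cards) -> int:
    """Payoff to player 1 (index 0) at a terminal history."""
    if history == "BC":   # player 1 bets, player 2 folds
        return 1
    if history == "CBC":  # player 2 bets after a check, player 1 folds
        return -1
    stake = 2 if "B" in history else 1  # CC: antes only; BB/CBB: a bet was called
    return stake if RANK[cards[0]] > RANK[cards[1]] else -stake


def expected_value(bet_prob: dict, history: str, cards) -> Fraction:
    """Exact expected payoff to player 1 from `history` for a fixed deal."""
    if history in TERMINAL:
        return payoff_p1(history, cards)
    player = len(history) % 2
    p_bet = bet_prob[cards[player] + history]  # same key format as infoset_map
    return (p_bet * expected_value(bet_prob, history + "B", cards)
            + (1 - p_bet) * expected_value(bet_prob, history + "C", cards))


def game_value(bet_prob: dict) -> Fraction:
    """Average over the six equally likely deals."""
    deals = list(permutations("JQK", 2))
    return sum((expected_value(bet_prob, "", d) for d in deals), Fraction(0)) / len(deals)


def kuhn_equilibrium(alpha: Fraction) -> dict:
    third = Fraction(1, 3)
    return {
        "J": alpha, "Q": 0, "K": 3 * alpha,          # player 1: first action
        "JCB": 0, "QCB": alpha + third, "KCB": 1,    # player 1: call after check-bet
        "JB": 0, "QB": third, "KB": 1,               # player 2: call a bet
        "JC": third, "QC": 0, "KC": 1,               # player 2: bet after a check
    }


print(game_value(kuhn_equilibrium(Fraction(1, 6))))  # -1/18
```

Because the keys match the trainer's infoset names, a dictionary such as `{k: v.get_average_strategy()[0] for k, v in cfr_trainer.infoset_map.items()}` could in principle be passed to `game_value`. This assumes all twelve information sets were visited during training. The result is the exact value of the learned average strategy, unlike the sampled running average that the notebook prints.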


In [49]:
```python
num_iterations = 1000
cfr_trainer = KuhnCFRTrainer()
util = cfr_trainer.train(num_iterations)

print(f"\nRunning Kuhn Poker chance sampling CFR for {num_iterations} iterations")
print(f"\nExpected average game value (for player 1): {(-1./18):.3f}")
print(f"Computed average game value               : {(util / num_iterations):.3f}\n")

print("We expect the bet frequency for a Jack to be between 0 and 1/3")
print("The bet frequency of a King should be three times the one for a Jack\n")

print(f"History  Bet  Pass")
for name, info_set in sorted(cfr_trainer.infoset_map.items(), key=lambda s: len(s[0])):
    print(f"{name:3}:    {info_set.get_average_strategy()}")
```


```
Running Kuhn Poker chance sampling CFR for 1000 iterations

Expected average game value (for player 1): -0.056
Computed average game value               : -0.005

We expect the bet frequency for a Jack to be between 0 and 1/3
The bet frequency of a King should be three times the one for a Jack

History  Bet  Pass
K  :    [0.99852071 0.00147929]
J  :    [0.51056561 0.48943439]
Q  :    [0.99842271 0.00157729]
JB :    [0.0015528 0.9984472]
JC :    [0.42504716 0.57495284]
QB :    [0.99855072 0.00144928]
QC :    [0.91930593 0.08069407]
KB :    [0.9984985 0.0015015]
KC :    [0.9984985 0.0015015]
KCB:    [0.99510175 0.00489825]
JCB:    [0.00117214 0.99882786]
QCB:    [0.86627698 0.13372302]
```
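The two printed expectations can be checked mechanically against the first-action bet probabilities in the two outputs above. The helper `check_root_strategy` below is hypothetical and not part of the notebook. It also checks that player 1 never bets a Queen as a first action, which holds throughout Kuhn's equilibrium family. The 0.05 tolerance for that check is an arbitrary assumption.

```python
def check_root_strategy(j_bet: float, q_bet: float, k_bet: float, q_tol: float = 0.05) -> dict:
    """Compare player 1's first-action bet probabilities with Kuhn's equilibrium family."""
    return {
        "j_in_range": 0.0 <= j_bet <= 1.0 / 3.0,                         # J bets with alpha in [0, 1/3]
        "k_to_j_ratio": k_bet / j_bet if j_bet > 0 else float("nan"),    # should be about 3
        "q_checks": q_bet <= q_tol,                                      # Q should (almost) never bet first
    }


# Bet probabilities copied from the two printed outputs above
run_1 = check_root_strategy(j_bet=0.19339867, q_bet=0.00998004, k_bet=0.66291148)
run_2 = check_root_strategy(j_bet=0.51056561, q_bet=0.99842271, k_bet=0.99852071)
print(run_1)
print(run_2)
```

On these numbers, the first run has J within range and Q checking. Its K-to-J ratio is about 3.4 after only 1000 iterations. The second run fails the range and Queen checks: J bets about half the time and Q almost always bets. Its ratio is about 2 rather than 3.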
