### Counterfactual Regret Minimization (CFR) and its application to Kuhn Poker


Source consulted: https://modelai.gettysburg.edu/2013/cfr/cfr.pdf


Kuhn Poker is a simple 3-card poker game created by Harold W. Kuhn. Two players each bet 1 chip blind into the pot before the deal. Three cards (usually K, Q, and J) are shuffled, and in the original game one card is dealt to each player and held as private information. We implement the game as described in the paper above, with an additional change inspired by the [ReBeL](https://arxiv.org/abs/2007.13544) paper: players do not hold any private information. Instead, a referee observes the players' cards and acts on their behalf according to the strategies they announce at the beginning of the game. Each player then updates their belief based on how the other player plays and simultaneously updates their own strategy. Further details can be found in the paper.
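The belief update mentioned above can be illustrated with a small Bayes-rule sketch. This is only an illustration under stated assumptions, not the ReBeL implementation: the function name `update_belief`, the card ordering (J, Q, K), and the numbers in `policy` are all hypothetical. Given a prior over the acting player's card and the announced strategy P(action | card), observing an action reweights the prior by the likelihood of that action.

```python
import numpy as np

CARDS = ["J", "Q", "K"]  # assumed ordering, lowest to highest

def update_belief(prior, action_probs, action):
    """Bayes update of the belief over the acting player's card.

    prior        -- shape (3,), current belief over J, Q, K
    action_probs -- shape (3, 2), announced strategy P(action | card),
                    with columns Pass (0) and Bet (1)
    action       -- the action the referee observed (0 or 1)
    """
    posterior = prior * action_probs[:, action]
    total = posterior.sum()
    if total == 0:
        # The action has zero probability under the announced strategy;
        # keeping the prior is one possible convention (an assumption here).
        return prior.copy()
    return posterior / total

# Hypothetical announced strategy: rows are J, Q, K.
policy = np.array([[0.8, 0.2],
                   [1.0, 0.0],
                   [0.1, 0.9]])
prior = np.ones(3) / 3
posterior = update_belief(prior, policy, action=1)  # a bet was observed
```

After a bet, the belief shifts towards K, and Q is ruled out because this hypothetical strategy never bets with Q.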

In [1]:
import numpy as np

In [3]:
class Kuhn_Poker():
    """
    Actions: Pass (0)
             Bet  (1)
             
    Number of Actions: 2
    
    """
    def __init__(self):
        self.kPass = 0
        self.kBet = 1
        self.num_actions = 2
        self.infoset = InfoSet()
    
    @staticmethod
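    def is_terminal(history):
        # Terminal histories (0 = pass, 1 = bet): pass-pass, bet-pass, bet-bet,
        # pass-bet-pass and pass-bet-bet. '01' is not terminal, because the
        # first player still has to respond to the bet.
        return history in ['00', '10', '11', '010', '011']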
        
    @staticmethod
    def get_payoff(history, cards):
        
        """
        payoff structure:
        
            1) check if both players have had at least one action:
                 i) a 'terminal' pass after the first action:
                     a) if it is a terminal pass, then a double terminal pass gives 1 chip to 
                         the player with higher card
                     b) if it is single pass after a bet, the player betting wins 1 chip
                 
                 ii) if not terminal pass, but two consecutive bets, then player with higher card
                     gets 2 chips
    
        """
        
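        plays = len(history)
        # Player who would act next; the payoff is from this player's perspective.
        player = plays % 2
        op = 1 - player
        
        if Kuhn_Poker.is_terminal(history):
            terminalPass = history[-1] == '0'
            doubleBet = history[-2:] == '11'
            if terminalPass:
                if history == '00':
                    # Showdown after two passes: the higher card wins 1 chip.
                    return 1 if cards[player] > cards[op] else -1
                else:
                    # A pass after a bet: the bettor wins 1 chip.
                    return 1
            
            elif doubleBet:
                # Showdown after two bets: the higher card wins 2 chips.
                return 2 if cards[player] > cards[op] else -2
        
        # Non-terminal history: there is no payoff yet.
        return None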
        

class InfoSet():
    """
    Action-observation history of the game
    
    """
    
    def __init__(self):
        self.regretSum = np.zeros(shape=2)
        self.strategy = np.zeros(shape=2)
        self.strategySum = np.zeros(shape=2) 
    
    def normalize(self):
        """
        Normalize the strategy so that it sums to 1. If no regret is positive, return the uniform strategy.
        
        """
        if sum(self.strategy) > 0:
            self.strategy /= sum(self.strategy)
        else:
            self.strategy = np.ones(2)/2
        
        return self.strategy
    
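    def getStrategy(self, reach_prob):
        # Regret matching: keep only the positive cumulative regrets, then normalize.
        self.strategy = np.maximum(0, self.regretSum)
        self.normalize() 
        
        # Accumulate the strategy, weighted by the player's reach probability.
        self.strategySum += reach_prob * self.strategy
        
        return self.strategy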
    
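    def getAverageStrategy(self):
        total = sum(self.strategySum)
        # Fall back to the uniform strategy if nothing has been accumulated yet.
        if total > 0:
            return self.strategySum / total
        return np.ones(2) / 2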
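
The regret-matching rule used by `InfoSet` can be tried in isolation. The following is a minimal standalone sketch, assuming the usual rule of playing each action in proportion to its positive cumulative regret; the helper name `regret_matching` is hypothetical and is not part of the classes above.

```python
import numpy as np

def regret_matching(regret_sum):
    """Map cumulative regrets to a strategy (a probability vector)."""
    positive = np.maximum(regret_sum, 0.0)  # negative regrets count as zero
    total = positive.sum()
    if total > 0:
        return positive / total
    # No positive regret: fall back to the uniform strategy.
    return np.full(len(regret_sum), 1.0 / len(regret_sum))

s_mixed = regret_matching(np.array([3.0, 1.0]))      # favours Pass 3:1
s_uniform = regret_matching(np.array([-2.0, -1.0]))  # all regrets negative
s_pure = regret_matching(np.array([-1.0, 4.0]))      # only Bet has positive regret
```

Element-wise clipping needs `np.maximum`; `np.max(0, x)` would instead treat its second argument as an axis.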
 


        