<a href="https://colab.research.google.com/github/rissicay/reinforcement_learning/blob/main/Poker_Environment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Poker Environment

### GitHub Link
https://github.com/rissicay/reinforcement_learning/blob/main/Poker_Environment.ipynb

This is a simplified poker environment that can be used to learn a good policy for starting-hand selection. The game is one-on-one: the player against a dealer. The ante is fixed at \$1 and the bet size at \$50. The initial balance is \$1000. An episode ends when the player either goes broke or reaches 10x the initial balance (\$10,000).
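
As a rough worked example of these numbers, the sketch below computes how many winning bets in a row it would take to reach the goal, and how many folds in a row would drain the balance through the ante alone. It assumes that a winning bet nets the player the ante plus the bet (\$51). The names `net_win`, `min_wins_to_goal` and `folds_to_broke` are illustrative and not part of the environment.

```python
import math

# Values copied from the description above.
BLIND = 1
BET_SIZE = 50
INITIAL_BALANCE = 1000
PRIZE_GOAL = 10000

# Assumption: a winning bet nets the player BLIND + BET_SIZE.
net_win = BLIND + BET_SIZE

# Fewest consecutive winning bets needed to go from the initial balance to the goal.
min_wins_to_goal = math.ceil((PRIZE_GOAL - INITIAL_BALANCE) / net_win)

# Consecutive folds needed to lose the whole initial balance to the ante.
folds_to_broke = math.ceil(INITIAL_BALANCE / BLIND)

print(min_wins_to_goal, folds_to_broke)  # 177 1000
```

Under this assumption, an episode needs at least 177 hands to reach the goal, which is worth keeping in mind when choosing a time limit.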

We use the pydealer pip package to create a deck, shuffle it, and deal cards.

## Global Variables, Static Functions & Dependencies

The ante (the `BLIND` variable), the bet size, the goal, and the initial balance are all global variables that can be adjusted. These values directly shape the reward function, so change them with care.

The two static functions convert pydealer's card values and suits into the one-character codes expected by the score function below.
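
The conversion these helpers perform can be sketched without pydealer, using plain strings for the value and suit names. The lookup tables and the `card_code` helper below are hypothetical and only illustrate the mapping; the notebook itself uses `convert_value` and `convert_suit`.

```python
# Lookup tables mirroring the if/elif chains in convert_value and convert_suit.
VALUE_CODES = {"Ace": "A", "King": "K", "Queen": "Q", "Jack": "J", "10": "T"}
SUIT_CODES = {"Spades": "S", "Clubs": "C", "Diamonds": "D", "Hearts": "H"}


def card_code(value, suit):
    """Hypothetical helper: build a two-character card code such as 'TD'."""
    # Values without an entry (e.g. '7') are already a single character.
    return VALUE_CODES.get(value, value) + SUIT_CODES[suit]


print(card_code("10", "Diamonds"))  # TD
print(card_code("Ace", "Spades"))   # AS
print(card_code("7", "Hearts"))     # 7H
```

Every card ends up as exactly two characters, a rank followed by a suit, which is the format the score function splits apart.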

In [None]:
!pip install pydealer
import pydealer

BLIND = 1
BET_SIZE = 50
PRIZE_GOAL = 10000
INITIAL_BALANCE = 1000

def convert_value(value):
    if value == 'Ace':
        return 'A'
    elif value == 'King':
        return 'K'
    elif value == 'Queen':
        return 'Q'
    elif value == 'Jack':
        return 'J'
    elif value == '10':
        return 'T'
    else:
        return value


def convert_suit(suit):
    if suit == "Spades":
        return "S"
    elif suit == "Clubs":
        return "C"
    elif suit == "Diamonds":
        return "D"
    elif suit == "Hearts":
        return "H"



## The Environment

The `reset` function sets the player's balance back to the initial balance and calls the `deal_cards` method. `deal_cards` shuffles a fresh deck and deals the cards for both the player and the dealer. The observation returned by `reset` is the player's balance and the player's cards (the two hole cards followed by the three-card flop).

The `step` function takes the action chosen by the user: `0` to fold, `1` to bet. If the user folds, the ante is subtracted from the user's balance, a new set of cards is dealt, and the function checks whether the game is over. It then returns the observation (balance and player cards), the reward (the amount lost), a boolean indicating whether the game is done, and an empty info dictionary. If the user bets, `step` works out the winnings or losses for the hand and deals a new set of cards. It then returns the new observation, the reward (the user's winnings or loss), the done flag, and the empty info dictionary.
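
The bookkeeping described above can be sketched as a small pure function, separate from the card dealing. This is one reading of the description, not the notebook's implementation: it assumes the reward is the net change in balance, so a winning bet pays the ante plus the bet and a losing bet costs the same amount. The `settle` helper is hypothetical.

```python
BLIND = 1
BET_SIZE = 50
PRIZE_GOAL = 10000


def settle(balance, action, player_wins):
    """Hypothetical helper: return (new_balance, reward, done) for one hand.

    Assumes the reward equals the net change in the player's balance.
    """
    if action == 0:
        # Folding costs only the ante.
        reward = -BLIND
    elif player_wins:
        # A winning bet pays the ante plus the bet.
        reward = BLIND + BET_SIZE
    else:
        # A losing bet costs the ante plus the bet.
        reward = -(BLIND + BET_SIZE)

    new_balance = balance + reward
    # The episode ends when the player is broke or has reached the goal.
    done = new_balance <= 0 or new_balance >= PRIZE_GOAL
    return new_balance, reward, done


print(settle(1000, 0, False))  # (999, -1, False)
print(settle(1000, 1, True))   # (1051, 51, False)
print(settle(51, 1, False))    # (0, -51, True)
```

Keeping the payout logic in one place like this makes it easy to check that the reward and the balance always move by the same amount.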

In [None]:
class Environment:

    def who_won(self):

        ## score function taken from
        ## https://stackoverflow.com/questions/10363927/the-simplest-algorithm-for-poker-hand-evaluation
        def score(hand):
            ranks = '23456789TJQKA'
            rcounts = {ranks.find(r): ''.join(hand).count(r) for r, _ in hand}.items()
            score, ranks = zip(*sorted((cnt, rank) for rank, cnt in rcounts)[::-1])
            if len(score) == 5:
                if ranks[0:2] == (12, 3):  # adjust if 5 high straight
                    ranks = (3, 2, 1, 0, -1)
                straight = ranks[0] - ranks[4] == 4
                flush = len({suit for _, suit in hand}) == 1
                '''no pair, straight, flush, or straight flush'''
                score = ([(1,), (3, 1, 1, 1)], [(3, 1, 1, 2), (5,)])[flush][straight]
            return score, ranks

        scores = [(i, score(hand.split())) for i, hand in enumerate([self.final_player_cards, self.final_dealer_cards])]
        winner = sorted(scores, key=lambda x: x[1])[-1][0]
        return winner

    def is_done(self):
        if self.balance <= 0:
            return True
        elif self.balance >= PRIZE_GOAL:
            return True

        return False

    def __init__(self):
        self.balance = INITIAL_BALANCE
        self.final_player_cards = None
        self.final_dealer_cards = None

    def deal_cards(self):
        deck = pydealer.Deck()
        deck.shuffle()
        player_cards = deck.deal(2)

        #### Get dealer's cards: the dealer always holds a pair of twos ####

        dealer_cards = []

        # Indices shift after each removal, so search again before the second draw.
        idx_of_two = deck.find('2')
        dealer_cards.append(deck.get(idx_of_two[0])[0])
        idx_of_two = deck.find('2')
        dealer_cards.append(deck.get(idx_of_two[0])[0])

        #### Deal the flop shared by both hands ####

        flop = deck.deal(3)

        final_player_cards = ""
        final_dealer_cards = ""

        for card in player_cards:
            final_player_cards += convert_value(card.value) + convert_suit(card.suit) + " "

        for card in flop:
            final_player_cards += convert_value(card.value) + convert_suit(card.suit) + " "

        for card in dealer_cards:
            final_dealer_cards += convert_value(card.value) + convert_suit(card.suit) + " "

        for card in flop:
            final_dealer_cards += convert_value(card.value) + convert_suit(card.suit) + " "

        self.final_player_cards = final_player_cards.rstrip()
        self.final_dealer_cards = final_dealer_cards.rstrip()

    def reset(self):
        self.balance = INITIAL_BALANCE

        self.deal_cards()

        return self.balance, self.final_player_cards

    def step(self, action):

        # fold this hand
        if action == 0:
            self.balance = self.balance - BLIND

            self.deal_cards()

            return (self.balance, self.final_player_cards), -BLIND, self.is_done(), {}

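        elif action == 1:
            # bet on this hand: the player stakes the ante plus the bet
            stake = BLIND + BET_SIZE

            winner = self.who_won()

            # the reward is the net change in balance, so a loss costs
            # the stake exactly once instead of being deducted twice
            if winner == 0:
                reward_outcome = stake
            else:
                reward_outcome = -stake

            self.balance = self.balance + reward_outcome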

            self.deal_cards()

            return (self.balance, self.final_player_cards), reward_outcome, self.is_done(), {}

def make():
    return Environment()
  

## The Policy

Meeting the goal is quite simple because we know exactly how the dealer works. The dealer always holds a pair of twos, so we cannot win with less than a pair. A simple policy that bets only when our two hole cards form a pair should therefore reach the goal over time.

This policy, while meeting our goal, is of little practical use, since it relies entirely on knowing the dealer's fixed hand.
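
To make the reasoning concrete, the sketch below scores a few hand-picked hands against the dealer's pair of twos. The evaluator is the same one used in the `Environment` class (from the Stack Overflow answer cited there), with the local variables renamed for readability. The `player_wins` helper and the example hands are illustrative assumptions; ties go to the dealer, as in `who_won`.

```python
def score(hand):
    # Same evaluator as in the Environment class, local names changed.
    order = '23456789TJQKA'
    rcounts = {order.find(r): ''.join(hand).count(r) for r, _ in hand}.items()
    shape, ranks = zip(*sorted((cnt, rank) for rank, cnt in rcounts)[::-1])
    if len(shape) == 5:
        if ranks[0:2] == (12, 3):  # adjust if 5 high straight
            ranks = (3, 2, 1, 0, -1)
        straight = ranks[0] - ranks[4] == 4
        flush = len({suit for _, suit in hand}) == 1
        # no pair, straight, flush, or straight flush
        shape = ([(1,), (3, 1, 1, 1)], [(3, 1, 1, 2), (5,)])[flush][straight]
    return shape, ranks


def player_wins(player_hand, dealer_hand):
    """Hypothetical helper: True only if the player's hand strictly beats the dealer's."""
    return score(player_hand.split()) > score(dealer_hand.split())


flop = "3H 5H QS"
dealer = "2C 2S " + flop

print(player_wins("KH KD " + flop, dealer))  # True: pair of kings beats pair of twos
print(player_wins("AH KD " + flop, dealer))  # False: ace high loses to any pair
print(player_wins("2H 2D " + flop, dealer))  # False: identical hands, tie goes to the dealer

# A two on the flop gives the dealer three of a kind, which beats a pocket pair.
print(player_wins("KH KD 2H 5H QS", "2C 2S 2H 5H QS"))  # False
```

The last case shows that even a pocket pair is not a guaranteed win, which is why the policy is expected to reach the goal only over many hands.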

In [None]:
def policy(observation, time_step):

    ## we only want to bet if we have a pair
    balance, player_cards = observation

    individual_card = player_cards.split(' ')

    if individual_card[0][0] == individual_card[1][0]:
        return 1
    else:
        return 0

## Example code

In [None]:
env = make()

observation = env.reset()

TIME_LIMIT = 200

for t in range(TIME_LIMIT):
    action = policy(observation, t)

    observation, reward, done, _ = env.step(action)

    if done and t < TIME_LIMIT - 1:
        print("Task completed in", t, "time steps")
        break
else:
    # the loop finished without a break, so the episode never ended
    print("Time limit exceeded. Try again.")

env.reset()

Time limit exceeded. Try again.


(1000, '2H TD 3H 5H QS')