# EASY BLACKJACK PROJECT

In this small side project we implement an agent that plays blackjack against the house. 
First we implement a simple deck class to simulate the deck. It is heavily simplified: we always start from a full 52-card deck, and when too few cards are left we switch to a new one.
The card names are in Italian because they sound better.

In [18]:
import random
class Deck():
    def __init__(self):
        self.__seed = ['cuori', 'quadri', 'fiori', 'picche']
        self.__values = ['asso', 'due', 'tre', 'quattro', 'cinque', 'sei', 'sette', 'otto', 'nove', 'dieci', 'jack', 'regina', 're']
        self.__integer_values = {v : i + 1 for i, v in enumerate(self.__values)}
        self.__integer_values['jack'] = 10
        self.__integer_values['regina'] = 10
        self.__integer_values['re'] = 10
        self.__ordered_card = {c : i for i, c in enumerate([f"{v} di {s}" for s in self.__seed for v in self.__values])}
        self.__ordered_value = {i : c for i, c in enumerate([f"{v} di {s}" for s in self.__seed for v in self.__values])}
        self.__cards = [f"{v} di {s}" for s in self.__seed for v in self.__values ]

    def new_deck(self):
        self.__cards = [f"{v} di {s}" for v in self.__values for s in self.__seed]

    def shuffle(self):
        random.shuffle(self.__cards)

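    def get_one(self):
        # Start a fresh shuffled deck if the current one is exhausted
        if not self.__cards:
            self.new_deck()
            self.shuffle()
        return self.__cards.pop()
    
    def get_some(self, k = 2):
        if k > 52:
            raise ValueError("Maximum card requested 52")
        if k > len(self.__cards):
            print("A new deck will be used") # just to remember
            self.new_deck()
            self.shuffle() # a fresh deck comes sorted, so shuffle it
        res = [self.get_one() for i in range (k)]
        return res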
    
    def get_value_from_card(self, c):
        return self.__ordered_card[c]
    
    def get_card_from_value(self, v):
        return self.__ordered_value[v]

    def get_integer_value(self, c):
        v = c.split()[0]
        return self.__integer_values[v]
    
    def get_deck(self):
        return self.__cards

Now that the deck is working we build the house. The house is like a player but with a fixed policy (stand on 17, including soft 17): if its hand value is less than 17, it asks for another card; otherwise it stays. An ace is counted as 1 or 11 depending on the hand: 11 if that does not bust the hand, 1 otherwise.

In [194]:
class House:

    def __init__(self):
        self.hand = []
        self.hand_v = []
        self.value = 0

    def first(self, deck : Deck, verbose = True):
        self.hand = deck.get_some(2)
        self.hand_v = [deck.get_integer_value(c) for c in self.hand]
        if self.hand_v[0] == 1:
            self.hand_v[0] = 11
        if self.hand_v[1] == 1 and self.hand_v[0] != 11:
            self.hand_v[1] = 11
        if verbose:
            print(self.hand[0])
        return self.hand_v[0]

    def play(self, deck : Deck, verbose = True):
        if verbose:
            print(self.hand[1])
        # Fixed policy: hit below 17, stand ("stare") at 17 or more
        self.value = sum(self.hand_v)
        while self.value < 17:
            card = deck.get_one()
            if verbose:
                print(card)
            v = deck.get_integer_value(card)
            # An ace counts as 11 only if that does not bust the hand
            if v == 1 and self.value + 11 <= 21:
                v = 11
            self.hand.append(card)
            self.hand_v.append(v)
            self.value = sum(self.hand_v)
            # If the hand busts while an ace counts as 11, count that ace as 1
            if self.value > 21 and 11 in self.hand_v:
                self.hand_v[self.hand_v.index(11)] = 1
                self.value = sum(self.hand_v)
        return self.value

In [337]:
d = Deck()
d.shuffle()
h = House()
h.first(d)
h.play(d)

sei di picche
jack di fiori
cinque di quadri


21
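
The ace rule used by the house (count an ace as 11 unless that would bust the hand) can also be written as a standalone helper. This is a minimal sketch and not part of the notebook's classes: `hand_value` is a hypothetical name, and it assumes the hand is given as integer card values with every ace passed in as 1.

```python
def hand_value(values):
    """Best blackjack total for a list of card values, with aces given as 1."""
    total = sum(values)
    # At most one ace can count as 11, since two would already give 22.
    if 1 in values and total + 10 <= 21:
        total += 10
    return total

print(hand_value([6, 10, 5]))   # 21
print(hand_value([1, 5]))       # 16, the ace counts as 11
print(hand_value([1, 5, 10]))   # 16, the ace falls back to 1
```

Computing the total from scratch like this avoids having to track which ace is currently counted as 11.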

Now we need to implement the player. The player is a simple agent whose policy we compute with value iteration (for now).

We also define the state space and everything else we need for the MDP.

### State Space 

The state is the pair $S = (hand, house)$: the value of our hand and the value of the house's visible card. We only know the first card of the house, so we use a simplified estimate of the average value the house reaches at the end of its turn. There are 4 more states, $win$, $lose$, $21$ and $even$, and they are all terminal.

There are only 2 actions, Hit and Stare (Italian for "stand"). In a terminal state no action is allowed.

We now define our reward function as follows:

$$
\begin{aligned}
R((i, h), \text{Hit}, (i+v, h)) &= 0 \\
R((i, h), \text{Stare}, win) &= 1 \\
R((i, h), \text{Stare}, lose) &= -1 \\
R((i, h), \text{Stare}, even) &= 0 \\
R((i, h), \text{Stare}, 21) &= 1.5
\end{aligned}
$$

Then we define our Transition Model as follows:

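$$
\begin{aligned}
T((i, h), \text{Hit}, (i+10, h)) &= 4/14 \\
T((i, h), \text{Hit}, (i+j, h)) &= 1/14 \quad \text{for } j \in \{1, \dots, 9, 11\} \\
H &= h + 5.8 \cdot \left\lceil \frac{17 - h}{5.8} \right\rceil \quad \text{(house value)} \\
T((i, h), \text{Stare}, win) &= \mathbb{1}\left[ i < 21 \land (i > H \lor H > 21) \right] \\
T((i, h), \text{Stare}, even) &= \mathbb{1}\left[ i = H \lor (i > 21 \land H > 21) \right] \\
T((i, h), \text{Stare}, 21) &= \mathbb{1}\left[ i = 21 \right] \\
T((i, h), \text{Stare}, lose) &= 1 - \text{(sum of the other three)}
\end{aligned}
$$

Here $\mathbb{1}[\cdot]$ is 1 when the condition holds and 0 otherwise, so every Stare transition is deterministic. The win condition includes the case where the house busts ($H > 21$) while we hold less than 21, as in the code.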

Note how heavily this simplifies the transition model: the uncertainty about the house's final value is collapsed into a single number. The house value is computed from an average card value of 5.8, so $\left\lceil \frac{17 - h}{5.8} \right\rceil$ is the average number of Hits the house calls.
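
To make the house estimate concrete, the formula for $H$ can be evaluated for every visible card value from 1 to 11. This is only an illustrative sketch: `expected_house_value`, `AVG_CARD` and the `stand_at` parameter are names introduced here, not part of the notebook.

```python
from math import ceil

AVG_CARD = 5.8  # average card value assumed in the text

def expected_house_value(h, avg=AVG_CARD, stand_at=17):
    """Estimate the house's final total from the value h of its visible card."""
    hits = ceil((stand_at - h) / avg)  # average number of hits needed to reach 17
    return h + avg * hits

for h in range(1, 12):
    print(h, round(expected_house_value(h), 1))
```

With these numbers the estimate exceeds 21 for visible cards worth 4, 5, 10 and 11, so in this model the house is treated as always busting in those cases.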

In [30]:
# Only valid (state, action, next state) triples reach this function, thanks to the helpers below
def Reward(first_state, action, last_state):
    if action == 'Hit':
        return 0
    if action == 'Stare' and last_state == 'Win':
        return 1
    if action == 'Stare' and last_state == 'Lose':
        return -1
    if action == 'Stare' and last_state == 'Even':
        return 0
    if action == 'Stare' and last_state == '21':
        return 1.5

def AvailableAction(state):
    # In a terminal state we can't do anything
    # In a state where our value is greater or equal to 21 we can only Stare
    # In a state where our value is less than 21 we can Hit or Stare
    Terminal_state = ['Win', 'Lose', 'Even', '21'] 
    if state in Terminal_state:
        return []
    if state[0] >= 21:
        return ['Stare']
    return ['Hit', 'Stare']

# Only valid actions are passed in, thanks to AvailableAction
def ReachableState(state, action):
    if action == 'Hit':
        return [(state[0]+i, state[1]) for i in range (1, 12)]
    if action == 'Stare':
        return ['Win', 'Lose', 'Even', '21']

from math import ceil
# Only valid transitions are checked, thanks to ReachableState
def Transition(first_state, action, last_state):
    Terminal_state = ['Win', 'Lose', 'Even', '21'] # To remember
    house = first_state[1] # This does not change 
    if action == 'Hit':
        hand = first_state[0]
        card = last_state[0] - hand # This is the value of the picked card
        if card == 10:
            return 4/14 # 10, Jack, Queen, King
        if card != 10:
            return 1/14 # The other
    if action == 'Stare':
        H = house + 5.8 * ceil((17-house)/5.8)
        win = 1 if (first_state[0] > H and first_state[0] < 21) or (first_state[0] < 21 and H > 21) else 0
        even = 1 if (first_state[0] == H or (first_state[0] > 21 and H > 21)) else 0
        bj = 1 if first_state[0] == 21 else 0
        lose = 1 - win - even - bj
        if last_state == 'Win':
            return win
        elif last_state == 'Even':
            return even
        elif last_state == '21':
            return bj
        return lose
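
As a quick sanity check, the Hit probabilities should sum to 1 over the eleven reachable card values. The sketch below assumes the reading suggested by the code: the ace appears twice (as 1 and as 11), giving 14 equally weighted outcomes, four of which are worth 10. `hit_probability` is a hypothetical helper that mirrors the Hit branch of `Transition`.

```python
from fractions import Fraction

def hit_probability(card_value):
    """Probability of drawing a card worth card_value under the simplified model."""
    # 10, jack, queen and king are all worth 10; every other value has weight 1.
    return Fraction(4, 14) if card_value == 10 else Fraction(1, 14)

card_values = range(1, 12)  # 1..11, the ace counted both as 1 and as 11
total = sum(hit_probability(v) for v in card_values)
print(total)  # 1
```

Using `Fraction` keeps the check exact and avoids floating-point rounding.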

In [34]:
Hand_values = [i for i in range (1, 32)] # We can be greedy at 20 and hit, giving a worst case of 31
House_values = [i for i in range (1, 12)]
States = [(hand, house) for hand in Hand_values for house in House_values]
Terminal_states = ['Win', 'Lose', 'Even', '21']
States.extend(Terminal_states)

We will do value iteration.
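
For reference, the standard value iteration update, written with the $T$ and $R$ defined above, is shown here. $A(s)$ denotes the set of actions available in state $s$ (a notation assumed here), and $V_k$ is the value estimate after $k$ sweeps.

$$
V_{k+1}(s) = \max_{a \in A(s)} \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V_k(s') \right]
$$

Terminal states have no available action and keep value 0. Since every game ends after finitely many steps, taking $\gamma = 1$ is acceptable here.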

In [36]:
import copy 
def convergence(dic1 : dict, dic2 : dict, eps = 1e-6):
    if list(dic1.keys()) != list(dic2.keys()):
        return False
    for k in list(dic1.keys()):
        if abs(dic1[k] - dic2[k]) > eps:
            return False
    return True

Values_0 = {s : -1 for s in States}
Values_1 = {s : 0 for s in States}
Policy = {s : '' for s in States}
gamma = 1
while not convergence(Values_0, Values_1):
    Values_0 = copy.deepcopy(Values_1)
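    for state in States:
        actions = AvailableAction(state)
        # Terminal states have no actions and keep value 0;
        # otherwise start from -inf so that negative values are not clamped to 0
        v_max = float('-inf') if actions else 0
        for act in actions:
            v_loc = 0
            for s in ReachableState(state, act):
                v_loc += Transition(state, act, s)*(Reward(state, act, s)+gamma*Values_0[s])
            if v_loc > v_max:
                v_max = v_loc
                Policy[state] = act
        Values_1[state] = v_max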

for s in States:
    if s not in Terminal_states:
        if s[0] <= 21:
            h = s[1]
            H = h + 5.8 * ceil((17-h)/5.8)
            print(f"Hand {s[0]} House {s[1]} E[House] {H}: {Policy[s]}")

Hand 1 House 1 E[House] 18.4: Hit
Hand 1 House 2 E[House] 19.4: Hit
Hand 1 House 3 E[House] 20.4: Hit
Hand 1 House 4 E[House] 21.4: Hit
Hand 1 House 5 E[House] 22.4: Hit
Hand 1 House 6 E[House] 17.6: Hit
Hand 1 House 7 E[House] 18.6: Hit
Hand 1 House 8 E[House] 19.6: Hit
Hand 1 House 9 E[House] 20.6: Hit
Hand 1 House 10 E[House] 21.6: Hit
Hand 1 House 11 E[House] 22.6: Hit
Hand 2 House 1 E[House] 18.4: Hit
Hand 2 House 2 E[House] 19.4: Hit
Hand 2 House 3 E[House] 20.4: Hit
Hand 2 House 4 E[House] 21.4: Hit
Hand 2 House 5 E[House] 22.4: Hit
Hand 2 House 6 E[House] 17.6: Hit
Hand 2 House 7 E[House] 18.6: Hit
Hand 2 House 8 E[House] 19.6: Hit
Hand 2 House 9 E[House] 20.6: Hit
Hand 2 House 10 E[House] 21.6: Hit
Hand 2 House 11 E[House] 22.6: Hit
Hand 3 House 1 E[House] 18.4: Hit
Hand 3 House 2 E[House] 19.4: Hit
Hand 3 House 3 E[House] 20.4: Hit
Hand 3 House 4 E[House] 21.4: Hit
Hand 3 House 5 E[House] 22.4: Hit
Hand 3 House 6 E[House] 17.6: Hit
Hand 3 House 7 E[House] 18.6: Hit
Hand 3 Hou

In [48]:
Policy[(20, 6)]

'Stare'