# Solving Acey Deucey using Reinforcement Learning
[Michael DiSanto](https://www.michaelpdisanto.com) - 2023

## Project Objective
The objective of this project is to design and implement a reinforcement learning-based betting strategy for the card game "Acey Deucey" (also known as "In-Between"). Using object-oriented programming in Python, I aim to build an agent that learns and adapts its betting decisions during gameplay. The agent considers the current state of the game, the two face-up cards, and the size of the pot when choosing a bet. The goal is to optimize the player's betting decisions so that the player finishes the game with as many chips as possible. The project explores the intersection of game theory, machine learning, and strategic decision-making.

## Acey Deucey Description
In-Between is not very popular at casinos, but is often played in home Poker games as a break from Poker itself. The rules below are for the home game, which is easily adaptable for casino play.

### Rank of Cards
A (high), K, Q, J, 10, 9, 8, 7, 6, 5, 4, 3, 2.

### Object of the Game
The goal is to be the player with the most chips at the end of the game.

### The Ante
Chips are distributed to the players, and each player puts one chip in the center of the table to form a pool or pot.

### The Draw
Any player deals one card face up, to each player in turn, and the player with the highest card deals first.

### The Shuffle, Cut, and Deal
Any player may shuffle, and the dealer shuffles last. The player to the dealer's right cuts the cards. The dealer turns up two cards and places them in the middle of the table, positioning them so that there is ample room for a third card to fit in between.

### The Betting
The player on the dealer's left may bet any number of chips up to the entire pot, but must always bet at least one chip. When the player has placed a bet, the dealer turns up the top card from the pack and places it between the two cards already face up. If the card ranks between the two cards already face up, the player wins and takes back the amount of their bet plus an equivalent amount from the pot. If the third card is not between the face-up cards, or is of the same rank as either of them, the player loses their bet, and it is added to the pot. If the two face-up cards are consecutive, the player automatically loses, and a third card need not be turned up. If the two face-up cards are the same, the player wins two chips and, again, no third card is turned up. (In some games, the player is paid three chips when this occurs.)
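
The payout rules above can be condensed into a single function. This is a minimal sketch rather than the project's code: `resolve_hand` and its parameters are hypothetical names, ranks are assumed to be encoded as integers from 2 to 14 (ace = 14), and a pair is assumed to pay a flat two chips, as in the rules quoted above.

```python
def resolve_hand(card1, card2, bet, third_card=None, pair_bonus=2):
    """Return the player's chip change for one hand (hypothetical helper).

    Ranks are integers from 2 to 14, with ace = 14.
    third_card is ignored for pairs and consecutive cards, where no
    third card is turned up.
    """
    low, high = sorted((card1, card2))
    if low == high:
        # Pair: the player wins a fixed bonus and no third card is dealt.
        return pair_bonus
    if high - low == 1:
        # Consecutive ranks: nothing can fall in between, automatic loss.
        return -bet
    if third_card is None:
        raise ValueError("a third card is required for this hand")
    # Win only if the third card ranks strictly between the two cards.
    return bet if low < third_card < high else -bet


print(resolve_hand(2, 14, bet=5, third_card=9))   # third card in between
print(resolve_hand(2, 14, bet=5, third_card=14))  # third card matches an end card
print(resolve_hand(7, 8, bet=3))                  # consecutive cards
print(resolve_hand(9, 9, bet=3))                  # pair
```

Returning a signed chip change keeps the pot bookkeeping simple: the pot moves by the opposite amount on an ordinary win or loss.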

"Acey-Deucey" (ace, 2) is the best combination, and a player tends to bet the whole pot, if they can. This is because the only way an ace-deuce combination can lose is if the third card turned up is also an ace or a deuce.
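
The ace-deuce odds can be checked with a short calculation. The sketch below assumes a full 52-card deck from which only the ace and the deuce have been dealt, so 50 cards remain and the only losing cards are the three other aces and the three other deuces.

```python
from fractions import Fraction

remaining_cards = 52 - 2   # deck minus the ace and deuce already face up
losing_cards = 3 + 3       # the three other aces and the three other deuces

p_lose = Fraction(losing_cards, remaining_cards)
p_win = 1 - p_lose

print(f"P(lose) = {p_lose} = {float(p_lose):.0%}")
print(f"P(win)  = {p_win} = {float(p_win):.0%}")
```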

After the first player has finished, the dealer clears away the cards and places them face down in a pile. The next player then places a bet, and the dealer repeats the same procedure until all the players, including the dealer, have had a turn.

If at any time, the pot has no more chips in it (because a player has "bet the pot" and won), each player again puts in one chip to restore the pot.

When every player has had a turn to bet, the deal passes to the player on the dealer's left, and the game continues.

https://bicyclecards.com/how-to-play/in-between/

## Import Dependencies

In [2]:
import numpy as np
import pandas as pd

## Acey Deucey Agent Class

In [9]:
class AceyDeuceyAgent:
    def __init__(self):
        self.q_table = {}

    def get_state_key(self, card1, card2):
        return tuple(sorted([card1, card2]))

    def get_q_value(self, state, action):
        return self.q_table.get((state, action), 0.0)

    def update_q_value(self, state, action, new_q_value):
        self.q_table[(state, action)] = new_q_value

    def choose_action(self, state, valid_actions):
        epsilon = 0.2  # exploration rate: probability of choosing a random bet
        if np.random.rand() < epsilon:
            return np.random.choice(valid_actions)
        else:
            q_values = [self.get_q_value(state, action) for action in valid_actions]
            return valid_actions[np.argmax(q_values)]

## Player Function

In [10]:
def play_acey_deucey(agent, learning_rate=0.1):
    # Initialize the game
    ranks = list(range(2, 15))  # Card ranks from 2 to Ace (14)
    pot = 1
    player_chips = 10

    while player_chips > 0:
        # Assumption: each hand is dealt from a freshly shuffled 52-card deck
        deck = ranks * 4
        np.random.shuffle(deck)

        # Deal the two face-up cards
        card1, card2 = deck.pop(), deck.pop()

        # Get the state representation (the two cards sorted low to high)
        state = agent.get_state_key(card1, card2)
        low, high = state

        # Determine valid actions: bet from 1 chip up to min(pot, player's chips)
        valid_actions = list(range(1, min(pot, player_chips) + 1))

        # Choose an action using the agent's strategy
        action = int(agent.choose_action(state, valid_actions))

        # Turn up the third card; the player wins only if it ranks strictly in between
        third_card = deck.pop()
        won = low < third_card < high

        # A win moves the bet from the pot to the player; a loss adds the bet to the pot
        if won:
            pot -= action
            player_chips += action
        else:
            pot += action
            player_chips -= action

        # Update the Q-value toward the observed reward (chips won or lost).
        # Each hand is independent, so the update has no next-state term.
        reward = action if won else -action
        old_q_value = agent.get_q_value(state, action)
        new_q_value = old_q_value + learning_rate * (reward - old_q_value)
        agent.update_q_value(state, action, new_q_value)

        # Restore the pot if needed
        if pot == 0:
            pot = 1
            player_chips -= 1

    print("Game Over. Player's final chips:", player_chips)

## Table Export Function

In [11]:
def export_betting_table(agent):
    card_ranges = [(card1, card2) for card1 in range(2, 15) for card2 in range(2, 15)]
    betting_table = pd.DataFrame(index=range(1, 11), columns=pd.MultiIndex.from_tuples(card_ranges, names=['Card1', 'Card2']))

    for card_range in card_ranges:
        for bet_size in range(1, 11):
            state = agent.get_state_key(*card_range)
            q_value = agent.get_q_value(state, bet_size)
            betting_table.loc[bet_size, card_range] = q_value

    betting_table.to_csv('betting_table.csv')
    print("Betting table exported to betting_table.csv")

## Training Agent

In [12]:
# Create an AceyDeuceyAgent
acey_deucey_agent = AceyDeuceyAgent()

# Train the agent by playing multiple games
for _ in range(1000):
    play_acey_deucey(acey_deucey_agent)

# Export the betting table
export_betting_table(acey_deucey_agent)

Game Over. Player's final chips: 0
Game Over. Player's final chips: 0
Game Over. Player's final chips: 0
...
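
Once training finishes, a betting policy can be read from the Q-table by taking, for each pair of face-up cards, the bet with the highest Q-value. The sketch below assumes the same `(state, action)` key layout as `AceyDeuceyAgent.q_table`; the helper `greedy_bets` and the toy Q-values are hypothetical and are not results from the training run.

```python
def greedy_bets(q_table):
    """Map each state to the bet with the highest learned Q-value."""
    best = {}
    for (state, action), q_value in q_table.items():
        if state not in best or q_value > best[state][1]:
            best[state] = (action, q_value)
    return {state: action for state, (action, _) in best.items()}


# Toy Q-table with made-up values, for illustration only.
toy_q_table = {
    ((2, 14), 1): 0.6,
    ((2, 14), 2): 1.1,
    ((7, 9), 1): -0.8,
    ((7, 9), 2): -1.7,
}

policy = greedy_bets(toy_q_table)
print(policy)
```

With a trained agent, the same function could be called as `greedy_bets(acey_deucey_agent.q_table)`.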

## Results Analysis: Negative Q-Values in Acey-Deucey

### Explanation #1 - Insufficient Exploration:
   - Negative Q-values may be a result of insufficient exploration during the training phase.
   - The agent might not have explored all possible state-action pairs extensively, leading to a biased or incomplete understanding of the game dynamics.
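
One common way to address insufficient exploration is to start with a high exploration rate and decay it over the course of training. The sketch below is an assumption, not what the notebook does (the agent above uses a fixed `epsilon = 0.2`); `decayed_epsilon` and its default values are hypothetical.

```python
def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.995):
    """Exponentially decay the exploration rate toward a floor value."""
    return max(end, start * decay ** episode)


# Exploration rate at a few points during a 1,000-game training run
for episode in (0, 100, 500, 1000):
    print(episode, round(decayed_epsilon(episode), 3))
```

Early games then sample bets almost uniformly, while later games mostly exploit the learned Q-values.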
   
### Explanation #2 - Suboptimal Strategy Exploration:
   - Negative Q-values indicate that the agent tried bets that led to unfavorable outcomes in specific game states.
   - In those states, the selected actions resulted in net losses on average.
   - Because the player must bet at least one chip on every hand, including hands that cannot win, it is reasonable to conclude that playing this game is unfavorable overall.
   - Each time you play, you are expected to lose chips, so it is in your best interest not to play.
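
The claim that a player loses on average can be checked by enumerating every possible deal. The sketch below makes three assumptions: a full 52-card deck, a forced one-chip bet at even money on every hand, and pairs treated as losses (it ignores the two-chip pair payout described in the rules). The variable names are illustrative.

```python
from itertools import permutations

ranks = range(2, 15)  # 2 through ace (14)
deck = [rank for rank in ranks for _ in range(4)]

wins = 0
hands = 0
# Enumerate every ordered deal of three cards from the 52-card deck.
for card1, card2, card3 in permutations(deck, 3):
    low, high = sorted((card1, card2))
    hands += 1
    if low < card3 < high:
        wins += 1

p_win = wins / hands
expected_chips = p_win * 1 + (1 - p_win) * -1  # even-money one-chip bet

print(f"P(win) on a random hand: {p_win:.3f}")
print(f"Expected chips per one-chip bet: {expected_chips:.3f}")
```

A negative `expected_chips` under these assumptions is consistent with the negative Q-values: a player who must bet on every hand gives up chips on average.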

## Updated Objective: Probability of Winning Analysis in Acey-Deucey

Since the previous code did not find an optimal betting strategy for Acey Deucey, the revised objective is to analyze the probability of winning for each range of cards. This gives players a guide for making informed decisions (note: I do not recommend gambling, especially on this game, as both theory and the reinforcement learning results indicate that it is a losing proposition).


## Finding the Acey-Deucey Range Probabilities

In [None]:
import pandas as pd

In [1]:
cards = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14] # jack = 11 | queen = 12 | king = 13 | ace = 14
deck = [num for num in cards for _ in range(4)]

In [2]:
len(deck)

52

In [3]:
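card_combinations = {}
remaining_cards = len(deck) - 2  # 50 cards left after the two face-up cards

# Iterate over unique rank pairs (card1 <= card2); suits do not affect the result
for i, card1 in enumerate(cards):
    for card2 in cards[i:]:
        # Number of ranks strictly between the two cards (0 for pairs and consecutive cards)
        card_range = max(card2 - card1 - 1, 0)
        cards_in_range = card_range * 4  # four cards of each rank
        probability = f"{round((cards_in_range / remaining_cards) * 100, 1)}%"
        card_combinations[(card1, card2)] = probability

print(card_combinations)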

{(2, 2): '0.0%', (2, 3): '0.0%', (2, 4): '8.0%', (2, 5): '16.0%', (2, 6): '24.0%', (2, 7): '32.0%', (2, 8): '40.0%', (2, 9): '48.0%', (2, 10): '56.0%', (2, 11): '64.0%', (2, 12): '72.0%', (2, 13): '80.0%', (2, 14): '88.0%', (3, 3): '0.0%', (3, 4): '0.0%', (3, 5): '8.0%', (3, 6): '16.0%', (3, 7): '24.0%', (3, 8): '32.0%', (3, 9): '40.0%', (3, 10): '48.0%', (3, 11): '56.0%', (3, 12): '64.0%', (3, 13): '72.0%', (3, 14): '80.0%', (4, 4): '0.0%', (4, 5): '0.0%', (4, 6): '8.0%', (4, 7): '16.0%', (4, 8): '24.0%', (4, 9): '32.0%', (4, 10): '40.0%', (4, 11): '48.0%', (4, 12): '56.0%', (4, 13): '64.0%', (4, 14): '72.0%', (5, 5): '0.0%', (5, 6): '0.0%', (5, 7): '8.0%', (5, 8): '16.0%', (5, 9): '24.0%', (5, 10): '32.0%', (5, 11): '40.0%', (5, 12): '48.0%', (5, 13): '56.0%', (5, 14): '64.0%', (6, 6): '0.0%', (6, 7): '0.0%', (6, 8): '8.0%', (6, 9): '16.0%', (6, 10): '24.0%', (6, 11): '32.0%', (6, 12): '40.0%', (6, 13): '48.0%', (6, 14): '56.0%', (7, 7): '0.0%', (7, 8): '0.0%', (7, 9): '8.0%', (7, 10

Based on this analysis, and assuming a full deck of cards, your probability of winning an Acey Deucey hand for each card range (the difference between the ranks of the two face-up cards) is as follows:

In [13]:
data = {'card_range': [], 'probability': []}
for card_combo, prob in card_combinations.items():
    card_range = card_combo[1] - card_combo[0]
    if card_range in data['card_range']:
        break
    data['card_range'].append(card_range)
    data['probability'].append(prob)
card_df = pd.DataFrame(data=data)
card_df

|   | card_range | probability |
|---|---|---|
| 0 | 0 | 0.0% |
| 1 | 1 | 0.0% |
| 2 | 2 | 8.0% |
| 3 | 3 | 16.0% |
| 4 | 4 | 24.0% |
| 5 | 5 | 32.0% |
| 6 | 6 | 40.0% |
| 7 | 7 | 48.0% |
| 8 | 8 | 56.0% |
| 9 | 9 | 64.0% |


Expected number of chips won, assuming the player bets 20 chips (win probability × 20; chips lost on losing hands are not subtracted):

In [19]:
card_df['flt_prob'] = card_df['probability'].str.rstrip('%').astype('float') / 100
card_df['expected_return'] = card_df['flt_prob'] * 20
card_df = card_df.drop(columns=['flt_prob'])
card_df

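|   | card_range | probability | expected_return |
|---|---|---|---|
| 0 | 0 | 0.0% | 0.0 |
| 1 | 1 | 0.0% | 0.0 |
| 2 | 2 | 8.0% | 1.6 |
| 3 | 3 | 16.0% | 3.2 |
| 4 | 4 | 24.0% | 4.8 |
| 5 | 5 | 32.0% | 6.4 |
| 6 | 6 | 40.0% | 8.0 |
| 7 | 7 | 48.0% | 9.6 |
| 8 | 8 | 56.0% | 11.2 |
| 9 | 9 | 64.0% | 12.8 |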
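
The `expected_return` column counts only the chips won. A net expected value also subtracts the chips lost when the third card falls outside the range: for an even-money bet of `bet` chips with win probability `p_win`, the net expected value is `bet * (2 * p_win - 1)`. The sketch below applies that formula to the probabilities in the table above; `net_expected_value` is a hypothetical helper, not part of the notebook.

```python
def net_expected_value(p_win, bet):
    """Expected chip change for an even-money bet: win +bet or lose -bet."""
    return bet * (2 * p_win - 1)


bet = 20
# card_range -> win probability, copied from the table above
probabilities = {0: 0.00, 1: 0.00, 2: 0.08, 3: 0.16, 4: 0.24,
                 5: 0.32, 6: 0.40, 7: 0.48, 8: 0.56, 9: 0.64}

for card_range, p_win in probabilities.items():
    ev = net_expected_value(p_win, bet)
    print(f"range {card_range}: EV = {ev:+.1f} chips")
```

The sign of the net expected value flips from negative to positive exactly where the win probability crosses 50%.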


The probability analysis leads to the same conclusion: Acey Deucey is a losing proposition on most hands. Even the best hand in the game, ace-deuce, wins only 88% of the time, and every hand with a card range below 8 wins less than half the time. If you do choose to play, bet only when your card range is at least 8, the point at which the probability of winning exceeds 50%. For example, you should bet when your hand is \[2, 10\] (range = 8, 56%), but you should pass when your hand is \[2, 9\] (range = 7, 48%).
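
The range threshold can be written as a small decision rule. This is a minimal sketch assuming a full 52-card deck and ranks encoded as integers from 2 to 14; `win_probability` and `should_bet` are hypothetical helper names, and the 50% cutoff is the break-even point for an even-money bet.

```python
def win_probability(card1, card2, deck_size=52):
    """Chance that a third card from a full deck falls strictly in between."""
    low, high = sorted((card1, card2))
    ranks_between = max(high - low - 1, 0)
    # Four cards of each rank, drawn from the cards left after the first two.
    return ranks_between * 4 / (deck_size - 2)


def should_bet(card1, card2):
    """Bet only when the win probability exceeds 50% (hypothetical rule)."""
    return win_probability(card1, card2) > 0.5


print(should_bet(2, 10))  # range = 8
print(should_bet(2, 9))   # range = 7
```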