# Solving Acey Deucey using Reinforcement Learning
[Michael DiSanto](https://www.michaelpdisanto.com) - 2023

## Project Objective
The objective of this project is to design and implement an optimal betting strategy for the card game "Acey-Deucey" (also known as "In Between") using reinforcement learning. Using object-oriented programming in Python, I aim to create an intelligent agent that learns and adapts its betting decisions during gameplay. The agent considers the current state of the game, the two face-up cards (the player's "hand"), and the size of the pot when choosing how much to bet. By developing this AI player, I seek to optimize the player's betting decisions and ultimately maximize the number of chips they hold at the end of the game. The project explores the intersection of game theory, machine learning, and strategic decision-making in pursuit of an effective, competitive Acey-Deucey player.

## Acey Deucey Description
In-Between is not very popular at casinos, but is often played in home Poker games as a break from Poker itself. The rules below are for the home game, which is easily adaptable for casino play.

### Rank of Cards
A (high), K, Q, J, 10, 9, 8, 7, 6, 5, 4, 3, 2.

### Object of the Game
The goal is to be the player with the most chips at the end of the game.

### The Ante
Chips are distributed to the players, and each player puts one chip in the center of the table to form a pool, or pot.

### The Draw
Any player deals one card face up to each player in turn, and the player with the highest card deals first.

### The Shuffle, Cut, and Deal
Any player may shuffle, and the dealer shuffles last. The player to the dealer's right cuts the cards. The dealer turns up two cards and places them in the middle of the table, positioning them so that there is ample room for a third card to fit in between.
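
The notebook code further down draws every card with `random.randint(2, 14)`, which samples ranks with replacement, whereas a real deck is dealt without replacement. A minimal sketch of a shuffle, cut, and two-card deal, assuming the same rank encoding as the notebook (2-10 at face value, J=11, Q=12, K=13, A=14) and ignoring suits since only rank matters; the helper names `new_deck`, `cut`, and `deal_two` are hypothetical:

```python
import random

# Ranks encoded as in the notebook: 2-10 at face value, J=11, Q=12, K=13, A=14
RANKS = list(range(2, 15))


def new_deck(rng: random.Random) -> list[int]:
    """Return a shuffled 52-card deck of ranks (suits are ignored)."""
    deck = RANKS * 4
    rng.shuffle(deck)
    return deck


def cut(deck: list[int], rng: random.Random) -> list[int]:
    """Cut at a random point: the top part (end of the list) moves to the bottom."""
    point = rng.randint(1, len(deck) - 1)
    return deck[point:] + deck[:point]


def deal_two(deck: list[int]) -> tuple[int, int]:
    """Turn up two cards from the top (end of the list); return (low, high)."""
    a, b = deck.pop(), deck.pop()
    return min(a, b), max(a, b)


rng = random.Random(0)
deck = cut(new_deck(rng), rng)
low, high = deal_two(deck)
print(f"Face-up cards: {low} and {high}; {len(deck)} cards left")
```

Dealing from an explicit deck makes it easy to track which cards are gone, which matters if a strategy later conditions on cards seen earlier in the deal.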

### The Betting
The player on the dealer's left may bet any number of chips up to the entire pot, but must always bet a minimum of one chip. Once the bet is placed, the dealer turns up the top card of the pack and places it between the two face-up cards. If the third card ranks between the two face-up cards, the player wins: they take back their bet plus an equal amount from the pot. If the third card is not between the face-up cards, or is of the same rank as either of them, the player loses the bet, and it is added to the pot.

Two cases skip the third card. If the two face-up cards are consecutive, the player automatically loses. If the two face-up cards are the same rank, the player wins two chips. (In some games, the player is paid three chips when this occurs.)
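
The payout rules above can be condensed into one function. A sketch, assuming the two chips paid for a pair come from the pot regardless of the bet (the rules do not say); `resolve_bet` is a hypothetical helper that returns the player's net chip change, and the pot changes by the opposite amount:

```python
from typing import Optional


def resolve_bet(low: int, high: int, bet: int, third: Optional[int], pair_payout: int = 2) -> int:
    """Return the player's net chip change for one turn (the pot changes by the negative).

    `third` is the third card's rank, or None when no third card is turned up.
    """
    if low > high:
        low, high = high, low
    if low == high:
        return pair_payout          # pair: paid without a third card (assumed from the pot)
    if high - low == 1:
        return -bet                 # consecutive: automatic loss, no third card
    if third is None:
        raise ValueError("a third card is needed unless the cards are a pair or consecutive")
    # Strictly between wins; matching either face-up card, or falling outside, loses
    return bet if low < third < high else -bet


print(resolve_bet(2, 14, 5, 9))     # → 5
print(resolve_bet(2, 14, 5, 14))    # → -5
print(resolve_bet(7, 8, 3, None))   # → -3
print(resolve_bet(9, 9, 3, None))   # → 2
```

Passing `pair_payout=3` covers the variant in which a pair pays three chips.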

"Acey-Deucey" (ace and deuce) is the best combination, and a player tends to bet the whole pot if they can. The only way an ace-deuce combination can lose is if the third card turned up is also an ace or a deuce.
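
A quick calculation supports the ace-deuce claim. Assuming a single 52-card deck and ignoring cards used on earlier turns, 50 cards remain after the two face-up cards, and every rank strictly between them still has all four of its cards, so the win probability is `p = 4 * (high - low - 1) / 50`. For ace-deuce that is 44/50: only the 6 remaining aces and deuces lose. A bet paid at even money then returns `2p - 1` chips per chip bet on average. The helper names are hypothetical:

```python
from fractions import Fraction


def win_probability(low: int, high: int, cards_left: int = 50) -> Fraction:
    """Chance the third card lands strictly between two non-pair face-up cards.

    Assumes one 52-card deck with only the two face-up cards removed, so every
    rank strictly between them still has all four of its cards.
    """
    between = max(0, abs(high - low) - 1)
    return Fraction(4 * between, cards_left)


def edge_per_chip(low: int, high: int) -> Fraction:
    """Expected net chips per chip bet: +1 with probability p, -1 otherwise."""
    p = win_probability(low, high)
    return 2 * p - 1


print(win_probability(2, 14))  # → 22/25 (44 of the 50 remaining cards)
print(edge_per_chip(2, 14))    # → 19/25

# Smallest distance between the face-up ranks that gives a positive edge
print(min(d for d in range(1, 13) if edge_per_chip(2, 2 + d) > 0))  # → 8
```

Under these assumptions a bet has a positive expected value only when the face-up ranks are at least 8 apart (for example 2 and 10, with 7 ranks strictly between them).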

After the first player has finished, the dealer clears away the cards and places them face down in a pile. The next player then places a bet, and the dealer repeats the same procedure until all the players, including the dealer, have had a turn.

If at any time the pot runs out of chips (because a player has "bet the pot" and won), each player again puts in one chip to restore it.

When every player has had a turn to bet, the deal passes to the player on the dealer's left, and the game continues.
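
The turn order, re-ante, and rotation of the deal can be sketched as a small simulation. Assumptions: seats are indexed clockwise so the dealer's left is the next index, cards are drawn with replacement (as in the notebook), bets are random placeholders, a pair is paid two chips from the pot, and chip counts may go negative. `ante` and `play_deal` are hypothetical names. Because chips only move between players and the pot, the total stays constant, which makes a handy sanity check:

```python
import random


def ante(chips: list[int]) -> int:
    """Every player puts one chip in; return the number of chips added to the pot."""
    for i in range(len(chips)):
        chips[i] -= 1
    return len(chips)


def play_deal(chips: list[int], pot: int, dealer: int, rng: random.Random) -> tuple[int, int]:
    """Play one deal; return (pot, next_dealer). Seats are indexed clockwise."""
    n = len(chips)
    for k in range(1, n + 1):
        seat = (dealer + k) % n              # starts left of the dealer, ends with the dealer
        if pot == 0:
            pot += ante(chips)               # pot emptied: everyone antes again
        low, high = sorted(rng.randint(2, 14) for _ in range(2))
        bet = rng.randint(1, pot)            # placeholder random bet between 1 and the pot
        if low == high:
            win = min(2, pot)                # pair: paid two chips, no third card
        elif high - low == 1:
            win = -bet                       # consecutive: automatic loss
        else:
            third = rng.randint(2, 14)
            win = bet if low < third < high else -bet
        chips[seat] += win
        pot -= win
    return pot, (dealer + 1) % n             # the deal passes to the dealer's left


rng = random.Random(42)
chips = [20, 20, 20]
pot = ante(chips)                            # opening ante
dealer = 0
for _ in range(5):
    pot, dealer = play_deal(chips, pot, dealer, rng)
print(chips, pot, sum(chips) + pot)
```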

Source: https://bicyclecards.com/how-to-play/in-between/

## Import Dependencies

```python
import random
import numpy as np
```

## Player Class

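```python
class Player:
    def __init__(self):
        # Two face-up cards for this player's turn; ranks 2-14 (J=11, Q=12, K=13, A=14)
        self.hand = [random.randint(2, 14) for _ in range(2)]

    def draw_card(self):
        # Draw a random rank with replacement (a simplification of a real deck)
        return random.randint(2, 14)

    def make_bet(self, current_pot):
        # Implement your betting strategy here
        # You can use an RL model to decide the bet amount
        # Simple random strategy: at least 1 chip (the rules' minimum), at most the pot.
        # max(1, ...) avoids random.randint(1, 0) raising ValueError on an empty pot.
        return random.randint(1, max(1, current_pot))
```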
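
The random `make_bet` is only a baseline. One non-RL alternative for comparison is to size bets with the Kelly criterion, which for an even-money wager with win probability `p` stakes a fraction `2p - 1` of the bankroll, and nothing when that is negative. A sketch, assuming a single 52-card deck with only the two face-up cards removed and a bankroll `chips` that the `Player` class does not currently track; `kelly_bet` is a hypothetical helper, and the rules' bounds (at least 1 chip, at most the pot) are applied last:

```python
def kelly_bet(low: int, high: int, chips: int, pot: int, cards_left: int = 50) -> int:
    """Heuristic bet size from the Kelly criterion for an even-money wager (not RL).

    Assumes one 52-card deck with only the two face-up cards removed, so the win
    probability is p = 4 * between / cards_left and the Kelly fraction is 2p - 1.
    """
    low, high = min(low, high), max(low, high)
    between = max(0, high - low - 1)
    # (2p - 1) * cards_left as an integer, so the arithmetic stays exact
    edge_numerator = 8 * between - cards_left
    kelly = max(0, edge_numerator) * chips // cards_left
    # The rules: bet at least one chip and never more than the pot
    return max(1, min(kelly, pot))


print(kelly_bet(2, 14, chips=50, pot=100))  # → 38
print(kelly_bet(2, 14, chips=50, pot=10))   # → 10 (capped by the pot)
print(kelly_bet(5, 9, chips=50, pot=10))    # → 1 (negative edge: minimum bet)
```

A learned policy could be benchmarked against this heuristic by comparing average chip totals over many simulated games.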

## Acey Deucey Game Class

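```python
class Game:
    def __init__(self, num_players):
        self.num_players = num_players
        self.players = [Player() for _ in range(num_players)]
        # The ante: each player puts one chip in, so the pot never starts empty
        self.pot = num_players

    def deal_initial_cards(self):
        # Each player draws one card; the highest card becomes the dealer (index 0)
        dealer = max(self.players, key=lambda p: p.draw_card())
        self.players.remove(dealer)
        self.players.insert(0, dealer)

    def play_round(self):
        # Betting starts on the dealer's left and ends with the dealer
        order = self.players[1:] + self.players[:1]
        for player in order:
            seat = self.players.index(player) + 1
            if self.pot == 0:
                self.pot = self.num_players  # pot emptied: everyone antes again
            low, high = sorted(player.draw_card() for _ in range(2))
            player.hand = [low, high]
            print(f"Player {seat}'s turn: face-up cards {low} and {high}")
            bet = player.make_bet(self.pot)
            if low == high:
                self.pot -= min(2, self.pot)  # pair: player is paid two chips
            elif high - low == 1:
                self.pot += bet  # consecutive: automatic loss
            else:
                third = player.draw_card()
                print(f"Player {seat} bet {bet} and drew {third}")
                if low < third < high:
                    self.pot -= bet  # win: bet returned plus an equal amount from the pot
                else:
                    self.pot += bet  # loss (outside or matching a face-up card)
```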

## Q Learning

```python
def q_learning(num_episodes, learning_rate, discount_factor, epsilon):
    # Q-table for state-action pairs: rows 0-14 (only 2-14 are visited), 2 actions
    Q = np.zeros((15, 2))
    for episode in range(num_episodes):
        game = Game(num_players=2)  # Two players (not used by the updates below)
        game.deal_initial_cards()
        for _ in range(2):  # Two rounds per episode
            for state in range(2, 15):
                for action in range(2):
                    # Epsilon-greedy choice of the next action
                    if random.uniform(0, 1) < epsilon:
                        next_action = random.randint(0, 1)
                    else:
                        next_action = np.argmax(Q[state, :])
                    next_state = min(state + (2 * random.randint(2, 14)), 14)  # Simulate drawing a card
                    # Placeholder: with reward fixed at 0 and Q starting at 0,
                    # every update below leaves Q at 0
                    reward = 0  # Implement your reward function here
                    # The target uses the sampled next action (SARSA-style), not max over actions
                    Q[state, action] = (1 - learning_rate) * Q[state, action] + learning_rate * (reward + discount_factor * Q[next_state, next_action])
        epsilon *= 0.99  # Decay epsilon once per episode
    return Q
```
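
One way to give an agent a real reward signal is to shrink the problem to a single bet per episode. The sketch below is a simplified formulation, not the notebook's: the state is the number of ranks strictly between the face-up cards, the three actions are fixed bet sizes against an assumed pot of 10 chips, cards come from a 52-card deck without replacement, and the reward is the net chips won or lost. Each episode ends after one bet, so the Q-learning target is just the reward and the discount factor drops out; in effect this is a contextual bandit. `POT`, `ACTIONS`, `DECK`, and `q_learning_bets` are hypothetical names:

```python
import random

import numpy as np

POT = 10                                   # assumed fixed pot size during training
ACTIONS = [1, POT // 2, POT]               # bet sizes: minimum, half the pot, whole pot
DECK = [rank for rank in range(2, 15) for _ in range(4)]  # 52 cards, ranks only


def q_learning_bets(num_episodes=50_000, learning_rate=0.05, epsilon=0.1, seed=0):
    """Tabular Q-learning over one-bet episodes; returns a (12, 3) Q-table."""
    rng = random.Random(seed)
    # State = number of ranks strictly between the face-up cards (0 to 11)
    Q = np.zeros((12, len(ACTIONS)))
    for _ in range(num_episodes):
        first, second, third = rng.sample(DECK, 3)  # dealt without replacement
        low, high = sorted((first, second))
        if low == high:
            continue  # a pair pays the same whatever the bet, so there is nothing to learn
        state = high - low - 1
        # Epsilon-greedy action selection
        if rng.random() < epsilon:
            action = rng.randrange(len(ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        bet = ACTIONS[action]
        # Reward = net chips; consecutive cards (state 0) can never win
        reward = bet if low < third < high else -bet
        # The episode ends after one bet, so the target is the reward alone
        Q[state, action] += learning_rate * (reward - Q[state, action])
    return Q


bet_Q = q_learning_bets()
for state, row in enumerate(bet_Q):
    best = ACTIONS[int(np.argmax(row))]
    print(f"{state:2d} ranks between -> best bet {best:2d}, Q = {np.round(row, 2)}")
```

If the sketch behaves as intended, the greedy bet should be the 1-chip minimum for narrow spreads and the whole pot for the widest ones such as ace-deuce, though the exact table depends on the seed and the number of episodes.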

## Simulate Game (10,000 Episodes)

```python
num_episodes = 10000
learning_rate = 0.1
discount_factor = 0.9
epsilon = 0.1
Q = q_learning(num_episodes, learning_rate, discount_factor, epsilon)
print("Q-table:")
print(Q)
```

```text
Q-table:
[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
```


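This technique is not working: every entry of the Q-table is still 0 after 10,000 episodes. The cause is the placeholder reward. `Q` starts at zero and `reward` is always 0, so each update computes `(1 - learning_rate) * 0 + learning_rate * (0 + discount_factor * 0) = 0`, and nothing is ever learned. Beyond the reward, the states and transitions do not yet model the game: `next_state` is an arbitrary formula rather than a dealt card, and the `Game` created in each episode is never used by the updates.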
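
Exploration is a second issue worth checking. `q_learning` multiplies `epsilon` by 0.99 after every episode, so starting from 0.1 it drops below 0.01 within a few hundred episodes and is effectively zero long before episode 10,000. The arithmetic, using the notebook's values:

```python
import math

epsilon, decay, num_episodes = 0.1, 0.99, 10000  # values used in the notebook

# Smallest n with epsilon * decay**n < 0.01 (exploration probability below 1%)
n = math.ceil(math.log(0.01 / epsilon) / math.log(decay))
print(n)                                 # → 230
print(epsilon * decay ** num_episodes)   # effectively zero
```

A slower decay (for example a factor closer to 1) or a floor such as `max(epsilon * decay, 0.01)` would keep some exploration alive for the whole run.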