# CS6330 Project 1: Reinforcement Learning Blackjack Agent

In [None]:
import blackjack
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

The goal of this project is to build an intelligent Blackjack player using reinforcement learning, specifically Q-learning.

## Control experiment

For a control experiment, we'll build a Blackjack player who follows a simple fixed rule: hit until the hand reaches 17, then stand. The dealer will play the same way. This is a _simplified_ version of Blackjack in which we treat the deck as a "continuous shuffle" shoe: each card dealt is an independent random choice from 52 cards, so the same card can come up twice in a row, though only with probability 1/52. (An interesting future experiment would be to observe how the policy changes when the game uses 1-, 2-, or 4-deck shoes, and so on, with reshuffling when the shoe is exhausted.) We'll simulate 1,000 hands and see how the player performs.
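The `blackjack` module itself isn't shown in this notebook, so as a rough illustration of the two ideas above (continuous-shuffle dealing and the hit-until-17 rule), here is a minimal, self-contained sketch. The names `DECK`, `draw_card`, `hand_value`, and `play_fixed_policy` are hypothetical and are not taken from the module; the treatment of aces (11, dropping to 1 when the hand would bust) is likewise an assumption.

```python
import random

# Hypothetical card model, not the blackjack module's own representation.
RANKS = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
SUITS = ['clubs', 'diamonds', 'hearts', 'spades']
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]  # 52 distinct cards


def draw_card(rng):
    """Continuous shuffle: each draw is an independent pick from all 52 cards."""
    return rng.choice(DECK)


def hand_value(cards):
    """Best hand total, counting aces as 11 unless that would bust (assumed rule)."""
    total = 0
    soft_aces = 0
    for rank, _suit in cards:
        if rank == 'A':
            total += 11
            soft_aces += 1
        elif rank in ('J', 'Q', 'K'):
            total += 10
        else:
            total += int(rank)
    # Demote aces from 11 to 1 one at a time while the hand is over 21.
    while total > 21 and soft_aces:
        total -= 10
        soft_aces -= 1
    return total


def play_fixed_policy(rng, threshold=17):
    """Deal two cards, then hit until the hand total reaches the threshold."""
    cards = [draw_card(rng), draw_card(rng)]
    while hand_value(cards) < threshold:
        cards.append(draw_card(rng))
    return cards


rng = random.Random(0)
hand = play_fixed_policy(rng)
print(hand, hand_value(hand))
```

Because every draw samples from the full 52-card list, nothing is ever removed from the deck, which is what distinguishes this from a finite shoe.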

In [None]:
# Simulate 1,000 independent hands and collect the results.
games = [blackjack.Game().play_hand() for _ in range(1000)]


In [None]:
df = pd.DataFrame(games)
df.head()

In [None]:
sns.displot(df['winner'])
plt.title('Distribution of hand winners for 1000 hands; first pass')
plt.show()

In [None]:
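# Win percentages are taken over all hands played, not a hard-coded 1000.
n_hands = len(df)
counts = df['winner'].value_counts()
dealer_win = counts.get('dealer', 0) / n_hands * 100
player_win = counts.get('player', 0) / n_hands * 100
print(f"Player win percentage: {player_win:.2f}%")
print(f"Dealer win percentage: {dealer_win:.2f}%")
# House edge here means the dealer's win rate in excess of 50%.
print(f"House edge: {dealer_win - 50:.2f}%")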

### Measuring the house edge

As a casino game, the rules of Blackjack are set up to give the dealer (the _house_) an advantage (an _edge_). After enough hands, we'll observe the house's win rate settling at some percentage above 50%, but never below it (which is why gambling is always a bad decision). In a real game of Blackjack, i.e., with good player strategy, splits, double-downs, 3:2 payouts on natural Blackjacks, etc., this can be well under 5%, but for this first pass, we got 8.6%. In the following experiments, we'll measure the house edge to see whether a player whose moves are governed by a Q-learning policy can learn to play better, as measured quantitatively by a reduction in the house edge.
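To make the metric concrete, the sketch below computes the house edge as defined above (the dealer's win percentage minus 50) from a plain list of winner labels. The helper names `win_percentages` and `house_edge` are hypothetical, and the sample split of wins is made up purely for illustration; it is not the data from the run above.

```python
from collections import Counter


def win_percentages(winners):
    """Map each winner label to its share of all hands, as a percentage."""
    counts = Counter(winners)
    total = len(winners)
    return {label: 100.0 * n / total for label, n in counts.items()}


def house_edge(winners):
    """Dealer win percentage in excess of 50%, per the definition in the text."""
    return win_percentages(winners).get('dealer', 0.0) - 50.0


# Made-up example: 586 dealer wins and 414 player wins out of 1,000 hands.
sample = ['dealer'] * 586 + ['player'] * 414
print(f"House edge: {house_edge(sample):.2f}%")
```

Note that this definition treats every hand as won by one side or the other. If the simulation ever records ties, they count toward the total number of hands but toward neither side's win percentage.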

## Q-learning experiment

TODO

In [None]:
trainer = blackjack.QLearningTrainer()
trainer.optimize_q_table()
trainer.q_table

In [None]:
trainer.compile_policy_from_trained_q_table()
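
The internals of `blackjack.QLearningTrainer` aren't shown here, so the following is only a generic sketch of tabular Q-learning, not the trainer's actual implementation. It assumes a state is a tuple such as (player total, dealer upcard), two actions (`'hit'` and `'stand'`), and a `next_state` of `None` marking the end of a hand. All function names and the default `alpha` and `gamma` values are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = ('hit', 'stand')  # assumed action set


def make_q_table():
    """Q-table mapping state -> {action: estimated value}, initialised to 0."""
    return defaultdict(lambda: {action: 0.0 for action in ACTIONS})


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q_table[state]
    return max(ACTIONS, key=lambda action: values[action])


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=1.0):
    """One Q-learning step: move Q(s, a) toward reward + gamma * max_a' Q(s', a')."""
    # A terminal transition (next_state is None) has no future value.
    best_next = 0.0 if next_state is None else max(q_table[next_state].values())
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])


def greedy_policy(q_table):
    """Compile a policy by picking the highest-valued action in each seen state."""
    return {
        state: max(ACTIONS, key=lambda action: values[action])
        for state, values in q_table.items()
    }


q_table = make_q_table()
# Standing on 20 against a dealer 10 and winning the hand (reward +1, terminal).
q_update(q_table, (20, 10), 'stand', 1.0, None, alpha=0.5)
# Hitting on 12 and landing on 20, with no immediate reward.
q_update(q_table, (12, 10), 'hit', 0.0, (20, 10), alpha=0.5)
print(greedy_policy(q_table))
```

The last step mirrors the idea of compiling a policy from a trained Q-table: once the values have been learned, the policy simply looks up the best-valued action for each state.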