# **RL PROJECT** - **DICE GAME**

```python
import numpy as np
import random
import matplotlib.pyplot as plt
```

## 1. The Game

In this project, we consider the following “Dice Game”. The objective of the game is to make money by
outscoring the dealer or by rolling doubles. Each round, the player starts with a score of zero. Since this
is an episodic task, discounting is not necessary.

Each turn, the player has to choose between rolling their dice, or betting money on the dealer’s roll. If
they decide to roll themselves, they can either roll one or two dice (simultaneously), which costs them
CHF 1 for one dice, and CHF 2 for two dice. The number(s) shown by the dice are added to the player’s
score. If the player rolls a double (i.e., two dice showing the same numbers), they get an immediate
bonus payout of CHF 10, independent of the rest of the game or their score. If the player reaches a
score of 31 or more, they lose the round and have to pay CHF 10.
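The roll rules above can be condensed into one reward function. The following is a minimal sketch. The name `roll_reward` and its `(reward, next_score, round_over)` return value are our own illustrative choices and are not part of the assignment. The sketch makes one consequence of the rules explicit: a double that pushes the score to 31 or more nets -2 CHF (cost 2, bonus 10, penalty 10).

```python
def roll_reward(score, dice):
    """Return (reward, next_score, round_over) for a single roll.

    score: player score before the roll
    dice:  tuple with one or two die faces, e.g. (6,) or (3, 3)
    """
    reward = -len(dice)                        # CHF 1 per die rolled
    if len(dice) == 2 and dice[0] == dice[1]:
        reward += 10                           # immediate doubles bonus
    next_score = score + sum(dice)
    if next_score >= 31:
        return reward - 10, next_score, True   # bust: pay CHF 10, round over
    return reward, next_score, False


print(roll_reward(0, (6,)))      # (-1, 6, False)
print(roll_reward(10, (3, 3)))   # (8, 16, False)
print(roll_reward(28, (2, 5)))   # (-12, 35, True)
print(roll_reward(30, (4, 4)))   # (-2, 38, True)
```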

If the player chooses to bet on the dealer’s roll, they have to specify a bet-multiplicator of 1, 2, or 3. The
dealer then rolls their dice and the player is paid/has to pay according to the formula

$$(\text{Player Score} - \text{Dealer Dice Result}) \cdot \text{Bet-Multiplicator},$$

and the round is over.
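The bet payout can be written the same way. The hypothetical helper `bet_payout` below applies the formula above. The sketch also enumerates every payout reachable from a non-terminal score (0 to 30), which bounds the betting part of the reward set.

```python
def bet_payout(score, dealer_roll, multiplier):
    """Payout (CHF) of betting on the dealer's roll; the round ends afterwards."""
    if multiplier not in (1, 2, 3):
        raise ValueError("bet multiplier must be 1, 2 or 3")
    return (score - dealer_roll) * multiplier


# Every payout reachable from a non-terminal score 0..30
payouts = {bet_payout(s, d, m) for s in range(31) for d in range(25, 31) for m in (1, 2, 3)}
print(min(payouts), max(payouts))  # -90 15
```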

The player has two identical dice, showing the numbers {1,2,3,4,5,6}. The dealer has one dice,
showing the numbers {25,26,27,28,29,30}. All three dice are weighted, such that their highest
number has twice the probability of each of the smaller numbers.

### FlowChart

### Dice

#### Player
---

Each of the player's dice is weighted as follows:

    P(D = 1) = p
    P(D = 2) = p
    P(D = 3) = p
    P(D = 4) = p
    P(D = 5) = p
    P(D = 6) = 2p

Since the probabilities of all six outcomes must sum to 1:

    p + p + p + p + p + 2p = 1

    5p + 2p = 1

    7p = 1

    p = 1/7

The player dice therefore have the following distribution:

    P(D = 1) = 1/7
    P(D = 2) = 1/7
    P(D = 3) = 1/7
    P(D = 4) = 1/7
    P(D = 5) = 1/7
    P(D = 6) = 2/7
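Because the five smaller faces contribute 5p and the six contributes 2p, the weights must satisfy 7p = 1. This can be checked exactly with Python's `fractions.Fraction`. The sketch below uses our own variable names. It also derives the probability of a double and the expected immediate reward of a two-dice roll, ignoring any bust penalty.

```python
from fractions import Fraction

# Player die: P(k) = p for k = 1..5 and P(6) = 2p, with 7p = 1
PLAYER_PROBS = {face: Fraction(1, 7) for face in range(1, 6)}
PLAYER_PROBS[6] = Fraction(2, 7)
assert sum(PLAYER_PROBS.values()) == 1

# Probability that two independent dice show the same face
p_double = sum(p * p for p in PLAYER_PROBS.values())
# Expected immediate reward of rolling two dice, before any bust penalty
expected_two_dice = -2 + 10 * p_double
# Expected score gained per die
mean_face = sum(face * p for face, p in PLAYER_PROBS.items())

print(p_double, expected_two_dice, mean_face)  # 9/49 -8/49 27/7
```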

#### Dealer
---

The dealer's die is weighted as follows:

    P(D = 25) = p
    P(D = 26) = p
    P(D = 27) = p
    P(D = 28) = p
    P(D = 29) = p
    P(D = 30) = 2p

Since the probabilities of all six outcomes must sum to 1:

    p + p + p + p + p + 2p = 1

    5p + 2p = 1

    7p = 1

    p = 1/7

The dealer's die therefore has the following distribution:

    P(D = 25) = 1/7
    P(D = 26) = 1/7
    P(D = 27) = 1/7
    P(D = 28) = 1/7
    P(D = 29) = 1/7
    P(D = 30) = 2/7
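The same exact check for the dealer's die gives its expected result. This expected result determines when a bet pays off on average. Since the expected bet payout is (score - E[D]) · multiplier, it is positive only for scores above E[D]. The following is a short sketch with illustrative names.

```python
from fractions import Fraction

# Dealer die: P(d) = 1/7 for d = 25..29 and P(30) = 2/7
DEALER_PROBS = {face: Fraction(1, 7) for face in range(25, 30)}
DEALER_PROBS[30] = Fraction(2, 7)
assert sum(DEALER_PROBS.values()) == 1

mean_dealer = sum(face * p for face, p in DEALER_PROBS.items())
print(mean_dealer)  # 195/7

# Smallest score at which a bet has a positive expected payout
threshold = min(s for s in range(31) if s > mean_dealer)
print(threshold)  # 28
```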

---

## 2. The Task

Throughout, the state space should have (at most) one terminal state.

1. Considering the game as a Markov decision process, identify the state space S, the action
space A, and the reward set R.

---
ANSWER:

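- The state space (S) is defined by the player's score after each **TURN**. Non-terminal scores range from 0 to 30. Every way of ending the round (betting on the dealer's roll, or reaching a score of 31 or more) leads to the same single terminal state, which we label 31:

$$S = \{0, 1, \dots, 30\} \cup \{31\} = \{0, 1, \dots, 31\}$$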

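- The action space (A) is the set of actions the player can choose at each **TURN**. Counting each bet-multiplicator as a separate action, there are five:

        1. Roll 1 die
        2. Roll 2 dice simultaneously
        3. Bet on the dealer's roll with a bet-multiplicator of 1
        4. Bet on the dealer's roll with a bet-multiplicator of 2
        5. Bet on the dealer's roll with a bet-multiplicator of 3

$$A = \{\text{roll}_1, \text{roll}_2, \text{bet}_1, \text{bet}_2, \text{bet}_3\}$$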

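- The reward set (R) contains every reward the player can receive in a single turn:

    For rolling the dice:

        1. -1 CHF for rolling 1 die (the cost of the die)
        2. -2 CHF for rolling 2 dice without a double (the cost of the two dice)
        3. +8 CHF for rolling a double (10 CHF bonus - 2 CHF cost of the two dice)
        4. An additional -10 CHF if the roll brings the score to 31 or more, giving
           -11 CHF (1 die), -12 CHF (2 dice, no double) or -2 CHF (a double)

    For betting on the dealer's roll:

        5. (Player Score - Dealer Dice Result) * Bet-Multiplicator CHF, which can be a gain
           or a loss depending on the dealer's roll; with a score from 0 to 30, a dealer
           result from 25 to 30 and a multiplicator from 1 to 3, it ranges from -90 to +15 CHF

$$R = \{-12, -11, -2, -1, 8\} \cup \{(s - d) \cdot m : s \in \{0, \dots, 30\},\ d \in \{25, \dots, 30\},\ m \in \{1, 2, 3\}\}$$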
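To double-check S, A and R, the three sets can be enumerated directly from the rules. In this sketch, the action names (`"roll1"`, `"bet3"`, ...) and the label 31 for the terminal state are our own conventions. The function `rewards_of` is a hypothetical helper that lists the rewards an action can produce from a given score.

```python
from itertools import product

TERMINAL = 31                          # single terminal state (round over)
STATES = list(range(31)) + [TERMINAL]
ACTIONS = ["roll1", "roll2", "bet1", "bet2", "bet3"]


def rewards_of(score, action):
    """Set of rewards that `action` can produce from the non-terminal `score`."""
    if action.startswith("bet"):
        m = int(action[-1])
        return {(score - d) * m for d in range(25, 31)}
    n = int(action[-1])                # number of dice rolled
    out = set()
    for dice in product(range(1, 7), repeat=n):
        r = -n + (10 if n == 2 and dice[0] == dice[1] else 0)
        if score + sum(dice) >= 31:
            r -= 10                    # bust penalty
        out.add(r)
    return out


REWARDS = set().union(*(rewards_of(s, a) for s in range(31) for a in ACTIONS))
roll_rewards = set().union(*(rewards_of(s, a) for s in range(31) for a in ("roll1", "roll2")))
print(sorted(roll_rewards))                                    # [-12, -11, -2, -1, 8]
print(len(STATES), len(ACTIONS), min(REWARDS), max(REWARDS))   # 32 5 -90 15
```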

---

2. Implement a Python class that represents the game as a reinforcement learning task. The class
should contain all the information about the game state, and should provide a “step” method that
takes an action as input and returns the reward and next state, as well as a “reset” method that
resets the game to its initial state.

### Advanced Dice Game Class

```python
import random


class DiceGame:
    """Interactive Dice Game: each call to play_round() plays one turn."""

    def __init__(self):
        self.player_rounds = 1          # number of the current round
        self.player_payout_rounds = 0   # cumulative payout over all finished rounds
        self.player_payout = 0          # payout accumulated in the current round
        self.player_score = 0           # player score in the current round
        self.player_dice = range(1, 7)
        self.dealer_dice = range(25, 31)
        self.dice_weights = [1, 1, 1, 1, 1, 2]  # the highest face is twice as likely

    def roll_dice(self, number_of_dice):
        # Roll `number_of_dice` weighted player dice at once
        return random.choices(self.player_dice, weights=self.dice_weights, k=number_of_dice)

    def results(self):
        print(f"""
-----------------------------------------

PLAYER STATUS:

    CURRENT ROUND

        Your current round score is {self.player_score} with a payout of {self.player_payout}

    TOTAL GAME

        Your current round is number {self.player_rounds} with a total cumulative payout of {self.player_payout_rounds}

-----------------------------------------
""")

    def end_round(self):
        # Bank the payout of the finished round and start a new one
        self.player_payout_rounds += self.player_payout
        self.player_payout = 0
        self.player_score = 0
        self.player_rounds += 1

    def play_round(self, choice, bet_multiplier=1, result=True):
        """
        choice: 1 for a 1-die roll, 2 for a 2-dice roll, 3 for a bet on the dealer's roll
        bet_multiplier: 1, 2 or 3 (1 by default), only used when choice == 3
        result: if True, print a summary of the player's turn and round state
        """
        if choice not in (1, 2, 3):
            raise ValueError("choice must be 1, 2 or 3")
        if bet_multiplier not in (1, 2, 3):
            raise ValueError("bet_multiplier must be 1, 2 or 3")

        if choice == 1:
            dice_values = self.roll_dice(1)
            self.player_score += dice_values[0]
            self.player_payout -= 1
            print(f"""
TURN CHOICE: 1 Die Roll
-----------------------------------------

You rolled 1 die (cost of 1 CHF), and it gave you {dice_values[0]}!
""")

        elif choice == 2:
            dice_values = self.roll_dice(2)
            self.player_score += sum(dice_values)
            self.player_payout -= 2
            print(f"""
TURN CHOICE: 2 Dice Roll
-----------------------------------------

You rolled 2 dice (cost of 2 CHF), and it gave you {dice_values[0]} and {dice_values[1]}!
""")
            if dice_values[0] == dice_values[1]:
                self.player_payout += 10
                print("Congratulations! You rolled doubles and received a bonus payout of 10 CHF.")

        else:
            dealer_result = random.choices(self.dealer_dice, weights=self.dice_weights, k=1)[0]
            dealer_payout = (self.player_score - dealer_result) * bet_multiplier
            self.player_payout += dealer_payout
            print(f"""
TURN CHOICE: You bet on the dealer's roll, with a bet-multiplicator of {bet_multiplier}

The dealer rolled {dealer_result}, so the payout is ({self.player_score} - {dealer_result}) x {bet_multiplier} = {dealer_payout} CHF

Let's add it to this round's payout!

-----------------------------------------

ROUND FINISHED
""")
            self.end_round()

        # Reaching a score of 31 or more ends the round with a 10 CHF penalty
        if self.player_score >= 31:
            self.player_payout -= 10
            self.end_round()
            print("""
Oops! Your score reached 31 or more. You lose 10 CHF!

-----------------------------------------

ROUND FINISHED
""")

        if result:
            self.results()

    def reset(self):
        self.player_rounds = 1
        self.player_payout_rounds = 0
        self.player_payout = 0
        self.player_score = 0
        print("You have reset the Dice Game!")
```


```python
game = DiceGame()
game.play_round(2, 1, True)
```

```text
TURN CHOICE: 2 Dice Roll
-----------------------------------------

You rolled 2 dice (cost of 2 CHF), and it gave you 6 and 1!

-----------------------------------------

PLAYER STATUS:

    CURRENT ROUND

        Your current round score is 7 with a payout of -2

    TOTAL GAME

        Your current round is number 1 with a total cumulative payout of 0

-----------------------------------------
```


### Simplified Dice Game Class
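The interactive class plays the game but does not expose the `step`/`reset` interface that task 2 asks for. Below is a minimal sketch of such an interface. It assumes the state is the score (0 to 30) with a single terminal state labelled 31, and that actions are named `"roll1"`, `"roll2"`, `"bet1"`, `"bet2"`, `"bet3"`. The class name `DiceGameEnv` and the `(reward, next_state, done)` return tuple are illustrative choices, not the author's implementation.

```python
import random


class DiceGameEnv:
    """Minimal step/reset sketch of the Dice Game as an episodic RL task.

    States: player score 0..30, plus TERMINAL = 31 once the round is over.
    Actions: "roll1", "roll2", "bet1", "bet2", "bet3".
    """

    TERMINAL = 31
    ACTIONS = ("roll1", "roll2", "bet1", "bet2", "bet3")
    PLAYER_FACES = range(1, 7)
    DEALER_FACES = range(25, 31)
    WEIGHTS = [1, 1, 1, 1, 1, 2]  # the highest face is twice as likely

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        """Start a new round at score 0 and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply `action` and return (reward, next_state, done)."""
        if self.state == self.TERMINAL:
            raise RuntimeError("the round is over; call reset()")
        if action not in self.ACTIONS:
            raise ValueError(f"unknown action {action!r}")

        if action.startswith("roll"):
            n = int(action[-1])
            dice = self.rng.choices(self.PLAYER_FACES, weights=self.WEIGHTS, k=n)
            reward = -n                                # CHF 1 per die
            if n == 2 and dice[0] == dice[1]:
                reward += 10                           # doubles bonus
            score = self.state + sum(dice)
            if score >= 31:
                reward -= 10                           # bust penalty, round over
                self.state = self.TERMINAL
            else:
                self.state = score
        else:
            multiplier = int(action[-1])
            dealer = self.rng.choices(self.DEALER_FACES, weights=self.WEIGHTS, k=1)[0]
            reward = (self.state - dealer) * multiplier
            self.state = self.TERMINAL                 # betting always ends the round

        return reward, self.state, self.state == self.TERMINAL


# Usage: play one round with the "RR" policy from task 3
env = DiceGameEnv(seed=42)
state, done, total = env.reset(), False, 0
while not done:
    if state < 20:
        action = env.rng.choice(["roll1", "roll2"])
    else:
        action = env.rng.choice(["bet1", "bet2", "bet3"])
    reward, state, done = env.step(action)
    total += reward
print("round payout:", total)
```

Keeping the terminal state inside the state space (rather than returning `None`) matches the requirement of a single terminal state and makes the DP tables easy to index.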

3. Using dynamic programming, compute the value functions under the following policies. Explain the results and represent them graphically.

    “R1”: The player always rolls a single dice

    “R2”: The player always rolls both dice.

    “RR”: If the player’s score is strictly smaller than 20, they roll either one or two dice with equal
    probability. Otherwise, they choose one of the three bet-multiplicators uniformly at random.
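One way to compute these value functions is iterative policy evaluation with γ = 1 (no discounting, as noted in section 1). The sketch below rests on the following assumptions:

- Face weights are 1/7 and 2/7.
- The transition model `transitions(s, a)` is built directly from the rules.
- Each policy is a function from score to action probabilities.

All names are hypothetical. Every roll increases the score and every bet ends the round, so each policy terminates and the sweeps converge.

```python
TERMINAL = 31                                    # single terminal state
FACE_P = {f: (2 / 7 if f == 6 else 1 / 7) for f in range(1, 7)}
DEALER_P = {d: (2 / 7 if d == 30 else 1 / 7) for d in range(25, 31)}


def transitions(s, a):
    """Return [(probability, next_state, reward), ...] for action `a` at score `s`."""
    if a.startswith("bet"):
        m = int(a[-1])
        return [(p, TERMINAL, (s - d) * m) for d, p in DEALER_P.items()]
    if a == "roll1":
        outcomes = [((f,), p) for f, p in FACE_P.items()]
    else:
        outcomes = [((f1, f2), p1 * p2) for f1, p1 in FACE_P.items() for f2, p2 in FACE_P.items()]
    result = []
    for dice, p in outcomes:
        r = -len(dice) + (10 if len(dice) == 2 and dice[0] == dice[1] else 0)
        nxt = s + sum(dice)
        result.append((p, TERMINAL, r - 10) if nxt >= 31 else (p, nxt, r))
    return result


def policy_evaluation(policy, tol=1e-10):
    """Iterative policy evaluation with gamma = 1; policy(s) -> {action: probability}."""
    V = [0.0] * (TERMINAL + 1)                   # V[TERMINAL] stays 0
    while True:
        delta = 0.0
        for s in range(TERMINAL):
            v = sum(pa * sum(p * (r + V[nxt]) for p, nxt, r in transitions(s, a))
                    for a, pa in policy(s).items())
            delta = max(delta, abs(v - V[s]))
            V[s] = v                             # in-place (Gauss-Seidel) update
        if delta < tol:
            return V


POLICIES = {
    "R1": lambda s: {"roll1": 1.0},
    "R2": lambda s: {"roll2": 1.0},
    "RR": lambda s: ({"roll1": 0.5, "roll2": 0.5} if s < 20
                     else {"bet1": 1 / 3, "bet2": 1 / 3, "bet3": 1 / 3}),
}

values = {name: policy_evaluation(pi) for name, pi in POLICIES.items()}
for name, V in values.items():
    print(f"{name}: V(0) = {V[0]:.4f}")
```

With the `matplotlib` import from the first cell, `plt.plot(range(31), values["R1"][:31])` (and likewise for `"R2"` and `"RR"`) would plot each value function over the non-terminal scores.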

4. Find the optimal policy using dynamic programming. Represent the action-value function under
the optimal policy graphically. Explain the results and compare them to those of the previous task.
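Value iteration is one standard DP method for finding the optimal policy; policy iteration would work equally well. This self-contained sketch works as follows:

1. It rebuilds the transition model from the rules, assuming face weights of 1/7 and 2/7.
2. It iterates the Bellman optimality update with γ = 1.
3. It extracts the action-value function Q*(s, a) and a greedy policy.

The function and variable names are illustrative.

```python
TERMINAL = 31
ACTIONS = ("roll1", "roll2", "bet1", "bet2", "bet3")
FACE_P = {f: (2 / 7 if f == 6 else 1 / 7) for f in range(1, 7)}
DEALER_P = {d: (2 / 7 if d == 30 else 1 / 7) for d in range(25, 31)}


def transitions(s, a):
    """Return [(probability, next_state, reward), ...] for action `a` at score `s`."""
    if a.startswith("bet"):
        m = int(a[-1])
        return [(p, TERMINAL, (s - d) * m) for d, p in DEALER_P.items()]
    if a == "roll1":
        outcomes = [((f,), p) for f, p in FACE_P.items()]
    else:
        outcomes = [((f1, f2), p1 * p2) for f1, p1 in FACE_P.items() for f2, p2 in FACE_P.items()]
    result = []
    for dice, p in outcomes:
        r = -len(dice) + (10 if len(dice) == 2 and dice[0] == dice[1] else 0)
        nxt = s + sum(dice)
        result.append((p, TERMINAL, r - 10) if nxt >= 31 else (p, nxt, r))
    return result


def q_value(s, a, V):
    """One-step lookahead: expected reward plus value of the next state."""
    return sum(p * (r + V[nxt]) for p, nxt, r in transitions(s, a))


def value_iteration(tol=1e-10):
    V = [0.0] * (TERMINAL + 1)                   # V[TERMINAL] stays 0
    while True:
        delta = 0.0
        for s in range(TERMINAL):
            v = max(q_value(s, a, V) for a in ACTIONS)   # Bellman optimality update
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            break
    Q = {s: {a: q_value(s, a, V) for a in ACTIONS} for s in range(TERMINAL)}
    policy = {s: max(Q[s], key=Q[s].get) for s in range(TERMINAL)}  # greedy w.r.t. Q
    return V, Q, policy


V_opt, Q_opt, pi_opt = value_iteration()
for s in range(TERMINAL):
    print(s, pi_opt[s], round(V_opt[s], 3))
```

`Q_opt[s]` maps each action to its value, so plotting each action's values against the score gives the requested graph of the action-value function.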

BONUS

Use the class you implemented in the first task for the following Monte Carlo simulation: estimate
the value of the initial state under each of the policies from the previous tasks (“R1”, “R2”, “RR”,
“Optimal”). Illustrate the results and compare them to the results of the previous tasks.
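A Monte Carlo estimate of V(0) is the average return over many simulated rounds. For brevity, this sketch does not call the `DiceGame` class itself. Instead it uses a compact, self-contained simulator (`play_episode`, a hypothetical name) that follows the same rules. Each policy is a function of the score and a random generator. The estimate is reported with an approximate 95 % confidence half-width (1.96 standard errors).

```python
import random
import statistics

FACES, DEALER, WEIGHTS = range(1, 7), range(25, 31), [1, 1, 1, 1, 1, 2]


def play_episode(policy, rng):
    """Play one round from score 0 and return its total (undiscounted) reward."""
    score, total = 0, 0
    while True:
        action = policy(score, rng)
        if action.startswith("bet"):
            dealer = rng.choices(DEALER, weights=WEIGHTS, k=1)[0]
            return total + (score - dealer) * int(action[-1])
        n = int(action[-1])
        dice = rng.choices(FACES, weights=WEIGHTS, k=n)
        total -= n                              # CHF 1 per die
        if n == 2 and dice[0] == dice[1]:
            total += 10                         # doubles bonus
        score += sum(dice)
        if score >= 31:
            return total - 10                   # bust penalty ends the round


def mc_estimate(policy, n_episodes=50_000, seed=0):
    """Return (mean return, standard error) over `n_episodes` simulated rounds."""
    rng = random.Random(seed)
    returns = [play_episode(policy, rng) for _ in range(n_episodes)]
    return statistics.fmean(returns), statistics.stdev(returns) / len(returns) ** 0.5


POLICIES = {
    "R1": lambda s, rng: "roll1",
    "R2": lambda s, rng: "roll2",
    "RR": lambda s, rng: (rng.choice(["roll1", "roll2"]) if s < 20
                          else rng.choice(["bet1", "bet2", "bet3"])),
}

for name, pi in POLICIES.items():
    mean, se = mc_estimate(pi)
    print(f"{name}: V(0) ~ {mean:.3f} +/- {1.96 * se:.3f}")
```

A policy computed by DP can be used in the same way. For example, `lambda s, rng: table[s]` works for a dictionary `table` that maps each score to its greedy action.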