# Part V - Learning
## Project 5b - nim

[Course Link](https://cs50.harvard.edu/ai/)

[Project Instructions](https://cs50.harvard.edu/ai/projects/4/nim/)

## Instructions

See the Project Instructions link above for the full specification.

Recall that in the game Nim, we begin with some number of piles, each with some number of objects. Players take turns: on a player’s turn, the player removes one or more objects from any one non-empty pile. Whoever removes the last object loses.

There are some simple strategies you might imagine for this game: if there’s only one pile with three objects left in it, and it’s your turn, your best bet is to remove two of those objects, leaving your opponent with the third and final object to remove. But if there are more piles, the strategy gets considerably more complicated. In this problem, we’ll build an AI to learn the strategy for this game through reinforcement learning. By playing against itself repeatedly and learning from experience, eventually our AI will learn which actions to take and which actions to avoid.
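The single-pile claim can be checked by brute force. The sketch below is not part of the project and does not use Q-learning; it is a small exhaustive search over game states, and the helper names `available_actions`, `apply_action`, `can_win`, and `winning_moves` are hypothetical names chosen for illustration.

```python
from functools import lru_cache


def available_actions(piles):
    """All (pile index, count) moves that are legal for `piles`."""
    return [(i, j) for i, n in enumerate(piles) for j in range(1, n + 1)]


def apply_action(piles, action):
    """Return the piles (as a tuple) after taking `count` objects from `pile`."""
    pile, count = action
    new_piles = list(piles)
    new_piles[pile] -= count
    return tuple(new_piles)


@lru_cache(maxsize=None)
def can_win(piles):
    """True if the player to move can force a win (taking the last object loses)."""
    if sum(piles) == 0:
        # The previous player removed the last object and therefore lost.
        return True
    # A move is good if it leaves the opponent in a position they cannot win.
    return any(not can_win(apply_action(piles, a)) for a in available_actions(piles))


def winning_moves(piles):
    """Moves that leave the opponent in a losing position."""
    return [a for a in available_actions(piles) if not can_win(apply_action(piles, a))]


print(winning_moves((3,)))  # [(0, 2)]
```

With one pile of three objects, the only winning move is to take two, which matches the strategy described above. Exhaustive search is practical only because the state space here is tiny; the project instead learns these values from self-play.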

In particular, we’ll use Q-learning for this project. Recall that in Q-learning, we try to learn a reward value (a number) for every (state, action) pair. An action that loses the game will have a reward of -1, an action that results in the other player losing the game will have a reward of 1, and an action that results in the game continuing has an immediate reward of 0, but will also have some future reward.

How will we represent the states and actions inside of a Python program? A “state” of the Nim game is just the current size of all of the piles. A state, for example, might be [1, 1, 3, 5], representing the state with 1 object in pile 0, 1 object in pile 1, 3 objects in pile 2, and 5 objects in pile 3. An “action” in the Nim game will be a pair of integers (i, j), representing the action of taking j objects from pile i. So the action (3, 5) represents the action “from pile 3, take away 5 objects.” Applying that action to the state [1, 1, 3, 5] would result in the new state [1, 1, 3, 0] (the same state, but with pile 3 now empty).
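As a minimal sketch of this representation, the snippet below applies an action to a state without modifying the original list. The function name `apply_action` is an assumption made for illustration; in the project code the same logic lives inside `Nim.move`.

```python
def apply_action(state, action):
    """Return a new state after taking `count` objects from pile `pile`."""
    pile, count = action
    if not 0 <= pile < len(state):
        raise ValueError("Invalid pile")
    if not 1 <= count <= state[pile]:
        raise ValueError("Invalid number of objects")
    new_state = list(state)   # copy, so the caller's state is left untouched
    new_state[pile] -= count
    return new_state


state = [1, 1, 3, 5]
print(apply_action(state, (3, 5)))  # [1, 1, 3, 0]
print(state)                        # [1, 1, 3, 5]
```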

Recall that the key formula for Q-learning is below. Every time we are in a state s and take an action a, we can update the Q-value Q(s, a) according to:

Q(s, a) <- Q(s, a) + alpha * (new value estimate - old value estimate)

In the above formula, alpha is the learning rate (how much we value new information compared to information we already have). The new value estimate represents the sum of the reward received for the current action and the estimate of all the future rewards that the player will receive. The old value estimate is just the existing value for Q(s, a). By applying this formula every time our AI takes a new action, over time our AI will start to learn which actions are better in any state.
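To make the update rule concrete, here is a small worked example. It assumes a learning rate of 0.5 and a single (state, action) pair that wins the game every time it is tried (reward 1, no future reward). The function name `q_update` is hypothetical.

```python
def q_update(old_q, reward, best_future, alpha=0.5):
    """One Q-learning step: move old_q toward (reward + best_future)."""
    new_value_estimate = reward + best_future
    return old_q + alpha * (new_value_estimate - old_q)


q = 0.0
for step in range(1, 4):
    q = q_update(q, reward=1, best_future=0)
    print(step, q)
# 1 0.5
# 2 0.75
# 3 0.875
```

Each update closes half of the remaining gap to 1, so the value approaches 1 without overshooting it. A smaller `alpha` would make the estimate move more slowly but also react less to any single game.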



In [45]:
import math
import random
import time


class Nim():

    def __init__(self, initial=[1, 3, 5, 7]):
        """
        Initialize game board.
        Each game board has
            - `piles`: a list of how many elements remain in each pile
            - `player`: 0 or 1 to indicate which player's turn
            - `winner`: None, 0, or 1 to indicate who the winner is
        """
        self.piles = initial.copy()
        self.player = 0
        self.winner = None

    @classmethod
    def available_actions(cls, piles):
        """
        Nim.available_actions(piles) takes a `piles` list as input
        and returns all of the available actions `(i, j)` in that state.

        Action `(i, j)` represents the action of removing `j` items
        from pile `i` (where piles are 0-indexed).
        """
        actions = set()
        for i, pile in enumerate(piles):
            for j in range(1, piles[i] + 1):
                actions.add((i, j))
        return actions

    @classmethod
    def other_player(cls, player):
        """
        Nim.other_player(player) returns the player that is not
        `player`. Assumes `player` is either 0 or 1.
        """
        return 0 if player == 1 else 1

    def switch_player(self):
        """
        Switch the current player to the other player.
        """
        self.player = Nim.other_player(self.player)

    def move(self, action):
        """
        Make the move `action` for the current player.
        `action` must be a tuple `(i, j)`.
        """
        pile, count = action

        # Check for errors
        if self.winner is not None:
            raise Exception("Game already won")
        elif pile < 0 or pile >= len(self.piles):
            raise Exception("Invalid pile")
        elif count < 1 or count > self.piles[pile]:
            raise Exception("Invalid number of objects")

        # Update pile
        self.piles[pile] -= count
        self.switch_player()

        # Check for a winner
        if all(pile == 0 for pile in self.piles):
            self.winner = self.player


class NimAI():

    def __init__(self, alpha=0.5, epsilon=0.1):
        """
        Initialize AI with an empty Q-learning dictionary,
        an alpha (learning) rate, and an epsilon rate.

        The Q-learning dictionary maps `(state, action)`
        pairs to a Q-value (a number).
         - `state` is a tuple of remaining piles, e.g. (1, 1, 4, 4)
         - `action` is a tuple `(i, j)` for an action
        """
        self.q = dict()
        self.alpha = alpha
        self.epsilon = epsilon
        
    def update(self, old_state, action, new_state, reward):
        """
        Update Q-learning model, given an old state, an action taken
        in that state, a new resulting state, and the reward received
        from taking that action.
        """
        old = self.get_q_value(old_state, action)
        best_future = self.best_future_reward(new_state)
        self.update_q_value(old_state, action, old, reward, best_future)

    def get_q_value(self, state, action):
        """
        Return the Q-value for the state `state` and the action `action`.
        If no Q-value exists yet in `self.q`, return 0.
        """
        if (tuple(state), action) in self.q:
            return self.q[(tuple(state), action)]
        return 0 
            
        
    def update_q_value(self, state, action, old_q, reward, future_rewards):
        """
        Update the Q-value for the state `state` and the action `action`
        given the previous Q-value `old_q`, a current reward `reward`,
        and an estimate of future rewards `future_rewards`.

        Use the formula:

        Q(s, a) <- old value estimate
                   + alpha * (new value estimate - old value estimate)

        where `old value estimate` is the previous Q-value,
        `alpha` is the learning rate, and `new value estimate`
        is the sum of the current reward and estimated future rewards.
        """
        new_q = old_q + self.alpha * ((reward + future_rewards) - old_q)
        self.q[(tuple(state), action)] = new_q
    
    
    def best_future_reward(self, state):
        """
        Get the best possible future reward (Q-value) in the given state.
        
        Given a state `state`, consider all possible `(state, action)`
        pairs available in that state and return the maximum of all
        of their Q-values.

        Use 0 as the Q-value if a `(state, action)` pair has no
        Q-value in `self.q`. If there are no available actions in
        `state`, return 0.
        """
        all_actions = Nim.available_actions(state)
        
        if all_actions:
            max_qval = -math.inf
            for action in all_actions:
                max_qval = max(max_qval, self.get_q_value(state, action))        
            return max_qval
        return 0 
            

    def choose_action(self, state, epsilon=True):
        """
        Given a state `state`, return an action `(i, j)` to take.

        If `epsilon` is `True`, then with probability
        `self.epsilon` choose a random available action.

        Otherwise, return the best action available in the state
        (the one with the highest Q-value, using 0 for pairs that
        have no Q-values) if its Q-value is positive; if no action
        has a positive Q-value, choose a random available action.

        If multiple actions have the same Q-value, any of those
        options is an acceptable return value.
        """
        all_actions = list(Nim.available_actions(state))

        # Explore: with probability `self.epsilon`, try a random action
        if epsilon and random.random() < self.epsilon:
            print('AI Made Random Choice')
            return random.choice(all_actions)

        # Exploit: find the action with the highest Q-value
        max_qval = -math.inf
        best_action = None
        for action in all_actions:
            current_qval = self.get_q_value(state, action)
            if current_qval > max_qval:
                max_qval = current_qval
                best_action = action

        # AI should ideally choose a q_value > 0. If it can't, go ahead
        # and make a random move so that it can make mistakes and learn from them.
        if max_qval > 0:
            print(f'AI Q-Learning Choice: {state, best_action, max_qval}')
            return best_action

        print('AI Made Random Choice')
        return random.choice(all_actions)

def train(n):
    """
    Train an AI by playing `n` games against itself.
    """

    player = NimAI()

    # Play n games
    for i in range(n):
        print(f"Playing training game {i + 1}")
        game = Nim()

        # Keep track of last move made by either player
        last = {
            0: {"state": None, "action": None},
            1: {"state": None, "action": None}
        }

        # Game loop
        while True:

            # Keep track of current state and action
            state = game.piles.copy()
            action = player.choose_action(game.piles)

            # Keep track of last state and action
            last[game.player]["state"] = state
            last[game.player]["action"] = action

            # Make move
            game.move(action)
            new_state = game.piles.copy()

            # When game is over, update Q values with rewards
            if game.winner is not None:
                player.update(state, action, new_state, -1)
                player.update(
                    last[game.player]["state"],
                    last[game.player]["action"],
                    new_state,
                    1
                )
                break

            # If game is continuing, no rewards yet
            elif last[game.player]["state"] is not None:
                player.update(
                    last[game.player]["state"],
                    last[game.player]["action"],
                    new_state,
                    0
                )

    print("Done training")

    # Return the trained AI
    print(player.q)
    return player


def play(ai, human_player=None):
    """
    Play human game against the AI.
    `human_player` can be set to 0 or 1 to specify whether
    human player moves first or second.
    """

    # If no player order set, choose human's order randomly
    if human_player is None:
        human_player = random.randint(0, 1)

    # Create new game
    game = Nim()

    # Game loop
    while True:

        # Print contents of piles
        print()
        print("Piles:")
        for i, pile in enumerate(game.piles):
            print(f"Pile {i}: {pile}")
        print()

        # Compute available actions
        available_actions = Nim.available_actions(game.piles)
        time.sleep(1)

        # Let human make a move
        if game.player == human_player:
            print("Your Turn")
            while True:
                pile = int(input("Choose Pile: "))
                count = int(input("Choose Count: "))
                if (pile, count) in available_actions:
                    break
                print("Invalid move, try again.")

        # Have AI make a move
        else:
            print("AI's Turn")
            pile, count = ai.choose_action(game.piles, epsilon=False)
            print(f"AI chose to take {count} from pile {pile}.")

        # Make move
        game.move((pile, count))

        # Check for winner
        if game.winner is not None:
            print()
            print("GAME OVER")
                    
            winner = "Human" if game.winner == human_player else "AI"
            print(f"Winner is {winner}")
            return

In [46]:
ai = train(10_000)

Playing training game 1
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
Playing training game 2
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
Playing training game 3
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
Playing training game 4
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
Playing training game 5
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
Playing training game 6
AI Made Random Choice
AI Made Random Choice
AI Made Random Choice
...

AI Q-Learning Choice: ([1, 3, 1, 1], (0, 1), 0.0106201171875)
AI Q-Learning Choice: ([0, 3, 1, 1], (1, 1), 0.48095703125)
AI Q-Learning Choice: ([0, 2, 1, 1], (1, 1), 0.498046875)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 0], (1, 1), 0.99609375)
AI Made Random Choice
Playing training game 218
AI Q-Learning Choice: ([1, 3, 5, 7], (2, 2), 0.7726100167360284)
AI Q-Learning Choice: ([1, 3, 3, 7], (3, 6), 0.03863525390625)
AI Q-Learning Choice: ([1, 3, 3, 1], (2, 2), 0.68304443359375)
AI Q-Learning Choice: ([1, 3, 1, 1], (0, 1), 0.25433349609375)
AI Q-Learning Choice: ([0, 3, 1, 1], (1, 1), 0.240478515625)
AI Q-Learning Choice: ([0, 2, 1, 1], (1, 1), 0.7470703125)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 0], (1, 1), 0.998046875)
AI Made Random Choice
Playing training game 219
AI Q-Learning Choice: ([1, 3, 5, 7], (2, 2), 0.7278272251648892)
AI Q-Learning Choice: ([1, 3, 3, 7], (3, 6), 0.146484375)
AI Q-Learning Choice: ([1, 3, 3, 1], (2, 2), 0.461761474609375)
...

AI Q-Learning Choice: ([1, 3, 4, 2], (1, 2), 0.0037512372029370683)
AI Q-Learning Choice: ([1, 1, 4, 2], (2, 2), 0.7507271612530471)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 2, 2], (0, 1), 0.8123016357421875)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 0], (2, 1), 0.9998779296875)
AI Made Random Choice
Playing training game 382
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.054973849753248635)
AI Q-Learning Choice: ([1, 3, 5, 5], (3, 2), 0.5331534974642791)
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 1), 0.01485176367002787)
AI Q-Learning Choice: ([1, 3, 4, 3], (3, 1), 0.6584298359768057)
AI Q-Learning Choice: ([1, 3, 4, 2], (1, 2), 0.0018756186014685342)
AI Q-Learning Choice: ([1, 1, 4, 2], (2, 2), 0.7815143984976173)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 1, 2], (3, 2), 0.9374999999376996)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 0], (1, 1), 0.9999999999708962)
AI Made Random Choice
Playing training game 383
...

AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 3], (3, 1), 0.8593730926513672)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 0, 2], (3, 1), 0.999969482421875)
AI Made Random Choice
Playing training game 476
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.0010048528233863906)
AI Q-Learning Choice: ([1, 3, 5, 5], (3, 2), 0.8569047762195842)
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 1), 0.00013411034184608318)
AI Q-Learning Choice: ([1, 3, 4, 3], (2, 2), 0.9134799257636813)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 2, 3], (0, 1), 0.9027271878293845)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 3], (3, 2), 0.7499999850979293)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 0], (1, 1), 0.999999999998181)
AI Made Random Choice
Playing training game 477
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.0005694815826162369)
AI Q-Learning Choice: ([1, 3, 5, 5], (3, 2), 0.8851923509916327)
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 1), 6.705517092304159e-05)
...

AI Q-Learning Choice: ([1, 1, 2, 3], (0, 1), 0.9130597229193285)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 2, 0], (2, 2), 0.9999980926513672)
AI Made Random Choice
Playing training game 588
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 3.584299957928767e-10)
AI Q-Learning Choice: ([1, 3, 5, 5], (3, 2), 0.9507303586492566)
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 1), 1.0145758730370102e-11)
AI Q-Learning Choice: ([1, 3, 4, 3], (1, 2), 0.9279044041208528)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 4, 0], (2, 3), 0.9374998807870725)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 0], (1, 1), 0.9999999999999991)
AI Made Random Choice
Playing training game 589
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 1.842878772616234e-10)
AI Q-Learning Choice: ([1, 3, 5, 5], (3, 2), 0.9393173813850547)
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 1), 5.072879365185051e-12)
AI Q-Learning Choice: ([1, 3, 4, 3], (1, 2), 0.9327021424539627)
AI Made Random Choice
...

AI Q-Learning Choice: ([1, 3, 5, 5], (3, 2), 0.9758332737869772)
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 1), 6.476960883671706e-17)
AI Q-Learning Choice: ([1, 3, 4, 3], (1, 2), 0.9944158731451229)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 4, 1], (2, 4), 0.9999990463256576)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 697
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 1.6207465053543812e-15)
AI Q-Learning Choice: ([1, 3, 5, 5], (3, 2), 0.9851245734660501)
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 1), 3.238480441835853e-17)
AI Q-Learning Choice: ([1, 3, 4, 3], (1, 2), 0.9972074597353903)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 1, 3], (3, 3), 0.9921874999999981)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 698
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 8.265656548863698e-16)
...

AI Made Random Choice
Playing training game 820
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.5846470950755508)
AI Q-Learning Choice: ([1, 3, 5, 5], (3, 2), 0.13386085897836195)
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 1), 0.22245272680486125)
AI Q-Learning Choice: ([1, 3, 4, 3], (1, 1), 0.5007922045347046)
AI Made Random Choice
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 3], (3, 3), 0.9999995231628418)
AI Made Random Choice
Playing training game 821
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.40354991094020604)
AI Q-Learning Choice: ([1, 3, 5, 5], (3, 2), 0.31732653175653325)
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 1), 0.11122636340243063)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 4, 2], (2, 4), 0.7481956463307142)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 2], (3, 2), 0.9990234375)
AI Made Random Choice
Playing training game 822
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.25738813717131837)
...

AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 944
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.13153639411860768)
AI Q-Learning Choice: ([1, 3, 5, 5], (3, 2), 0.1488406862069176)
AI Q-Learning Choice: ([1, 3, 5, 3], (1, 2), 0.6259152851998677)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 3], (3, 2), 0.9999999962746968)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 0, 1], (1, 1), 1.0)
AI Made Random Choice
Playing training game 945
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.3787258396592377)
AI Q-Learning Choice: ([1, 3, 5, 5], (3, 2), 0.0744203431034588)
AI Q-Learning Choice: ([1, 3, 5, 3], (1, 2), 0.8129576407372823)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 5, 1], (2, 5), 0.5)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 0, 1], (1, 1), 1.0)
AI Made Random Choice
Playing training game 946
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.59584174019826)
...

AI Q-Learning Choice: ([1, 0, 2, 1], (2, 1), 0.9999351473521756)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 1053
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.00029627238266668023)
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 1), 0.8279587777485052)
AI Q-Learning Choice: ([1, 2, 5, 5], (1, 2), 1.9067541614927602e-05)
AI Q-Learning Choice: ([1, 0, 5, 5], (3, 1), 0.7310664504110773)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 5, 3], (2, 3), 0.9373053228472372)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 2, 1], (2, 1), 0.9999675736760878)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 1054
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.00015766996214080392)
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 1), 0.7795126140797912)
AI Q-Learning Choice: ([1, 2, 5, 5], (1, 2), 9.533770807463801e-06)
...

AI Q-Learning Choice: ([1, 2, 5, 5], (1, 2), 0.9998478548545524)
AI Q-Learning Choice: ([1, 0, 5, 5], (3, 1), 4.965647477366718e-27)
AI Q-Learning Choice: ([1, 0, 5, 4], (2, 1), 0.9996777377590724)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 4], (3, 3), 0.9999923706054688)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 1153
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.99985354450308)
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 1), 2.200956261307612e-25)
AI Q-Learning Choice: ([1, 2, 5, 5], (1, 2), 0.9997627963068124)
AI Q-Learning Choice: ([1, 0, 5, 5], (3, 1), 2.482823738683359e-27)
AI Q-Learning Choice: ([1, 0, 5, 4], (2, 1), 0.9998350541822706)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 4, 2], (2, 1), 0.9997467652510834)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 2], (3, 1), 0.9999999031314796)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
...

AI Q-Learning Choice: ([0, 0, 5, 1], (2, 5), 0.99609375)
AI Made Random Choice
Playing training game 1311
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.0031359033351681994)
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 1), 0.7130592472084298)
AI Q-Learning Choice: ([1, 2, 5, 5], (0, 1), 0.00024110664720624375)
AI Q-Learning Choice: ([0, 2, 5, 5], (1, 2), 0.7392174660207065)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 0, 5], (3, 4), 0.5)
AI Made Random Choice
Playing training game 1312
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.0016885049911872217)
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 1), 0.7261383566145682)
AI Q-Learning Choice: ([1, 2, 5, 5], (0, 1), 0.00012055332360312187)
AI Q-Learning Choice: ([0, 2, 5, 5], (1, 2), 0.6196087330103532)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 5, 2], (2, 3), 0.9980468749995879)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 0], (2, 1), 1.0)
AI Made Random Choice
Playing training game 1313
...

AI Q-Learning Choice: ([1, 2, 5, 4], (2, 1), 0.9716155007773899)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 4, 2], (2, 3), 0.9921129882138583)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 1, 0], (1, 1), 0.9999990581954987)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 1464
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.00407833397018499)
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 1), 0.9309810141410357)
AI Q-Learning Choice: ([1, 2, 5, 5], (3, 1), 0.0004999425900480046)
AI Q-Learning Choice: ([1, 2, 5, 4], (2, 1), 0.981864244495624)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 0, 4], (3, 1), 0.9842073985781779)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 0, 0], (1, 2), 0.9999847412109375)
AI Made Random Choice
Playing training game 1465
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.0022891382801164973)
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 1), 0.9564226293183299)
...

AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 4], (3, 4), 0.99993896484375)
AI Made Random Choice
Playing training game 1592
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.0009752574561593254)
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 1), 0.9364544091815644)
AI Q-Learning Choice: ([1, 2, 5, 5], (3, 1), 6.622910857088151e-05)
AI Q-Learning Choice: ([1, 2, 5, 4], (0, 1), 0.9615983509767236)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 5, 2], (2, 5), 0.874999999994543)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 0], (1, 1), 0.9999999999990905)
AI Made Random Choice
Playing training game 1593
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.0005207432823651035)
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 1), 0.9490263800791441)
AI Q-Learning Choice: ([1, 2, 5, 5], (3, 1), 3.3114554285440756e-05)
AI Q-Learning Choice: ([1, 2, 5, 4], (0, 1), 0.9182991754856333)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 3, 4], (3, 3), 0.9374803599821222)
AI Made Random Choice
...

Playing training game 1760
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.9530254800547424)
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 1), 8.940696778996732e-08)
AI Q-Learning Choice: ([1, 2, 5, 5], (1, 1), 0.9481493954210827)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 5, 0], (2, 4), 0.999755859375)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 1761
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 2), 0.9505874377379125)
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 1), 4.470348389498366e-08)
AI Q-Learning Choice: ([1, 2, 5, 5], (1, 1), 0.9739526273980413)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 4, 5], (3, 1), 0.9850724136778245)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 2, 4], (3, 2), 0.9921873993462582)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 2, 1], (2, 2), 0.999969482399905)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 0, 1], (1, 1), 1.0)
AI Made Random Choice
...

Playing training game 1891
AI Q-Learning Choice: ([1, 3, 5, 7], (2, 4), 0.1669924501713064)
AI Q-Learning Choice: ([1, 3, 1, 7], (3, 1), 0.6631603008851124)
AI Q-Learning Choice: ([1, 3, 1, 6], (1, 1), 0.04905478646868211)
AI Q-Learning Choice: ([1, 2, 1, 6], (1, 1), 0.8124295175075531)
AI Q-Learning Choice: ([1, 1, 1, 6], (2, 1), 0.006396472454070989)
AI Q-Learning Choice: ([1, 1, 0, 6], (0, 1), 0.9435100257396698)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 0, 3], (3, 3), 0.9999998807907104)
AI Made Random Choice
Playing training game 1892
AI Q-Learning Choice: ([1, 3, 5, 7], (2, 4), 0.10802361831999426)
AI Q-Learning Choice: ([1, 3, 1, 7], (3, 1), 0.7377949091963327)
AI Q-Learning Choice: ([1, 3, 1, 6], (1, 1), 0.02772562946137655)
AI Q-Learning Choice: ([1, 2, 1, 6], (1, 1), 0.8779697716236115)
AI Q-Learning Choice: ([1, 1, 1, 6], (2, 1), 0.0031982362270354947)
AI Q-Learning Choice: ([1, 1, 0, 6], (0, 1), 0.9717549532651901)
AI Made Random Choice
...

AI Q-Learning Choice: ([0, 1, 3, 0], (2, 3), 0.9999847412109375)
AI Made Random Choice
Playing training game 2045
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 1), 0.47100916438288243)
AI Q-Learning Choice: ([1, 3, 5, 6], (1, 2), 0.12183496245517063)
AI Q-Learning Choice: ([1, 1, 5, 6], (0, 1), 0.7814230397218196)
AI Q-Learning Choice: ([0, 1, 5, 6], (2, 2), 0.019284186486516183)
AI Q-Learning Choice: ([0, 1, 3, 6], (3, 1), 0.9570155627705677)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 3, 5], (3, 2), 0.968749433629772)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 3, 2], (2, 1), 0.999999999747876)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 2], (3, 2), 1.0)
AI Made Random Choice
Playing training game 2046
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 1), 0.626216102052351)
AI Q-Learning Choice: ([1, 3, 5, 6], (1, 2), 0.0705595744708434)
AI Q-Learning Choice: ([1, 1, 5, 6], (0, 1), 0.8692193012461936)
AI Q-Learning Choice: ([0, 1, 5, 6], (2, 2), 0.009642093243258092)
...

Playing training game 2187
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 1), 0.0008725527493376726)
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.9816912564722007)
AI Q-Learning Choice: ([0, 3, 5, 6], (2, 2), 5.803659143339924e-05)
AI Q-Learning Choice: ([0, 3, 3, 6], (2, 2), 0.9864962977749805)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 0, 6], (3, 3), 0.9843746936217029)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 0, 3], (3, 3), 0.9999999998835847)
AI Made Random Choice
Playing training game 2188
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 1), 0.0004652946703855359)
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.9840937771235906)
AI Q-Learning Choice: ([0, 3, 5, 6], (2, 2), 2.901829571669962e-05)
AI Q-Learning Choice: ([0, 3, 3, 6], (2, 2), 0.9854354956983418)
AI Made Random Choice
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 1, 2], (2, 1), 0.9999999962644369)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 0, 2], (3, 2), 1.0)
AI Made Random Choice
...

AI Q-Learning Choice: ([1, 3, 5, 7], (3, 1), 0.35170440918830176)
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.06251689503552783)
AI Q-Learning Choice: ([0, 3, 5, 6], (2, 1), 0.3417968671232944)
AI Q-Learning Choice: ([0, 3, 4, 6], (3, 3), 0.24949454513262026)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 4, 3], (2, 2), 0.9961626303256553)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 2, 0], (2, 2), 0.9999999999999998)
AI Made Random Choice
Playing training game 2279
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 1), 0.3467506381557981)
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.15600572008407404)
AI Q-Learning Choice: ([0, 3, 5, 6], (2, 1), 0.1708984335616472)
AI Q-Learning Choice: ([0, 3, 4, 6], (3, 3), 0.6228285877291377)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 4, 3], (2, 1), 0.9999955241275488)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 3, 2], (2, 1), 0.9999999999998769)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 0], (2, 1), 1.0)
...

AI Q-Learning Choice: ([0, 1, 3, 1], (2, 2), 0.9999999981373513)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 0, 1], (1, 1), 1.0)
AI Made Random Choice
Playing training game 2411
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 1), 1.9764447507568517e-10)
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.996123684965289)
AI Q-Learning Choice: ([0, 3, 5, 6], (2, 1), 9.40408555765868e-12)
AI Q-Learning Choice: ([0, 3, 4, 6], (1, 1), 0.9970702699846975)
AI Q-Learning Choice: ([0, 2, 4, 6], (1, 1), 2.1876071434851388e-13)
AI Q-Learning Choice: ([0, 1, 4, 6], (3, 1), 0.9666652182476936)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 4, 4], (1, 1), 0.9999617475508009)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 4, 3], (2, 1), 0.9999999649157643)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 0, 3], (3, 2), 0.9999999999999716)
AI Made Random Choice
Playing training game 2412
AI Q-Learning Choice: ([1, 3, 5, 7], (3, 1), 1.0352428031667193e-10)
...

AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 0], (2, 1), 1.0)
AI Made Random Choice
Playing training game 2608
AI Q-Learning Choice: ([1, 3, 5, 7], (1, 1), 0.8433939718148615)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 5, 0], (2, 2), 0.9995113667486097)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 3, 0], (2, 3), 0.9999995231628418)
AI Made Random Choice
Playing training game 2609
AI Q-Learning Choice: ([1, 3, 5, 7], (1, 1), 0.9214526692817356)
AI Made Random Choice
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 0, 4], (3, 1), 0.9960442202500133)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 0, 2], (0, 1), 0.9999999923185418)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 0, 2], (3, 1), 1.0)
AI Made Random Choice
Playing training game 2610
AI Q-Learning Choice: ([1, 3, 5, 7], (1, 1), 0.4607263346408678)
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 5), 0.49802211012500663)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 0, 1], (1, 1), 0.999999973

AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 1, 0], (1, 1), 0.9999999999425169)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 2782
AI Q-Learning Choice: ([1, 3, 5, 7], (1, 1), 0.025774613590237472)
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.833038915147681)
AI Q-Learning Choice: ([1, 2, 4, 7], (3, 1), 0.002914016751561258)
AI Q-Learning Choice: ([1, 2, 4, 6], (3, 1), 0.9528141974065489)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 1, 5], (3, 3), 0.7499999343410546)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 2], (3, 1), 0.9999999999881752)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 2783
AI Q-Learning Choice: ([1, 3, 5, 7], (1, 1), 0.014344315170899364)
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.8929265562771149)
AI Q-Learning Choice: ([1, 2, 4, 7], (3, 1), 0.001457008375780629)
AI Q-Learning Choice: ([1, 2, 4, 6]

AI Q-Learning Choice: ([0, 1, 0, 5], (3, 5), 0.9999980926513672)
AI Made Random Choice
Playing training game 2885
AI Q-Learning Choice: ([1, 3, 5, 7], (1, 1), 0.6269112796494958)
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.24346211113091548)
AI Q-Learning Choice: ([1, 2, 4, 7], (3, 1), 0.008325861138309965)
AI Q-Learning Choice: ([1, 2, 4, 6], (0, 1), 0.7342329002635836)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 4, 6], (3, 1), 0.9977112289114097)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 4, 0], (2, 4), 0.9999923706054688)
AI Made Random Choice
Playing training game 2886
AI Q-Learning Choice: ([1, 3, 5, 7], (1, 1), 0.31761857039390284)
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.48884750569724955)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 4, 0], (2, 1), 0.9995097424224)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 2, 0], (0, 1), 0.9999999995390407)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 0], (1, 1), 1.0)
AI Made Random Choice

AI Q-Learning Choice: ([0, 3, 3, 7], (2, 1), 0.48583117038844753)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 2, 5], (3, 4), 0.999877928695659)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 0, 1], (1, 3), 0.9999999999708962)
AI Made Random Choice
Playing training game 3051
AI Q-Learning Choice: ([1, 3, 5, 7], (0, 1), 0.25042503280356915)
AI Q-Learning Choice: ([0, 3, 5, 7], (2, 2), 0.23196917962123903)
AI Q-Learning Choice: ([0, 3, 3, 7], (2, 1), 0.7428545495420533)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 2, 4], (3, 3), 0.8747022745081523)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 1], (2, 2), 1.0)
AI Made Random Choice
Playing training game 3052
AI Q-Learning Choice: ([1, 3, 5, 7], (0, 1), 0.4966397911728112)
AI Q-Learning Choice: ([0, 3, 5, 7], (2, 2), 0.11598458981061952)
AI Q-Learning Choice: ([0, 3, 3, 7], (2, 1), 0.8087784120251028)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 2, 0], (1, 1), 0.9999999993456175)
AI Made Random Choice

AI Q-Learning Choice: ([0, 1, 3, 1], (2, 2), 0.999999999992724)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 0, 1], (1, 1), 1.0)
AI Made Random Choice
Playing training game 3154
AI Q-Learning Choice: ([1, 3, 5, 7], (0, 1), 6.386003314656867e-11)
AI Q-Learning Choice: ([0, 3, 5, 7], (1, 1), 0.9966727548420041)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 2, 7], (3, 7), 0.99609375)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 3155
AI Q-Learning Choice: ([1, 3, 5, 7], (0, 1), 3.193001657328434e-11)
AI Q-Learning Choice: ([0, 3, 5, 7], (1, 1), 0.9963832524210021)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 4, 7], (3, 1), 0.7494997952720865)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 6], (3, 4), 0.999969482421875)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 0, 2], (3, 1), 1.0)
AI Made Random Choice
Playing training game 3156
AI Q-Learning Choice: ([1, 3, 5, 7], (0, 1), 1.59650082

AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 2, 7], (3, 7), 0.9999923705883292)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 0], (1, 3), 0.999999999992724)
AI Made Random Choice
Playing training game 3330
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 2), 0.9920804428628947)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 5, 5], (3, 1), 0.9987806609668826)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 3, 4], (3, 2), 0.9999974836163896)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 3, 1], (2, 2), 0.9999999999971578)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 3331
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 7], (3, 5), 0.9999389609660733)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 0], (1, 3), 0.999999999996362)
AI Made Random Choice
Playing training game 3332
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 7], (3, 4), 0.9998151940315709)


Playing training game 3557
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 0], (2, 3), 0.9999999999916422)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 0], (1, 3), 0.9999999999999716)
AI Made Random Choice
Playing training game 3558
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 3, 7], (3, 6), 0.9999969297105691)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 1], (1, 3), 0.9999999999853946)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 3559
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 5, 7], (3, 2), 0.998200190319063)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 1, 5], (3, 5), 0.99993896484375)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 3560
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 4], (1, 3), 0.9999844410507026)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 5, 0], (2, 5), 0.9998779

AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 3, 5], (3, 3), 0.9980468730572962)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 2], (3, 2), 0.9999999999995453)
AI Made Random Choice
Playing training game 3679
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 1], (2, 2), 0.9999995301348702)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 3, 1], (2, 1), 0.9999999050744849)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 1], (2, 2), 1.0)
AI Made Random Choice
Playing training game 3680
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 4), 0.9999982738356159)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 3], (1, 1), 0.9999999606968392)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 0, 1], (1, 1), 0.9999999999999996)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 3681
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 4, 7], (1, 1), 0.9967630986285011)
AI Made Random Choice

AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 3841
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 5, 7], (3, 3), 0.9999999716121859)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 2, 4], (3, 1), 0.9999999999989836)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 3], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 1], (2, 2), 1.0)
AI Made Random Choice
Playing training game 3842
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 7], (3, 5), 0.9999999999990792)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 2], (3, 2), 0.9999999999999964)
AI Made Random Choice
Playing training game 3843
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 4), 0.9999999694425383)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 3], (1, 1), 0.999999995087048)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 3], (3, 2), 0.9999999999999858)
AI Made Random Choice

AI Made Random Choice
Playing training game 4077
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 0], (2, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 0], (1, 3), 1.0)
AI Made Random Choice
Playing training game 4078
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 5, 7], (3, 3), 0.9999998780359691)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 5, 1], (2, 4), 0.9999999999708962)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 4079
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.9987849073708639)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 1, 6], (3, 4), 0.999755859252566)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 2], (3, 1), 0.9999999981373549)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 0, 1], (1, 1), 1.0)
AI Made Random Choice
Playing training game 4080
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 0], (2, 3), 1.0)
AI Made Random Choice

Playing training game 4192
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 5, 7], (3, 2), 0.9999703512082203)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 5, 5], (3, 1), 0.9998108081556438)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 5, 4], (2, 1), 0.9999998794835793)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 4, 0], (2, 3), 0.9999999981373549)
AI Made Random Choice
Playing training game 4193
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 0], (2, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 2, 0], (0, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 2, 0], (2, 2), 1.0)
AI Made Random Choice
Playing training game 4194
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 0], (2, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 0], (1, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 4195
AI Made Random Choice

AI Q-Learning Choice: ([1, 2, 4, 4], (1, 1), 0.9998954232580708)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 4], (3, 3), 0.999969482421875)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 4419
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 0], (2, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 2, 0], (0, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 4420
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 7], (3, 4), 0.999999999890492)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 0], (1, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 4421
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 1], (2, 2), 0.9999999988592728)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 2, 1], (3, 1), 0.9999999562314413)
AI Made Random Choice

AI Q-Learning Choice: ([0, 2, 0, 3], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 0, 2], (3, 2), 1.0)
AI Made Random Choice
Playing training game 4661
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 5, 7], (1, 1), 0.9998865126117573)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 7], (3, 5), 0.99993896484375)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 4662
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 2], (2, 5), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 2], (3, 2), 1.0)
AI Made Random Choice
Playing training game 4663
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 4), 0.9999999989636191)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 1, 3], (0, 1), 0.9999999999881024)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 3], (3, 3), 1.0)
AI Made Random Choice
Playing training game 4664
AI Made Random Choice
AI Q-Learning Choice: ([0,

AI Made Random Choice
Playing training game 4823
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 7], (3, 4), 0.9999999999993595)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 1, 3], (2, 1), 0.9999999999999037)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 0, 1], (1, 3), 1.0)
AI Made Random Choice
Playing training game 4824
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 2, 7], (3, 7), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 0], (1, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 4825
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 5, 7], (3, 3), 0.9999999992623587)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 4], (3, 4), 1.0)
AI Made Random Choice
Playing training game 4826
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 5, 7], (3, 2), 0.9999931366567999)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 5, 3], (2, 2), 0.9999956078775428)
AI Made Random Choice

AI Q-Learning Choice: ([1, 3, 4, 7], (1, 1), 0.9998919080303559)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 3, 7], (3, 7), 0.9999980926510914)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 3, 0], (2, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 1, 0], (1, 2), 1.0)
AI Made Random Choice
Playing training game 4996
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 7], (3, 4), 0.999999999984803)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 3], (3, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 4997
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 7], (3, 4), 0.9999999999924015)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 1, 3], (0, 1), 0.9999999999999925)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 3], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 4998
AI Made Random Choice

AI Q-Learning Choice: ([0, 0, 4, 5], (3, 1), 0.9999999887780067)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 4], (3, 4), 0.9999999999999929)
AI Made Random Choice
Playing training game 5222
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 5, 7], (3, 2), 0.9999999462871035)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 1, 5], (3, 5), 0.9999999997671694)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 5223
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 1], (2, 2), 0.9999999999978975)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 3, 1], (2, 1), 0.9999999999991793)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 2, 1], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 1, 0], (1, 2), 1.0)
AI Made Random Choice
Playing training game 5224
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 0], (2, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 2, 0], (1, 1)

AI Made Random Choice
Playing training game 5453
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 3, 7], (3, 6), 0.999999999999801)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 3, 1], (2, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 5454
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 3, 7], (3, 6), 0.9999999999999005)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 3, 0], (2, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 2, 0], (1, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 0], (2, 1), 1.0)
AI Made Random Choice
Playing training game 5455
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 0], (2, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 2, 0], (2, 2), 1.0)
AI Made Random Choice
Playing training game 5456
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 3, 7], (3, 6), 0.9999999999999503)
AI Made Random Choice

AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 5626
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.9999505282082796)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 2, 7], (3, 6), 0.9999999701779518)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 2, 1], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 5627
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 2, 7], (3, 7), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 2, 0], (1, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 2, 0], (2, 2), 1.0)
AI Made Random Choice
Playing training game 5628
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 4), 0.9999999999987582)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 3], (1, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 3], (3, 2), 1.0)
AI Made Random Choice

AI Q-Learning Choice: ([0, 2, 5, 5], (1, 2), 0.9999624464295888)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 5], (3, 3), 0.99993896484375)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 0], (2, 1), 1.0)
AI Made Random Choice
Playing training game 5797
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.9999984747919609)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 2, 6], (3, 5), 0.9999694797820387)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 1, 1], (1, 2), 0.9999999999999998)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 5798
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 2, 7], (3, 7), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 2, 0], (0, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 5799
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 2], (2, 5), 1.0)
AI Made Random Choice

AI Q-Learning Choice: ([1, 3, 1, 7], (3, 4), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 1, 3], (0, 1), 0.9999999999999999)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 1, 1], (1, 1), 0.9999999999999928)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 5968
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.9997683595765697)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 4, 2], (2, 3), 0.9999999999998292)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 1, 1], (1, 2), 0.9999999999997726)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 5969
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 4), 0.9999999999994703)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 3], (3, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 1], (3, 1), 1.0)
AI Made Random Choice

AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 0, 2], (0, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 0, 2], (3, 2), 1.0)
AI Made Random Choice
Playing training game 6138
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.9998945169443387)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 5, 1], (2, 3), 0.9999997053416962)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 1], (2, 2), 1.0)
AI Made Random Choice
Playing training game 6139
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 4], (1, 3), 0.999999999999994)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 5, 0], (2, 5), 0.9999999999999858)
AI Made Random Choice
Playing training game 6140
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.999648329713685)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 3, 7], (3, 7), 0.9999999999417923)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 2, 0], (0, 1), 1.0)
AI Made Random Choice

AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 5, 0], (2, 3), 0.9999961853027057)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 2, 0], (2, 2), 1.0)
AI Made Random Choice
Playing training game 6311
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.9999950906983048)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 4, 6], (1, 1), 0.9998282609326783)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 4, 0], (2, 2), 0.9999997615814209)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 6312
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 2], (2, 5), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 2], (3, 2), 1.0)
AI Made Random Choice
Playing training game 6313
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 0], (2, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 2, 0], (1, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 2, 0], (2, 2), 1.0)

Playing training game 6427
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 2), 0.9999997173404682)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 4, 5], (3, 1), 0.9999992212143054)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 4, 4], (0, 1), 0.9999999999999749)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 0, 4], (3, 3), 1.0)
AI Made Random Choice
Playing training game 6428
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 1], (2, 2), 0.9999999999999978)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 1], (1, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 6429
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 4), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 3], (1, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 3], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 1], (1, 2), 1.0)
AI Made Random Choice

AI Q-Learning Choice: ([1, 3, 4, 7], (1, 1), 0.9999994352588919)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 4, 1], (2, 2), 0.9999999999997848)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 2, 1], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 2, 0], (2, 2), 1.0)
AI Made Random Choice
Playing training game 6656
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 2], (2, 5), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 2], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 6657
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 2], (2, 5), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 0], (1, 3), 1.0)
AI Made Random Choice
Playing training game 6658
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 4], (1, 3), 0.9999999999999857)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 5, 4], (2, 1), 0.9999999999999982)
AI Made Random Choice

AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 5, 7], (3, 2), 0.999999999621814)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 5, 3], (2, 2), 0.9999997117486119)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 3, 3], (3, 1), 0.9999996839698713)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 2], (3, 1), 0.9999999999708962)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 6888
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 7], (3, 5), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 0, 2], (0, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 1], (1, 2), 1.0)
AI Made Random Choice
Playing training game 6889
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 0], (2, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 2, 0], (1, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 2, 0], (2, 2), 1.0)
AI Made Random Choice
Playing training game 6890

AI Q-Learning Choice: ([0, 0, 3, 4], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 3], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 0], (2, 1), 1.0)
AI Made Random Choice
Playing training game 7047
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 7], (3, 4), 0.9999999999999999)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 1, 3], (3, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 7048
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 2], (2, 5), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 2], (3, 2), 1.0)
AI Made Random Choice
Playing training game 7049
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 2], (2, 5), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 2], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 7050
AI Made Random Choice

AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 2, 0], (1, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 7217
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 1], (2, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 1], (1, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 7218
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.9999149281409048)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 5, 6], (3, 2), 0.99999874813663)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 4], (3, 3), 0.9999999925494194)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 7219
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 4), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 1, 3], (0, 1), 1.0)
AI Made Random Choice

AI Q-Learning Choice: ([1, 2, 1, 3], (0, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 1, 2], (2, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 1], (1, 2), 1.0)
AI Made Random Choice
Playing training game 7399
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 3, 7], (3, 6), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 2, 1], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 2, 0], (2, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 7400
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.9999991357870746)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 4, 6], (1, 1), 0.9999792957318367)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 1, 6], (3, 3), 0.9999389648413497)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 3], (3, 3), 1.0)
AI Made Random Choice
Playing training game 7401
AI Made Random Choice

AI Q-Learning Choice: ([1, 2, 4, 1], (2, 2), 0.9999999999999967)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 2, 1], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 1, 0], (1, 2), 1.0)
AI Made Random Choice
Playing training game 7570
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.9999989265806313)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 4, 2], (2, 3), 0.9999999999999989)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 1, 2], (3, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 7571
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 5, 7], (3, 2), 0.9999999907473474)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 5], (3, 4), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 7572
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 1], (2, 2), 1.0)
AI Made Random Choice

AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 0], (1, 3), 1.0)
AI Made Random Choice
Playing training game 7805
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 4], (1, 3), 0.9999999999999998)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 2, 4], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 2, 0], (2, 2), 1.0)
AI Made Random Choice
Playing training game 7806
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 5, 7], (1, 1), 0.9999928790459095)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 5, 7], (3, 2), 0.9999983412731319)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 5], (3, 3), 0.9999961853027344)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 1], (2, 2), 1.0)
AI Made Random Choice
Playing training game 7807
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.9999903707378798)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 2, 7], (3, 6), 0.9999999999981797)
AI Made Random Choice

AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 2], (2, 5), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 2], (3, 2), 1.0)
AI Made Random Choice
Playing training game 7980
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 3, 7], (3, 6), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 3, 1], (2, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 2, 1], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 7981
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 0], (2, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 0], (1, 3), 1.0)
AI Made Random Choice
Playing training game 7982
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 2], (2, 5), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 2], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 0, 1], (1, 1), 1.0)
AI Made Random Choice
Playing training game 7983
AI Made Random Choice

AI Q-Learning Choice: ([1, 3, 4, 7], (1, 1), 0.9999957706580245)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 0, 7], (3, 4), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 3], (3, 3), 1.0)
AI Made Random Choice
Playing training game 8120
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 3, 7], (3, 6), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 2, 1], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 2, 0], (2, 2), 1.0)
AI Made Random Choice
Playing training game 8121
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 3, 7], (3, 6), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 3, 1], (2, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 8122
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.999999422491372)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 4, 3], (2, 4), 0.9999999999999996)
AI Made Random Choice

AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 1, 3], (3, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 8332
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 5], (1, 2), 0.9999999909461261)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 3, 5], (3, 2), 0.9999999994235468)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 3, 1], (2, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 8333
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 5, 7], (3, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 5, 3], (2, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 2, 0], (2, 2), 1.0)
AI Made Random Choice
Playing training game 8334
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 3, 7], (3, 6), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 3, 0], (2, 1), 1.0)
AI Made Random Choice
...

AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1, 5], (3, 4), 0.9999847412109375)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 8445
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.9999772685410302)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 5, 2], (2, 4), 0.9999999999995434)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 2], (3, 2), 1.0)
AI Made Random Choice
Playing training game 8446
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 4], (1, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 2, 4], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 3], (3, 3), 1.0)
AI Made Random Choice
Playing training game 8447
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 5, 7], (1, 1), 0.9999917427710462)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 7], (3, 5), 0.9999999999708962)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 0, 2], ...

AI Q-Learning Choice: ([1, 1, 1, 4], (3, 4), 0.9999999995343387)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 8618
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 1], (2, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 3, 1], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 3, 0], (2, 3), 1.0)
AI Made Random Choice
Playing training game 8619
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 7], (3, 4), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 0], (1, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 8620
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.9999999872807579)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 4, 7], (3, 3), 0.999999991234853)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 4, 4], (1, 1), 0.9999999999999625)
AI Made Random Choice
...

AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 0], (0, 1), 1.0)
AI Made Random Choice
Playing training game 8774
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 7], (3, 4), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 3], (3, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 8775
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.999999954717595)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 5, 6], (3, 2), 0.9999999454514623)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 5, 0], (2, 5), 0.9999999997671694)
AI Made Random Choice
Playing training game 8776
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 2], (2, 5), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 0, 2], (1, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 8777
AI Made Random Choice
...

Playing training game 8937
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 1], (2, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 3, 1], (2, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 8938
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 5, 7], (1, 1), 0.9999971716206886)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 5, 5], (1, 2), 0.9999998541056266)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 4, 5], (3, 1), 0.9999999999998281)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 4, 1], (2, 4), 1.0)
AI Made Random Choice
Playing training game 8939
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.9999995042213459)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 4, 3], (2, 4), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 0, 2], (0, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 0, 0], (1, 1), 1.0)
AI Made Random Choice
...

Playing training game 9159
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.9999975596228261)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 4, 1], (2, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 2, 1], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 1, 0], (1, 2), 1.0)
AI Made Random Choice
Playing training game 9160
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 5, 7], (3, 2), 0.9999999974875933)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 2, 5], (3, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 2], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 9161
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 3, 7], (3, 6), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 3, 1], (2, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 1, 1], (1, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 1 ...

AI Q-Learning Choice: ([1, 3, 4, 7], (1, 1), 0.9999999984755852)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 4, 2], (2, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 1, 2], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 9274
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.9999999237382133)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 4, 7], (3, 2), 0.9999997570445627)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 4, 4], (0, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 4, 3], (2, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 3], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 1], (2, 2), 1.0)
AI Made Random Choice
Playing training game 9275
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 4), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 2], (1, 1), 1.0)
...

AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.9999994618899813)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 3, 6], (3, 6), 0.9999998807904771)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 1, 0], (1, 3), 0.9999999999999998)
AI Made Random Choice
Playing training game 9428
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 3], (2, 4), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 3], (1, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 0, 0], (1, 2), 1.0)
AI Made Random Choice
Playing training game 9429
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 5, 7], (1, 1), 0.9999999987356618)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 5, 7], (3, 3), 0.9999999295569499)
AI Made Random Choice
AI Q-Learning Choice: ([0, 1, 3, 4], (3, 2), 0.9999999944083837)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 3, 2], (2, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 0, 2], (3, 1), 1.0)
AI Made Random Choice
...

AI Made Random Choice
Playing training game 9653
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 5, 7], (1, 1), 0.9999903798767735)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 5, 5], (1, 2), 0.9999999969975444)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 5, 2], (2, 3), 0.9999999981373549)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 1], (2, 2), 1.0)
AI Made Random Choice
Playing training game 9654
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 2], (2, 5), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 0, 2], (0, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 0, 2], (3, 1), 1.0)
AI Made Random Choice
Playing training game 9655
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 1, 7], (3, 4), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 3], (1, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 0, 0], (1, 2), 1.0)
AI Made Random Choice
Playing training game 9656
AI Made Random Choice
...

AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 2], (3, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 9797
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 2], (2, 5), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 0, 1], (1, 2), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 0, 1], (3, 1), 1.0)
AI Made Random Choice
Playing training game 9798
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 5, 7], (3, 2), 0.9999999983197208)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 5, 0], (2, 4), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([1, 1, 0, 0], (1, 1), 1.0)
AI Made Random Choice
Playing training game 9799
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.9999957997515533)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 4, 6], (1, 1), 0.9994695392338402)
AI Made Random Choice
AI Q-Learning Choice: ([0, 2, 2, 6], (3, 6), 0.9999997615814209)
...

Playing training game 9979
AI Made Random Choice
AI Q-Learning Choice: ([1, 2, 5, 7], (2, 1), 0.9999999544607885)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 4, 7], (3, 2), 0.999999947744717)
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 4, 4], (0, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 1, 4], (3, 4), 1.0)
AI Made Random Choice
Playing training game 9980
AI Made Random Choice
AI Q-Learning Choice: ([1, 3, 5, 6], (0, 1), 0.9999915652437518)
AI Made Random Choice
AI Q-Learning Choice: ([0, 3, 5, 5], (1, 3), 0.9999999317789666)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 3, 5], (3, 2), 0.9999999925492697)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 3, 1], (2, 3), 1.0)
AI Made Random Choice
Playing training game 9981
AI Made Random Choice
AI Q-Learning Choice: ([1, 0, 5, 7], (3, 3), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 5, 4], (2, 1), 1.0)
AI Made Random Choice
AI Q-Learning Choice: ([0, 0, 2, 4], (3, 2), 1.0)
...

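The training log above prints one of two messages per move: `AI Made Random Choice`, or `AI Q-Learning Choice:` followed by a `(state, action, q_value)` triple. The code that produced it is not shown in this section, so the following is only a minimal sketch of how an epsilon-greedy chooser could emit lines in that format. The names `available_actions` and `choose_action`, the `epsilon` parameter, and the Q-table layout (a dictionary keyed by `(tuple(state), action)`) are all assumptions made for illustration.

```python
import random


def available_actions(piles):
    """All (pile, count) moves that remove at least one object."""
    return [(i, j) for i, n in enumerate(piles) for j in range(1, n + 1)]


def choose_action(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over a Q-table keyed by (tuple(state), action).

    Hypothetical helper: the notebook's real selection logic may differ.
    """
    actions = available_actions(state)
    if rng.random() < epsilon:
        # Explore: ignore the Q-table and pick any legal move.
        print("AI Made Random Choice")
        return rng.choice(actions)
    # Exploit: pick the move with the highest known Q-value.
    # Unseen (state, action) pairs default to a Q-value of 0.
    best = max(actions, key=lambda a: q.get((tuple(state), a), 0))
    value = q.get((tuple(state), best), 0)
    print(f"AI Q-Learning Choice: ({list(state)}, {best}, {value})")
    return best


# Tiny hand-written Q-table, for illustration only.
q = {((0, 2, 0, 0), (1, 1)): 1.0, ((0, 2, 0, 0), (1, 2)): -1.0}
greedy_move = choose_action(q, [0, 2, 0, 0], epsilon=0)
```

With `epsilon=0` the function always takes the greedy branch, so the example call selects `(1, 1)`: it leaves the opponent to remove the last object.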
In [47]:
play(ai)


Piles:
Pile 0: 1
Pile 1: 3
Pile 2: 5
Pile 3: 7

AI's Turn
AI Made Random Choice
AI chose to take 4 from pile 2.

Piles:
Pile 0: 1
Pile 1: 3
Pile 2: 1
Pile 3: 7

Your Turn
Choose Pile: 2
Choose Count: 1

Piles:
Pile 0: 1
Pile 1: 3
Pile 2: 0
Pile 3: 7

AI's Turn
AI Q-Learning Choice: ([1, 3, 0, 7], (3, 5), 1.0)
AI chose to take 5 from pile 3.

Piles:
Pile 0: 1
Pile 1: 3
Pile 2: 0
Pile 3: 2

Your Turn
Choose Pile: 1
Choose Count: 1

Piles:
Pile 0: 1
Pile 1: 2
Pile 2: 0
Pile 3: 2

AI's Turn
AI Q-Learning Choice: ([1, 2, 0, 2], (0, 1), 1.0)
AI chose to take 1 from pile 0.

Piles:
Pile 0: 0
Pile 1: 2
Pile 2: 0
Pile 3: 2

Your Turn
Choose Pile: 3
Choose Count: 1

Piles:
Pile 0: 0
Pile 1: 2
Pile 2: 0
Pile 3: 1

AI's Turn
AI Q-Learning Choice: ([0, 2, 0, 1], (1, 2), 1.0)
AI chose to take 2 from pile 1.

Piles:
Pile 0: 0
Pile 1: 0
Pile 2: 0
Pile 3: 1

Your Turn
Choose Pile: 3
Choose Count: 1

GAME OVER
Winner is AI
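
The game above can be replayed against the state and action representation from the instructions, where a state is a list of pile sizes and an action is a pair `(i, j)` meaning "take `j` objects from pile `i`". This is a small sketch rather than the project's own code: `apply_action` is a hypothetical helper, and the move list is transcribed by hand from the transcript.

```python
def apply_action(piles, action):
    """Return a new pile list after taking `count` objects from pile `pile`."""
    pile, count = action
    if not 0 <= pile < len(piles) or not 1 <= count <= piles[pile]:
        raise ValueError(f"invalid action {action} for state {piles}")
    new_piles = list(piles)
    new_piles[pile] -= count
    return new_piles


# Moves from the game above, in order: AI, human, AI, human, ...
moves = [(2, 4), (2, 1), (3, 5), (1, 1), (0, 1), (3, 1), (1, 2), (3, 1)]

state = [1, 3, 5, 7]
history = [state]
for turn, move in enumerate(moves):
    state = apply_action(state, move)
    history.append(state)

# Whoever removes the last object loses; even-numbered turns are the AI's.
loser = "AI" if turn % 2 == 0 else "Human"
winner = "Human" if loser == "AI" else "AI"
```

The last move is the human's, which empties the final pile, so `winner` ends up as `"AI"`, matching the `Winner is AI` line in the transcript.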
