# Dots & Boxes

## Introduction

This project uses Q-learning, a reinforcement learning algorithm, to implement a game-playing agent that learns to play [3x3](https://www.wikihow.com/images/thumb/c/cb/Win-at-the-Dot-Game-Step-3.jpg/aid608874-v4-900px-Win-at-the-Dot-Game-Step-3.jpg) [Dots and Boxes](https://en.wikipedia.org/wiki/Dots_and_Boxes) optimally.

## 3x3 Dots & Boxes

Dots & Boxes is a two-player game.

The starting state is an empty grid of dots (16 dots for a 3x3 board). Players take turns making a move; a move consists of adding a horizontal or vertical line between two unjoined adjacent dots. If a move completes a 1x1 box, the player who made it wins that box (essentially, gets a point) and also retains their turn. The game ends when no moves are left. The player with the most points wins the game.

Deciding how to store and represent the game state is a bit tricky, since both the dots and the edges between them are part of it. Representing both is impractical, however: doing so requires either multiple lists or nested lists, neither of which is a suitable input to the neural network.

One can, however, observe that the dots are the same in every state. Hence, a game state can be represented solely by its edges. All edges in the game are stored as a list of length 24 (a 3x3 board has 24 edges), with 0 denoting that an edge has not been drawn and 1 denoting that it has.
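The counts quoted above follow from the grid dimensions: an n x n board has (n + 1)^2 dots and 2n(n + 1) edges. A minimal sketch of that arithmetic, where `board_counts` and `empty_state` are hypothetical helper names and not part of the project:

```python
def board_counts(n):
    """Return (dots, edges) for an n x n Dots and Boxes board."""
    dots = (n + 1) ** 2
    edges = 2 * n * (n + 1)  # n*(n+1) horizontal plus n*(n+1) vertical
    return dots, edges


def empty_state(n):
    """Return the starting state: one 0 per edge, meaning no edge is drawn."""
    _, edges = board_counts(n)
    return [0] * edges


print(board_counts(3))      # (16, 24)
print(len(empty_state(3)))  # 24
```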

The edge ordering used is:

**&#183;**&nbsp;&nbsp;&nbsp; 0 &nbsp;&nbsp;&nbsp;**&#183;**&nbsp;&nbsp;&nbsp; 1 &nbsp;&nbsp;&nbsp;**&#183;**&nbsp;&nbsp;&nbsp; 2 &nbsp;&nbsp;&nbsp;**&#183;**  
12 &nbsp;&nbsp;&nbsp;&nbsp; 13 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 14 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 15  
**&#183;**&nbsp;&nbsp;&nbsp; 3 &nbsp;&nbsp;&nbsp;**&#183;**&nbsp;&nbsp;&nbsp; 4 &nbsp;&nbsp;&nbsp;**&#183;**&nbsp;&nbsp;&nbsp; 5 &nbsp;&nbsp;&nbsp;**&#183;**  
16 &nbsp;&nbsp;&nbsp;&nbsp; 17 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 18 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 19  
**&#183;**&nbsp;&nbsp;&nbsp; 6 &nbsp;&nbsp;&nbsp;**&#183;**&nbsp;&nbsp;&nbsp; 7 &nbsp;&nbsp;&nbsp;**&#183;**&nbsp;&nbsp;&nbsp; 8 &nbsp;&nbsp;&nbsp;**&#183;**  
20 &nbsp;&nbsp;&nbsp;&nbsp; 21 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 22 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 23  
**&#183;**&nbsp;&nbsp;&nbsp; 9 &nbsp;&nbsp;&nbsp;**&#183;**&nbsp;&nbsp; 10 &nbsp;&nbsp;**&#183;**&nbsp;&nbsp; 11 &nbsp;&nbsp;**&#183;**  

The index of a value in this list represents the corresponding edge, e.g. if edge 0 exists in a state, then index 0 in the list has value 1, and vice versa.
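Given this ordering, the four edges around each box can be computed from its row and column, which is enough to detect when a move completes a box. The following is a sketch under that assumption; `box_edges` and `apply_move` are illustrative names, not functions from the project:

```python
N = 3            # boxes per side
H = N * (N + 1)  # number of horizontal edges; vertical edges start at index H


def box_edges(row, col):
    """Return the (top, bottom, left, right) edge indices of the box at (row, col)."""
    top = row * N + col
    bottom = (row + 1) * N + col
    left = H + row * (N + 1) + col
    right = left + 1
    return top, bottom, left, right


def apply_move(state, edge):
    """Draw `edge` in place and return how many boxes the move completed."""
    if state[edge] == 1:
        raise ValueError("edge {} is already drawn".format(edge))
    state[edge] = 1
    completed = 0
    for row in range(N):
        for col in range(N):
            edges = box_edges(row, col)
            # A box counts only if this move supplied its last missing edge
            if edge in edges and all(state[e] == 1 for e in edges):
                completed += 1
    return completed


state = [0] * 24
for e in (0, 12, 13):
    apply_move(state, e)
print(apply_move(state, 3))  # 1: edges 0, 3, 12 and 13 enclose the top-left box
```

A non-zero return value means the player scores that many points and, by the rules above, moves again.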

# A module containing simulation functions to train an AI agent

### Functions

1. `train_simulation(environment, train_agent, target_agent, n_games, update_step, test_agents=None, test_games=None)`:
   Runs a simulation to train an agent. If test agents are provided, tests the agent every 200 games and logs the results.

    * `environment`: game environment
    * `train_agent`: learning agent
    * `target_agent`: opponent agent
    * `n_games`: number of training games to run
    * `update_step`: number of games played between updates of the opponent model
    * `test_agents`: list of test agents
    * `test_games`: number of games to play against each test agent
    * `return`: nothing; progress is printed and test results are appended to the log file

2. `output_comparison(training_games=10000, test_games=1000)`:
   Runs a comparison between 2 agents that are identical aside from the output activation function.

    * `training_games`: number of training games to play
    * `test_games`: number of test games to play
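To see how `n_games` and `update_step` interact with testing, the sketch below lists the game numbers at which each event fires. It assumes the 200-game test interval hard-coded in `train_simulation`; `schedule` and its `test_interval` and `game_start` parameters are hypothetical and exist only for illustration:

```python
def schedule(n_games, update_step, test_interval=200, game_start=1):
    """Return (test_points, update_points): game numbers that trigger each event."""
    games = range(game_start, n_games + 1)
    test_points = [g for g in games if g % test_interval == 0]
    update_points = [g for g in games if g % update_step == 0]
    return test_points, update_points


tests, updates = schedule(n_games=1000, update_step=500)
print(tests)    # [200, 400, 600, 800, 1000]
print(updates)  # [500, 1000]
```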

from ai_agents import DQNLearner
from naive_players import SimplePlayer
from simulation_utils import *
import os


def train_simulation(environment, train_agent, target_agent, n_games, update_step, test_agents=None, test_games=None):
    
    # Create a test environment
    if test_agents is not None:
        test_env = clone(environment)

    # Set the players to the environment
    environment.player1 = train_agent
    environment.player2 = target_agent

    # Get the directory to save/load models and log information
    model_dir = model_path(environment.size)
    log_file = log_path(environment.size)

    # For Debugging
    print ("Model Directory is: {}".format(model_dir))
    print ("Log file is: {}".format(log_file))

    # Start log file if it doesn't exist, otherwise load from last game
    if not os.path.exists(log_file) or os.stat(log_file).st_size == 0:
        with open(log_file,'a') as file:
            game_start = 1
            file.write('Game Number,Test Agent,Win Percentage,Draw Percentage,Loss Percentage\n')
    else:
        last_line = recent_game(log_file)
        if last_line != "Game Number":
            game_start = int(last_line) + 1
        else:
            game_start = 1

    train_agent.initialize_network()
    target_agent.initialize_network()

    # Load previous model if it exists
    try:
        train_agent.load_model(model_dir + '-' + str(game_start - 1))
        target_agent.load_model(model_dir + '-' + str(game_start - 1))
        print("Load Succeeded")
    except Exception:
        print("Attempted load and failed")

    # Debugging
    print ("Starting at game {}".format(game_start))

    # Begin training games
    for game_number in range(game_start, n_games + 1):

        # Switch who goes first every other round
        environment.player1 = train_agent
        environment.player2 = target_agent
        if game_number % 2 == 0:
            switch_players(environment)

        environment.play()

        # Test the agent and write to the log every 200 games
        if game_number % 200 == 0 and test_agents:
            print("Game {} Test Results".format(game_number))
            with open(log_file, 'a') as file:
                for agent in test_agents:
                    win_percentage, draw_percentage, loss_percentage = test(test_env, train_agent, agent, test_games)
                    file.write('{},{},{},{},{}\n'.format(game_number, agent, win_percentage, draw_percentage, loss_percentage))
                    print()

        
        # Every update_step games, copy the latest model to the target agent
        if game_number % update_step == 0:

            # Give the target agent the most recent model
            print("Saving current model")
            path = train_agent.save_model(model_dir, global_step=game_number)

            # Load model into target
            print ("Loading model into target")
            target_agent.load_model(path)
            print ()

    print ("Finished!")


def output_comparison(training_games=10000, test_games=1000):

    training_env = DotsAndBoxes(3)
    training_env2 = clone(training_env)
    test_env = clone(training_env)

    tanh_player = DQNLearner('tanh output', alpha=1e-6, gamma=0.6)
    linout_player = DQNLearner('linear output', alpha=1e-6, gamma=0.6)
    training_opponent = Player('training opponent')
    training_opponent2 = Player('training opponent 2')

    training_env.player1 = tanh_player
    training_env.player2 = training_opponent
    tanh_player.initialize_network(output='tanh')

    training_env2.player1 = linout_player
    training_env2.player2 = training_opponent2
    linout_player.initialize_network(output='linear')

    test_random = Player('Random')
    test_moderate = SimplePlayer('Moderate', level=1)
    test_advanced = SimplePlayer('Advanced', level=2)

    log_file = os.path.join('.', 'Analysis', 'output_comparison.txt')
    with open(log_file, 'w') as file:
        file.write('Learning Agent,Test Agent,Win %, Draw %, Loss %\n')

    for game_number in range(1, training_games+1):

        # Switch who goes first every other round
        training_env.player1 = tanh_player
        training_env.player2 = training_opponent

        training_env2.player1 = linout_player
        training_env2.player2 = training_opponent2

        # Switch starting positions
        if game_number % 2 == 0:
            switch_players(training_env)
            switch_players(training_env2)

        training_env.play()
        training_env2.play()

        if game_number % max(1, training_games // 20) == 0:
            print("Running Tests at game {}".format(game_number))
            for test_agent in (test_random, test_moderate, test_advanced):
                for player in (tanh_player, linout_player):
                    print("Testing player: {}".format(player))
                    wins, draws, loss = test(test_env, player, test_agent, test_games)
                    with open(log_file, 'a') as file:
                        file.write('{},{},{},{},{}\n'.format(player, test_agent, wins, draws, loss))
                    print()

    print ("Training Completed!")

if __name__ == '__main__':
    game_size = 3
    train_agent = DQNLearner('train',alpha=1e-6,gamma=0.6)

    target_agent = DQNLearner('target')
    target_agent.learning = False

    # Load the testing agents
    test_agent1 = Player(name='random_player')
    test_agent2 = SimplePlayer(name='moderate_player', level=1)
    test_agent3 = SimplePlayer(name='advanced_player', level=2)
    
    env = DotsAndBoxes(game_size)
    n_games = 10000
    update_step = 200
    test_games = 1000
    
    train_simulation(env, train_agent, target_agent,
                                    n_games, update_step,
                                    [test_agent1,test_agent2,test_agent3], test_games)
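The resume logic in `train_simulation` relies on `recent_game` from `simulation_utils`, which is not shown here. From the way its result is compared with `"Game Number"` and passed to `int`, it appears to return the first field of the last line of the log. The sketch below makes that assumption explicit; `last_logged_game` and `next_game_number` are hypothetical names, and the sample data row is only illustrative:

```python
import csv
import os
import tempfile


def last_logged_game(log_file):
    """Return the first field of the last non-empty CSV row, or None if there is none."""
    last = None
    with open(log_file, newline='') as f:
        for row in csv.reader(f):
            if row:
                last = row[0]
    return last


def next_game_number(log_file):
    """Return the game number at which training should resume."""
    if not os.path.exists(log_file) or os.stat(log_file).st_size == 0:
        return 1
    first_field = last_logged_game(log_file)
    if first_field is None or first_field == "Game Number":
        return 1  # only the header has been written so far
    return int(first_field) + 1


# Usage with a throwaway log file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "logs.txt")
    with open(path, "w") as f:
        f.write("Game Number,Test Agent,Win Percentage,Draw Percentage,Loss Percentage\n")
        f.write("200,random_player,60.8,0.0,39.2\n")
    print(next_game_number(path))  # 201
```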
    


Model Directory is: .\models\size3\
Log file is: .\models\size3\logs.txt
Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Attempted load and failed
Starting at game 1
Game 200 Test Results
Current win percentage over agent random_player: 60.80%
Current draw percentage over agent random_player: 0.00%
Current loss percentage over agent random_player: 39.20%

Current win percentage over agent moderate_player: 0.10%
Current draw percentage over agent moderate_player: 0.00%
Current loss percentage over agent moderate_player: 99.90%

Current win percentage over agent advanced_player: 0.50%
Current draw percentage over agent advanced_player: 0.00%
Current loss percentage over agent advanced_player: 99.50%

Saving current model
Loading model into target
INFO:tensorflow:Restoring parameters from .\models\size3\-200

Game 400 Test Results
Current win percentage over agent random_player: 


Game 2400 Test Results
Current win percentage over agent random_player: 76.40%
Current draw percentage over agent random_player: 0.00%
Current loss percentage over agent random_player: 23.60%

Current win percentage over agent moderate_player: 1.10%
Current draw percentage over agent moderate_player: 0.00%
Current loss percentage over agent moderate_player: 98.90%

Current win percentage over agent advanced_player: 0.60%
Current draw percentage over agent advanced_player: 0.00%
Current loss percentage over agent advanced_player: 99.40%

Saving current model
Loading model into target
INFO:tensorflow:Restoring parameters from .\models\size3\-2400

Game 2600 Test Results
Current win percentage over agent random_player: 72.40%
Current draw percentage over agent random_player: 0.00%
Current loss percentage over agent random_player: 27.60%

Current win percentage over agent moderate_player: 1.00%
Current draw percentage over agent moderate_player: 0.00%
Current loss percentage over agent mo

Current win percentage over agent advanced_player: 1.50%
Current draw percentage over agent advanced_player: 0.00%
Current loss percentage over agent advanced_player: 98.50%

Saving current model
Loading model into target
INFO:tensorflow:Restoring parameters from .\models\size3\-4800

Game 5000 Test Results
Current win percentage over agent random_player: 81.30%
Current draw percentage over agent random_player: 0.00%
Current loss percentage over agent random_player: 18.70%

Current win percentage over agent moderate_player: 1.00%
Current draw percentage over agent moderate_player: 0.00%
Current loss percentage over agent moderate_player: 99.00%

Current win percentage over agent advanced_player: 0.70%
Current draw percentage over agent advanced_player: 0.00%
Current loss percentage over agent advanced_player: 99.30%

Saving current model
Loading model into target
INFO:tensorflow:Restoring parameters from .\models\size3\-5000

Game 5200 Test Results
Current win percentage over agent ran

Current win percentage over agent moderate_player: 3.20%
Current draw percentage over agent moderate_player: 0.00%
Current loss percentage over agent moderate_player: 96.80%

Current win percentage over agent advanced_player: 0.40%
Current draw percentage over agent advanced_player: 0.00%
Current loss percentage over agent advanced_player: 99.60%

Saving current model
Loading model into target
INFO:tensorflow:Restoring parameters from .\models\size3\-7400

Game 7600 Test Results
Current win percentage over agent random_player: 92.60%
Current draw percentage over agent random_player: 0.00%
Current loss percentage over agent random_player: 7.40%

Current win percentage over agent moderate_player: 3.20%
Current draw percentage over agent moderate_player: 0.00%
Current loss percentage over agent moderate_player: 96.80%

Current win percentage over agent advanced_player: 0.80%
Current draw percentage over agent advanced_player: 0.00%
Current loss percentage over agent advanced_player: 99.20%


Game 10000 Test Results
Current win percentage over agent random_player: 94.50%
Current draw percentage over agent random_player: 0.00%
Current loss percentage over agent random_player: 5.50%

Current win percentage over agent moderate_player: 3.70%
Current draw percentage over agent moderate_player: 0.00%
Current loss percentage over agent moderate_player: 96.30%

Current win percentage over agent advanced_player: 0.90%
Current draw percentage over agent advanced_player: 0.00%
Current loss percentage over agent advanced_player: 99.10%

Saving current model
Loading model into target
INFO:tensorflow:Restoring parameters from .\models\size3\-10000

Finished!
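The printed results can be turned into numbers for plotting or comparison. This is a minimal sketch that assumes the console format shown above; `parse_results` is a hypothetical helper, not part of the project. Lines that are cut off do not match the pattern and are skipped, and the game number attached to each result is only reliable when the output is complete.

```python
import re

HEADER = re.compile(r"Game (\d+) Test Results")
RESULT = re.compile(r"Current (win|draw|loss) percentage over agent (\w+): ([\d.]+)%")


def parse_results(text):
    """Map (game_number, agent) to a dict of win/draw/loss percentages."""
    results = {}
    game = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            game = int(header.group(1))
            continue
        match = RESULT.match(line)
        if match and game is not None:
            outcome, agent, value = match.groups()
            results.setdefault((game, agent), {})[outcome] = float(value)
    return results


sample = """Game 10000 Test Results
Current win percentage over agent random_player: 94.50%
Current draw percentage over agent random_player: 0.00%
Current loss percentage over agent random_player: 5.50%
"""
print(parse_results(sample)[(10000, "random_player")])
# {'win': 94.5, 'draw': 0.0, 'loss': 5.5}
```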
