# Pyrat Deep Learning Processing

## Setup Environment

Required libraries for Data Preprocessing

In [3]:
# Import libraries

import os
import numpy as np
import tqdm
import ast
import scipy
import scipy.sparse
import sys
import pickle

Required libraries for Network Training

In [160]:
# Import libraries

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import inspect

Required libraries for Visualization

In [55]:
import matplotlib.pyplot as plt
%matplotlib inline

Define the path to the saved PyRat games. If you change the location of your PyRat repository, update this path!

In [4]:
# Set your path to the saves folder here
# Backslashes are doubled so that "\P" and "\s" are not read as (invalid) escape sequences
directory = 'D:\\PyRat-1\\saves\\'

Set the name for the pickled file containing the Pyrat dataset preprocessed for supervised learning.

In [148]:
dataset_name = "pyrat_dataset.pkl"

## Creating Pyrat Games

If you have not done so already, you need the latest version of PyRat. To obtain it, clone the [official PyRat repository](https://github.com/BastienPasdeloup/PyRat-1). 

Note: PyRat requires pygame. To install it, open a terminal and run:

<pre>pip install pygame</pre> 

In the context of the AI course, we are going to simplify the rules of PyRat a bit.
In fact, we are going to remove all walls and mud penalties. Also, we are not going to consider symmetric mazes anymore.

As such, a default game is launched with the following parameters. Please try it now (note that you may have to type python3 instead of python):

<pre>python pyrat.py -p 40 -md 0 -d 0 --nonsymmetric</pre>

An empty labyrinth will appear.

Please check out all the options offered by the PyRat software by running:

<pre>python pyrat.py -h</pre>

Importantly, there are options to change the size of the map and the number of pieces of cheese, which will be very useful later to benchmark your own solutions.

In the supervised and unsupervised projects, we are going to look at plays between two greedy algorithms. PyRat can easily generate 1000 such games and save the corresponding data.

Open another terminal to launch the next command line. Generating 1000 games will take a few minutes.

<pre>python pyrat.py --width 21 --height 15 -p 40 -md 0 -d 0 --nonsymmetric --rat AIs/manh.py --python AIs/manh.py --tests 1000 --nodrawing --synchronous --save</pre>

The 1000 generated games will be in the "saves" folder. Each time you execute the command, new games are added to the saves folder. You have to manually delete the old games if you do not want to use them (for example, if you change the size of the labyrinth or if you want to train your AI on new games).

As a bonus, you can run the following command to watch a visual simulation in which both players are controlled by the greedy AI, which helps to understand PyRat:

<pre>python pyrat.py --width 20 --height 20 -p 40 -md 0 -d 0 --nonsymmetric --rat AIs/manh.py --python AIs/manh.py</pre>

## Preprocessing Tools

### Constant Definitions

Preprocessing Constant Definitions

In [5]:
PHRASES = {
    "# Random seed\n": "seed",
    "# MazeMap\n": "maze",
    "# Pieces of cheese\n": "pieces"    ,
    "# Rat initial location\n": "rat"    ,
    "# Python initial location\n": "python"   , 
    "rat_location then python_location then pieces_of_cheese then rat_decision then python_decision\n": "play"
}
 
MOVE_DOWN = 'D'
MOVE_LEFT = 'L'
MOVE_RIGHT = 'R'
MOVE_UP = 'U'
 
translate_action = {
    MOVE_LEFT:0,
    MOVE_RIGHT:1,
    MOVE_UP:2,
    MOVE_DOWN:3
}

### Function Definitions

**Define a function** to process a Pyrat game file.

In [6]:
def process_file(filename):
    """
    This function receives a PyRat save file and returns its parameters.
    """
    params = dict(play=list())
    # The context manager closes the file even if parsing fails
    with open(filename, "r") as f:
        info = f.readline()
        # readline() returns an empty string (never None) at end of file
        while info:
            if info.startswith("{"):
                # A line starting with "{" holds the end-of-game dictionary
                params["end"] = ast.literal_eval(info)
                break
            if "turn " in info:
                # Drop whatever precedes 'rat_location' so the line matches the "play" phrase
                info = info[info.find('rat_location'):]
            if info in PHRASES:
                param = PHRASES[info]
                if param == "play":
                    # A play is stored on the five lines that follow the header
                    rat = ast.literal_eval(f.readline())
                    python = ast.literal_eval(f.readline())
                    pieces = ast.literal_eval(f.readline())
                    rat_decision = f.readline().replace("\n","")
                    python_decision = f.readline().replace("\n","")
                    play_dict = dict(
                        rat=rat,python=python,piecesOfCheese=pieces,
                        rat_decision=rat_decision,python_decision=python_decision)
                    params[param].append(play_dict)
                else:
                    params[param] = ast.literal_eval(f.readline())
            else:
                print("did not understand:", info)
                break
            info = f.readline()
    return params
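The parser relies on `ast.literal_eval` to turn each text line of a save file into a Python object. A minimal sketch of that step follows; the three lines below are made up for illustration and are not copied from a real save file.

```python
import ast

# Hypothetical lines, shaped like what process_file reads with f.readline()
location_line = "(7, 7)\n"
pieces_line = "[(3, 7), (6, 3), (4, 5)]\n"
decision_line = "D\n"

# literal_eval only evaluates Python literals (tuples, lists, dicts, numbers, strings)
location = ast.literal_eval(location_line)
pieces = ast.literal_eval(pieces_line)

# Decisions are plain letters, not literals, so they are kept as stripped strings
decision = decision_line.replace("\n", "")

print(location, pieces, decision)
```

Unlike `eval`, `ast.literal_eval` refuses anything that is not a literal, which makes it a safer choice for reading data files.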

Process a sample game to **understand** its contents

In [149]:
sample_game = process_file("sample_game8x8")
sample_game

{'play': [{'rat': (0, 0),
   'python': (7, 7),
   'piecesOfCheese': [(3, 7),
    (6, 3),
    (4, 5),
    (2, 3),
    (3, 2),
    (1, 4),
    (2, 4),
    (4, 0),
    (5, 3),
    (6, 6),
    (5, 5),
    (3, 5),
    (1, 5),
    (0, 4),
    (1, 1),
    (0, 2),
    (0, 5),
    (6, 0),
    (5, 6),
    (3, 6),
    (7, 2),
    (7, 3),
    (2, 7),
    (0, 7),
    (5, 2),
    (1, 3),
    (7, 4),
    (4, 7),
    (7, 6),
    (4, 2),
    (4, 1),
    (5, 0),
    (6, 5),
    (7, 5),
    (4, 3),
    (1, 2),
    (2, 5),
    (5, 1),
    (6, 2),
    (3, 0)],
   'rat_decision': 'R',
   'python_decision': 'D'},
  {'rat': (1, 0),
   'python': (7, 6),
   'piecesOfCheese': [(3, 7),
    (6, 3),
    (4, 5),
    (2, 3),
    (3, 2),
    (1, 4),
    (2, 4),
    (4, 0),
    (5, 3),
    (6, 6),
    (5, 5),
    (3, 5),
    (1, 5),
    (0, 4),
    (1, 1),
    (0, 2),
    (0, 5),
    (6, 0),
    (5, 6),
    (3, 6),
    (7, 2),
    (7, 3),
    (2, 7),
    (0, 7),
    (5, 2),
    (1, 3),
    (7, 4),
    (4, 7),
    (4, 2

**Understand** how to interpret the size of a maze from a saved game file.

In [72]:
# The key 'rat' from a saved game file data dictionary contains the initial coordinates of the rat
# The rat always starts at bottom left corner (0,0)
print( f'Rat initial position {sample_game["rat"]}' )

# The key 'python' from a saved game file data dictionary contains the initial coordinates of the python
# The python starts at the top right corner (mazeWidth-1,mazeHeight-1)
print( f'Python initial position {sample_game["python"]}' )

# Get the maze size from a single saved game file
sampleWidth =  sample_game["python"][0] + 1
sampleHeight = sample_game["python"][1] + 1

print(f'Maze size: {sampleWidth, sampleHeight}')

Rat initial position (0, 0)
Python initial position (7, 7)
Maze size: (8, 8)


**Define a function** to create a canvas.

In [144]:
"""
    The goal of this function is to create a canvas, which will be the vector used to train a classifier. 
    As we want to predict a next move, we will create a canvas that is centered on the player, so that we create a translation invariance.
"""

def convert_input(player, maze, opponent, mazeHeight, mazeWidth, piecesOfCheese):
    
    # The canvas is almost twice the size of the maze (2*size-1 cells along each axis), so that
    # every cell of the maze fits on the canvas wherever the player stands.
    # The canvas is initialized as a numpy tensor with 3 dimensions, the third one corresponding to "layers" of the canvas. 
    # Here, we just use one layer, but you can define other ones to put more information on the play (e.g. the location of the opponent could be put in a second layer).
    
    im_size = (2*mazeWidth-1, 2*mazeHeight-1, 1)

    # We initialize a canvas with only zeros.
    canvas = np.zeros(im_size)

    # Coordinates of the player in the original maze
    (x,y) = player
    
    # Center of the canvas, which represents the view of the centered player
    canvas_x_center = mazeWidth - 1
    canvas_y_center = mazeHeight - 1
        
    # Fill in the first layer of the canvas with the value 1 at the location of the cheeses, relative to the position of the player (i.e. the canvas is centered on the player location).
    for (x_cheese,y_cheese) in piecesOfCheese:
        canvas[ canvas_x_center+(x_cheese-x), canvas_y_center+(y_cheese-y), 0] = 1

    return canvas
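To see the centering at work, here is a small worked example on a toy 3x2 maze. The function `centered_canvas` is a hypothetical, compact restatement of `convert_input` (single cheese layer, no unused arguments), and the player and cheese positions are made up.

```python
import numpy as np

def centered_canvas(player, maze_width, maze_height, pieces_of_cheese):
    """Hypothetical compact restatement of convert_input (single cheese layer)."""
    canvas = np.zeros((2 * maze_width - 1, 2 * maze_height - 1, 1))
    x, y = player
    for x_cheese, y_cheese in pieces_of_cheese:
        # Offset of the cheese relative to the player, shifted to the canvas center
        canvas[maze_width - 1 + (x_cheese - x), maze_height - 1 + (y_cheese - y), 0] = 1
    return canvas

# Toy 3x2 maze (illustrative values, not taken from a saved game)
canvas = centered_canvas(player=(1, 0), maze_width=3, maze_height=2,
                         pieces_of_cheese=[(2, 1), (0, 0)])

print(canvas.shape)          # (5, 3, 1)
print(canvas[:, :, 0])
```

The player always sits at the center cell `(maze_width - 1, maze_height - 1)`, here `(2, 1)`. The cheese at `(2, 1)` is one step right and one step up, so it lands at `(3, 2)`; the cheese at `(0, 0)` is one step left, so it lands at `(1, 1)`. Offsets range from `-(size-1)` to `size-1` along each axis, which is why the canvas needs `2*size-1` cells.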

Visualize more clearly what a single play contains to **understand** the next preprocessing functions.

In [106]:
# Print the first play of the sample game
sample_game["play"][0]

{'rat': (0, 0),
 'python': (7, 7),
 'piecesOfCheese': [(3, 7),
  (6, 3),
  (4, 5),
  (2, 3),
  (3, 2),
  (1, 4),
  (2, 4),
  (4, 0),
  (5, 3),
  (6, 6),
  (5, 5),
  (3, 5),
  (1, 5),
  (0, 4),
  (1, 1),
  (0, 2),
  (0, 5),
  (6, 0),
  (5, 6),
  (3, 6),
  (7, 2),
  (7, 3),
  (2, 7),
  (0, 7),
  (5, 2),
  (1, 3),
  (7, 4),
  (4, 7),
  (7, 6),
  (4, 2),
  (4, 1),
  (5, 0),
  (6, 5),
  (7, 5),
  (4, 3),
  (1, 2),
  (2, 5),
  (5, 1),
  (6, 2),
  (3, 0)],
 'rat_decision': 'R',
 'python_decision': 'D'}

**Define a function** to vectorize a Pyrat game data dictionary.

In [107]:
"""
    This function vectorizes a Pyrat game data dictionary.
"""
def dict_to_x_y(end, rat, python, maze, piecesOfCheese, rat_decision, python_decision,
                    mazeWidth, mazeHeight):
        # We only use the winner
        if end["win_python"] == 1: 
            player = python
            opponent = rat        
            decision = python_decision
        elif end["win_rat"] == 1:
            player = rat
            opponent = python        
            decision = rat_decision
        else:
            return False
        if decision == "None" or decision == "": #No play
            return False
        x_1 = convert_input(player, maze, opponent, mazeHeight, mazeWidth, piecesOfCheese)
        y = np.zeros((1,4),dtype=np.int8)
        y[0][translate_action[decision]] = 1
        return x_1,y

**Understand** the vectorized version of the sample game data dictionary.

In [172]:
# Create a vectorized version of the sample game data dictionary
sample_x_y = dict_to_x_y(**(sample_game["play"][0]),
                         maze = sample_game["maze"],
                         end = sample_game["end"],
                         mazeWidth = sampleWidth,
                         mazeHeight = sampleHeight)

# Ensure that the created canvas matches the formula ( 2*mazeWidth-1, 2*mazeHeight-1, 1 )
print(f'Maze size:   {sampleWidth, sampleHeight}')
print(f'Canvas size: {(sample_x_y[0]).shape}')

Maze size:   (8, 8)
Canvas size: (15, 15, 1)


In [173]:
# Check who is the winner
print("This game's winner is...")
print(f"Rat: {sample_game['end']['win_rat']}")
print(f"Python: {sample_game['end']['win_python']}\n")

# Print the canvas (only first layer, so index-0 of 3rd dimension)
print(sample_x_y[0][:,:,0])

This game's winner is...
Rat: 0
Python: 1

[[0. 0. 1. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 1. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 1. 1. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]


In [146]:
# Print the player's choice
sample_x_y[1]

array([[0, 0, 0, 1]], dtype=int8)
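The one-hot vector can be decoded back into a move letter by inverting `translate_action`. In this sketch, `index_to_action` is a hypothetical helper name, and the mapping is restated with literal letters so the snippet runs on its own.

```python
import numpy as np

# Same mapping as translate_action above
translate_action = {"L": 0, "R": 1, "U": 2, "D": 3}
# Inverse mapping (index -> letter); this helper name is hypothetical
index_to_action = {index: letter for letter, index in translate_action.items()}

y = np.array([[0, 0, 0, 1]], dtype=np.int8)   # the vector printed above
decoded = index_to_action[int(np.argmax(y, axis=1)[0])]
print(decoded)   # D
```

The result is consistent with the sample game: the python won, and its first decision was `'D'`.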

### Main Preprocessing

The following code parses the saves directory to generate a database file called 'pyrat_dataset.pkl'.
It is taken exactly as it is presented in the generate_dataset.py file for supervised learning.

The saved pickled file contains a sparse matrix representation of both the canvas and the decision vector since this data contains mostly zero-valued elements and sparse matrices are more memory-efficient representations of a matrix than multidimensional arrays.

For a good introduction to sparse matrices, refer to this [link](https://machinelearningmastery.com/sparse-matrices-for-machine-learning/#:~:text=Matrices%20that%20contain%20mostly%20zero,non%2Dzero%2C%20called%20dense).
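To make the memory argument concrete, the sketch below compares a dense and a CSR representation of one flattened canvas. The canvas is synthetic: it assumes a 20x20 maze (39*39 cells) and places 40 ones at random positions, which only mimics the shape of the real data.

```python
import numpy as np
import scipy.sparse

# Illustrative canvas for a 20x20 maze: 39*39 = 1521 cells, flattened to one row
rng = np.random.default_rng(0)
dense_row = np.zeros((1, 39 * 39))
cheese_cells = rng.choice(dense_row.shape[1], size=40, replace=False)  # 40 random cells (made up)
dense_row[0, cheese_cells] = 1

sparse_row = scipy.sparse.csr_matrix(dense_row)

# CSR stores only the non-zero values plus two index arrays
sparse_bytes = sparse_row.data.nbytes + sparse_row.indices.nbytes + sparse_row.indptr.nbytes
print("non-zero entries:", sparse_row.nnz)
print("dense bytes: ", dense_row.nbytes)
print("sparse bytes:", sparse_bytes)
```

The saving grows with the number of stored plays, since each play contributes one such row.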

In [145]:
games = list()

for root, dirs, files in os.walk(directory):
    for filename in tqdm.tqdm(files):
        if filename.startswith("."):
            continue
        game_params = process_file(directory+filename)
        games.append(game_params)
        
# Check if all games are on mazes of same dimension
mazeWidth=games[0]["python"][0] + 1
mazeHeight=games[0]["python"][1] + 1
print(mazeWidth, mazeHeight)
for game in games :
    if game["python"][0] + 1 != mazeWidth or game["python"][1] + 1 != mazeHeight :
        print("Saves directory contains games of various dimensions")
        exit()
        
x_1_train = list()
y_train = list()
wins_python = 0
wins_rat = 0
        
for game in tqdm.tqdm(games):
    if game["end"]["win_python"] == 1: 
        wins_python += 1
    elif game["end"]["win_rat"] == 1:
        wins_rat += 1
    else:
        continue
    plays = game["play"]
    for play in plays:
        x_y = dict_to_x_y(**play,maze=game["maze"],end=game["end"], mazeWidth=mazeWidth, mazeHeight=mazeHeight)
        if x_y:
            x1, y = x_y
            y_train.append(scipy.sparse.csr_matrix(y.reshape(1,-1)))
            x_1_train.append(scipy.sparse.csr_matrix(x1.reshape(1,-1)))

print("Greedy/Draw/Random Greedy, {}/{}/{}".format(wins_rat,1000 - wins_python - wins_rat, wins_python)) 

pickle.dump([x_1_train,y_train, mazeWidth, mazeHeight], open("pyrat_dataset.pkl","wb"))

100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:40<00:00, 24.55it/s]


20 20


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:18<00:00, 53.40it/s]


Greedy/Draw/Random Greedy, 435/113/452


## Train the Network

### Prepare the Network's Inputs

Load the pickled dataset containing the play canvases (x) and the associated decision vectors (y), both stored as SciPy sparse matrices.

In [215]:
### This line reloads the pyrat_dataset that was stored as a pkl file by the generate dataset script. 
x, y, mazeWidth, mazeHeight = pickle.load(open(dataset_name,"rb"))

As the dataset was stored as SciPy sparse matrices to save space, we first stack the rows back into dense matrices (the conversion to torch tensors follows further below). 
Note that you could keep the sparse representation if you work with a machine learning method that accepts sparse arrays.

In [216]:
x = scipy.sparse.vstack(x).todense()
y = scipy.sparse.vstack(y).todense()
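As a small illustration of this step, the sketch below stacks a few made-up sparse rows and converts them to a dense form. It also shows `toarray()`, a possible alternative to `todense()` that returns a plain `numpy.ndarray` instead of a `numpy.matrix`; the notebook itself uses `todense()`.

```python
import numpy as np
import scipy.sparse

# Three made-up 1x6 sparse rows, standing in for the flattened canvases in x
rows = [scipy.sparse.csr_matrix(np.array([[0, 1, 0, 0, 0, 1]])),
        scipy.sparse.csr_matrix(np.array([[1, 0, 0, 0, 0, 0]])),
        scipy.sparse.csr_matrix(np.array([[0, 0, 0, 1, 0, 0]]))]

stacked = scipy.sparse.vstack(rows)      # sparse matrix of shape (3, 6)

as_matrix = stacked.todense()            # numpy.matrix, as in the cell above
as_array = stacked.toarray()             # plain numpy.ndarray with the same values

print(stacked.shape, type(as_matrix).__name__, type(as_array).__name__)
```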



In [209]:
y[0:30]

matrix([[1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 1, 0]], dtype=int8)

In [210]:
# For reshape a single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in input.
x = torch.FloatTensor(x).reshape(-1,(2*mazeHeight-1)*(2*mazeWidth-1))  # (number of moves, size of the canvas)
y = torch.argmax(torch.FloatTensor(y), dim=1)  # (number of moves,)

# This is the number of features contained in a canvas (canvasWidth * canvasHeight)
canvas_size = x.shape[1]

print(f'Number of features/maze-cells contained in a canvas (canvasWidth * canvasHeight): {canvas_size}')
print(x.shape, y.shape)

Number of features/maze-cells contained in a canvas (canvasWidth * canvasHeight): 1521
torch.Size([67359, 1521]) torch.Size([67359])


In [211]:
y[0:30]

tensor([0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 3, 0, 0, 0, 0,
        2, 0, 0, 2, 2, 2])

Split the data into a training set and a test set (a simple 80/20 hold-out split).

In [167]:
## Split your data into x_train, x_test, y_train, y_test.
n = int(x.shape[0] * 80/100)  # number of examples in the train set
x_train = x[:n]
x_test  = x[n:]
y_train = y[:n]
y_test  = y[n:]

Now you have to train a classifier using supervised learning and evaluate its performance.

To begin with, define a fully connected neural network. In PyTorch, this corresponds to stacking layers of type "Linear" with a non-linear function between them. The example below uses four "Linear" layers (an input layer, two hidden layers, and an output layer) with ReLU activations.

You need to make sure that the input size of the first layer corresponds to the width of your X vector. 
Feel free to try a different number of layers and other non-linear functions.

In [174]:
class Net(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(Net, self).__init__()
        self.input_layer = nn.Linear(input_dim, hidden_dim)
        self.hidden_layer1 = nn.Linear(hidden_dim, hidden_dim)
        self.hidden_layer2 = nn.Linear(hidden_dim, hidden_dim)
        self.output_layer = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        x = torch.relu(self.input_layer(x))
        x = torch.relu(self.hidden_layer1(x))
        x = torch.relu(self.hidden_layer2(x))
        x = self.output_layer(x)
        return x
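A minimal sketch of one training step for such a network is given below. Several choices here are assumptions that this section does not specify: the hidden size of 128, the Adam optimizer with a learning rate of 1e-3, the `nn.CrossEntropyLoss` criterion, and the random stand-in batch. The `nn.Sequential` model is a stand-in with the same layer layout as `Net`, kept inline so that the snippet runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

canvas_size, hidden_dim, n_actions = 1521, 128, 4   # hidden_dim is an arbitrary choice

# Stand-in with the same layer layout as Net above (kept inline so the sketch is self-contained)
model = nn.Sequential(
    nn.Linear(canvas_size, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, n_actions),
)

# Random stand-in batch: 32 flattened canvases and 32 class indices in {0, 1, 2, 3}
x_batch = torch.rand(32, canvas_size)
y_batch = torch.randint(0, n_actions, (32,))

criterion = nn.CrossEntropyLoss()          # expects raw logits and integer class indices
optimizer = optim.Adam(model.parameters(), lr=1e-3)

optimizer.zero_grad()
logits = model(x_batch)                    # shape (32, 4)
loss = criterion(logits, y_batch)
loss.backward()
optimizer.step()

accuracy = (logits.argmax(dim=1) == y_batch).float().mean().item()
print(logits.shape, loss.item(), accuracy)
```

With this criterion, the network outputs raw scores (no softmax in `forward`) and the targets are class indices, which matches the `torch.argmax` conversion applied to `y` earlier. In practice, the step would be repeated over mini-batches of `x_train` and `y_train`, and the accuracy measured on `x_test` and `y_test`.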