Introduction to Artificial Intelligence - Supervised Learning Lab Session, Part b
--
At the end of this session, you will be able to:
- Use the basics of PyTorch, covered in this [tutorial](Pytorch_tutorial.ipynb) (this can be done in parallel with the previous step if you work in groups of two)
- Apply supervised learning to PyRat datasets and train a classifier to predict the next movement to play.

In [2]:
# The tqdm package is useful to visualize progress with long computations. 
# Install it using pip. 
import tqdm
import numpy as np
import ast
import os
import sys
import random
import inspect


Pytorch tutorial
--
Go [here](Lab4a_Pytorch_tutorial.ipynb) and complete the PyTorch tutorial before moving on to part b (this notebook).

Playing PyRat using Machine Learning by training a classifier to predict the next movement to play (or: Supervised Baseline for the PyRat Challenge)
--

Note that we now take a step further with respect to Lab_1a and try to predict the next movement to play given a maze configuration. We therefore need to generate a new training dataset (X=canvas, y=next movement) from PyRat games to train a model. In particular, we will train a deep neural network. You will have to define a model in PyTorch, so complete the PyTorch tutorial first.

The canvas here represents the state of the game and corresponds to the vector that will be used to train the classifier. As we want to predict the next move, the canvas is twice the size of the maze and is centered on the player, which makes the representation invariant to translation: the same surroundings produce the same canvas wherever the player stands in the maze.
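
To make the centering concrete, here is a minimal sketch. It is not the actual `build_state` implementation: the helper name `centered_canvas`, the canvas shape `(2*width - 1, 2*height - 1)` and the encoding (1 for a cheese, 0 elsewhere) are assumptions made for illustration only.

```python
import numpy as np

def centered_canvas(width, height, player, cheeses):
    """Return a (2*width-1, 2*height-1) array centered on the player (hypothetical helper)."""
    canvas = np.zeros((2 * width - 1, 2 * height - 1), dtype=np.float32)
    center_x, center_y = width - 1, height - 1  # the player always sits on this cell
    px, py = player
    for cx, cy in cheeses:
        # Offset of the cheese relative to the player, shifted to the canvas center.
        canvas[center_x + cx - px, center_y + cy - py] = 1.0
    return canvas

# Same relative layout at two different absolute positions.
a = centered_canvas(5, 4, player=(0, 0), cheeses=[(1, 2)])
b = centered_canvas(5, 4, player=(3, 1), cheeses=[(4, 3)])
print(a.shape, np.array_equal(a, b))
```

Both calls place the cheese one cell to the right and two cells further along the second axis from the player, so the two canvases are identical even though the absolute positions differ.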

Have a look at the file `generate_SL_dataset.py`. It generates a dataset (`SupervisedLearning_experience.pt`) for training a classifier to predict the next move given a game configuration. The canvas (state of the game) is generated by the function `build_state` and is stored in memory together with the corresponding action at each turn of the game. `build_state` outputs a one-layer canvas, but you can define other layers to add more information about the game (e.g. the location of the opponent could be put in a second layer).


In [None]:
### Load the pyrat_dataset that was stored as a .pt file by the generate_SL_dataset.py script. 

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import pickle
import random

BATCH_SIZE=50
N_EPOCHS=20

## CELL TO BE COMPLETED ##
SL_dataset = "SupervisedLearning_experience.pt"  # path to Supervised Learning dataset (adjust to where the file was saved)
data=torch.load(SL_dataset)

#how many examples?


# an example of the canvas (corresponding to first game) is
x = data[0]["state"]
print(x.shape)
# the corresponding label (one-hot encoded 'action') 
y = data[0]["action"]
print(y)

# maze and canvas size


In [None]:
def make_batch (data, batch=BATCH_SIZE):

    """
        This function builds a batch from the dataset to train the model on.
        A batch is a pair (X, y), where each element has the batch size as first dimension.
        In:
            * data:  List of situations (dicts with "state" and "action") encountered across games.
            * batch: Maximum number of examples to sample (capped at len(data)).

        Out:
            * X: Batch of flattened canvases.
            * y: Targets (actions) associated with the sampled canvases.
    """

    # Get indices
    batch_size = min(batch, len(data))
    indices = random.sample(range(len(data)), batch_size)

    # Create the batch
    X = torch.zeros(batch_size, data[0]["state"].shape[0]*data[0]["state"].shape[1])
    y = torch.zeros(batch_size,dtype=torch.int64)
    
    for i in range(batch_size):
        
        # Data is the sampled state
        X[i] = torch.flatten(data[indices[i]]["state"])  # flatten canvas (input to model)   
        y[i] = data[indices[i]]["action"]
    
    return X,y

In [None]:
### Now you have to train a classifier using supervised learning and evaluate its performance.
### Let's try a neural network.

## Split your data into a training set (train_data) and a test set (test_data).

n = int(len(data) * 80/100)  # number of examples in the train set
train_data=data[:n]
test_data=data[n:]
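
The split above takes the first 80% of the examples in their stored order. If that order is not representative (for instance, if the dataset was generated with settings that changed over time), shuffling with a fixed seed before splitting is one option. The sketch below uses a hypothetical helper, `shuffled_split`, and toy data in place of the real dataset.

```python
import random

def shuffled_split(examples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the examples, then split it (hypothetical helper)."""
    shuffled = list(examples)              # copy, so the original order is preserved
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Toy stand-in for the real dataset (same keys, no real canvases).
toy_data = [{"state": None, "action": i % 5} for i in range(10)]
toy_train, toy_test = shuffled_split(toy_data)
print(len(toy_train), len(toy_test))  # 8 2
```

Note that shuffling at the level of individual turns puts turns from the same game in both sets, which can make the test accuracy look optimistic.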



In [None]:
## Define a neural network with two hidden layers. In PyTorch, this corresponds to adding only two more layers of type "Linear".
## Make sure that the input size of the first layer corresponds to the width of your X vector.

## CELL TO BE COMPLETED ##
class Net(nn.Module):
    def __init__(self, in_features):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(in_features, 512)
        ## ADD LAYERS HERE ##
       
    def forward(self, x):
        ## ADD FORWARD FUNCTIONS HERE ##
        
        return x
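
For reference, one possible shape for such a network is sketched below. It is only an illustration, not the expected solution: the class name `TwoHiddenLayerNet`, the `hidden` width and the `n_actions` parameter are assumptions, and `n_actions` must match the number of distinct action labels in your dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHiddenLayerNet(nn.Module):
    """Fully connected classifier with two hidden layers (illustrative sketch)."""

    def __init__(self, in_features, n_actions, hidden=512):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)  # input -> first hidden layer
        self.fc2 = nn.Linear(hidden, hidden)       # first -> second hidden layer
        self.fc3 = nn.Linear(hidden, n_actions)    # second hidden layer -> one score per action

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # Return raw scores (logits); nn.CrossEntropyLoss applies the softmax itself.
        return self.fc3(x)

model = TwoHiddenLayerNet(in_features=63, n_actions=5)
scores = model(torch.zeros(3, 63))
print(scores.shape)
```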



In [None]:
# instantiate your model, specifying in_features (size of a "flattened" input)

net=Net(train_data[0]['state'].shape[0]*train_data[0]['state'].shape[1])


In [None]:
## CELL TO BE COMPLETED ##
## Define a loss function and optimizer



In [None]:

## CELL TO BE COMPLETED ##
## Train the network
n_batch=len(train_data)//BATCH_SIZE  # batches are sampled from train_data only


for epoch in range(N_EPOCHS):
    running_loss = 0
    # get the inputs
    for b in range(n_batch):
        inputs,labels = make_batch(train_data)
        # ZERO THE PARAMETER GRADIENTS
       
        # FORWARD + BACKWARD + OPTIMIZE
        
        
        # statistics
        running_loss += loss.item()
    
    print('[%d] loss: %.3f' % (epoch + 1, running_loss/n_batch ))
    

print('Finished Training')
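
The three steps of the inner loop (zero the gradients, forward and backward pass, optimizer step) can be sketched on synthetic data as follows. Everything here is a stand-in: the toy network, the choice of Adam with a learning rate of 0.01, and the random inputs are assumptions for illustration, not the settings expected for the lab.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic stand-in for the dataset: 200 inputs of 20 features, 4 classes.
# Labels are derived from the inputs so that there is something to learn.
inputs_all = torch.randn(200, 20)
labels_all = inputs_all[:, :4].argmax(dim=1)

toy_net = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 4))
criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
optimizer = optim.Adam(toy_net.parameters(), lr=1e-2)

losses = []
for epoch in range(5):
    running_loss = 0.0
    for start in range(0, 200, 50):
        inputs = inputs_all[start:start + 50]
        labels = labels_all[start:start + 50]
        optimizer.zero_grad()                      # zero the parameter gradients
        loss = criterion(toy_net(inputs), labels)  # forward
        loss.backward()                            # backward
        optimizer.step()                           # optimize
        running_loss += loss.item()
    losses.append(running_loss / 4)
    print('[%d] loss: %.3f' % (epoch + 1, losses[-1]))
```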

In [None]:
## Check performances
## Training accuracy
correct = 0



with torch.no_grad():
    inputs,labels = make_batch(train_data, batch=len(train_data))
    outputs = net(inputs)
    _, predicted = torch.max(outputs.data, 1)
    correct += (predicted == labels).sum().item()

print('Training accuracy of the network: %d %%' % (100 * correct / len(train_data)))
        
## Test accuracy
correct = 0
total = len(test_data)
with torch.no_grad():
    inputs,labels = make_batch(test_data, batch=len(test_data))
    outputs = net(inputs)
    _, predicted = torch.max(outputs.data, 1)
    correct += (predicted == labels).sum().item()

print('Test accuracy of the network: %d %%' % (100 * correct / len(test_data)))



In [None]:
## Save the weights

torch.save(net.state_dict(), 'AI/trained_model_weights.pth')
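
Saved weights can later be restored into a model with the same architecture. The sketch below round-trips a toy layer through a temporary file; the file name `toy_weights.pth` and the toy layer are placeholders, not part of the lab code.

```python
import os
import tempfile
import torch
import torch.nn as nn

toy_net = nn.Linear(6, 4)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_weights.pth")
    torch.save(toy_net.state_dict(), path)

    restored = nn.Linear(6, 4)                  # must have the same architecture
    restored.load_state_dict(torch.load(path))  # copy the saved weights in
restored.eval()                                 # inference mode (matters for dropout, batch norm)

x = torch.randn(2, 6)
with torch.no_grad():
    same = torch.equal(toy_net(x), restored(x))
print(same)
```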

### Remarks on training a NN 

If the training accuracy is about 20%, the network predicts no better than chance (5 possible choices: North, South, East, West, Nothing).
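
The 20% figure assumes the five actions are equally frequent. When they are not, always predicting the most frequent action scores higher than that, so it is a more informative floor to compare against. A minimal sketch, using a hypothetical helper and toy labels:

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

toy_labels = [0, 0, 0, 1, 2, 3, 0, 1]  # label 0 appears 4 times out of 8
print(majority_baseline_accuracy(toy_labels))  # 0.5
```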

When you train a neural network, you have to analyze your results.

- If, after training, your training accuracy is far from 100%, your network is underfitting (high bias). Try training the network longer (more epochs) or changing the optimization settings (larger or smaller learning rate, batch size). Alternatively, define a bigger network (more hidden layers, larger `out_features`).
- If your test accuracy is far below your training accuracy, your network is overfitting (high variance). Try to regularize your optimization (look at L2 regularization, weight decay, dropout, early stopping...). Also try to use more data.
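
As a rough illustration of the regularization options named above, the sketch below combines a dropout layer, weight decay in the optimizer, and a simple early-stopping test. The layer sizes, the hyperparameter values and the helper `should_stop` are assumptions chosen for the example.

```python
import torch
import torch.nn as nn
import torch.optim as optim

regularized_net = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # randomly zeroes activations, in training mode only
    nn.Linear(64, 4),
)
# With SGD, weight_decay adds an L2 penalty on the weights to each update.
optimizer = optim.SGD(regularized_net.parameters(), lr=0.01, weight_decay=1e-4)

def should_stop(val_losses, patience=3):
    """Early stopping: True if the best loss is at least `patience` epochs old."""
    if len(val_losses) <= patience:
        return False
    best_epoch = val_losses.index(min(val_losses))
    return len(val_losses) - 1 - best_epoch >= patience

print(should_stop([1.0, 0.8, 0.7, 0.75, 0.72, 0.71]))
```

Remember to call `net.eval()` before measuring accuracy and `net.train()` before resuming training, since dropout behaves differently in the two modes.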


# Test in PyRat

Now, it's time to test if your AI is able to beat an opponent. Open the `supervised_pyrat_player.py` file, and update the `TRAINED_MODEL_PATH` constant to set the path to the classifier you want to use. Also complete the parts of the code marked with `#### TODO ####`.

Then run `supervised_pyrat_player.py` with the following command, changing the parameters as needed. Make sure you use the same settings (width, height, number of cheeses) as during training:

<br />`python supervised_pyrat_player.py`


# How well does the trained classifier play against the greedy player?

Now it's up to you to explore other possibilities to make a better player. A few starting points: 
- Change the "canvas" to add more information, such as the position of the other player. 
- Find smarter strategies to cross-validate training, in order to get a better estimate of generalization.
- Work on simpler versions of the problem (smaller maze, fewer cheeses, ...) to develop a better understanding of learning.
- Generate datasets using an algorithm other than the greedy (e.g., a variant that reliably beats the greedy).
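
As a starting point for the cross-validation idea, the sketch below generates index splits for k-fold cross-validation in plain Python. The helper `k_fold_indices` is hypothetical, and the folds are contiguous blocks of the dataset in its stored order.

```python
def k_fold_indices(n_examples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_examples))
        yield train, val
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(len(train_idx), val_idx)
```

Each example is used for validation exactly once; training one model per fold and averaging the validation accuracies gives a steadier estimate than a single train/test split.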


In [None]:
## CELL TO BE COMPLETED ##
# You can use the simulations.run_several_games function to test the performance of your trained model against a greedy or a random player

import sys     # These lines correct a bug occurring in Notebooks.
sys.argv=['']  # It's not perfect, but it works.

import os
lab_commons_path = os.path.join(os.getcwd(), "..", "..")
if lab_commons_path not in sys.path:
    sys.path.append(lab_commons_path)

import lab_commons.make_2_player_matches as simulations
import lab_commons.AI.greedy as greedy_player
import lab_commons.AI.random as random_player
import supervised_pyrat_player

program_1 = supervised_pyrat_player 
program_2 = greedy_player
program_3 = random_player



