# Neural Network Q Learning, a Tic Tac Toe player that learns - kind of

In the previous part we implemented a player that uses a table to learn the Q function. This worked quite well, in particular because Tic Tac Toe has very few states and each state has only a few possible moves. For a more complex game such as Go or chess, the number of states is far too large, and the tabular approach would not be feasible.

What if we could write a program that mimics the behaviour of the Q function without having to store the exact value for every state and action? After all, with a bit of practice, humans are able to spot certain patterns in a game position and learn general rules for how to react to them.

This is exactly the idea behind using a Neural Network. Neural Networks usually do one of two things:

1) Classify the input: E.g. if the input is a picture, the output could be what kind of animal is in the picture.
2) Mimic a complex function (also called regression): Given an input value for the complex function, correctly predict what the output would be. 

We will use the second one and teach a Neural Network to mimic the Q function, ideally using significantly less space than the tabular Q function does.

## Preparations

In order to execute the code in this notebook you will have to install TensorFlow. At the time of writing, the instructions to do so can be found [here](https://www.tensorflow.org/install/). If the link no longer works, a quick Google search should point you in the right direction.

When installing TensorFlow you will have to decide between two options: install with or without GPU support. If you do not have a modern GPU, the choice is simple: install without GPU support. If you do have a modern GPU, you can try to install the GPU version, but this is considerably more complex and difficult. Only do so if you are comfortable with complex installations and able to deal with the fallout if you get stuck at some stage and have to roll back what you have done so far. Should you succeed, however, the code in this notebook will run noticeably faster than without a GPU.

## A short introduction to Artificial Neural Networks

Artificial Neural Networks are made up of *Nodes*. A *Node* takes one or more inputs and combines them linearly, i.e. it multiplies each input $i_x$ by a dedicated weight $w_x$ for that input and adds them all up. It then applies an activation function $f_a$ to the result and sends it to one or more outputs $O$:

![Title](./Images/NN_Node.png)
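In formula form, a single node computes $O = f_a\left(\sum_x w_x i_x\right)$. The following minimal NumPy sketch does exactly that for one node. The choice of `tanh` as the default activation function and the example numbers are arbitrary assumptions. Many real networks also add a bias term to the sum; the description above leaves it out, and so does this sketch.

```python
import numpy as np

def node_output(inputs, weights, activation=np.tanh):
    """Compute the output of a single node: f_a(sum_x w_x * i_x)."""
    # Linear combination: multiply each input by its weight and add them up.
    weighted_sum = np.dot(weights, inputs)
    # Apply the activation function to the combined value.
    return activation(weighted_sum)

inputs = np.array([1.0, 0.5, -1.0])
weights = np.array([0.2, 0.4, 0.1])
print(node_output(inputs, weights))  # tanh(0.2 + 0.2 - 0.1) = tanh(0.3)
```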

Nodes that have no incoming connections are called *Input Nodes* and nodes that have no outgoing connections are called *Output Nodes*. Nodes are arranged in layers, with the first layer consisting of Input Nodes and the last layer consisting of Output Nodes. The layers in between are often referred to as *Hidden Layers*, as the user only ever interacts with the Input and Output layers.

A simple network with one hidden layer may look like this (source [Wikipedia](https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Colored_neural_network.svg/300px-Colored_neural_network.svg.png)):

![Title](https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Colored_neural_network.svg/300px-Colored_neural_network.svg.png)

By setting the weights in the nodes *just right*, the neural network can mimic other complex functions with the same number of input and output values.
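To make "setting the weights" a bit more concrete, here is a minimal NumPy sketch of a forward pass through a network with one hidden layer, assuming 3 inputs, 4 hidden nodes, and 2 outputs. The ReLU activation in the hidden layer, the linear output layer, and the random weights are illustrative assumptions, not something the text prescribes. Storing the weights of each layer as a matrix with one row per node lets a single matrix-vector product compute all weighted sums of that layer at once.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: element-wise max(z, 0).
    return np.maximum(z, 0.0)

def forward(x, w_hidden, w_out):
    """Forward pass through a network with a single hidden layer.

    Each row of a weight matrix holds the weights of one node.
    """
    hidden = relu(w_hidden @ x)  # hidden layer: weighted sums + activation
    return w_out @ hidden        # output layer: weighted sums, identity activation

rng = np.random.default_rng(0)
w_hidden = rng.normal(size=(4, 3))  # 3 inputs -> 4 hidden nodes
w_out = rng.normal(size=(2, 4))     # 4 hidden nodes -> 2 outputs
x = np.array([1.0, 0.0, -1.0])
print(forward(x, w_hidden, w_out))  # two output values
```

A linear output layer is a common choice for regression, since the values to be mimicked (such as Q values) can be negative or larger than 1.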

The special thing about an artificial neural network is that it can be trained to learn those weights. By giving the Neural Network feedback on how good its output was, it can adjust its weights in a process called *Gradient Descent*. To compute the gradient for the weights in layers further away from the output layer, it uses a process called *Backpropagation*, which passes the error backwards through the network layer by layer.
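As a rough illustration of gradient descent, the following sketch trains a single node with an identity activation to mimic the simple function $y = 2x_1 - 3x_2$, using the mean squared error as feedback. The target function, the learning rate of 0.1, and the number of steps are arbitrary choices for this example. With only one layer there is nothing to backpropagate through, so the gradient of the loss with respect to the weights can be written down directly.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(100, 2))  # 100 training inputs with 2 values each
true_w = np.array([2.0, -3.0])             # weights of the function to mimic
y = X @ true_w                             # target outputs

w = np.zeros(2)       # start with all weights at zero
learning_rate = 0.1

for step in range(500):
    prediction = X @ w                  # output of the node (identity activation)
    error = prediction - y
    loss = np.mean(error ** 2)          # mean squared error
    # Gradient of the MSE with respect to w: (2 / N) * X^T (prediction - y)
    grad = 2.0 * X.T @ error / len(X)
    w -= learning_rate * grad           # take a small step against the gradient

print(w)  # close to [2, -3]
```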

For a more detailed introduction to Artificial Neural Networks, see [Wikipedia](https://en.wikipedia.org/wiki/Artificial_neural_network), or any number of other sources a quick Google will yield.

## A short introduction to TensorFlow

[TensorFlow](https://www.tensorflow.org/) is an open source Machine Learning framework from Google. It allows us to specify, train, and run Artificial Neural Networks at a very high level in Python. I will not give a detailed introduction to how it works or how to use it here. I will comment the code we use as I go, but if you get stuck and it all just doesn't make any sense, please read some of the introductory resources at [TensorFlow Get Started](https://www.tensorflow.org/get_started/) or similar TensorFlow tutorials, and then come back and give it another go.

## A Neural Network to play Tic Tac Toe

To teach a Neural Network to play Tic Tac Toe, we need to define the following things:

* The topology of the network, i.e. what do the input and output layers look like? How many hidden layers are there, and how big are they?
* A *loss function*. The loss function will take the output of the Neural Network and return a value indicating how good that output was.
* A training part which will try to adjust the weights in the Neural Network so as to minimize the loss function.

### The basic Tic Tac Toe Q learning Graph

We will experiment with some different graphs, but the basic shape will always be as follows:

* An input layer which takes a game state, i.e. the current board, as input. 
* One or more hidden layers.
* An output layer which will output the Q value for all possible moves in that game state.
* As loss function we will use [Mean Squared Error](https://www.tensorflow.org/api_docs/python/tf/losses/mean_squared_error), which is a generic and popular loss function for regression, i.e. learning to mimic another function.
* The input for the loss function will be the output of the Neural Network and our updated estimate of the Q function, obtained by applying the discounted reward. In other words, the loss measures the difference between the output of the Neural Network and our estimate of the Q function after applying the discounted reward.
* We will mostly use the [Gradient Descent Optimizer](https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer) for training, i.e. to adjust the weights in the Neural Network. There are other reasonable, potentially even better, options as well. Feel free to experiment and report back how it went.
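A sketch of how the loss input could be assembled for one move, assuming the standard Q-learning target $r + \gamma \max_{a'} Q(s', a')$ for the move $a$ that was actually played (and just $r$ if the game ended). The other 8 entries of the target are copied from the network's own output, so they contribute zero error and only the Q value of the played move is pulled towards the new estimate. The function names `q_targets` and `mse`, the default discount $\gamma = 0.95$, and the argument layout are hypothetical and not taken from the player implementation.

```python
import numpy as np

def q_targets(q_values, action, reward, next_q_values, gamma=0.95, terminal=False):
    """Build the training target for one move (standard Q-learning update, assumed).

    q_values:      the network's 9 outputs for the state in which the move was made
    next_q_values: the network's 9 outputs for the following state (ignored if terminal)
    """
    target = q_values.copy()            # untouched entries give zero error
    if terminal:
        target[action] = reward         # no future rewards after the game ends
    else:
        # For simplicity this takes the max over all 9 outputs, illegal moves included.
        target[action] = reward + gamma * np.max(next_q_values)
    return target

def mse(prediction, target):
    # Mean squared error over the 9 outputs.
    return np.mean((prediction - target) ** 2)

q = np.zeros(9)
target = q_targets(q, action=4, reward=1.0, next_q_values=None, terminal=True)
print(target)          # only position 4 changes, to 1.0
print(mse(q, target))  # 1/9, about 0.111
```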

<div class="alert alert-block alert-info">
There are many other ways we could do this:<br/>
We could feed an action into the network together with the state and have a single output value indicating the value of this action.<br/>
We could also have just a single output value encoding the value of the state and use that as a proxy for the state/action pairs that lead to that state.<br/>
And many more.
</div>

### The input layer

There are many options for how we can encode the input to the Neural Network. We could just have a single node and feed the unique hash value of the board into it. Or we could feed an array into it, with each element encoding the value of the piece on the corresponding position. Conventional wisdom, however, seems to be that Neural Networks work best with binary arrays as input. Blindly trusting this, our input will be an array of 27 (= 3 * 9) bits, with the first 9 bits set to 1 at the positions of the crosses, the next 9 bits set to 1 at the positions of the noughts, and the final 9 bits set to 1 at the empty positions.
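A minimal sketch of the 27-bit encoding described above. It assumes a hypothetical board representation as a flat list of 9 cells, row by row, using the made-up constants `EMPTY`, `CROSS`, and `NOUGHT`; the actual board class in the code may store things differently. Since every cell is exactly one of cross, nought, or empty, exactly 9 of the 27 bits are set.

```python
import numpy as np

# Hypothetical cell values for this sketch; the board is a flat list of 9 cells.
EMPTY, CROSS, NOUGHT = 0, 1, 2

def board_to_input(board):
    """Turn a 9-cell board into a 27-bit network input."""
    cells = np.asarray(board)
    return np.concatenate([
        (cells == CROSS).astype(np.float32),   # bits 0-8: crosses
        (cells == NOUGHT).astype(np.float32),  # bits 9-17: noughts
        (cells == EMPTY).astype(np.float32),   # bits 18-26: empty cells
    ])

board = [CROSS, EMPTY, EMPTY,
         EMPTY, NOUGHT, EMPTY,
         EMPTY, EMPTY, CROSS]
x = board_to_input(board)
print(x.reshape(3, 9))  # rows: crosses, noughts, empty cells
```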

<div class="alert alert-block alert-info">
To be completely open here: I don't think I even tried any of the other options. Feel free to give it a go and let me know how it went. In particular if you find that one of the other options actually works better.
</div>

### The output layer

The output layer will have 9 nodes / output values, one for each position of the board. The value of a node will be the Q value of the corresponding move. 
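The network outputs a Q value for every position, including positions that are already occupied. The text does not say how those are handled; one common option, assumed here, is to ignore them when picking a move and take the legal move with the highest Q value. The helper `best_move` and the convention that `0` marks an empty cell are hypothetical choices for this sketch.

```python
import numpy as np

EMPTY = 0  # hypothetical value marking an empty cell

def best_move(q_values, board):
    """Return the index of the legal move with the highest predicted Q value."""
    q = np.asarray(q_values, dtype=float)
    legal = np.asarray(board) == EMPTY
    # Occupied cells get -inf so argmax never picks them.
    masked = np.where(legal, q, -np.inf)
    return int(np.argmax(masked))

q_values = [0.9, 0.1, 0.3, 0.2, 0.8, 0.0, 0.4, 0.5, 0.6]
board = [1, 0, 0,
         0, 2, 0,
         0, 0, 0]
print(best_move(q_values, board))  # 8: cells 0 and 4 are occupied
```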

## Time to look at the code

The first version of our Neural Network Q-Learning Player can be found in the file [SimpleNNQPlayer](./tic_tac_toe/SimpleNNQplayer.p 