# Machine Learning Project to Compare Learning Methods

*by Alex Laswell and Damian Armijo, May 9, 2018*

## Introduction

Initially, we had chosen to simply code a game of Snake in Python using the Q-learning reinforcement method described in class and in the final programming assignment. However, in researching reinforcement learning methods, and Q-learning in particular, it became apparent that this horse has been beaten to death.

Instead, we chose to take some of the better models that we found and perform an analysis of which of the following three methods performed best:

#### An Artificial Neural Network

Artificial neural networks (ANNs) or connectionist systems are computing systems vaguely inspired by the biological neural networks that constitute animal brains. Such systems "learn" (i.e. progressively improve performance on) tasks by considering examples, generally without task-specific programming ... An ANN is based on a collection of connected units or nodes called artificial neurons (a simplified version of biological neurons in an animal brain). Each connection (a simplified version of a synapse) between artificial neurons can transmit a signal from one to another. The artificial neuron that receives the signal can process it and then signal artificial neurons connected to it.
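
As a minimal sketch of the signal passing described above, the snippet below pushes an input through two small layers in plain Python. The layer sizes, weights, and sigmoid activation are illustrative assumptions, not values taken from any of the repositories evaluated here.

```python
import math

def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Compute one layer: each neuron sums its weighted inputs, adds a bias,
    and applies the activation before signalling the next layer."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hypothetical 2-input, 2-hidden-neuron, 1-output network with made-up weights.
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

hidden = layer_forward([1.0, 0.0], hidden_weights, hidden_biases)
output = layer_forward(hidden, output_weights, output_biases)
```

Training would adjust the weights and biases from examples; this sketch only shows how a signal flows from one layer of neurons to the next.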

#### Deep Reinforcement Learning (DQN)

The DeepMind system used a deep convolutional neural network, with layers of tiled convolutional filters to mimic the effects of receptive fields. Reinforcement learning is unstable or divergent when a nonlinear function approximator such as a neural network is used to represent Q. This instability comes from the correlations present in the sequence of observations, the fact that small updates to Q may significantly change the policy and the data distribution, and the correlations between Q and the target values.
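
One common way to weaken the correlations between consecutive observations, and the one DeepMind's DQN uses, is experience replay: transitions are stored and later sampled at random for training. The sketch below is a minimal illustration in plain Python. The class name `ReplayBuffer`, its capacity, and the transition layout are assumptions made for this example and are not taken from the DQN repository evaluated here.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks up the temporal order of the episode.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

# Hypothetical usage: five transitions pushed into a buffer that holds three.
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(state=step, action=0, reward=1.0, next_state=step + 1, done=False)
batch = buffer.sample(batch_size=2)
```

Because the buffer holds only the three most recent transitions here, the sampled batch is drawn from steps 2 through 4 in random order.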

#### State-action-reward-state-action (SARSA)

State–action–reward–state–action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. It was proposed by Rummery and Niranjan in a technical note[1] with the name "Modified Connectionist Q-Learning" (MCQ-L). The alternative name Sarsa, proposed by Rich Sutton, was only mentioned as a footnote.

This name simply reflects the fact that the main function for updating the Q-value depends on the current state of the agent "S1", the action the agent chooses "A1", the reward "R" the agent gets for choosing this action, the state "S2" that the agent enters after taking that action, and finally the next action "A2" the agent chooses in its new state. The acronym for the quintuple $(s_t, a_t, r_t, s_{t+1}, a_{t+1})$ is SARSA.[2]
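
The update that this quintuple feeds can be sketched as follows. This is a tabular illustration, assuming a dictionary-backed Q-table and illustrative values for the learning rate `alpha` and discount factor `gamma`; it is not the code from the SARSA repository evaluated here. A Q-learning update is included for contrast: it bootstraps from the best next action rather than the action the agent actually chose.

```python
from collections import defaultdict

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: the target uses the action a_next actually chosen."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])

def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: the target uses the best action available in s_next."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])

actions = ["left", "right"]
q_sarsa = defaultdict(float)   # unseen (state, action) pairs default to 0.0
q_sarsa[("S2", "left")] = 1.0
q_sarsa[("S2", "right")] = 2.0
q_learn = defaultdict(float, q_sarsa)  # independent copy with the same values

# Same transition, but the agent's next action "left" is not the greedy one.
sarsa_update(q_sarsa, "S1", "right", 1.0, "S2", "left")
q_learning_update(q_learn, "S1", "right", 1.0, "S2", actions)
```

With these made-up numbers the SARSA estimate for ("S1", "right") ends up lower than the Q-learning estimate, because SARSA accounts for the non-greedy action the agent really took in "S2".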

## Methods

The following code repositories / concepts were utilized for testing and analysis. 

* [Neural Network by Slava Korolev](https://towardsdatascience.com/today-im-going-to-talk-about-a-small-practical-example-of-using-neural-networks-training-one-to-6b2cbd6efdb3)
* [DQN by Yuriy Guts](https://github.com/YuriyGuts/snake-ai-reinforcement)
* [SARSA by Pranesh Srinivasan](http://spranesh.github.io/rl-snake/)

We both worked on this together every step of the way: researching which methods and code repositories we wanted to evaluate, running the code in the testing environment, compiling the data, and writing up our findings. We met at least once each week, sometimes twice, and really just treated this as a team project.

## Results

In our testing, we found that SARSA performed better than either of the other methods. However, it takes a lot more time to train than either of them.

Additionally, DQN did better than the simple neural network, which was consistent with our expectations. However, we were both pretty confident that DQN was going to be the best method, and our findings proved otherwise. When the training time was shortened, Q-Learning and SARSA performed about the same, with Q-Learning sometimes even outperforming SARSA, but only when the training time was very short (around 20-25 minutes). With any more training than that, SARSA takes over.

All of the algorithms showed learning even with just a handful of iterations: 2-3 minutes of training, or about one hundred games. This again was consistent with our expectations.

## Conclusions

What I learned.  What was difficult.  Changes I had to make to timeline.

### References

* [Neural Network by Slava Korolev](https://towardsdatascience.com/today-im-going-to-talk-about-a-small-practical-example-of-using-neural-networks-training-one-to-6b2cbd6efdb3)
* [DQN by Yuriy Guts](https://github.com/YuriyGuts/snake-ai-reinforcement)
* [SARSA by Pranesh Srinivasan](http://spranesh.github.io/rl-snake/)
* [Wikipedia](https://en.wikipedia.org/wiki/)

In [2]:
import glob
import io

import nbformat

# Find the report notebook; glob returns a list of matching paths.
nbfile = glob.glob('Project Report Example.ipynb')
if len(nbfile) > 1:
    print('More than one ipynb file. Using the first one.  nbfile=', nbfile)
with io.open(nbfile[0], 'r', encoding='utf-8') as f:
    # nbformat.current is deprecated; read the file as a v4 notebook instead.
    nb = nbformat.read(f, as_version=4)
# Count space-separated words in the markdown cells only.
word_count = 0
for cell in nb.cells:
    if cell.cell_type == "markdown":
        word_count += len(cell['source'].replace('#', '').lstrip().split(' '))
print('Word count for file', nbfile[0], 'is', word_count)

Word count for file Project Report Example.ipynb is 401
