## Lab (Midterm) Report - Predicting Lane Matchups in League of Legends

## Due 23 October 2016 (Sunday, 11:59pm)


### Abstract:

The purpose of this project is to create a usable application for League of Legends that obtains two opposing players' data from previous games and uses this information to predict who would defeat the other. At the beginning of the semester I had a different project, but after looking through the data made available by Riot (the developer of LoL), I found it impossible to accurately obtain the pieces I needed. My objectives then became to understand and implement time series regression, and to tie it into building an RNN (specifically an LSTM). I successfully applied time series regression to 5 functions using an MLP (in TensorFlow) and an LSTM (in Keras). The LSTM needed significantly fewer epochs than the MLP, which provides a good basis for applying it to League of Legends data. Unfortunately, on a noisy sine, the models captured the periodic behavior but also produced noisy predictions, even with regularization. It is essential that I fix this before working on the LoL data; once that is done, I can finally apply the models I've built to my personal project.

### Introduction:

With the release of the Riot API, many third-party applications have been created for League of Legends players that provide enemy statistics, histories, and pregame setups [6]. It has been my goal to create my own software that provides the statistics that matter to me, e.g. will I be able to defeat the player I am up against? My project should do just that: using past data provided by Riot, I want to create a predictor that gives me insight into how the match will play out. Because players tend to learn as they play more and more games, it seems natural to implement a model that works on time sequences.

When we learn over time, we use our past experiences to make a new decision in the present; analogously, if we have data points at past time indices, we can use them to predict the very next data point, given evidence that they exhibit a time series pattern. One such example is the world of finance: patterns that repeat over certain periods of time occur more often than not, and being able to predict them with neural networks can be very useful [7]. Thus, to begin applying this to my own problem, I first have to understand the concepts of time series regression using a standard multilayer perceptron on simple periodic functions.

After doing so, I can enter the realm of recurrent neural networks. What makes a recurrent neural network different from an MLP is that the hidden layer contains loops [2]. That is, we assume that each input at a certain time produces an output in a hidden neuron that increases the predictive ability of the network at future time indices:


<img src="./Pictures/rnn.jpg">

This concept encapsulates the "memory" aspect of time sequence problems. Because the structure diverges from that of an MLP, a new backpropagation algorithm was needed for RNNs: backpropagation through time (BPTT). However, due to the nature of backpropagation, if we include a large time step window, the "memory" from an early time index will have little effect on the error (its gradient passes through many multiplications), so training takes far too long to optimize results [9].
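
As a minimal sketch of the recurrence and of the shrinking gradient described above (this is not code from the project), consider a single-unit RNN in NumPy. The function names `rnn_forward` and `grad_last_wrt_first` and the weights `w_x = 0.5`, `w_h = 0.8` are assumptions chosen only for illustration.

```python
import numpy as np

def rnn_forward(xs, w_x=0.5, w_h=0.8, h0=0.0):
    """Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = h0
    hs = []
    for x in xs:
        h = np.tanh(w_x * x + w_h * h)  # the "loop": h feeds back into itself
        hs.append(h)
    return np.array(hs)

def grad_last_wrt_first(xs, w_x=0.5, w_h=0.8, h0=0.0):
    """d h_T / d h_1 = product over t = 2..T of w_h * (1 - h_t^2)."""
    hs = rnn_forward(xs, w_x, w_h, h0)
    factors = w_h * (1.0 - hs[1:] ** 2)  # one chain-rule factor per time step
    return float(np.prod(factors))

xs = np.sin(np.linspace(0, 2 * np.pi, 50))
short = abs(grad_last_wrt_first(xs[:5]))   # 5-step window
long_ = abs(grad_last_wrt_first(xs))       # 50-step window
print(short, long_)
```

Because every factor here has magnitude below 1, the influence of the first hidden state on the last one shrinks as the window grows, which is the effect that motivates the LSTM.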

This is where Long Short-Term Memory (LSTM) networks come into play: instead of having plain loops inside a hidden layer, the structure of the neurons in the hidden layer is fundamentally changed. The neurons are replaced with LSTM cells, which contain 3 components: an input gate, an output gate, and a forget gate. The forget gate, powered by a sigmoid layer, determines whether or not certain pieces of information should be kept for future time steps. As a result, even across many time steps, data that has been deemed important will not be replaced by less important information [8]. The following figure gives a visual of the structure:

<img src="./Pictures/lstm_structure.png">
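
To make the gate structure concrete, here is a hedged NumPy sketch of one step of a standard LSTM cell, following the usual textbook formulation (as in [8]) rather than any code from this project. The names `lstm_step`, `W`, `U`, `b`, the hidden size of 4, and the random initialization are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. Shapes: W (4H, D), U (4H, H), b (4H,)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: how much old cell state to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new information to write
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much cell state to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate values to write
    c = f * c_prev + i * g         # updated cell state (the long-term "memory")
    h = o * np.tanh(c)             # updated hidden state
    return h, c

rng = np.random.default_rng(0)
D, H = 1, 4                        # 1 input feature, 4 hidden units (assumed)
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x_t in np.sin(np.linspace(0, 2 * np.pi, 20)):
    h, c = lstm_step(np.array([x_t]), h, c, W, U, b)
```

The cell state `c` is only scaled by the forget gate and added to, which is what lets important information survive many time steps.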

### Methodology

**Time Series Regression**:

I separated this into 2 models: MLP and LSTM. Before working on the LSTM, I had to have a functional MLP for simple periodic functions:

* $\sin(x)$
* $\sin(x) + \cos(x)$
* $\sin(x)$ [with noise]
* $|x| \cdot \sin(x)$
* $x + \sin(x)$
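
The five target functions can be generated along these lines. This is a sketch: the helper name `make_series`, the Gaussian noise model, the noise level `noise_std=0.1`, and the number of sample points are assumptions, since the report does not state them.

```python
import numpy as np

def make_series(name, x, noise_std=0.1, seed=0):
    """Evaluate one of the five test functions on the grid x."""
    x = np.asarray(x, dtype=float)
    if name == "sin":
        return np.sin(x)
    if name == "sin+cos":
        return np.sin(x) + np.cos(x)
    if name == "noisy_sin":
        rng = np.random.default_rng(seed)
        # Assumed noise model: additive Gaussian noise.
        return np.sin(x) + rng.normal(scale=noise_std, size=x.shape)
    if name == "abs_x_sin":
        return np.abs(x) * np.sin(x)
    if name == "x+sin":
        return x + np.sin(x)
    raise ValueError(f"unknown function: {name}")

x = np.linspace(-2 * np.pi, 2 * np.pi, 200)
series = {name: make_series(name, x)
          for name in ("sin", "sin+cos", "noisy_sin", "abs_x_sin", "x+sin")}
```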

For each function, I ran through the following parameters:

* 1 Hidden Layer
    * 4 Hidden Neurons
    * ReLU Activation 
* $\leq 3$ Input Neurons
* 1 Output Neuron
* .001 or .0001 Learning Rate (.0001 for the last 2 to remove NaN results in TF)
* Sum Square Difference Cost
* 1000 Epochs
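
The project used TensorFlow for this network; as a library-independent sketch of the same configuration (1 hidden layer, 4 ReLU neurons, 1 output, sum of squared differences, learning rate .001, 1000 epochs), the training loop can be written in plain NumPy. The function name `train_mlp`, the weight initialization, the full-batch update, and the 64-point grid are assumptions made for illustration.

```python
import numpy as np

def train_mlp(X, y, hidden=4, lr=0.001, epochs=1000, seed=0):
    """Full-batch gradient descent on a 1-hidden-layer ReLU network."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        z = X @ W1 + b1
        h = np.maximum(z, 0.0)                 # ReLU activation
        pred = (h @ W2 + b2).ravel()
        err = pred - y
        losses.append(float(np.sum(err ** 2)))  # sum of squared differences
        # Backward pass.
        d_pred = 2.0 * err[:, None]
        dW2 = h.T @ d_pred
        db2 = d_pred.sum(axis=0)
        d_z = (d_pred @ W2.T) * (z > 0)         # ReLU gradient mask
        dW1 = X.T @ d_z
        db1 = d_z.sum(axis=0)
        # Gradient descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

# One input neuron: predict the next sample of sin(x) from the current one.
s = np.sin(np.linspace(-2 * np.pi, 2 * np.pi, 64))
X, y = s[:-1, None], s[1:]
params, losses = train_mlp(X, y)
print(losses[0], losses[-1])
```

Because the cost is a sum rather than a mean, the gradient grows with the number of samples and with the magnitude of the targets, which is one plausible reason a smaller learning rate was needed for the last two functions.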


To do this, I used the TensorFlow library, which makes implementing deep learning models in Python easy [1].

When I was working through each individual function, I assumed that the time window of these sequences was related to the length of the period; thus I naively implemented networks with many more input neurons than were needed. After working with LSTMs, I noticed that they worked with only 1 input, so I tried smaller time windows for the MLP models and found that at most 3 input neurons were needed for a good result. In addition, I had applied far too many iterations, given that the changes in error were minuscule toward the end. Thus, I reduced the original 10,000 iterations to only 1000, and as seen later, performance is not hurt.
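
The time window determines the number of input neurons: each training example is the previous `look_back` samples, and the target is the sample that follows. A minimal sketch of that construction is below; the helper name `make_windows` and the 200-point grid are assumptions.

```python
import numpy as np

def make_windows(series, look_back):
    """Turn a 1-D series into (inputs, next-value targets)."""
    series = np.asarray(series, dtype=float)
    n = len(series) - look_back
    X = np.stack([series[i:i + look_back] for i in range(n)])
    y = series[look_back:]
    return X, y

series = np.sin(np.linspace(-2 * np.pi, 2 * np.pi, 200))
for look_back in (1, 2, 3):
    X, y = make_windows(series, look_back)
    print(look_back, X.shape, y.shape)
```

With a window of 3, the first input row is `series[0:3]` and its target is `series[3]`.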

Also, for the sin(x) with noise, I added an L2 regularization term in an attempt to have a non-noisy prediction.
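
For reference, an L2-regularized version of the sum-of-squares cost can be sketched as follows. The function name `l2_regularized_cost` and the penalty strength `lam` are assumptions; the report does not state the value that was used.

```python
import numpy as np

def l2_regularized_cost(pred, target, weights, lam=0.01):
    """Sum of squared differences plus lam * (sum of squared weights)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    data_term = np.sum((pred - target) ** 2)
    penalty = lam * sum(np.sum(w ** 2) for w in weights)
    return float(data_term + penalty)

# Worked example: data term = 1, squared weights = 9 + 16 = 25.
cost = l2_regularized_cost([1.0, 2.0], [1.0, 1.0],
                           [np.array([[3.0, 4.0]])], lam=0.1)
print(cost)
```

The penalty only discourages large weights; when the input window itself carries the noise, the prediction may still follow it.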

Of course, it is important to use separate training and testing data, so I trained each function on the range $[-2\pi, 2\pi]$ and tested on $[2\pi, 4\pi]$.

Afterwards, I worked on implementing an LSTM:

Following the tutorial in [3] and using Keras [4], I ported my functions to an LSTM model with the following parameters:

* Time step: 1
* 1 LSTM Layer
* 1 Output Neuron
* Mean Squared Error Cost
* 50 Epochs
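
A hedged sketch of this setup with the current `tensorflow.keras` API is shown below; it is not the tutorial's original code, which targeted an older Keras release. The number of LSTM units (4), the Adam optimizer, the batch size, and the helper names `to_lstm_input` and `build_lstm` are assumptions, since they are not listed above.

```python
import numpy as np

def to_lstm_input(series, time_steps=1):
    """Return X shaped (samples, time_steps, 1) and next-value targets y."""
    series = np.asarray(series, dtype=float)
    n = len(series) - time_steps
    X = np.stack([series[i:i + time_steps] for i in range(n)])
    return X.reshape(n, time_steps, 1), series[time_steps:]

def build_lstm(time_steps=1, units=4):
    """1 LSTM layer, 1 output neuron, mean squared error cost."""
    # Imported here so the data preparation above runs without TensorFlow.
    from tensorflow import keras
    model = keras.Sequential([
        keras.Input(shape=(time_steps, 1)),
        keras.layers.LSTM(units),      # units=4 is an assumed size
        keras.layers.Dense(1),
    ])
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model

x = np.linspace(-2 * np.pi, 2 * np.pi, 200)
X_train, y_train = to_lstm_input(np.sin(x), time_steps=1)

# Training would then look like this (requires TensorFlow):
# model = build_lstm(time_steps=1)
# model.fit(X_train, y_train, epochs=50, batch_size=1, verbose=0)
```

Keras recurrent layers expect 3-D input of the form (samples, time steps, features), which is why the windowed data is reshaped before training.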

Most of the code came from the original source file provided in the tutorial; however, because it used a different dataset, I had to change it to work with the periodic functions. In addition, I experimented with the number of epochs for each function, and for the most part each needed only 50 epochs before further increases produced only minuscule changes (shown in the next section).

Similarly, I trained in the range of $[-2\pi,2\pi]$ and tested on $[2\pi, 4\pi]$.

### Results and Discussion

**MLP**

The following graphs were generated using an MLP network with the specified parameters from the previous section, with time frames between 1 and 3:

<img src="./Pictures/mlp_sin.png">

<img src="./Pictures/mlp_sin+cos.png">

<img src="./Pictures/mlp_noisy_sin.png">

<img src="./Pictures/mlp_abssin.png">

<img src="mlp_x_sin.png">


Across all of these figures, small time frames were enough to produce results that mimic the given periodic functions within 1000 epochs. This is significant because the MLP does not need many input neurons for the time series regression to work. One of the biggest remaining issues is the noisy sine: I added L2 regularization for this figure, and the predicted results still look noisy. To further improve my models, I need a method that keeps the network from fitting the noise (i.e., from overfitting). This is essential because no real-life data is perfect, and if I want to apply a similar model to my project, it should be able to account for outliers in the data. Overall, however, the MLPs were still able to fit the functions as the complexity of the sinusoid increased.

**LSTM**

The following were generated using the LSTM networks, where each figure contains the graphs produced with varying numbers of epochs per function:

<img src="./Pictures/lstm_sin.png">

<img src="./Pictures/lstm_sin+cos.png">

<img src="./Pictures/lstm_noisy_sin.png">

<img src="./Pictures/lstm_abssin.png">

<img src="./Pictures/lstm_x_sin.png">

Compared to the MLP models, the LSTM networks captured the correct points for each function with significantly fewer epochs (more than 10x fewer). For $\sin(x)$, $\sin(x)+\cos(x)$ and the noisy $\sin(x)$, there is very little difference between the graphs generated with 50 and 100 epochs; to save computation, 50 epochs would have been enough to produce nearly identical results. For the last 2 functions, however, 50 epochs were not enough to reach an accurate result. For $|x| \cdot \sin(x)$, 100 epochs gave significantly better results than 50. For $x + \sin(x)$, I noticed a mixture of both behaviors: there was very little difference between 50 and 100 epochs, but neither was accurate enough. In response, I let it run for 1000 epochs to see if that would improve results; unfortunately, the results were still not accurate enough, and the improvement was even more minuscule. Thus, I hypothesize that the training data was not big enough to capture the increasing nature of the function (this is based on the fact that the accuracy of the prediction decreases toward the end of the results). After enlarging the training data, I get:

<img src="./Pictures/lstm_x_sin_new.png">

This is significantly better! It now needs only around 100 epochs, whereas the previous setup was still inaccurate after 1000.
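
One way to illustrate the hypothesis (a sketch, not the experiment that was run) is to compare the range of values each function takes on the original training interval $[-2\pi, 2\pi]$ with the range on the test interval $[2\pi, 4\pi]$. The helper name `value_range` and the grid size are assumptions.

```python
import numpy as np

def value_range(f, lo, hi, n=1000):
    """Minimum and maximum of f on an evenly spaced grid over [lo, hi]."""
    y = f(np.linspace(lo, hi, n))
    return float(y.min()), float(y.max())

def x_plus_sin(x):
    return x + np.sin(x)

train_lo, train_hi = value_range(x_plus_sin, -2 * np.pi, 2 * np.pi)
test_lo, test_hi = value_range(x_plus_sin, 2 * np.pi, 4 * np.pi)
print("x + sin(x) train:", (train_lo, train_hi), "test:", (test_lo, test_hi))

sin_train = value_range(np.sin, -2 * np.pi, 2 * np.pi)
sin_test = value_range(np.sin, 2 * np.pi, 4 * np.pi)
print("sin(x) train:", sin_train, "test:", sin_test)
```

For $\sin(x)$ the test values stay inside the range seen in training, while for $x + \sin(x)$ every test value lies at or above the largest training value, so the network has to extrapolate. This is consistent with the accuracy dropping toward the end of the test interval.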


For the noisy sinusoid, I still couldn't prevent overfitting, even when I added the regularization parameters in Keras. As stated, this is important for fitting real data, so I need to fix it if I plan on using the LSTM structure in my project.

### Conclusions and Future Work ###

This first half of the semester allowed me to understand and apply ANN structures that are closely tied to time-dependent prediction, an aspect that could be a strong indicator in my League of Legends project. For time series regression, I found that an LSTM network can fit periodic data much more quickly (in about 10x fewer epochs) than a standard MLP with similar time windows. The one part of these trials that I could not get working was the noisy data; I may have made a mistake in my code, because even the built-in regularizers could not prevent the noisy predictions. I would like to solve that as soon as possible, since I know my LoL data will come with a lot of noise. Building on this foundation of knowledge and experience with simple functions, I can now branch out and try more real-life data, and at the same time prepare my data collection for the project. Despite all the theory work, I know that in the end this has made my work much easier, since I now just have to fix my data and apply it to my current network architectures.

The goals I now have in mind for improvements in the second half are:
* Discuss how to fix the noise issue.
* Collect Riot API data (I already have a schematic to gather what I need).
* Fix the data to fit the LSTM network.
* Determine if other models are more suitable.
    * This is important, because my original project is NOT regression, but rather classification.
    * I also found another project that compared their League of Legends predictions (team win %) on several other models [5].
* See if a combination of models could prove to have the best result.

### References ###

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo,
Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis,
Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow,
Andrew Harp, Geoffrey Irving, Michael Isard, Rafal Jozefowicz, Yangqing Jia,
Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Mike Schuster,
Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Jonathon Shlens,
Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker,
Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas,
Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke,
Yuan Yu, and Xiaoqiang Zheng.
TensorFlow: Large-scale machine learning on heterogeneous systems,
2015. Software available from tensorflow.org.

[2] Britz, Denny. "Recurrent Neural Networks Tutorial, Part 1 – Introduction ..." WildML. Google Brain http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

[3] Brownlee, Jason. "Time Series Prediction with LSTM Recurrent Neural Networks ..." Machine Learning Mastery. http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

[4] Chollet, François. "Keras." GitHub. 2015.
https://github.com/fchollet/keras

[5] Huang, Thomas, David Kim, and Gregory Leung. "League of Legends Win Predictor."
http://thomasythuang.github.io/League-Predictor/

[6] Kica, Artian. Analysis of Data Gathered from League of Legends and the Riot Games API. Diss. Worcester Polytechnic Institute, 2016.

[7] Kaastra, Iebeling, and Milton Boyd. "Designing a neural network for forecasting financial and economic time series." Neurocomputing 10.3 (1996): 215-236.

[8] Olah, Christopher. "Understanding LSTM Networks — Colah’s Blog." http://colah.github.io/posts/2015-08-Understanding-LSTMs/

[9] Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. "On the difficulty of training recurrent neural networks." ICML (3) 28 (2013): 1310-1318.
