# 1. Summary
This report presents my analysis for the NFL Big Data Bowl 2021 competition. My work was mainly inspired by the brilliant and highly successful [OpenAI Five](https://en.wikipedia.org/wiki/OpenAI_Five) project, in which machine learning was used to train a team of bots for the video game Dota 2. In the end these bots were able to defeat professional esports teams.
I was very interested in trying this approach on a real game and am very grateful for the opportunity.

The results are as follows:
* It is possible to use a Reinforcement Learning (RL) approach in the analysis of football plays.
* One of the core concepts of RL, the *state-value function*, can indeed predict the outcome of a play when modelled by a neural network.
* The prediction accuracy is better than random guessing even with rather simple neural networks.
* Even without finishing the RL task (training an intelligent agent), a model that can predict the outcome of a play can be useful.
* At the end I present a couple of results unrelated to RL: what track clustering might look like and some correlations I found.

My results are qualitative rather than quantitative. The work is more about an instrument that can assist in getting answers than about getting the answers directly.

One might ask: Why reinforcement learning? Why something so difficult?
Here is my reasoning:
* *There is no best defense for all situations.* The best defense is always relative to the offense's actions. A simple thought experiment serves as an example: what would happen if the defense chose some supposedly best symmetric formation while the offense chose an asymmetric formation with most players on one side? From my point of view, the offense would most probably break through on the flank and score a touchdown.
* *Football data are very complex.* How many combinations of players can the offensive team put on the field? How many formations and locations can they use? What is the down? How much time is left? And so on. The number of possible variants is astronomical. On the other hand, the capacity of human [working memory](https://en.wikipedia.org/wiki/Working_memory) is limited, so help from AI could be of much use.
* *Reinforcement learning is a very powerful technique.* It seems to me it can be of use in many situations (in increasing order of time scale, amount of compute, and ambition): in training players to make good decisions on the field, in preparing a defense for a given play, and in choosing players for a game.

# 2. Reinforcement Learning (RL)
The ultimate goal of RL is to teach an ***agent*** to take ***actions*** in an ***environment*** so as to maximize some ***reward***. One way of doing this is to use the ***state-value function***, which describes the value of a state in terms of the expected reward. Once this state-value function is available, a *greedy search* over all possible actions can be used to choose the action that leads to the next state with the maximum expected reward.
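The greedy step described above can be sketched in a few lines. This is only an illustration on a toy one-dimensional problem: the names `greedy_action`, `toy_transition` and `toy_value` are hypothetical and are not part of the model described in this report.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")  # state type
A = TypeVar("A")  # action type


def greedy_action(state: S,
                  actions: Iterable[A],
                  transition: Callable[[S, A], S],
                  value_fn: Callable[[S], float]) -> A:
    """Return the action whose successor state has the highest estimated value."""
    return max(actions, key=lambda a: value_fn(transition(state, a)))


# Toy stand-ins (assumptions for illustration only).
def toy_transition(state: float, action: float) -> float:
    return state + action          # the action simply shifts the state


def toy_value(state: float) -> float:
    return -abs(state - 3.0)       # states closer to 3 are more valuable


best = greedy_action(0.0, [-1.0, 0.0, 1.0, 2.0], toy_transition, toy_value)
print(best)  # 2.0
```

In football the set of candidate actions is continuous and huge, so an exhaustive `max` like this would have to be replaced by some form of sampling or search.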

In low-dimensional discrete tasks, tables can be used to store the state-value function. For high-dimensional and continuous tasks such as football, some approximation must be used. Neural networks are a good choice for this.

# 3. Reward for training RL model
The goal of the game is to win and not to lose. However, in this competition we concentrate on individual plays. We want a per-play reward that leads toward that goal.

The most obvious choice of reward from the given play data is ***offensePlayResult***. The other candidate is ***epa*** ("expected points added on the play, relative to the offensive team"). These variables are significantly correlated, and experiments showed that either can be used as the reward with similar results.

Both ***offensePlayResult*** and ***epa*** relate to the offensive team. That should not be confusing, because the goal of the defense is to prevent the offense from winning, so the maximization problem simply becomes a minimization problem. From a mathematical point of view there is no difference.
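In symbols, this can be sketched as follows. The notation is introduced here only for illustration: $V(s)$ is the predicted offensive reward in state $s$, and $s'(s, a)$ is the state reached after the defense takes action $a$.

$$
a^{*}_{\text{def}} = \arg\min_{a} V\bigl(s'(s, a)\bigr) = \arg\max_{a} \bigl[-V\bigl(s'(s, a)\bigr)\bigr]
$$

So a defensive agent can reuse the same value model and simply negate the reward.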

# 4. Model architecture
The diagram below shows the model and the input variables used for making predictions. Some remarks:
* This is a simple model, and as a consequence its prediction accuracy is not high. This was intentional: it allowed many fast experiments in a limited time in order to get a qualitative assessment of the approach.
* The model makes a prediction for every moment of the play on the basis of the data at that moment only.
* The model assumes no specific order of players. Any defense player's data can be fed to any defense input of the model, and the same holds for offense. As a result, we should require the prediction to be invariant under such permutations of players.
* Before training, all variables were normalized to the [0; 1] range (except for sin, cos, and the outputs, which were normalized to [-1; 1]).
* The neural network contains no recurrent connections or convolutional layers.
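The normalization mentioned in the remarks above might look like the sketch below. The helper names and the field length used as the scaling range are assumptions for illustration; the exact ranges used for each variable are not stated in this report.

```python
import math


def min_max(value: float, lo: float, hi: float) -> float:
    """Scale a value from the range [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)


def encode_angle(degrees: float) -> tuple:
    """Represent an angle as (sin, cos); both components lie in [-1, 1]."""
    rad = math.radians(degrees)
    return math.sin(rad), math.cos(rad)


FIELD_LENGTH = 120.0  # yards including end zones (assumed scaling range)

x_norm = min_max(60.0, 0.0, FIELD_LENGTH)  # midfield maps to 0.5
sin_d, cos_d = encode_angle(90.0)          # approximately (1.0, 0.0)
```

Encoding angles as a sin/cos pair avoids the artificial jump between 359 and 0 degrees that a plain min-max scaling of the angle would introduce.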

<p align="center">
    <img src="http://artemefimov.ru/nfl/network_diagram.png" width="700">
</p>

# 5. Data used for making predictions
 
Some remarks about the information ***used*** in the model:
* *Information about only 7 defense players and 6 offense players.* These are the most frequent counts in the given tracking data. Such a model can still be used for plays with more tracked players: simply choose 7 defense and 6 offense players arbitrarily or, in a more complex variant, average the predictions over all possible combinations of 7+6 players from the tracked players. This is easier than building a model with more players, which would have to be trained on partially missing data.
* *Speed characteristics of players, "s75" and "s99".* The idea was that the first represents running speed and the second represents the ability to accelerate in a burst (the "afterburner"). These are per-player speed statistics computed from all the given tracking data: the 0.75-quantile and the 0.99-quantile of speed. They turned out to be very important for making predictions.
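The "s75" and "s99" statistics above could be computed roughly as follows. This is a sketch, assuming the tracking data has been reduced to two parallel sequences of player ids and speeds (for example the `nflId` and `s` columns); the function name `speed_profile` is hypothetical.

```python
from collections import defaultdict

import numpy as np


def speed_profile(player_ids, speeds):
    """Return {player_id: (s75, s99)} from per-frame speed samples."""
    by_player = defaultdict(list)
    for pid, speed in zip(player_ids, speeds):
        by_player[pid].append(speed)
    return {
        pid: (float(np.quantile(vals, 0.75)), float(np.quantile(vals, 0.99)))
        for pid, vals in by_player.items()
    }


# Tiny made-up sample: one player with five speed readings (yards/second).
profile = speed_profile([1, 1, 1, 1, 1], [0.0, 1.0, 2.0, 3.0, 4.0])
s75, s99 = profile[1]
```

With realistic data each player has thousands of samples, so the 0.99-quantile is a reasonably stable estimate of top speed rather than a single outlier frame.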

What information ***is not used*** in the model:
* *Football rules in any explicit form.*
* *Information about the down.* This information is actually very important. But experiments showed that including the down leads to severe overfitting (without regularization) in the weights of the connections from the corresponding inputs. Instead, it is more practical to train 3 separate models for downs 1, 2 and 3 (the 4th down turned out to be very different).
* *Players' positions.* This decision seems to have more benefits than drawbacks. The pros are: no difficulties with encoding positions, lower-dimensional input, a simpler model, faster training, and less tendency to overfit. The con is that some potentially useful information is discarded.
* *Possible player actions other than movement.* The model does not know what a player will do when he gets close to a player of the opposing team. This is left to the player's discretion.


# 6. Training of the model
### 6.1 Training approach ###
The training can be described as stochastic batch gradient descent. The trick is that the batch consists of all the data chosen for training, but the data is shuffled before each iteration by permuting the players. This trick provides 7! * 6! &asymp; 3.6 million times more training data, and such shuffling makes the trained model independent of player order.
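The player shuffling can be sketched as below. The array layout (samples, players, features) and the name `shuffle_players` are assumptions made for this illustration, not the actual training code.

```python
import math

import numpy as np


def shuffle_players(defense, offense, rng):
    """Independently permute the player slots of every sample.

    defense: array of shape (n_samples, 7, n_features)
    offense: array of shape (n_samples, 6, n_features)
    """
    def permute(block):
        n_samples, n_players, _ = block.shape
        # argsort of random numbers gives one random permutation per sample
        order = np.argsort(rng.random((n_samples, n_players)), axis=1)
        return np.take_along_axis(block, order[:, :, None], axis=1)

    return permute(defense), permute(offense)


rng = np.random.default_rng(0)
defense = np.arange(2 * 7 * 3, dtype=float).reshape(2, 7, 3)
offense = np.arange(2 * 6 * 3, dtype=float).reshape(2, 6, 3)
d_shuf, o_shuf = shuffle_players(defense, offense, rng)

# Number of distinct orderings of 7 defenders and 6 offensive players.
n_orderings = math.factorial(7) * math.factorial(6)
print(n_orderings)  # 3628800
```

Each player's feature vector stays intact; only the input slot it is fed into changes from one iteration to the next.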

### 6.2 Data splitting ###
All available data was split into 3 parts:
* *Test data* - data from games of weeks 14-17. This data was used only after training was finished, to confirm that the trained model actually *learned something about the game* and has prediction accuracy better than random guessing.
* *Validation data* - data from games of weeks 3, 6 and 11 (chosen randomly). This data was used to monitor training, to stop early and to prevent overfitting.
* *Training data* - data from games of the remaining weeks. This data was actually used for calculating gradients and updating the network weights.
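The split above is by week rather than by random sample, so that whole games stay in a single part. A minimal sketch, assuming a 17-week regular season (the function name `split_name` is hypothetical):

```python
TEST_WEEKS = {14, 15, 16, 17}
VALIDATION_WEEKS = {3, 6, 11}


def split_name(week: int) -> str:
    """Map a week number to the data part it belongs to."""
    if week in TEST_WEEKS:
        return "test"
    if week in VALIDATION_WEEKS:
        return "validation"
    return "train"


splits = {"train": [], "validation": [], "test": []}
for week in range(1, 18):  # weeks 1..17
    splits[split_name(week)].append(week)

print(splits["train"])  # [1, 2, 4, 5, 7, 8, 9, 10, 12, 13]
```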

### 6.3 Training history ###
A typical plot of the training history is shown in the figure below. There is a noticeable difference between the training and validation errors because the training and validation data sets come from completely different games. Nevertheless, both decrease during training without signs of overfitting, which indicates that the model is able to generalize.
<img src="http://artemefimov.ru/nfl/training_curve.png">

# 7. Analysis of trained model
After obtaining a trained model of the *state-value function*, we naturally want to check it for correctness. The traditional RL approach is to implement an action-selection *policy* for the *agent*, put the agent in the *environment*, and observe whether the results improve. Obviously, this is not an option in this work.

The other option is to analyze what the trained model tells us and check whether it is consistent with common sense, bearing in mind that this model is simple, has limitations and is not highly accurate.

### 7.1 Prediction error depending on time from start of play ###
This dependence is shown in the figure below. The maximum uncertainty occurs at around 40% of the total play time, which seems to be the average moment just before the quarterback passes. After that, uncertainty decreases as the situation becomes clearer. The increase in uncertainty after around 80% of the total play time seems to be due to possible inaccuracy of the pass, since it is unknown whether the pass is complete until the ball reaches the receiver. So on the whole this figure looks plausible.
<img src="http://artemefimov.ru/nfl/prediction_error_per_play_time.png">
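A curve of this kind can be produced by binning per-frame errors by the fraction of play time elapsed. The sketch below assumes absolute error as the metric and ten bins; both are choices made for illustration, and `error_by_play_progress` is a hypothetical name.

```python
import numpy as np


def error_by_play_progress(progress, abs_error, n_bins=10):
    """Mean absolute error per bin of normalized play time.

    progress: fraction of play time elapsed for each frame, in [0, 1]
    abs_error: |prediction - actual play result| for each frame
    Bins with no frames are returned as NaN.
    """
    progress = np.asarray(progress, dtype=float)
    abs_error = np.asarray(abs_error, dtype=float)
    # progress == 1.0 is folded into the last bin
    bins = np.minimum((progress * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins, weights=abs_error, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    return np.divide(sums, counts, out=np.full(n_bins, np.nan), where=counts > 0)


# Made-up numbers, only to show the call.
curve = error_by_play_progress([0.0, 0.05, 0.5, 1.0], [1.0, 3.0, 2.0, 4.0])
```

Normalizing time per play makes plays of different duration comparable on the same axis.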

### 7.2 Prediction during play ###
It is interesting to analyze how the prediction changes during a play. Obviously, this can be done only for a specific play.

The video below shows how the prediction evolves during a play between the LA Rams and the NO Saints that ended with a touchdown.
<p align="left">
<video src="http://artemefimov.ru/nfl/3460out.mp4" width="859" height="508" autoplay controls preload />
</p>

What is very interesting to observe:
* The prediction begins to increase &asymp;0.5 sec ***before*** the pass is actually thrown.
* At the moment the pass is received, the prediction is already 9.01 yards, while the receiver is actually only 5 yards from the line of scrimmage.
* After the pass is received, the prediction continues to increase rapidly until the very end of the play.

The barplots on the right side of the video illustrate an additional opportunity provided by using neural networks as models: the possibility of determining the influence of every input variable on the prediction. These influences are defined by the partial derivatives of the model output with respect to the model inputs. They do not seem to carry much meaning in this particular play; this is just an illustration of a possible visualization.
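Such partial derivatives can be approximated numerically for any model that maps an input vector to a scalar. The sketch below uses central finite differences on a toy function; it is a generic stand-in, not the method used to produce the barplots (a neural network framework would normally obtain the same quantities by backpropagation). The names `input_sensitivities` and `toy_model` are hypothetical.

```python
import numpy as np


def input_sensitivities(model, x, eps=1e-4):
    """Approximate d(model)/d(x_i) for every input by central differences."""
    x = np.asarray(x, dtype=float)
    grads = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grads[i] = (model(x + step) - model(x - step)) / (2.0 * eps)
    return grads


def toy_model(x):
    """Toy scalar model standing in for the trained network."""
    return float(np.tanh(2.0 * x[0] - 0.5 * x[1]))


grads = input_sensitivities(toy_model, [0.0, 0.0])
# At the origin tanh has slope 1, so the result is close to [2.0, -0.5].
```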

# 8. Words of warning
When dealing with historical data, one should be careful in interpreting the results, no matter what models and approaches are used. Any conclusions should be verified in practice. Experiment is the criterion of truth in the scientific method.

In the language of statistics this reads: ["Correlation does not imply causation"](https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation). Every correlation might have multiple explanations.

In RL this shows up in the fact that the state-value function changes when the agent changes its policy of taking actions. In the limit, it converges to some optimal state-value function. Until then, the state-value function has limited predictive capabilities.

# 9. Bonus. Track clustering
At the beginning of the work I was thinking about the question "What schemes does the defense employ?". To answer this question I developed an algorithm that clusters players' tracks.

The video below shows the progress of track clustering. Linebackers' tracks are used as an example, plotted relative to the initial ball position at (0;0). Only the first 5 seconds of each track are used, because the pass very often happens after 5 seconds of play, and it seems there should be a big difference between tracks before and after the pass.
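The preprocessing described above (ball-relative coordinates, first 5 seconds only) can be sketched as follows. The 10 frames-per-second sampling rate, the function names, and the mean point-wise distance used to compare two tracks are assumptions for illustration; the clustering algorithm itself is not reproduced here.

```python
import numpy as np

FRAMES_PER_SECOND = 10  # assumed tracking sample rate


def prepare_track(xy, ball_xy, seconds=5.0):
    """Keep the first `seconds` of a track and shift it so the ball starts at (0, 0)."""
    xy = np.asarray(xy, dtype=float)
    n_frames = int(seconds * FRAMES_PER_SECOND)
    return xy[:n_frames] - np.asarray(ball_xy, dtype=float)


def track_distance(a, b):
    """Mean Euclidean distance between two tracks over their common frames."""
    n = min(len(a), len(b))
    return float(np.mean(np.linalg.norm(a[:n] - b[:n], axis=1)))


# Made-up track: 80 frames standing still at (30, 20), ball at (25, 26).
raw = np.tile([30.0, 20.0], (80, 1))
track = prepare_track(raw, ball_xy=(25.0, 26.0))
```

A dissimilarity such as `track_distance` is one possible ingredient for grouping similar tracks; other choices would produce different clusters.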

The conclusion is that even when dealing with linebackers' tracks only, the number of clusters remains large. If we multiply this number by roughly the same number of clusters for each of the other positions, we get an unmanageable number of variants to deal with. Decreasing the number of clusters would result in a significant loss of information, which is unacceptable.
<p align="left">
    <video src="http://artemefimov.ru/nfl/LB.mp4" width="800" autoplay controls preload />
</p>


# 10. Bonus 2. Play results vs Defense personnel
At the very beginning of the work I checked some correlations and made some interesting findings (even without using tracking data). They are shown in the figure below.

<img src="http://artemefimov.ru/nfl/confidence_intervals.png">

# 11. What AI assistance might look like
To conclude, I would like to share my vision of what AI assistance in football might look like.

This sketch was inspired by the very first game of the 2018 regular season.

<img src="http://artemefimov.ru/nfl/AI_assistance.png">

## p.s.
All media from this submission are also available in [this dataset](https://www.kaggle.com/arteme/media-of-my-submission-for-nfl-big-data-bowl-2021).