# AI-Based Evaluation of NFL Pass Defenders
***NOTE: To comply with competition rules, this report was condensed to 2,000 words and 7 visualizations. Some additional technical details, discussions, and player evaluations were cut from this condensed version. See our [full report](https://www.kaggle.com/chipkajb/ai-based-evaluation-of-nfl-pass-defenders-full) for the additional information. Also, see our [GitHub repo](https://github.com/chipkajb/nfl_big_data_bowl_2021) to access the code used in our analysis.***

In [None]:
import os, sys
sys.path.append('../usr/lib/visualization')
from visualization import *

# 1. Summary

This report presents a novel approach to evaluating NFL pass defenders by leveraging AI-based techniques. Current evaluation techniques fail to take many aspects of pass defense into account. Our evaluation methodology seeks to resolve this difficulty by taking all relevant skills into consideration, including tracking, pass breakup, true coverage, shutdown, takeaway, and playmaking abilities.

We design and train a deep neural network using player tracking data to estimate pass completion probability. We develop three new advanced metrics using our AI model to more accurately evaluate a player's pass coverage ability. Using these three metrics, as well as three other existing metrics, we evaluate all pass defenders in the 2018 NFL regular season. Finally, we discuss possible applications of our work.

# 2. Motivation

In football, it is uniquely difficult to accurately quantify a player's contribution to their team's overall success. For instance, the play shown below demonstrates the elite quickness and agility of Darius Slay. However, on the score sheet, this play is simply marked as an incompletion. The difficulty of the play, or the raw ability required to make the play, is never quantified.

<img src="https://raw.githubusercontent.com/chipkajb/miscellaneous/master/nfl_big_data_bowl_2021/slay_video.gif">

In this report, we address the challenge of evaluating a pass defender's ability by first quantifying the difficulty of defending a given pass, and then quantifying the player's performance given the specific play's conditions. In the following sections, we go into the details of our methodology.

# 3. AI Model

Our analysis centers on the use of an AI model to estimate pass completion probability. This model is a deep neural network that we design, train, calibrate, and validate from scratch.

The first step to developing this model is to understand the data that is available to us. We use player tracking data from the 2018 NFL regular season to perform our analysis. Below is an animation showing the player tracking data for the previous play involving Darius Slay.

<img src="https://raw.githubusercontent.com/chipkajb/miscellaneous/master/nfl_big_data_bowl_2021/slay_animation.gif">

From this animation, we can see several factors that contribute to the pass result. Using this insight, we decide to use the following parameters to develop our AI model.

1. Pass distance down field (longitudinal)
2. Pass distance across field (lateral)
3. Proximity of target receiver to nearest defender when ball arrives
4. Proximity of ball to target receiver when ball arrives
5. Proximity of passer to nearest blitzer when ball is thrown
6. Proximity of target receiver to sideline when ball arrives
7. Passer's running speed when ball is thrown
8. Time it takes from when ball is snapped to when ball is thrown

For additional comments on these input parameters, see [section 7.1 of the full report](https://www.kaggle.com/chipkajb/ai-based-evaluation-of-nfl-pass-defenders-full#7.1-Comments-on-input-parameters-to-AI-model).
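To make parameters 3 and 6 concrete, here is a minimal sketch of how they could be derived from tracking coordinates. The function names and sample positions are hypothetical, and the sketch assumes `(x, y)` positions in yards with `y` measured across a 53.3-yard-wide field; it is not the feature-extraction code from our repository.

```python
import math

FIELD_WIDTH_YARDS = 53.3  # regulation NFL field width (160 feet)

def nearest_defender_distance(receiver_xy, defender_xys):
    """Distance in yards from the receiver to the closest defender."""
    return min(math.dist(receiver_xy, d) for d in defender_xys)

def sideline_distance(receiver_xy):
    """Distance in yards from the receiver to the closer sideline."""
    y = receiver_xy[1]
    return min(y, FIELD_WIDTH_YARDS - y)

# Hypothetical positions at the moment the ball arrives
receiver = (60.0, 10.0)
defenders = [(60.0, 13.0), (70.0, 30.0)]

separation = nearest_defender_distance(receiver, defenders)
to_sideline = sideline_distance(receiver)
print(f"WR-DB separation: {separation:.2f} yd, WR-sideline: {to_sideline:.2f} yd")
```

The remaining parameters follow the same pattern: pick the relevant frame (ball thrown or ball arrives) and measure a distance, speed, or elapsed time.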

### 3.1 Data collection

To train the AI model, we first sift through the player tracking data to generate a new dataset that includes the eight parameters listed above, as well as other metadata such as play result and route. The dataset comprises nearly 18,000 passes and is split into a training set and a validation set. For additional details regarding the training process, see [section 7.2 of the full report](https://www.kaggle.com/chipkajb/ai-based-evaluation-of-nfl-pass-defenders-full#7.2-Training-the-AI-model).

### 3.2 Calibration

A common issue in deep learning is that neural networks tend to be over-confident in their estimation of probabilities. Therefore, in order for our AI model to more accurately reflect realistic probabilities, we calibrated our model using [temperature scaling](https://arxiv.org/pdf/1706.04599.pdf). This calibration process involved tuning our AI model to most accurately reflect the actual pass completion percentages seen in the validation dataset.
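The idea behind temperature scaling can be sketched in a few lines. This is an illustration under stated assumptions rather than our actual calibration code: it treats the network as emitting a single logit per pass, applies `sigmoid(logit / T)`, and picks `T` with a coarse grid search (the referenced paper optimizes it with a gradient-based method). The logits and labels are toy values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nll(logits, labels, temperature):
    """Mean negative log-likelihood of binary labels under sigmoid(logit / T)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z / temperature)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)

def fit_temperature(logits, labels):
    """Pick the T on a coarse grid (0.5 to 5.0) that minimizes validation NLL."""
    grid = [0.5 + 0.05 * i for i in range(91)]
    return min(grid, key=lambda t: nll(logits, labels, t))

# Toy validation set: confident logits, two of which are wrong (over-confidence)
val_logits = [4.0, 3.5, -4.0, 3.0, -3.5, 4.5, -3.0, 2.5]
val_labels = [1, 0, 0, 1, 1, 1, 0, 1]

T = fit_temperature(val_logits, val_labels)
calibrated = [sigmoid(z / T) for z in val_logits]
print(f"fitted temperature: {T:.2f}")
```

A fitted temperature above 1 softens the predicted probabilities toward 50%, which is the expected correction for an over-confident model. Because dividing by a positive `T` never changes the sign of a logit, the predicted outcome of each pass is unchanged; only the confidence is adjusted.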

### 3.3 Validation

The final stage of development for the AI model is to evaluate its performance. Our model correctly predicts the outcome of 80% of all passes in the dataset. More importantly, however, our model accurately predicts the pass completion probability, as shown in the figure below. This figure is generated by using our model to predict the completion probability for all passes in the dataset. These predictions are then grouped by predicted pass completion probability (e.g. 0-10%, 10-20%, etc.). Then, for each group, the actual completion percentage for those passes is calculated based on the plays' results. Finally, the actual completion percentage is plotted against the model's predicted completion probability. A model that matches reality will form a diagonal line. As shown in the figure below, our AI model predicts the actual pass completion probability with a high degree of accuracy.

<img src="https://github.com/chipkajb/miscellaneous/raw/master/nfl_big_data_bowl_2021/calibration_graph.png">
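The binning procedure behind this figure can be sketched as follows. The function name and the toy predictions are hypothetical; the sketch only illustrates the grouping step described above, using equal-width probability bins.

```python
def reliability_bins(pred_probs, outcomes, n_bins=10):
    """Return (mean predicted prob, actual completion rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(pred_probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            actual = sum(y for _, y in members) / len(members)
            rows.append((mean_pred, actual, len(members)))
    return rows

# Toy data: predicted completion probabilities and observed results (1 = complete)
preds = [0.05, 0.15, 0.15, 0.85, 0.85, 0.85, 0.85, 0.95]
completed = [0, 0, 1, 1, 1, 1, 0, 1]

rows = reliability_bins(preds, completed)
for mean_pred, actual, count in rows:
    print(f"predicted {mean_pred:.2f}  actual {actual:.2f}  (n={count})")
```

Plotting the second column against the first gives the calibration curve; points near the diagonal indicate that predicted and observed completion rates agree.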

### 3.4 Testing

To better understand our model, we examine the amount that each input parameter contributes to the model's output. To do this, we find the average value ($\mu$) and standard deviation ($\sigma$) of all eight input parameters. Then, while holding all other parameters constant at their average value, we vary one parameter by one standard deviation. We do this for all input parameters and compare the resulting completion probabilities to determine the amount each parameter contributes to the model's output. The results of this exercise are summarized below.

1. Longitudinal pass distance: $\mu=14.6$, $\sigma=10.4$, contribution: $6.1\%$
2. Lateral pass distance: $\mu=11.1$, $\sigma=7.1$, contribution: $27.6\%$
3. WR-DB separation: $\mu=3.8$, $\sigma=3.1$, contribution: $32.3\%$
4. WR-ball proximity: $\mu=1.4$, $\sigma=1.3$, contribution: $3.7\%$
5. QB-blitzer proximity: $\mu=6.4$, $\sigma=3.3$, contribution: $1.0\%$
6. WR-sideline proximity: $\mu=13.8$, $\sigma=7.5$, contribution: $21.6\%$
7. QB speed: $\mu=1.8$, $\sigma=1.7$, contribution: $2.8\%$
8. Time to throw: $\mu=2.8$, $\sigma=1.0$, contribution: $5.0\%$

From these results, we see that the greatest factor in determining the result of a pass is the WR-DB separation. This emphasizes the importance of a defender's tracking ability and reinforces the significance of our analysis. For additional remarks regarding this exercise, see [section 7.3 of the full report](https://www.kaggle.com/chipkajb/ai-based-evaluation-of-nfl-pass-defenders-full#7.3-Remarks-on-AI-model-testing-exercise).
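The one-at-a-time exercise described in this section can be sketched as below. Several details are assumptions made for illustration: `toy_model` is a hand-written logistic function standing in for the trained network, each input is shifted upward by one standard deviation, and contributions are taken as absolute changes in output normalized to sum to 1. The exact perturbation and normalization used for the numbers above may differ.

```python
import math

def toy_model(x):
    """Stand-in for the trained network: a hand-written logistic function."""
    z = 0.5 + 0.30 * x["separation"] - 0.04 * x["pass_distance"]
    return 1.0 / (1.0 + math.exp(-z))

def contributions(model, means, stds):
    """Share of total output change attributable to each input (sums to 1)."""
    baseline = model(means)
    deltas = {}
    for name in means:
        shifted = dict(means)
        shifted[name] = means[name] + stds[name]  # perturb one input by one sigma
        deltas[name] = abs(model(shifted) - baseline)
    total = sum(deltas.values())
    return {name: d / total for name, d in deltas.items()}

# Means and standard deviations for two of the eight inputs, taken from the list above
means = {"separation": 3.8, "pass_distance": 14.6}
stds = {"separation": 3.1, "pass_distance": 10.4}

shares = contributions(toy_model, means, stds)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Because every other input stays at its mean, this kind of analysis ignores interactions between inputs, so the shares are best read as a rough ranking rather than an exact decomposition.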

# 4. Key Metrics

Our analysis uses six key metrics to evaluate a player's pass coverage ability. Each of these metrics measures a specific aspect of pass defense. When considered in combination, these metrics give a more detailed view of a player's true pass defense ability. The first three metrics are existing measures commonly used for player evaluation. The last three metrics are new, innovative measures that we pioneer using our AI model.

### 4.1 Definitions

1. Incompletion Rate (INC Rate) is the percentage of passes that result in an incompletion when targeting a given defender. This metric measures a player's "shutdown" ability.

2. Interception Rate (INT Rate) is the percentage of passes that result in an interception when targeting a given defender. This metric measures a player's "takeaway" ability.

3. Expected Points Added (EPA) is the expected points (EP) that a given play contributes. In other words, it is a way to measure the value of an individual play in terms of points. Since our analysis focuses on pass defense, we award the play's EPA to the targeted pass defender. In our analysis, we use this metric to measure a defender's "playmaking" ability.

4. Expected Incompletion Rate (EIR) is the percentage of passes that our AI model predicts will result in an incompletion when targeting a given defender. Since the only input parameter of our AI model that the pass defender can control is the WR-DB separation, this metric measures the player's "tracking" ability.

5. Incompletion Rate Above Expectation (IRAE) is the difference between the actual incompletion rate (INC Rate) and the expected incompletion rate (EIR), i.e., INC Rate minus EIR. Since EIR is produced by our validated AI model and takes into consideration the conditions of each individual pass, we can expect it to be an accurate estimate of the actual incompletion rate. Discrepancies between INC Rate and EIR may stem from inaccuracies in the AI model or from unforeseen events like drops or spectacular catches, but these are unlikely to occur consistently, as our AI model has been shown to accurately predict pass completion probability overall. Consistent discrepancies between INC Rate and EIR for a given defender therefore indicate that the main contributing factor is most likely the way the defender plays once the ball reaches the receiver. For this reason, we use this metric to measure a player's "pass breakup" ability.

6. Incompletion Probability Added (IPA) is the incompletion probability that a defender contributes on a given play above the "average defender" according to our AI model. To compute it, we first use the AI model to calculate the completion probability for a given pass. We then replace the WR-DB separation value with the median WR-DB separation for all plays in the dataset with the same route and compute a new completion probability. The defender's IPA for that play is the incompletion probability with the actual separation minus the incompletion probability with the median separation. In other words, it is the additional probability of an incompletion that a defender contributes above a defender with median tracking performance, while all other aspects of the play are held constant. This metric is inspired by the Wins Above Replacement (WAR) metric in baseball, which attempts to measure how many more wins a player is worth than a replacement-level player. It is also inspired by True Shooting Percentage (TS%) in basketball, which measures shooting efficiency while recognizing that not all shot attempts are equivalent (two-pointers, three-pointers, and free throws carry different values) and therefore should not be weighted equally. Similarly, IPA captures the fact that not all passes are equally difficult to defend, so the specific conditions of each pass ought to be considered. Therefore, IPA is used to measure a player's "true coverage" ability.
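The three new metrics can be aggregated per defender along these lines. This is a sketch under assumptions, not our evaluation code: the dictionary keys are hypothetical, EIR is taken as the mean predicted incompletion probability over a defender's targets, a defender's IPA is taken as the mean of the per-play IPA values, and the numbers are made up.

```python
def defender_metrics(plays):
    """Aggregate per-play records into INC Rate, EIR, IRAE, and mean IPA.

    Each play is a dict with hypothetical keys:
      incomplete       - 1 if the pass was recorded as incomplete, else 0
      p_incomplete     - model incompletion probability with the actual separation
      p_incomplete_med - same, with separation replaced by the route's median
    """
    n = len(plays)
    inc_rate = sum(p["incomplete"] for p in plays) / n
    eir = sum(p["p_incomplete"] for p in plays) / n
    irae = inc_rate - eir  # actual minus expected
    ipa = sum(p["p_incomplete"] - p["p_incomplete_med"] for p in plays) / n
    return {"INC Rate": inc_rate, "EIR": eir, "IRAE": irae, "IPA": ipa}

# Made-up targets for a single defender
plays = [
    {"incomplete": 1, "p_incomplete": 0.60, "p_incomplete_med": 0.50},
    {"incomplete": 0, "p_incomplete": 0.40, "p_incomplete_med": 0.35},
    {"incomplete": 1, "p_incomplete": 0.50, "p_incomplete_med": 0.45},
    {"incomplete": 1, "p_incomplete": 0.30, "p_incomplete_med": 0.30},
]

metrics = defender_metrics(plays)
for name, value in metrics.items():
    print(f"{name}: {value:+.3f}")
```

In this toy case the defender forces more incompletions than the model expects (positive IRAE) and also holds receivers slightly tighter than the median defender on the same routes (positive IPA).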

### 4.2 Example

To better understand these new metrics, let's look at an example. Shown below is a sample from our dataset for the previous play involving Darius Slay. On this play, there was just $0.32$ yards of separation between the receiver and Darius Slay.

In [None]:
slay_play_data = get_play_data(week=14, gameId=2018120911, playId=1584)
print(slay_play_data)

When this data is fed into our model, we see that the predicted pass incompletion probability is $84.4\%$.

In [None]:
comp_prob, incomp_prob = run_model(slay_play_data)
print_probabilities(comp_prob, incomp_prob)

However, when we use the data from the same play but replace Darius Slay's WR-DB separation with the median value for all HITCH routes run in 2018, we see that the predicted pass incompletion probability drops to $30.9\%$. Therefore, Darius Slay's Incompletion Probability Added (IPA) for that specific play is $84.4\% - 30.9\% = 53.5\%$. In other words, the probability that this specific pass falls incomplete is $53.5$ percentage points higher than it would be against the median NFL defender, because of Darius Slay's performance.

In [None]:
slay_play_data["WR-DB proximity"] = 2.40 # median WR-DB separation for HITCH routes
comp_prob, incomp_prob = run_model(slay_play_data)
print_probabilities(comp_prob, incomp_prob)

# 5. Player Evaluation

Using our AI model to predict pass completion probabilities, we gain a more detailed view of a player's pass defense ability. The following three figures explore how specific players measure with respect to our metrics. In each figure, we highlight the players that excel in each aspect of pass defense.

In [None]:
import pandas as pd  # pd is used below, so import it explicitly
scores_df = pd.read_csv('../input/nfl-bdb-data/cb_scores.csv')
plot_playmaking_skills(scores_df)

In [None]:
plot_coverage_skills(scores_df)

In [None]:
plot_ball_skills(scores_df)

In the table below, we rank the top 30 Cornerbacks who were targeted at least 45 times throughout the 2018 season. The table shows each player's rating in each of the six key categories (with their rank for each category in parentheses). The Raw Score is the sum of the player's rankings for the six metrics, so a lower Raw Score is better. Finally, the Overall Score is a normalized version of the Raw Score that is then put on a scale from 0-100. Players who were selected to the Pro Bowl that year are marked with an asterisk.

In [None]:
rankings = get_final_cb_rankings_table(n_players=30)
rankings
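The Raw Score and Overall Score described above can be sketched as follows. The normalization shown here is an assumption (min-max scaling, inverted so that the lowest Raw Score maps to 100 and the highest to 0); the exact scaling used in `get_final_cb_rankings_table` may differ, and the player names and ranks are placeholders.

```python
def raw_scores(rank_table):
    """Sum each player's per-metric ranks (1 = best), so lower is better."""
    return {player: sum(ranks) for player, ranks in rank_table.items()}

def overall_scores(raw):
    """Assumed min-max scaling: lowest raw score -> 100, highest -> 0."""
    best, worst = min(raw.values()), max(raw.values())
    span = worst - best
    if span == 0:
        return {player: 100.0 for player in raw}
    return {player: 100.0 * (worst - r) / span for player, r in raw.items()}

# Placeholder ranks in the six metrics for three hypothetical players
rank_table = {
    "Player A": [1, 2, 1, 3, 2, 1],
    "Player B": [2, 1, 3, 2, 3, 3],
    "Player C": [3, 3, 2, 1, 1, 2],
}

raw = raw_scores(rank_table)
overall = overall_scores(raw)
for player in sorted(overall, key=overall.get, reverse=True):
    print(f"{player}: raw={raw[player]}, overall={overall[player]:.1f}")
```

Summing ranks weights all six metrics equally, which keeps the score simple but means that a large gap in one metric counts no more than a narrow one.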

# 6. Application

Our analysis provides a detailed look at a player's ability to defend the pass by leveraging AI. Although we focused this report on evaluating Cornerbacks, our analysis can be applied to any player in pass coverage (see [sections 7.4 and 7.5 of the full report](https://www.kaggle.com/chipkajb/ai-based-evaluation-of-nfl-pass-defenders-full#7.4-Top-coverage-Safeties) for Safety and Linebacker coverage rankings). This is especially valuable considering the lack of useful evaluation techniques for Linebackers in pass coverage.

The most obvious application is player evaluation. Using our methodology, we get a detailed and accurate measure of a player's pass coverage performance, which can be used as a tool for grading players and selecting All-Pro teams. Additionally, Front Office members could use our analysis to inform decisions about contract negotiations, free agency, and trades. It is also useful for player development, as our analysis shows the specific aspects of pass defense in which a player requires the most improvement. Finally, our analysis could be used to bolster coaching. If a coach understands the specific aspects in which a player excels or underperforms in pass coverage, then that can inform the coach's decisions regarding practice drills, player matchups, game situations, and even play-calling.

For a discussion on the limitations of our analysis, see [section 7.6 of the full report](https://www.kaggle.com/chipkajb/ai-based-evaluation-of-nfl-pass-defenders-full#7.6-Limitations-of-analysis).