<a id="introduction"></a>
# Introduction
In contrast to the run-heavy, smash-mouth style of past eras, passing is the dominant force in modern NFL offenses.  Defenses have lagged behind, with the average points scored per team-game steadily rising from 18.7 in 1992 to 24.8 in 2020, per Pro Football Reference.  So what is the best way to defend the pass?  Defensive coordinators run a variety of defensive schemes; some are zone-heavy, others are man-heavy.  Even among zone and man coverages, which ones are most effective?  **Our analysis indicates that Cover 4 Zone is best as measured by EPA, especially on 3rd & long.**

We begin our analysis by training a convolutional neural network (CNN), which we dub CoverNet, to identify coverages based on "images" of the paths of defensive and offensive players throughout the play.  CoverNet has **71% accuracy** for identifying the specific coverage and **85% accuracy** for identifying man vs. zone coverage.  We use CoverNet to impute the defensive playcall for (nearly) all plays in the dataset.  Finally, we use a decision tree approach to adjust EPA for biases and determine which coverages, if any, have superior average EPA.  

Our analysis yields the following:  
* Cover 4 Zone has 0.058 lower average adjusted EPA compared to all other plays *(significant at the 5% level)*.  
* When conditioning on 3rd & 7+ yardage, Cover 4 Zone has 0.117 lower adjusted EPA. 

We also dive deeper into mechanisms, and find that this is primarily due to Cover 4 Zone's strong performance against deep passes.  Our results highlight the primacy of the deep ball, and the pivotal role of preventing explosive plays. 




# CoverNet Accuracy
We report CoverNet's performance when predicting the specific defensive playcall and classifying man vs. zone coverage.  When extracting man or zone classifications from CoverNet, we label Cover 0/1/2 Man as man coverage, and Cover 2/3/4/6 Zone as zone coverage.  CoverNet has 71% test accuracy when classifying specific coverage playcalls and, despite not being directly trained for the task, 85% accuracy when classifying man vs. zone.
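
The man vs. zone grouping described above can be sketched as a simple label mapping. This is a minimal illustration, not CoverNet's actual code: the helper names `MAN_OR_ZONE`, `to_man_or_zone`, and `man_zone_accuracy` are hypothetical, and the toy labels are made up.

```python
# Coverage names follow the text; the mapping and helper names are illustrative.
MAN_OR_ZONE = {
    "Cover 0 Man": "man",
    "Cover 1 Man": "man",
    "Cover 2 Man": "man",
    "Cover 2 Zone": "zone",
    "Cover 3 Zone": "zone",
    "Cover 4 Zone": "zone",
    "Cover 6 Zone": "zone",
}


def to_man_or_zone(coverage):
    """Collapse a specific coverage label into 'man' or 'zone'."""
    return MAN_OR_ZONE[coverage]


def man_zone_accuracy(true_labels, predicted_labels):
    """Share of plays whose collapsed prediction matches the collapsed truth."""
    pairs = list(zip(true_labels, predicted_labels))
    hits = sum(to_man_or_zone(t) == to_man_or_zone(p) for t, p in pairs)
    return hits / len(pairs)


# Toy labels, invented for illustration only.
truth = ["Cover 1 Man", "Cover 3 Zone", "Cover 4 Zone", "Cover 2 Man"]
preds = ["Cover 3 Zone", "Cover 3 Zone", "Cover 6 Zone", "Cover 1 Man"]
accuracy = man_zone_accuracy(truth, preds)
```

In this toy example only one of the four specific coverages is predicted exactly, yet three of the four plays land in the right man or zone group, which is why the collapsed accuracy can exceed the specific-coverage accuracy.
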
> CoverNet's most common error is misclassifying Cover 1 Man as Cover 3 Zone.  We speculate that Cover 1 Man and Cover 3 Zone may look similar when outside receivers run deep sideline routes.  In Cover 1 Man, outside corners will follow these receivers deep, and both plays have one high safety.  Therefore, both plays often have three deep defensive backs.  Because Cover 3 Zone is more common than Cover 1 Man, the network prefers to shade predictions towards Cover 3 Zone when the play is ambiguous. 
        

## Specific Coverage Confusion Matrix

In [None]:
from IPython.display import Image
Image("../input/convnetfig/conf_playcall.png")

## Man vs. Zone Confusion Matrix

In [None]:
from IPython.display import Image
Image("../input/convnetfig/conf_mz.png")

<a id="track_image"></a>
# Tracking Data as Images
CNNs are primarily used to classify image and video data, so classifying football plays is a natural application.  Plays, as represented in football playbooks, diagram the expected paths and locations of players on the field.  Thus, images of the paths of players should be highly diagrammatic as well and should be natural inputs to a CNN classifier.  Another advantage of using a CNN is that we do not need to engineer features, as the convolutional layers will automatically learn informative features to feed into the fully connected layers of the network.     

Before converting the tracking data to images, we first clean and standardize them to improve the model's generalization:
- Keep up to 3 seconds of play post-snap to mitigate effects of QB scrambles and broken plays
- Stop plays after a "pass-ending event" such as a pass thrown, a sack, or a penalty, among others
- Remove plays that are too short (less than 0.5 seconds long)
- Normalize the orientation of all plays with respect to the position of the football before the snap
    * Center plays on the pre-snap position of the football
    * Flip some plays horizontally so that all plays are going "right"
    * Flip some plays vertically so the side with more receivers is always towards the "top" of the field
- Standardize the "size" of the play images
    * Scale too-large plays to fit within [-15, 30] and [-27, 27] yards of the pre-snap football location
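
The spatial steps in the list above (centering, flipping, and scaling) can be sketched as follows. This is a sketch under stated assumptions rather than the original preprocessing code: the function name and arguments are hypothetical, we assume [-15, 30] bounds the downfield axis and [-27, 27] the sideline-to-sideline axis, and we read "scale" as a uniform shrink about the ball position. The time-based trimming steps are not shown.

```python
# Assumed reading of the bounds in the text (yards relative to the ball).
X_MIN, X_MAX = -15.0, 30.0  # downfield axis
Y_MAX = 27.0                # sideline-to-sideline axis, symmetric about the ball


def standardize_play(points, ball, moving_left, receiver_ys):
    """Return player (x, y) samples in a common orientation.

    points: list of (x, y) tracking samples in yards.
    ball: (x, y) of the football before the snap.
    moving_left: True if the offense is moving toward decreasing x.
    receiver_ys: pre-snap y coordinate of each receiver.
    """
    bx, by = ball
    # 1. Center the play on the pre-snap ball location.
    pts = [(x - bx, y - by) for x, y in points]
    rec = [y - by for y in receiver_ys]
    # 2. Mirror horizontally so every play goes "right".
    if moving_left:
        pts = [(-x, y) for x, y in pts]
    # 3. Mirror vertically so the side with more receivers is on "top" (y > 0).
    above = sum(1 for y in rec if y > 0)
    below = sum(1 for y in rec if y < 0)
    if below > above:
        pts = [(x, -y) for x, y in pts]
    # 4. Shrink plays that spill outside the window (assumed uniform scaling).
    scale = 1.0
    for x, y in pts:
        if x > X_MAX:
            scale = min(scale, X_MAX / x)
        if x < X_MIN:
            scale = min(scale, X_MIN / x)
        if abs(y) > Y_MAX:
            scale = min(scale, Y_MAX / abs(y))
    return [(x * scale, y * scale) for x, y in pts]
```

Because the vertical flip is decided by the receiver count, mirroring only the x axis in step 2 is enough; the final orientation is the same as if the play had been rotated first.
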

Then, for each play, we convert its player position data into an image-like data format.  More specifically, we "color" a 51x55x6 tensor for each play based on player positions.
- The first and second dimensions capture the on-field location relative to the football's pre-snap location, with the field divided into 1 yard x 1 yard squares.  If a player is present in a square at any point during the play, the square is marked with a value of 1; otherwise it is marked with 0.
- The third dimension contains 6 "channels" to separately track the on-field positions of different position groups: **QB, WR, RB & TE, DB, LB, and DL**.  This is analogous to separately tracking RGB pixel values of images in 3 channels.  The motivation behind separating position groups is that the movements of different positions may have independent information that can help determine the coverage.  If we combine all the players into a single channel, the network may have difficulty keying in on which positions are most predictive of the coverage.  For example, a linebacker dropping to the deep/intermediate middle versus a safety dropping to the middle may distinguish Tampa 2 from Cover 3 Zone. A defensive lineman dropping into coverage is almost always indicative of zone coverage.    
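
The "coloring" step described above can be sketched with NumPy. This is an illustrative sketch, not the original code: the channel order, the function name `rasterize_play`, and the grid offsets `X_OFFSET` and `Y_OFFSET` are assumptions, since the text gives the tensor shape but not how field coordinates map onto grid indices.

```python
import math

import numpy as np

# Position groups from the text; the channel order is an assumption.
CHANNELS = ["QB", "WR", "RB_TE", "DB", "LB", "DL"]
GRID_SHAPE = (51, 55, len(CHANNELS))  # tensor shape stated in the text
X_OFFSET, Y_OFFSET = 15, 27           # hypothetical grid index of the ball


def rasterize_play(samples):
    """Mark every 1 yard x 1 yard square visited by each position group.

    samples: iterable of (position_group, x, y), with x and y in yards
    relative to the pre-snap ball location.
    """
    image = np.zeros(GRID_SHAPE, dtype=np.uint8)
    for group, x, y in samples:
        row = math.floor(x) + X_OFFSET
        col = math.floor(y) + Y_OFFSET
        # Samples outside the grid are ignored in this sketch.
        if 0 <= row < GRID_SHAPE[0] and 0 <= col < GRID_SHAPE[1]:
            image[row, col, CHANNELS.index(group)] = 1
    return image


# Two QB samples in the same square and one DB sample (made-up coordinates).
example = rasterize_play([("QB", -5.2, 0.3), ("QB", -5.8, 0.9), ("DB", 10.0, -3.5)])
```

Repeated visits to the same square leave the value at 1, so the tensor records where a position group has been, not how long it stayed there.
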


# Training CoverNet

CoverNet attempts to predict the specific playcall from seven options: Cover 0 Man, Cover 1 Man, Cover 2 Man, Cover 2 Zone, Cover 3 Zone, Cover 4 Zone, and Cover 6 Zone.  Because Prevent Zone playcalls are very rare (there is only one Prevent Zone label in week 1), we do not attempt to predict them.

The figure below shows CoverNet's architecture, but, briefly, it has 3 convolutional, 2 pooling, and 2 fully connected layers.  We found that a deeper network with high capacity and aggressive dropout yielded better accuracy on the validation set than shallower networks.  We train CoverNet with the pre-labeled week 1 defensive playcalls, splitting this sample into training (60%), validation (20%), and test (20%) sets.  The sample splits are randomly assigned and stratified by playcall.  We train the model on the training set, using the validation set to tune the dropout rate and early stopping.    
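
The stratified 60/20/20 split described above can be sketched in plain Python. The function name `stratified_split`, the seed, and the rounding rule are assumptions made for illustration; the original notebook may well have used a library routine instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) lists of indices, stratified by label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)  # random assignment within each playcall
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

Splitting within each playcall keeps rare coverages represented in all three sets, which matters when some labels have only a handful of examples.
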


In [None]:
from IPython.display import Image
Image("../input/convnet-fig/convnet_fig.png", width=1000)

# Adjusted EPA (aEPA)
A key statistic in analyzing the outcome of a play is EPA, or expected points added. However, comparing the effects of different coverages on EPA is not straightforward, because many endogenous factors can influence a coverage's measured effect. A key concern is how often a coverage is called in a given situation, since the predictability of a call may affect its likelihood of success and, in turn, its EPA. Our exploratory analysis shows that this concern is non-trivial: field position affects how frequently each type of coverage is called. Specifically, we found that zone coverage is called more frequently when the offense is far from the end zone, while man coverage is called more frequently in field goal range and the red zone.  To adjust EPA, we cluster plays based on pre-play information and game state: yards to the end zone, down, yards to go, quarter, offensive formation, personnel information, point differential, time remaining in the game, and whether the offense is the home team. We also chose these features because they are used by *nflFastR* to predict expected points (EP).

To perform the clustering, we estimated a separate decision tree for each team's defense to predict EPA given the game state and pre-play information. We used CART, which partitions the data into clusters based on the features and predicts the average of the response (in this case EPA) within each cluster. We then compute the residual between the actual and predicted EPA and call this new metric the adjusted EPA (aEPA). We define it formally below:
$$aEPA_{i}=EPA_{i}-\mathbb{E}[EPA|X_{i}]$$
where $\mathbb{E}[EPA|X_{i}]$ is estimated using our decision tree and $i$ indexes plays.
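
The residual in the definition above can be sketched without fitting a tree. A regression tree predicts the mean response of the plays in each leaf, so given each play's leaf assignment the aEPA is the play's EPA minus its leaf mean. The leaf assignment is assumed to come from an already fitted per-team tree; `leaf_ids` and `adjusted_epa` are hypothetical names.

```python
from collections import defaultdict


def adjusted_epa(epa, leaf_ids):
    """aEPA_i = EPA_i - mean EPA of the plays in the same leaf as play i.

    epa: list of EPA values, one per play.
    leaf_ids: leaf (cluster) label of each play, assumed to be given.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, leaf in zip(epa, leaf_ids):
        totals[leaf] += value
        counts[leaf] += 1
    # The leaf mean plays the role of E[EPA | X_i] in the formula.
    leaf_mean = {leaf: totals[leaf] / counts[leaf] for leaf in totals}
    return [value - leaf_mean[leaf] for value, leaf in zip(epa, leaf_ids)]
```

By construction the aEPA values average to zero within each leaf, so any remaining difference between coverages is measured relative to comparable game situations.
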

We give an example below, which is the decision tree for the Seattle Seahawks defense.  The top number in each blue box is the average EPA in the corresponding bucket, and the bottom number is the proportion of plays falling into the bucket.


In [None]:
from IPython.display import Image
Image("../input/seattle-tree/seattle_tree.png", width=1250)

# Evaluating Coverages
We start by evaluating each coverage's performance in terms of aEPA (omitting Cover 0 Man and Prevent Zone due to small sample sizes).  The figure below shows the average aEPA, 1st down conversion probability (Conv%), and frequency of each defensive coverage, for all plays and conditional on long yards-to-go situations.  The overall results show that Cover 4 Zone has about 0.06 lower aEPA on average than the second-best defensive coverage (the difference between the mean aEPA for Cover 4 Zone and all other coverages is significant at the 5% level).  Additionally, the 1st down conversion rate against Cover 4 Zone is 3.1% lower than against the next-best coverage.  Conditioning on long yards-to-go situations suggests that a significant proportion of Cover 4 Zone's outperformance can be attributed to 3rd & long situations, where its first down conversion percentage is 4.2% lower than that of the next-best coverage.
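
The per-coverage summary described above (average aEPA, conversion rate, and frequency, overall and for a subset of situations) can be sketched as a grouped aggregation. The record keys (`coverage`, `aepa`, `converted`, `down`, `yards_to_go`) and function names are hypothetical, and the significance test is not shown.

```python
from collections import defaultdict


def summarize_by_coverage(plays, keep=lambda play: True):
    """Mean aEPA, 1st down conversion rate, and frequency per coverage.

    plays: list of dicts with hypothetical keys 'coverage', 'aepa',
    and 'converted'. keep: predicate selecting the plays to include.
    """
    groups = defaultdict(list)
    for play in plays:
        if keep(play):
            groups[play["coverage"]].append(play)
    total = sum(len(group) for group in groups.values())

    summary = {}
    for coverage, group in groups.items():
        n = len(group)
        summary[coverage] = {
            "mean_aepa": sum(p["aepa"] for p in group) / n,
            "conv_rate": sum(p["converted"] for p in group) / n,
            "frequency": n / total,
        }
    return summary


def is_third_and_long(play):
    """3rd down with 7 or more yards to go, as in the text."""
    return play["down"] == 3 and play["yards_to_go"] >= 7
```

Calling `summarize_by_coverage(plays)` gives the overall table, and `summarize_by_coverage(plays, keep=is_third_and_long)` gives the conditional one.
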

In [None]:
from IPython.display import Image
Image("../input/heatmaps/overall_and_down.png", width=1250)

Why does Cover 4 Zone outperform, on average?  Cover 4 Zone has the most deep defenders among common coverages, so, intuitively, it should perform well against deep passes.  We investigate this in the figure below by comparing the aEPA, completion percentage (Comp%), and pass frequency for different coverages conditional on pass distance (air yards of the pass).  We find that Cover 4 Zone strongly outperforms against deep passes.  For passes thrown more than 20 yards, Cover 4 Zone has 0.18 lower aEPA and 4.1% lower completion percentage compared to the next-best coverage.  Interestingly, QBs do not seem to throw deep less often against Cover 4 Zone than against other coverages, despite the fact that it is more effective against deep passes.

In [None]:
from IPython.display import Image
Image("../input/heatmaps/distance.png", width=1250)

# Conclusion
Our contributions can be summarized as follows:
1. We demonstrated how to use player position data to identify coverages with a CNN. The key was to convert the tracking data into an image that shows player trajectories, with different position groups assigned to different channels of the image. Using our CNN classifier, we were able to classify the specific coverages for the entire 2018 season with 71% accuracy and to identify man vs. zone coverage with 85% accuracy, without having to hand-engineer features.
2. We proposed a new metric, adjusted EPA (aEPA), to analyze the performance of different coverage schemes while controlling for additional game state information.
3. Our analysis considered how coverages performed on metrics such as aEPA, completion percentage, and 1st down conversion percentage. Across these metrics, we show that Cover 4 Zone performs well in general and especially well in long-yardage situations where passes are expected.