By Sahil Bolar [(kaggle.com/sahilbolar)](https://kaggle.com/sahilbolar) and Nate Rowan [(kaggle.com/naterowan)](https://kaggle.com/naterowan). Feel free to check us out on Twitter as well: [Sahil](https://twitter.com/sahil_bolar), [Nate](https://twitter.com/nate_rowan17).

In [None]:
# Import these to run code cells showing figures

from IPython.display import Image
import pandas as pd

# Introduction
In the modern era of the NFL, a defense's ability to consistently stop the pass is one of the more important contributors to team success. This involves both forcing incompletions and limiting the yards after catch (YAC) potential of the receiver by tackling them. Though tackles are an easily accessible and charted measure of production, this metric has a few issues:
1. Tackles are a volume stat, so a player can accumulate more tackles simply by playing more snaps rather than by being more efficient per play.
2. Raw tackle counts do not show whether a player is missing tackles that they should have made.

Our project attempts to address these issues by creating an expected tackles metric. Subtracting a defender's expected tackles from the tackles they actually made yields a tackles over expected value, which can be divided by the defender's total number of snaps to obtain a per-play metric. For our analysis, we treat forcing a player out of bounds as a tackle.

# Methodology
To create our expected tackles model, we only considered two features: 
- at the time of the catch, how close is the defender to the receiver (relative to other defenders)?
- was the catch made in the red zone?

The first feature is fairly intuitive. The nearest defender is much likelier to make the tackle than the second nearest defender, and so on. Rather than using the actual distance between the receiver and a given defender -- which can vary based on soft zone coverages vs tight man coverage -- we use their ordinality (i.e. nearest defender, second nearest, third nearest). After the third nearest defender, there is a much smaller dropoff in likelihood to make the tackle, so we split the remaining likelihood of making the tackle between the rest of the defenders equally. The following code cell gives a clearer picture.

In [None]:
Image('../input/bdb2021figures/demo.gif')

If the .gif isn't displaying properly in the notebook, feel free to download it from our [GitHub repo](https://github.com/naterowan00/big-data-bowl-2020/blob/main/pictures/demo.gif). Sorry for the inconvenience!
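
The ordinal-distance feature described above can be sketched as follows. This is only a minimal illustration, not our actual pipeline: the column names (`play_id`, `x`, `y`, `rec_x`, `rec_y`), the toy coordinates, and the `rank_defenders` helper are all assumptions made for the example. Defenders ranked fourth or later are collapsed into a single "rest" bucket.

```python
import numpy as np
import pandas as pd

# Hypothetical snapshot at the moment of the catch: one row per defender.
# Coordinates are invented and in yards; column names are assumptions.
defenders = pd.DataFrame({
    "play_id": [1, 1, 1, 1, 1],
    "defender": ["A", "B", "C", "D", "E"],
    "x": [31.0, 35.0, 28.0, 40.0, 22.0],
    "y": [20.0, 26.0, 14.0, 30.0, 40.0],
})
receiver = pd.DataFrame({"play_id": [1], "rec_x": [30.0], "rec_y": [21.0]})


def rank_defenders(defenders, receiver, max_rank=4):
    """Rank defenders by distance to the receiver within each play."""
    df = defenders.merge(receiver, on="play_id")
    # Straight-line distance from each defender to the receiver
    df["dist"] = np.hypot(df["x"] - df["rec_x"], df["y"] - df["rec_y"])
    # 1 = nearest defender, 2 = second nearest, and so on
    df["rank"] = df.groupby("play_id")["dist"].rank(method="first").astype(int)
    # Everyone from `max_rank` onward shares one bucket
    df["rank_bucket"] = df["rank"].clip(upper=max_rank)
    return df[["play_id", "defender", "dist", "rank", "rank_bucket"]]


ranked = rank_defenders(defenders, receiver)
```

Using the rank rather than `dist` itself is what keeps the feature comparable between soft zone and tight man coverage.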

We include the second feature, whether the catch was made in the red zone, because the likelihood that no tackle is made (i.e. the receiver walks in for a touchdown) rises drastically in this area of the field.

In [None]:
Image('../input/bdb2021figures/tackle_hist.png')

Within the red zone, it's more likely that no tackle occurs. As a result, each defender's probability of making the tackle drops slightly across the board in this area. The cell below shows how the specific probabilities change for each tackler.

In [None]:
Image('../input/bdb2021figures/tackle_responsibilities.png')
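
One plausible way to build a table like this is to take the empirical tackle frequency for each combination of distance rank and red zone flag. The sketch below assumes that approach; the column names (`rank_bucket`, `red_zone`, `made_tackle`) and every value in the toy data are invented for illustration and are not the probabilities shown in the figure.

```python
import pandas as pd

# Toy data: one row per defender per play. All values are invented.
# rank_bucket 4 stands for "fourth nearest or farther".
plays = pd.DataFrame({
    "rank_bucket": [1, 1, 1, 1, 2, 2, 2, 2, 4, 4, 4, 4],
    "red_zone":    [False, False, True, True] * 3,
    "made_tackle": [1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
})

# Share of defender-plays in each cell that ended with that defender tackling
tackle_prob = (
    plays.groupby(["red_zone", "rank_bucket"])["made_tackle"]
    .mean()
    .rename("tackle_prob")
    .reset_index()
)
```

Because each defender in the "rest" bucket contributes their own row, the mean for that bucket is already a per-defender probability, which corresponds to splitting the remaining likelihood equally among those defenders.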

Given these probabilities, we can sum up how many tackles a defender is expected to make across all their plays. If we then sum up the tackles each defender actually makes and subtract the expected tackles from that total, we arrive at our tackles over expected metric. Finally, to normalize across different snap counts, we divide by the number of snaps to arrive at our tackles over expected per play.
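
The aggregation can be sketched in a few lines of pandas. This is an illustrative example on made-up data, not our original code: `num_plays` and `toe_per_play` match the columns used later in this notebook, while `defender`, `tackle_prob`, `made_tackle`, `tackles`, `exp_tackles`, and `toe` are names assumed for the sketch.

```python
import pandas as pd

# Hypothetical rows: one per defender per play. `tackle_prob` stands in for
# the probability looked up from the rank / red zone table; values are made up.
rows = pd.DataFrame({
    "defender":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "tackle_prob": [0.5, 0.25, 0.25, 0.0, 0.5, 0.5, 0.25, 0.25],
    "made_tackle": [1, 1, 0, 0, 0, 1, 0, 0],
})

summary = rows.groupby("defender").agg(
    num_plays=("made_tackle", "size"),   # snaps in the sample
    tackles=("made_tackle", "sum"),      # actual tackles
    exp_tackles=("tackle_prob", "sum"),  # expected tackles
)
# Tackles over expected: actual minus expected
summary["toe"] = summary["tackles"] - summary["exp_tackles"]
# Normalize by snap count
summary["toe_per_play"] = summary["toe"] / summary["num_plays"]
```

In this toy data, defender A makes 2 tackles against 1.0 expected (`toe_per_play` of 0.25), while defender B makes 1 against 1.5 expected (`toe_per_play` of -0.125).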

# Results


The cells below show the best and worst defenders in tackles over expected per play. We apply a snap count filter (more than 150 plays), since players with few snaps have higher variance due to their smaller sample size.

In [None]:
tackles_over_exp = pd.read_csv('../input/bdb2021data/tackles_over_exp.csv')
tackles_over_exp = tackles_over_exp.drop('Unnamed: 0', axis=1)

In [None]:
# top n defenders
threshold = 150
n = 10

tackles_over_exp = tackles_over_exp.sort_values('toe_per_play', ascending=False)
tackles_over_exp.loc[tackles_over_exp['num_plays'] > threshold].head(n)

In [None]:
# bottom n defenders
threshold = 150
n = 10

tackles_over_exp = tackles_over_exp.sort_values('toe_per_play', ascending=True)
tackles_over_exp.loc[tackles_over_exp['num_plays'] > threshold].head(n)
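
The two cells above differ only in sort direction, so the filter-then-sort step could be wrapped in a small helper. The sketch below is a suggestion rather than part of our original analysis: `top_or_bottom_defenders` and the `name` column are hypothetical, and the toy frame stands in for `tackles_over_exp`.

```python
import pandas as pd


def top_or_bottom_defenders(df, n=10, threshold=150, best=True):
    """Return the n best (or worst) defenders with more than `threshold` plays."""
    eligible = df.loc[df["num_plays"] > threshold]
    # Descending order for the best defenders, ascending for the worst
    return eligible.sort_values("toe_per_play", ascending=not best).head(n)


# Toy stand-in for tackles_over_exp; values are invented.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "num_plays": [200, 100, 300, 160],
    "toe_per_play": [0.02, 0.09, -0.01, 0.05],
})

top = top_or_bottom_defenders(toy, n=2)
bottom = top_or_bottom_defenders(toy, n=2, best=False)
```

Player B has the highest `toe_per_play` in the toy data but is excluded because their 100 plays fall below the threshold.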

We wanted to see whether this metric is biased toward DBs or LBs, since we might expect DBs to be closer to the receiver more often. It appears that dividing up the tackle responsibilities as we did does not favor either position group, so the metric may be an effective way of comparing tackling ability across positions.

In [None]:
Image('../input/bdb2021figures/toe_per_play_by_pos.png')
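
A quick numeric companion to a plot like this is to compare summary statistics of `toe_per_play` by position group. The sketch below assumes a `position` column, which may not exist under that name in our data, and uses invented values.

```python
import pandas as pd

# Toy data; the `position` column name and all values are assumptions.
by_player = pd.DataFrame({
    "position": ["DB", "DB", "LB", "LB"],
    "toe_per_play": [0.02, -0.02, 0.01, -0.01],
})

# Similar centers and spreads across groups would suggest no positional bias
position_summary = by_player.groupby("position")["toe_per_play"].agg(
    ["count", "mean", "median", "std"]
)
```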

# Conclusion
Tackling is an important measure of a defender's effectiveness, and current charting methods fall short in several key ways. This metric addresses these issues by dividing up tackle responsibilities between defenders and normalizing over snap counts. Going forward, we can evaluate this metric by comparing it against PFF tackling grades. We can improve the model by adding more features (such as x and y distance to the defender, speed/displacement along the defender's direction vector, etc.) and by using machine learning techniques to learn the importance of each feature in determining tackler probabilities. Furthermore, from the offense's perspective, this metric can be adapted to identify which players frequently force missed tackles.