# **Re**current **P**ressure **P**robabilities (RePP) to Predict Pass Rusher Impact as Plays Develop

**Jon Skaza & Matt Guthrie**

*Metric Track*

> A quarterback has never completed a pass when he was flat on his back…Great pass coverage is a direct result of a great pass rush, and a great pass rush is simply a relentless desire to get to the QB      

<b>\- Buddy Ryan<sup>1</sup></b>

# Introduction

Aside from Quarterbacks (QB), elite pass rushers, whether Defensive Ends (DE), Outside Linebackers (OLB), or, simply, Edge Rushers (EDGE), are arguably the most coveted talents in the NFL. In fact, in each of the last 9 NFL Drafts, the #1 overall pick has been either a QB (6) or a DE (3). If we instead consider the top 2 overall picks during the same timespan, the breakdown is QB (10), DE (6), LT (1), and RB (1)$.^2$ Simply put, NFL teams place a great deal of stock in finding and evaluating pass rush talent.

In this project, we introduce an analytical framework, which we call *Recurrent Pressure Probabilities (RePP)*, that we hope can add depth to the often-cited count statistics *Sacks (SK)*, *QB Knockdowns (QBKD)*, *QB Hurries (HRRY)*, or, when considered jointly, *Pressures (PRSS)*. Specifically, we aim to provide two novel contributions through our approach:

1. "Near real-time" probabilities of getting a *PRSS* for each rusher throughout the course of a play

2. Identification of large "probability accelerating moments": moments during a player's pass rush that lead to a significant increase in the probability of obtaining a *PRSS*

# Methods

Our motivation was to leverage the inherently sequential nature of the *Next Gen Stats* player tracking data to learn behaviors that are predictive of *PRSS*$.^3$ We present a long short-term memory (LSTM) artificial neural network (ANN), an architecture designed to process and learn from sequences of data via feedback connections${.^4}$ To illustrate the value of the sequential information, we compare this approach to a "naive" logistic regression model that does not account for temporal relations in the data.

### Data

Using the player tracking data, *Pro Football Focus* scouting data, and general NFL game information, we create a feature matrix, $X$, with dimensionality $R \times T \times P$, where $R$ represents the number of pass rushes in the dataset at the *player-level*, $T$ represents the number of frames observed in the longest play in the data, and $P$ corresponds to the number of predictors$.^5$ Some predictors vary frame-to-frame (e.g., the pass rusher's $x$ and $y$), while others remain constant throughout the course of the play (e.g., $Quarter$). The outcome, $y$, is a boolean vector indicating a *PRSS* on the pass rush for player $i$ on passing play $s$ in game $g$.
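As a minimal sketch of how a tensor with this shape could be assembled, the snippet below pads variable-length rushes to the length of the longest play. The function name `build_feature_tensor`, the padding value of `0.0`, and the synthetic inputs are assumptions made for illustration; they are not taken from the project code.

```python
import numpy as np

PAD_VALUE = 0.0  # assumed padding value; the write-up does not state one


def build_feature_tensor(rushes, pad_value=PAD_VALUE):
    """Stack variable-length rushes (each n_frames x P) into an R x T x P array."""
    T = max(r.shape[0] for r in rushes)  # frames in the longest play
    P = rushes[0].shape[1]               # number of predictors
    X = np.full((len(rushes), T, P), pad_value, dtype=float)
    for i, r in enumerate(rushes):
        X[i, : r.shape[0], :] = r        # observed frames first, padding after
    return X


# Three synthetic rushes of 12, 30, and 21 frames with P = 4 predictors.
rng = np.random.default_rng(0)
rushes = [rng.normal(size=(n, 4)) for n in (12, 30, 21)]
X = build_feature_tensor(rushes)
y = np.array([False, True, False])       # one PRSS label per rush

print(X.shape)  # (3, 30, 4), i.e. R x T x P
```

Frames beyond the end of a shorter play hold only the padding value, which is what a masking step can later detect and skip.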

$$
y = 
\begin{bmatrix}
y_{1,1,1} \\
y_{1,1,2} \\
\vdots \\
y_{g,s,i} \\
\vdots
\end{bmatrix}
\quad
X = 
\begin{bmatrix}
\begin{bmatrix}
x_{1,1,1} & \cdots & x_{1,1,P} \\
\vdots & \ddots & \vdots \\
x_{1,T,1} & \cdots & x_{1,T,P} \\
\end{bmatrix} \\
\vdots\\
\begin{bmatrix}
x_{R,1,1} & \cdots & x_{R,1,P} \\
\vdots & \ddots & \vdots \\
x_{R,T,1} & \cdots & x_{R,T,P} \\
\end{bmatrix} \\
\end{bmatrix}
$$

Further, we implement two sampling methods on $X$ and $y$. First, we oversample cases of $PRSS = 1$ to create a balanced dataset. Second, we augment the dataset with random sub-sequences from each play. This supports both objectives described in the introduction: to provide probabilities of a play's outcome at different stages of the play, the model must observe partial plays during training.
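The two sampling steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: positives are resampled with replacement, each "random sub-sequence" is taken to be a prefix of the play starting at its first frame, truncated frames are replaced by a padding value of `0.0`, and the helper names `balanced_indices` and `add_random_prefixes` are hypothetical.

```python
import numpy as np


def balanced_indices(y, rng):
    """Row indices that repeat PRSS = 1 rows (with replacement) until classes match."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_extra = len(neg) - len(pos)
    if n_extra <= 0 or len(pos) == 0:
        return np.arange(len(y))
    extra = rng.choice(pos, size=n_extra, replace=True)
    return np.concatenate([np.arange(len(y)), extra])


def add_random_prefixes(X, y, lengths, rng, pad_value=0.0):
    """Append one randomly truncated copy of every play, keeping the play's label."""
    X_cut = X.copy()
    for i, n in enumerate(lengths):
        cut = rng.integers(1, n + 1)     # keep frames [0, cut), with 1 <= cut <= n
        X_cut[i, cut:, :] = pad_value    # hide everything after the cut
    return np.concatenate([X, X_cut]), np.concatenate([y, y])


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 8, 3))          # 10 rushes, 8 frames, 3 predictors
y = np.array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
lengths = np.full(10, 8)                 # observed frames per rush

idx = balanced_indices(y, rng)
X_bal, y_bal, len_bal = X[idx], y[idx], lengths[idx]
X_aug, y_aug = add_random_prefixes(X_bal, y_bal, len_bal, rng)
```

Because each truncated copy inherits the label of the full play, the model is asked to predict the eventual outcome from an incomplete rush, which is the situation it faces when scoring a play frame by frame.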

The table below summarizes the *frame-level* features included for each unique individual pass rush.

| Feature Category | Features |
| --- | --- |
| Coordinates | Pass Rusher, LT, LG, C, RG, RT, QB, Ball |
| Situational | Quarter, Down, Yards to Go, Absolute Yard Line, Score Difference |
| Speed | Pass Rusher, QB |
| Derived | Pass Rusher Distance from QB, QB in Tackle Box, Number of Blockers |


### "Naive" Logistic Regression Classifier

As mentioned above, we train a logistic regression as a baseline comparison model that does not account for temporality. We thus remove a dimension from $X$ by treating every frame as an independent observation: $X_{logistic}$ becomes $(T \cdot R) \times P$ and, with each play's label repeated across its frames, the $y$ vector becomes length $T \cdot R$.
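One reading of this reshaping, consistent with $y$ growing to length $T \cdot R$, is a frame-level layout in which every frame becomes its own row and inherits the label of its play. The sketch below illustrates that reading only; the function name `to_frame_level` is hypothetical, and whether padded frames were dropped before fitting is not stated in the write-up.

```python
import numpy as np


def to_frame_level(X, y):
    """Flatten an R x T x P tensor to (R*T) x P rows and repeat each label T times."""
    R, T, P = X.shape
    X_flat = X.reshape(R * T, P)   # row r*T + t holds frame t of rush r
    y_flat = np.repeat(y, T)       # label of rush r copied to each of its T frames
    return X_flat, y_flat


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6, 3))     # R = 4 rushes, T = 6 frames, P = 3 predictors
y = np.array([0, 1, 0, 0])

X_flat, y_flat = to_frame_level(X, y)
print(X_flat.shape, y_flat.shape)  # (24, 3) (24,)
```

In this layout the classifier sees each frame in isolation, so it cannot use the order of frames or how quickly a feature is changing.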

### LSTM Classifier

In the ANN, we used a masking layer followed by a long short-term memory (LSTM) layer and a dense sigmoid output layer. The masking layer was included to handle missing data, namely time steps in plays containing a frame count $< T$. The input shape for the model was $T \times P$, where $T$ is the number of time steps and $P$ is the number of features. The LSTM layer had 64 units. The model was compiled using the binary cross-entropy loss function and the Adam optimizer.

To improve the generalizability of our model, we used early stopping with a patience of 5 epochs and a minimum delta of 0.005 on the validation AUC: if the validation AUC did not improve by at least 0.005 for 5 consecutive epochs, training was stopped and the best weights were restored. The model was trained for a maximum of 150 epochs on the input data $X$ and labels $y$, with a validation split of 0.1.

The masking layer tells the downstream layers to skip time steps that are absent from plays with a frame count $< T$, so that shorter plays do not contribute spurious frames to the LSTM state.

The LSTM layer is well suited to sequential data because it retains information about past frames and uses it to inform the processing of subsequent frames.

The final layer is a dense, fully-connected layer with 1 unit and a sigmoid activation function. The sigmoid activation function maps the output to a value between 0 and 1, which can be interpreted as a probability.

The model was trained on 80% of the dataset using the Adam optimization algorithm. The remaining 20% was used as a holdout set.
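The architecture and training setup described above could be expressed in Keras roughly as follows. This is a sketch under assumptions, not the project's code: the use of `tensorflow.keras`, the mask value of `0.0`, the function name `build_repp_model`, and the synthetic data are all illustrative, and the epoch count is shortened so the example runs quickly.

```python
import numpy as np
from tensorflow import keras


def build_repp_model(T, P, mask_value=0.0):
    """Masking -> LSTM(64) -> Dense(1, sigmoid), compiled as described in the text."""
    model = keras.Sequential([
        keras.Input(shape=(T, P)),
        # Time steps whose features all equal mask_value are skipped by the LSTM.
        keras.layers.Masking(mask_value=mask_value),
        keras.layers.LSTM(64),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        loss="binary_crossentropy",
        optimizer="adam",
        metrics=[keras.metrics.AUC(name="auc")],  # reported as "val_auc" on validation data
    )
    return model


# Stop when validation AUC fails to improve by 0.005 for 5 epochs; keep the best weights.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_auc", mode="max",
    patience=5, min_delta=0.005,
    restore_best_weights=True,
)

# Tiny synthetic stand-in for the training split (illustration only).
rng = np.random.default_rng(0)
T, P = 30, 8
X_train = rng.normal(size=(64, T, P)).astype("float32")
X_train[:, 20:, :] = 0.0                  # pretend every play ended after 20 frames
y_train = rng.integers(0, 2, size=64).astype("float32")

model = build_repp_model(T, P)
history = model.fit(
    X_train, y_train,
    epochs=3,                             # the text uses a maximum of 150
    validation_split=0.1,
    callbacks=[early_stop],
    verbose=0,
)
probs = model.predict(X_train, verbose=0)  # one probability per rush
```

The 80/20 holdout split would be made before this step, so that `validation_split=0.1` carves the validation data out of the training portion only.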

![model_diagram.png](https://github.com/jskaza/nfl-big-data-bowl-2023/blob/master/images/model_diagram.png?raw=true)

# Results

As expected, the LSTM classifier, which is able to learn from intra-play sequences of player and ball movement, considerably outperforms the "naive" logistic regression model. We can see this by comparing ROC curves and their corresponding AUCs.


![roc.png](https://github.com/jskaza/nfl-big-data-bowl-2023/blob/master/images/roc.png?raw=true)
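For readers less familiar with the metric, the AUC summarized by these curves equals the probability that a randomly chosen pressure receives a higher score than a randomly chosen non-pressure. The sketch below computes it directly from that definition on made-up scores; the function name `roc_auc` is hypothetical, and the pairwise comparison is only practical for small samples.

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]           # every positive vs. every negative
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]                   # made-up model outputs
auc = roc_auc(y_true, scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```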

# Example Usage

We can use the trained LSTM network to output and visualize the pressure probabilities, which we refer to as *RePP*, on select plays. Here, we hand-select two T.J. Watt sacks from his historic 2021 season.

The first sack comes in OT against the Seahawks, a game the Steelers won 23-20. On this particular pass rush, we see nothing too fancy from Watt. He simply beats the RT with speed. Accordingly, we see no sudden jumps in the probability of getting a QB pressure. Instead, the curve rises steadily as he runs by the blocker and homes in on the QB.



![watt_sack_4396_w_overlay.gif](https://github.com/jskaza/nfl-big-data-bowl-2023/blob/master/images/watt_sack_4396_w_overlay.gif?raw=true)

![rush_4396.gif](https://github.com/jskaza/nfl-big-data-bowl-2023/blob/master/images/rush_4396.gif?raw=true)

The second example is a bit more interesting. It is another sack, this time against the Browns. Watt employs a slick spin move to evade the RT. Prior to the move, the model suggests a relatively low pressure probability. Up until that point, we see an RT doing his job, standing between Watt and the QB. Since QB pressures don't happen terribly often and the pass protection looks typical, the model assigns low probabilities. Then, Watt completes his spin move, creating a direct line to the QB. This is what we would consider a "probability accelerating moment", i.e., a large jump in pressure probability in a small amount of time.

![watt_sack_1984_w_overlay.gif](https://github.com/jskaza/nfl-big-data-bowl-2023/blob/master/images/watt_sack_1984_w_overlay.gif?raw=true)

![rush_1984.gif](https://github.com/jskaza/nfl-big-data-bowl-2023/blob/master/images/rush_1984.gif?raw=true)
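Per-frame curves like the ones in the plays above could be produced by scoring successively longer prefixes of a rush, each padded back to $T$ frames. The sketch below assumes that approach; it is not confirmed to be how the animations were generated. The names `repp_curve` and `toy_predict` are hypothetical, and `toy_predict` is a stand-in for a trained model's prediction function.

```python
import numpy as np


def repp_curve(play, predict_fn, T, pad_value=0.0):
    """Return one pressure probability per observed frame of a single rush.

    play:       (n_frames, P) array of features for one rusher on one play
    predict_fn: maps a (batch, T, P) array to (batch,) or (batch, 1) probabilities
    """
    n_frames, P = play.shape
    batch = np.full((n_frames, T, P), pad_value, dtype=float)
    for t in range(n_frames):
        batch[t, : t + 1, :] = play[: t + 1]   # frames seen up to and including t
    return np.asarray(predict_fn(batch), dtype=float).reshape(n_frames)


def toy_predict(batch):
    # Hypothetical stand-in for a trained model: share of non-padded frames.
    observed = np.any(batch != 0.0, axis=2).sum(axis=1)
    return observed / batch.shape[1]


play = np.random.default_rng(0).normal(size=(5, 3))   # 5 frames, 3 predictors
curve = repp_curve(play, toy_predict, T=10)
print(curve)  # five values, one per observed frame
```

With a trained network, `predict_fn` would be the model's own prediction call, and the resulting sequence is what gets plotted against time.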

# Discussion

As demonstrated throughout, we see practical uses for the *RePP* framework. The two use cases that we proposed are predicting pressures in "near real-time" and discovering "probability accelerating moments". One could imagine watching a broadcast that shows a slow-motion replay of a sack with the pressure probability overlaid on each frame. Teams could leverage the framework to identify pass rushers with sneaky moves, as identified via large jumps in the probability function, that contribute to high *PRSS* rates. We believe that more work could be done to strategically identify such moves using the framework. We imagine it would be useful to analyze rates of change in the recurrent probability sequences. There could be other use cases for this framework, whether for different pass rush outcomes or even other aspects of the game.
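As a starting point for analyzing rates of change, a "probability accelerating moment" could be located by finding the window of frames over which the probability rises the most. The sketch below is one simple way to do this; the function name `largest_jump`, the window length, and the example probabilities are assumptions chosen for illustration.

```python
import numpy as np


def largest_jump(probs, window=5):
    """Return (start_frame, end_frame, jump) for the steepest rise over `window` frames."""
    probs = np.asarray(probs, dtype=float)
    if window < 1 or window >= len(probs):
        raise ValueError("window must be at least 1 and shorter than the sequence")
    jumps = probs[window:] - probs[:-window]   # change in probability across each window
    start = int(np.argmax(jumps))
    return start, start + window, float(jumps[start])


# Made-up curve: flat while the blocker holds, then a sharp rise after a move.
probs = [0.05, 0.06, 0.07, 0.08, 0.10, 0.12, 0.45, 0.70, 0.80, 0.85]
start, end, jump = largest_jump(probs, window=2)
print(start, end, round(jump, 2))  # 5 7 0.58
```

Comparing the size of this jump across rushers, or applying a threshold to it, would be one way to flag moves that change the outlook of a play quickly.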

We believe that there are several other directions that could be taken to improve upon the *RePP* framework. For one, as with many highly parameterized ANNs, there is room for additional tuning. There may be gains in model performance from this alone. Something else that could be interesting would be incorporating credibility ranges into the model's predictions. Another avenue would be to learn player-specific behaviors. The model presented is designed to learn league-wide pass rush behavior. However, each player has his own pass rush tendencies. The LSTM framework can be extended to accommodate this${.^6}$ Finally, one could adapt the idea of predicting play outcomes to different model architectures designed to learn from sequential data, such as Transformers or Google's Temporal Fusion Transformers (TFT)$.^{7,8}$


## Appendix

[1] https://profootballtalk.nbcsports.com/2016/06/29/buddy-ryans-philosophy-quarterbacks-must-be-punished/

[2] https://www.pro-football-reference.com/draft/

[3] https://nextgenstats.nfl.com/

[4] https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory?redirectedFrom=fulltext

[5] https://www.pff.com/

[6] https://arxiv.org/abs/2008.07870

[7] https://arxiv.org/abs/1706.03762

[8] https://www.sciencedirect.com/science/article/pii/S0169207021000637



**Code is available on [GitHub](https://github.com/jskaza/nfl-big-data-bowl-2023)**