# Snap Timing Efficiency (STE): A Novel Metric to Analyze how Efficiently Teams use Snap Timing in Conjunction with Pre-Snap Motion

Author: [Shashank Manjunath](https://shashankmanjunath.github.io/) \
Metric Track

## Introduction

In American football, pre-snap motion is intended either to create advantageous matchups for offensive players or to confuse the defense by putting an offensive player in an unexpected spot. These motions can be used to draw defenders away from the point of attack, to give larger or faster receivers matchups against smaller defenders, or to create any of myriad other advantages the offense could exploit to gain yards. However, at some point in a pre-snap motion, there is *maximal* confusion or advantage for the offense against the defense. Maybe this is right as a receiver is being handed off from one defender to another in a zone defense concept, or maybe this is right when a defensive player realizes that their assigned receiver is in motion in a man defense concept. We want to measure how good teams are at exploiting this moment of maximal confusion using the timing of the snap.

## Snap Timing Efficiency Metric

We develop a novel metric, Snap Timing Efficiency (STE), to analyze how well teams time their snap with their pre-snap motion on passing plays.
[Inspired by prior work](https://fivethirtyeight.com/features/our-new-metric-shows-how-good-nfl-receivers-are-at-creating-separation/), we aim to quantify how "confusion" leads to offensive yards by measuring the receiver's separation from the closest defender at the time the ball arrives at the receiver [1].
We predict separation with an XGBoost model using only information known to the offense prior to the snap, such as play direction, defensive alignment, offensive formation, route type, route depth, and "break" in the route after the ball is thrown.
We additionally use the targeted receiver's position, speed, and direction as the ball is snapped.
However, when testing this metric, we can use the player's position and direction throughout their motion, and even simulate their position and direction after their motion, to predict the receiver's separation based on their pre-snap position.

| Feature Name | Feature Description | Feature Type |
| :---: | :---: | :---: |
| Height | Height of player in inches | Scalar |
| Weight | Weight of player in lbs | Scalar |
| Position | Position of player | One-Hot Encoded |
| Route Type | Route run by player | One-Hot Encoded |
| Other routes run | Routes run by other players involved in the play | Multi-hot Encoded |
| Break in Route | "Break" in route run by specified player between ball throw and ball catch | Scalar |
| Route Depth | Maximum depth achieved by player before ball reception | Scalar |
| x-position | x-position of player in yards | Scalar |
| y-position | y-position of player in yards | Scalar |
| Speed | Speed of player in yards/second | Scalar |
| Distance | Distance traveled from prior timepoint | Scalar |
| Orientation | Orientation of player | Scalar in $[0, 1]$ |
| Direction | Direction of player motion | Scalar in $[0, 1]$ |
| Team | Team that player is on | One-Hot Encoded |
| Play Direction | Direction of Play (left or right) | One-Hot Encoded |
| Down | Down of play | Scalar |
| Yards to go | Yards to go to 1st down | Scalar |
| Percentage of game elapsed | Percentage of game which has been played | Scalar in $[0, 1]$ |
| Pre-snap Team Score | Score pre-snap for team the player is on | Scalar |
| Pre-snap Opposition Team Score | Score pre-snap for opposing team | Scalar |
| Absolute Yard Line Number | Distance of line of scrimmage from end zone | Scalar |
| Pre-snap Team Win Probability | Win Probability of team the player is on | Scalar |
| Pre-snap Opposition Win Probability | Win Probability of Opposing Team | Scalar |
| Pass Length | Length of pass on play | Scalar |
| playAction | Whether play was play action or not | $\{0, 1\}$ |
| Dropback Distance | Distance of Quarterback dropback | Scalar |
| timeToThrow | Time QB took to throw the ball | Scalar |
| timeInTackleBox | How long the QB was in the tackle box for this play | Scalar |
| Run/Pass Option | Whether the play was an RPO play | $\{0, 1\}$ |
| Number of Routes | Number of Routes run on the play | Scalar |
| Offense Formation | Formation of the Offense | One-Hot Encoded |
| Receiver Alignment | Alignment of Receivers | One-Hot Encoded |
| Dropback Type | Type of Dropback | One-Hot Encoded |
| Pass Location | Location of Pass | One-Hot Encoded |
| Pass Coverage Type | Pass Coverage Type | One-Hot Encoded |
| Man or Zone | Defense is in Man or Zone | One-Hot Encoded |
| Possession Team | Team on Offense | One-Hot Encoded |
| Defensive Team | Team on Defense | One-Hot Encoded |
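
The encodings named in the table can be illustrated with a minimal sketch. The route vocabulary below is illustrative only, and scaling angles by dividing degrees by 360 is an assumption about how orientation and direction are mapped into $[0, 1]$; the helper names are hypothetical.

```python
# Illustrative route vocabulary (an assumption, not the full list used in the model).
ROUTES = ["GO", "SLANT", "OUT", "HITCH"]


def one_hot(value, vocabulary):
    """Return a vector with a single 1 at the position of `value`."""
    return [1 if value == item else 0 for item in vocabulary]


def multi_hot(values, vocabulary):
    """Return a vector with a 1 for every vocabulary item present in `values`."""
    present = set(values)
    return [1 if item in present else 0 for item in vocabulary]


def scale_angle(degrees):
    """Map an angle in degrees to a scalar in [0, 1) (assumed scaling)."""
    return (degrees % 360.0) / 360.0


# Targeted receiver runs a slant; two other receivers run a go and a hitch.
route_type = one_hot("SLANT", ROUTES)
other_routes = multi_hot(["GO", "HITCH"], ROUTES)
orientation = scale_angle(90.0)
```

A multi-hot vector records only which routes appear among the other receivers, not how many players run each one; a count-based encoding would be a reasonable alternative.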

We calculate the "break" in the route as the deviation between the player's actual position when the ball arrives at the receiver and the position predicted by kinematic equations from the player's state when the ball is thrown.
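
As a rough sketch of this calculation, the function below extrapolates the receiver's position with the standard constant-acceleration kinematic equation and measures the Euclidean distance to the actual arrival position. The function name, the use of Cartesian velocity and acceleration components, and the choice of Euclidean distance are all assumptions made for illustration; the exact kinematic model is not specified here.

```python
import math


def route_break(pos_throw, vel_throw, pos_arrival, flight_time, acc_throw=(0.0, 0.0)):
    """Distance in yards between the actual and kinematically predicted position.

    pos_throw, vel_throw, acc_throw: (x, y) state of the receiver when the ball
    is thrown (yards, yards/second, yards/second^2).
    pos_arrival: (x, y) position of the receiver when the ball arrives.
    flight_time: seconds between the throw and the ball's arrival.
    """
    # Constant-acceleration extrapolation: p = p0 + v0 * t + 0.5 * a * t^2
    pred_x = pos_throw[0] + vel_throw[0] * flight_time + 0.5 * acc_throw[0] * flight_time ** 2
    pred_y = pos_throw[1] + vel_throw[1] * flight_time + 0.5 * acc_throw[1] * flight_time ** 2
    return math.hypot(pos_arrival[0] - pred_x, pos_arrival[1] - pred_y)


# A receiver moving at 3 yards/second along x who ends up 4 yards off the
# extrapolated path has a break of 4 yards.
example_break = route_break((0.0, 0.0), (3.0, 0.0), (3.0, 4.0), 1.0)
```

A receiver who continues in a straight line at constant speed has a break of zero under this sketch, while a sharp cut after the throw produces a large value.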

**TODO**: Figure describing "break" in route

### XGBoost Model

We train our XGBoost model on the targeted receiver's data at the final timepoint before the ball is snapped. Therefore, this model is a "single-frame" model, using only a single player's position data for a single frame, though it does contextualize the play by using information about the number of other receivers and the routes run by those receivers. We choose hyperparameters using a grid search over viable parameter values.

We train our model on data from weeks 1-7 and test it on weeks 8 and 9. To quantify our model performance, we use the Mean Absolute Error (MAE) metric calculated as follows:

$$
\text{MAE} = \frac{1}{N} \sum\limits_{i=1}^N  |y_i^\text{true} - y_i^\text{pred}|
$$

where $N$ is the number of samples in our dataset, $y_i^\text{true}$ is our true label and $y_i^\text{pred}$ is our predicted label. We additionally use Coefficient of Determination ($R^2$) to measure performance, calculated as follows:

$$
R^2 = 1 - \frac{SS_\text{res}}{SS_\text{tot}}
$$

where $SS_\text{res} = \sum_{i=1}^N (y_i^\text{true} - y_i^\text{pred})^2$ and $SS_\text{tot} = \sum_{i=1}^N (y_i^\text{true} - \bar y)^2$ where $\bar y$ is the average of the true labels.
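
Both metrics follow directly from the formulas above. The following is a minimal pure-Python sketch; the function names and the sample values are illustrative and are not results from the model.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted labels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up separations in yards, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
mae = mean_absolute_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```

Note that $R^2$ is undefined when all true labels are identical, since $SS_\text{tot}$ is then zero; the sketch does not guard against that case.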

On the training set, we achieve an MAE of 1.483 and an $R^2$ of 0.343. On the test set, we achieve an MAE of 1.691 and an $R^2$ of 0.207.

**TODO:** Analyzing feature importances, we find that 

### Metric Calculation

Once we have predicted the receiver's separation at all pre-snap positions, we can compare the predicted separation at the snap with the predicted separation at the "optimal" position, which is the position with the maximum predicted separation.
Since we use only single timepoints from the targeted receiver for prediction, we predict the separation at each time point using our XGBoost model.
We can therefore compare the "optimal" time (where separation was predicted to be maximized) with the actual snap time using the following equation:

$$
\text{STE} = \left(\frac{s_\text{snap}}{s_\text{max}}\right)\left(\frac{t_\text{max} + 1}{t_\text{snap} + 1}\right)
$$

where $s_\text{max}$ is the maximum predicted separation, $s_\text{snap}$ is the predicted separation at the time of the snap, $t_\text{snap}$ is the time of the snap in seconds, and $t_\text{max}$ is the time of maximum predicted separation in seconds. This metric ranges from 0 to 1 and has several desirable properties. When the snap time is close to the time of maximum separation and the predicted separation at the snap is close to the optimal (maximum) separation, STE is close to 1. Note that $s_\text{max} \geq s_\text{snap}$ and $t_\text{snap} \geq t_\text{max}$, so the maximum value of $\text{STE}$ is 1.

On the other hand, when the maximum predicted separation is much higher than the predicted separation at snap time and the snap time is far from the optimal snap time (i.e., the time at which separation is maximized), STE is small, close to zero. However, due to practical limits in this dataset, the minimum observed value is 0.0009 by **TODO team**.

We apply a 3-step windowed moving average filter (a low-pass filter) before calculating $s_\text{max}$, $s_\text{snap}$, $t_\text{max}$, and $t_\text{snap}$. This removes some of the high-frequency noise in the predictions, which is caused by small fluctuations in the data rather than true variation in the resulting separation. Additionally, we only calculate STE if there are at least 5 valid "frames" between line set and play snap.
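
Putting the filter and the STE equation together, a minimal sketch of the calculation might look like the following. Several details are assumptions made for illustration: frames are taken to be 0.1 seconds apart, time is measured from line set, the moving average is centered and shrinks its window at the edges, ties for the maximum resolve to the earliest frame, and plays with a non-positive maximum predicted separation are skipped. The function names are hypothetical.

```python
FRAME_RATE_HZ = 10.0  # assumption: 10 tracking frames per second
MIN_FRAMES = 5        # minimum number of valid frames between line set and snap


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the edges (assumed handling)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def snap_timing_efficiency(predicted_separation):
    """STE for one play, or None if the play does not qualify.

    predicted_separation: per-frame predicted separation in yards, ordered
    from line set (first element) to the snap (last element).
    """
    if len(predicted_separation) < MIN_FRAMES:
        return None
    smoothed = moving_average(predicted_separation)
    s_max = max(smoothed)
    if s_max <= 0:
        return None  # assumed guard: the ratio is not meaningful here
    t_max = smoothed.index(s_max) / FRAME_RATE_HZ  # earliest frame at the maximum
    t_snap = (len(smoothed) - 1) / FRAME_RATE_HZ   # the snap is the final frame
    s_snap = smoothed[-1]
    return (s_snap / s_max) * ((t_max + 1.0) / (t_snap + 1.0))


# Predicted separation peaks mid-motion, then falls before the snap.
late_snap = snap_timing_efficiency([1.0, 2.0, 3.0, 2.0, 1.0])
# Predicted separation is still rising at the snap, so the snap is "optimal".
on_time_snap = snap_timing_efficiency([1.0, 2.0, 3.0, 4.0, 5.0])
```

In the first example the snap arrives two frames after the smoothed peak, so both factors of STE fall below 1; in the second, the peak coincides with the snap and STE equals 1.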

## Example Play

## Which Teams have high STE?

## Conclusion

## References

[1] https://fivethirtyeight.com/features/our-new-metric-shows-how-good-nfl-receivers-are-at-creating-separation/

## Appendix