# NFL Big Data Bowl 2023: Introducing WERC



## Introduction

Some of the game’s most important plays happen on the line, yet there is a severe lack of metrics that isolate and evaluate the performance of offensive linemen. Building such a metric is challenging: to the untrained eye, a lineman's show of force may appear violent, chaotic, and barbaric. Yet, as we learned from talking and brainstorming with Super Bowl Champion and 3-time Pro Bowl Offensive Lineman Brandon Brooks, offensive lineman is one of the most technical positions on the field.



## Problem

Like any team, we began by brainstorming ways to evaluate the performance of offensive linemen, which led us to two fundamental challenges:

1. **We cannot consider the outcome of a play when we evaluate the linemen's performance.** The offensive line can perform poorly on a play that succeeds, and well on a play that fails.
2. **We cannot compare the linemen's performance to some optimal performance.** Defining optimal performance requires knowledge of everything else happening on the field; however, offensive linemen cannot see what is happening around them. They cannot see where the quarterback is, nor where the other players are.



## Solution

We present a new statistic: the **W**eighted **E**ncoding of **R**isk **C**urve (**WERC**). This statistic provides a novel way of evaluating the performance of the entire offensive line, irrespective of the play result. The approach taken is summarized in the following steps:
1. **Build a model to evaluate the risk level of a given field state.** The field state represents the positions, orientations, accelerations, and speeds of every player at a given moment. The model will predict the risk level given the current field state. We refer to this model as Model A and refer to its output as the risk level.
2. **Compile each play's risk curve.** We use Model A to calculate the risk level of a play at every time $t$. Then we compile the risk levels, in chronological order, so that we obtain a curve of the risk level throughout the play.
3. **Train a model to create an encoding of the risk curves.** Given each play's risk curve, which denotes the level of risk at every moment of the play, the model will create an encoding such that the variance between high-risk and low-risk plays is maximized. We refer to this model as Model B and refer to its output as the risk encodings.
4. **Obtain a weighted score from the risk curve encodings.** We perform matrix multiplication between the risk encodings and the singular values of Model B, among other calculations described below, to score the performance of the offensive line.



## Evaluating Risk Levels (Model A)

We use a gradient-boosted tree model to predict the risk level at time $t$ given the current field state. The features are the time since the start of the play, plus the distance to the quarterback, speed, and acceleration of each of the 6 defenders closest to the quarterback. The per-defender features are ordered by distance to the quarterback: the closest defender's distance, speed, and acceleration come first, followed by those of the second-closest defender, and so on.
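As a minimal sketch, one frame's feature row for Model A could be assembled as below. This is not the authors' code: the function name `model_a_features`, the `(x, y, speed, acceleration)` tuple used for each defender, and the use of straight-line distance in field coordinates are our assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-defender record: (x, y, speed, acceleration).
Defender = Tuple[float, float, float, float]

N_CLOSEST = 6  # the text uses the 6 defenders closest to the quarterback


def model_a_features(
    time_since_start: float,
    qb_xy: Tuple[float, float],
    defenders: Sequence[Defender],
) -> List[float]:
    """Build one Model A feature row for a single frame.

    Layout: [time, dist_1, speed_1, accel_1, ..., dist_6, speed_6, accel_6],
    where defender 1 is the one closest to the quarterback.
    """
    if len(defenders) < N_CLOSEST:
        raise ValueError(f"need at least {N_CLOSEST} defenders, got {len(defenders)}")

    qx, qy = qb_xy
    # (distance to QB, speed, acceleration) for every defender on the field.
    rows = [(math.hypot(x - qx, y - qy), s, a) for x, y, s, a in defenders]
    # Order by distance to the QB so slot 1 always holds the closest defender.
    rows.sort(key=lambda r: r[0])

    features = [time_since_start]
    for dist, speed, accel in rows[:N_CLOSEST]:
        features.extend([dist, speed, accel])
    return features
```

Each row has 1 + 6 × 3 = 19 entries. Because defenders are sorted before flattening, each slot has a consistent meaning ("closest defender", "second closest", ...) regardless of the order in which players appear in the tracking data.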

We train Model A on a subset of the plays: only those that end in a completion, sack, or scramble. We train on this subset because we want to train our model on plays that clearly reveal the risk level. We create a binary target variable which denotes whether the play ends in a failure (sack or scramble) or a success (completion).
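A minimal sketch of the target construction. We assume outcome codes like those in the competition's `plays.csv` `passResult` column (`C` for a completion, `S` for a sack, `R` for a scramble); the function name is hypothetical, and plays with any other code are left out of Model A's training set.

```python
from typing import Optional

# Assumed outcome codes: "S" = sack, "R" = scramble, "C" = completion.
FAILURE_CODES = {"S", "R"}
SUCCESS_CODES = {"C"}


def model_a_target(pass_result: str) -> Optional[int]:
    """Return 1 for a failure (sack or scramble), 0 for a success
    (completion), or None if the play is outside Model A's training subset."""
    code = pass_result.strip()
    if code in FAILURE_CODES:
        return 1
    if code in SUCCESS_CODES:
        return 0
    return None  # e.g. incompletions and interceptions are excluded
```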

We acknowledge the first challenge outlined above, but we believe this is an appropriate use of the play outcome. Our goal is not to predict the outcome of a given play, but rather to obtain the risk level at time $t$. When it is obvious that the play will end in a failure or a success, Model A predicts that outcome correctly. When the future outcome is more ambiguous, Model A does not always predict the correct outcome, which is the expected and desired behavior.



## Compiling Risk Curves

We use Model A to predict the class probabilities of every play (all plays, not only the training subset defined above) at every time $t$. The predicted probability of failure is what we define as the risk level. For each play, we collect the risk levels at every time $t$ and order them chronologically, which gives us the play's risk curve. Below is the risk curve of the average play.
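A small sketch of this step, assuming each play's frames are available as a mapping from frame index to Model A feature row. `risk_curve` and its arguments are hypothetical names. With a scikit-learn-style classifier trained on the 0/1 target above, `predict_failure_proba` could be `lambda row: model.predict_proba([row])[0, 1]`, since column 1 holds the probability of class 1 (failure).

```python
from typing import Callable, List, Mapping, Sequence


def risk_curve(
    frames: Mapping[int, Sequence[float]],
    predict_failure_proba: Callable[[Sequence[float]], float],
) -> List[float]:
    """Return one play's risk levels in chronological order.

    `frames` maps a frame index to that frame's Model A feature row, and
    `predict_failure_proba` returns P(failure) for a single row.
    """
    # Sorting the frame indices guarantees chronological order.
    return [predict_failure_proba(frames[t]) for t in sorted(frames)]
```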

<img src= "https://github.com/johnwesleyappleton/NFLBigDataBowl2023/blob/main/avg_risk_curve.png?raw=true" alt="Figure 1"/>



## Creating Encodings of Risk Curves (Model B)

We must now determine which risk curves display a good offensive line performance and which display a poor one. To do so, we use Principal Component Analysis (PCA) to translate our risk curves into encodings that maximize the variance between successful and unsuccessful plays.

We train Model B on the same subset of plays as Model A: only those that end in a completion, sack, or scramble. We train on this subset because we want Model B to clearly identify the differences between successful and unsuccessful plays. On average, the risk curves of successful and unsuccessful plays are very different, so the variance between these two groups dominates the variance that PCA captures.
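Model B can be sketched as an ordinary PCA. The text does not name a library, so to keep this example dependency-free we swap in power iteration with deflation on the scatter matrix of the centered curves; it recovers the same leading components and singular values that, for example, scikit-learn's `PCA` exposes as `components_` and `singular_values_` (up to sign). The names `fit_pca` and `encode`, the number of components `k`, and the iteration count are illustrative assumptions, not the authors' settings.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def fit_pca(
    curves: Sequence[Sequence[float]], k: int, iters: int = 500, seed: int = 0
) -> Tuple[Vector, List[Vector], Vector]:
    """Fit a k-component PCA to equal-length risk curves.

    Returns (mean, components, singular_values). Each component is a unit
    eigenvector of the scatter matrix X_c^T X_c of the centered curves, found
    by power iteration with deflation; its singular value is the square root
    of the matching eigenvalue.
    """
    n, d = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(d)]
    centered = [[c[j] - mean[j] for j in range(d)] for c in curves]
    scatter = [[sum(r[i] * r[j] for r in centered) for j in range(d)] for i in range(d)]

    def mat_vec(m: List[Vector], v: Vector) -> Vector:
        return [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]

    def unit(v: Vector) -> Vector:
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0.0 else v

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    components: List[Vector] = []
    singular_values: Vector = []
    for _ in range(k):
        v = unit([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            w = mat_vec(scatter, v)
            if all(x == 0.0 for x in w):  # no variance left to explain
                break
            v = unit(w)
        # Rayleigh quotient v^T S v is the eigenvalue for the unit vector v.
        eig = sum(a * b for a, b in zip(v, mat_vec(scatter, v)))
        components.append(v)
        singular_values.append(math.sqrt(max(eig, 0.0)))
        # Deflate so the next pass finds the next-largest direction.
        scatter = [[scatter[i][j] - eig * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, components, singular_values


def encode(curve: Sequence[float], mean: Vector, components: List[Vector]) -> Vector:
    """Project one risk curve onto the principal components (its encoding)."""
    # Standard PCA centering: subtract the training mean before projecting.
    centered = [x - m for x, m in zip(curve, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]
```

Power iteration is slow when two eigenvalues are close; for production use, a library SVD is the more robust choice.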

The inputs to Model B are risk curves whose length equals the mean length of a play. Plays shorter than the mean length are excluded from training, and plays longer than the mean length are truncated at the mean length.
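The training-set preparation described above might look like the following sketch. The function name is hypothetical; we assume the mean is taken over the lengths of the curves passed in (the text does not say which set of plays it is computed over), and rounding it to a whole number of frames is likewise our assumption.

```python
from typing import List, Sequence, Tuple


def model_b_training_set(
    curves: Sequence[Sequence[float]],
) -> Tuple[int, List[List[float]]]:
    """Return (mean_len, training_matrix) for Model B."""
    # Mean play length in frames, rounded to a whole frame (assumption).
    mean_len = round(sum(len(c) for c in curves) / len(curves))
    # Drop curves shorter than mean_len; truncate longer ones to mean_len.
    matrix = [list(c[:mean_len]) for c in curves if len(c) >= mean_len]
    return mean_len, matrix
```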



## Obtaining a Weighted Performance Score

We use Model B to transform each risk curve into its encoding. There are three cases:
1. If the risk curve is shorter than the mean length, we prepend risk levels of zero until it is as long as the average play. Padding with zeros maintains the desired behavior because it leads to no change in the resulting encoding. Additionally, since we do not train Model B on any plays shorter than the mean, Model B is unaffected by short plays.
2. If the risk curve is the same length as the mean length, we input it to Model B without modification.
3. If the risk curve is longer than the mean length, we split it at the mean length and input the first section to Model B. We keep the remaining section for the bonus calculation described below.
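A minimal sketch of the three cases, assuming risk curves are plain lists of per-frame risk levels; the function name `split_to_mean_length` is hypothetical. Cases 2 and 3 share one branch, since a curve of exactly the mean length simply has an empty remainder.

```python
from typing import List, Sequence, Tuple


def split_to_mean_length(
    curve: Sequence[float], mean_len: int
) -> Tuple[List[float], List[float]]:
    """Return (first_section, remainder) for scoring a play with Model B."""
    if len(curve) < mean_len:
        # Case 1: left-pad with zero risk up to the mean length.
        return [0.0] * (mean_len - len(curve)) + list(curve), []
    # Cases 2 and 3: keep the first mean_len frames; the rest (possibly
    # empty) is kept for the bonus calculation.
    return list(curve[:mean_len]), list(curve[mean_len:])
```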

After obtaining the encoding of each risk curve, we perform matrix multiplication with the singular values of Model B to obtain an initial score. We then account for plays that were shorter than the mean length: if a play was shorter than the mean length, we divide the score by the length of the play; otherwise, we divide it by the mean length. The result is a score per unit of time.
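The normalization above can be sketched as follows. We read the "matrix multiplication" of one encoding vector with the vector of singular values as their dot product; that reading, and the function name, are assumptions.

```python
from typing import Sequence


def score_per_unit_time(
    encoding: Sequence[float],
    singular_values: Sequence[float],
    play_length: int,
    mean_len: int,
) -> float:
    """Weight an encoding by Model B's singular values and normalize by time."""
    if len(encoding) != len(singular_values):
        raise ValueError("encoding and singular_values must have the same length")
    # Initial score: encoding (1 x k) times singular values (k x 1).
    initial = sum(z * s for z, s in zip(encoding, singular_values))
    # Short plays are divided by their own length; all others by the mean length.
    return initial / min(play_length, mean_len)
```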

We must also account for plays that were longer than the expected length, that is, the mean length. Since the mean length is how long a play is expected to last, we consider any time past it to be a bonus. The bonus is calculated by multiplying the score per unit of time by $1-r$, where $r$ denotes the risk level. Since a lower score indicates a better performance, we subtract this bonus from the score on the first section of the play.
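The bonus step is the least specified part of the method, so the sketch below encodes one reading of it, labeled as an assumption: each frame in the remainder of a long play contributes the score per unit of time multiplied by $1-r$ for that frame's risk level $r$, the contributions are summed, and the total is subtracted from the first section's per-unit score. The function name is hypothetical.

```python
from typing import Sequence


def final_score(per_unit_score: float, remainder: Sequence[float]) -> float:
    """Subtract the overtime bonus from the first section's per-unit score.

    Assumed reading: every frame past the mean length adds
    per_unit_score * (1 - r) to the bonus, where r is that frame's risk level.
    """
    bonus = sum(per_unit_score * (1.0 - r) for r in remainder)
    return per_unit_score - bonus
```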

After completing these steps, we are left with our final score. The smaller the score, the better the performance.



## Example 1: Good Linemen Performance

Let's see WERC in action. In this example, the offensive linemen earned a WERC score of -12.8, which indicates a good performance. Below, we can see the risk curve of this play.

<img src= "https://github.com/johnwesleyappleton/NFLBigDataBowl2023/blob/main/good_risk_curve.png?raw=true" alt="Figure 2"/>

Below, we can see an animation of the play.


We see that the WERC score of -12.8 accurately reflects the linemen's performance on this play.



## Example 2: Poor Linemen Performance

In this example, the offensive linemen earned a WERC score of 7.5, which indicates a poor performance. Below, we can see the risk curve of this play.

<img src= "https://github.com/johnwesleyappleton/NFLBigDataBowl2023/blob/main/bad_risk_curve.png?raw=true" alt="Figure 3"/>

Below, we can see an animation of the play.


We see that the WERC score of 7.5 accurately reflects the linemen's performance on this play.



## Future Directions

Future directions for WERC include fine-tuning and adjusting the range of possible values. With insights from football experts, we can rescale the score so that the final performance score becomes more intuitive and easier to use.



---

Code for this project: [GitHub repo.](https://github.com/johnwesleyappleton/NFLBigDataBowl2023)

Questions and inquiries: johnwesleyappleton@gmail.com