# Introduction

Tackling is involved in almost every play, yet judging a player's tackling ability remains much more of an art than a science. This raises the question: how should we measure it?

## Current Metrics

A common approach is to simply count the number of tackles a player gets in a season. But there's an obvious issue with this:

**Cumulative tackles do not account for how often a player is put in a position to make a tackle.**

If player A makes 100 tackles on 150 tackle opportunities, while player B makes only 20 tackles but on just 20 opportunities, which one is the better tackler?

So the next step is Missed Tackles Percentage (MIS%): the ratio of missed tackles to total tackle attempts.
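As a quick illustration, here is a minimal Python sketch of MIS% applied to the player A/B example, read as A making 100 tackles on 150 attempts (so 50 misses) and B making 20 tackles on 20 attempts. The helper name `missed_tackle_pct` is hypothetical, and the sketch counts an attempt as either a made or a missed tackle.

```python
def missed_tackle_pct(made: int, missed: int) -> float:
    """Missed tackle percentage: missed tackles / (made + missed tackles)."""
    attempts = made + missed
    if attempts == 0:
        raise ValueError("no tackle attempts recorded")
    return missed / attempts


# Player A: 100 made on 150 attempts (50 missed); player B: 20 made on 20 attempts.
players = {"A": (100, 50), "B": (20, 0)}
for name, (made, missed) in players.items():
    print(f"{name}: {made} tackles, MIS% = {missed_tackle_pct(made, missed):.1%}")
# A: 100 tackles, MIS% = 33.3%
# B: 20 tackles, MIS% = 0.0%
```

By MIS%, B looks better than A despite far fewer tackles, but both numbers depend entirely on what gets counted as an attempt.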

But this leads to a natural question: how do we define a tackle attempt (and, by extension, a missed tackle)? The extreme cases are obvious, such as a defender who wraps up the ball carrier only to slide off as the runner continues unabated.

But what about the non-extreme cases? Imagine a wide receiver who catches the ball in the middle of the field with a lone safety between him and the end zone. Since the safety is the only player in the ball carrier's path, this is clearly a great tackling opportunity, right? Or should it depend on how much space the receiver has? Or on whether he caught the ball in stride?

**Missed Tackles Percentage does not differentiate failed tackles by the quality of the opportunity.**

It treats tackle opportunities as binary (1: opportunity, 0: no opportunity), and much of the nuance in a player's tackling ability is lost in this discretization.



# Tackles over Expected (ToE)

The previous metrics are flawed because they do not account for the context of the situation surrounding the tackle. So what's the alternative?

Similar to how we now use expected points and expected completion percentage, we can better measure tackling ability using Tackles over Expected, defined as **the difference between the number of tackles a given player makes and the average number of tackles a player would make in the same game states** (accounting for variables such as location, speed, and position). This tells us how a player's tackling compares to his peers', adjusted for the circumstances surrounding each play.


## How to Calculate Expected Tackles (xT)

For any given snap, the $j$th player's expected number of tackles **$xT$ equals the probability that the average player gets a tackle on the play**:

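$$
\begin{aligned}
xT_{j} &= \mathbb{E}[\text{numberOfTackles} \mid \text{isTackler}]\,P(\text{isTackler}) + \mathbb{E}[\text{numberOfTackles} \mid \text{isNotTackler}]\,P(\text{isNotTackler}) \\
&= 1 \cdot P(\text{isTackler}) + 0 \cdot P(\text{isNotTackler}) \\
&= P(\text{isTackler})
\end{aligned}
$$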

Formally, we want the probability, at time $t$, that player $j$ will be the tackler, given all the events that have occurred up to that point:

$$
P(T_{jt} \mid S_{t}, S_{t-1}, S_{t-2}, \ldots, S_{0})
$$

where $S_{k}$ is the state of the game at time $k$. Due to project time constraints, we make the simplifying assumption that $P(T_{jt} \mid S_{t}, S_{t-1}, \ldots, S_{0}) = P(T_{jt} \mid S_{t})$; i.e., we only consider the current state of the field for our predictions.

For a given player $j$, this gives us a sequence of probabilities as the play develops, $p_{j0}, p_{j1}, p_{j2}, \ldots, p_{jT}$. To summarize these $T+1$ probabilities in a single statistic, we take their average, $xT_{j} = \frac{1}{T+1}\sum_{t=0}^{T} p_{jt}$, as the $j$th player's final $xT$ for the play. Subtracting this from the player's actual number of tackles gives our Tackles over Expected (ToE) metric for the play: $ToE_{j} = \text{tackles}_{j} - xT_{j}$.
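The averaging step and the ToE subtraction can be sketched in a few lines of NumPy. This is an illustrative sketch, not the project's code: the function names are hypothetical, and it assumes the per-frame probabilities for one play are stored as a `(T + 1, n_players)` array whose rows sum to 1 across the candidate tacklers.

```python
import numpy as np


def expected_tackles(frame_probs: np.ndarray) -> np.ndarray:
    """xT per player: average of p_jt over the play's frames.

    frame_probs has shape (T + 1, n_players); row t holds p_jt for every player j.
    """
    return frame_probs.mean(axis=0)


def tackles_over_expected(frame_probs: np.ndarray, actual_tackles: np.ndarray) -> np.ndarray:
    """ToE per player for one play: actual tackles (0 or 1) minus xT."""
    return actual_tackles - expected_tackles(frame_probs)


# Hypothetical play: 3 frames, 3 candidate tacklers; player 0 makes the tackle.
probs = np.array([
    [0.2, 0.5, 0.3],
    [0.4, 0.4, 0.2],
    [0.9, 0.1, 0.0],
])
actual = np.array([1, 0, 0])
xt = expected_tackles(probs)
toe = tackles_over_expected(probs, actual)
print(np.round(xt, 3), np.round(toe, 3))
```

Here xT comes out to 0.5, 1/3 and 1/6, so the tackler earns +0.5 ToE. Because each row sums to 1, the ToE values on a play sum to zero across players.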

All that's left is to actually generate the probabilities.

## A New Approach - Going Deeper

The traditional way of tackling such a problem is to handcraft useful features and plug them into classical machine learning methods (such as tree-based models or SVMs). This has been the framework even for past Big Data Bowl winners. In the 2022 winning submission [Punt Returns: Using the Math to Find the Path](https://www.kaggle.com/code/robynritchie/punt-returns-using-the-math-to-find-the-path/notebook), the authors calculate what they define as Penalized Expected Arrival Time to the Returner:

> Intuitively speaking, this time penalty is:  
1-5 seconds when the blocker is <5 yards to the tackler and directly in his path,  
0.1-1 seconds when the blocker is >5 yards from the tackler but in the neighbourhood of his path, or  
0-0.1 seconds when he is far enough to the side of the tackler’s path that he will likely not be able to block him.
>

A priori, we don't know how valid these explicit assumptions and parameterizations are. A weighted Gaussian kernel is used; is it parameterized well? Could a non-parametric approach perform better, one with the representational power to generate multimodal blocker time distributions conditioned on blocker success (or latent talent)?

**Deep learning models do not require such restrictive assumptions**: they can take raw data as input and learn whatever representations the data support. For this reason, we turn our *attention* to more flexible AI models.

## An Attention-Based Transformer

**Our model has a Transformer-based architecture built on the concept of [attention](https://arxiv.org/abs/1706.03762), which has revolutionized how machine learning models process sequences.** At any given time, we can think of the 22 players and the football on the field as an unordered set of 23 entities. The transformer learns to focus on the most relevant players and predict who will be the final tackler, all without imposing an arbitrary ordering on the players.
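To make the "no arbitrary ordering" point concrete, here is a minimal NumPy sketch of one single-head self-attention layer without positional encodings, followed by a softmax over defenders. It is not the project's architecture (a real model would stack several multi-head layers with feed-forward blocks and trained weights); the random weights, the dimensions and the `is_defender` mask are assumptions made only to show that shuffling the input rows shuffles the output probabilities in exactly the same way (permutation equivariance).

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows of x (no positional encoding)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])  # (n_entities, n_entities)
    return softmax(scores, axis=-1) @ v


def tackler_probs(x, is_defender, wq, wk, wv, w_out):
    """One logit per entity, then a softmax restricted to defenders (others get probability 0)."""
    h = self_attention(x, wq, wk, wv)
    logits = np.where(is_defender, h @ w_out, -np.inf)
    return softmax(logits)


rng = np.random.default_rng(0)
n_entities, n_features, d = 23, 9, 16
x = rng.normal(size=(n_entities, n_features))  # stand-in for a normalized (23, 9) state
is_defender = np.arange(n_entities) < 11       # hypothetical: first 11 rows are defenders
wq, wk, wv = (rng.normal(size=(n_features, d)) for _ in range(3))
w_out = rng.normal(size=d)

p = tackler_probs(x, is_defender, wq, wk, wv, w_out)
perm = rng.permutation(n_entities)
p_shuffled = tackler_probs(x[perm], is_defender[perm], wq, wk, wv, w_out)
print(np.allclose(p_shuffled, p[perm]))  # True: reordering the rows only reorders the outputs
```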

At time $t$, we define our state $S_{t}$ as a $(23,P)$ matrix, where row $i$ represents the $i$th entity on the field (one of the 22 players or the ball) and $P$ is the number of features. In our base model $P=9$, with features:

**$[$IsAttackingTeam, IsFootball, IsBallCarrier, XCoord, YCoord, Direction, Orientation, Speed, Acceleration$]$**
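One frame's state could be assembled as below. The input format (one dict per entity keyed by the feature names above) and the `build_state` helper are hypothetical, as is filling missing values (for example, a football row with no direction or orientation) with 0.

```python
import numpy as np

FEATURES = ["IsAttackingTeam", "IsFootball", "IsBallCarrier", "XCoord", "YCoord",
            "Direction", "Orientation", "Speed", "Acceleration"]


def build_state(frame_rows: list[dict]) -> np.ndarray:
    """Stack one frame's 23 entity rows (22 players + football) into a (23, 9) state matrix."""
    if len(frame_rows) != 23:
        raise ValueError(f"expected 23 rows, got {len(frame_rows)}")
    state = np.array([[row.get(f, np.nan) for f in FEATURES] for row in frame_rows], dtype=float)
    return np.nan_to_num(state, nan=0.0)  # placeholder for missing values, e.g. the ball's orientation


# Toy frame: 11 offensive players (player 0 carries the ball), 11 defenders, and the football.
rows = [dict(IsAttackingTeam=int(i < 11), IsFootball=0, IsBallCarrier=int(i == 0),
             XCoord=30.0 + i, YCoord=20.0, Direction=90.0, Orientation=90.0,
             Speed=5.0, Acceleration=1.0) for i in range(22)]
rows.append(dict(IsAttackingTeam=0, IsFootball=1, IsBallCarrier=0,
                 XCoord=30.0, YCoord=20.0, Speed=5.0, Acceleration=1.0))
state = build_state(rows)
print(state.shape)  # (23, 9)
```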

Later versions of the model add height, weight, and position. The only feature preprocessing we did was to standardize the features following [Michael Lopez's notebook](https://www.kaggle.com/statsbymichaellopez/nfl-tracking-wrangling-voronoi-and-sonars) and normalize them to improve training speed.
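A sketch of the two preprocessing steps follows, assuming the common convention of flipping plays that move left so every offense moves toward increasing x: rotating the field 180 degrees maps (x, y) to (120 - x, 53.3 - y) and adds 180 degrees to each angle. Lopez's notebook may differ in its details; the function names and the use of a z-score for "normalize" are assumptions.

```python
import numpy as np

FIELD_LENGTH = 120.0     # yards, end line to end line
FIELD_WIDTH = 160.0 / 3  # about 53.3 yards


def standardize_direction(x, y, direction, orientation, play_direction):
    """Rotate plays moving left by 180 degrees so every offense moves toward increasing x."""
    if play_direction == "left":
        x = FIELD_LENGTH - x
        y = FIELD_WIDTH - y
        # A 180-degree rotation reverses every heading, i.e. adds 180 degrees to each angle.
        direction = (direction + 180.0) % 360.0
        orientation = (orientation + 180.0) % 360.0
    return x, y, direction, orientation


def zscore(features, mean=None, std=None):
    """Column-wise z-score; pass the training-set mean/std when normalizing new data."""
    mean = features.mean(axis=0) if mean is None else mean
    std = features.std(axis=0) if std is None else std
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero for constant columns
    return (features - mean) / std, mean, std


x, y, d, o = standardize_direction(10.0, 10.0, 270.0, 45.0, "left")
print(x, round(y, 2), d, o)  # 110.0 43.33 90.0 225.0
```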

# Accounting for Player Positions

## Including Tactical Context

Before moving on to the model outputs, have we accounted for all the context there is on the field? With NGS data we have all the physical movements, but can we incorporate the **tactical context** going on during the play?

For example, suppose Derrick Henry (RB) is carrying the ball up the middle, breaking tackle attempts from player A (MLB) and player B (FS). Given nearly identical NGS features (think acceleration, angle, distance to runner), should we have different tackling expectations based on their position's responsibilities?

## Defining Positional Embeddings

We want to be able to account for player position. The most obvious approach would be to assign a dummy variable to each position: for example, 1 if quarterback, 0 if not. But with so many positions, that would mean adding 20 dummy variables to our data, 19 of which would be 0 for any given player. One issue is that adding a large number of dummy variables effectively [reduces](https://files.eric.ed.gov/fulltext/ED493866.pdf) the amount of training data.

Additionally, using 1s (player is a cornerback) and 0s (is not a cornerback) ignores relationships that exist between positions (isn’t a cornerback more like a safety in their behavior than a guard?). If we want to preserve these relationships, we need some sort of continuous representation (e.g. 0.75 for CBs, 0.7 for S, and 0.2 for G).
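The contrast can be checked numerically. This toy Python sketch uses the illustrative one-dimensional values above (0.75, 0.7, 0.2), which are not learned embeddings: under one-hot encoding every pair of distinct positions is equally far apart, so the encoding cannot express that a cornerback is closer to a safety than to a guard.

```python
import numpy as np

positions = ["CB", "S", "G"]

# Dummy (one-hot) encoding: each position gets its own axis.
one_hot = {p: row for p, row in zip(positions, np.eye(len(positions)))}

# Continuous encoding, using the illustrative 1-D values from the text.
continuous = {"CB": np.array([0.75]), "S": np.array([0.7]), "G": np.array([0.2])}


def dist(enc: dict, a: str, b: str) -> float:
    """Euclidean distance between two positions under a given encoding."""
    return float(np.linalg.norm(enc[a] - enc[b]))


for name, enc in [("one-hot", one_hot), ("continuous", continuous)]:
    print(f"{name}: CB-S = {dist(enc, 'CB', 'S'):.2f}, CB-G = {dist(enc, 'CB', 'G'):.2f}")
# one-hot: CB-S = 1.41, CB-G = 1.41
# continuous: CB-S = 0.05, CB-G = 0.55
```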


## The Graph2Vec Algorithm

At a high level, we draw inspiration from the Natural Language Processing (NLP) community's [Word2Vec algorithm](https://arxiv.org/pdf/1301.3781.pdf) for creating continuous representations of words. Its premise is that a word can be defined by the words surrounding it in a sentence. Translated to sports, **we can define a player's position by the surrounding players in a play**.

At every timestep, we represent each player on the field as a node in a graph, connected by edges (weighted by their distance). The Graph2Vec algorithm then learns a representation of the player's position by considering both their location on the field and their distance to every other player.
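The graph construction can be sketched with NumPy as a complete graph whose edge weights are pairwise distances. How the project's Graph2Vec step consumes this graph isn't specified here, so the second function is a hypothetical illustration of the "defined by surrounding players" idea: it emits (player, neighbor) pairs for each player's k nearest players, analogous to the (word, context) pairs used to train Word2Vec's skip-gram model.

```python
import numpy as np


def distance_graph(xy: np.ndarray) -> np.ndarray:
    """Weighted adjacency matrix of a complete graph: entry (i, j) is the distance between players i and j."""
    diff = xy[:, None, :] - xy[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def nearest_context(adj: np.ndarray, k: int) -> list[tuple[int, int]]:
    """(player, neighbor) pairs linking each player to his k closest players."""
    pairs = []
    for i in range(adj.shape[0]):
        neighbors = [int(j) for j in np.argsort(adj[i]) if j != i][:k]
        pairs.extend((i, j) for j in neighbors)
    return pairs


# Toy frame with four players' (x, y) coordinates in yards.
xy = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0], [10.0, 10.0]])
adj = distance_graph(xy)
print(adj[0, 1])                # 5.0
print(nearest_context(adj, 1))  # [(0, 2), (1, 2), (2, 0), (3, 1)]
```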

## Positional Relationships

Let's first make clear what our output is. A 20-dimensional dummy-encoding vector would give each dimension (column) an interpretable meaning: the first representing "is quarterback", the second "is wide receiver", and so on:

Cooper_Kupp = $[0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]$

For our 32-dimensional positional embedding vector, we instead have:

Cooper_Kupp = $[-3.07, -5.81, 14.20, 2.41, 6.12, 4.34, -1.89, -3.83, 3.15, ...]$

What does each dimension represent? It's not explicitly defined; it's whatever the model learned. We can try to intuit some meaning by projecting the embeddings into two dimensions and plotting them:

For example, take the green cluster. Our (projected) Cooper Kupp positional embedding would look like:

Cooper_Kupp = $[40, -21.0]$

Though there are a few wide receivers in an adjacent cluster (like Michael Thomas, Skyy Moore), we can see Kupp surrounded by guys like Stefon Diggs and Gabe Davis. In fact, it seems like a lot of the wide receivers have a dimension 2 value of less than -15. One possible interpretation of our embeddings is that negative values of dimension 2 encode some sort of measure of "wide receiver-ness". 
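The text doesn't state which method produced the two-dimensional coordinates (PCA, t-SNE and UMAP are common choices). As one possibility, a PCA projection of the 32-dimensional embeddings can be sketched with NumPy's SVD; the random stand-in embeddings here are placeholders, not the learned ones.

```python
import numpy as np


def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project (n_players, 32) embeddings onto their first two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 32))  # placeholder for 50 players' learned embeddings
proj = pca_2d(emb)
print(proj.shape)  # (50, 2)
```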

By taking the similarity of every pair of players' embeddings and grouping by position, we can also see that **the relationships we'd expect to exist between positions arise naturally from our embeddings**:
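The pairwise similarity and position grouping can be sketched as follows. The choice of cosine similarity and the function names are assumptions, and the four two-dimensional "embeddings" are toy values rather than real player vectors.

```python
import numpy as np
from collections import defaultdict


def cosine_matrix(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of emb."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


def position_similarity(emb, positions):
    """Average cosine similarity over player pairs, grouped by (position, position)."""
    sim = cosine_matrix(emb)
    sums, counts = defaultdict(float), defaultdict(int)
    for i, pos_i in enumerate(positions):
        for j, pos_j in enumerate(positions):
            if i != j:  # skip self-pairs, whose similarity is always 1
                sums[(pos_i, pos_j)] += sim[i, j]
                counts[(pos_i, pos_j)] += 1
    return {key: sums[key] / counts[key] for key in sums}


emb = np.array([[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 1.0]])  # toy 2-D embeddings
positions = ["CB", "S", "G", "G"]
avg = position_similarity(emb, positions)
print(avg[("CB", "S")] > avg[("CB", "G")])  # True
```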

### Pass-Catching Running Backs - Case Study

The original motivation behind this idea was how to properly represent players who don't fit neatly into a single well-defined position. The current 49ers roster, headlined by Deebo Samuel and Christian McCaffrey, has an abundance of these players.

In fact, there were two WRs with over 100 rushing yards in 2022: Deebo Samuel (SF) and Curtis Samuel (WAS). The top three RBs in both receptions and receiving yards were Austin Ekeler, Leonard Fournette, and Christian McCaffrey. If our positional embeddings are effective, shouldn't these WRs be more similar to RBs than the average WR is (and vice versa for the listed RBs)?

This turns out to be the case: the similarity between these WRs and RBs is higher than the average WR-RB similarity. **This is part of the tactical context that positional embeddings capture but dummy-variable encoding and NGS data ignore**.


The result also holds for slot receivers like Cooper Kupp and Tyler Boyd: they are more similar to the average RB (avg_rb row) than the average WR is (bottom-right cell).

# Model Breakdown

Now that our inputs capture the context surrounding each player, we can model his probability of finishing the play with the tackle.

If we look at a few specific timesteps, we can see one of the model's greatest strengths: **its ability to learn the effects of high-level features (like blocking) from raw data**.

## Summary Statistics

As a baseline, we simply predict that the defender closest to the ball will be the tackler. **Our model is much better at predicting the correct tackler than this naive baseline.** Prediction performance improves further when we add height, weight, and our own novel positional embeddings.
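The closest-defender baseline is simple enough to sketch directly. This assumes per-frame inputs of player coordinates, a boolean defender mask and the ball's location, and it scores accuracy per frame; all names are hypothetical, and whether the project scored per frame or per play isn't stated.

```python
import numpy as np


def closest_defender(xy: np.ndarray, is_defender: np.ndarray, ball_xy: np.ndarray) -> int:
    """Index of the defender nearest the ball in one frame."""
    dists = np.linalg.norm(xy - ball_xy, axis=1)
    dists = np.where(is_defender, dists, np.inf)  # ignore offensive players
    return int(np.argmin(dists))


def baseline_accuracy(frames, true_tacklers) -> float:
    """Share of frames where the closest defender is the eventual tackler."""
    hits = [closest_defender(*frame) == t for frame, t in zip(frames, true_tacklers)]
    return float(np.mean(hits))


# Toy frame: rows 1 and 2 are defenders; the ball is at (1, 1).
xy = np.array([[0.0, 0.0], [2.0, 0.0], [5.0, 5.0], [1.0, 1.0]])
is_def = np.array([False, True, True, False])
ball = np.array([1.0, 1.0])
frames = [(xy, is_def, ball)]
print(closest_defender(xy, is_def, ball))  # 1
print(baseline_accuracy(frames, [2]))      # 0.0
```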

# Player Rankings

Using our Tackles over Expected metric, we can now rank players based on their performance relative to the league average. Below we have rankings for (cumulative) **Tackles over Expected** and **Tackles over Expected per Snap** *(divided by total number of snaps)*.
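Aggregating per-play ToE into both rankings can be sketched in plain Python. The input format (one `(player, tackles, xT)` row per player per snap) is an assumption, as is counting a snap for every row a player appears in.

```python
from collections import defaultdict


def rank_toe(play_rows):
    """Cumulative and per-snap ToE rankings from (player, tackles, xT) rows, one per player per snap."""
    toe = defaultdict(float)
    snaps = defaultdict(int)
    for player, tackles, xt in play_rows:
        toe[player] += tackles - xt
        snaps[player] += 1
    per_snap = {p: toe[p] / snaps[p] for p in toe}
    by_total = sorted(toe, key=toe.get, reverse=True)
    by_rate = sorted(per_snap, key=per_snap.get, reverse=True)
    return toe, per_snap, by_total, by_rate


rows = [("A", 1, 0.4), ("A", 0, 0.2), ("A", 1, 0.5),
        ("B", 1, 0.3), ("C", 0, 0.3), ("C", 0, 0.1)]
toe, per_snap, by_total, by_rate = rank_toe(rows)
print(by_total)  # ['A', 'B', 'C']
print(by_rate)   # ['B', 'A', 'C']
```

In the toy data, A leads in cumulative ToE (0.9 over three snaps) but B leads per snap (0.7 on a single snap), which is why the two rankings can disagree.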

The notable aspect of our model is that, while properly identifying players we know to be good tacklers, it also **highlights players who we suspect are good tacklers but are overlooked by traditional metrics such as missed tackle percentage**.

# Innovation & Utility

We now have a new way to measure how well a player tackles relative to his peers. The model can learn the tackling-relevant dynamics of a play directly from tracking data, thanks to its deep, attention-based transformer and its novel method of capturing relationships between player positions via positional embeddings.

Its uses extend to player evaluation, player acquisition, and gameday strategy. The positional embedding framework also has many uses independent of tackling, such as any model that takes position as an input, or player similarity metrics.

<a href="https://github.com/scottmaran/big_data_bowl_2024">Code</a>