## Problem Statement / Evaluation Criteria

This year's competition offers up a general goal — create metrics that assign value to elements of tackling.

You can access the NFL’s Next Gen Stats data as in previous competitions. This year's player tracking includes data from Weeks 1-9 of the 2022 NFL season. Data will show the location, speed, and acceleration of all 22 players on the field, along with football location. Additional PFF scouting data and NFL advanced stats such as expected points and win probability are also included.

Your challenge is generating actionable, practical, and novel insights from player tracking data corresponding to tackling.

<b>Examples to Consider</b> <br>
Examples include, but are not limited to:

* Predictions of tackle time, probability, and/or location
* Tackle range: angle of pursuit, speed and acceleration, closing speed
* Player evaluation (e.g., yards saved, tackle value, missed tackles)
* Credit assignment (e.g., one player makes a tackle because of another player, blocks shed, area of influence)
* Tackle type (solo vs. gang, open field vs. in the trenches, etc.)
* Team and player roles and responsibilities (setting the edge, filling gaps, etc.)


<b>Evaluation</b> <br>
Submission tracks: participants will select one of two tracks in which to submit.

<p>1. Metric Track</p>
<ul>
    <li>Create a metric to assess performance and/or strategy.</li>
    <li>You may focus on offensive or defensive players, teams, or individuals.</li>
</ul>

<p>2. Coaching Presentation Track</p>
<ul>
    <li>This track aims to analyze and present data in a submission designed for coaches.</li>
    <li>We encourage participants interested in this track to partner with a coach (or current/former player), though this isn’t required.</li>
</ul>

<hr>

## Proposed Ideas

1. Defensive player intimidation/dominance factor
    + examine difference of intra-game offensive player performance before/after high-impact tackles
    + possible multiplier considerations - do repeated such tackles on same offensive player have multiplicative impact on difference?
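
A minimal sketch of the before/after comparison for idea 1, assuming a hypothetical input format: a list of `(play_index, yards_gained)` pairs for one offensive player in one game, plus the play index of the high-impact tackle. How "high-impact" is defined is left open here, and the numbers are made up for illustration.

```python
from statistics import mean

def before_after_diff(plays, tackle_play_index):
    """Mean yards gained after the tackle minus mean yards gained before it.

    plays: list of (play_index, yards_gained) tuples for ONE offensive player
           in ONE game (hypothetical input format, not a dataset schema).
    tackle_play_index: play index of the high-impact tackle; that play itself
           is excluded from both windows.
    Returns None when either window is empty.
    """
    before = [y for i, y in plays if i < tackle_play_index]
    after = [y for i, y in plays if i > tackle_play_index]
    if not before or not after:
        return None
    return mean(after) - mean(before)

# Made-up numbers for illustration only.
example_plays = [(1, 6), (2, 4), (3, 8), (4, 2), (5, 3), (6, 1)]
diff = before_after_diff(example_plays, tackle_play_index=3)
```

A negative `diff` would suggest lower production after the tackle. The multiplier question could be explored by repeating the comparison after the second, third, and later such tackles on the same player.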

<br>

2. Denial of space measurement based upon field area that defensive player can cover
    + would probably need to create/find a metric or statistic to quantify the area that a player can cover (presumably in $\text{yd}^{2}$)
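
As a rough first pass at idea 2, one could treat the coverable area as a disk whose radius is the distance a player can travel within a short time horizon. The sketch below assumes straight-line motion with constant acceleration in any direction, and that speed and acceleration are in yards per second and yards per second squared. Both are simplifying assumptions, not properties confirmed from the data, and the function name is hypothetical.

```python
import math

def reachable_area(speed, accel, horizon):
    """Area (square yards) of a disk the player could reach within `horizon` seconds.

    Assumes constant acceleration and ignores the player's current heading,
    so it overstates coverage behind the player and has no top-speed cap.
    """
    radius = max(0.0, speed * horizon + 0.5 * accel * horizon ** 2)
    return math.pi * radius ** 2

# Illustrative values only: 4 yd/s, 2 yd/s^2, 1-second horizon -> radius of 5 yd.
area = reachable_area(speed=4.0, accel=2.0, horizon=1.0)
```

A more realistic version would account for direction of travel and a maximum speed, which would turn the disk into an asymmetric region.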

<br>
    
3. Something related to reinforcement learning

<br>

4. Defensive player pre-snap probability-of-making-tackle statistic
    + use historical data to fit a predictive model
    + the motivation is for the defense to incorporate this information into its alignment after the offensive players break their huddle and line up before the play
    + response variable = `tackle` in the `tackles.csv` file (binary: 1=player made tackle; 0=else)
    + predictor variables = ???
        * need to work this out further
        * something like: in-game timestamp, down/distance, line-of-scrimmage yard line, offensive formation, etc.
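
One candidate model for idea 4 is logistic regression on the binary `tackle` response. The sketch below fits it with plain batch gradient descent so that it needs no third-party libraries. The single feature is a hypothetical placeholder (a scaled pre-snap distance), since the real predictor set is still undecided, and the toy data are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # gradient of the log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Predicted probability that the player makes the tackle."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Toy data: one hypothetical feature (scaled pre-snap distance); label = tackle made.
X = [[0.2], [0.3], [0.5], [0.9], [1.2], [1.5]]
y = [1, 1, 1, 0, 0, 0]
w = fit_logistic(X, y)
p_near = predict_proba(w, [0.2])
p_far = predict_proba(w, [1.5])
```

In practice a library implementation with regularization and categorical encoding for formation would likely be preferable. This version only shows the shape of the model.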
        
<br>

5. Examine relationship between tackle type, missed tackles, and wins
    + categorize tackle types into factor variable levels
    + descriptive stats about tackle type levels
    + compute prob(missed tackle | tackle type, $\theta$), where $\theta$ = other params
    + seek to understand whether a particular tackle type affects the prob of a missed tackle
    + model prob(win) as a function of count(missed tackles)
    + investigate whether a certain tackle type/situation negatively affects the number of wins
    + argue that this particular tackling scenario should be avoided
        * check whether certain defensive alignments/formations are most likely to avoid these situations
        * recommend these defensive alignments for more wins
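
The conditional probability step in idea 5 can start as a simple empirical rate per tackle type. The sketch below assumes a hypothetical input of `(tackle_type, missed)` pairs, where the tackle-type labels would come from our own categorization step. It ignores the other parameters $\theta$, and the data are made up.

```python
from collections import defaultdict

def missed_rate_by_type(attempts):
    """Empirical P(missed tackle | tackle type).

    attempts: iterable of (tackle_type, missed) pairs, with missed in {0, 1}
              (hypothetical format; labels come from our own categorization).
    """
    counts = defaultdict(lambda: [0, 0])  # tackle_type -> [missed, total]
    for tackle_type, missed in attempts:
        counts[tackle_type][0] += missed
        counts[tackle_type][1] += 1
    return {t: m / n for t, (m, n) in counts.items()}

# Made-up attempts for illustration only.
attempts = [
    ("open_field", 1), ("open_field", 0), ("open_field", 1), ("open_field", 0),
    ("trenches", 0), ("trenches", 0), ("trenches", 1), ("trenches", 0),
]
rates = missed_rate_by_type(attempts)
```

Conditioning on additional parameters would mean grouping by more keys or moving to a regression model.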
        
        
6. Entropy-based quantification of player tackling probability
    + Compute the entropy of each player's tackles-per-game distribution
    $$ H\left(p\right) = - \sum\limits_{x \in \mathcal{X}} p\left(x\right) \log\left(p\left(x\right)\right) = \mathbb{E} \left(-\log p\left(X\right)\right) $$
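
A small sketch of the entropy calculation for idea 6, assuming the distribution is the empirical frequency of a player's tackle counts across games. It uses base-2 logarithms (bits), which is a choice made here rather than something fixed by these notes. The input list is made up.

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Made-up tackles-per-game counts for one player across four games.
tackles_per_game = [2, 2, 5, 5]
h = entropy(tackles_per_game)
```

A player with the same tackle count every game has entropy 0. Higher values indicate less predictable game-to-game output.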

## References/Ideas

* We might look over the GitHub repo for last year's winner, <a href="https://github.com/qntkhvn/strain">STRAIN</a> (Sacks, Tackling, Rushing, Aggression INdex), to get ideas about what a good submission looks like.

* <a href="https://www.kaggle.com/competitions/nfl-big-data-bowl-2024/discussion/446963">Discussion post</a> on the 2024 Big Data Bowl page that contains links to Kaggle notebooks for the 2021/2022 winners.

## Issues/Considerations

1. Model selection / feature-space dimensionality reduction
    + For example, `plays.csv` contains 35 columns. We won't need all of them. For a given model, what procedure will we use to identify the most informative feature variables to include?
    + For a simple model, we might use something like Principal Component Analysis or LASSO regression
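
To illustrate how LASSO performs feature selection, here is a minimal coordinate-descent sketch in plain Python. It minimizes $\frac{1}{2n}\lVert y - Xw \rVert^2 + \alpha \lVert w \rVert_1$ with no intercept, so it assumes the columns and the response are already centered. The data are made up, and a library implementation would be the practical choice on the real files.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_cd(X, y, alpha, n_iter=200):
    """LASSO weights via cyclic coordinate descent (no intercept, centered data)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for xi, yi in zip(X, y):
                pred_without_j = sum(w[k] * xi[k] for k in range(d) if k != j)
                rho += xi[j] * (yi - pred_without_j)
                z += xi[j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w

# Toy centered data: y depends only on the first column; the second is irrelevant.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.5], [1.0, -1.0], [2.0, 0.5]]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]
w = lasso_cd(X, y, alpha=0.5)
```

The irrelevant column's weight is driven exactly to zero while the informative one is shrunk, which is the behavior that makes LASSO usable as a feature screen. PCA, by contrast, would produce combinations of columns rather than a subset of them.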