# Predicted Yards Model

Football is, in a sense, a fast game. Unlike in other sports, the action takes place in short bursts, which increases the importance of every second of actual gameplay. Planning and strategy are essential, and thus receive most of the attention when preparing for a game. Individual effort and skill are a determining factor in most sports, yet the tools to evaluate them in football remain limited. This work aims to provide a framework to evaluate ball carriers and the decisions they make in real time. Traditional statistics such as net yards, net returns and rushing yards aim to represent player skill, but they fall short because of the collaborative nature of football. Every player on the field contributes to the success or failure of a play, so attributing a team statistic to a single player can be misleading.

In previous NFL Big Data Bowl competitions, models that predict rushing yards were built to evaluate players rushing the ball, which helped teams and fans alike better understand who was performing better or worse than their peers. These models, regardless of their strengths and shortcomings, only provide estimates at the moment a rusher receives the ball. PY, or Predicted Yards, aims to use the richness of special teams data to provide an estimate at any moment in which a player has the ball.

## Model Definition

A bit more formally, PY or Predicted Yards is defined as:
$$ \text{PY}= \mathbf{E}[Y_{t}|F_{obs}] $$
where $Y_t$ is the yards still to be gained from time $t$ onward, and $F_{obs}$ is the set of events that have occurred in the play before time $t$.

While deriving the distribution of $Y_t|F_{obs}$ analytically may be almost impossible, our model uses machine learning to estimate its expectation. One of the main assumptions for this to work is that punt and kickoff returns cover a wide variety of player locations and orientations, and therefore serve as a dataset rich enough to make this estimate accurately.

To make the model agnostic to the type of play and the current ball location, the information used in the training dataset was carefully selected and excludes variables that are particular to punt and kick returns. For each time $t$, the following variables are calculated:
- Distance from the ball carrier to the $n$th closest player on the team in possession of the ball, with $n=1,\dots,11$
- Distance from the ball carrier to the $n$th closest player on the defending team, with $n=1,\dots,11$
- Speed of the $n$th closest player on the team in possession of the ball, with $n=1,\dots,11$
- Speed of the $n$th closest player on the defending team, with $n=1,\dots,11$
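The feature list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the model repository: the function name `carrier_features`, the feature names, and the dictionary keys `x`, `y` (field position in yards) and `s` (speed) are hypothetical choices made for this example.

```python
import math


def carrier_features(carrier, teammates, defenders, n_max=11):
    """Distance and speed of the n-th closest teammate and defender.

    Each player is a dict with hypothetical keys 'x', 'y' (position in
    yards) and 's' (speed). Players are ranked by Euclidean distance
    to the ball carrier, closest first.
    """

    def ranked(players):
        pairs = [
            (math.hypot(p["x"] - carrier["x"], p["y"] - carrier["y"]), p["s"])
            for p in players
        ]
        return sorted(pairs)[:n_max]

    features = {}
    for side, players in (("off", teammates), ("def", defenders)):
        for n, (dist, speed) in enumerate(ranked(players), start=1):
            features[f"{side}_dist_{n}"] = dist
            features[f"{side}_speed_{n}"] = speed
    return features
```

Sorting by distance before numbering the features is what makes the representation independent of jersey numbers and roster order: the first column always describes the nearest player, whoever that is.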

The target $Y_t$ is defined as the difference between the current yardline location of the football and its yardline location at the end of the play. Other variables are left for later iterations of the model.

To train the model, a sample of 10 random frames was taken from every punt and kickoff return from the 2018-2019 season, considering only frames between the moment the returner catches the ball and the end of the play. It is important to note that our model considers frames other than the one in which the return starts, and it is this that is intended to make it usable in any kind of NFL play where a ball carrier is involved.
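The sampling step and the target definition can be sketched as follows. This is a minimal illustration, not the repository code: the function name, the keys `frame_id` and `ball_yardline`, and the convention that the return advances toward larger yardline values are all assumptions made for the example.

```python
import random


def sample_training_rows(frames, end_yardline, k=10, seed=0):
    """Draw up to k random frames from one return and attach the target.

    `frames` holds only the frames between the catch and the end of the
    play, each a dict with hypothetical keys 'frame_id' and
    'ball_yardline'. The target is the yardage still to be gained,
    assuming the return moves toward larger yardline values.
    """
    rng = random.Random(seed)
    chosen = rng.sample(frames, min(k, len(frames)))
    return [
        {
            "frame_id": f["frame_id"],
            "target": end_yardline - f["ball_yardline"],
        }
        for f in chosen
    ]
```

Because frames are drawn without replacement from anywhere in the return, the same play contributes examples from early, middle and late moments, each with its own remaining-yards target.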

To produce the estimates of predicted yards $PY$, the machine learning algorithm *XGBoost* was used. It builds an ensemble of gradient-boosted decision trees to predict the target, and its non-linear nature helps it find relationships between variables that could be difficult for a human to spot. In practice, *XGBoost* is one of the most widely used models in industry because of its robustness and predictive power in all kinds of contexts.
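To give a feel for how boosted trees work, here is a toy gradient-boosting regressor built from one-split trees ("stumps") in plain Python. It is deliberately not XGBoost and not the model used in this work: it omits regularization, deeper trees and every other refinement of the real library, and all function names are hypothetical. It only shows the core idea of repeatedly fitting a small tree to the current residuals.

```python
from statistics import mean


def fit_stump(X, residuals):
    """Find the single (feature, threshold) split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = mean(left), mean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:
        raise ValueError("every feature is constant, nothing to split on")
    return best[1:]


def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean and add stumps fitted to the remaining residuals."""
    base = mean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        preds = [
            p + learning_rate * (lm if row[j] <= thr else rm)
            for row, p in zip(X, preds)
        ]
    return base, learning_rate, stumps


def predict_yards(model, row):
    """Sum the base value and the scaled contribution of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] <= thr else rm) for j, thr, lm, rm in stumps)
```

With rows such as `[distance_to_nearest_defender, carrier_speed]` and remaining yards as the target, each round corrects a fraction of what the previous rounds got wrong, which is how the ensemble picks up non-linear relationships.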

Our model has error metrics of $RMSE=6.748$ and $MAE=4.055$ yards, estimated on a test data set. In other words, its predictions are off by about 4 yards on average, and the larger RMSE indicates that some plays carry considerably larger errors.
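For readers less familiar with the two metrics, the sketch below computes them on made-up numbers (the values are purely illustrative and are not taken from our test set). It also shows why RMSE is never smaller than MAE: squaring gives a single large miss more weight.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the miss, in yards."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but large misses weigh more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Illustrative values only: three good predictions and one 8-yard miss.
actual = [10.0, 0.0, 5.0, 20.0]
predicted = [8.0, 1.0, 5.0, 12.0]
print(mae(actual, predicted))   # 2.75
print(rmse(actual, predicted))  # about 4.15, pulled up by the 8-yard miss
```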




The most important variables in the model, and their approximate contributions to predicting more or fewer yards, are shown using SHAP.
## <img src= "https://raw.githubusercontent.com/ecastillomon/databowl22/main/output/var_importance.png" alt ="Variable Importance (SHAP)" style='width: 800px;'>


# Model Results

Having seen how the model is built, we can evaluate which players had the best results in 2020. With the Predicted Yards model we can analyze both punt and kickoff returns. We only considered returners with at least 10 attempts.

### Punt Returners

In an attempt to estimate a returner's *true* PYOE, we estimate their average PYOE using the Bayesian bootstrap. This technique, which approximates the distribution of a statistic by repeatedly recomputing it with random weights on the observations, gives us a better idea of a player's PYOE during the 2020 season. Ideally, the box plot for a returner should look as compact as possible while having a high average. Special teams strategy may also want to prioritize returners who produce a large PYOE every now and then, which is represented by the whiskers of the box plot.

## <img src= "https://raw.githubusercontent.com/ecastillomon/databowl22/main/output/pr_leaderboard.png" alt ="Punt Returner Leaderboard" style='width: 800px;'>
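The Bayesian bootstrap behind these box plots can be sketched in plain Python. This is a generic version of the technique and not necessarily the exact procedure we ran: the function name, the number of draws and the sample values in the usage note are hypothetical.

```python
import random


def bayesian_bootstrap_means(values, n_draws=1000, seed=0):
    """Draw plausible values for the mean of `values`.

    Each draw reweights the observations with Dirichlet(1, ..., 1)
    weights, obtained by normalising independent Exponential(1) draws,
    and returns the weighted mean.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        raw = [rng.expovariate(1.0) for _ in values]
        total = sum(raw)
        draws.append(sum(w * v for w, v in zip(raw, values)) / total)
    return draws
```

Calling it on a returner's per-return PYOE values, for example `bayesian_bootstrap_means([12.0, -3.0, 5.5, 0.0, 20.0])`, gives a list of plausible averages whose spread is what a box plot summarizes: few attempts or very uneven returns produce a wide box.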

At the top of our ranking of the best returners we find the Patriots player Gunner Olszewski. He averaged 10.21 PYOE over his 20 punt returns, one of which was returned for a touchdown. His average is well above all others: the runner-up, Hunter Renfrow (Las Vegas Raiders), averaged 6.38 PYOE. Jamal Agnew, formerly of the Detroit Lions and now in Jacksonville, is in third place with an average PYOE of 6.19. Jakeem Grant (Miami Dolphins) recorded the highest PYOE on a single play, 79.80.

The Tampa Bay Buccaneers may have won the Super Bowl in the 2020 season, but they certainly don't owe it to their punt returners. At the bottom of our ranking we find Kenjon Barner and Jaydon Mickens, whose average PYOEs are negative, at -2.56 and -3.58. We can also see star wide receiver CeeDee Lamb with a negative PYOE. He has returned far fewer punts this year, and most likely he should not do so again in the next few seasons.

## <img src= "https://raw.githubusercontent.com/ecastillomon/databowl22/892225fd9d7f3c3d87012f1ecad5750ac1f26fe5/punt%20returner.png" alt ="Punt Returner Leaderboard" style='width: 1500px;'>

Our model shows that not all punt returners justify continuing to get attempts, as is the case with Alex Erickson or Nsimba Webster. Other coaches made switches during the season, which in a way confirms that their expert judgement matched what our model shows. Ideally, a punt returner has occasional bursts while maintaining consistency in plays where they are unable to gain many yards.

<img src= "https://raw.githubusercontent.com/ecastillomon/databowl22/main/output/pr_acum_leaderboard.png" alt ="Punt Returner Leaderboard" style='width: 800px;'>


    
    

### Kick Returners

Our model can also be used to evaluate kickoff returns, and again we require at least 10 attempts for a returner to qualify. Some teams use the same player to return both punts and kickoffs, so Jamal Agnew and Deonte Harris appear in the top 10 of this ranking as well.

## <img src="https://raw.githubusercontent.com/ecastillomon/databowl22/main/output/kr_leaderboard.png" width="800">

In the top 3 we find players with a PYOE value above 6 yards. Byron Pringle (Chiefs) leads the ranking, followed by Nasir Adderley (Chargers), both with only a dozen attempts, while in third place we find the more consistent Brandon Wilson with 24 returns and a PYOE of 6.18. The Bengals player also recorded the highest single-play PYOE, with a value of 79.05. Among the worst players in kickoff returns we find some confirmations and some surprises. Diontae Spencer was fourth in the punt return ranking, and is now in last place. In the ranking of the worst players, DeAndre Carter (Bears) and Donovan Peoples-Jones (Browns) are confirmed.


## <img src= "https://raw.githubusercontent.com/ecastillomon/databowl22/892225fd9d7f3c3d87012f1ecad5750ac1f26fe5/kickoff.png" alt ="Kick Returner Leaderboard" style='width: 1500px;'>


<img src= "https://raw.githubusercontent.com/ecastillomon/databowl22/main/output/kr_acum_leaderboard.png" alt ="Kick Returner Leaderboard" style='width: 800px;'>


### Special Teams Units

We also looked at how special teams units stacked up against each other, leaving individual players aside. Here, the Colts scored higher than the Bears, even though the latter had an all-time return legend in Patterson returning most attempts. This suggests that the Colts had a more solid kickoff return unit, one that depended less on the genius of a specific player. The Saints scored higher than most teams, and had two individual players in our top 10.

## <img src="https://raw.githubusercontent.com/ecastillomon/databowl22/main/output/kr_team_leaderboard.png" width="800">


The Bengals had one of the best kick return units in spite of having one of the worst punt return units, a surprising finding given that both units are often thought of as interchangeable. The opposite happened with the Patriots, who never really got their kick return game going, while their punt return unit was one of the best in 2020.

The Saints' special teams scored well in every category, which probably indicates that they were the best-coached special teams unit during that time.

## <img src="https://raw.githubusercontent.com/ecastillomon/databowl22/main/output/pr_team_leaderboard.png" width="800">


# Conclusions and Further Work

Mostly due to the lack of offensive plays in the data and the nature of the competition, our model was tested using only punt and kickoff returns. The results appear to confirm some of the prior beliefs we had about returners, while still offering new insights that match some of the decisions NFL coaches made during the season. The model appears to be accurate when predicting the yards to be gained at any moment of a kickoff or punt return, and it remains to be seen whether it will continue to be useful in different contexts, such as rushing plays or yards gained after a catch. The use of player speed as a feature is perhaps the only problem that could arise when using the model in another football league or division, as the magnitude of this variable will undoubtedly change in another context.

Other variables were considered that could make the model more accurate without sacrificing its benefits. Among them were features indicating whether a player is blocked, the number of opposing players within a certain radius, the field of vision of the ball carrier, and the acceleration of players. The code used to produce the datasets and train the model is provided in the model repository, and anyone interested in further evolving the model can attempt to replicate it or contact us.




##### Model Repository
https://github.com/ecastillomon/databowl22/

##### Contact
[Esteban Castillo](mailto:est092@gmail.com) : [@patpAItriot](https://twitter.com/patPaitriot)

[Andrea Casiraghi](mailto:acasi84@gmail.com) : [@andreacasiragh1](https://twitter.com/andreacasiragh1)