# Decision-Making in Punt Returns

## Analyzing Decision-Making on Returnable Punts

By Tabor Alemu and Vinesh Kannan


# Introduction

When a punt returner sees that a punt is returnable, they have three choices:

- Return: Catch the ball and try to advance it for better field position.
- Fair Catch: Make the catch and accept that field position.
- Bail: Let the ball land and hope for good field position.

Put yourself in the shoes of Corey Clement, punt returner for the Philadelphia Eagles. Which action would you choose on this punt?

<img src="https://github.com/vingkan/nfl-big-data-bowl-2022-supplementals/blob/main/Clement%20Return%20Decision.gif?raw=true" />

While the ball is still in the air, Clement signals to his teammate to stay away, so that they do not accidentally touch the ball and give Trenton Cannon, the nearby gunner for the New York Jets, a chance to recover.

But after the ball takes a bounce in favor of the punting team, threatening to pin the Eagles back even deeper, Clement changes his decision.

<img src="https://github.com/vingkan/nfl-big-data-bowl-2022-supplementals/blob/main/Clement%20Return%20Muff.gif?raw=true" />

The result is disaster. Clement fails to secure the ball and Cannon recovers it, setting up the Jets’ only touchdown of the game. Even the announcer remarks that it was a bad decision, but that is easier to say as an observer than in the heat of the play.

We use the NFL’s Next Gen Stats (NGS) tracking data to reconstruct what the returner faced on the field and to analyze punt returner decision-making.

We developed a model that uses the pre-arrival frame to predict whether a returnable punt will net the return team a loss or zero yards from the end of the kick, a measure of return difficulty. This Jets-Eagles play was in the hold-out set, meaning that our model was never trained or tuned on it. Our model predicted a 76% chance of this play resulting in a loss or no gain from the end of the kick. This suggests not only that Clement's original decision to stay away was correct, but also that his second decision, to try to limit the losses, was a big risk.

Our analysis provides:

- Heatmaps to help punting teams decide where to punt.
- Average return yards plots to help receiving teams decide whether to return or bail.
- Ranking of punt returners who turn the most difficult punts into gains.
- Ranking of returning teams who turn the most difficult punts into gains.


# Part 1: Tendencies and Trade-Offs

Throughout this analysis, we focus only on returnable punts, which we define as punts that land or are caught in the field of play, including the end zone. More details on how we filtered returnable punts can be found in the methodology section.

First, we used tracking data from returnable punts in the 2018 to 2020 seasons to form a baseline for the parts of the field where punt returners generally return, fair catch, or bail.

> Kicking teams can use this information to decide where to punt if they want to force a returner to fair catch or bail.

In the following plots, the receiving team’s end zone is on the left. The kicking team yard line is measured from the kicking team’s end zone. For example, a kicking yard line of 60 means the receiving team’s 40 yard line.

We constructed several heatmaps that highlight the parts of the field most likely to yield better results for punt specialists. Built from the results of previous punts, these diagrams show returners where on the field they should call a fair catch, bail in the hope of a touchback, or return the ball.

## Figures 1-3. Heatmap of returnable punt decisions based on field position.

These figures separate punts into three decision categories (fair catch, bail/downed, and return) and plot where on the field the punts in each category are caught or first hit the ground. The plots are grouped by the line of scrimmage prior to the punt, in 10-yard buckets.

<img src="https://github.com/vingkan/nfl-big-data-bowl-2022-supplementals/blob/main/Return_plot.png?raw=true" />
<img src="https://github.com/vingkan/nfl-big-data-bowl-2022-supplementals/blob/main/fc_plot.png?raw=true" />
<img src="https://github.com/vingkan/nfl-big-data-bowl-2022-supplementals/blob/main/Bail_plot.png?raw=true" />

Punters can use these plots to place the ball where dangerous, explosive returns are less likely, forcing the outcome they want. The fair catch plot above shows the destinations of punts snapped between the punting team’s own 30 and 40 yard lines that resulted in fair catches across the three-season dataset. It suggests that the “sweet spot” for forcing fair catches is between the 15 and 25 yard lines.

## Figure 4. Average return yards gained for returnable punt decisions based on field position.

Next, we analyzed the average return yards gained for each type of decision, depending on field position, based on returnable punts from the 2018 to 2020 seasons.

<img src="https://github.com/vingkan/nfl-big-data-bowl-2022-supplementals/blob/main/mix_plot.png?raw=true" />


The second model looks at the same three categories and compares them by field position to find the decision with the highest net yards. Net yards are measured from where the punt first reaches the ground to where the offense starts its drive. As with the heatmaps, the results are evaluated separately for each line of scrimmage prior to the punt.

This model helps punt returners and special teams coordinators understand what decision to make before fielding a punt. For example, if a team is punting to a returner from its own 35 yard line, the model tells specialists to almost always attempt a return when the returner is comfortable and outside their own 5 yard line. Conversely, if the returner is not comfortable returning the punt because of pressure from the punt coverage team, the model tells coaches that calling for a fair catch inside the returner’s own 14 yard line yields a lower average net yardage than bailing and letting the ball go for a touchback. This gives returners an edge over the competition by using analytics to get the best field position for their offense.
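The comparison described above can be sketched as a grouped average. This is a minimal illustration, not our notebook code: the field names (`landing_yard_line`, `decision`, `net_yards`), the 5-yard bucket size, and the sample numbers are all hypothetical.

```python
from collections import defaultdict

def average_net_yards(punts, bucket_size=5):
    """Average net yards per (yard-line bucket, decision) pair.

    Each punt is a dict with hypothetical keys: landing_yard_line
    (yards from the receiving team goal line), decision, and net_yards.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of yards, count]
    for p in punts:
        bucket = int(p["landing_yard_line"] // bucket_size) * bucket_size
        key = (bucket, p["decision"])
        totals[key][0] += p["net_yards"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Made-up punts, all landing between the receiving team's 10 and 15.
sample = [
    {"landing_yard_line": 12, "decision": "fair_catch", "net_yards": 0},
    {"landing_yard_line": 13, "decision": "bail", "net_yards": 4},
    {"landing_yard_line": 11, "decision": "bail", "net_yards": 8},
    {"landing_yard_line": 12, "decision": "return", "net_yards": 7},
]
averages = average_net_yards(sample)
print(averages[(10, "bail")])  # 6.0
```

Comparing the averages for the same bucket across decisions gives the kind of trade-off that Figure 4 visualizes.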


# Part 2: Ranking Difficult Returns

To evaluate punt returner decision-making, we developed an estimate of return difficulty before the outcome is known. More specific details can be found in the methodology section.

For each punt, we identify the decision frame, the point in the tracking data one second before the ball lands. We assume that this is the last point when the returner can use information about their surroundings to decide whether to return, fair catch, or bail. Returners may decide earlier (or later, at their peril), but we will use this frame to evaluate the difficulty.

We train a machine learning model to classify returnable punts that result in zero or negative return yards gained by the returning team, measured from the end of the kick. This model estimates which punts result in losses for the returning team or are punts that returners choose not to return. Our justification for combining both loss and neutral plays into one target is that both outcomes represent situations where the expected value of returning is low.

We rank punt returners based on how often they face, return, and gain from returnable punts that our model flags as difficult to return for a gain.
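A minimal sketch of this kind of ranking follows. The 0.5 probability threshold for "difficult", the choice to divide gains by difficult punts faced, and the field names are assumptions made for illustration; the sample plays are made up.

```python
from collections import defaultdict

def rank_returners(punts, threshold=0.5):
    """Rank returners by the share of difficult punts they turn into gains.

    Each punt is a dict with hypothetical keys: returner, p_zero_or_loss
    (model probability), decision, and yards_gained (from end of kick).
    """
    stats = defaultdict(lambda: {"faced": 0, "returned": 0, "gained": 0})
    for p in punts:
        if p["p_zero_or_loss"] < threshold:
            continue  # not flagged as difficult (threshold is an assumption)
        s = stats[p["returner"]]
        s["faced"] += 1
        if p["decision"] == "return":
            s["returned"] += 1
            if p["yards_gained"] > 0:
                s["gained"] += 1
    rows = [
        (name, s["faced"], s["returned"], s["gained"], s["gained"] / s["faced"])
        for name, s in stats.items()
    ]
    # Highest gain rate first.
    return sorted(rows, key=lambda r: r[4], reverse=True)

sample = [
    {"returner": "A", "p_zero_or_loss": 0.8, "decision": "return", "yards_gained": 5},
    {"returner": "A", "p_zero_or_loss": 0.7, "decision": "return", "yards_gained": 0},
    {"returner": "A", "p_zero_or_loss": 0.9, "decision": "fair_catch", "yards_gained": 0},
    {"returner": "B", "p_zero_or_loss": 0.6, "decision": "return", "yards_gained": 3},
    {"returner": "B", "p_zero_or_loss": 0.6, "decision": "return", "yards_gained": 9},
    {"returner": "B", "p_zero_or_loss": 0.2, "decision": "return", "yards_gained": 12},
]
ranking = rank_returners(sample)
print(ranking[0][0])  # B
```

The same aggregation keyed by team instead of returner would give a team-level ranking.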

## Figures 5-6. Top Returners by Difficulty

Here are the top returners, ranked based on how often they turn a difficult punt into a gain.

<img style="max-width: 700px;" src="https://github.com/vingkan/nfl-big-data-bowl-2022-supplementals/blob/main/player_ranks.png?raw=true" />

This plot compares how often returners return difficult returnable punts with how often those returns gain yards from the end of the kick.

<img src="https://github.com/vingkan/nfl-big-data-bowl-2022-supplementals/blob/main/player_plot.png?raw=true" />

## Figures 7-8. Top Teams by Difficulty

Here are the top teams, ranked based on how often their returners turn a difficult punt into a gain.

Originally, the Raiders were the top team, but after merging their Las Vegas (LV) and Oakland (OAK) records, the Cleveland Browns (CLE) edged them out for the top spot.

<img style="max-width: 700px;" src="https://github.com/vingkan/nfl-big-data-bowl-2022-supplementals/blob/main/team_ranks.png?raw=true" />

This plot compares how often teams return difficult returnable punts with how often those returns gain yards from the end of the kick.

<img src="https://github.com/vingkan/nfl-big-data-bowl-2022-supplementals/blob/main/team_plot.png?raw=true" />

# Code

- [Returnable Punt Data Processing](https://www.kaggle.com/vingkan/process-punt-return-decision-data)
- [Model Training and Evaluation](https://www.kaggle.com/vingkan/model-training-returns-for-loss)
- [Difficult Returnable Punts Analytics](https://www.kaggle.com/vingkan/analytics-gutsy-returners)
- [Tendencies and Trade-Offs Analytics](https://www.kaggle.com/talemu/analytics-returns-for-loss)

Thank you for organizing this challenge! Please see the appendix for methodology details and model scores.

# Appendix: Methodology

The first half of our analysis uses the processed tracking data to provide a baseline for what parts of the field punt returners can expect to return, fair catch, or bail.

The second half of our analysis predicts which punts will be difficult to return for a gain and applies the predictions to rank players and punts.

## Data Processing

Prior to our analysis, we applied the following data processing steps:

1. Reorient tracking data so that the receiving team always advances from the left of the screen to the right.
1. Filter plays to only include punts that are returnable, with no penalties.
1. Assign a decision based on the result of the play: return, fair catch, or bail.
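The reorientation step can be sketched as a coordinate flip. This sketch assumes a 120-yard-long coordinate system (including both end zones) that is 160/3 yards wide, and a hypothetical `kicking_team_direction` value of `"left"` or `"right"`; it only handles positions, and direction or orientation angles would need a matching rotation.

```python
FIELD_LENGTH = 120.0    # yards, including both end zones (assumed coordinate system)
FIELD_WIDTH = 160 / 3   # yards, sideline to sideline

def reorient(x, y, kicking_team_direction):
    """Flip coordinates so the receiving team always advances left to right.

    If the kicking team is moving right, the receiving team is moving left,
    so we rotate the play 180 degrees around the center of the field.
    """
    if kicking_team_direction == "right":
        return FIELD_LENGTH - x, FIELD_WIDTH - y
    return x, y

print(reorient(30.0, 10.0, "left"))   # (30.0, 10.0)
flipped = reorient(30.0, 10.0, "right")
print(flipped[0])                     # 90.0
```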

## Machine Learning

For each returnable punt:

Step 1: Identify the decision frame.

- The decision frame is the latest point, before the ball actually arrives, by which the returner has to decide whether or not to return the ball.
- Identify the frame of the first returnable event, the first time the punt lands or is caught in the field of play.
- Move back one second (or ten frames) from that frame to get the decision frame.
- Only train the model on tracking data from the decision frame for model inputs.
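The lookup described in Step 1 can be sketched as follows. The event labels in `RETURNABLE_EVENTS` and the clamp to the first frame of the play are assumptions for illustration; check the labels against the events in the tracking data.

```python
FRAMES_PER_SECOND = 10  # one second is ten frames, as stated above

# Assumed event labels for "lands or is caught in the field of play".
RETURNABLE_EVENTS = {"punt_received", "punt_land", "fair_catch"}

def find_decision_frame(frames):
    """Return the frame id one second before the first returnable event.

    `frames` is a list of (frame_id, event) pairs sorted by frame_id.
    Returns None if the play has no returnable event.
    """
    if not frames:
        return None
    first_frame_id = frames[0][0]
    for frame_id, event in frames:
        if event in RETURNABLE_EVENTS:
            # Clamp so the result never falls before the start of the play
            # (an assumption, for very short plays).
            return max(frame_id - FRAMES_PER_SECOND, first_frame_id)
    return None

play = [(1, "ball_snap"), (21, "punt"), (65, "punt_land"), (72, "punt_downed")]
print(find_decision_frame(play))  # 55
```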

Step 2: Split the punts into training, validation, and hold-out sets.

- Keep 50% of the data for training, keep 25% for tuning (validation), and keep 25% as a hold out set (test).
- Only one frame per play is used, the decision frame.
- Any play can be in any split. We decided not to stratify by season because the rates of different decisions were comparable across seasons.
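A random 50/25/25 split of this kind can be sketched with the standard library. The seed and the use of integer play identifiers are assumptions for illustration.

```python
import random

def split_plays(play_ids, seed=0):
    """Shuffle plays and split them 50% train, 25% validation, 25% hold-out.

    The split is not stratified: any play can land in any set.
    """
    ids = list(play_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_train = len(ids) // 2
    n_val = len(ids) // 4
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder goes to the hold-out set
    return train, val, test

train, val, test = split_plays(range(100), seed=1)
print(len(train), len(val), len(test))  # 50 25 25
```

Because each play contributes only its decision frame, splitting by play is the same as splitting by row.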

Step 3: Create a target variable for model training.

- Create a target variable isZeroOrLoss that is false if the receiving team gained yards from the end of the kick, and true if they lost yards or gained no yards.
- The majority of returnable punts result in zero gain, including many fair catches.
- Our justification for combining fair catches with returns for zero or a loss is that returners call a fair catch when they want neither to return the ball nor to let it land. Both fair catches and negative returns therefore represent situations where returners face difficult punts and the expected value of returning is low.
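The target definition above reduces to a single comparison. In this small sketch, the argument name and the sample outcomes are hypothetical.

```python
def is_zero_or_loss(yards_gained_from_kick_end):
    """True when the receiving team gained no yards or lost yards."""
    return yards_gained_from_kick_end <= 0

# Hypothetical outcomes: a fair catch (0), a 7-yard return, a 3-yard loss.
outcomes = [0, 7, -3]
labels = [is_zero_or_loss(y) for y in outcomes]
print(labels)  # [True, False, True]
```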

Step 4: Create features based on tracking data to use as model inputs.

For all input features, we use the ball location from the decision frame, when it is still in the air.

It could be valid to use the ball’s eventual landing spot, under the assumption that the returner and other players are able to estimate where the ball will land, but we choose to use the decision frame location for a more conservative estimate.

We derive these ten features from the tracking data, during the decision frame:

- Ball Yard Line: Location of the ball, in yards from the receiving team goal line.
- Closest Defender Distance: Distance in yards from the ball to the closest member of the kicking team.
- Defenders Within Radius: Number of members of the kicking team within a two yard radius of the ball.
- Blockers Within Radius: Number of members of the receiving team besides the returner within a five yard radius of the ball and who are ahead of the returner.
- Closest Defender Speed Upfield: The component of the closest defender’s speed that goes from goal line to goal line, negative if heading towards the receiving team goal line, in yards per second.
- Closest Defender Speed Lateral: The component of the closest defender’s speed that goes from sideline to sideline, always positive, in yards per second.
- Distance to Sideline: Distance in yards from the ball to the closest sideline, with a value of zero if the ball is out of bounds.
- Is Inside Own Endzone: Value of one if the ball is over the receiving team endzone during the decision frame, zero otherwise.
- Is Inside Own 10: Value of one if the ball is inside the receiving team 10 yard line during the decision frame, zero otherwise.
- Is Inside Own 20: Value of one if the ball is inside the receiving team 20 yard line during the decision frame, zero otherwise.
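The ten features can be sketched from positions and velocities at the decision frame. This sketch assumes reoriented coordinates where `x` is yards from the receiving team goal line (negative inside the end zone, increasing in the direction the receiving team advances) and `y` is yards from one sideline. The dict keys, the strict inequalities at the boundaries, and the sample values are all hypothetical.

```python
import math

FIELD_WIDTH = 160 / 3  # yards, sideline to sideline

def build_features(ball, returner, defenders, blockers):
    """Compute the ten decision-frame features for one punt.

    ball is an (x, y) tuple; players are dicts with x, y and, for
    defenders, velocity components vx and vy in yards per second.
    """
    bx, by = ball

    def dist(p):
        return math.hypot(p["x"] - bx, p["y"] - by)

    closest = min(defenders, key=dist)
    return {
        "ball_yard_line": bx,
        "closest_defender_distance": dist(closest),
        "defenders_within_radius": sum(dist(d) <= 2 for d in defenders),
        # Blockers count only if within five yards and ahead of the returner.
        "blockers_within_radius": sum(
            dist(b) <= 5 and b["x"] > returner["x"] for b in blockers
        ),
        # Negative when heading towards the receiving team goal line.
        "closest_defender_speed_upfield": closest["vx"],
        "closest_defender_speed_lateral": abs(closest["vy"]),
        "distance_to_sideline": max(0.0, min(by, FIELD_WIDTH - by)),
        "is_inside_own_endzone": int(bx < 0),
        "is_inside_own_10": int(bx < 10),
        "is_inside_own_20": int(bx < 20),
    }

features = build_features(
    ball=(15.0, 20.0),
    returner={"x": 14.0, "y": 20.0},
    defenders=[
        {"x": 18.0, "y": 24.0, "vx": -6.0, "vy": -2.0},
        {"x": 30.0, "y": 20.0, "vx": -7.0, "vy": 0.0},
    ],
    blockers=[{"x": 17.0, "y": 20.0}, {"x": 12.0, "y": 20.0}, {"x": 40.0, "y": 20.0}],
)
print(features["closest_defender_distance"])  # 5.0
```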

Step 5: Train and evaluate models.

- We employed three types of models: logistic regression, random forest classifier, and support vector machine classifier.
- All models are binary classifiers that support a prediction probability score.
- We scored and compared models on the unseen validation data.
- We evaluated models using precision, recall, F1-score, accuracy, and area under the receiver operating characteristic (ROC) curve.
- We did not perform any automatic hyperparameter tuning.
- The random forest model performed best, but we chose the class-balanced logistic regression model because it had strong, similar precision and recall scores, as well as comparable ordering power (by area under ROC curve) to the random forest. We wanted to prioritize a high-bias model over a high-variance model, and a model whose results would be more explainable to coaches and players. The visual simplicity of the heatmap of the logistic regression model compared to the random forest model bears this out.
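For readers unfamiliar with class balancing, one common heuristic weights each class inversely to its frequency, using `n_samples / (n_classes * class_count)`. This is the formula scikit-learn documents for `class_weight="balanced"`. The sketch below only illustrates that heuristic on made-up labels and is not necessarily the exact weighting used in our notebooks.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}

# Made-up labels: six zero-or-loss punts (True) and four gains (False).
weights = balanced_class_weights([True] * 6 + [False] * 4)
print(weights[False])  # 1.25
```

The minority class gets the larger weight, so errors on it cost more during training.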

## Figure 9. Final Model Performance

<img src="https://github.com/vingkan/nfl-big-data-bowl-2022-supplementals/blob/main/model_scores.png?raw=true" />

Final model scores on validation data:

```
Accuracy  = 0.729
Precision = 0.776
Recall    = 0.768
F1-Score  = 0.772
ROC AUC   = 0.720
```
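As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two scores above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(0.776, 0.768)
print(round(f1, 3))  # 0.772
```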

## Figure 10. Heatmap for Clement Play

Heatmap of where the ball could have landed on the Clement return play and the predicted probability of a loss or no gain (pink) vs. a gain (green).

<img src="https://github.com/vingkan/nfl-big-data-bowl-2022-supplementals/blob/main/clement_map.png?raw=true" />