# NFL Big Data Bowl 2022: Punt Return Yards Over Expected (PRYOE)

***

## Outline of Notebook

1. Introduction
2. Exploratory Data Analysis
3. Feature Engineering
4. Baseline Model and Model Tuning with GridSearch CV
5. Final Model Creation
6. Production Run
7. Results
8. Discussion
9. Acknowledgements

<br>
<br>

***

# Introduction

- **Name: Sean Sullivan**
- **Affiliation: URAM Analytics**
- **Twitter Handle: @URAM_Analytics**
- **Website: https://www.uramanalytics.com/**

With the NFL Big Data Bowl 2022's focus on special teams, I focused my efforts on creating a metric, and a corresponding analysis, to evaluate punt returns. Punts that result in a return account for only about 38% of punts from the 2018, 2019, and 2020 seasons. So while returns are relatively infrequent, they can have a large impact on a game.

To do this, I modeled how many yards a returner would gain along the x axis at the frameID level, for every punt play that included a return. Preparing the data for modeling required a thorough feature engineering process, detailed below. With that data in hand, I built a very accurate model that predicts how many yards a punt returner would be expected to gain, given the current situation as of that frame. By "current situation" I mean the location of the returner, how fast they are moving, their direction and orientation, how far away the players on the return and kicking teams are from the returner along with those players' movement attributes (speed, acceleration, etc.), and game-specific attributes (time, score, etc.). The model was trained on data from 2018 and 2019, and the final model was run on 2020 data to generate the metric results.

To create the metric, I subtracted the predicted yards gained in a frame from the actual yards gained in that frame (essentially the residual). A positive value suggests that the returner gained more yards than "expected", whereas a negative value suggests that the returner gained fewer yards than expected. I believe this is appropriate given how accurate the model was; with a less accurate model, the residual would mostly reflect model error rather than returner performance, and the metric would not have been worth the effort. Since this was done at the frame level, I aggregated the results by game and play to get the result for each punt return play.
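Written out, the metric described above is a per-frame residual summed within a play, then averaged over a player's returns. The notation here is introduced for clarity and is not the author's: $y_f$ is the actual x yards gained on frame $f$ (`x_gained_dv`), $\hat{y}_f$ is the model's prediction (`predicted_x_gained_dv`), $F_p$ is the set of frames in play $p$ from the punt reception to the end event, and $P_j$ is the set of returns by player $j$.

$$
\text{PRYOE}_f = y_f - \hat{y}_f, \qquad
\text{PRYOE}_p = \sum_{f \in F_p} \left( y_f - \hat{y}_f \right), \qquad
\text{PRYOE per return}_j = \frac{1}{|P_j|} \sum_{p \in P_j} \text{PRYOE}_p
$$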

With the results in hand, evaluation could be done on a play level, a player level, and a team level. Please continue to read the notebook to learn more! 


*Note: Not all of the work was completed in python. In fact, all of the feature engineering was conducted in GCP's BigQuery. Links to all queries are provided in the appropriate sections.*

<br>
<br>

***

# Load Data and EDA

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import time

from sklearn.model_selection import train_test_split
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import r2_score,mean_squared_error, mean_absolute_error

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [None]:
# Load plays data and perform basic exploration of punt data

plays = pd.read_csv('../input/nfl-big-data-bowl-2022/plays.csv')

In [None]:
# What is the shape of the plays file (rows x columns)

plays.shape

In [None]:
# What are the columns present?

plays.columns

<br>

In [None]:
"""
Using this guide, we know what each of the columns are, but will provide a brief exploration (https://www.kaggle.com/c/nfl-big-data-bowl-2022/data)

Let's evaluate a few key variables from the plays table
"""
# Evaluate playResult, but only on a punt. We do this by filtering the dataframe ('plays')

punt_plays = plays[plays['specialTeamsPlayType'] == 'Punt']

punt_plays[['playResult']].describe()

In [None]:
# Visualize the distribution of playResult for specialTeamsPlayType = 'Punt'

plt.hist(punt_plays['playResult'], bins=20)
plt.show()

**Interpretation:** On the 5,991 plays in this table with specialTeamsPlayType = 'Punt', the kicking team gains about 40 yards per play. The median (50th percentile) is nearly identical to the mean, so the data are nearly symmetric. The histogram confirms this and also shows outliers to the left (a slight left skew, suggesting the distribution is not normal).

<br>

In [None]:
# What about this, but from the point of view of the returning team? For this, we will evaluate 'kickReturnYardage'

punt_plays[['kickReturnYardage']].describe()

In [None]:
# Visualize the distribution of kickReturnYardage for specialTeamsPlayType = 'Punt'

plt.hist(punt_plays['kickReturnYardage'], bins=20)
plt.show()

**Interpretation:** On plays with specialTeamsPlayType = 'Punt', the returning team gains about 9 yards per play. As with playResult, the median (50th percentile) is close to the mean. The histogram shows outliers to the right (a slight right skew, suggesting the distribution is not normal). This result is not especially surprising given the nature of punt returns.

<br>

In [None]:
# What kind of results are associated with punt plays? We will evaluate specialTeamsResult with the same filtered dataframe as above. 
# Since it is categorical, we will use a different method.

punt_plays['specialTeamsResult'].value_counts()

In [None]:
# Let's visualize the value count results

ax = sns.countplot(x='specialTeamsResult', data=punt_plays, order = punt_plays['specialTeamsResult'].value_counts().index)
ax.set_title('Punt Play Result Count')
ax.set_xlabel('Result Type')
ax.set_ylabel('Count')
plt.xticks(rotation=90)
plt.show()

**Interpretation:** The most common results of a punt play are return, fair catch, and downed. For this project, we are only interested in working with punts that are associated with returns.  

<br>

### What about the tracking data mentioned earlier? What does that look like?

In [None]:
# Load 2020 Tracking Data file

tracking_2020 = pd.read_csv('../input/nfl-big-data-bowl-2022/tracking2020.csv')

In [None]:
tracking_2020.head()

<br>

**Let's explore a few key fields: s (speed), dis (distance traveled from prior time point in yards), and event (tagged play details)**

In [None]:
# s exploration

tracking_2020['s'].describe()

In [None]:
# Visualize the distribution of s (speed)

plt.hist(tracking_2020['s'], bins=20)
plt.show()

**Interpretation:** It's important to understand the scale of this data before interpreting it. The tracking data is recorded at the frameID level, where each frame is a snapshot of a play in action. It's also important to note that frameID = 1 is not the snap and the last frameID is not the end of the play. On every play, the snap event occurs at some later frame and the end event comes before the last frame, so extra movement is recorded before the play begins and after it ends. So it is no surprise that most speed values are quite low and that there are a few outliers of very fast player movement.

<br>

In [None]:
# dis exploration

tracking_2020['dis'].describe()

In [None]:
# Visualize the distribution of dis (distance)

plt.hist(tracking_2020['dis'], bins=20)
plt.show()

**Interpretation:** Given how granular this data is, it is not a surprise that the distance traveled per frame is quite small, especially when accounting for frames before the ball is snapped.
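The tracking data are sampled 10 times per second, so `dis` (yards traveled since the previous frame) should be roughly `s` (yards per second) divided by 10. Below is a quick sanity check of that relationship; `implied_speed` is a hypothetical helper and the rows are made up for illustration.

```python
import pandas as pd

# Tracking data are recorded at 10 frames per second (0.1 s between frames).
FRAMES_PER_SECOND = 10


def implied_speed(dis_yards: pd.Series) -> pd.Series:
    """Convert per-frame distance traveled (yards) into an implied speed (yards/second)."""
    return dis_yards * FRAMES_PER_SECOND


# Illustrative (made-up) rows: a player at 8 yd/s covers only about 0.8 yards per frame.
sample = pd.DataFrame({"s": [0.5, 4.0, 8.0], "dis": [0.05, 0.41, 0.79]})
sample["implied_s"] = implied_speed(sample["dis"])
sample["abs_diff"] = (sample["s"] - sample["implied_s"]).abs()
print(sample)
```

On the real file, `(tracking_2020['s'] - implied_speed(tracking_2020['dis'])).abs().describe()` would show how closely the two fields agree.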

<br>

In [None]:
# Let's look at events

tracking_2020['event'].value_counts()

**Interpretation:** Because events are tagged at the frame level, I used them to restrict punt return plays to the frames between the frame where the punt was received and the end event of the play (tackle, touchdown, fumble, etc.).
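The same filtering can be sketched in pandas for readers who prefer not to use BigQuery. `return_window` is a hypothetical helper, it assumes the raw tracking column names (`gameId`, `playId`, `frameId`, `event`), and the labels in `END_EVENTS` are assumptions; check `tracking_2020['event'].value_counts()` for the labels that actually occur and adjust the set accordingly.

```python
import pandas as pd

# ASSUMPTION: event labels as they appear in the tracking files; adjust after checking value_counts().
START_EVENT = "punt_received"
END_EVENTS = {"tackle", "touchdown", "out_of_bounds", "fumble"}
PLAY_KEYS = ["gameId", "playId"]


def return_window(tracking: pd.DataFrame) -> pd.DataFrame:
    """Keep only frames from the punt reception through the first end-of-play event."""
    # First frame tagged as the reception, per play (plays without one are dropped).
    start = (tracking.loc[tracking["event"] == START_EVENT]
             .groupby(PLAY_KEYS)["frameId"].min().rename("start_frame").reset_index())
    framed = tracking.merge(start, on=PLAY_KEYS, how="inner")

    # First end-of-play event at or after the reception.
    after = framed.loc[framed["frameId"] >= framed["start_frame"]]
    end = (after.loc[after["event"].isin(END_EVENTS)]
           .groupby(PLAY_KEYS)["frameId"].min().rename("end_frame").reset_index())
    framed = framed.merge(end, on=PLAY_KEYS, how="inner")

    in_window = framed["frameId"].between(framed["start_frame"], framed["end_frame"])
    return (framed.loc[in_window]
            .drop(columns=["start_frame", "end_frame"])
            .sort_values(PLAY_KEYS + ["frameId"])
            .reset_index(drop=True))


# Tiny illustrative plays: play 7 has a reception at frame 3 and a tackle at frame 5;
# play 9 ends in a fair catch and has no reception, so it is dropped.
demo = pd.DataFrame({
    "gameId": [1] * 6 + [2] * 2,
    "playId": [7] * 6 + [9] * 2,
    "frameId": [1, 2, 3, 4, 5, 6, 1, 2],
    "event": [None, "punt", "punt_received", None, "tackle", None, "punt", "fair_catch"],
})
print(return_window(demo))
```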

<br>
<br>

***

# Feature Engineering 

I performed all of my data prep and feature engineering in Google Cloud Platform's BigQuery data warehouse. Since none of this was performed in python, I will link you to a text file in my GitHub repository that contains all of the code (with comments) that I used to ultimately create my datasets for model development and production. However, in this section, I will detail my thought process and provide a data dictionary of the final features created. 


Since this project was focused on modeling returner yards on a punt return, each row needed to contain as much detail as possible to support a robust model. Given the granularity of the tracking files, I knew this is where I would spend most of my effort. After extensive exploration, I determined that modeling required one row per frameID containing the returner's current tracking situation (x, y, s, etc.) alongside every other player's data on the same row. In other words, I needed the returner's tracking data and each other player's tracking data relative to the returner at the frameID level. I also included some game situation variables (score differential, time situation, etc.) to add context to the play. Below, I provide a high-level overview of my process, followed by the file used for training and testing the model.

Link to Code: https://github.com/seanwsullivan1/NFL-Big-Data-Bowl-2022/blob/main/2022_NFL_BDB_SQL_DataPrep_FeatureEngineering.txt

<br>

## SQL Feature Engineering Process:

1. **Aggregate all tracking data from the 2018, 2019, and 2020 files associated with plays where specialTeamsPlayType = 'Punt' and specialTeamsResult = 'Return'.**

2. **For each gameID and playID, determine the frameID where the ball was snapped, the frameID of the final event (ignoring None events), and the frameID one past the final event.** The extra frame is needed because yards gained require the x value from the frame after the final event: the "situation" at frameID = 2 is the result of what happened in frameID = 1.

3. **Use the table generated in Step 2 to filter out frameIDs from the table generated in Step 1.**

4. **With the table generated in Step 3, take the x value from the following frameID and attach it as a new column.** Using the example above, the x value at frameID = 2 now sits on the same row as frameID = 1.

5. **Provide movement features for each of the 11 players on the kicking team and the 10 other players (excluding the returner) on the receiving team.** To do this, I needed to assign each player an identity at the beginning of the play so they could be followed throughout. I labeled each player "kicking_NUMBER" or "receiving_NUMBER" based on their team and their order along the y axis at the frameID where the ball was snapped.

6. **Join all 11 kicking players' and 10 receiving players' X, Y, S, A, DIS, O, and DIR values to the same row as the returner's for each frameID within a gameID and playID.**

7. **Create features for: the distance, in yards, of each player from the returner as of that frameID; the count, in buckets, of kicking or receiving team players within 0-5, 5-10, 10-15, and 15+ yards of the returner; and the spread of the kicking and receiving teams on the x and y axes.** Additional features were created from situational variables such as Quarter, Advantage (which team is winning), Score Differential, and Time Situation.

8. **With the table generated in Step 7: filter the data to frames between punt received and the end event, drop features that are no longer needed, create the dependent variable (x_gained_dv), and add team names.** Using the playDirection field and the x value from Step 4, I determined whether the returner's movement was a positive or negative gain along the x axis.

9. **With the table generated in Step 8: filter out plays that had penalties and split the data into two tables (one for training/testing the model and one for generating the analysis/metric).** Plays from 2018 and 2019 were used for model training/testing, and the 2020 data was saved for production.
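Steps 4 and 8 (next-frame x and the signed gain) can be sketched in pandas with a grouped `shift(-1)`. This is an approximation of the SQL, not the author's query: it uses the raw tracking column names (`gameId`, `playId`, `nflId`, `frameId`, `x`, `playDirection`) and assumes that `playDirection` gives the kicking (possession) team's direction of travel, so a returner gains yards by moving the opposite way. `add_x_gained` is a hypothetical helper; apply it before trimming to the return window (or keep one extra frame past the end event, as in Step 2) so the last frame of interest still has a successor.

```python
import pandas as pd

PLAYER_KEYS = ["gameId", "playId", "nflId"]


def add_x_gained(frames: pd.DataFrame) -> pd.DataFrame:
    """Attach the next frame's x (Step 4) and a direction-adjusted gain (Step 8)."""
    out = frames.sort_values(PLAYER_KEYS + ["frameId"]).copy()
    # Step 4: pull x from the same player's following frame onto the current row.
    out["x_updated"] = out.groupby(PLAYER_KEYS)["x"].shift(-1)
    delta = out["x_updated"] - out["x"]
    # ASSUMPTION: playDirection is the kicking team's direction of travel, so on a
    # 'left' play the returner gains yards by moving toward larger x, and vice versa.
    out["x_gained_dv"] = delta.where(out["playDirection"] == "left", -delta)
    # The final frame has no successor, which is why Step 2 keeps one extra frame.
    return out.dropna(subset=["x_updated"]).reset_index(drop=True)


# Illustrative returners: one on a 'left' play moving toward larger x, one on a 'right' play.
demo = pd.DataFrame({
    "gameId": [1, 1, 1, 2, 2, 2],
    "playId": [5, 5, 5, 8, 8, 8],
    "nflId": [10, 10, 10, 20, 20, 20],
    "frameId": [1, 2, 3, 1, 2, 3],
    "x": [20.0, 21.5, 23.0, 80.0, 78.0, 77.0],
    "playDirection": ["left", "left", "left", "right", "right", "right"],
})
print(add_x_gained(demo)[["gameId", "frameId", "x", "x_updated", "x_gained_dv"]])
```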

<br>

In [None]:
# Load training data

training_data = pd.read_csv('../input/d/seanwsullivan1/nfl-big-data-bowl-2022/NFL_Big_Data_Bowl_Fall_2021_training_data.csv')

In [None]:
# Quick look at the file (transposed)

training_data.head().T

<br>

## Quick Data Dictionary for Created Features

* **Dis Updated and X Updated**: The dis and x values from the next frameID for a player within a game and play. Dis Updated was not used in the final iteration, and X Updated was used only to create the dependent variable. NOTE: neither is used as a feature in modeling.

* **Kicking/Receiving Distance**: Distance, in yards, between player indicated and the returner. 

* **Kicking/Receiving S**: Speed, at that frameID, of the player indicated.

* **Kicking/Receiving A**: Acceleration, at that frameID, of the player indicated.

* **Kicking/Receiving O**: Orientation, at that frameID, of the player indicated.

* **Kicking/Receiving Dir**: Direction, at that frameID, of the player indicated.

* **Kicking/Receiving Buckets**: Bucket 1: # of players within 0-5 yards, Bucket 2: # of players within 5-10 yards, Bucket 3: # of players within 10-15 yards, Bucket 4: # of players 15 or more yards away

* **X Gained DV**: Number of yards, on x axis, gained (either positive or negative) as a result of the situation of the current frameID. **THIS IS THE DEPENDENT VARIABLE**
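The distance and bucket features were built in BigQuery; for readers following along in Python, here is a NumPy sketch of the same idea for a single frame. `distance_to_returner` and `bucket_counts` are hypothetical helpers, and the treatment of a player exactly on a bucket edge (5.0 yards counts toward Bucket 2) is an assumption, since the SQL boundaries are not stated here.

```python
import numpy as np

# Upper edges of Buckets 1-3; Bucket 4 is everything 15+ yards away.
# ASSUMPTION: a player exactly on an edge counts toward the farther bucket.
BUCKET_EDGES = [5, 10, 15]


def distance_to_returner(px, py, rx: float, ry: float) -> np.ndarray:
    """Euclidean distance, in yards, from each player's (x, y) to the returner's (x, y)."""
    return np.hypot(np.asarray(px, dtype=float) - rx, np.asarray(py, dtype=float) - ry)


def bucket_counts(distances) -> list:
    """Number of players in Buckets 1-4 (0-5, 5-10, 10-15, 15+ yards)."""
    bucket_index = np.digitize(distances, BUCKET_EDGES)  # 0, 1, 2 or 3
    return np.bincount(bucket_index, minlength=4).tolist()


# Illustrative frame: returner at the origin, four kicking-team players nearby.
kick_dist = distance_to_returner([3.0, 10.0, 30.0, 12.0], [4.0, 0.0, 0.0, 5.0], 0.0, 0.0)
print(kick_dist, bucket_counts(kick_dist))
```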

<br>

**Exploration of Dependent Variable: X Gained DV**

In [None]:
# X Gained DV

training_data['x_gained_dv'].describe()

In [None]:
# Visualize the distribution of X Gained DV

plt.hist(training_data['x_gained_dv'], bins=20)
plt.show()

In [None]:
from scipy.stats import shapiro

#perform Shapiro-Wilk test
shapiro(training_data['x_gained_dv'])

**Interpretation:** The mean and median are very close (near 0.2), which suggests the data are symmetric, and the histogram shows a nearly normal-looking distribution. However, the Shapiro-Wilk test of normality gives a test statistic of 0.99 with a p-value below 0.05. Thus, we reject the null hypothesis and conclude that the dependent variable does not come from a normal distribution. Keep in mind that with a sample this large, even small departures from normality produce a significant result, and SciPy notes that the p-value may not be accurate for samples larger than 5,000.

<br>
<br>

***

# Load training/test data, hyper-parameter tuning, and model evaluation

Note: The model was trained and tested on 2018 and 2019 data. The final model was run on 2020 data, which was used to generate the metric results.

In [None]:
training_data = pd.read_csv('../input/d/seanwsullivan1/nfl-big-data-bowl-2022/NFL_Big_Data_Bowl_Fall_2021_training_data.csv')

In [None]:
# Get shape of file

training_data.shape

In [None]:
# Split into numeric and categorical variables for evaluation

# Use the columns= keyword; pandas 2.0 no longer accepts the axis as a positional argument
num_var = training_data.drop(columns=['gameID', 'playID', 'frameID', 'nflID', 'displayName', 'event', 'dis_updated', 'x_updated',
                   'punt_received_frame', 'ultimate_id','quarter', 'advtanage', 'team', 'time_situation',
                   'receiving_team_name', 'kicking_team_name'])

cat_var = training_data[['playDirection', 'quarter', 'advtanage', 'team', 'time_situation']]

In [None]:
# Evaluate Numerical Variables

num_var.describe().T

In [None]:
# Categorical Variable Exploration

for i in cat_var:
    print()
    print(cat_var[i].value_counts())
    print('---------------------------------------------')

In [None]:
# Check for Nulls

# Numeric Variable Check
num_var.isnull().values.any()

In [None]:
# Categorical Variable Check
cat_var.isnull().values.any()

<br>
<br>

## Split into X/Y and Train/Test Sets

In [None]:
# x variables

x = training_data.drop(columns=['gameID', 'playID', 'frameID', 'nflID', 'displayName', 'event', 'punt_received_frame', 
             'ultimate_id', 'dis_updated', 'x_updated', 'x_gained_dv', 'receiving_team_name', 'kicking_team_name'])

In [None]:
# y variable

y = training_data['x_gained_dv']

In [None]:
# Get dummy variables for categorical x variables

x = pd.get_dummies(x)

In [None]:
# Check x columns

list(x.columns)

In [None]:
# Print shapes of x and y

print(x.shape)
print(y.shape)

In [None]:
# Train/Test Split

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.3, random_state = 21)

print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)
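`train_test_split` shuffles individual frames, so consecutive, highly similar frames from the same return can land in both the training and test sets, which may make test scores look better than they would on unseen plays. A possible alternative, not the author's method, is scikit-learn's `GroupShuffleSplit`, grouping by a play identifier. This sketch assumes `ultimate_id` identifies one play (the notebook later counts unique `ultimate_id` values as plays); `split_by_play` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit


def split_by_play(x: pd.DataFrame, y: pd.Series, groups, test_size=0.3, random_state=21):
    """Train/test split that keeps every frame of a play on the same side."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=random_state)
    train_idx, test_idx = next(splitter.split(x, y, groups=groups))
    return x.iloc[train_idx], x.iloc[test_idx], y.iloc[train_idx], y.iloc[test_idx]


# Illustrative data: 10 plays with 5 frames each.
play_ids = pd.Series(np.repeat(np.arange(10), 5))
x_demo = pd.DataFrame({"s": np.linspace(0.0, 9.0, 50)})
y_demo = pd.Series(np.linspace(-1.0, 1.0, 50))
x_tr, x_te, y_tr, y_te = split_by_play(x_demo, y_demo, play_ids)
print(len(x_tr), len(x_te))
```

In the notebook, `x_train, x_test, y_train, y_test = split_by_play(x, y, training_data['ultimate_id'])` would take the place of the split above, since `x` keeps the row order of `training_data`.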

<br>
<br>

## Baseline XGBoost Regressor Model

Before embarking on tuning the model, it's important to understand what the baseline performance is. We evaluate that here before moving on to tuning the model. 

In [None]:
# Run model with timer

start = time.time()

model = xgb.XGBRegressor(seed=21)
model.fit(x_train, y_train)

print('Runtime: ', time.time() - start)

In [None]:
# Quick runtime of ~26 seconds. Let's evaluate performance with R^2, MAE, and RMSE


# Evaluate
y_pred = model.predict(x_test)

# R^2
print('R^2 = ', r2_score(y_test, y_pred))
# MAE
print('MAE = ', mean_absolute_error(y_test, y_pred))
# RMSE
# RMSE is the square root of the mean *squared* error
print('RMSE = ', np.sqrt(mean_squared_error(y_test, y_pred)))

<br>

**Interpretation:** 

**R^2** is the proportion of the variance in the dependent variable that the independent variables explain. It typically falls between 0 and 1, and the closer to 1, the better (on held-out data it can even be negative if the model does worse than predicting the mean). A baseline R^2 of 0.98 is very strong and will be difficult to improve.

**MAE** measures the average magnitude of the errors in a set of predictions: the average, over the test sample, of the absolute differences between each prediction and the actual observation. MAE is in the units of the dependent variable (yards), so an MAE of 0.03 yards is quite accurate.

**RMSE** is the square root of the average of the squared differences between the predictions and the actual observations. Like MAE, it is in the units of the dependent variable. The RMSE of 0.17 originally reported for this cell was the square root of the MAE (√0.03 ≈ 0.17) rather than of the mean squared error, so it is not a true RMSE; `np.sqrt(mean_squared_error(y_test, y_pred))` gives the correct value.
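A small helper keeps the three metrics consistent across cells and ties RMSE to the mean squared error. The name `regression_report` is hypothetical; it only wraps scikit-learn's `r2_score`, `mean_absolute_error`, and `mean_squared_error`. Because RMSE squares each error before averaging, RMSE ≥ MAE always holds and large misses are weighted more heavily, which the toy numbers below illustrate.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred) -> dict:
    """R^2, MAE and RMSE, with RMSE taken from the mean squared error."""
    return {
        "R^2": r2_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
    }


# Toy example: one prediction off by 2 yards, the rest exact.
report = regression_report([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(report)  # MAE = 0.5 and RMSE = 1.0, whereas sqrt(MAE) would be about 0.71
```

In the notebook, `regression_report(y_test, y_pred)` could replace the three print statements in each evaluation cell.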

**Using Cross Validation, I attempted to improve the performance of the model.**

<br>
<br>

## Cross Validation

For this model, I used an "informed" Grid Search Cross Validation process. By informed, I mean that I chose reasonable values to test with GridSearchCV and recorded the results. I tuned iteratively to better understand how each hyper-parameter affected the model. The hyper-parameters tested and their associated values are listed below. I will not show the results of each run, but I provide an example.

- **Max Depth**: 3, 4, 5, 6, 7, 8, 9, 10
- **Learning Rate**: 0.01, 0.07, 0.13, 0.19, 0.25, 0.3
- **N Estimators**: 100, 280, 460, 640, 820, 1000
- **Colsample_bytree**: 0.5, 0.6, 0.7, 0.8, 0.9, 1
- **Sub Sample**: 0.5, 0.6, 0.7, 0.8, 0.9, 1
- **Min Child Weight**: 1, 2, 3, 4, 5, 6
- **Alpha**: 0, 1, 5, 10, 15
- **Lambda**: 0, 1, 5, 10, 15
- **Gamma**: 0, 1, 2, 3, 4, 5

**Note: I will not run the cell below on Kaggle but the code does work. Example is found on my GitHub**

Link: https://github.com/seanwsullivan1/NFL-Big-Data-Bowl-2022/blob/main/NFL_Big_Data_Bowl_2022_SeanSullivan.ipynb

In [None]:
# Make sure you import GridSearchCV


params_testing = {
        'gamma': [0, 1, 2, 3, 4, 5]
        }


model = xgb.XGBRegressor(seed=21, verbosity = 0)


grid_search = GridSearchCV(
        estimator = model, 
        param_grid = params_testing, 
        scoring = 'neg_mean_squared_error', 
        n_jobs = 1, 
        cv = 10,
        verbose = True
        )


start = time.time()
grid_search.fit(x_train,y_train)
print('Runtime: ', time.time() - start)


print("Best parameters:", grid_search.best_params_)
print("Lowest RMSE: ", (np.sqrt(-grid_search.best_score_)))

<br>

**Interpretation:** Of the six gamma values tested, 0 produced the lowest cross-validated RMSE (0.04). With these results in hand, I would note the runtime and the results, then move on to testing the next hyper-parameter.
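The iterative process described above can be sketched as a loop that runs `GridSearchCV` over one hyper-parameter at a time. The original does not say whether each best value was carried into later searches; this sketch assumes it was. `tune_one_at_a_time` is a hypothetical helper, and the demo uses scikit-learn's `DecisionTreeRegressor` only so the example runs without XGBoost; in the notebook you would pass `xgb.XGBRegressor(seed=21)` with `x_train` and `y_train`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor


def tune_one_at_a_time(estimator, param_grid, x, y, cv=5, scoring="neg_mean_squared_error"):
    """Run GridSearchCV on one hyper-parameter at a time, carrying each best value forward."""
    best_params, best_scores = {}, {}
    for name, values in param_grid.items():
        # GridSearchCV clones the estimator, so fix earlier choices on the original first.
        estimator.set_params(**best_params)
        search = GridSearchCV(estimator, {name: values}, scoring=scoring, cv=cv)
        search.fit(x, y)
        best_params[name] = search.best_params_[name]
        best_scores[name] = search.best_score_  # negative MSE with the default scoring
    return best_params, best_scores


# Illustrative data: a smooth non-linear target, so deeper trees should win.
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 10, size=(300, 1))
y_demo = x_demo[:, 0] ** 2
grid = {"max_depth": [1, 5], "min_samples_leaf": [1, 20]}
best, scores = tune_one_at_a_time(DecisionTreeRegressor(random_state=0), grid, x_demo, y_demo,
                                  cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(best, {name: float(np.sqrt(-score)) for name, score in scores.items()})
```

The trade-off of searching one parameter at a time is speed: it fits far fewer models than a full grid, but it can miss combinations where two hyper-parameters only help together.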

<br>

## Hyper-parameters Chosen

- **Max Depth**: 10
- **Learning Rate**: 0.19
- **N Estimators**: 1,000
- **Colsample_bytree**: 1
- **Sub Sample**: 0.5
- **Min Child Weight**: 6
- **Alpha**: 1
- **Lambda**: 10
- **Gamma**: 0

<br>
<br>

## Train Tuned Model and Evaluate Performance

In [None]:
# Train Model


start = time.time()

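# Hyper-parameters chosen above, using the scikit-learn API names (subsample, reg_alpha, reg_lambda)
model = xgb.XGBRegressor(random_state = 21, max_depth = 10, learning_rate = 0.19,
                         n_estimators = 1000, colsample_bytree = 1, subsample = 0.5,
                         reg_alpha = 1, reg_lambda = 10, gamma = 0, min_child_weight = 6)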
model.fit(x_train, y_train)
print('Runtime: ', time.time() - start)

In [None]:
# Generate Prediction

y_pred = model.predict(x_test)

In [None]:
# Evaluate Performance on Test Set

# R^2
print('R^2 = ', r2_score(y_test, y_pred))
# MAE
print('MAE = ', mean_absolute_error(y_test, y_pred))
# RMSE
# RMSE is the square root of the mean *squared* error
print('RMSE = ', np.sqrt(mean_squared_error(y_test, y_pred)))

<br>

**Interpretation:** Interestingly, the hyper-parameter tuning process I chose did not produce a model superior to the baseline model: the baseline had a higher R^2 and lower MAE / RMSE than the tuned model. Note, however, that this comparison was produced with a version of the training cell that passed learning_rate = 0.9 instead of the chosen 0.19 and misspelled subsample and lambda as sub_sample and lamba (XGBoost warns about and ignores unrecognized parameter names), so the tuned configuration may be worth re-evaluating. There are other methods of tuning models, but for this exercise, I will move forward with the baseline model and its default settings.

<br>
<br>

***

# Final Model and Feature Importance

In [None]:
# Run model with timer

start = time.time()

final_model = xgb.XGBRegressor(seed=21)
final_model.fit(x_train, y_train)

print('Runtime: ', time.time() - start)

In [None]:
# Confirm performance
y_pred = final_model.predict(x_test)

# R^2
print('R^2 = ', r2_score(y_test, y_pred))
# MAE
print('MAE = ', mean_absolute_error(y_test, y_pred))
# RMSE
# RMSE is the square root of the mean *squared* error
print('RMSE = ', np.sqrt(mean_squared_error(y_test, y_pred)))

In [None]:
# Feature Importance
from xgboost import plot_importance

# plot feature importance
plot_importance(final_model, max_num_features=20)
plt.show()

<br>

**Interpretation:** The feature importance plot is very helpful for interpreting the model. It makes sense that the most important features describe the state of the returner (direction, speed, x position, orientation, y position, and acceleration): a returner's ability to gain yards should depend on their movement at that moment. Seeing many of the kicking players' distance features rank highly also passes the smell test. Defenders near the returner would likely limit their ability to gain yards, and, conversely, defenders farther away would make it easier for the returner to gain yards. I am a little surprised that the features counting the number of kicking or receiving team players within X yards of the returner did not rank as more important. Note that `plot_importance` ranks features by how often they are used in splits (`importance_type='weight'`) by default, so a gain-based ranking could order them differently. Overall, I am not surprised by these results, and they suggest that the effort spent in the feature engineering stage was worthwhile.

<br>
<br>

***

# Run 2020 Data Through Trained Model

In [None]:
# Load production data

production_data = pd.read_csv('../input/d/seanwsullivan1/nfl-big-data-bowl-2022/NFL_Big_Data_Bowl_Fall_2021_production_data.csv')

In [None]:
# Get shape of file

production_data.shape

In [None]:
# How many punt plays are being evaluated?

len(pd.unique(production_data['ultimate_id']))

**I am running 604 punt plays that were associated with a return and did not have a penalty called on the play. Running similar logic against the Plays table yields 608 unique plays. The difference comes from labeling issues I noticed in the provided data; I did not override these results because the labeling issues did not align with my data cleaning process.**

<br>

In [None]:
# Split into numeric and categorical variables for evaluation

# Use the columns= keyword; pandas 2.0 no longer accepts the axis as a positional argument
num_var_prod = production_data.drop(columns=['gameID', 'playID', 'frameID', 'nflID', 'displayName', 'event', 'dis_updated', 'x_updated',
                   'punt_received_frame', 'ultimate_id','quarter', 'advtanage', 'team', 'time_situation',
                   'receiving_team_name', 'kicking_team_name'])

cat_var_prod = production_data[['playDirection', 'quarter', 'advtanage', 'team', 'time_situation']]

In [None]:
# Evaluate Numerical Variables

num_var_prod.describe().T

In [None]:
# Categorical Variable Exploration

for i in cat_var_prod:
    print()
    print(cat_var_prod[i].value_counts())
    print('---------------------------------------------')

In [None]:
# Check for Nulls

# Numeric Variable Check
num_var_prod.isnull().values.any()

In [None]:
# Categorical Variable Check
cat_var_prod.isnull().values.any()

<br>

**The production data looks good. Time to run through the model.**

In [None]:
# Create X and dummy variables

x_prod = production_data.drop(columns=['gameID', 'playID', 'frameID', 'nflID', 'displayName', 'event', 'punt_received_frame', 
             'ultimate_id', 'dis_updated', 'x_updated', 'x_gained_dv', 'receiving_team_name', 'kicking_team_name'])

x_prod = pd.get_dummies(x_prod)

In [None]:
# Ensure correct columns in X

list(x_prod.columns)
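`pd.get_dummies` only creates columns for the categories present in the frame it is given, so if a category seen in 2018 and 2019 is missing from the 2020 data (or a new one appears), `x_prod` will not line up with the columns the model was trained on. Checking the list by eye works; a more defensive option is to reindex against the training columns. `align_to_training_columns` is a hypothetical helper built on `DataFrame.reindex`.

```python
import pandas as pd


def align_to_training_columns(x_new: pd.DataFrame, train_columns) -> pd.DataFrame:
    """Match the training design matrix: add missing dummy columns as 0, drop unseen ones, reorder."""
    unseen = sorted(set(x_new.columns) - set(train_columns))
    if unseen:
        print("Dropping columns not seen in training:", unseen)
    return x_new.reindex(columns=train_columns, fill_value=0)


# Illustrative case: the new season lacks the 'away' category seen in training.
train_x = pd.get_dummies(pd.DataFrame({"s": [1.0, 2.0], "team": ["home", "away"]}))
new_x = pd.get_dummies(pd.DataFrame({"s": [3.0], "team": ["home"]}))
aligned = align_to_training_columns(new_x, train_x.columns)
print(aligned)
```

In the notebook, `x_prod = align_to_training_columns(x_prod, x.columns)` before `final_model.predict(x_prod)` would guarantee the same columns in the same order.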

In [None]:
# Y

y_prod = production_data['x_gained_dv']

In [None]:
# Run through model

y_pred_2020 = final_model.predict(x_prod)

In [None]:
# Take results, convert to list, and attach to DF

production_data['predicted_x_gained_dv'] = y_pred_2020.tolist()

In [None]:
# Generate difference of the two (Return Yards Over Expected)

production_data['x_gained_difference'] = production_data['x_gained_dv'] - production_data['predicted_x_gained_dv']
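The introduction describes aggregating the frame-level residuals by game and play. A minimal pandas sketch of that step, assuming the `gameID`, `playID`, and `displayName` columns used elsewhere in the notebook; `pryoe_by_play` and its output column names are hypothetical.

```python
import pandas as pd

PLAY_COLS = ["gameID", "playID", "displayName"]


def pryoe_by_play(frames: pd.DataFrame) -> pd.DataFrame:
    """Sum frame-level actual and predicted gains to one row per return, then take the difference."""
    plays = (frames.groupby(PLAY_COLS, as_index=False)[["x_gained_dv", "predicted_x_gained_dv"]]
             .sum()
             .rename(columns={"x_gained_dv": "return_yards",
                              "predicted_x_gained_dv": "predicted_return_yards"}))
    # The sum of per-frame residuals equals the residual of the summed totals.
    plays["pryoe"] = plays["return_yards"] - plays["predicted_return_yards"]
    return plays


# Illustrative frames for two returns.
demo = pd.DataFrame({
    "gameID": [1, 1, 1, 1],
    "playID": [10, 10, 20, 20],
    "displayName": ["Returner A", "Returner A", "Returner B", "Returner B"],
    "x_gained_dv": [0.5, 0.4, 0.2, -0.1],
    "predicted_x_gained_dv": [0.3, 0.4, 0.3, 0.0],
})
print(pryoe_by_play(demo))
```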

In [None]:
# Save to CSV

production_data.to_csv('all_2020_results.csv')

<br>
<br>

***

# Results Analysis: So What? Why Do We Care?

In [None]:
production_data.head()

In [None]:
players = production_data[['displayName', 'receiving_team_name', 'ultimate_id', 'x_gained_dv', 'predicted_x_gained_dv', 'x_gained_difference']]

In [None]:
# Select multiple columns with a list; tuple-style selection raises an error in pandas 2.0
player_results = players.groupby(['displayName', 'receiving_team_name'])[['x_gained_dv', 'predicted_x_gained_dv', 'x_gained_difference']].sum()

In [None]:
# Sort by x_gained_difference (Return Yards Over Expected)

player_results.sort_values(by = ['x_gained_difference'], inplace = True, ascending = False)

In [None]:
# Get Return Count

return_count = players.groupby(['displayName', 'receiving_team_name']).ultimate_id.nunique()

In [None]:
# Merge Results, Rename Columns, and Look at Top 10 for Punt Return Yards Over Expected

player_merged = pd.merge(player_results, return_count, how = 'left', on = ['displayName', 'receiving_team_name'])

player_merged.columns = ['Return Yards Gained', 'Predicted Return Yards', 'Punt Return Yards Over Expected', 'Return Count']

player_merged.head(10)

In [None]:
# Create Return Yards Per Return, Predicted Yards Per Return, and Punt Return Yards Over Expected Per Return

# Return Yards Per Return
player_merged['Return Yards Per Return'] = (player_merged['Return Yards Gained'] / player_merged['Return Count']).round(2)

# Predicted Yards Per Return
player_merged['Predicted Yards Per Return'] = (player_merged['Predicted Return Yards'] / player_merged['Return Count']).round(2)

# Punt Return Yards Over Expected Per Return
player_merged['Punt Return Yards Over Expected Per Return'] = (player_merged['Punt Return Yards Over Expected'] / player_merged['Return Count']).round(2)

In [None]:
player_merged.head(10)

<br>

**Interpretation:**

As we saw earlier, the model was quite accurate at predicting how many yards a returner would gain on each frame of a punt return. So it is reasonable that Punt Return Yards Over Expected (PRYOE) totals are small, especially given how few punt returns each player has. Kalif Raymond of the Tennessee Titans had the highest total PRYOE: +8.6 yards on 18 punt returns. This means he outgained his expected total punt return yards by about 9 yards.

Let's evaluate who has the most PRYOE per return among players with at least 8 punt returns (the average number of return attempts).

<br>

In [None]:
# Filter for players with at least 8 returns. 

players_filtered = player_merged.loc[player_merged['Return Count'] >= 8]

In [None]:
# Sort and View Results

players_filtered = players_filtered.sort_values(by = ['Punt Return Yards Over Expected Per Return'], ascending = False)

players_filtered

<br>

**Interpretation:**

Evaluating the results on a per-return basis, Kalif Raymond (Tennessee Titans), Andre Roberts (Buffalo Bills), and Diontae Spencer (Denver Broncos) form a tier of their own, each gaining nearly +0.5 yards per return more than expected.

Jabrill Peppers (New York Giants), Pharoh Cooper (Carolina Panthers), Marquez Callaway (New Orleans Saints), Gunner Olszewski (New England Patriots), and Jaydon Mickens (Tampa Bay Buccaneers) were in the next tier gaining nearly +0.3 yards per return more than expected. 

The players with the worst PRYOE per play were Christian Kirk (Arizona Cardinals), Mecole Hardman (Kansas City Chiefs), and Jakeem Grant (Miami Dolphins). 

**I will perform a deep dive into what made Kalif Raymond so successful at outperforming expectations on his punt returns and what made Jakeem Grant fall so far short on his. Grant is especially interesting because he ranks first in total yards gained in these results, but is the third-worst returner through a PRYOE per play lens.**

<br>
<br>

## Case Study: Kalif Raymond and Jakeem Grant

In [None]:
# Evaluate Kalif Raymond's PRYOE

production_data[production_data.displayName == 'Kalif Raymond'].x_gained_difference.describe()

In [None]:
# Visualize Kalif Raymond's Returns

plt.hist(production_data[production_data.displayName == 'Kalif Raymond'].x_gained_difference, bins=20)
plt.show()

In [None]:
# Evaluate Actual Yards Gained

production_data[production_data.displayName == 'Kalif Raymond'].groupby(['ultimate_id'])['x_gained_dv'].sum().describe()

In [None]:
# Evaluate Actual Yards Gained Distribution

plt.hist(production_data[production_data.displayName == 'Kalif Raymond'].groupby(['ultimate_id'])['x_gained_dv'].sum())
plt.show()

<br>

**Look at the same thing, but for Jakeem Grant**

In [None]:
# Evaluate Jakeem Grant's PRYOE

production_data[production_data.displayName == 'Jakeem Grant'].x_gained_difference.describe()

In [None]:
# Visualize Jakeem Grant's Returns

plt.hist(production_data[production_data.displayName == 'Jakeem Grant'].x_gained_difference, bins=20)
plt.show()

In [None]:
# Evaluate Actual Yards Gained

production_data[production_data.displayName == 'Jakeem Grant'].groupby(['ultimate_id'])['x_gained_dv'].sum().describe()

In [None]:
# Evaluate Actual Yards Gained Distribution

plt.hist(production_data[production_data.displayName == 'Jakeem Grant'].groupby(['ultimate_id'])['x_gained_dv'].sum())
plt.show()

<br>

**Interpretation**: 

A quick look at the punt returns for Raymond and Grant shows that Grant, on average, gained more return yards than Raymond. But this average is skewed by an 86-yard return (for a touchdown) by Grant. At the 75th percentile of their return yard distributions, Raymond slightly out-gained Grant.

We already know that Raymond was the best returner on a PRYOE per Return basis and that Grant was one of the worst. So what made one better than the other? Was it a factor of their teammates? Was it individual ability? I dove into the features of the model to try to understand what dynamics were at play. For this, I performed the data crunching in BigQuery but will provide the file and interpretation below. 

<br>

In [None]:
# Load CSV

case_study = pd.read_csv('../input/d/seanwsullivan1/nfl-big-data-bowl-2022/raymond_and_grant_results.csv')

In [None]:
# Filter Results

cs_filtered = case_study[['displayName', 'total_yards_gained_rank', 'total_return_yards_over_expected_ranked', 'rec_to_kick_5YRD_AVG__Diff_Rank', 'kb1_frame_per_return_rank', 
                          'return_yards_per_play_rank', 'predicted_return_yards_per_play_rank', 'pryoe_per_play_rank']]

In [None]:
cs_filtered

<br>

**Interpretation**

Some of these metric rankings are new, so I will explain them. As we have seen, Jakeem Grant was one of the best returners in terms of total return yards gained, while Kalif Raymond ranked well but was not the best. We also know that Raymond was the best punt returner based on PRYOE and Grant was one of the worst.

`rec_to_kick_5YRD_AVG__Diff_Rank` is the difference between the average number of returning team members within 5 yards of the returner and the average number of kicking team members within 5 yards, ranked among the returners. Put simply: on average (at the frame level), how many more of the returner's teammates than kicking team players are within 5 yards of him? The better (closer to 1st) the rank, the more teammates, on average, relative to opponents inside that radius. Despite the crowding, this would suggest that the returner had teammates providing blocking/resistance against the kicking team. Raymond ranked 30th out of 34 (filtered to include only returners with 8 or more returns) and Grant ranked 7th out of 34. This suggests that Raymond typically had more defenders than teammates within 5 yards of him, while Grant had the opposite; in other words, Grant had a better blocking situation and Raymond did not. This may hint at Raymond having superior returning "vision", but this cannot be proven with this information alone.

`kb1_frame_per_return_rank` counts the number of frames, per play, with more than one kicking team player within 5 yards of the returner, ranked among the returners. This provides insight into how much pressure a returner faces on a typical play. The count is somewhat inflated for a returner who tends to "run around" and therefore accumulates more frames within a play, so take it with a grain of salt. With that said, Grant ranks 5th highest in frames per play with multiple defenders within 5 yards, and Raymond ranks 15th.

At a high level, these two measurements appear to conflict, but they are informative nonetheless. Compared with Raymond, Grant typically had more teammates than defenders within 5 yards, yet he also had more frames per play with multiple defenders within 5 yards of him. This may be an indicator of why Grant underperformed so badly when evaluated with PRYOE. Raymond, on the other hand, typically had more defenders than teammates within 5 yards of him, but he was in the middle of the pack in the number of frames with multiple defenders within 5 yards. Given that Raymond tended to face a tougher "situation" and still outperformed his expectation, it is reasonable to wonder what his punt returning performance would be with a better blocking scheme.
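Both 5-yard measures can be sketched from a single tracking table. This is a minimal pandas sketch, not the BigQuery logic actually used: it assumes one row per player per frame with a `side` column (`'return'` or `'kick'`) and an `is_returner` flag, and it treats "within 5 yards" as inclusive. Those column names, the cutoff, and the toy positions are all assumptions.

```python
import numpy as np
import pandas as pd

RADIUS = 5.0  # yards (inclusive cutoff is an assumption)

# Hypothetical frames: (returner, playId, frameId) -> (x, y) of the returner,
# one returning-team teammate and two kicking-team players
frames = {
    ("A", 1, 1): [(10, 20), (12, 21), (13, 23), (30, 20)],
    ("A", 1, 2): [(15, 20), (17, 21), (18, 22), (16, 18)],
    ("A", 2, 1): [(40, 10), (42, 10), (60, 10), (70, 10)],
    ("B", 3, 1): [(50, 30), (70, 30), (53, 31), (51, 33)],
    ("B", 3, 2): [(55, 30), (80, 30), (57, 29), (90, 29)],
}
roles = [("return", True), ("return", False), ("kick", False), ("kick", False)]
tracking = pd.DataFrame([
    {"returner": r, "playId": p, "frameId": f, "side": side, "is_returner": is_ret, "x": x, "y": y}
    for (r, p, f), positions in frames.items()
    for (side, is_ret), (x, y) in zip(roles, positions)
])

# Attach the returner's position in the same frame to every other player's row
ret_pos = (
    tracking.loc[tracking["is_returner"], ["playId", "frameId", "x", "y"]]
    .rename(columns={"x": "rx", "y": "ry"})
)
others = tracking[~tracking["is_returner"]].merge(ret_pos, on=["playId", "frameId"])
others["near"] = np.hypot(others["x"] - others["rx"], others["y"] - others["ry"]) <= RADIUS
others["is_kick"] = others["side"] == "kick"
keys = ["returner", "playId", "frameId"]

# Measure 1: teammates minus kicking-team players within the radius, averaged over frames
others["signed"] = np.where(others["is_kick"], -1, 1) * others["near"].astype(int)
avg_diff = others.groupby(keys)["signed"].sum().groupby(level="returner").mean()
diff_rank = avg_diff.rank(ascending=False, method="min").astype(int)  # 1 = most favorable

# Measure 2 (KB1): frames with more than one kicking-team player within the radius,
# counted per play, then averaged over each returner's returns
others["kick_near"] = others["near"] & others["is_kick"]
kick_per_frame = others.groupby(keys)["kick_near"].sum()
pressured_per_play = (kick_per_frame > 1).astype(int).groupby(level=["returner", "playId"]).sum()
kb1_per_return = pressured_per_play.groupby(level="returner").mean()
kb1_rank = kb1_per_return.rank(ascending=False, method="min").astype(int)  # 1 = most pressured

print(pd.DataFrame({"avg_diff": avg_diff, "diff_rank": diff_rank,
                    "kb1_per_return": kb1_per_return, "kb1_rank": kb1_rank}))
```

Because measure 2 counts frames, a returner whose plays simply last longer collects more flagged frames, which is the inflation caveat described above.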

<br>

## Visualize Jakeem Grant Punt Returns

In the visualizations below, I animated two punt returns by Jakeem Grant: a punt return for a touchdown and a return that accumulated negative PRYOE over the play. Together they show what plays with positive and negative PRYOE look like. The size of the returner's bubble reflects the predicted yards gained in that frame; I scaled the metric to ensure that the players would be visible within the animation.

The code below was modified from code posted by Cube Root at https://www.kaggle.com/threecifanggen/replay-the-game-using-plotly-to-animate-the-game. 

<br>

In [None]:
import plotly.express as px
from plotly.offline import init_notebook_mode
import plotly.graph_objects as go
init_notebook_mode()
import ipywidgets as widgets

### Jakeem Grant Punt Return for TD (GameID: 2020110106, PlayID: 1473) 

In [None]:
tracking_20_df = pd.read_csv('../input/d/seanwsullivan1/nfl-big-data-bowl-2022/2020_11_01_06_JakeemGrant_TD_Return.csv')

In [None]:
def plot_play_in_game(gameId, speed=50):
    temp_tracking_df = tracking_20_df[tracking_20_df['gameId'] == gameId]
    for playId in temp_tracking_df['playId'].dropna().unique():
        temp_tracking_query = (tracking_20_df['gameId'] == gameId) & (tracking_20_df['playId'] == playId)
        temp_tracking_df = (
            tracking_20_df[temp_tracking_query][['x', 'y', 'frameId', 'nflId', 'team', 'displayName', 'predicted_yards_gained']]
            .fillna(0.)
            .sort_values(['team', 'frameId'])
        )
        color_discrete_map = {'home': 'rgb(0,142,151)', 'away': 'rgb(255,163,0)', 'football': 'rgb(49,29,0)'}
        fig = px.scatter(
            temp_tracking_df,
            x='x',
            y='y',
            animation_frame='frameId',
            color='team',
            color_discrete_map=color_discrete_map,
            animation_group="nflId",
            hover_name="displayName",
            size = 'predicted_yards_gained'
        )
        fig.update_traces(marker=dict(size=12,
                                      line=dict(width=2,
                                                color='DarkSlateGrey')),
                          selector=dict(mode='markers'))

        ## Drawing the Ground
        for x in range(0, 130, 10):
            fig.add_trace(go.Scatter(x=[x, x], y=[0, 53.3], mode='lines', showlegend=False, line=dict(color="#333333")))
        fig.add_trace(go.Scatter(x=[0, 120], y=[53.3, 53.3], mode='lines', showlegend=False, line=dict(color="#333333")))
        fig.add_trace(go.Scatter(x=[0, 120], y=[0, 0], mode='lines', showlegend=False, line=dict(color="#333333")))
        fig.update_layout(
            autosize=False,
            width=1100,
            height=600,
            title=f'Jakeem Grant (MIA) vs LAR: Punt Return for 86 Yards (PRYOE: +5.24)',
        )
        fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = speed
        fig.show()

In [None]:
gameId = 2020110106
plot_play_in_game(gameId)
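The notebook does not show how `predicted_yards_gained` was scaled for the bubble sizes. Plotly marker sizes cannot be negative while a predicted gain can be, so one hypothetical approach is min-max scaling into a fixed size range. `scale_for_marker` and its `lo`/`hi` defaults are illustrative assumptions, not the author's code.

```python
import numpy as np
import pandas as pd

def scale_for_marker(values, lo=5.0, hi=20.0):
    """Hypothetical helper: min-max scale values into [lo, hi] for use as marker sizes.

    Missing values (e.g. the football row) map to lo so they stay visible.
    """
    v = pd.Series(values, dtype="float64")
    vmin, vmax = v.min(), v.max()
    if pd.isna(vmin) or vmax == vmin:
        # No spread to scale (or no data): give every marker the floor size
        return pd.Series(np.full(len(v), lo), index=v.index)
    scaled = lo + (v - vmin) / (vmax - vmin) * (hi - lo)
    return scaled.fillna(lo)

print(scale_for_marker([-2.0, 0.0, 6.0, None]).tolist())  # [5.0, 8.75, 20.0, 5.0]
```

With the tracking data this could be used as `tracking_20_df['marker_size'] = scale_for_marker(tracking_20_df['predicted_yards_gained'])` and then `size='marker_size'` in `px.scatter`. Since `px.scatter` sizes markers relative to the largest value, the main job of the `lo` floor is to keep small or missing predictions visible.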

In [None]:
# Filter results for gameID = 2020110106 and playID = 1473
# Note, only frames between punt received and touchdown

jg_td = production_data.loc[(production_data['gameID'] == 2020110106) & (production_data['playID'] == 1473)]

In [None]:
# Line Chart

jg_td.plot(x = 'frameID', y = 'x_gained_difference', kind = 'line')
plt.show()

This line chart shows the frame-level PRYOE for the animated play above. Right after Grant catches the punt, the model expects him to gain more yards than he does. Frames 100 to 120 are where it gets interesting: according to the model, this is where Grant really outperforms expectation. Near frame 140, Grant made the punter and another defender miss, and from there it was off to the races.
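Assuming a play's PRYOE is the sum of its frame-level differences (the play-level aggregation described in the introduction), a running total shows where the play-level number is earned. A minimal sketch with made-up residuals for a single play; the `cumulative_pryoe` column name is hypothetical.

```python
import pandas as pd

# Made-up frame-level residuals (actual minus predicted yards gained) for one play
play = pd.DataFrame({
    "frameID": [1, 2, 3, 4, 5],
    "x_gained_difference": [-0.5, -0.25, 1.0, 2.0, 0.75],
})

# Running total of PRYOE through the play; the last value is the play-level PRYOE
play["cumulative_pryoe"] = play.sort_values("frameID")["x_gained_difference"].cumsum()
play_pryoe = play["cumulative_pryoe"].iloc[-1]
print(play_pryoe)  # 3.0
```

With the real data, something like `jg_td.sort_values('frameID').assign(cumulative_pryoe=lambda d: d['x_gained_difference'].cumsum()).plot(x='frameID', y=['x_gained_difference', 'cumulative_pryoe'])` would overlay both lines.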

<br>

### Jakeem Grant Punt Return with Poor PRYOE (GameID: 2020092004, PlayID: 3071)

In [None]:
tracking_20_df = pd.read_csv('../input/d/seanwsullivan1/nfl-big-data-bowl-2022/2020_09_20_04_JakeemGrant_Poor_Return.csv')

In [None]:
def plot_play_in_game(gameId, speed=50):
    temp_tracking_df = tracking_20_df[tracking_20_df['gameId'] == gameId]
    for playId in temp_tracking_df['playId'].dropna().unique():
        temp_tracking_query = (tracking_20_df['gameId'] == gameId) & (tracking_20_df['playId'] == playId)
        temp_tracking_df = (
            tracking_20_df[temp_tracking_query][['x', 'y', 'frameId', 'nflId', 'team', 'displayName', 'predicted_yards_gained']]
            .fillna(0.)
            .sort_values(['team', 'frameId'])
        )
        color_discrete_map = {'home': 'rgb(0,142,151)', 'away': 'rgb(198,12,48)', 'football': 'rgb(49,29,0)'}
        fig = px.scatter(
            temp_tracking_df,
            x='x',
            y='y',
            animation_frame='frameId',
            color='team',
            color_discrete_map=color_discrete_map,
            animation_group="nflId",
            hover_name="displayName",
            size = 'predicted_yards_gained'
        )
        fig.update_traces(marker=dict(size=12,
                                      line=dict(width=2,
                                                color='DarkSlateGrey')),
                          selector=dict(mode='markers'))

        ## Drawing the Ground
        for x in range(0, 130, 10):
            fig.add_trace(go.Scatter(x=[x, x], y=[0, 53.3], mode='lines', showlegend=False, line=dict(color="#333333")))
        fig.add_trace(go.Scatter(x=[0, 120], y=[53.3, 53.3], mode='lines', showlegend=False, line=dict(color="#333333")))
        fig.add_trace(go.Scatter(x=[0, 120], y=[0, 0], mode='lines', showlegend=False, line=dict(color="#333333")))
        fig.update_layout(
            autosize=False,
            width=1100,
            height=600,
            title=f'Jakeem Grant (MIA) vs BUF: Punt Return for 10 Yards (PRYOE: -10.79)',
        )
        fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = speed
        fig.show()

In [None]:
gameId = 2020092004
plot_play_in_game(gameId)

In [None]:
# Filter results for gameID = 2020092004 and playID = 3071
# Note, only frames between punt received and the end of the return

jg_poor = production_data.loc[(production_data['gameID'] == 2020092004) & (production_data['playID'] == 3071)]

In [None]:
# Line Chart

jg_poor.plot(x = 'frameID', y = 'x_gained_difference', kind = 'line')
plt.show()

This line chart shows the frame-level PRYOE for the animated play above. For most of this play, the model expects Grant to gain more yards than he did. His performance really cratered between frames 100 and 130; in the animation, this is where Grant begins to run more to the left (losing yards) and down the y axis. Interestingly, the model picks up a better PRYOE around frame 120, when Grant made a defender miss a tackle.

<br>
<br>

## Team level Rankings and Results

To evaluate PRYOE at the team level, I aggregated each team's results for its returning plays and its punting plays. The returning results mostly align with the player-level results, which makes sense given how few punts are returned (a team's returns typically come from one or two returners). The punt defending results are more interesting and begin to hint at which units are superior at limiting strong returns.

Note: I performed the roll-ups in BigQuery in order to de-clutter the notebook now that we are near the end.
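The roll-ups were done in BigQuery, but an equivalent pandas sketch makes the two ranking directions explicit. It assumes a play-level table with the receiving team, kicking team, and the play's PRYOE, and takes "per play" to be the mean over returned punts; the team labels and values are made up.

```python
import pandas as pd

# Made-up play-level results: one row per returned punt
plays = pd.DataFrame({
    "receiving_team": ["Team A", "Team A", "Team B", "Team B", "Team C"],
    "kicking_team":   ["Team B", "Team C", "Team A", "Team C", "Team A"],
    "pryoe":          [1.0, 2.0, -3.0, 1.0, -2.0],
})

returning = plays.groupby("receiving_team")["pryoe"].agg(total="sum", per_play="mean")
kicking = plays.groupby("kicking_team")["pryoe"].agg(total="sum", per_play="mean")

# Returning units: more PRYOE per play is better, so rank highest to lowest
returning["per_play_rank"] = returning["per_play"].rank(ascending=False, method="min").astype(int)
# Kicking units: less PRYOE allowed per play is better, so rank lowest to highest
kicking["per_play_rank"] = kicking["per_play"].rank(ascending=True, method="min").astype(int)

print(returning, kicking, sep="\n\n")
```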

<br>

In [None]:
# Load Team level Returning Results. Note: Ranked Highest to Lowest for PRYOE per play

receiving_team_results = pd.read_csv('../input/d/seanwsullivan1/nfl-big-data-bowl-2022/receiving_team_rankings.csv')

receiving_team_results[['receiving_team_name', 'total_yards_gained_rank', 'total_return_yards_over_expected_ranked', 'total_return_yards_over_expected__per_play_ranked']]

In [None]:
# Load Team level Punt Defending Results. Note: Ranked Lowest to Highest for PRYOE per play

kicking_team_results = pd.read_csv('../input/d/seanwsullivan1/nfl-big-data-bowl-2022/kicking_team_rankings.csv')

kicking_team_results[['kicking_team_name', 'total_yards_gained_rank', 'total_return_yards_over_expected_ranked', 'total_return_yards_over_expected__per_play_ranked']]

In [None]:
# Let's visualize the Special Teams unit performance (returning and punting) for PRYOE by team

special_teams_results = pd.read_csv('../input/d/seanwsullivan1/nfl-big-data-bowl-2022/speical_teams_comparison.csv')

special_teams_results.head()

In [None]:
# Scatter Plot Visualization

fig = px.scatter(special_teams_results, x='PRYOE_Returning_Per_Play', y='PYROE_Kicking_Per_Play', text='Team')
fig.update_traces(textposition='top center')
fig.update_layout(title_text='Special Teams Unit Performance', title_x=0.5)
fig.show()

**Interpretation:**

When reading the scatter plot, use the origin (0, 0) as the reference point. Anything to the right of 0 on the x axis indicates a positive PRYOE per play as the returning team (good); anything to the left indicates a negative value (not ideal). Anything above 0 on the y axis indicates a positive PRYOE per play allowed as the kicking team (not ideal); anything below it indicates a negative value (good).

Teams want to be in the bottom-right quadrant (positive PRYOE as a returning team and negative PRYOE as a kicking team). By this criterion, **the Buffalo Bills** have one of the strongest special teams units: their PRYOE per play is slightly above 0.4 yards as a returning unit and about -1.5 yards as a kicking unit. Other teams with strong unit performance include the **New England Patriots, Indianapolis Colts, and Philadelphia Eagles.**
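The quadrant reading can also be done programmatically. A minimal sketch, assuming the two per-play columns from `special_teams_results` (keeping the CSV's own `PYROE_Kicking_Per_Play` spelling); the team names and values are hypothetical, and a value of exactly zero is treated as not favorable.

```python
import pandas as pd

# Hypothetical per-team values using the column names from special_teams_results
teams = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C", "Team D"],
    "PRYOE_Returning_Per_Play": [0.5, -0.3, 0.8, -0.6],
    "PYROE_Kicking_Per_Play": [-1.0, -0.2, 0.9, 1.1],
})

# Returning: positive is good. Kicking (PRYOE allowed): negative is good.
ret_good = (teams["PRYOE_Returning_Per_Play"] > 0).tolist()
kick_good = (teams["PYROE_Kicking_Per_Play"] < 0).tolist()

labels = {
    (True, True): "bottom-right: good returning, good coverage",
    (True, False): "top-right: good returning, weak coverage",
    (False, True): "bottom-left: weak returning, good coverage",
    (False, False): "top-left: weak returning, weak coverage",
}
teams["quadrant"] = [labels[(r, k)] for r, k in zip(ret_good, kick_good)]
print(teams[["Team", "quadrant"]])
```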

<br>
<br>

***

# Value of PRYOE Discussion

Now that the model has been explained and the results discussed, it is important to consider the value of this metric. Should it replace actual return yards gained? Probably not. But I do think it has value in a few different areas.

**Value Proposition 1: Post Performance Evaluation**
PRYOE provides insight into how a returner (or kicking unit) performed relative to how they would be expected to perform given the situation of the punt return. As shown in the section above, it can indicate whether a returner deserves more credit for outperforming expectation or whether their blocking unit gave the return a boost. It begins to isolate the returner's individual contribution to a play.


**Value Proposition 2: In Game Strategy**
Teams could layer this metric into in-game strategic decision making. For example, suppose the Chicago Bears are playing the Buffalo Bills and know that the Bills have one of the best punt coverage units on a PRYOE-per-play basis. Knowing this, the Special Teams Coordinator may encourage their returner to take a fair catch or allow the punt to drop in order to maximize starting field position for the ensuing drive.


I also think that PRYOE is one piece of a decision optimization problem: what is the optimal choice on a punt (return, fair catch, or let it bounce)? A robust PRYOE model tackles the first part of this decision. Additional models would be needed to estimate how a punt would bounce, so that the expected return yards, fair catch location, and bounced punt location could be compared for a given punt to determine whether a decision was optimal. I am aware of a few BDB projects doing this and am excited to see their finished products.
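To illustrate how PRYOE-style estimates could feed that decision, here is a toy sketch that picks the option with the best expected starting field position. Every input is hypothetical, as are `PuntOption` and `best_option`: the expected return yards would come from a return-yards model and the bounce location from a separate bounce model that this notebook does not build.

```python
from dataclasses import dataclass

@dataclass
class PuntOption:
    name: str
    expected_start_yardline: float  # yards from the receiving team's own goal line; higher is better

def best_option(catch_yardline, expected_return_yards, expected_bounce_yardline):
    """Hypothetical chooser: the option with the best expected starting field position."""
    options = [
        PuntOption("return", catch_yardline + expected_return_yards),
        PuntOption("fair catch", catch_yardline),
        PuntOption("let bounce", expected_bounce_yardline),
    ]
    # max() keeps the first option on ties, so "return" wins an exact tie
    return max(options, key=lambda option: option.expected_start_yardline)

print(best_option(20.0, 6.5, 18.0).name)  # return
```

This compares expected values only; a real decision model would also need to weigh variance and turnover risk such as muffed punts.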

<br>

## Limitations and Next Steps

As mentioned in the value discussion above, the PRYOE metric is limited by its narrow scope: it only evaluates punts that are returned, and on its own it does not say whether returning the punt was the right choice.

As a next step, I think it would be worth expanding some of the player tracking features. For example, features describing how close players are to each other (and not just to the returner) may improve the accuracy of the model. A feature set like this could also be a launching point for quantifying the impact of individual players on a given play: perhaps we could quantify a kicking team defender's impact on limiting return yards, or a returning team player's impact on blocking and creating more return yards. However, it would be worth discussing how much value this would truly bring to a team and to the sports analytics community. Essentially, is the juice worth the squeeze given the number of punt returns in a season? Regardless, it would set the wheels in motion for similar analyses of other actions within a football game.
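As a starting point for those player-to-player features, here is a minimal NumPy sketch of pairwise distances between all players in a single frame, plus each returning team player's distance to the nearest kicking team player. The function names, the `is_kicking` mask, and the toy positions are hypothetical.

```python
import numpy as np

def pairwise_distances(xy):
    """Euclidean distance between every pair of players in one frame.

    xy: array-like of shape (n_players, 2) with x and y in yards.
    Returns a symmetric (n_players, n_players) matrix with zeros on the diagonal.
    """
    xy = np.asarray(xy, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]  # broadcast to shape (n, n, 2)
    return np.sqrt((diff ** 2).sum(axis=-1))

def nearest_opponent_distance(xy, is_kicking):
    """For each returning-team player, distance to the closest kicking-team player."""
    d = pairwise_distances(xy)
    is_kicking = np.asarray(is_kicking, dtype=bool)
    return d[~is_kicking][:, is_kicking].min(axis=1)

# Toy frame: returning-team players at rows 0 and 2, kicking-team players at rows 1 and 3
xy = [[0, 0], [3, 4], [6, 8], [0, 1]]
is_kicking = [False, True, False, True]
print(nearest_opponent_distance(xy, is_kicking))  # [1. 5.]
```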

<br>
<br>

***

# Acknowledgements

Special thank you to the following people for serving as a sounding board and/or encouraging me to complete the project:

- **My girlfriend, Sarah**
- **Ben Draus**
- **Tej Seth**