# Ranking Gunners, Vises, and Punt Return Degree of Difficulty

This analysis set out to understand which factors in punt coverage impact return yardage. I built Random Forest Regression and Classification Models, which were only somewhat effective in predicting return yardage. However, the models highlighted features key to limiting return yardage. These features can be used to rank special teams players. 

Kick length and hang time are crucial to predicting return yardage, and so is getting close to the returner. No surprises here: getting downfield in punt coverage is obviously important, and the analysis supports it. Given this assumption, this workbook uses punt play tracking data to produce the following:

1. ***Gunner Rankings*** - Using tracking data at the snap, I matched gunners with vises. I then calculated how far gunners could get downfield 3, 4, and 5 seconds from the snap. As I'd paired gunners with opposing vises, I could calculate their downfield penetration in both absolute yardage and yardage ***relative to the vise***. I used percentile ranking on each penetration metric for all gunners that participated in at least 15 qualifying punt plays (punt of 40+ yards with no penalties). I averaged percentile rankings of all metrics to determine the final metric for ranking gunners.
2. ***Vise Rankings*** - Vises try to prevent gunners from getting downfield. Using the same method, I ranked vises by the average penetration allowed on qualifying punt plays.
3. ***Punt Return Degree of Difficulty*** - For punt returns, I percentile ranked four metrics. I averaged these rankings to determine "Degree of Difficulty," by which all returns are ranked. The following metrics went into the rankings: 
    - Defender Proximity - I tallied the number of frames where a defender was within two yards of the returner during the return. If multiple defenders were within two yards in the same frame, each defender counted as a separate instance.
    - Football Movement - Total distance the football traveled during the return.
    - Return Time - Time that passed between receiving the punt and the end of the return.
    - Broken Tackles - Using PFF data, I determined how many tackles were broken during the return.
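
The gunner ranking described in item 1 can be sketched roughly as follows. This is a minimal illustration on made-up numbers, not the workbook's actual code: the `gunner_plays` frame, its column names (`pen_3s`, `rel_pen_3s`), and the `min_plays` value of 2 are hypothetical stand-ins (the analysis itself uses 15+ qualifying plays and six penetration metrics).

```python
import pandas as pd

# Hypothetical per-play penetration for three gunners (yards downfield at 3 seconds,
# absolute and relative to the matched vise). Values are made up for illustration.
gunner_plays = pd.DataFrame({
    'gunner':     ['A', 'A', 'B', 'B', 'C'],
    'pen_3s':     [20.0, 22.0, 17.0, 19.0, 30.0],
    'rel_pen_3s': [2.0, 3.0, 0.5, 1.5, 5.0],
})
metrics = ['pen_3s', 'rel_pen_3s']
min_plays = 2  # hypothetical; the analysis requires 15+ qualifying plays

# Average each metric per gunner and count plays
summary = gunner_plays.groupby('gunner')[metrics].mean()
summary['plays'] = gunner_plays.groupby('gunner').size()

# Keep gunners with enough plays, percentile rank each metric, then average the ranks
qualified = summary[summary['plays'] >= min_plays].copy()
pct_ranks = qualified[metrics].rank(pct=True)
qualified['average_rank'] = pct_ranks.mean(axis=1)

ranking = qualified.sort_values('average_rank', ascending=False)
print(ranking)
```

Gunner C has the best raw numbers here but is dropped for having too few plays, which is the point of the minimum-plays filter. For vises, the same approach would apply with the sort order reversed, since less penetration allowed is better.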

## Punt Coverage Features 

- **Hang Time** - Provided feature, reflects how long a punt is in the air.
- **Kick Direction** - I converted kickDirectionActual to a binary field where 0 represents a centered kick and 1 represents a kick to the left or right.
- **Kick Team Position** - I calculated the average horizontal (x-axis) position of kicking team players (excluding the punter) relative to their position at the snap. Calculated at 3, 4, and 5 seconds from the snap. The theory was that the kicking team's penetration would correlate negatively with return yardage.
- **Receiving Team Position** - Average horizontal position of the receiving team (excluding the returner) at 3, 4, and 5 seconds.
- **Gunner Penetration** - Average horizontal position of gunners relative to snap position at 3, 4, and 5 seconds.
- **Max Spread** - Difference between minimum and maximum vertical (y-axis) position of kicking team players at 3, 4, and 5 seconds. I wanted to explore whether vertical distance covered impacted return yardage.
- **Gap Standard Deviation** - Standard deviation of the vertical gaps between kicking team players at 3, 4, and 5 seconds. This is a measure of the size of the holes in coverage.
- **Minimum Defender Distance to Returner** - Distance from nearest defender to the returner at 3, 4, and 5 seconds.
- **Mean Defender Distance to Returner** - Mean distance from all defenders (excluding the punter) to the returner at 3, 4, and 5 seconds.

## Machine Learning Models and Findings

To confirm conventional understanding of key aspects of effective punt coverage, I built machine learning models to predict punt return yardage based on the features listed above. Below I discuss the models, their effectiveness, and conclusions drawn from the process.

### Feature Importances

See below for the features of the Random Forest Regression and Classification models used to predict punt return yardage. For Regression, I used the metrics as calculated at 4 seconds from the snap of the football (where applicable). For Classification, all possible features were included and then reduced through sequential forward and backward selection.

In [None]:
from IPython.display import Image

# Display feature importances
image_url = 'https://github.com/drwismer/NFL_special_teams/blob/main/images/feature_importance.png?raw=true'
Image(url=image_url, width=1000)

### Random Forest Regression

I first explored correlation and collinearity of the features and target variable (return yardage). In my initial check of Variance Inflation Factor (VIF), I included all available features. I eliminated features in order to reduce VIF to an acceptable level (< 5.0) and performed hyperparameter tuning to maximize R-Squared. Even with tuning, I could only achieve ***R-Squared of 0.1254***. 
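
For readers unfamiliar with VIF-based feature elimination, here is a minimal sketch of the general idea on synthetic data. It is not the procedure used in this workbook (which calls statsmodels' `variance_inflation_factor` and prunes features by hand). It relies on the identity that, for a model with an intercept, each feature's VIF equals the corresponding diagonal entry of the inverse correlation matrix. The feature names `x1` to `x4` and the helper functions are hypothetical.

```python
import numpy as np
import pandas as pd

def vif_from_corr(df):
    """VIF of each column, taken from the diagonal of the inverse correlation matrix."""
    corr = df.corr().to_numpy()
    return pd.Series(np.diag(np.linalg.inv(corr)), index=df.columns)

def drop_high_vif(df, threshold=5.0):
    """Repeatedly drop the feature with the highest VIF until all are below threshold."""
    kept = df.copy()
    while kept.shape[1] > 1:
        vif = vif_from_corr(kept)
        if vif.max() < threshold:
            break
        kept = kept.drop(columns=vif.idxmax())
    return kept

# Synthetic features: x3 is almost exactly x1 + x2, so the three are collinear
rng = np.random.default_rng(0)
n = 500
features = pd.DataFrame({
    'x1': rng.normal(size=n),
    'x2': rng.normal(size=n),
    'x4': rng.normal(size=n),
})
features['x3'] = features['x1'] + features['x2'] + 0.01 * rng.normal(size=n)

reduced = drop_high_vif(features)
print(vif_from_corr(features).round(1))
print(vif_from_corr(reduced).round(2))
```

Dropping the single worst feature at a time is a greedy choice. It is simple to reason about, but it may not keep the most predictive feature from a collinear group, so it is worth checking the result against domain knowledge.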

While this model is not particularly accurate, it does provide information about key features. In the screenshot above, I've included the feature importances for the best performing regression model. This provides hard evidence that getting close to the returner during the punt is key to limiting return yardage. Groundbreaking? No. But it's nice to see the data support conventional wisdom.

### Random Forest Classification

I had more success with Classification. I separated punt plays by their return yardage into four buckets: (1) fair catch, (2) less than 5 yards, (3) 5-15 yards, and (4) greater than 15 yards. I used sequential forward and sequential backward feature selection to determine the optimal combination of features to maximize accuracy. I tuned the hyperparameters of the winning Random Forest model using grid search and ultimately achieved ***classification accuracy of 47.4%***.
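
The four-bucket target described above could be built along these lines. This is a sketch on toy rows, not the workbook's code: the `returnBucket` column and its labels are hypothetical, and I am assuming fair catches are flagged as `'Fair Catch'` in `specialTeamsResult` and that exactly 15 yards falls in the 5-15 bucket.

```python
import numpy as np
import pandas as pd

# Toy punt plays; the values are made up for illustration
toy = pd.DataFrame({
    'specialTeamsResult': ['Fair Catch', 'Return', 'Return', 'Return', 'Return'],
    'kickReturnYardage':  [np.nan, 3, 5, 15, 22],
})

yards = toy['kickReturnYardage'].fillna(0)

# Conditions are checked in order, so the fair catch test wins over the yardage tests
conditions = [
    toy['specialTeamsResult'] == 'Fair Catch',
    yards < 5,
    yards <= 15,
]
choices = ['fair_catch', 'lt_5', '5_to_15']

toy['returnBucket'] = np.select(conditions, choices, default='gt_15')
print(toy)
```

With four classes, a model that guessed uniformly at random would be right about 25% of the time, which gives some context for the accuracy figure.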

See below for the confusion matrices normalized by true and predicted classification. The model is best at predicting when a punt will result in a fair catch or in a long return (15+ yards). Given that the model is reasonably predictive, I wanted to understand feature importances. Above, you'll see the features included in the best performing classification model. These features were selected through a feature reduction process, so were already deemed important to predicting return yardage, but some are slightly more important. Kick Length and Hang Time top the list. Beyond metrics related to the kick itself, we see features related to getting downfield (Gunner Penetration, Kick Team Position). Surprisingly, Gap Standard Deviation also made the cut. 

In [None]:
# Display punt return confusion matrices
image_url = 'https://github.com/drwismer/NFL_special_teams/blob/main/images/confusion_punt.png?raw=true'
Image(url=image_url, width=1250)

## Actionable Insights

With knowledge acquired from analyzing the predictive models, I aimed to use tracking data to derive actionable insights. I decided to explore the most critical skill positions involved in punt return plays:  gunners, vises, and returners. The kicker is also critical, but I found tracking data to be more useful in analyzing punt coverage. Below, I transform the data to rank gunners and vises and calculate the degree of difficulty of punt returns.

### Gunner and Vise Rankings

We know the minimum distance between defenders and the returner helps to predict return yardage. We also know the player most likely to get close to the returner is the gunner. It is therefore important to understand gunners' ability to penetrate downfield. It is equally important to understand the vises' ability to prevent gunner penetration. Below, I've displayed the best and worst gunners and vises as determined by the following criteria:

- **Absolute Penetration** - I calculated the gunner's change in horizontal (x-axis) position relative to his snap position at 3, 4, and 5 seconds from the snap of the football.
- **Gunner Penetration Relative to Vise** - The gunner's horizontal position relative to the opposing vise's horizontal position at 3, 4, and 5 seconds. On each play, each gunner was matched with an opposing vise based on the players' vertical (y-axis) snap position.
- **Excluded Plays and Players** - Plays were only included in the rankings data if the punt traveled 40+ yards. Penalized plays were excluded. Plays were excluded if there were not an equal number of gunners and vises. For fairness, I only analyzed gunner penetration and vise protection for plays where gunners and vises battled one-on-one. Players were only included in the rankings if they played on 15+ qualifying punt plays.
- **Average Rank** - The field used to rank players is "Average Rank." This represents the average of the player's rankings in each of the fields representing absolute and relative penetration at 3, 4, and 5 seconds.
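
The gunner-to-vise matching can be pictured with a small sketch. This is an illustration under my own assumptions rather than the workbook's implementation: the player labels, the `snap` frame, and the `x_at_4s` lookup are all hypothetical, and pairing is done simply by sorting both groups by their y position at the snap.

```python
import pandas as pd

# Hypothetical snap-frame positions for one punt play (yards); values are made up
snap = pd.DataFrame({
    'player': ['G1', 'G2', 'V1', 'V2'],
    'role':   ['gunner', 'gunner', 'vise', 'vise'],
    'y':      [5.0, 48.0, 47.0, 6.5],
})

# Hypothetical x positions relative to the line of scrimmage, 4 seconds after the snap
x_at_4s = {'G1': 28.0, 'G2': 24.0, 'V1': 25.5, 'V2': 26.0}

def match_gunners_to_vises(snap_df):
    """Pair each gunner with the vise lined up across from him, by sideline-to-sideline order."""
    gunners = snap_df[snap_df['role'] == 'gunner'].sort_values('y')['player'].tolist()
    vises = snap_df[snap_df['role'] == 'vise'].sort_values('y')['player'].tolist()
    if len(gunners) != len(vises):
        return None  # not one-on-one, so the play would be excluded
    return list(zip(gunners, vises))

pairs = match_gunners_to_vises(snap)

# Positive values mean the gunner is further downfield than his matched vise
relative_penetration = {g: x_at_4s[g] - x_at_4s[v] for g, v in pairs}
print(pairs)
print(relative_penetration)
```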

#### Gunner Punt Penetration Rankings - Top and Bottom 10

In [None]:
# Display gunner penetration rankings
image_url = 'https://github.com/drwismer/NFL_special_teams/blob/main/images/gunner_rankings.png?raw=true'
Image(url=image_url, width=1250)

#### Vise Return Protection Rankings - Top and Bottom 10

In [None]:
# Display vise protection rankings
image_url = 'https://github.com/drwismer/NFL_special_teams/blob/main/images/vise_rankings.png?raw=true'
Image(url=image_url, width=1250)

### Punt Return Degree of Difficulty

Given the importance of Defender Proximity, I hoped to use tracking data to rank punt returns. This led me to seek out other metrics that could be used to establish a "Degree of Difficulty" metric to rank returns. Degree of Difficulty is calculated by averaging the percentile rank of all returns for the following metrics:

- **Defender Proximity** - For each frame ***during the return***, I calculated the distance between each defender and the returner. I created a binary column representing whether each defender was within two yards of the returner. Returns were percentile ranked based on the total. A score of 1.00 in Defender Proximity would be assigned to the single punt return play that contained the most instances of a defender within two yards of the returner during the return.
- **Total Movement** - This represents ***total movement of the football*** from reception through the end of the return. Below, you'll see that for touchdowns, the percentile rank is typically near 1.00.
- **Return Time** - Length of the return in frames. For the touchdown plays below, the return time is near 1.00.
- **Broken Tackles** - PFF supplied identifying information for missed tacklers. I converted this data to count of broken tackles, which was percentile ranked.
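
To make the calculation concrete, here is a minimal sketch of the Defender Proximity tally and the averaged percentile ranks. Everything in it is illustrative: the toy `frames` and `returns` tables, their column names, and all numbers are hypothetical, and ties are handled with pandas' default average rank, which may differ from the workbook.

```python
import numpy as np
import pandas as pd

# Hypothetical rows during three returns: one row per defender per frame (made-up values)
frames = pd.DataFrame({
    'returnId':   [1, 1, 1, 1, 2, 2, 3, 3],
    'frame':      [1, 1, 2, 2, 1, 2, 1, 2],
    'x':          [10.0, 30.0, 11.0, 12.5, 40.0, 38.0, 20.0, 21.0],
    'y':          [20.0, 20.0, 20.0, 21.0, 10.0, 10.0, 30.0, 30.0],
    'x_returner': [11.0, 11.0, 12.0, 12.0, 20.0, 21.0, 21.0, 21.5],
    'y_returner': [20.0, 20.0, 20.0, 20.0, 10.0, 10.0, 30.0, 30.0],
})

# Binary flag per defender per frame: within two yards of the returner
dist = np.sqrt((frames['x_returner'] - frames['x'])**2 + (frames['y_returner'] - frames['y'])**2)
frames['within_two'] = dist <= 2.0
proximity = frames.groupby('returnId')['within_two'].sum()

# Hypothetical values for the other three metrics, one row per return
returns = pd.DataFrame({
    'total_movement': [60.0, 35.0, 20.0],
    'return_frames':  [90, 50, 30],
    'broken_tackles': [2, 0, 0],
}, index=pd.Index([1, 2, 3], name='returnId'))
returns['defender_proximity'] = proximity

# Percentile rank each metric across all returns, then average the four ranks
metrics = ['defender_proximity', 'total_movement', 'return_frames', 'broken_tackles']
returns['degree_of_difficulty'] = returns[metrics].rank(pct=True).mean(axis=1)
print(returns.sort_values('degree_of_difficulty', ascending=False))
```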

The percentile rankings include all punt plays that were returned (not only touchdowns). This is why Total Movement and Return Time are very near 1.00 for touchdown plays. You'll also notice that Defender Proximity doesn't approach 1.00 in the punt return touchdown rankings. This is because Defender Proximity is maximized when the punt returner is tackled. When I include all punt returns, the top ranking punt return touchdown has the 17th highest degree of difficulty. You can see the full rankings in my [GitHub workbook](https://github.com/drwismer/NFL_special_teams/blob/main/Punt%20Ranking%20Code.ipynb). Defender Proximity, as it is currently calculated, is only useful for ranking touchdown returns. When ranking punt returns regardless of the scoring result, it may be best to exclude Defender Proximity.

See below for a ranking of punt return touchdowns. You'll also find GIFs of one of the highest and one of the lowest ranked punt return touchdowns. [Thank you to Samira Kumar for the fantastic visualization tool!](https://github.com/samirak93/Game-Animation)

#### Touchdown Punt Returns - Degree of Difficulty

In [None]:
# Display punt return TD's sorted by degree of difficulty
image_url = 'https://github.com/drwismer/NFL_special_teams/blob/main/images/punt_return_rankings.png?raw=true'
Image(url=image_url, width=1250)

#### Rank #2 - Isaiah McKenzie - BUF vs. MIA - 2020 Week 17. [Watch a video of the return.](https://www.youtube.com/watch?v=0kqp9vD3HMA&ab_channel=HighlightHeaven)

Notice the punt returner is nearly tackled multiple times. His route goes straight through the defense, which increases Defender Proximity. The return is slowed by the punter forcing him back inside, increasing Return Time. If you follow the YouTube link, you'll see he has to break a few tackles as well.

In [None]:
# Display return gif
image_url = 'https://github.com/drwismer/NFL_special_teams/blob/main/images/punt_return_hard.gif?raw=true'
Image(url=image_url, width=1000)

#### Lowest Ranked - Mecole Hardman - KC vs. MIA - 2020 Week 14. [Watch a video of the return.](https://www.youtube.com/watch?v=1VVSQwPANiU&ab_channel=TopFanTV)

This play looks very different from the highly ranked return above. The returner takes the sideline route, avoiding defenders almost entirely. This play has particularly low values of Defender Proximity and Broken Tackles. Because it is a touchdown return, Return Time and Total Movement are high, but not compared to other touchdown returns. You might say this touchdown return was due more to good blocking or poor coverage than to the returner's incredible effort.

In [None]:
# Display return gif
image_url = 'https://github.com/drwismer/NFL_special_teams/blob/main/images/punt_return_easy.gif?raw=true'
Image(url=image_url, width=1000)

## Kickoff Returns

This is not the focus of this analysis, but I've included some kickoff return analysis for fun. Below, you'll find a confusion matrix showing the performance of a Random Forest Classification model applied to kickoff return yardage. Note that the performance is better than for punt return classification, with ***accuracy of 65.2%***. You'll also find Degree of Difficulty rankings for all kickoff return touchdowns included in the data. You can follow links to YouTube to watch the highest ranked kickoff touchdown, scored by Jakeem Grant of the Miami Dolphins.

[To see the full code, visit the workbook on my GitHub profile](https://github.com/drwismer/NFL_special_teams/blob/main/Kickoff%20Coverage.ipynb).

### Classification - Confusion Matrix

In [None]:
# Display kickoff return confusion matrices
image_url = 'https://github.com/drwismer/NFL_special_teams/blob/main/images/confusion_kickoff.png?raw=true'
Image(url=image_url, width=1250)

### Touchdown Kickoff Returns - Degree of Difficulty

In [None]:
# Display kick return TD's
image_url = 'https://github.com/drwismer/NFL_special_teams/blob/main/images/kick_return_rankings.png?raw=true'
Image(url=image_url, width=1250)

#### Rank #1 - Jakeem Grant - MIA vs. TEN - 2018 Week 1. [Watch a video of the return.](https://www.youtube.com/watch?v=UqgE-356WlU&ab_channel=HighlightHeaven)

In [None]:
# Display return gif
image_url = 'https://github.com/drwismer/NFL_special_teams/blob/main/images/kick_return_hard.gif?raw=true'
Image(url=image_url, width=1000)

# -------------------------------- THE CODE --------------------------------

# Import Libraries and Data

In [None]:
import pandas as pd
import numpy as np
from math import sqrt

import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import seaborn as sns
sns.set(style='darkgrid')

import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.metrics import accuracy_score, plot_confusion_matrix

import pickle

In [None]:
games = pd.read_csv('../input/nfl-big-data-bowl-2022/games.csv')
players = pd.read_csv('../input/nfl-big-data-bowl-2022/players.csv')
plays = pd.read_csv('../input/nfl-big-data-bowl-2022/plays.csv')
pff = pd.read_csv('../input/nfl-big-data-bowl-2022/PFFScoutingData.csv')
tracking_2018 = pd.read_csv('../input/nfl-big-data-bowl-2022/tracking2018.csv')
tracking_2019 = pd.read_csv('../input/nfl-big-data-bowl-2022/tracking2019.csv')
tracking_2020 = pd.read_csv('../input/nfl-big-data-bowl-2022/tracking2020.csv')

## Data Setup

### Concatenate Tracking Data and Orient all Kicks to the Right

In [None]:
# Concatenate tracking data
tracking_full = pd.concat([tracking_2018, tracking_2019, tracking_2020], axis=0)
tracking_full.reset_index(drop=True, inplace=True)

# Flip so all plays begin with the kick going to the right
tracking_full.loc[tracking_full['playDirection'] == "left", 'x'] = 120-tracking_full.loc[tracking_full['playDirection'] == "left", 'x']
tracking_full.loc[tracking_full['playDirection'] == "left", 'y'] = 160/3-tracking_full.loc[tracking_full['playDirection'] == "left", 'y']
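
As a quick sanity check on the flip, here is the same transformation applied to two toy rows. The constants reflect the field dimensions in the tracking coordinate system (120 yards long including end zones, 53 1/3 yards wide); the `toy` frame is hypothetical.

```python
import pandas as pd

FIELD_LENGTH = 120      # yards, including both end zones
FIELD_WIDTH = 160 / 3   # yards (53 1/3)

# One play going left and one going right, starting from the same raw coordinates
toy = pd.DataFrame({
    'playDirection': ['left', 'right'],
    'x': [30.0, 30.0],
    'y': [10.0, 10.0],
})

# Mirror both axes for plays going left; plays going right are untouched
left = toy['playDirection'] == 'left'
toy.loc[left, 'x'] = FIELD_LENGTH - toy.loc[left, 'x']
toy.loc[left, 'y'] = FIELD_WIDTH - toy.loc[left, 'y']
print(toy)
```

Mirroring y as well as x is a 180 degree rotation of the field, which keeps each player on the same side of the formation from the kicking team's point of view.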

# Punt Coverage and Return Analysis

### Filter Plays and PFF Data to Punt Plays

In [None]:
# Filter for punts only
punt_plays = plays[plays['specialTeamsPlayType'] == 'Punt'].copy()

# Add column for yards from endzone
punt_plays['yardsFromEndzone'] = np.where(punt_plays['possessionTeam'] == punt_plays['yardlineSide'],
                                          100 - punt_plays['yardlineNumber'],
                                          punt_plays['yardlineNumber']
                                         )

# Add bucket column for yards from endzone
conditions = [punt_plays['yardsFromEndzone'].le(50), 
              punt_plays['yardsFromEndzone'].gt(50) & punt_plays['yardsFromEndzone'].le(65),
              punt_plays['yardsFromEndzone'].gt(65) & punt_plays['yardsFromEndzone'].le(93),
              punt_plays['yardsFromEndzone'].gt(93)
             ]

choices = ['0-50', '50-65', '65-93', '93+']

punt_plays['yardsFromEndzoneBucket'] = np.select(conditions, choices)
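
The same bucketing can also be written with `pd.cut`, shown here on a toy series as an alternative sketch rather than a replacement for the cell above. One difference to be aware of: missing values come out as NaN with `pd.cut`, whereas `np.select` falls back to its default value.

```python
import numpy as np
import pandas as pd

# Toy values chosen to sit on and around the bucket boundaries
yards = pd.Series([20, 50, 51, 65, 66, 93, 94], name='yardsFromEndzone')
labels = ['0-50', '50-65', '65-93', '93+']

# Right-closed bins: (-inf, 50], (50, 65], (65, 93], (93, inf)
buckets = pd.cut(yards, bins=[-np.inf, 50, 65, 93, np.inf], labels=labels)
print(buckets.tolist())
```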

In [None]:
# Filter PFF data for punt plays
pff_punt_plays = pff[pff['kickType'].isin(['N', 'R', 'A'])]

### Filter Tracking Data to Punt Plays

In [None]:
# Only keep plays considered punt plays by PFF
punt_tracking = tracking_full.merge(pff_punt_plays[['gameId', 'playId']],
                                    on=['gameId', 'playId'],
                                    how='inner'
                                   ).reset_index(drop=True)

### Prepare Tracking Data for Point in Time Metrics

In [None]:
# Merge player info to punt tracking data
play_cols = ['gameId', 'playId', 'kickerId', 'returnerId', 'possessionTeam', 'absoluteYardlineNumber']
pff_cols = ['gameId', 'playId', 'gunners', 'vises']
games_cols = ['gameId', 'homeTeamAbbr', 'visitorTeamAbbr']

punt_tracking = punt_tracking.merge(plays[play_cols], on=['gameId', 'playId'])\
                                        .merge(pff[pff_cols], on=['gameId', 'playId'])\
                                        .merge(games[games_cols], on='gameId')

In [None]:
# Convert merged ID columns to integer
punt_tracking['kickerId'] = punt_tracking['kickerId'].fillna(0).astype(int)
punt_tracking['returnerId'] = punt_tracking['returnerId'].str.split(';').str[0]
punt_tracking.dropna(subset = ['returnerId'], inplace=True)
punt_tracking['returnerId'] = punt_tracking['returnerId'].astype(int)

In [None]:
# Flip the yardline number for plays going left
punt_tracking['absoluteYardlineNumber'] = np.where(punt_tracking['playDirection']=='left', 
                                                         (punt_tracking['absoluteYardlineNumber'] - 120) * -1,
                                                         punt_tracking['absoluteYardlineNumber']
                                                        )

# Adjust x to be relative to line of scrimmage
punt_tracking['x_adj'] = punt_tracking['x'] - punt_tracking['absoluteYardlineNumber']

In [None]:
# Create team name, team and number combination, and kicking team True/False columns
punt_tracking['jerseyNumber'] = punt_tracking['jerseyNumber'].fillna(0)
punt_tracking['teamName'] = np.where(punt_tracking['team']=='football', 'football',
                                     np.where(punt_tracking['team']=='home', 
                                              punt_tracking['homeTeamAbbr'],
                                              punt_tracking['visitorTeamAbbr']
                                             )
                                    )

punt_tracking['teamNumber'] = punt_tracking['teamName'] + ' ' + punt_tracking['jerseyNumber'].astype(int).astype(str)

punt_tracking['kickingTeam'] = punt_tracking['teamName']==punt_tracking['possessionTeam']

In [None]:
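# Create True/False gunner and vise columns to mark gunners and vises.
# Match whole entries (assumes the PFF lists are ';'-separated) so 'KC 1' cannot match inside 'KC 17'.
def in_player_list(team_number, players):
    return team_number in [p.strip() for p in str(players).split(';')]
punt_tracking['gunner_y_n'] = punt_tracking.apply(lambda row: in_player_list(row['teamNumber'], row['gunners']), axis=1)
punt_tracking['vise_y_n'] = punt_tracking.apply(lambda row: in_player_list(row['teamNumber'], row['vises']), axis=1)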

In [None]:
# Create role player column to track punter/return/gunner/vise/football
punt_tracking['rolePlayer'] = np.where(punt_tracking['nflId']==punt_tracking['kickerId'], 'punter',
                                             np.where(punt_tracking['nflId']==punt_tracking['returnerId'], 'returner',
                                                      np.where(punt_tracking['gunner_y_n'], 'gunner',
                                                               np.where(punt_tracking['vise_y_n'], 'vise',
                                                                        np.where(punt_tracking['team']=='football', 'football', 'other'
                                                                                )
                                                                       )
                                                              )
                                                     )
                                            )

In [None]:
# Determine snap frame and create new frame column relative to the snap
punt_snaps = punt_tracking[punt_tracking['event']=='ball_snap'][['gameId', 'playId', 'frameId', 'event']].drop_duplicates()
punt_snaps.rename(columns={'frameId' : 'snap'}, inplace=True)

punt_tracking = punt_tracking.merge(punt_snaps[['gameId', 'playId', 'snap']], on=['gameId', 'playId'])
punt_tracking['frameVsSnap'] = punt_tracking['frameId'] - punt_tracking['snap']
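
The tracking data is sampled at 10 frames per second, so the 3, 4, and 5 second snapshots used throughout correspond to `frameVsSnap` values of 30, 40, and 50. A small sketch on a hypothetical single-play frame list:

```python
import pandas as pd

FRAMES_PER_SECOND = 10  # tracking data sampling rate

# Hypothetical play with 70 frames where the snap happens at frame 11
toy = pd.DataFrame({
    'frameId': range(1, 71),
    'event': ['None'] * 70,
})
toy.loc[toy['frameId'] == 11, 'event'] = 'ball_snap'

# Re-index frames so the snap is frame 0
snap_frame = toy.loc[toy['event'] == 'ball_snap', 'frameId'].iloc[0]
toy['frameVsSnap'] = toy['frameId'] - snap_frame

# Convert seconds after the snap into frame offsets and pull those snapshots
seconds = [3, 4, 5]
offsets = [s * FRAMES_PER_SECOND for s in seconds]
snapshots = toy[toy['frameVsSnap'].isin(offsets)]
print(snapshots[['frameId', 'frameVsSnap']])
```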

## Calculate Point in Time Metrics for Each Frame

#### Team Penetration by Frame

In [None]:
# Calculate average team position and kicking team penetration by frame
exclude = ['punter', 'returner', 'football']
group_cols = ['gameId', 'playId', 'frameVsSnap', 'teamName']

penetration_by_frame_kicking = punt_tracking[(~punt_tracking['rolePlayer'].isin(exclude)) &
                                                   (punt_tracking['kickingTeam'])
                                                  ].groupby(group_cols)['x_adj'].mean().reset_index()

penetration_by_frame_receiving = punt_tracking[(~punt_tracking['rolePlayer'].isin(exclude)) &
                                                     (~punt_tracking['kickingTeam'])
                                                    ].groupby(group_cols)['x_adj'].mean().reset_index()


team_penetration_by_frame = penetration_by_frame_kicking.merge(penetration_by_frame_receiving, on=['gameId', 'playId', 'frameVsSnap'])
team_penetration_by_frame.columns = ['gameId', 'playId', 'frameVsSnap', 'kickTeam', 'kickTeamPosition', 'recTeam', 'recTeamPosition']
team_penetration_by_frame['kickTeamPenetration'] = team_penetration_by_frame['kickTeamPosition'] - team_penetration_by_frame['recTeamPosition']

#### Gunner Penetration by Frame

In [None]:
# Calculate average gunner and vise position and average gunner penetration by frame
include = ['gunner', 'vise']

penetration_by_frame_gunner = punt_tracking[(punt_tracking['rolePlayer'].isin(include)) &
                                                  (punt_tracking['kickingTeam'])
                                                 ].groupby(group_cols)['x_adj'].mean().reset_index()

penetration_by_frame_vise = punt_tracking[(punt_tracking['rolePlayer'].isin(include)) &
                                                (~punt_tracking['kickingTeam'])
                                               ].groupby(group_cols)['x_adj'].mean().reset_index()

gunner_penetration_by_frame = penetration_by_frame_gunner.merge(penetration_by_frame_vise, on=['gameId', 'playId', 'frameVsSnap'])
gunner_penetration_by_frame.columns = ['gameId', 'playId', 'frameVsSnap', 'kickTeam', 'gunnerPosition', 'recTeam', 'visePosition']
gunner_penetration_by_frame['gunnerPenetration'] = gunner_penetration_by_frame['gunnerPosition'] - gunner_penetration_by_frame['visePosition']

#### Vertical Spread and Deviation of Gaps by Frame

In [None]:
# Calculate vertical spread of the punt team
kick_team_spread = punt_tracking[(punt_tracking['kickingTeam'])
                                      ].groupby(group_cols)['y'].agg(amin='min', amax='max').reset_index()

kick_team_spread['max_spread'] = kick_team_spread['amax'] - kick_team_spread['amin']

In [None]:
# Calculate standard deviation of the player gaps
kick_team_gaps = punt_tracking[punt_tracking['kickingTeam']].sort_values(['gameId', 'playId', 'frameVsSnap', 'y']).copy()

# Gap to the neighboring player within each play and frame; the first player in each group gets NaN
kick_team_gaps['y_gap'] = kick_team_gaps.groupby(['gameId', 'playId', 'frameVsSnap'])['y'].diff()

kick_team_gaps = kick_team_gaps.groupby(['gameId', 'playId', 'frameVsSnap'])['y_gap'].std().reset_index()
kick_team_gaps.columns = ['gameId', 'playId', 'frameVsSnap', 'gapStdDev']
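
A worked example may help show what Max Spread and Gap Standard Deviation measure. The sketch below uses a single hypothetical frame with four kicking team players; the `toy` frame and its values are made up, and the gaps are computed with a grouped `diff` over players sorted by y.

```python
import pandas as pd

# One hypothetical frame with four players at different y positions (yards)
toy = pd.DataFrame({
    'gameId': 1, 'playId': 1, 'frameVsSnap': 40,
    'y': [25.0, 10.0, 45.0, 20.0],
})
keys = ['gameId', 'playId', 'frameVsSnap']

# Sort sideline to sideline, then take the gap between neighboring players
toy = toy.sort_values(keys + ['y'])
toy['y_gap'] = toy.groupby(keys)['y'].diff()   # gaps: NaN, 10, 5, 20

# Spread is max y minus min y; gapStdDev is the sample standard deviation of the gaps
summary = toy.groupby(keys).agg(max_spread=('y', lambda s: s.max() - s.min()),
                                gapStdDev=('y_gap', 'std')).reset_index()
print(summary)
```

Here the spread is 35 yards and the gaps of 10, 5, and 20 yards have a standard deviation of roughly 7.6. Evenly spaced coverage would drive that number toward zero, while one large hole pushes it up.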

In [None]:
# Merge gap deviation to spread dataframe
kick_team_spread = kick_team_spread.merge(kick_team_gaps, on=['gameId', 'playId', 'frameVsSnap'])

#### Nearest Defender to Returner by Frame

In [None]:
# Calculate the distance between each defender and the returner, as well as average and minimum by frame
returner_position = punt_tracking[punt_tracking['rolePlayer']=='returner'][['gameId', 'playId', 'frameVsSnap', 'x', 'y']]
returner_position.columns = ['gameId', 'playId', 'frameVsSnap', 'x_returner', 'y_returner']

punt_tracking = punt_tracking.merge(returner_position, on=['gameId', 'playId', 'frameVsSnap'])

punt_tracking['dist_to_returner'] = np.sqrt((punt_tracking['x_returner'] - punt_tracking['x'])**2 + (punt_tracking['y_returner'] - punt_tracking['y'])**2)

dist_to_returner = punt_tracking[(~punt_tracking['rolePlayer'].isin(exclude)) &
                                 (punt_tracking['kickingTeam'])
                                ].groupby(['gameId', 'playId', 'frameVsSnap'])['dist_to_returner'].agg(minDist='min', meanDist='mean').reset_index()

dist_to_returner.columns = ['gameId', 'playId', 'frameVsSnap', 'minDist', 'meanDist']

### Combine Punt Details Into a Single Dataframe

In [None]:
def merge_frame_data(df1, df2, on_cols, merge_cols, frame=(), rename_cols=()):
    """
    Merge tracking data for the specified frame(s) from df2 into df1.

    Columns listed in rename_cols get a '_frame<N>' suffix after each merge.
    """
    for f in frame:
        df1 = df1.merge(df2[df2['frameVsSnap']==f][merge_cols], on=on_cols)
        # Suffix the merged columns so each frame's values stay distinct
        df1 = df1.rename(columns={col : col + '_frame' + str(f) for col in rename_cols})

    return df1
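
For comparison, the same long-to-wide reshaping can be sketched with a pivot. This is an alternative illustration, not what the workbook does: the `by_frame` table and its values are hypothetical, and the `dropna` call is there to mimic the inner merges, which drop any play missing one of the requested frames.

```python
import pandas as pd

# Hypothetical per-frame metric for two plays; play 8 has no row for frame 50
by_frame = pd.DataFrame({
    'gameId':      [1, 1, 1, 1, 1, 1],
    'playId':      [7, 7, 7, 8, 8, 8],
    'frameVsSnap': [30, 40, 50, 30, 40, 45],
    'minDist':     [25.0, 18.0, 11.0, 30.0, 22.0, 19.0],
})
frames = [30, 40, 50]

# One row per play, one column per requested frame
wide = (by_frame[by_frame['frameVsSnap'].isin(frames)]
        .pivot(index=['gameId', 'playId'], columns='frameVsSnap', values='minDist')
        .reindex(columns=frames)
        .dropna()
       )
wide.columns = [f'minDist_frame{f}' for f in wide.columns]
wide = wide.reset_index()
print(wide)
```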

In [None]:
# Merge relevant columns from plays and pff dataframes
pff_filtered = pff_punt_plays[['gameId', 'playId', 'snapDetail', 'operationTime', 'hangTime', 'kickDirectionActual']]
punt_plays_filtered = punt_plays[['gameId', 'playId', 'specialTeamsResult', 'penaltyCodes', 'kickLength', 
                                  'kickReturnYardage', 'playResult', 'yardsFromEndzoneBucket']]

punt_data_combined = pff_filtered.merge(punt_plays_filtered, on=['gameId', 'playId'], how='inner')

In [None]:
# Merge team penetration data
on_cols = ['gameId', 'playId']
merge_cols = on_cols + ['kickTeam', 'recTeam', 'kickTeamPosition', 'recTeamPosition']
rename_cols = ['kickTeamPosition', 'recTeamPosition']

punt_data_combined = merge_frame_data(punt_data_combined,
                                      team_penetration_by_frame,
                                      on_cols=on_cols,
                                      merge_cols=merge_cols,
                                      frame=[30],
                                      rename_cols=rename_cols
                                     )

merge_cols = on_cols + rename_cols

punt_data_combined = merge_frame_data(punt_data_combined,
                                      team_penetration_by_frame,
                                      on_cols=on_cols,
                                      merge_cols=merge_cols,
                                      frame=[40, 50],
                                      rename_cols=rename_cols
                                     )

In [None]:
# Merge gunner penetration data
rename_cols = ['gunnerPenetration']
merge_cols = on_cols + rename_cols

punt_data_combined = merge_frame_data(punt_data_combined,
                                      gunner_penetration_by_frame,
                                      on_cols=on_cols,
                                      merge_cols=merge_cols,
                                      frame=[30, 40, 50],
                                      rename_cols=rename_cols
                                     )

In [None]:
# Merge kick team spread and gap data
rename_cols = ['max_spread', 'gapStdDev']
merge_cols = on_cols + rename_cols

punt_data_combined = merge_frame_data(punt_data_combined,
                                      kick_team_spread,
                                      on_cols=on_cols,
                                      merge_cols=merge_cols,
                                      frame=[30, 40, 50],
                                      rename_cols=rename_cols
                                     )

In [None]:
# Merge defender distance data
rename_cols = ['minDist', 'meanDist']
merge_cols = on_cols + rename_cols

punt_data_combined = merge_frame_data(punt_data_combined,
                                      dist_to_returner,
                                      on_cols=on_cols,
                                      merge_cols=merge_cols,
                                      frame=[30, 40, 50],
                                      rename_cols=rename_cols
                                     )

In [None]:
# Reorder the columns in the final dataframe
new_order = ['gameId',
             'playId',
             'kickTeam',
             'recTeam',
             'snapDetail',
             'kickDirectionActual',
             'specialTeamsResult',
             'penaltyCodes',
             'yardsFromEndzoneBucket',
             'kickLength',
             'kickReturnYardage',
             'playResult',
             'operationTime',
             'hangTime',
             'kickTeamPosition_frame30',
             'kickTeamPosition_frame40',
             'kickTeamPosition_frame50',
             'recTeamPosition_frame30',
             'recTeamPosition_frame40',
             'recTeamPosition_frame50',
             'gunnerPenetration_frame30',
             'gunnerPenetration_frame40',
             'gunnerPenetration_frame50',
             'max_spread_frame30',
             'max_spread_frame40',
             'max_spread_frame50',
             'gapStdDev_frame30',
             'gapStdDev_frame40',
             'gapStdDev_frame50',
             'minDist_frame30',
             'minDist_frame40',
             'minDist_frame50',
             'meanDist_frame30',
             'meanDist_frame40',
             'meanDist_frame50']

punt_data_combined = punt_data_combined[new_order]

## Explore Correlation and Collinearity

In [None]:
# Change NaN to zero for kickReturnYardage
punt_data_combined['kickReturnYardage'] = punt_data_combined['kickReturnYardage'].fillna(0)

In [None]:
# Display correlation matrix
corr = punt_data_combined[new_order[9:]].corr()

plt.figure(figsize=[20,15])
ax = plt.axes()
plt.rcParams.update({'font.size': 12})
ax.set_title('Punt Data Correlation Matrix', fontsize=22, pad=15)
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap='RdBu', annot=True, fmt='.2f', vmin=-1.0)
plt.show();

In [None]:
# Keep predictor columns only for the VIF calculation (new_order[12:] skips kickLength and the output variables)
output_var = ['kickReturnYardage', 'playResult']
linear_reg_vif = punt_data_combined[new_order[12:]].dropna()

# Create constant and run VIF calculation
X = sm.tools.add_constant(linear_reg_vif)
vif = pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])], index=X.columns)
display(vif)

In [None]:
# New run of VIF, using frame 30 only
linear_reg_vif = punt_data_combined[['operationTime', 'hangTime', 'kickTeamPosition_frame30', 'recTeamPosition_frame30',
                                    'gunnerPenetration_frame30', 'max_spread_frame30', 'gapStdDev_frame30', 'minDist_frame30',
                                    'meanDist_frame30']].dropna()

X = sm.tools.add_constant(linear_reg_vif)
vif = pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])], index=X.columns)
display(vif)

In [None]:
# New run of VIF, frame 40 only
linear_reg_vif = punt_data_combined[['operationTime', 'hangTime', 'kickTeamPosition_frame40', 'recTeamPosition_frame40',
                                    'gunnerPenetration_frame40', 'max_spread_frame40', 'gapStdDev_frame40', 'minDist_frame40',
                                    'meanDist_frame40']].dropna()

X = sm.tools.add_constant(linear_reg_vif)
vif = pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])], index=X.columns)
display(vif)

In [None]:
# New run of VIF, frame 50 only
linear_reg_vif = punt_data_combined[['operationTime', 'hangTime', 'kickTeamPosition_frame50', 'recTeamPosition_frame50',
                                    'gunnerPenetration_frame50', 'max_spread_frame50', 'gapStdDev_frame50', 'minDist_frame50',
                                    'meanDist_frame50']].dropna()

X = sm.tools.add_constant(linear_reg_vif)
vif = pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])], index=X.columns)
display(vif)
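As a rough illustration of what the VIF values above measure, the sketch below computes VIF directly from its definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on the remaining features. The helper `vif_from_definition` and the toy columns `a`, `b`, and `c` are hypothetical and not part of the punt data. The point is only that a feature that is nearly a linear combination of the others receives a large VIF.

```python
import numpy as np
import pandas as pd


def vif_from_definition(features: pd.DataFrame) -> pd.Series:
    """Compute VIF per column by regressing it on the other columns plus an intercept."""
    values = features.to_numpy(dtype=float)
    vifs = {}
    for j, name in enumerate(features.columns):
        target = values[:, j]
        others = np.delete(values, j, axis=1)
        design = np.column_stack([np.ones(len(target)), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        # With an intercept, residuals have zero mean, so this ratio is SS_res / SS_tot
        r_squared = 1.0 - residual.var() / target.var()
        vifs[name] = 1.0 / (1.0 - r_squared)
    return pd.Series(vifs)


# Hypothetical toy data: c is almost exactly a + b
rng = np.random.default_rng(0)
toy = pd.DataFrame({'a': rng.normal(size=500), 'b': rng.normal(size=500)})
toy['c'] = toy['a'] + toy['b'] + rng.normal(scale=0.1, size=500)

toy_vif = vif_from_definition(toy)
print(toy_vif)
```

All three toy columns end up with high VIF values because each can be reconstructed from the other two, which mirrors why the same metric measured at frames 30, 40, and 50 is evaluated one frame at a time above.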

## Run Linear Regression

In [None]:
# Set feature and target columns and create separate dataframes (drop NA)
x_cols = ['kickLength', 'operationTime', 'hangTime', 'kickTeamPosition_frame40', 'recTeamPosition_frame40',
          'gunnerPenetration_frame40', 'max_spread_frame40', 'gapStdDev_frame40', 'minDist_frame40',
          'meanDist_frame40']

y_cols = ['kickReturnYardage']

punt_reg_data = punt_data_combined[x_cols + y_cols].dropna()

X = punt_reg_data[x_cols]
Y = punt_reg_data[y_cols]

In [None]:
# Train test split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=1)

In [None]:
# Initialize, train, and score Random Forest Regressor
rf_regressor = RandomForestRegressor(random_state=13)
rf_regressor.fit(x_train, y_train)

score = rf_regressor.score(x_test, y_test)

print('Random Forest Regressor R-Squared:  ' + str(format(score, '.4f')))

In [None]:
# Set up a randomized search over Random Forest Regressor hyperparameters
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
max_features = [1.0, 'sqrt']  # 1.0 (all features) replaces 'auto', which scikit-learn 1.3 removed
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
min_samples_split = [2, 5, 10]
min_samples_leaf = [1, 2, 4]
bootstrap = [True, False]

random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}

rf_regressor = RandomForestRegressor(random_state=13)
rf_random = RandomizedSearchCV(estimator=rf_regressor,
                               param_distributions=random_grid,
                               n_iter=100,
                               cv=3,
                               verbose=2,
                               random_state=13,
                               n_jobs = -1)

In [None]:
# Train and score best estimator
rf_random.fit(x_train, y_train)
print(rf_random.best_params_)
rf_random_score = rf_random.best_estimator_.score(x_test, y_test)

print('Random Forest Regressor R-Squared:  ' + str(format(rf_random_score, '.4f')))

In [None]:
# Display feature importances for best scoring estimator
best_rf_regressor = rf_random.best_estimator_
pd.Series(best_rf_regressor.feature_importances_, index=x_train.columns).sort_values(ascending=False)

## Classification Model

In [None]:
def sequential_backward(model_data, target, classifier):
    """
    Accept the full modeling data, the target column name, and a classifier. Starting from
    all features, repeatedly fit the classifier, record its test accuracy, and remove the
    feature with the lowest importance. Return a list of features in order of removal and a
    dataframe with every feature combination tested and its accuracy score, sorted by
    highest accuracy.
    """
    test_columns = list(model_data.columns)
    test_columns.remove(target)
    
    feature_combos = pd.DataFrame()

    backward_order = []

    while test_columns:
        cols = test_columns + [target]
        data = model_data[cols]

        X = data.drop(columns=target)
        Y = data[target]
        
        x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=13)

        classifier.fit(x_train, y_train)
        y_pred = classifier.predict(x_test)
        score = accuracy_score(y_test, y_pred, normalize = True)

        test_instance = {'combo' : ', '.join(test_columns), 'score' : score, 'length' : len(test_columns)}
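        # DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
        feature_combos = pd.concat([feature_combos, pd.DataFrame([test_instance])], ignore_index=True)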

        feature_importance = pd.Series(classifier.feature_importances_, index=x_train.columns).sort_values(ascending=False)
        remove = feature_importance.index[-1]

        test_columns.remove(remove)
        backward_order.extend([remove])

    sorted_combos = feature_combos.sort_values('score', ascending=False).reset_index()
    
    return backward_order, sorted_combos

In [None]:
def sequential_forward(model_data, target, classifier):
    """
    Accept a classification model and the full train/test data. Find the single feature
    that provides the best classification on its own. Then add features one by one
    by determining the feature whose addition results in the highest accuracy score. Return
    a list of features in order of addition and a dataframe with all feature combinations
    and their accuracy scores, sorted by highest accuracy.
    """
    test_columns = list(model_data.columns)
    test_columns.remove(target)
    
    feature_combos = pd.DataFrame()

    forward_order = []
    i = 1

    while test_columns:
        for col in test_columns:
            cols = [target] + forward_order + [col]
            data = model_data[cols].copy()
            
            X = data.drop(columns=target)
            Y = data[target]
            
            x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=13)

            classifier.fit(x_train, y_train)
            y_pred = classifier.predict(x_test)

            score = accuracy_score(y_test, y_pred, normalize = True)

            test_instance = {'combo' : ', '.join(cols), 'score' : score, 'length' : i}
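            # DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
            feature_combos = pd.concat([feature_combos, pd.DataFrame([test_instance])], ignore_index=True)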

        remove = feature_combos[feature_combos['length']==i].sort_values('score', ascending=False).reset_index().loc[0,'combo'].split(', ')[-1]

        test_columns.remove(remove)
        forward_order.extend([remove])
        i += 1

    sorted_combos = feature_combos.sort_values('score', ascending=False).reset_index()
    sorted_combos = sorted_combos.drop_duplicates(subset='length', keep='first')
    
    return forward_order, sorted_combos

In [None]:
def balance_target(df, target):
    """
    Reduce all target classes to the level of the minority class.
    """
    
    limit = df[target].value_counts().min()
    
    balanced_df = pd.DataFrame()
    
    # Loop through each of the possible classes
    for value in df[target].value_counts().index:
        subset = df[df[target] == value]
        subset = subset.sample(limit, random_state=13)
        balanced_df = pd.concat([balanced_df, subset])
    
    # Return the final dataframe
    return balanced_df
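To make the undersampling idea concrete, here is a minimal sketch on hypothetical toy data (the `bucket` and `value` columns are invented for illustration). It uses `groupby(...).sample(...)`, which is an alternative to the explicit loop in `balance_target` rather than the approach used in this notebook, and assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical toy data with three unevenly sized classes
toy = pd.DataFrame({
    'bucket': ['A'] * 6 + ['B'] * 3 + ['C'] * 2,
    'value': range(11),
})

# The smallest class sets the per-class sample size
limit = int(toy['bucket'].value_counts().min())

# Draw the same number of rows from every class
balanced = toy.groupby('bucket', group_keys=False).sample(n=limit, random_state=13)

balanced_counts = balanced['bucket'].value_counts()
print(balanced_counts)
```

Every class is reduced to the size of the minority class, so a classifier trained on the result cannot score well simply by predicting the most common outcome.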

In [None]:
# Make copy of punt data for classification, remove muffed punts and plays with penalties
punt_data_classification = punt_data_combined.copy()

punt_data_classification['penaltyCodes'] = punt_data_classification['penaltyCodes'].astype(str)

punt_data_classification = punt_data_classification[(punt_data_classification['specialTeamsResult'] != 'Muffed') &
                                                    (punt_data_classification['penaltyCodes'] == 'nan')]

In [None]:
# Add target bucket column for kick return yardage (5 yards or fewer, over 5 through 15, over 15)
conditions = [punt_data_classification['kickReturnYardage'].le(5), 
              punt_data_classification['kickReturnYardage'].gt(5) & punt_data_classification['kickReturnYardage'].le(15),
              punt_data_classification['kickReturnYardage'].gt(15)
             ]

choices = ['<5', '5-15', '15+']

punt_data_classification['kickReturnYardageBucket'] = np.select(conditions, choices)
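The bucket boundaries can be checked on a few hand-picked values. The sketch below is an assumption-labeled alternative that uses `pd.cut` with right-inclusive bins instead of `np.select`; the `yards` series is hypothetical. Note that the `'<5'` label also covers a return of exactly 5 yards, matching the `.le(5)` condition above.

```python
import numpy as np
import pandas as pd

# Hypothetical return yardages, including the boundary values 5 and 15
yards = pd.Series([-2, 0, 5, 5.5, 15, 16, 40])

# Right-inclusive bins: (-inf, 5], (5, 15], (15, inf)
buckets = pd.cut(yards,
                 bins=[-np.inf, 5, 15, np.inf],
                 labels=['<5', '5-15', '15+'])

print(pd.DataFrame({'yards': yards, 'bucket': buckets}))
```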

In [None]:
# Encode kick direction as a binary integer (Center is 0, L/R is 1)
punt_data_classification['kickDirectionActual'] = np.where(punt_data_classification['kickDirectionActual']=='C', 0, 1)

In [None]:
# Mark fair catches in the target column
punt_data_classification['kickReturnYardageBucket'] = np.where(punt_data_classification['specialTeamsResult']=='Fair Catch',
                                                               'Fair Catch',
                                                               punt_data_classification['kickReturnYardageBucket'])

In [None]:
# Balance the classes for the modeling data
punt_classification_balance = balance_target(punt_data_classification, 'kickReturnYardageBucket')

### Baseline Model

In [None]:
# Establish feature and target columns and set the dataframes for modeling
x_cols = ['kickLength', 'operationTime', 'hangTime',
          'kickDirectionActual',
          'kickTeamPosition_frame30', 'recTeamPosition_frame30',
          'kickTeamPosition_frame40', 'recTeamPosition_frame40',
          'kickTeamPosition_frame50', 'recTeamPosition_frame50',
          'gunnerPenetration_frame30', 'gunnerPenetration_frame40', 'gunnerPenetration_frame50',
          'max_spread_frame30', 'max_spread_frame40', 'max_spread_frame50',
          'gapStdDev_frame30', 'gapStdDev_frame40', 'gapStdDev_frame50', 
          'minDist_frame30', 'minDist_frame40', 'minDist_frame50',
          'meanDist_frame30', 'meanDist_frame40', 'meanDist_frame50']

y_cols = ['kickReturnYardageBucket']

X = punt_classification_balance[x_cols]
Y = punt_classification_balance[y_cols]

In [None]:
# Train test split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=13)

In [None]:
# Initialize, train, and score the Random Forest Classifier model
rf_classifier = RandomForestClassifier(random_state=13)

rf_classifier.fit(x_train, y_train)

y_pred_train = rf_classifier.predict(x_train)
y_pred_test = rf_classifier.predict(x_test)

rf_train_score = accuracy_score(y_train, y_pred_train, normalize = True)
rf_test_score = accuracy_score(y_test, y_pred_test, normalize = True)

print('Training Score:   ' + str(rf_train_score))
print('Test Score: ' + str(rf_test_score))

In [None]:
# Display the feature importances
pd.Series(rf_classifier.feature_importances_, index=x_train.columns).sort_values(ascending=False)

### Find Optimal Feature Combinations

In [None]:
# Establish the initial dataframe with all possible features
punt_classification_balance = punt_classification_balance[x_cols + y_cols]

In [None]:
# Run forward and backward feature selection
backward_order, sorted_combos_backward = sequential_backward(punt_classification_balance, 'kickReturnYardageBucket', rf_classifier)
forward_order, sorted_combos_forward = sequential_forward(punt_classification_balance, 'kickReturnYardageBucket', rf_classifier)

In [None]:
# Display feature order from backward selection
backward_order[::-1]

In [None]:
# Show top combinations from backward selection
sorted_combos_backward[0:5]

In [None]:
# Display feature order from forward selection
forward_order

In [None]:
# Show top combinations from forward selection
sorted_combos_forward[0:5]

In [None]:
# Set features to the top-scoring combination from forward selection (the first entry of each combo is the target, so skip it)
x_cols = sorted_combos_forward.combo[0].split(', ')[1:]

X = punt_classification_balance[x_cols]
Y = punt_classification_balance[y_cols]

In [None]:
# Train test split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=13)

In [None]:
# Initialize, train, and score the Random Forest Classifier
rf_classifier = RandomForestClassifier(random_state=13)

rf_classifier.fit(x_train, y_train)

y_pred_train = rf_classifier.predict(x_train)
y_pred_test = rf_classifier.predict(x_test)

rf_train_score = accuracy_score(y_train, y_pred_train, normalize = True)
rf_test_score = accuracy_score(y_test, y_pred_test, normalize = True)

print('Training Score:   ' + str(rf_train_score))
print('Test Score: ' + str(rf_test_score))

In [None]:
# Display feature importances for Random Forest Classifier
pd.Series(rf_classifier.feature_importances_, index=x_train.columns).sort_values(ascending=False)

### Hyperparameter Tuning on Top Classifier Model

In [None]:
# Run randomized search for Random Forest Classifier hyperparameters
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
max_features = ['sqrt']  # 'auto' was an alias for 'sqrt' in classifiers and was removed in scikit-learn 1.3
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
min_samples_split = [2, 5, 10]
min_samples_leaf = [1, 2, 4]
bootstrap = [True, False]

random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}

rf_classifier = RandomForestClassifier(random_state=13)
rf_random = RandomizedSearchCV(estimator=rf_classifier,
                               param_distributions=random_grid,
                               n_iter=100,
                               cv=3,
                               verbose=2,
                               random_state=13,
                               n_jobs = -1)

rf_random.fit(x_train, y_train)

In [None]:
# Display optimal Random Forest Classifier parameters
rf_random.best_params_

In [None]:
# Score the best of the Random Forest Classifier models
rf_random_score = rf_random.best_estimator_.score(x_test, y_test)

print('Random Forest Classification Accuracy:  ' + str(format(rf_random_score, '.4f')))

In [None]:
# Display feature importances for the best Random Forest Classifier model
best_rf_classifier = rf_random.best_estimator_

pd.Series(best_rf_classifier.feature_importances_, index=x_train.columns).sort_values(ascending=False)

## Confusion Matrices

In [None]:
def confusion(classifier, x_test, y_test, y_pred, model, cmap):
    """
    Plot two confusion matrices for a given classifier and its test data:
    one normalized by actual class and one normalized by predicted class.
    y_pred is accepted but unused; predictions are recomputed from x_test.
    """
    # plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay replaces it
    from sklearn.metrics import ConfusionMatrixDisplay

    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(18,8))
    fig.patch.set_alpha(0)

    norm_list = ['true', 'pred']
    titles = ['Normalized by Actual', 'Normalized by Prediction']
    labels = ['15+', '5-15', '<5', 'Fair Catch']
    
    fig.suptitle(model + ' Confusion Matrices', fontsize=24, fontweight='bold')

    for ax, norm, title in zip(axes.flatten(), norm_list, titles):
        ConfusionMatrixDisplay.from_estimator(classifier,
                                              x_test,
                                              y_test,
                                              ax=ax,
                                              cmap=cmap,
                                              normalize=norm,
                                              values_format='.1%',
                                              labels=labels
                                             )
        ax.set_title(title, pad=15, fontsize=20, fontweight='bold')
        ax.set_xlabel('Predicted Classification', labelpad=20, fontsize=20, fontweight='bold')
        ax.set_xticklabels(labels, rotation=45, ha='right', fontsize=18)
        ax.set_yticklabels(labels, ha='right', fontsize=18)
        ax.set_ylabel('True Classification', labelpad=20, fontsize=20, fontweight='bold')
        ax.grid(None)
    
    plt.tight_layout(pad=2.2)  
    plt.show();

In [None]:
# Make predictions using the best Random Forest Classifier
y_pred = best_rf_classifier.predict(x_test)

In [None]:
# Create custom colormap for use in heatmaps
colors = [(1, 1, 1), ((106/256, 235/256, 245/256)), ((22/256, 159/256, 169/256)), ((14/256, 95/256, 101/256))] # first color is white, last is dark teal
cm = LinearSegmentedColormap.from_list("Custom", colors, N=30)

In [None]:
# Display confusion matrices
sns.set(font_scale=2.0)
plt.rcParams.update({'font.size': 18})

confusion(classifier=best_rf_classifier, x_test=x_test, y_test=y_test, y_pred=y_pred, model='Punt Return - Random Forest', cmap=cm)

## Coverage Throughout the Return

In [None]:
# Determine plays where the punt was received and returned; exclude muffed punts, fair catches, and plays with a receiving-team penalty
punt_received = punt_tracking[(punt_tracking['event']=='punt_received')][['gameId', 'playId']]
punt_received = punt_received.merge(plays[['gameId', 'playId', 'specialTeamsResult', 'possessionTeam', 'penaltyJerseyNumbers']], on=['gameId', 'playId'])

punt_received['penaltyTeam'] = np.where(punt_received.apply(lambda x: str(x['possessionTeam']) in str(x['penaltyJerseyNumbers']), axis=1),
                                        np.where(punt_received.apply(lambda x: ';' in str(x["penaltyJerseyNumbers"]), axis=1), 'Both', 'Kicking'),
                                        np.where(punt_received['penaltyJerseyNumbers'].astype(str)=='nan', 'None', 'Receiving')
                                       )

punt_returned = punt_received[(~(punt_received['specialTeamsResult'].isin(['Muffed', 'Fair Catch']))) &
                              (~(punt_received['penaltyTeam'].isin(['Both', 'Receiving'])))
                             ]

In [None]:
# Create columns displaying the frameVsSnap for start and end of the return
starting_cols = ['gameId', 'playId', 'specialTeamsResult', 'possessionTeam', 'penaltyJerseyNumbers', 'penaltyTeam'] 
merge_cols = ['gameId', 'playId', 'frameVsSnap']

punt_returned = punt_returned.merge(punt_tracking[punt_tracking['event']=='punt_received'][merge_cols],
                                    on=['gameId', 'playId'], how='left')

punt_returned = punt_returned.merge(punt_tracking[punt_tracking['event']=='out_of_bounds'][merge_cols],
                                    on=['gameId', 'playId'], how='left', suffixes=('_rec', '_oob'))

punt_returned = punt_returned.merge(punt_tracking[punt_tracking['event']=='tackle'][merge_cols],
                                    on=['gameId', 'playId'], how='left', suffixes=('_oob', '_tackle'))

punt_returned = punt_returned.merge(punt_tracking[punt_tracking['event']=='touchdown'][merge_cols],
                                    on=['gameId', 'playId'], how='left', suffixes=('_tackle', '_td'))

punt_returned = punt_returned.merge(punt_tracking[punt_tracking['event']=='fumble'][merge_cols],
                                    on=['gameId', 'playId'], how='left', suffixes=('_td', '_fumble'))


punt_returned.columns = starting_cols + ['returnStart', 'outOfBounds', 'tackle', 'touchdown', 'fumble']

punt_returned = punt_returned.drop_duplicates()

punt_returned['returnEnd'] = punt_returned[['outOfBounds','tackle', 'touchdown']].min(axis=1)
punt_returned.dropna(subset=['returnEnd'], inplace=True)

In [None]:
# Exclude all plays with fumbles
punt_returned = punt_returned[punt_returned['fumble'].isna()]

In [None]:
# Calculate the distance between each defender and the football, as well as average and minimum by frame
football_position = punt_tracking[punt_tracking['rolePlayer']=='football'][['gameId', 'playId', 'frameVsSnap', 'x', 'y']]
football_position.columns = ['gameId', 'playId', 'frameVsSnap', 'x_football', 'y_football']

punt_tracking = punt_tracking.merge(football_position, on=['gameId', 'playId', 'frameVsSnap'])

punt_tracking['dist_to_football'] = np.sqrt((punt_tracking['x_football'] - punt_tracking['x'])**2 + (punt_tracking['y_football'] - punt_tracking['y'])**2)

# Pass an ordered list (not a set) so the output columns are always min, then mean
dist_to_football = punt_tracking[(punt_tracking['kickingTeam'])].groupby(['gameId', 'playId', 'frameVsSnap'])['dist_to_football']\
                                                                .agg(['min', 'mean'])\
                                                                .reset_index()

dist_to_football.columns = ['gameId', 'playId', 'frameVsSnap', 'minDist_football', 'meanDist_football']

In [None]:
# Filter tracking data to show frames that occurred during the return
punt_tracking_during_return = punt_tracking.merge(punt_returned[['gameId', 'playId', 'returnStart', 'returnEnd']], on=['gameId', 'playId'])

punt_tracking_during_return.dropna(subset=['returnStart', 'returnEnd'], inplace=True)

punt_tracking_during_return = punt_tracking_during_return[(punt_tracking_during_return['frameVsSnap']>=punt_tracking_during_return['returnStart']) &
                                                          (punt_tracking_during_return['frameVsSnap']<=punt_tracking_during_return['returnEnd'])
                                                         ]

In [None]:
# Calculate whether each defender was within two yards of the football during the return
punt_tracking_during_return['defender_within_two_yds'] = np.where((punt_tracking_during_return['dist_to_football'] <= 2.0) &
                                                                  (punt_tracking_during_return['kickingTeam']) &
                                                                  (punt_tracking_during_return['rolePlayer']!='football'), 1, 0)

In [None]:
# Total the defender-within-two-yards instances for each return and sort from most to fewest
close_defenders = punt_tracking_during_return.groupby(['gameId', 'playId'])['defender_within_two_yds'].sum().reset_index().sort_values('defender_within_two_yds', ascending=False)
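A small worked example may help show how the proximity tally behaves. The sketch below uses hypothetical toy coordinates (two defenders, two frames) and counts every defender-frame pair within two yards of the football, so a frame with two nearby defenders contributes two instances.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tracking rows: one row per defender per frame
toy = pd.DataFrame({
    'frame': [1, 1, 2, 2],
    'defender': ['d1', 'd2', 'd1', 'd2'],
    'x': [10.0, 14.0, 11.0, 12.0],
    'y': [20.0, 20.0, 20.0, 21.0],
    'x_football': [11.0, 11.0, 12.0, 12.0],
    'y_football': [20.0, 20.0, 20.0, 20.0],
})

# Euclidean distance from each defender to the football
toy['dist'] = np.hypot(toy['x_football'] - toy['x'], toy['y_football'] - toy['y'])

# Flag defender-frame pairs within two yards, then total them
toy['within_two'] = (toy['dist'] <= 2.0).astype(int)
proximity = int(toy['within_two'].sum())
per_frame = toy.groupby('frame')['within_two'].sum()

print(per_frame)
print('Total instances:', proximity)
```

In this toy data, frame 1 has one close defender and frame 2 has two, for three instances in total.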

In [None]:
# Determine the length of the return in frames (time)
punt_tracking_during_return['returnTime'] = punt_tracking_during_return['returnEnd'] - punt_tracking_during_return['returnStart']

In [None]:
# Calculate movement of the football during the return
return_length = punt_tracking_during_return[['gameId', 'playId', 'frameVsSnap', 'x_football', 'y_football']].drop_duplicates()

# Sort by frame and difference within each play so movement is never measured across play boundaries
return_length = return_length.sort_values(['gameId', 'playId', 'frameVsSnap'])
frame_change = return_length.groupby(['gameId', 'playId'])[['x_football', 'y_football']].diff()

return_length['football_movement'] = np.sqrt(frame_change['x_football']**2 + frame_change['y_football']**2)

return_length = return_length.groupby(['gameId', 'playId'])['football_movement'].sum().reset_index()
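The football movement metric is a path length: the sum of frame-to-frame distances within each play. The sketch below works through that on hypothetical toy coordinates, where the expected totals are easy to verify by hand (a 3-4-5 step followed by a 4-yard step for play 1, and a single 1-yard step for play 2).

```python
import numpy as np
import pandas as pd

# Hypothetical football positions for two short plays
toy = pd.DataFrame({
    'playId': [1, 1, 1, 2, 2],
    'frame': [0, 1, 2, 0, 1],
    'x': [0.0, 3.0, 3.0, 10.0, 10.0],
    'y': [0.0, 4.0, 8.0, 5.0, 6.0],
})

# Difference within each play so the first frame of a play has no step
toy = toy.sort_values(['playId', 'frame'])
steps = toy.groupby('playId')[['x', 'y']].diff()
toy['step'] = np.sqrt(steps['x']**2 + steps['y']**2)

# Summing skips the NaN at the start of each play
path_length = toy.groupby('playId')['step'].sum()
print(path_length)
```

Because path length counts lateral and backward movement, a returner who reverses field accumulates far more movement than the net return yardage suggests.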

In [None]:
# Merge tables to create return difficulty table
return_difficulty = return_length.merge(close_defenders, on=['gameId', 'playId'], how='left')

return_difficulty = return_difficulty.merge(punt_tracking_during_return[['gameId', 'playId', 'returnTime']].drop_duplicates(),
                                            on=['gameId', 'playId'], how='left'
                                           )

In [None]:
# Add broken tackles to the return difficulty table
pff['brokenTackles'] = pff['missedTackler'].str.count(';') + 1

return_difficulty = return_difficulty.merge(pff[['gameId', 'playId', 'brokenTackles']],
                                            on=['gameId', 'playId'], how='left'
                                           )

return_difficulty['brokenTackles'] = return_difficulty['brokenTackles'].fillna(0)

In [None]:
# Convert return difficulty metrics to percentile ranking and calculate the average of all metrics
return_difficulty_perc = return_difficulty.copy()

for col in ['football_movement', 'defender_within_two_yds', 'returnTime', 'brokenTackles']:
    return_difficulty_perc[col] = return_difficulty_perc[col].rank(pct=True)

return_difficulty_perc['difficulty'] = 0.25 * (return_difficulty_perc['football_movement'] +
                                               return_difficulty_perc['defender_within_two_yds'] +
                                               return_difficulty_perc['returnTime'] +
                                               return_difficulty_perc['brokenTackles']
                                              )
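For intuition on how the percentile averaging behaves, the sketch below applies the same idea to two metrics on four hypothetical returns (the `returnId` values and numbers are invented). With the default `rank` settings, tied values share the average of their ranks, which matters for a metric like broken tackles where many returns are tied at zero.

```python
import pandas as pd

# Hypothetical returns with two of the four metrics
toy = pd.DataFrame({
    'returnId': ['r1', 'r2', 'r3', 'r4'],
    'football_movement': [20.0, 35.0, 50.0, 80.0],
    'brokenTackles': [0, 0, 1, 3],
})

metrics = ['football_movement', 'brokenTackles']

# Percentile rank each metric; ties receive the average of their ranks
pct = toy[metrics].rank(pct=True)

# Difficulty is the simple mean of the percentile ranks
toy['difficulty'] = pct.mean(axis=1)
print(toy)
```

Here `r1` and `r2` both get a broken-tackle percentile of 0.375, so their difficulty scores (0.3125 and 0.4375) differ only through football movement.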

In [None]:
# Merge game and player identification information to the return difficulty table for presentation
return_difficulty_perc = return_difficulty_perc.merge(games[['gameId', 'season', 'week', 'homeTeamAbbr', 'visitorTeamAbbr']],
                                                      on=['gameId'])

return_difficulty_perc = return_difficulty_perc.merge(punt_tracking[['gameId', 'playId', 'possessionTeam', 'returnerId']].drop_duplicates(),
                                                      on=['gameId', 'playId'])

return_difficulty_perc = return_difficulty_perc.merge(players[['nflId', 'displayName']], left_on=['returnerId'], right_on=['nflId'])

return_difficulty_perc['returnTeam'] = np.where(return_difficulty_perc['possessionTeam']==return_difficulty_perc['homeTeamAbbr'],
                                                return_difficulty_perc['visitorTeamAbbr'],
                                                return_difficulty_perc['homeTeamAbbr']
                                               )

return_difficulty_perc = return_difficulty_perc.merge(punt_returned[['gameId', 'playId', 'touchdown']], on=['gameId', 'playId'])

# Add touchdown column (yes/no) 
return_difficulty_perc['touchdown'] = np.where(return_difficulty_perc['touchdown'] > 0, 'Yes', 'No')

In [None]:
# Prepare return difficulty table for presentation
presentation_order = ['season', 'week', 'homeTeamAbbr', 'visitorTeamAbbr', 'possessionTeam', 'returnTeam',
                      'displayName', 'touchdown', 'difficulty', 'defender_within_two_yds', 'football_movement', 
                      'returnTime', 'brokenTackles']

new_names = ['Season', 'Week', 'Home', 'Away', 'Kick Team', 'Return Team', 'Returner', 'Touchdown', 'Difficulty', 'Defender Proximity',
             'Total Movement', 'Return Time', 'Broken Tackles']

return_difficulty_presentation = return_difficulty_perc[presentation_order].copy()

return_difficulty_presentation.columns = new_names

### Punt Return Touchdowns - Degree of Difficulty

In [None]:
# Display punt return TD's sorted by difficulty
cm = sns.color_palette('RdYlGn', as_cmap=True)
heatmap_cols = ['Difficulty', 'Defender Proximity', 'Total Movement', 'Return Time', 'Broken Tackles']

return_difficulty_presentation[return_difficulty_presentation['Touchdown']=='Yes'].sort_values('Difficulty', ascending=False)\
                                                                                  .style.hide(axis='index')\
                                                                                  .background_gradient(cmap=cm, subset=heatmap_cols)


### All Punt Returns - Degree of Difficulty

In [None]:
# Display all punt returns sorted by difficulty
return_difficulty_presentation.sort_values('Difficulty', ascending=False)\
                              .style.hide(axis='index')\
                              .background_gradient(cmap=cm, subset=heatmap_cols)


## Gunner Penetration and Vise Protection

#### Filter for Punt Plays of 40+ Yards with No Penalties

In [None]:
# Determine list of plays that meet criteria:  no penalties, 40+ yard punt
punt_play_subset = plays.copy()
punt_play_subset['penaltyCodes'] = punt_play_subset['penaltyCodes'].astype(str)

punt_play_subset = punt_play_subset[(punt_play_subset['kickLength'] >= 40) &
                                    (punt_play_subset['penaltyCodes'] == 'nan')]

# Filter gunner and vise tracking data for punt plays meeting the above criteria
punt_play_subset['game_play'] = punt_play_subset['gameId'].astype(str) + '-' + punt_play_subset['playId'].astype(str)
gunners_and_vises_tracking = punt_tracking[punt_tracking['rolePlayer'].isin(['gunner', 'vise'])].copy()
gunners_and_vises_tracking['game_play'] = gunners_and_vises_tracking['gameId'].astype(str) + '-' + gunners_and_vises_tracking['playId'].astype(str)

gunners_and_vises_tracking = gunners_and_vises_tracking[gunners_and_vises_tracking['game_play'].isin(punt_play_subset['game_play'])]

#### Match Each Gunner with a Vise and Combine Their Tracking Data to One Row Per Frame

In [None]:
# Filter for gunner and vise positions at the snap
gunners_and_vises_at_snap = gunners_and_vises_tracking[gunners_and_vises_tracking['frameVsSnap']==0].copy()

# Count number of gunners and vises per play and only keep plays with equal numbers of gunners and vises (one on one)
gunner_vise_count = gunners_and_vises_at_snap.groupby(['gameId', 'playId', 'rolePlayer']).size().unstack(fill_value=0).reset_index()
one_on_one_punts = gunner_vise_count[gunner_vise_count['gunner']==gunner_vise_count['vise']]

one_on_one_at_snap = gunners_and_vises_at_snap.merge(one_on_one_punts, 
                                                     on=['gameId', 'playId'],
                                                     how='right')[['gameId', 'playId', 'nflId', 'rolePlayer', 'x', 'y']]

In [None]:
# Separate gunners and vises and then concatenate horizontally, matching each gunner with the vise lined up against him
gunners_at_snap = one_on_one_at_snap[one_on_one_at_snap['rolePlayer']=='gunner'].sort_values(['gameId', 'playId', 'y']).reset_index(drop=True)
vises_at_snap = one_on_one_at_snap[one_on_one_at_snap['rolePlayer']=='vise'].sort_values(['gameId', 'playId', 'y']).reset_index(drop=True)

gunners_vs_vises = pd.concat([gunners_at_snap[['gameId', 'playId', 'nflId']], vises_at_snap[['nflId']]], axis=1).reset_index(drop=True)

gunners_vs_vises.columns = ['gameId', 'playId', 'gunnerId', 'viseId']
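The pairing above relies on both frames being sorted identically so that rows line up by position. As a hedged alternative, the sketch below makes the pairing key explicit: it ranks gunners and vises by `y` within each play and merges on that rank. The `slot` column and the toy IDs are hypothetical and not part of the notebook's data.

```python
import pandas as pd

# Hypothetical snap positions: two gunners and two vises on one play
toy = pd.DataFrame({
    'playId': [1, 1, 1, 1],
    'nflId': [11, 12, 21, 22],
    'rolePlayer': ['gunner', 'gunner', 'vise', 'vise'],
    'y': [45.0, 8.0, 9.5, 44.0],
})

# Rank players by sideline-to-sideline position within each play and role
toy['slot'] = toy.groupby(['playId', 'rolePlayer'])['y'].rank(method='first').astype(int)

gunners = (toy[toy['rolePlayer'] == 'gunner'][['playId', 'slot', 'nflId']]
           .rename(columns={'nflId': 'gunnerId'}))
vises = (toy[toy['rolePlayer'] == 'vise'][['playId', 'slot', 'nflId']]
         .rename(columns={'nflId': 'viseId'}))

# Gunner and vise with the same slot lined up across from each other
pairs = gunners.merge(vises, on=['playId', 'slot'])
print(pairs)
```

Merging on an explicit key fails loudly (missing rows) if a play has unequal gunner and vise counts, whereas positional concatenation would silently misalign them, which is why the one-on-one filter above matters.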

In [None]:
# Separate tracking data for gunners and vises and merge them, matching each gunner with the vise lined up against him
gunner_tracking = gunners_and_vises_tracking[gunners_and_vises_tracking['rolePlayer']=='gunner'][['gameId', 'playId', 'nflId', 'frameVsSnap', 'x', 'y']]
vise_tracking = gunners_and_vises_tracking[gunners_and_vises_tracking['rolePlayer']=='vise'][['gameId', 'playId', 'nflId', 'frameVsSnap', 'x', 'y']]

gunner_cols = ['gameId', 'playId', 'gunnerId', 'frameVsSnap', 'x_gunner', 'y_gunner']
vise_cols = ['gameId', 'playId', 'viseId', 'frameVsSnap', 'x_vise', 'y_vise']

gunner_tracking.columns = gunner_cols
vise_tracking.columns = vise_cols

gunner_vs_vise_tracking = gunner_tracking.merge(gunners_vs_vises, on=['gameId', 'playId', 'gunnerId'])

gunner_vs_vise_tracking = gunner_vs_vise_tracking.merge(vise_tracking, on=['gameId', 'playId', 'frameVsSnap', 'viseId'])

In [None]:
# Display the resulting dataframe, showing gunner and matching vise tracking data by frame
gunner_vs_vise_tracking.head()

#### Calculate Gunner Penetration by Frame (Absolute and Relative to Vise Position)

In [None]:
# Determine gunner and vise positions at the snap and add to tracking data
positions_at_snap = gunner_vs_vise_tracking[gunner_vs_vise_tracking['frameVsSnap']==0].copy()
positions_at_snap.columns = ['gameId', 'playId', 'gunnerId', 'frameVsSnap', 'x_gunner_snap', 'y_gunner_snap',
                             'viseId', 'x_vise_snap', 'y_vise_snap']

gunner_vs_vise_tracking = gunner_vs_vise_tracking.merge(positions_at_snap[['gameId', 'playId', 'gunnerId', 'x_gunner_snap', 
                                                                           'y_gunner_snap', 'x_vise_snap', 'y_vise_snap']],
                                                        on=['gameId', 'playId', 'gunnerId'])

In [None]:
# Calculate gunner penetration (yards downfield) in absolute terms and versus the vise position
gunner_vs_vise_tracking['gunner_penetration'] = gunner_vs_vise_tracking['x_gunner'] - gunner_vs_vise_tracking['x_gunner_snap']
gunner_vs_vise_tracking['gunner_penetration_vs_vise'] = gunner_vs_vise_tracking['x_gunner'] - gunner_vs_vise_tracking['x_vise']
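
To make the two metrics concrete, here is a small worked example on made-up coordinates. It assumes, as the calculation above implies, that `x` increases in the direction the coverage team is running:

```python
import pandas as pd

# Illustrative frames for one gunner/vise pair; x is assumed to increase downfield
frames = pd.DataFrame({
    'frameVsSnap':   [0, 30, 40],
    'x_gunner':      [30.0, 48.5, 56.0],
    'x_vise':        [31.0, 47.0, 53.5],
    'x_gunner_snap': [30.0, 30.0, 30.0],
})

# Absolute penetration: yards gained downfield since the snap
frames['gunner_penetration'] = frames['x_gunner'] - frames['x_gunner_snap']

# Relative penetration: how far the gunner is ahead of (positive) or behind (negative) the vise
frames['gunner_penetration_vs_vise'] = frames['x_gunner'] - frames['x_vise']

print(frames[['frameVsSnap', 'gunner_penetration', 'gunner_penetration_vs_vise']])
```

In this toy case the gunner starts one yard behind the vise at the snap and is 2.5 yards past him four seconds in.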

In [None]:
# Display the tracking dataframe again, now including the snap positions and penetration columns
gunner_vs_vise_tracking.head()

#### Rank Gunners and Vises by Penetration Metrics

In [None]:
def top_gunner_vise(df, players, metric, frame, cutoff, id_field, ascend=False):
    """
    Rank gunners or vises by their average `metric` at a given frame after the snap.
    Only players who meet the punt-play `cutoff` are returned.
    """
    
    # Calculate average of metric for given frame
    best_by_frame = df[df['frameVsSnap']==frame].groupby(id_field)[[metric]]\
                                                .mean()\
                                                .reset_index()\
                                                .sort_values(metric, ascending=ascend)
    
    # Determine number of qualifying punt plays for each gunner/vise
    punt_count = df[df['frameVsSnap']==frame].groupby(id_field)[[metric]].count().reset_index().sort_values(metric, ascending=ascend)

    # Merge punt play count to average by player and exclude players who do not meet the cutoff
    best_by_frame = best_by_frame.merge(punt_count, on=id_field).sort_values(metric + '_y', ascending=ascend)\
                                 .merge(players[['nflId', 'displayName']], left_on=id_field, right_on='nflId').drop(columns='nflId')
    
    best_by_frame.columns = [id_field, 'avg_' + metric, 'punt_plays', 'name']
    
    best_by_frame = best_by_frame[best_by_frame['punt_plays']>=cutoff][['name', id_field, 'avg_' + metric, 'punt_plays']]\
                        .sort_values('avg_' + metric, ascending=ascend)
    
    # Add ranking column and return dataframe with ranking
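    # method='min' gives tied players the same whole-number rank instead of truncating a fractional average rank
    best_by_frame['rank'] = best_by_frame['avg_' + metric].rank(method='min', ascending=ascend).astype(int)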
    
    return best_by_frame[['rank', id_field, 'name', 'avg_' + metric, 'punt_plays']].reset_index(drop=True)
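
The same average-count-filter-rank pattern can be written more compactly with named aggregation. The sketch below is a simplified stand-in, not the notebook's function: `rank_players` is a hypothetical name, it skips the player-name lookup, and it keeps players with at least `cutoff` plays, matching the "at least 15" wording in the introduction. The toy data is made up:

```python
import pandas as pd

def rank_players(df, metric, frame, cutoff, id_field, ascend=False):
    """Average `metric` at `frame` per player, keep players with enough plays, and rank them."""
    at_frame = df[df['frameVsSnap'] == frame]

    # One pass computes both the average and the number of plays per player
    summary = (at_frame.groupby(id_field)
                       .agg(avg_metric=(metric, 'mean'), punt_plays=(metric, 'count'))
                       .reset_index())

    summary = summary[summary['punt_plays'] >= cutoff].copy()
    summary['rank'] = summary['avg_metric'].rank(method='min', ascending=ascend).astype(int)
    return summary.sort_values('rank').reset_index(drop=True)

toy = pd.DataFrame({
    'gunnerId':           [101, 101, 102, 102, 103],
    'frameVsSnap':        [30, 30, 30, 30, 30],
    'gunner_penetration': [18.0, 20.0, 22.0, 24.0, 30.0],
})

ranked = rank_players(toy, 'gunner_penetration', frame=30, cutoff=2, id_field='gunnerId')
print(ranked)
```

Gunner 103 has the best single play but is dropped for having only one qualifying play, which is the purpose of the cutoff.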

In [None]:
# Establish number of punt plays required to be ranked
cutoff = 15

In [None]:
# Rank gunners by absolute and relative penetration 3, 4, and 5 seconds into the play (frames 30, 40, and 50 at 10 frames per second)
gunner_penetration_frame30 = top_gunner_vise(df=gunner_vs_vise_tracking, 
                                             players=players,
                                             metric='gunner_penetration', 
                                             frame=30, 
                                             cutoff=cutoff,
                                             id_field='gunnerId',
                                             ascend=False)

gunner_penetration_frame40 = top_gunner_vise(df=gunner_vs_vise_tracking, 
                                             players=players,
                                             metric='gunner_penetration', 
                                             frame=40, 
                                             cutoff=cutoff,
                                             id_field='gunnerId',
                                             ascend=False)

gunner_penetration_frame50 = top_gunner_vise(df=gunner_vs_vise_tracking, 
                                             players=players,
                                             metric='gunner_penetration', 
                                             frame=50, 
                                             cutoff=cutoff,
                                             id_field='gunnerId',
                                             ascend=False)

gunner_penetration_vs_vise_frame30 = top_gunner_vise(df=gunner_vs_vise_tracking, 
                                                     players=players,
                                                     metric='gunner_penetration_vs_vise', 
                                                     frame=30, 
                                                     cutoff=cutoff,
                                                     id_field='gunnerId',
                                                     ascend=False)

gunner_penetration_vs_vise_frame40 = top_gunner_vise(df=gunner_vs_vise_tracking, 
                                                     players=players,
                                                     metric='gunner_penetration_vs_vise', 
                                                     frame=40, 
                                                     cutoff=cutoff,
                                                     id_field='gunnerId',
                                                     ascend=False)

gunner_penetration_vs_vise_frame50 = top_gunner_vise(df=gunner_vs_vise_tracking, 
                                                     players=players,
                                                     metric='gunner_penetration_vs_vise', 
                                                     frame=50, 
                                                     cutoff=cutoff,
                                                     id_field='gunnerId',
                                                     ascend=False)

In [None]:
# Merge gunner rankings into a single dataframe and organize for presentation.
# Each 'rank' column is renamed before merging so repeated merges cannot produce
# clashing 'rank_x'/'rank_y' suffixes, which raise an error in recent pandas versions.
gunner_rank_tables = {'Penetration (3 Sec)': gunner_penetration_frame30,
                      'Penetration (4 Sec)': gunner_penetration_frame40,
                      'Penetration (5 Sec)': gunner_penetration_frame50,
                      'Vs Vise (3 Sec)': gunner_penetration_vs_vise_frame30,
                      'Vs Vise (4 Sec)': gunner_penetration_vs_vise_frame40,
                      'Vs Vise (5 Sec)': gunner_penetration_vs_vise_frame50}

gunner_merged = gunner_penetration_frame30[['gunnerId', 'name', 'punt_plays']]

for label, ranked in gunner_rank_tables.items():
    gunner_merged = gunner_merged.merge(ranked[['gunnerId', 'rank']].rename(columns={'rank': label}), on='gunnerId')

gunner_merged = gunner_merged.rename(columns={'name': 'Name', 'punt_plays': 'Punt Plays'})

In [None]:
# Rank vises by absolute and relative gunner penetration allowed 3, 4, and 5 seconds into the play (ascending: less penetration allowed ranks higher)
vise_penetration_frame30 = top_gunner_vise(df=gunner_vs_vise_tracking, 
                                           players=players,
                                           metric='gunner_penetration', 
                                           frame=30, 
                                           cutoff=cutoff,
                                           id_field='viseId',
                                           ascend=True)

vise_penetration_frame40 = top_gunner_vise(df=gunner_vs_vise_tracking, 
                                           players=players,
                                           metric='gunner_penetration', 
                                           frame=40, 
                                           cutoff=cutoff,
                                           id_field='viseId',
                                           ascend=True)

vise_penetration_frame50 = top_gunner_vise(df=gunner_vs_vise_tracking, 
                                           players=players,
                                           metric='gunner_penetration', 
                                           frame=50, 
                                           cutoff=cutoff,
                                           id_field='viseId',
                                           ascend=True)

vise_penetration_vs_vise_frame30 = top_gunner_vise(df=gunner_vs_vise_tracking, 
                                                   players=players,
                                                   metric='gunner_penetration_vs_vise', 
                                                   frame=30, 
                                                   cutoff=cutoff,
                                                   id_field='viseId',
                                                   ascend=True)

vise_penetration_vs_vise_frame40 = top_gunner_vise(df=gunner_vs_vise_tracking, 
                                                   players=players,
                                                   metric='gunner_penetration_vs_vise', 
                                                   frame=40, 
                                                   cutoff=cutoff,
                                                   id_field='viseId',
                                                   ascend=True)

vise_penetration_vs_vise_frame50 = top_gunner_vise(df=gunner_vs_vise_tracking, 
                                                   players=players,
                                                   metric='gunner_penetration_vs_vise', 
                                                   frame=50, 
                                                   cutoff=cutoff,
                                                   id_field='viseId',
                                                   ascend=True)

In [None]:
# Merge vise rankings into a single dataframe and organize for presentation.
# Each 'rank' column is renamed before merging so repeated merges cannot produce
# clashing 'rank_x'/'rank_y' suffixes, which raise an error in recent pandas versions.
vise_rank_tables = {'Penetration (3 Sec)': vise_penetration_frame30,
                    'Penetration (4 Sec)': vise_penetration_frame40,
                    'Penetration (5 Sec)': vise_penetration_frame50,
                    'Vs Vise (3 Sec)': vise_penetration_vs_vise_frame30,
                    'Vs Vise (4 Sec)': vise_penetration_vs_vise_frame40,
                    'Vs Vise (5 Sec)': vise_penetration_vs_vise_frame50}

vise_merged = vise_penetration_frame30[['viseId', 'name', 'punt_plays']]

for label, ranked in vise_rank_tables.items():
    vise_merged = vise_merged.merge(ranked[['viseId', 'rank']].rename(columns={'rank': label}), on='viseId')

vise_merged = vise_merged.rename(columns={'name': 'Name', 'punt_plays': 'Punt Plays'})

#### Gunner Rankings

In [None]:
# Add "Average Rank" column, sort, add heatmap, and present
gunner_merged['Average Rank'] = gunner_merged[['Penetration (3 Sec)', 'Penetration (4 Sec)', 'Penetration (5 Sec)',
                                               'Vs Vise (3 Sec)', 'Vs Vise (4 Sec)', 'Vs Vise (5 Sec)']].mean(axis=1)

cm = sns.color_palette('RdYlGn_r', as_cmap=True)

no_heatmap_cols = ['Name', 'Punt Plays']
heatmap_cols = ['Average Rank', 'Penetration (3 Sec)', 'Penetration (4 Sec)', 'Penetration (5 Sec)',
                'Vs Vise (3 Sec)', 'Vs Vise (4 Sec)', 'Vs Vise (5 Sec)']

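# Styler.hide(axis='index') replaces hide_index(), which was removed in pandas 2.0 (on pandas < 1.4, use hide_index() instead)
gunner_merged.sort_values('Average Rank')[no_heatmap_cols + heatmap_cols].style.hide(axis='index')\
                                                                               .background_gradient(cmap=cm,subset=heatmap_cols)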

#### Vise Rankings

In [None]:
# Add "Average Rank" column, sort, add heatmap, and present
vise_merged['Average Rank'] = vise_merged[['Penetration (3 Sec)', 'Penetration (4 Sec)', 'Penetration (5 Sec)',
                                             'Vs Vise (3 Sec)', 'Vs Vise (4 Sec)', 'Vs Vise (5 Sec)']].mean(axis=1)

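# Reuses cm, no_heatmap_cols, and heatmap_cols from the gunner rankings cell.
# Styler.hide(axis='index') replaces hide_index(), which was removed in pandas 2.0 (on pandas < 1.4, use hide_index() instead)
vise_merged.sort_values('Average Rank')[no_heatmap_cols + heatmap_cols].style.hide(axis='index')\
                                                                             .background_gradient(cmap=cm,subset=heatmap_cols)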
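
The introduction describes the final score as an average of percentile rankings, while the tables above average ordinal ranks (1 = best). The two give the same ordering within a single metric, but percentiles are easier to compare across groups of different sizes. A minimal sketch of a percentile version, using made-up averages and the `pct=True` option of `DataFrame.rank`, could look like this:

```python
import pandas as pd

# Illustrative average penetration (yards) for three gunners on two metrics
metrics = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Penetration (3 Sec)': [20.0, 18.0, 16.0],
    'Vs Vise (3 Sec)':     [2.5, 1.5, 0.5],
}).set_index('Name')

# pct=True scales ranks to (0, 1]; higher penetration gets a higher percentile
pct = metrics.rank(pct=True)

# Average the percentile ranks across metrics to get one score per player
metrics['Average Percentile'] = pct.mean(axis=1)

result = metrics.sort_values('Average Percentile', ascending=False)
print(result)
```

For vises, where less penetration allowed is better, the same sketch would need `ascending=False` in the `rank` call.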