## Introduction

In the National Football League, pinning a team deep in their own territory or flipping the field with a long, well-covered punt can swing the momentum of a game. The extra yardage from these punts can smother opposing offenses, help struggling defenses make important stops, and give your own offense the ball back in better field position. These game-changing punts combine several key components, such as protection and coverage schemes. An often-overlooked component is the punter's own performance in setting up these plays.

As the "quarterback" of the punt unit, the punter can read the return team's pre-snap alignment and decide where to place the ball to minimize the opposing team's field position. Ultimately, the quantity of interest is this resulting expected field position. The location of the punt relative to everyone else on the field determines the return team's action: fair catching the ball, attempting a return, or letting the ball bounce. That decision in turn determines where the ball ends up at the end of the punt play.

Importantly, the analysis below evaluates the punter using only the pre-snap information available to the punter and the target landing location of the football. The aim is to decouple the punter's performance from the play-specific performance of the other players. By modeling the return team's decision and the random bouncing of the football, each punt can be evaluated in terms of its resulting field position. That value can then be compared to the expected value for an optimal punt location.
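To make the evaluation idea concrete, here is a minimal sketch under stated assumptions. Each candidate landing spot is assumed to have outcome probabilities and a *hypothetical* expected final field position for each outcome. The expected field position is then the probability-weighted sum, and a punt is scored against the best candidate spot. The function names, the outcome ordering, and the per-outcome field positions are illustrative placeholders, not values or code from this analysis.

```python
import numpy as np


def expected_field_position(probs, outcome_positions):
    """Probability-weighted final field position for each candidate landing spot.

    probs: (n_spots, 3) outcome probabilities, e.g. (return, fair catch, not fielded);
        each row sums to 1.
    outcome_positions: (n_spots, 3) hypothetical expected final field position under
        each outcome, in yards from the punting team's own goal line.
    """
    probs = np.asarray(probs, dtype=float)
    outcome_positions = np.asarray(outcome_positions, dtype=float)
    return (probs * outcome_positions).sum(axis=1)


def value_vs_optimal(actual_idx, probs, outcome_positions):
    """Expected field position of the actual punt minus that of the best candidate.

    With field position measured from the punting team's goal line, larger is better
    for the punting team, so the result is <= 0 and equals 0 for an optimal choice.
    """
    efp = expected_field_position(probs, outcome_positions)
    return efp[actual_idx] - efp.max()


# Two hypothetical candidate spots; the punter chose spot 0
probs = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
positions = [[60, 70, 75], [65, 72, 78]]
print(round(value_vs_optimal(0, probs, positions), 1))  # prints -7.1
```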

# 1. Data Processing

### Where does the ball land?

The most consistent way to find the frame where the ball lands is to add the measured hang time to the tracking-data time of the punt frame. For plays where the 'punt' event frame cannot be found confidently, the punt frame can instead be estimated by adding the operation time to the time of the snap. One way to check that the landing frame has been found consistently is to compare the position of the ball in this frame to the recorded kick length plus the original line of scrimmage. When the ball is cleanly caught in the air, the two values agree closely, as shown by the dashed line in the plot below. However, the landing frame does not account for the distance the ball bounces, while the kick length measurement does. As a result, for balls that bounce backward the kick length plus line of scrimmage falls short of the landing location, while for balls that bounce forward it lies beyond it.
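The landing-frame rule above reduces to simple arithmetic on frame indices. Below is a minimal sketch, assuming the tracking data are sampled at 10 Hz (0.1 s per frame), as the `10*hangTime` conversions in the cell below imply. The helper name `landing_frame` is hypothetical. The rounding mirrors that cell: ceiling when counting from the punt frame, floor when counting from the snap frame.

```python
import math
from typing import Optional

FRAMES_PER_SECOND = 10  # tracking data are sampled at 10 Hz (0.1 s per frame)


def landing_frame(hang_time: float,
                  punt_frame: Optional[int] = None,
                  snap_frame: Optional[int] = None,
                  operation_time: Optional[float] = None) -> Optional[int]:
    """Estimate the frame in which a punt lands (hypothetical helper).

    Preferred route: punt frame + hang time, rounded up to a whole frame.
    Fallback when no 'punt' event is found: snap frame + operation time + hang time,
    rounded down.
    """
    if punt_frame is not None:
        return punt_frame + math.ceil(FRAMES_PER_SECOND * hang_time)
    if snap_frame is not None and operation_time is not None:
        return snap_frame + math.floor(FRAMES_PER_SECOND * (operation_time + hang_time))
    return None  # not enough information to place the landing frame


print(landing_frame(4.5, punt_frame=20))                      # prints 65
print(landing_frame(4.5, snap_frame=10, operation_time=2.0))  # prints 75
```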

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.integrate import nquad
import seaborn as sns

import os

dir_name = '/kaggle/input/nfl-big-data-bowl-2022/'

play_data = pd.read_csv(os.path.join(dir_name,'plays.csv'))
pff_data = pd.read_csv(os.path.join(dir_name,'PFFScoutingData.csv'))

discard_results = ['Non-Special Teams Result']
punt_plays = play_data[(play_data['specialTeamsPlayType']=='Punt')&(~play_data['specialTeamsResult'].isin(discard_results))]
pff_punt = pff_data.merge(punt_plays[['gameId','playId']])


# 2018 Tracking Data
tracking_game = pd.read_csv(os.path.join(dir_name,'tracking2018.csv'), chunksize=10**5)
football_tracking = pd.concat((x.query("displayName == 'football'") for x in tracking_game), ignore_index=True)
punt_fbtrack_18 = football_tracking.merge(punt_plays[['gameId','playId']])
punt_fbtrack_18['time'] = pd.to_datetime(punt_fbtrack_18['time'])

# 2019 Tracking Data
tracking_game = pd.read_csv(os.path.join(dir_name,'tracking2019.csv'), chunksize=10**5)
football_tracking = pd.concat((x.query("displayName == 'football'") for x in tracking_game), ignore_index=True)
punt_fbtrack_19 = football_tracking.merge(punt_plays[['gameId','playId']])
punt_fbtrack_19['time'] = pd.to_datetime(punt_fbtrack_19['time'])

# 2020 Tracking Data
tracking_game = pd.read_csv(os.path.join(dir_name,'tracking2020.csv'), chunksize=10**5)
football_tracking = pd.concat((x.query("displayName == 'football'") for x in tracking_game), ignore_index=True)
punt_fbtrack_20 = football_tracking.merge(punt_plays[['gameId','playId']])
punt_fbtrack_20['time'] = pd.to_datetime(punt_fbtrack_20['time'])

# Combine all above tracking data
punt_fbtrack = pd.concat([punt_fbtrack_18,punt_fbtrack_19,punt_fbtrack_20]) \
                .sort_values(by=['gameId','playId','frameId']) \
                .drop(columns=['o','dir','nflId','displayName','jerseyNumber','position','team']) \
                .reset_index(drop=True)

# Add relevant PFF data
punt_fbtrack = punt_fbtrack.merge(pff_punt[['gameId','playId','operationTime','hangTime','kickContactType']])



# Select OOB and DEZ events and find all frames where the ball is out of bounds or past the endzones
# Label these frames with a new event (if current event is None) to label that the ball has left the field of play.
outbound_plays = punt_fbtrack[punt_fbtrack['kickContactType'].isin(['OOB','DEZ'])].copy()
outbound_plays.loc[
    ((outbound_plays['x']<10)|(outbound_plays['x']>110) # Touchback
    |(outbound_plays['y']<0)|(outbound_plays['y']>160/3)) # Out of Bound
    &(outbound_plays['event']=='None'), # Do not replace other events
    'event'] = 'out_bound'

# Find the frame that the ball goes out of bounds, or is caught/lands inbounds
outbound_events = ['out_bound','out_of_bounds','punt_land','kick_received','punt_muffed','punt_received','fair_catch','touchback']
outbound_plays_final = outbound_plays[outbound_plays['event'].isin(outbound_events)].drop_duplicates(subset=['gameId','playId'])



# Add the PFF hangTime to the frame with a 'punt' or 'autoevent_kickoff' event in the tracking data.
# This gives the frame in which the ball is recorded to land.
# Most of the time this frame has no event listed, but it is very close to a number of possible events.
inbound_plays = punt_fbtrack.merge(outbound_plays_final[['gameId','playId']], how='left', indicator=True).query('_merge == "left_only"').drop(columns=['_merge'])

# Ball is not being tracked properly, so we cannot accurately find where the ball lands
ball_data_oob = inbound_plays[((inbound_plays['y']<0)|(inbound_plays['y']>160/3))&(inbound_plays['frameId']<30)].drop_duplicates(subset=['gameId','playId'])
inbound_plays = inbound_plays.merge(ball_data_oob[['gameId','playId']],how='left',indicator = True).query('_merge == "left_only"').drop(columns=['_merge'])

# Find punt frame and landing frame
punt_frames = inbound_plays[inbound_plays['event'].isin(['punt','autoevent_kickoff'])].drop_duplicates(subset=['gameId','playId'],keep='last').copy()
punt_frames['landFrame'] = punt_frames['frameId'] + np.ceil(10*punt_frames['hangTime'])

# For one play the 'punt' event frame is clearly off by a significant margin, so we exclude this play.
punt_frames.drop(punt_frames[(punt_frames['gameId']==2018112507)&(punt_frames['playId']==560)].index,inplace=True)

# Save the frames equal to the calculated landing frame
inbound_plays = inbound_plays.merge(punt_frames[['gameId','playId','landFrame']], how='left')
lf_from_ht = inbound_plays.query('frameId == landFrame')



# Find the remaining in-bound plays
# These plays are broken down into 3 sections
# PFF has no hang time data, there was no 'Punt' event frame found, or the calculated landing frame is after the tracking data ends.
ib_remain = inbound_plays.merge(lf_from_ht[['gameId','playId']],how='left', indicator=True).query('_merge == "left_only"').drop(columns=['_merge']).copy()
no_ht = ib_remain[ib_remain['hangTime'].isna()].copy()
no_lt = ib_remain[(~ib_remain['hangTime'].isna())&(ib_remain['landFrame'].isna())].iloc[:,:-2].copy()
no_frame = ib_remain[(~ib_remain['hangTime'].isna())&(~ib_remain['landFrame'].isna())].copy()

# No "punt" frame event can be solved by adding operation time to the mix
no_lt = no_lt.merge(pff_punt[['gameId','playId','operationTime']])
snap_frames = no_lt[no_lt['event']=='ball_snap'].copy()
snap_frames['landTime'] = snap_frames['frameId'] + np.floor(10*(no_lt['operationTime']+no_lt['hangTime']))
no_lt = no_lt.merge(snap_frames[['gameId','playId','landTime']])
no_lt_lf = no_lt.query('frameId == landTime')

# When the landing frame is not present in the tracking data, allow a tolerance of 3 or fewer frames.
# This is likely the same kind of timing error that comes from bunched frame times in the tracking data.
no_frame_last = no_frame.drop_duplicates(subset=['gameId','playId'],keep='last')
no_frame_last_keep = no_frame_last[no_frame_last['landFrame'] - no_frame_last['frameId']<=3]
no_frame_last_remove = no_frame_last[no_frame_last['landFrame'] - no_frame_last['frameId']>3]

# Missing hang time data usually means the punt was blocked, and these plays are generally discarded here.
# However, some of these punts were only deflected and still made it past the line of scrimmage.
# The bounce frames for the last two such plays were found manually by watching the replays:
# gameId = 2018102111, playId = 3651, ball bounces on frame 38.
# gameId = 2020100401, playId = 211,  ball bounces on frame 54.
no_ht_def_punts = no_ht.merge(punt_plays[['gameId','playId','specialTeamsResult']]).query('specialTeamsResult != "Blocked Punt"')
no_ht_lf = pd.concat([no_ht_def_punts[no_ht_def_punts['event'].isin(['fair_catch','punt_received','punt_land'])],
    no_ht_def_punts[(no_ht_def_punts['gameId']==2018102111)&(no_ht_def_punts['playId']==3651)&(no_ht_def_punts['frameId']==38)],
    no_ht_def_punts[(no_ht_def_punts['gameId']==2020100401)&(no_ht_def_punts['playId']==211)&(no_ht_def_punts['frameId']==54)]])
no_ht_lf = no_ht_lf.merge(no_ht_def_punts[no_ht_def_punts['event']=='punt'][['gameId','playId','frameId']].rename({'frameId':'puntFrame'},axis=1))
no_ht_lf['hangTime'] = (no_ht_lf['frameId'] - no_ht_lf['puntFrame'])/10  # Calculate time from punt frame to landing frame, this is definition of hangTime



# Collect all landing frame data
fb_landing = pd.concat([outbound_plays_final,lf_from_ht,no_lt_lf,no_frame_last_keep,no_ht_lf],ignore_index=True)[['gameId','playId','time','x','y','frameId','event','hangTime','playDirection']].rename({'x':'x_land','y':'y_land','frameId':'landFrame'},axis=1)


# Compare the calculated landing location to the play data kick length + starting location
compare_xloc = punt_plays[['gameId','playId','absoluteYardlineNumber','kickLength']].merge(fb_landing[['gameId','playId','x_land','playDirection']]).merge(pff_punt[['gameId','playId','kickContactType']])
compare_xloc.loc[compare_xloc['playDirection']=='left', 'x_land'] = 120 - compare_xloc.loc[compare_xloc['playDirection']=='left', 'x_land']
compare_xloc.loc[compare_xloc['playDirection']=='left', 'absoluteYardlineNumber'] = 120 - compare_xloc.loc[compare_xloc['playDirection']=='left', 'absoluteYardlineNumber']
compare_xloc['x_tot'] = compare_xloc['absoluteYardlineNumber'] + compare_xloc['kickLength']
compare_xloc = compare_xloc[compare_xloc['kickContactType'].isin(['BF','CC', 'BB'])]
compare_xloc['dx'] = compare_xloc['x_tot'] - compare_xloc['x_land']


lab_dict = {'BB':'Bounce Back', 'BF':'Bounce Forward', 'CC':'Clean Catch'}

fig,ax = plt.subplots(figsize=(8,6))
for label,group_df in compare_xloc.groupby('kickContactType'):
    ax.scatter(group_df['x_land'],group_df['x_tot'], label=lab_dict[label], alpha=0.5)
ax.legend(bbox_to_anchor=(.02, 1), loc='upper left')
ax.plot(np.linspace(0,120,2),np.linspace(0,120,2),'k--')
plt.xlabel('Landing Location')
plt.ylabel('Kick Length + Line of Scrimmage')
plt.ylim(35,110)
plt.xlim(20,120)
plt.show()

### Optimal 'League Average' Punting

Ultimately, the goal is to compare a punter's performance to what a "league-average" punter would be capable of. Much of this averaging is inherent in the modeling process below. One important aspect, however, is understanding how far a "league-average" punter can kick and how the hang time of each punt changes depending on how far the punt travels.

The difficulty is that the tracking data contain no information about the vertical position of the football. However, if the ball lands at roughly the height from which it was kicked, the kinematic equations relate two measurable quantities to the velocity of the ball immediately after the punt: the distance the ball travels from the punter to the landing coordinates, and the hang time. The relation is
\begin{equation}
v_\text{punt}^2 = \left(\frac{\Delta x}{T}\right)^2 + \left(\frac{g T}{2}\right)^2,
\end{equation}
where $\Delta x$ is the distance the ball travels, $T$ is the hang time, and $g = 32.1$ ft$/s^2$ is the gravitational acceleration (the code works in yards, with $g = 32.1/3$ yd$/s^2$). The first term is the squared horizontal speed. The second is the squared initial vertical speed, since a ball that returns to its launch height after time $T$ must leave the foot with vertical speed $gT/2$. For a "league-average" punter, we take the mean of the velocity distribution computed with this formula over all punts.
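As a worked example of this relation, the sketch below computes a launch speed from a horizontal distance and a hang time. It assumes a symmetric flight that lands at launch height. The horizontal component is therefore $\Delta x/T$ and the initial vertical component is $gT/2$, the same vertical term used by `calc_const` in the next cell. Units are yards and seconds with $g = 32.1/3$ yd/s², matching that cell. The function name `launch_speed` is hypothetical.

```python
import math

G_YDS = 32.1 / 3  # gravitational acceleration in yd/s^2 (the notebook's 32.1 ft/s^2)


def launch_speed(dx: float, hang_time: float, g: float = G_YDS) -> float:
    """Launch speed (yd/s) of a punt that travels dx yards with the given hang time (s).

    Assumes a symmetric flight that lands at launch height: the horizontal speed
    is dx / T and the initial vertical speed is g * T / 2.
    """
    v_horizontal = dx / hang_time
    v_vertical = g * hang_time / 2
    return math.hypot(v_horizontal, v_vertical)


# A 45-yard flight with a 4.5 s hang time
print(round(launch_speed(45.0, 4.5), 2))  # prints 26.07
```

Averaging this quantity over all punts gives the single "league-average" launch speed used in the rest of the analysis.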

In [None]:
###############################################################################################################
# Find the snap frame for each play.  This allows some players to move to a final resting spot before the snap
###############################################################################################################

fb_use = punt_fbtrack.merge(fb_landing[['gameId','playId']]).copy()
fb_use['dx'] = fb_use.groupby(['gameId','playId'])['x'].diff(-1)

# Generally speaking, the 'ball_snap' event is very reliable, so if it is present in the data, always use that
snaps = fb_use[fb_use['event']=='ball_snap']

# Otherwise, if the ball is already moving a significant distance (0.1 yards) in the first frame, then take the first frame as the snap frame
no_snap = fb_use[~(fb_use['gameId'].astype(str) + fb_use['playId'].astype(str)) \
                 .isin((snaps['gameId'].astype(str) + snaps['playId'].astype(str)).values)]
frame1_snap = no_snap[(no_snap['frameId']==1)&(np.abs(no_snap['dx'])>0.1)]

#  For the remaining plays, take the first frame where the ball moves a significant amount as the snap frame
remaining = fb_use.merge(no_snap[(no_snap['frameId']==1)&(np.abs(no_snap['dx'])<=0.1)][['gameId','playId']])
early_snap = remaining[np.abs(remaining['dx'])>0.1].groupby(['gameId','playId']).head(1)

snap_frames = pd.concat([snaps,frame1_snap,early_snap])[['gameId','playId','frameId']]



############################################################
# Pull the positional data for each player at the snap frame
############################################################

max_frame = max(snap_frames['frameId'])

# 2018 Tracking Data
tracking_game = pd.read_csv(os.path.join(dir_name,'tracking2018.csv'), chunksize=10**5)
football_tracking = pd.concat((x.query("frameId <= @max_frame") for x in tracking_game), ignore_index=True)
punt_fbtrack_18 = football_tracking.merge(snap_frames)
punt_fbtrack_18['time'] = pd.to_datetime(punt_fbtrack_18['time'])

# 2019 Tracking Data
tracking_game = pd.read_csv(os.path.join(dir_name,'tracking2019.csv'), chunksize=10**5)
football_tracking = pd.concat((x.query("frameId <= @max_frame") for x in tracking_game), ignore_index=True)
punt_fbtrack_19 = football_tracking.merge(snap_frames)
punt_fbtrack_19['time'] = pd.to_datetime(punt_fbtrack_19['time'])

# 2020 Tracking Data
tracking_game = pd.read_csv(os.path.join(dir_name,'tracking2020.csv'), chunksize=10**5)
football_tracking = pd.concat((x.query("frameId <= @max_frame") for x in tracking_game), ignore_index=True)
punt_fbtrack_20 = football_tracking.merge(snap_frames)
punt_fbtrack_20['time'] = pd.to_datetime(punt_fbtrack_20['time'])

# Combine all above tracking data
track_snap = pd.concat([punt_fbtrack_18,punt_fbtrack_19,punt_fbtrack_20]) \
                .sort_values(by=['gameId','playId','frameId']) \
                .drop(columns=['s','a','dis','event','frameId','displayName','jerseyNumber']) \
                .reset_index(drop=True)


# Flip tracking data so that all punts are 'right'-directional, i.e. the punt moves towards larger x-positions.

def flip_play_direction(df):
    df_flipped = df.copy()
    df_flipped.loc[df_flipped['playDirection']=='left', 'x'] = 120 - df_flipped.loc[df_flipped['playDirection']=='left', 'x']
    df_flipped.loc[df_flipped['playDirection']=='left', 'y'] = (160/3 - df_flipped.loc[df_flipped['playDirection']=='left', 'y']).round(2)
    
    # When orientation of players is included, flipping field is equal to a 180 degree rotation.
    # Mod by 360 degrees to keep values in the range [0,360)
    orientation_cols = [col for col in df_flipped.columns if col in ['o','dir']]
    for col in orientation_cols:
        df_flipped[col] = (180*(df_flipped['playDirection']=='left') + df_flipped[col])%360
    
    return df_flipped

fbland_f = flip_play_direction(fb_landing.rename({'x_land':'x', 'y_land':'y'},axis=1))
snap_track_f = flip_play_direction(track_snap)

fb_snap = snap_track_f[snap_track_f['team']=='football'].copy() # Football
player_snap = snap_track_f[snap_track_f['team'].isin(['home','away'])] # Players at snap frame

# Every play has a designated punter ('P' or 'K'); if more than one is listed, keep the deepest one (smallest x)
punter = player_snap[player_snap['position'].isin(['P','K'])].sort_values('x',ascending=False).groupby(['gameId','playId']).last().reset_index()


# Calculate punter's kick velocity
kick_length = punter[['gameId','playId','x','y','nflId']].rename({'x':'x_punt', 'y':'y_punt'},axis=1).merge(fbland_f[['gameId','playId','x','y','hangTime']].rename({'x':'x_land', 'y':'y_land'},axis=1))
kick_length['dx'] = kick_length['x_land'] - kick_length['x_punt']
kick_length['dy'] = kick_length['y_land'] - kick_length['y_punt']
kick_length['dist'] = np.sqrt(kick_length['dx']**2 + kick_length['dy']**2)

def calc_const(dist, ht):
    g = 32.1/3
    v_const = (dist/ht)**2 + (g*ht/2)**2
    return np.sqrt(v_const)

kick_length['v_punt'] = calc_const(kick_length['dist'],kick_length['hangTime'])
vc_avg = kick_length['v_punt'].describe()['mean']


def calc_time(del_x, v_const):
    g = 32.1/3
    p1 = 2/g**2*v_const**2
    p2 = 2/g**2*np.sqrt(v_const**4 - g**2 * del_x**2)
    return [np.sqrt(p1-p2),np.sqrt(p1+p2)]


fig, ax = plt.subplots(1,2, figsize=(12,6))
sns.histplot(data=kick_length, x='v_punt', stat='density', ax=ax[0])
ax[0].axvline(x=vc_avg, label=f'Mean = {vc_avg:.1f} yd/s',c='k', ls='--')
ax[0].legend()
ax[0].set_xlim(15,32.5)

dist_array = np.arange(0,3*vc_avg**2/32.1,0.1)
ht_m, ht_p = calc_time(dist_array, vc_avg)
ax[1].scatter(dist_array, ht_p, alpha=0.5)
ax[1].scatter(dist_array, ht_m, alpha=0.5)
ax[1].plot(np.array([0,70]), np.sqrt(2/(32.1/3)**2*vc_avg**2)*np.array([1,1]), 'k--')
ax[1].text(24, 4.45, 'High launch angle', fontsize=12)
ax[1].text(24, 2, 'Low launch angle', fontsize=12)
ax[1].text(42,0.1, f'velocity = {vc_avg:.1f} yd/s', fontsize=12)  # use the computed mean rather than a hard-coded value
ax[1].set_xlabel('Distance (yds)')
ax[1].set_ylabel('Hang Time (s)')
ax[1].set_xlim(0,70)
ax[1].set_ylim(0,5.2)
plt.show()

Using this kinematic relation for a given punt velocity and punter location, we can solve for the hang time needed to reach a landing spot a distance $\Delta x$ away. Any position on the field is a potential landing location as long as the square root below is real:
\begin{equation}
T^2 = \frac{2}{g^2} \left[v_\text{punt}^2 + (v_\text{punt}^4 - g^2 \Delta x^2)^{1/2} \right] \;\; \rightarrow \;\; \Delta x \leq \frac{v_\text{punt}^2}{g}.
\end{equation}
This hang time is always taken from the higher-launch-angle solution (the $+$ root), even when the real punt's measured hang time corresponds to the lower-angle solution.
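Inverting the relation gives the hang time that a punt with a fixed launch speed needs to reach a given distance. The sketch below mirrors `calc_time` from the cell above but returns only the high-launch-angle root. It returns `None` when $\Delta x > v_\text{punt}^2/g$, i.e. when the spot is out of range. The function names are hypothetical. Units are yards and seconds with $g = 32.1/3$ yd/s².

```python
import math
from typing import Optional

G_YDS = 32.1 / 3  # gravitational acceleration in yd/s^2


def max_range(v_punt: float, g: float = G_YDS) -> float:
    """Longest reachable distance (yd) for launch speed v_punt, i.e. v^2 / g."""
    return v_punt ** 2 / g


def hang_time_high(dx: float, v_punt: float, g: float = G_YDS) -> Optional[float]:
    """High-launch-angle hang time (s) to cover dx yards, or None if out of range.

    Solves v^2 = (dx/T)^2 + (g*T/2)^2 for T and keeps the larger root:
    T^2 = (2/g^2) * (v^2 + sqrt(v^4 - g^2 * dx^2)).
    """
    disc = v_punt ** 4 - (g * dx) ** 2
    if disc < 0:
        return None  # no launch angle reaches this distance
    return math.sqrt(2.0 / g ** 2 * (v_punt ** 2 + math.sqrt(disc)))


print(round(max_range(27.0), 1))  # prints 68.1
```

A punt kicked straight up ($\Delta x = 0$) gives the upper bound $T = 2v_\text{punt}/g$. At the maximum range the two roots meet at $T = \sqrt{2}\,v_\text{punt}/g$, which is the dashed line in the plot above.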

# 2. Fair Catch, Return, or Not Fielded?

To model the return team's decision on whether to field the football, a 2D convolutional neural network (CNN) is used. The general model structure closely follows the approach The Zoo team used in the 2020 Big Data Bowl to model the length of a rushing play (write-up: https://www.kaggle.com/c/nfl-big-data-bowl-2020/discussion/119400). The two spatial dimensions of the CNN input index the punt-team and return-team players, and each player pair carries several features encoding their relative positions. In The Zoo's work, the player of central interest was the running back, who acts as an attractor for all defensive players. For a punt, however, some players on the return team try to pressure the punter, while others move toward the landing position of the ball.
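To illustrate the idea of a CNN over player pairs, here is a NumPy forward pass in the spirit of The Zoo's design. It uses random placeholder weights, not the trained models from this work. The pipeline is:

1. Apply 1×1 convolutions, i.e. a shared dense layer, to every (return player, punt player) pair.
2. Pool over the punt-team axis.
3. Apply a second shared layer per return player.
4. Pool over the return-team axis.
5. Apply a softmax over the outcome classes.

The layer sizes and the equal mix of max and mean pooling are assumptions for illustration. The pooling steps make the output independent of the order in which players are listed.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def pairwise_cnn_forward(x, w1, w2, w_out):
    """Forward pass over a (n_return, n_punt, n_features) tensor for one play.

    w1: (n_features, h1), a 1x1 convolution shared by all player pairs
    w2: (h1, h2), a 1x1 convolution shared by all return players
    w_out: (h2, n_classes), the output layer
    """
    h = relu(x @ w1)                                # (n_return, n_punt, h1)
    h = 0.5 * h.max(axis=1) + 0.5 * h.mean(axis=1)  # pool over punt team -> (n_return, h1)
    h = relu(h @ w2)                                # (n_return, h2)
    h = 0.5 * h.max(axis=0) + 0.5 * h.mean(axis=0)  # pool over return team -> (h2,)
    logits = h @ w_out
    exp = np.exp(logits - logits.max())             # numerically stable softmax
    return exp / exp.sum()


n_features, h1, h2, n_classes = 9, 32, 16, 3
w1 = rng.normal(size=(n_features, h1))
w2 = rng.normal(size=(h1, h2))
w_out = rng.normal(size=(h2, n_classes))
play = rng.normal(size=(11, 10, n_features))  # 11 return x 10 punt players x 9 features
probs = pairwise_cnn_forward(play, w1, w2, w_out)
```

In a real model these weights would be learned, for example with Conv2D layers using a 1×1 kernel, but the shapes and pooling logic are the same.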

In [None]:
inbound_land = fb_landing[(fb_landing['x_land']<110)&(fb_landing['x_land']>10)&(fb_landing['y_land']>0)&(fb_landing['y_land']<160/3)].copy()
inbound_land = inbound_land.merge(punt_plays[['gameId','playId','specialTeamsResult']]).merge(pff_punt[['gameId','playId','kickContactType']]).sort_values(['gameId','playId'])


############################################################################
# Assign each punt to be either a bounce or a catch (For binary classifier)
############################################################################

# For the plays with the ball landing in-bounds, we need to categorize whether the receiving team catches (or attempts to catch) the punt or not.
# This can be done looking at the Contact Type recorded for each punt.  Some require more nuance.

# All punts of a particular contact type are bounces or catches
contact_to_land = {
    'BB':'Bounce',
    'BC':'Catch',
    'BF':'Bounce',
    'BOG':'Bounce',
    'CC':'Catch',
    'CFFG':'Bounce',
    'DEZ':'Bounce',
    'KTB':'Bounce',
    'KTC':'Bounce',
    'KTF':'Bounce',
}

# The remaining punts (ICC, MBC, MBDR, OOB, NaN) need to be looked at individually.
# OOB punts whose event is out_of_bounds should be treated as out-of-bounds plays, so they are removed below.
# For OOB and NaN events, the events are all filled and correspond to the following catch/bounce result:
event_to_land = {
    'kick_received':'Catch',
    'punt_received':'Catch',
    'punt_land':'Bounce',
    'fair_catch':'Catch',
    'fumble_defense_recovered':'Catch',
    'punt_muffed':'Catch',
    'fumble':'Catch' # For MBDR event
}

# Look at MBC punts:
# 2018112504 - 2312 -> 'Catch' (Found from the replay highlights)
# All others are 'Bounce'

# Look at ICC punts:
# specialTeamsResult = Fair Catch -> 'Catch'


# For the remaining plays, build an event table and classify each play by the event in column 2
# (typically the first event after the snap and the punt).
event_to_land_remain = {
    'punt_received':'Catch',
    'first_contact':'Catch',
    'punt_muffed':'Catch',
    'punt_downed':'Bounce',
    'punt_land':'Bounce',
    'fair_catch':'Catch',
    'kick_received':'Catch',
    'fumble':'Catch'
}


# Create an ordered list of events (excluding those in a given list) for each play
def event_table(df, exc_list):
    df_trim = df[(~df['event'].isin(exc_list))&(df['event']!='None')].copy()
    df_trim['event_num'] = df_trim.groupby(['gameId','playId']).cumcount().values
    return pd.pivot_table(df_trim, values='event', index=['gameId','playId'], columns='event_num', aggfunc=lambda x: ''.join(x))


# Remove plays that are called out of bounds, even if the landing frame places them in bounds.
# This is mostly for consistency: either the landing frame was found incorrectly, or the officials whistled the play dead, so the return team may have played differently.
inbound_land_remoob = inbound_land.drop(inbound_land[((inbound_land['kickContactType']=='OOB')|(inbound_land['kickContactType'].isna()))&(inbound_land['event']=='out_of_bounds')].index)

# Using the above dictionaries, assign Catch/Bounce labels to each punt (without needing event table)
inbound_land_remoob['Type'] = inbound_land_remoob['kickContactType'].map(contact_to_land)
inbound_land_remoob.loc[inbound_land_remoob['Type'].isna(),'Type'] = inbound_land_remoob[(inbound_land_remoob['kickContactType'].isin(['OOB','MBDR']))|(inbound_land_remoob['kickContactType'].isna())]['event'].map(event_to_land)
inbound_land_remoob.loc[inbound_land_remoob['kickContactType']=='MBC','Type'] = 'Bounce'
inbound_land_remoob.loc[(inbound_land_remoob['gameId']==2018112504)&(inbound_land_remoob['playId']==2312),'Type'] = 'Catch'
inbound_land_remoob.loc[(inbound_land_remoob['kickContactType']=='ICC')&(inbound_land_remoob['specialTeamsResult']=='Fair Catch'),'Type'] = 'Catch'

# Use event table and the last dictionary to assign Catch/Bounce, then merge the results to have the full list of punt results.
remaining_ibland = inbound_land_remoob[inbound_land_remoob['Type'].isna()]
full_event_punts_table = event_table(punt_fbtrack.merge(remaining_ibland[['gameId','playId']]),['line_set']).reset_index([0,1]).merge(remaining_ibland[['gameId','playId','specialTeamsResult','kickContactType']])
full_event_punts_table['Type_update'] = full_event_punts_table[2].map(event_to_land_remain)

inbound_land_remoob = inbound_land_remoob.merge(full_event_punts_table[['gameId','playId','Type_update']], how='left')
inbound_land_remoob.loc[inbound_land_remoob['Type'].isna(), 'Type'] = inbound_land_remoob.loc[inbound_land_remoob['Type'].isna(), 'Type_update']
inbound_land_remoob.drop(columns=['Type_update', 'specialTeamsResult','kickContactType','event'],inplace=True)

bounce_catch = inbound_land_remoob.sort_values(['gameId','playId'])

In [None]:
# Separate players based on whether they are on the same team as the punter or not
# Note, we remove the punter from the punt team right away.
temp = player_snap.merge(punter[['gameId','playId','nflId','team']],on=['gameId','playId'],suffixes=('','_punt'))
punt_team = temp[(temp['team']==temp['team_punt'])&(temp['nflId']!=temp['nflId_punt'])].copy()
rec_team = temp[temp['team']!=temp['team_punt']].copy()

# For each play, get a list of x and y positions for each player at the snap for both punt and return team
punt_team['num'] = punt_team.groupby(['gameId','playId']).cumcount()
punt_team_pos = pd.pivot_table(punt_team,values=['x','y'],index=['gameId','playId'],columns='num')
punt_team_pos.columns  = [f'x_p{i}' for i in range(10)] + [f'y_p{i}' for i in range(10)]
punt_team_pos = punt_team_pos[[punt_team_pos.columns[i] for i in np.array([[j, j+10] for j in range(10)]).reshape((20,))]].reset_index()

rec_team['num'] = rec_team.groupby(['gameId','playId']).cumcount()
rec_team_pos = pd.pivot_table(rec_team,values=['x','y'],index=['gameId','playId'],columns='num')
rec_team_pos.columns  = [f'x_r{i}' for i in range(11)] + [f'y_r{i}' for i in range(11)]
rec_team_pos = rec_team_pos[[rec_team_pos.columns[i] for i in np.array([[j, j+11] for j in range(11)]).reshape((22,))]].reset_index()

# Get location of football at the snap.  Note, we do not use the tracking data for the x-value of the football
# This is because sometimes the snap frame is slightly delayed, and we really want to know what the original line of scrimmage is
football_loc = fb_snap[['gameId','playId','x','y','playDirection']].merge(punt_plays[['gameId','playId','absoluteYardlineNumber']]) \
                    .rename({'absoluteYardlineNumber':'x_yl'},axis=1)
football_loc['x_yl'] += (football_loc['playDirection']=='left')*(120-2*football_loc['x_yl'])
fb_loc = football_loc[['gameId','playId','x_yl','y']].rename({'x_yl':'x_fb','y':'y_fb'},axis=1)



# Combine all the individual play data needed to train the fielding decision model
all_pos = punter[['gameId','playId','nflId','x','y']].rename({'nflId':'nflId_punt','x':'x_punt','y':'y_punt'},axis=1)\
    .merge(punt_team_pos).merge(rec_team_pos)\
    .merge(fb_loc).merge(fbland_f[['gameId','playId','x','y','hangTime']].rename({'x':'x_land','y':'y_land'},axis=1)) \
    .merge(bounce_catch[['gameId','playId','Type']]).merge(punt_plays[['gameId','playId','specialTeamsResult']])



# This final dataframe contains all the information used to train the CNNs.  The training itself is done in the separate notebooks linked above.

The nine features used in the model are simply the relative x and y positions of the players or locations of interest, plus the hang time of the punt. Note that the punter is excluded from the punt team. Each play therefore gives an 11 x 10 x 9 tensor (11 return-team players, 10 punt-team players, 9 features) with which to train.

Features:
* (Punt Team - Return Team) X, Y
* (Punter - Return Team) X, Y
* (Landing Location - Return Team) X, Y
* Pre-snap Football Location X, Y
* Hang time
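Below is a minimal sketch of assembling the 11 x 10 x 9 input for a single play from snap-frame coordinates. Axis 0 indexes the 11 return players, axis 1 the 10 non-punter punt-team players, and the last axis the nine features in the order listed above. The function name and argument layout are hypothetical. Features that do not depend on the punt-team player (punter and landing offsets, football location, hang time) are simply broadcast along axis 1.

```python
import numpy as np


def build_play_tensor(punt_xy, ret_xy, punter_xy, land_xy, fb_xy, hang_time):
    """Return an (n_return, n_punt, 9) feature tensor for one punt (hypothetical layout).

    punt_xy: (10, 2) punt-team positions, punter excluded
    ret_xy: (11, 2) return-team positions
    punter_xy, land_xy, fb_xy: (2,) punter, landing, and pre-snap football positions
    hang_time: hang time of the (proposed) punt in seconds
    """
    punt_xy = np.asarray(punt_xy, dtype=float)
    ret_xy = np.asarray(ret_xy, dtype=float)
    n_ret, n_punt = ret_xy.shape[0], punt_xy.shape[0]

    def along_punt_axis(a):
        # Repeat a per-return-player (n_ret, 2) array along the punt-team axis
        return np.broadcast_to(a[:, None, :], (n_ret, n_punt, 2))

    punt_minus_ret = punt_xy[None, :, :] - ret_xy[:, None, :]                        # [i, j] = punt[j] - ret[i]
    punter_minus_ret = along_punt_axis(np.asarray(punter_xy, dtype=float) - ret_xy)
    land_minus_ret = along_punt_axis(np.asarray(land_xy, dtype=float) - ret_xy)
    football = np.broadcast_to(np.asarray(fb_xy, dtype=float), (n_ret, n_punt, 2))
    hang = np.full((n_ret, n_punt, 1), float(hang_time))
    return np.concatenate(
        [punt_minus_ret, punter_minus_ret, land_minus_ret, football, hang], axis=-1)
```

For the heat map of candidate landing spots, only the landing offsets and hang time change between candidates, so the player-pair features can be computed once per play.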

With these features, two separate networks are trained. One is a binary classifier for whether the punt will be fielded. The other classifies the three outcomes of a punt being returned, fair caught, or not fielded. Their outputs are then combined in an ensemble average. Training the models this way is meant to sharpen the distinction between fielding and not fielding the punt. Explicitly, the binary model outputs the probability of the punt not being fielded. The three-class model outputs the probabilities of a fair catch, not fielded, and a return.

The outputs of these two models are combined into the following probabilities:
\begin{align}
\text{Not Fielded}&: P(\text{NF}) = \frac{1}{2}\left(P_{\text{binary}} + P_{\text{3-out}}(\text{NF})\right)\\
\text{Fair Catch}&: P(\text{FC}) = \left(1 - P(\text{NF})\right) \times \left(\frac{P_{\text{3-out}}(\text{FC})}{P_{\text{3-out}}(\text{FC}) + P_{\text{3-out}}(\text{R})}\right) \\
\text{Return}&: P(\text{R}) = \left(1 - P(\text{NF})\right) \times \left(\frac{P_{\text{3-out}}(\text{R})}{P_{\text{3-out}}(\text{FC}) + P_{\text{3-out}}(\text{R})}\right)
\end{align}
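The combination rule translates directly into code. Below is a minimal sketch, assuming the binary model returns $P(\text{NF})$ and the three-class model returns probabilities in the column order (FC, NF, R). The function name and input ordering are hypothetical. By construction the three outputs sum to one, provided the three-class model assigns nonzero probability to at least one fielding outcome.

```python
import numpy as np


def combine_ensemble(p_binary_nf, p3):
    """Combine binary and 3-class outputs into P(NF), P(FC), P(R).

    p_binary_nf: (n,) P(not fielded) from the binary model
    p3: (n, 3) 3-class probabilities with columns (FC, NF, R)
    """
    p_binary_nf = np.asarray(p_binary_nf, dtype=float)
    p3 = np.asarray(p3, dtype=float)
    p_fc3, p_nf3, p_r3 = p3[:, 0], p3[:, 1], p3[:, 2]

    p_nf = 0.5 * (p_binary_nf + p_nf3)          # ensemble average for "not fielded"
    fielded_share = p_fc3 + p_r3                # 3-class mass on fielding outcomes
    p_fc = (1 - p_nf) * p_fc3 / fielded_share   # split the fielded probability FC : R
    p_r = (1 - p_nf) * p_r3 / fielded_share
    return p_nf, p_fc, p_r


p_nf, p_fc, p_r = combine_ensemble([0.2], [[0.3, 0.4, 0.3]])
```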

The above ensemble can be used to predict the probability of each outcome at every location on the field, recomputing the hang time for each proposed punt. In the example plot below, the punt team is backed up deep in their own territory, with both gunners double-covered. The return team is trying to take advantage of its already good field position and gain even more yardage with a return.
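A candidate grid for a heat map like the one below might be built as in the following sketch. It takes every in-bounds point on a regular grid, computes its distance from the punter, and assigns the high-launch-angle hang time for a fixed launch speed, dropping points beyond $v_\text{punt}^2/g$. The grid spacing, function name, and column names are assumptions. The hang-time formula is the same one used in `calc_time` above.

```python
import numpy as np
import pandas as pd

G_YDS = 32.1 / 3  # gravitational acceleration in yd/s^2


def candidate_landing_grid(x_punt, y_punt, v_punt, step=1.0, g=G_YDS):
    """Return in-bounds, reachable landing spots with their high-angle hang times.

    Coordinates follow the flipped tracking data: the field of play is
    10 < x < 110 and 0 < y < 160/3 yards, with punts moving toward larger x.
    """
    xs = np.arange(10 + step, 110, step)
    ys = np.arange(step, 160 / 3, step)
    xx, yy = np.meshgrid(xs, ys)
    grid = pd.DataFrame({'x_land': xx.ravel(), 'y_land': yy.ravel()})

    dist = np.hypot(grid['x_land'] - x_punt, grid['y_land'] - y_punt)
    disc = v_punt ** 4 - (g * dist) ** 2       # negative -> out of range
    reachable = disc >= 0
    grid = grid[reachable].copy()
    grid['hangTime'] = np.sqrt(2 / g ** 2 * (v_punt ** 2 + np.sqrt(disc[reachable])))
    return grid.reset_index(drop=True)
```

This grid keeps reachable points behind the punter as well. Filtering to `x_land > x_punt` is a natural extra step if only downfield punts are of interest.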

In [None]:
g = 2020121313 # 2018100703
p = 2577 # 4645
fbloc_ensprob = pd.read_csv(f'/kaggle/input/exampleensemble/ExampleEnsembleProb_{g}_{p}.csv')
ens_pos_ex1 = all_pos[(all_pos['gameId']==g)&(all_pos['playId']==p)]

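# Select columns by name rather than by position
fb_x = ens_pos_ex1[['x_fb','x_land']]  # pre-snap football and landing x
fb_y = ens_pos_ex1[['y_fb','y_land']]  # pre-snap football and landing y

punter_loc = ens_pos_ex1[['x_punt','y_punt']]

punt_team_x = ens_pos_ex1[[f'x_p{i}' for i in range(10)]]
punt_team_y = ens_pos_ex1[[f'y_p{i}' for i in range(10)]]

ret_team_x = ens_pos_ex1[[f'x_r{i}' for i in range(11)]]
ret_team_y = ens_pos_ex1[[f'y_r{i}' for i in range(11)]]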


# Hash mark variables
hash_up_mid = 160/3 - 70.75/3
hash_down_mid = 70.75/3
hash_length = 2/3
x_hash_lines = np.arange(11,110,1)
y_hash_lines_up = [hash_up_mid-hash_length/2, hash_up_mid+hash_length/2]
y_hash_lines_down = [hash_down_mid-hash_length/2, hash_down_mid+hash_length/2]

gs_kw = dict(width_ratios=[8, 1], height_ratios=[8, 1])
fig, ax = plt.subplot_mosaic([['left', 'ul'],['left','ll']], gridspec_kw=gs_kw, figsize=(15, 6))


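# Plot model output: color each candidate landing spot by its outcome probabilities as RGB
# (red = Return, green = Not Fielded, blue = Fair Catch)
ax['left'].scatter(fbloc_ensprob['x_land'],fbloc_ensprob['y_land'],c=fbloc_ensprob[['R','NF','FC']].values)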

# Draw hash marks and 5-yard intervals
hash_alpha = 0.2
ax['left'].vlines(x=x_hash_lines,ymin=y_hash_lines_up[0],ymax=y_hash_lines_up[1],colors='k',alpha=hash_alpha) # Upper hash
ax['left'].vlines(x=x_hash_lines,ymin=y_hash_lines_down[0],ymax=y_hash_lines_down[1],colors='k',alpha=hash_alpha) # Lower hash
ax['left'].vlines(x=np.arange(15,110,5),ymin=0,ymax=54,colors='k',alpha=hash_alpha) # 5-yard intervals
ax['left'].vlines(x=[12,108],ymin=80/3-0.5,ymax=80/3+0.5,colors='k',alpha=hash_alpha) # 2-point conversion
ax['left'].vlines(x=[10,110],ymin=0,ymax=55,colors='k') # Endzones
#plt.vlines(x=yardline,ymin=0,ymax=55,colors='k',linestyles='dashed') # Line of Scrimmage

# Add yard-marker numbers along both sidelines (the upper row is rotated 180 degrees).
# rotation must be numeric: newer matplotlib versions reject the string '180'.
yard_labels = ['10','20','30','40','50','40','30','20','10']
upper_offsets = [1, 1, 0.5, 0.5, 0.8, 0.5, 0.5, 1, 1]  # small x-shifts for the rotated labels
for x_label, label, offset_x in zip(range(20, 110, 10), yard_labels, upper_offsets):
    ax['left'].text(x_label, 12, label, horizontalalignment='center', verticalalignment='top',
                    fontsize=30, alpha=hash_alpha)
    ax['left'].text(x_label - offset_x, 160/3 - 12, label, horizontalalignment='center',
                    verticalalignment='bottom', fontsize=30, alpha=hash_alpha, rotation=180)

# Plot each team's players
ax['left'].scatter(pd.concat([punt_team_x,punter_loc['x_punt']],axis=1).iloc[0],pd.concat([punt_team_y,punter_loc['y_punt']],axis=1).iloc[0],c='k',marker='x',s=50)
ax['left'].scatter(ret_team_x.iloc[0],ret_team_y.iloc[0],s=50,facecolors='none',edgecolors='k')

ax['left'].set_xlim(0,120)
ax['left'].set_ylim(0,160/3)
ax['left'].set_aspect(1)



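# Ternary color legend: each (P(R), P(NF), P(FC)) triple is drawn as an RGB color inside a triangle
test = np.arange(0,1,0.002)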
test = pd.DataFrame([[x,y] for x in test for y in test],columns=['r','g'])
test['b'] = 1 - test['r'] - test['g']
test = test[test['b']>=0].round(2)

basis = np.array([[0.0, 1.0], [-1.5/np.sqrt(3), -0.5],[1.5/np.sqrt(3), -0.5]])
xy = np.dot(test.values,basis)
test['x'] = xy[:,0]
test['y'] = xy[:,1]

offset = 0.5
fontsize = 12

ax['ul'].scatter(test['x'],test['y'], c=test[['r','g','b']].values)
ax['ul'].text(basis[0,0]*(1+offset), basis[0,1]*(0.8+offset), 'P(R)', horizontalalignment='center',
            verticalalignment='center', fontsize=fontsize)
ax['ul'].text(basis[1,0]*(1+offset), basis[1,1]*(1+offset), 'P(NF)', horizontalalignment='center',
            verticalalignment='center', fontsize=fontsize)
ax['ul'].text(basis[2,0]*(1+offset), basis[2,1]*(1+offset), 'P(FC)', horizontalalignment='center',
            verticalalignment='center', fontsize=fontsize)

ax['ll'].text(0.4, 0.5, f'Game ID: {g}', transform=ax['ll'].transAxes,ha="center", va="center", fontsize=14, color="k")
ax['ll'].text(0.4, 0.0, f'Play ID: {p}', transform=ax['ll'].transAxes,ha="center", va="center", fontsize=14, color="k")

ax['ul'].set_frame_on(False)
ax['ul'].set_xticks(())
ax['ul'].set_yticks(())
ax['ul'].set_aspect(1)
ax['ll'].set_frame_on(False)
ax['ll'].set_xticks(())
ax['ll'].set_yticks(())
plt.show()

# 3. Return Length

The model for return length is similar in structure to the previous model, using a 2D CNN over the punt and return team player locations.  One significant difference, however, is that the pre-snap location of the ball is removed from the features.  Intuitively, assuming a punt is caught and returned, only the relative positions of the players should have a significant effect on the return length, except for returns that end in a touchdown or a safety.

Furthermore, no additional feature indicates which player is the returner.  The main reason is that the returner cannot always be known a priori.  A significant number of plays have multiple members of the return team deep in the field, and which of them returns the ball may depend on where the ball is placed.  Similarly, the return team occasionally uses laterals and longer throws.  All of these add complexity to the return length given only pre-snap information, and many of them may be revealed in the formation.  Singling out a specific returner could miss these features.

As mentioned before, any model using only the limited information available to the punter pre-snap is bound to have significant loss.  Information such as blown lane assignments by the punt team can develop early in the play and indicate a potentially large return.  At the same time, these mistakes are not (entirely) the fault of the punter and should not be attributed to him.  Instead, the punt should be evaluated assuming minimal mistakes, such as no missed tackles or muffed punts.  While some punts are certainly more difficult to catch than others, leading to more mistakes by the returner, such considerations are beyond the scope of the present work.

In [None]:
outcome_land_data = fb_landing[['gameId','playId','landFrame']].merge(bounce_catch[['gameId','playId','Type']]).merge(play_data[['gameId','playId','specialTeamsResult']]).copy()

# Find punts that were returned, even if they bounced first
bounce_nf = ['Downed','Out of Bounds','Touchback']
punt_return = ['Return','Out of Bounds','Blocked Punt','Muffed']
outcome_land_data['Label'] = outcome_land_data['Type']
outcome_land_data.loc[(outcome_land_data['Type']=='Bounce')&(~outcome_land_data['specialTeamsResult'].isin(bounce_nf)),'Label'] = 'Catch'
outcome_land_data.loc[(outcome_land_data['Label']=='Catch')&(outcome_land_data['specialTeamsResult'].isin(punt_return)),'Label'] = 'Return'
returnable = outcome_land_data[(outcome_land_data['Label']=='Return')&(~outcome_land_data['specialTeamsResult'].isin(['Muffed','Downed','Fair Catch']))].drop(columns=['Label'])

# Look at tracking data from frames after the ball has landed
after_land = punt_fbtrack.merge(returnable).query('frameId >= landFrame')
frames_wevents = after_land.query('event != "None"')
excl_events=['punt_received','punt_land','kick_received','fair_catch','penalty_flag','lateral','pass_outcome_caught']
frames_wevents = frames_wevents[~frames_wevents['event'].isin(excl_events)].copy()  # copy so adding 'count' does not write to a view
frames_wevents['count'] = frames_wevents.groupby(['gameId','playId']).cumcount()

# After excluding these events, the first remaining event after the landing frame is nearly always first contact.
# The two fumble-recovery plays are handled as follows:
# 2018 game -> use the first_contact frame
# 2019 game -> use the recovery frame
# Note there is also an infamous touchback/safety/return punt caused by an unsportsmanlike conduct penalty, which we remove
return_values = frames_wevents.loc[(frames_wevents['count']==0)&(frames_wevents['event']!='touchback'),['gameId','playId','frameId','x','y','event']].copy()
return_values.loc[return_values['event']=='fumble_defense_recovered',['frameId','x','y']] = \
       frames_wevents.merge(frames_wevents[(frames_wevents['event']=='fumble_defense_recovered')
                                         &(frames_wevents['count']==0)][['gameId','playId']]).query('event=="first_contact"')[['frameId','x','y']].values

# Given these first contact frames, calculate the yards gained from the landing location to first contact
return_data = return_values.merge(after_land[['gameId','playId','playDirection']].drop_duplicates()).reset_index(drop=True).sort_values(['gameId','playId'])
return_dist = flip_play_direction(return_data).merge(fbland_f[['gameId','playId','x','y']], on=['gameId','playId'],suffixes=['','_land']).merge(punt_plays[['gameId','playId','kickReturnYardage']]).copy()
return_dist['YardsToContact'] = return_dist['x_land'] - return_dist['x']

sns.displot(data=return_dist, x='YardsToContact', y='kickReturnYardage')
plt.xlim(-20,100)
plt.ylim(-20,100)
plt.show()

To this end, assuming minimal mistakes by the punt team, the return length is replaced by the yards gained from the landing location to first contact.  Only on rare occasions is there a significant deviation between the yards to first contact and the total return yards, and such large deviations are generally due to a penalty, a missed tackle, or blown coverage.
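The rare large deviations can be screened with a short filter. This is a minimal sketch, assuming a DataFrame with the `YardsToContact` and `kickReturnYardage` columns built in the cell above; the helper name `flag_large_deviations` and the 10-yard threshold are hypothetical choices, not values used in this analysis.

```python
import pandas as pd


def flag_large_deviations(df: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
    """Return the plays whose total return yardage differs from the yards to first contact by more than `threshold`."""
    deviation = (df['kickReturnYardage'] - df['YardsToContact']).abs()
    # Keep the deviation as a column so flagged plays can be inspected (e.g. for penalties or missed tackles)
    return df.assign(deviation=deviation)[deviation > threshold]


# Hypothetical toy data: the second play gains far more than its yards to first contact
example = pd.DataFrame({
    'gameId': [1, 1, 2],
    'playId': [10, 20, 30],
    'YardsToContact': [5.2, 8.0, 3.1],
    'kickReturnYardage': [6, 45, 2],
})
print(flag_large_deviations(example)[['gameId', 'playId', 'deviation']])
```

The flagged plays are the candidates for manual review; the threshold trades off how many ordinary returns get flagged against how many blown-coverage plays are missed.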


# 4. Bouncing Statistics

In much the same way that a coin flip is considered random, the shape of the football allows its bounces to be treated as largely random.  Nevertheless, there is an element of skill in how the ball bounces.  In particular, punting the ball so that the gunners are able to stop unfavorable bounces allows many punts near the endzone to be downed inside the five-yard line.  Furthermore, the spin imparted to the ball, which depends on the kicker's handedness, can significantly bias the direction and magnitude of the bounce.

The model assumes that the x and y distances of the bounce are drawn from a bivariate normal distribution whose means and covariance matrix elements are all functions of features of the punt,
\begin{equation}
X \sim \mathcal{N}(\mu(\text{features}), \Sigma(\text{features})).
\end{equation}
A punt travelling through the air already carries forward momentum, so the bounce should preferentially favor that direction.  (Similarly, the spin orientation of the ball will bias the bounce in the perpendicular direction, but this data is not available.)  It is therefore natural to reduce the covariance between the two directions as much as possible by working in coordinates aligned with the punt direction and its perpendicular.
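The change of coordinates can be sketched in NumPy. It mirrors the `cos_air_bounce` and `sin_air_bounce` columns computed in the next cell, which hold the bounce components along and across the flight path in yards. The function names are hypothetical, and the sketch assumes the punt has a nonzero in-field air distance. The inverse rotation is the one needed to turn predicted means in punt coordinates back into field coordinates.

```python
import numpy as np


def to_punt_frame(dx_air, dy_air, dx_bounce, dy_bounce):
    """Split a bounce displacement into components along and perpendicular to the punt's flight direction."""
    dist_air = np.hypot(dx_air, dy_air)
    ux, uy = dx_air / dist_air, dy_air / dist_air     # unit vector of the flight in the x-y plane
    parallel = ux * dx_bounce + uy * dy_bounce         # forward (+) or backward (-) bounce
    perpendicular = -uy * dx_bounce + ux * dy_bounce   # sideways bounce, positive to the left of the flight path
    return parallel, perpendicular


def from_punt_frame(parallel, perpendicular, dx_air, dy_air):
    """Rotate punt-frame components back into on-field x-y displacements."""
    dist_air = np.hypot(dx_air, dy_air)
    ux, uy = dx_air / dist_air, dy_air / dist_air
    return parallel * ux - perpendicular * uy, parallel * uy + perpendicular * ux


# A punt flying 30 yards downfield and 40 yards across, followed by a bounce of (3, 4) yards
par, perp = to_punt_frame(30.0, 40.0, 3.0, 4.0)
print(par, perp)  # approximately 5.0 and 0.0: the bounce continues straight along the flight path
```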

In [None]:
bounce_punts = bounce_catch.query('Type == "Bounce"')
frames_bounce = punt_fbtrack.merge(fb_landing[['gameId', 'playId','landFrame']]).query('frameId >= landFrame').merge(bounce_punts[['gameId','playId']])

bounce_play_data = punt_plays[['gameId','playId','kickLength','penaltyYards','specialTeamsResult']].merge(bounce_punts[['gameId','playId']])

# Frames where there is an easy to find event indicating some interaction with the ball after the bounce
ground_event_list = ['punt_downed','out_of_bounds','touchback','punt_received','punt_muffed','fair_catch','kick_received']
final_event = frames_bounce[frames_bounce['event'].isin(ground_event_list)].drop_duplicates(subset=['gameId','playId'])
remain = frames_bounce.merge(final_event[['gameId','playId']],indicator=True,how='outer').query('_merge == "left_only"').drop(columns=['_merge'])

result_to_event = {'Downed':'punt_downed','Fair Catch':'fair_catch','Muffed':'punt_muffed','Out of Bounds':'out_of_bounds','Touchback':'touchback','Return':'Return'}
fe_merged = final_event.merge(bounce_play_data)
fe_merged['event_check'] = fe_merged['specialTeamsResult'].map(result_to_event)


# (5) Result = Muffed > Event = out_of_bounds ------ See below
# gameId = 2018092000,playId = 801 : event = fumble_defense_recovered
# gameId = 2018093005,playId = 3731 : frame = 88
# gameId = 2018102104,playId = 789 : ?
# gameId = 2018122309,playId = 3195 : ?
# gameId = 2020101112,playId = 1389 : ?
# Drop these events entirely.  Generally the ball does not bounce for long before being muffed

# (1) Result = Downed > Event = fair_catch --------- 'Fake' fair catch -> skip to downed event
# (5) Result = Downed > Event = punt_received ------ 2019: 'Fake' received (3) -> skip to downed event. 2020: Correct values (2)

# All of the rest of the mismatched events are correct.
correct = fe_merged.query('(event_check == event)|(event_check == "Return")')
check_events = fe_merged.query('(event_check != event)&(event_check != "Return")')

dropped_events = check_events.query('(specialTeamsResult == "Muffed")&(event == "out_of_bounds")')
reeval_events = check_events[(check_events['specialTeamsResult']=='Downed')&((check_events['event']=="fair_catch")|((check_events['event']=="punt_received")&(check_events['gameId']//10**6 == 2019)))]

checked = check_events.merge(dropped_events,how='outer',indicator=True).query('_merge == "left_only"').drop(columns=['_merge']) \
    .merge(reeval_events,how='outer',indicator=True).query('_merge == "left_only"').drop(columns=['_merge'])
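# The re-evaluated plays were Downed, so their frames live in frames_bounce (after_land only holds returned punts)
correct = pd.concat([correct,checked,frames_bounce.merge(reeval_events[['gameId','playId']]).query('event == "punt_downed"')])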

end_bounce = correct[['gameId','playId','x','y','frameId']].copy()
fb_bounce_data = end_bounce.merge(fb_landing,on=['gameId','playId'])
fb_bounce_data = flip_play_direction(fb_bounce_data).rename({'x':'x_final', 'y':'y_final'},axis=1)[['gameId','playId','x_land','y_land','x_final','y_final','landFrame','frameId','hangTime','playDirection']]

# Gather data to find the bounce length in each direction
fb_locations = fb_loc.merge(punter[['gameId','playId','x','y']]).merge(fb_bounce_data).rename({'x':'x_punt', 'y':'y_punt'}, axis=1).copy()
fb_locations['dx_air'] = fb_locations['x_land']-fb_locations['x_punt']
fb_locations['dy_air'] = fb_locations['y_land']-fb_locations['y_punt']
fb_locations['dist_air'] = np.sqrt(fb_locations['dx_air']**2 + fb_locations['dy_air']**2).round(2)
fb_locations['v_c'] = calc_const(fb_locations['dist_air'],fb_locations['hangTime'])
fb_locations['v_field_ratio'] = fb_locations['dist_air']/fb_locations['hangTime'] / fb_locations['v_c']

fb_locations['dx_bounce'] = fb_locations['x_final']-fb_locations['x_land']
fb_locations['dy_bounce'] = fb_locations['y_final']-fb_locations['y_land']
fb_locations['dist_bounce'] = np.sqrt(fb_locations['dx_bounce']**2 + fb_locations['dy_bounce']**2).round(2)

# Components of the bounce along the punt's air direction and its perpendicular axis (a rotation from x-y into punt coordinates).
# Despite the column names, these are signed distances in yards; dividing by dist_bounce would give the actual cosine and sine.
fb_locations['cos_air_bounce'] = (fb_locations['dx_air']*fb_locations['dx_bounce'] + fb_locations['dy_air']*fb_locations['dy_bounce'])/fb_locations['dist_air']
fb_locations['sin_air_bounce'] = (-fb_locations['dy_air']*fb_locations['dx_bounce'] + fb_locations['dx_air']*fb_locations['dy_bounce'])/fb_locations['dist_air']

fb_locations = fb_locations.fillna(0).drop(columns=['x_land','y_land','x_final','y_final','landFrame','frameId','playDirection'])

This model is implemented as a Mixture Density Network.  The network outputs the mean and standard deviation in each direction, along with the correlation between them, and is trained to minimize the negative log-likelihood.  The features for the model are:
* Original line of scrimmage
* Y-distance travelled in the air (punter_y - land_y)
* Hang time
* Magnitude of the (3d) velocity of the punt, as found in Section 1 Eq 1
* Ratio of in-field velocity to total velocity

Physically, the final feature differentiates a line-drive type kick, where one would expect a lot of that velocity to carry through the bounce, from a high kick, where there will be less of a bias in one particular direction.
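The training objective can be sketched in NumPy. This is a minimal illustration, not the network used here. It assumes a single Gaussian component, exponentials to keep the standard deviations positive, and a `tanh` to keep the correlation in (-1, 1). These are common Mixture Density Network choices, but the source does not confirm them, and the function names are hypothetical. For actual training, the same expression would be written in the deep-learning framework so it can be differentiated.

```python
import numpy as np


def bivariate_normal_logpdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Log-density of a bivariate normal with means mu, standard deviations sigma and correlation rho."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    one_minus_r2 = 1.0 - rho**2
    quad = (zx**2 - 2.0 * rho * zx * zy + zy**2) / one_minus_r2
    return -np.log(2.0 * np.pi * sigma_x * sigma_y * np.sqrt(one_minus_r2)) - 0.5 * quad


def params_from_raw(raw):
    """Map 5 unconstrained outputs per sample to valid parameters (assumed parametrization)."""
    raw = np.asarray(raw, dtype=float)
    mu_x, mu_y = raw[:, 0], raw[:, 1]
    sigma_x, sigma_y = np.exp(raw[:, 2]), np.exp(raw[:, 3])  # standard deviations must be positive
    rho = np.tanh(raw[:, 4])                                 # correlation must lie in (-1, 1)
    return mu_x, mu_y, sigma_x, sigma_y, rho


def mdn_loss(raw, x, y):
    """Mean negative log-likelihood of the observed bounces (x, y) under the predicted distributions."""
    return -np.mean(bivariate_normal_logpdf(x, y, *params_from_raw(raw)))


# Raw outputs of zero give a standard normal; a bounce exactly at the mean costs log(2*pi)
print(mdn_loss(np.zeros((1, 5)), np.array([0.0]), np.array([0.0])))
```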

# 5. Optimal Punt Location

Given the models above, the goal is to find the landing coordinates for the football that maximize the expected x-position after averaging over the returner's decisions.  The value of each landing position is given by the expected x-position:
\begin{align}
\text{Value} &= x_\text{land} - R_\text{len} P(R) + P(NF)E(X|NF)\\
E(X|NF) &= P(\text{Touchback})(90 - x_\text{land}) + (1-P(\text{Touchback}))E(X|\neg\text{Touchback}), \nonumber
\end{align}
where $P(R)$ and $P(NF)$ come from the model in Section 2, $R_\text{len}$ is the output of the model in Section 3, and $P(\text{Touchback})$ and $E(X|\neg\text{Touchback})$ are calculated from the bouncing statistics in Section 4.  A fair catch contributes no extra term, since the ball is spotted at $x_\text{land}$.
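The value formula translates directly into a small function. This is a sketch with hypothetical argument names. It takes $E(X|\neg\text{Touchback})$ as written in the equation, whereas the notebook's `ex_ntb` column stores the product $(1-P(\text{Touchback}))E(X|\neg\text{Touchback})$.

```python
def field_value(x_land, p_return, return_len, p_not_fielded, p_touchback, exp_dx_no_tb):
    """Expected final x-position (absolute yard line) of a punt landing at x_land.

    exp_dx_no_tb is E(X | no touchback), the expected bounce displacement given no touchback.
    A fair catch adds nothing because the ball is spotted where it was caught.
    """
    touchback_dx = 90.0 - x_land  # a touchback moves the ball to the receiving team's 20 (absolute 90)
    exp_dx_not_fielded = p_touchback * touchback_dx + (1.0 - p_touchback) * exp_dx_no_tb
    return x_land - return_len * p_return + p_not_fielded * exp_dx_not_fielded


# Hypothetical inputs for a punt landing at the receiving team's 15 (absolute 95)
value = field_value(x_land=95.0, p_return=0.3, return_len=8.0,
                    p_not_fielded=0.5, p_touchback=0.2, exp_dx_no_tb=3.0)
print(round(value, 2))  # 93.3
```

Near the endzone the touchback term is negative (90 - x_land < 0), so a high touchback probability directly lowers the value of a landing spot.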

In [None]:
ens_df = pd.read_csv('/kaggle/input/exampleoptloc/ExampleOptimalLocation.csv')

bounce_pred = ens_df.iloc[:,-7:-2].values
mu_v = bounce_pred[:,0]
mu_vp = bounce_pred[:,1]

cos = ens_df['cos'].values
sin = ens_df['sin'].values

# Roughly approximate the expected x,y values of the bounces
ens_df['x_exp_bounce'] = mu_v*cos - mu_vp*sin
ens_df['y_exp_bounce'] = mu_v*sin + mu_vp*cos

# Additional statistics about the punt and bounce
ens_df['x_to_ez'] = 110 - ens_df['x_land']
ens_df['y_to_oob'] = (80/3-np.abs(80/3 - ens_df['y_land'])) * np.sign(ens_df['y_land']-80/3)
ens_df['y_rat'] = ens_df['y_to_oob']/ens_df['y_exp_bounce']
ens_df['x_exp_notb'] = ens_df['x_exp_bounce']*ens_df['y_rat']
ens_df.loc[(ens_df['y_rat']>1)|(ens_df['y_rat']<0),'x_exp_notb'] = ens_df['x_exp_bounce']
ens_df['p_tb'] = 0
ens_df['ex_ntb'] = (1-ens_df['p_tb'])*ens_df['x_exp_notb']
ens_df['Field_Val'] = ens_df['x_land'] - ens_df['R_len']*ens_df['R_avg'] + ens_df['NF_avg']*(ens_df['p_tb']*(90-ens_df['x_land']) + ens_df['ex_ntb'])
ens_df['has_calcd'] = False



# Bounce model outputs statistics of the bivariate normal distributions
# Functions to be used in calculating the probabilities and expected values

# Standardize the random variables in the exponential of distributions
def standardize(x,mu,sigma):
    return (x-mu)/sigma

# Given a set of distribution parameters, return the bivariate normal distribution function
def get_bivnorm(mx,my, sx,sy, r):
    def biv_norm(x,y):
        x_stand = standardize(x,mx,sx)
        y_stand = standardize(y,my,sy)
        # Standard bivariate normal density with correlation r: the cross term enters with a minus sign
        return 1/(2*np.pi*sx*sy*np.sqrt(1-r**2)) * np.exp(-1/(2*(1-r**2))*(x_stand**2+y_stand**2-2*r*x_stand*y_stand))
    return biv_norm

# Rotate on-field coordinates to punt-direction coordinates before acquiring PDF
def get_rot_pdf(mx,my, sx,sy, r, ctheta,stheta):
    pdf = get_bivnorm(mx,my, sx,sy, r)
    def rotated_biv_norm(x,y):
        x_rot = x*ctheta + y*stheta
        y_rot = y*ctheta - x*stheta
        return pdf(x_rot,y_rot)
    return rotated_biv_norm

# Calculate the expected value of x-field coordinate
def get_rot_ExpX(mx,my, sx,sy, r, ctheta,stheta):
    pdf = get_bivnorm(mx,my, sx,sy, r)
    def rotated_expX(x,y):
        x_rot = x*ctheta + y*stheta
        y_rot = y*ctheta - x*stheta
        return x * pdf(x_rot,y_rot)
    return rotated_expX

# Calculate the expected value of the x-field coordinate when the ball bounces out of bounds
# Assuming a straight path from the bounce location to the final location, the x-value is truncated when the ball leaves play
def get_rot_truncatedExpX(mx,my, sx,sy, r, ctheta,stheta, y_thres):
    pdf = get_bivnorm(mx,my, sx,sy, r)
    def rotated_truncexpX(x,y):
        x_rot = x*ctheta + y*stheta
        y_rot = y*ctheta - x*stheta
        return x*y_thres/y * pdf(x_rot,y_rot)
    return rotated_truncexpX

# Returns y-field integration bounds depending on whether the punt is closer to the y = 0 or y = 160/3 sidelines
def y_oob(y_togo):
    ranges = [[-np.inf,y_togo],[y_togo,np.inf]]
    if y_togo>0:
        return ranges
    return [ranges[1],ranges[0]]

# Return x-field integration bounds, for integrating both touchbacks and out of bounds
def get_XBounds(x_ez,y_oob):
    slope = x_ez/y_oob
    def tb_xbound(y):
        x_minbound = slope*(y-y_oob)+x_ez
        return [x_minbound, np.inf]

    def oob_xbound(y):
        x_upbound = slope*(y-y_oob)+x_ez
        return [x_ez, x_upbound]
    return [tb_xbound,oob_xbound]



# For each punt, loop over the locations with the highest Field Value and perform the full integration of expected values.
# The initial Field Values over-approximate the true values (P(touchback) is initially set to 0),
# so if a location's exactly integrated Field Value remains the maximum, it is the true maximum.
# We are done when the exact evaluation remains maximal, or when we arrive at a point we have already calculated.

highest_newVal = 0
converge = False
while not converge:
    best_id = ens_df.sort_values('Field_Val',ascending=False).head(1).index[0]

    xl = ens_df.loc[best_id,'x_land']
    yl = ens_df.loc[best_id,'y_land']

    x_to_ez = ens_df.loc[best_id,'x_to_ez']
    y_to_oob = ens_df.loc[best_id, 'y_to_oob']

    # Only evaluate the integral if we have not explicitly integrated this point already
    if not ens_df.loc[best_id, 'has_calcd']:
        y_bounds = y_oob(y_to_oob)
        x_bound_tb, x_bound_oob = get_XBounds(x_to_ez, y_to_oob)

        # Get relevant functionals
        rotated_PDF = get_rot_pdf(*bounce_pred[best_id], cos[best_id], sin[best_id])
        rotated_ExpX = get_rot_ExpX(*bounce_pred[best_id], cos[best_id], sin[best_id])
        rot_trunc_ExpX = get_rot_truncatedExpX(*bounce_pred[best_id], cos[best_id], sin[best_id],y_to_oob)

        # Evaluate the integrals
        # Note, it is faster to split the conditional integration bounds due to the decrease in function calls
        ex_ip = nquad(rotated_ExpX, [[-110,x_to_ez],y_bounds[0]])[0]
        p_tb_new = nquad(rotated_PDF, [[x_to_ez,np.inf],y_bounds[0]])[0] + nquad(rotated_PDF, [x_bound_tb,y_bounds[1]])[0]
        ex_oob = nquad(rot_trunc_ExpX,[[-110,x_to_ez],y_bounds[1]])[0] + nquad(rot_trunc_ExpX,[x_bound_oob,y_bounds[1]])[0]

        # When evaluating a point near the sideline, can update touchback probabilities in the middle of the field
        # Unless already explicitly evaluated those locations
        # The logic is identical to the previous step
        # Note, this is only a useful speed-up when punting near the endzone
        interior_locations = ens_df.loc[
            (ens_df['x_land']==xl) \
            & (~ens_df['has_calcd']) \
            & (ens_df['y_land'].between(*sorted([80/3, ens_df.loc[best_id, 'y_land']]),inclusive='both'))].index.values

        # Update the integrated values for locations closer to the middle of the field than the current location
        ens_df.loc[interior_locations, 'p_tb'] = p_tb_new

        # It is unclear whether E(X|no TB) * P(no TB) generally increases as we move away from the middle of the field,
        # so this value is only updated when it is calculated exactly
        ens_df.loc[best_id, 'ex_ntb'] = (ex_ip + ex_oob)

        # Update expected value of field position for any interior point
        ens_df.loc[interior_locations,'Field_Val'] = ens_df.loc[interior_locations,'x_land'] \
                - ens_df.loc[interior_locations,'R_avg']*ens_df.loc[interior_locations,'R_len'] \
                + ens_df.loc[interior_locations,'NF_avg']*( \
                            ens_df.loc[interior_locations,'p_tb']*(90-ens_df.loc[interior_locations,'x_land']) \
                            + ens_df.loc[interior_locations,'ex_ntb'])

    ens_df.loc[best_id, 'has_calcd'] = True
    if best_id == ens_df.sort_values('Field_Val',ascending=False).head(1).index[0]:
        converge = True
        
# This code implements the algorithm used to calculate the optimal landing location.
# Here it is applied to only one punt, but the same algorithm can generate the optimal landing locations of league-average punts for every play.
# For the sake of time, we have included a data set of these locations for all 2020 punts, which is used below

optloc = pd.read_csv('/kaggle/input/fieldvalue2020/PunterOptLoc2020.csv')
optloc['FVpastLoS'] = optloc['Field_Val'] - optloc['x_fb']

fig, ax = plt.subplots(2,1,figsize=(10,8),sharex=True)
optloc.plot.scatter(x='x_fb',y='FVpastLoS', ax=ax[0])
optloc.plot.scatter(x='x_fb',y='p_tb', ax=ax[1])
plt.xlabel('Line of Scrimmage (Absolute Yardline)')
ax[0].set_ylabel('Average Yards Gained')
ax[1].set_ylabel('Touchback Probability')
plt.xlim(10,80)
ax[0].set_ylim(25,55)
ax[1].set_ylim(0,0.35)
plt.show()
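The search loop in the code above is a best-first (lazy) evaluation. The cheap Field Values computed with P(Touchback) = 0 serve as optimistic estimates, and only the current leader is integrated exactly. The stripped-down sketch below shows that idea. It is hypothetical: it omits the notebook's propagation of touchback probabilities to interior locations. It is only correct if the estimates never fall below the exact values, which is the assumption stated in the loop's comments.

```python
def lazy_argmax(upper_bounds, exact_value):
    """Best-first search for the key with the largest exact value.

    upper_bounds: dict mapping key -> cheap estimate that is never below the exact value
    exact_value:  callable returning the (expensive) exact value of a key
    """
    current = dict(upper_bounds)
    evaluated = set()
    while True:
        best = max(current, key=current.get)
        if best in evaluated:
            # Every other key's estimate is an upper bound no larger than this exact value
            return best, current[best]
        current[best] = exact_value(best)
        evaluated.add(best)


# Hypothetical landing spots with optimistic estimates and their exact values
bounds = {'a': 10.0, 'b': 9.0, 'c': 5.0}
exact = {'a': 7.0, 'b': 8.5, 'c': 4.0}
calls = []


def expensive(key):
    calls.append(key)  # record which keys needed the expensive evaluation
    return exact[key]


print(lazy_argmax(bounds, expensive), calls)  # ('b', 8.5) ['a', 'b']
```

Location `c` is never integrated because its optimistic estimate already falls below the exact value of `b`. This is where the speed-up comes from when most candidate landing spots are clearly worse than the best one.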

Using the 2020 punt plays, the optimal landing location achievable by a league-average punter is calculated.  The average yards gained by such a punt remains relatively constant in the middle of the field, although these punts show larger variance because return teams set up for returns.  Once the line of scrimmage is past the punt team's own 45-yard line (i.e. the 55 absolute yard line), punts are at risk of bouncing into the endzone, as returners tend to let the ball bounce in hopes of a touchback.  However, the touchback probability tends to decrease in this region, since the punt can be kicked higher, allowing the gunners to stop the ball before it enters the endzone.  The optimal strategy adds to this effect by placing the ball near the sidelines so that it tends to bounce out of bounds before reaching the endzone.
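The touchback probabilities above come from numerically integrating the rotated bivariate normal. A Monte Carlo estimate under the same straight-line bounce geometry is a simple way to sanity-check such integrals. The sketch below counts a touchback when the straight path from the landing spot crosses the goal line before either sideline. Its names, the sample size, and the choice to ignore gunners downing the ball are assumptions.

```python
import numpy as np

FIELD_WIDTH = 160 / 3   # sideline to sideline, yards
GOAL_LINE = 110.0       # receiving team's goal line in the notebook's flipped coordinates


def mc_touchback_prob(x_land, y_land, mu_par, mu_perp, s_par, s_perp, rho,
                      cos_t, sin_t, n=200_000, seed=0):
    """Monte Carlo estimate of P(touchback) for a bounce drawn in punt-direction coordinates.

    cos_t, sin_t: unit vector of the punt's flight in field coordinates.
    """
    rng = np.random.default_rng(seed)
    cov = [[s_par**2, rho * s_par * s_perp],
           [rho * s_par * s_perp, s_perp**2]]
    par, perp = rng.multivariate_normal([mu_par, mu_perp], cov, size=n).T
    # Rotate the sampled bounces back to field coordinates
    dx = par * cos_t - perp * sin_t
    dy = par * sin_t + perp * cos_t
    # Fraction of the straight bounce path travelled when the ball reaches the goal line or a sideline
    with np.errstate(divide='ignore', invalid='ignore'):
        t_goal = np.where(dx > 0, (GOAL_LINE - x_land) / dx, np.inf)
        t_side = np.where(dy > 0, (FIELD_WIDTH - y_land) / dy,
                          np.where(dy < 0, -y_land / dy, np.inf))
    # Touchback: the path reaches the goal line within the bounce and before leaving through a sideline
    return float(np.mean((t_goal <= 1.0) & (t_goal < t_side)))


# Hypothetical punt landing at the receiving 5 (absolute 105), centered, with a forward-biased bounce
print(mc_touchback_prob(105.0, 80 / 3, 5.0, 0.0, 2.0, 0.5, 0.0, 1.0, 0.0))  # roughly 0.5
```

Moving `y_land` toward a sideline while angling the flight toward it makes `t_side` small, which is the mechanism behind placing punts near the sidelines to avoid touchbacks.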

# 6. Comparison of Punters in 2020

Each individual punt can be evaluated with the models above to find its expected field value.  In principle, the difference between this expected value and the actual result could be used to assign relative value to the gunners, jammers, or returners.  Instead, comparing the expected value of the actual punt to that of the optimal location for a league-average punter (labeled ΔFV) scores each punter's decision making and control of the football.

In [None]:
players = pd.read_csv(os.path.join(dir_name,'players.csv'))
id_to_name_dict = players.set_index('nflId')['displayName'].to_dict()

real_val = pd.read_csv('/kaggle/input/fieldvalue2020/RealLocationValues.csv')
real_val['Field_Val'] = real_val['x_land'] - real_val['R_avg']*real_val['R_len'] + real_val['NF_avg']*(real_val['p_tb']*(90-real_val['x_land'])+real_val['ex_ntb'])

# Balls kicked directly out of bounds or into the endzone have their values set
real_val.loc[(real_val['y_land']<0)|(real_val['y_land']>160/3), 'Field_Val'] = real_val.loc[(real_val['y_land']<0)|(real_val['y_land']>160/3), 'x_land']
real_val.loc[real_val['x_land']>110,'Field_Val'] = 90

compare_rv_opt = real_val.merge(optloc[['gameId','playId','x_land','y_land','p_tb','Field_Val','kickerId']], on=['gameId','playId'], suffixes=['', '_opt'])
compare_rv_opt['dFV'] = compare_rv_opt['Field_Val'] - compare_rv_opt['Field_Val_opt']

num_punts = compare_rv_opt.groupby('kickerId').count()['gameId']
total_diff = compare_rv_opt.groupby('kickerId')['dFV'].sum()

punter_stats = pd.concat([total_diff.div(num_punts),num_punts],axis=1).reset_index().rename({0:'ΔFV Total', 'gameId':'num_punts'},axis=1)

punter_vel = real_val.merge(punter[['gameId','playId','x','y']]).merge(all_pos[['gameId','playId','hangTime']]).copy()
punter_vel['dist'] = np.sqrt((punter_vel['x']-punter_vel['x_land'])**2 + (punter_vel['y']-punter_vel['y_land'])**2)
punter_vel['v_punt'] = calc_const(punter_vel['dist'], punter_vel['hangTime'])
punter_stats = punter_stats.merge(punter_vel[(punter_vel['y_land']>0)&(punter_vel['y_land']<160/3)&(punter_vel['x_land']<110)].merge(optloc[['gameId','playId','kickerId']]).groupby('kickerId')['v_punt'].mean().round(2).reset_index())
punter_stats['ΔVel'] = (punter_stats['v_punt'] - vc_avg).round(2)

midfield_punts = compare_rv_opt[compare_rv_opt['x_fb']<50].copy()
punter_stats = punter_stats.merge(midfield_punts.groupby('kickerId')['dFV'].sum().div(midfield_punts.groupby('kickerId').count()['gameId']).reset_index().rename({0:'ΔFV Mid Field'},axis=1))

pinning_punts = compare_rv_opt[compare_rv_opt['x_fb']>=50].copy()
punter_stats = punter_stats.merge(pinning_punts.groupby('kickerId')['dFV'].sum().div(pinning_punts.groupby('kickerId').count()['gameId']).reset_index().rename({0:'ΔFV Deep'},axis=1))

punter_stats['Name'] = punter_stats['kickerId'].map(id_to_name_dict)
punter_stats = punter_stats.sort_values('ΔFV Total',ascending=False).set_index('Name')
punter_stats = punter_stats[['kickerId', 'ΔFV Total', 'ΔFV Mid Field', 'ΔFV Deep', 'ΔVel', 'v_punt','num_punts']]

top_perf = punter_stats.iloc[:,1:5].style.background_gradient(cmap='coolwarm', subset=pd.IndexSlice[:, ['ΔFV Total', 'ΔFV Mid Field', 'ΔFV Deep']])\
     .background_gradient(cmap='coolwarm', subset=pd.IndexSlice[:, 'ΔVel'], vmin=-1.5, vmax=1.5)

display(top_perf)

For the 2020 data, the average difference in expected field position relative to the optimal location for an average punter is calculated in three ranges: all punts (ΔFV Total), punts where the original line of scrimmage is before the punt team's own 40-yard line (ΔFV Mid Field), and punts where the line of scrimmage is at or past the punt team's own 40-yard line (ΔFV Deep).  Larger values (positive, or the least negative) indicate better performance.  ΔFV Deep values are generally much lower, since the optimal punts have relatively low touchback probabilities; a punter on the low end of this metric therefore has a much higher touchback rate than expected.
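The three ΔFV columns are per-kicker averages of `dFV` over different line-of-scrimmage ranges. The notebook computes each one as a sum divided by a count and joins the pieces with inner merges. A groupby-mean sketch is shown below, with a hypothetical helper name and toy data. One difference: a kicker with no punts in a range gets NaN here instead of being dropped.

```python
import pandas as pd


def punter_dfv_table(df: pd.DataFrame, split: float = 50.0) -> pd.DataFrame:
    """Average ΔFV per kicker over all punts and split by the absolute line of scrimmage `x_fb`."""
    mid = df[df['x_fb'] < split]    # line of scrimmage before the punt team's own 40 (absolute 50)
    deep = df[df['x_fb'] >= split]  # line of scrimmage at or past the punt team's own 40
    table = pd.DataFrame({
        'ΔFV Total': df.groupby('kickerId')['dFV'].mean(),
        'ΔFV Mid Field': mid.groupby('kickerId')['dFV'].mean(),
        'ΔFV Deep': deep.groupby('kickerId')['dFV'].mean(),
        'num_punts': df.groupby('kickerId').size(),
    })
    return table.sort_values('ΔFV Total', ascending=False)


# Hypothetical punts for two kickers
punts = pd.DataFrame({
    'kickerId': [1, 1, 2, 2, 2],
    'x_fb': [30, 60, 40, 45, 70],
    'dFV': [-2.0, -4.0, -1.0, -1.0, -4.0],
})
print(punter_dfv_table(punts))
```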

### Appendix

Full code and instructions to reproduce the models above can be found at: github.com/rxsims/NFLBigDataBowl2022

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
