# NFL Punt Analytics Submission

#### Punt plays, and punt returns in particular, can be exciting to watch. However, because of their chaotic nature, many violent collisions between players can occur, leading to serious concussions. The goal of this notebook is to provide evidence supporting a rule change for punt plays that will limit concussions.


### Taking Inspiration From Recent Kickoff Rule Changes

Kickoff plays are similar to punt plays in that many potentially violent collisions occur. In recent years, the kickoff rules have changed in order to decrease the occurrence of concussions. Three notable changes have been moving the kickoff from the 30 yardline to the 35 yardline, not allowing players from the kicking team to have a running start, and moving touchbacks from the 20 yardline to the 25 yardline. The first change increased the likelihood of touchbacks, thereby reducing the likelihood of a kickoff return at all. Without a running start, it is harder for players from the kicking team to reach top speed, lessening the chance of violent collisions. Lastly, moving touchbacks forward made kickoff returns even less likely, as returners often choose to return kicks only if they believe they can make it past the touchback yardline. Overall, these rule changes have led to a decrease in concussions on kickoff plays while still keeping the integrity of the game, as the kickoff return itself has not been completely eliminated.

Therefore, in coming up with a proposed rule change for punt plays, I followed two principles, both of which are supported by the data below:

1. Incentivizing the punt returner to fair catch the ball will decrease concussions.
1. Decreasing the number of punt returns by increasing touchbacks and punts out of bounds will decrease concussions.


I propose only a slight change, limited to punt plays that begin at or behind the possession team's 35 yardline. Such plays mimic a kickoff in some ways, as kickoffs also begin at the 35 yardline. In addition, the most common strategy of the possession team is for the punter to kick the ball deep into the opponent's territory. Ideally, the ball would be downed by the punting team, the ball would go out of bounds very close to the opposing team's endzone, or there would be a touchback. A return of a very deep punt is not always desired, as the possession team's gunners may not reach the punt returner in time to make a quick tackle, possibly leading to a long punt return. Because these punts are often long, players can reach top speed, leading to violent blocks and tackle attempts. I aim to propose a rule change that would incentivize the punt returner not to return the punt.


# Proposed Rule

## Fair Catch Advancement For Deep Punt Plays: 

### *If the punt play occurs at or behind the possession team's 35 yard line and is fair caught by the opposing team on their side of the field, the opposing team's ensuing possession begins 5 yards past the yardline where the ball is fair caught or the 50 yardline, whichever yardline is farther from the endzone of the receiving team.*

In [None]:
# import packages
import matplotlib.pyplot as plt
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

In [None]:
# import data
df = pd.read_csv('../input/play_information.csv')
df1 = pd.read_csv('../input/video_review.csv')
df=df.merge(df1, on=['Season_Year','GameKey', 'PlayID'], how='left')

## Data Pre-processing Step

We will use regex to extract important information from the Play Description column, such as whether the ball was fair caught and how many yards the punt was returned.
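
As a minimal sketch of the same kind of extraction, the patterns can also be applied with pandas' vectorized string methods instead of explicit loops. The two play descriptions below are made-up strings written in the style of the `PlayDescription` column, not rows from the data, and the name `sample` is hypothetical.

```python
import pandas as pd

# Hypothetical play descriptions, written only to illustrate the patterns.
sample = pd.DataFrame({
    "PlayDescription": [
        "(4:12) J.Doe punts 45 yards to XX 20, Center-A.Long, fair catch by B.Smith.",
        "(9:30) J.Doe punts 52 yards to XX 13, Center-A.Long. B.Smith to XX 21 for 8 yards (C.Jones).",
    ]
})
desc = sample["PlayDescription"]

# First number after "punts"; 0 when the pattern is absent.
sample["punt_length"] = pd.to_numeric(
    desc.str.extract(r"punts (\d+)", expand=False)
).fillna(0).astype(int)

# First (possibly negative) number after "for"; -100 marks "no return".
# Note: -?\d+ is a slightly stricter variant of the [-\d]+ pattern used in this notebook.
sample["return_length"] = pd.to_numeric(
    desc.str.extract(r"for (-?\d+)", expand=False)
).fillna(-100).astype(int)

# Simple flag: 1 when the literal phrase appears, else 0.
sample["fair_catch"] = desc.str.contains("fair catch", regex=False).astype(int)

print(sample[["punt_length", "return_length", "fair_catch"]])
```

The sentinel value of -100 mirrors the convention used in the cells that follow, where any return length above -90 is treated as an actual return.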

In [None]:
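# 'Turnover_Related' comes from video_review.csv; plays without a video review
# get NaN after the left merge, so this is True only for reviewed plays marked 'No'
df['concussion'] = df['Turnover_Related'].apply(lambda x: x=='No')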

In [None]:
df['PlayDescription'] = df['PlayDescription'].astype(str)

# punt length 
import re 
punt_length = []
for row in df['PlayDescription']:
    # raw string so that \d reaches the regex engine unchanged
    match = re.search(r'punts (\d+)', row)
    if match:
        punt_length.append(match.group(1))
    else:
        # no punt distance found in the description
        punt_length.append(0)
        
# return length
# to allow for negative or zero return yards, we use a default value of -100 when the ball is not returned
return_length = []
for row in df['PlayDescription']:
    # raw string so that \d reaches the regex engine unchanged
    match = re.search(r'for ([-\d]+)', row)
    if match:
        return_length.append(match.group(1))
    else:
        return_length.append(-100)
        
# fair catch
fair_catch = []
for row in df['PlayDescription']:
    match = re.search('fair catch', row)
    if match:
        fair_catch.append(1)
    elif match is None:
        fair_catch.append(0)

# injury
injury = []
for row in df['PlayDescription']:
    match = re.search('injured', row)
    if match:
        injury.append(1)
    elif match is None:
        injury.append(0)
            
# penalty         
penalty = []
for row in df['PlayDescription']:
    # flag the play when 'Penalty' or 'PENALTY' appears as a standalone word
    words = row.split()
    if 'Penalty' in words or 'PENALTY' in words:
        penalty.append(1)
    else:
        penalty.append(0)
        

# downed
downed = []
for row in df['PlayDescription']:
    match = re.search('downed', row)
    if match:
        downed.append(1)
    elif match is None:
        downed.append(0)
        
# fumble
fumble = []
for row in df['PlayDescription']:
    match = re.search('FUMBLES', row)
    if match:
        fumble.append(1)
    elif match is None:
        fumble.append(0)

# muff
muff = []
for row in df['PlayDescription']:
    match = re.search('MUFFS', row)
    if match:
        muff.append(1)
    elif match is None:
        muff.append(0)
        
# Touchback
touchback = []
for row in df['PlayDescription']:
    match = re.search('Touchback', row)
    if match:
        touchback.append(1)
    elif match is None:
        touchback.append(0)

# Touchdown
touchdown = []
for row in df['PlayDescription']:
    match = re.search('TOUCHDOWN', row)
    if match:
        touchdown.append(1)
    elif match is None:
        touchdown.append(0)
        
df["punt_length"] = punt_length
df["return_length"] = return_length
df["fair_catch"] = fair_catch
df["injury"] = injury
df["penalty"] = penalty
df["downed"] = downed
df["fumble"] = fumble
df['muff'] = muff
df['touchback'] = touchback
df['touchdown'] = touchdown


In [None]:
df[["punt_length", "return_length"]] = df[["punt_length", "return_length"]].apply(pd.to_numeric)

In [None]:
# check if punt begins on possession team's side of field and what yardline the play starts
df['Side_of_Field'] = df['YardLine'].apply(lambda x: re.sub(r'[0-9]+', '', x))
df['Side_of_Field'] = df['Side_of_Field'].apply(lambda x: x.strip())
df['Own_Side']= (df['Side_of_Field']==df['Poss_Team'])
df['start_yardline'] = df['YardLine'].apply(lambda x: [int(s) for s in x.split() if s.isdigit()][0])

In [None]:
df['deep'] = df['start_yardline']+df['punt_length']

In [None]:
# check if punt goes out of bounds
df['OOB'] = df['PlayDescription'].str.contains('out of bounds', case=False)

# Analysis Begins Here

### Overall, punt plays have a 0.55% chance of concussion

In [None]:
punt_risk = df[df['concussion']].shape[0]/float(df.shape[0])

print("Punt plays have a "+ str(round(punt_risk*100,2))+" percent chance of concussion")

### Hypothesis:

#### Punts at or behind the possession team's 35 yardline carry a higher risk of concussion, as such plays resemble kickoffs and returns are more likely.

## "Deep" Punt definition

### *We define punts that occur at or behind the possession team's 35 yardline as "deep" punts.*
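
The definition above can be sketched as a single boolean mask. The frame `toy` and its values are hypothetical; only the column names (`Own_Side`, `start_yardline`, `punt_length`) follow the ones built in this notebook.

```python
import pandas as pd

# Toy rows (hypothetical values) using the column names built in this notebook.
toy = pd.DataFrame({
    "Own_Side":       [True, True, False, True],
    "start_yardline": [20,   40,   45,    35],
    "punt_length":    [50,   38,   30,    0],
})

# Each comparison gets its own parentheses: `&` binds more tightly than
# `==`, `<=` or `>`, so an unparenthesized comparison is grouped differently.
is_deep = (
    toy["Own_Side"]
    & (toy["start_yardline"] <= 35)
    & (toy["punt_length"] > 0)   # drop rows where no punt length was parsed
)

print(is_deep.tolist())
```

Only the first toy row qualifies: the second starts past the 35, the third is on the opponent's side of the field, and the fourth has no parsed punt length.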

### Teams that punt deep in their own territory have punts go deep into their opponent's territory

In [None]:
df[df['Own_Side'] & (df['start_yardline']<=35) & (df['punt_length']>0)]['deep'].hist(bins=16, grid=False, figsize=(8,8))
plt.title("Distribution of Starting Field Position of Non-Punting Team After a Deep Punt")
plt.xlabel("Yards Past Punting Team\'s endzone")

In [None]:
df[df['Own_Side'] & (df['start_yardline']<=35) & (df['punt_length']>0)]['start_yardline'].hist(grid=False, figsize=(6,6))
plt.title('Distribution of Yardline for Deep Punt Plays')
plt.xlabel("Yards from Punting Team\'s Endzone")

In [None]:
punt_length_mean = df[df['Own_Side'] & (df['start_yardline']<=35) & (df['punt_length']>0)]['punt_length'].mean()
yard_line_mean = df[df['Own_Side'] & (df['start_yardline']<=35) & (df['punt_length']>0)]['start_yardline'].mean()

print("Average punt length on deep punts: "+ str(round(punt_length_mean))+ ' yards')
print("Average starting yardline on deep punt plays: "+ str(round(yard_line_mean))+' yardline')

### Punt plays where a punt return does not occur have a very low chance of concussion

In [None]:
# plays with no return: fair catch, out of bounds, or touchback
no_return = (df['fair_catch']==True) | (df['OOB']==True) | (df['touchback']==True)
non_return_risk = df[no_return & df['concussion']].shape[0]/float(df[no_return].shape[0])

print("Punt plays where the punt is not returned (i.e., a touchback, fair catch, or out of bounds occurs) have a "+str(round(non_return_risk*100,2))+ " percent risk of concussion")

In [None]:
# -100 is the placeholder for "no return", so > -90 keeps only returned punts
returned = df['return_length']>-90
return_risk = df[returned & df['concussion']].shape[0]/float(df[returned].shape[0])

print("Punt plays where the punt is returned have a "+str(round(return_risk*100,2))+" percent risk of concussion")

### Overall, deep punt plays are riskier than other punt plays

We believe deep punts are riskier as the likelihood of the punt being returned is greater and the punts are longer, allowing players to collide at high speeds.
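
The comparisons in this section all have the same shape: select a group of plays, then take the share flagged as concussions. A minimal sketch of that pattern follows, assuming a hypothetical helper `concussion_rate` and a toy frame whose `deep_punt` column and values are invented for illustration and are not results from the data.

```python
import pandas as pd

def concussion_rate(frame: pd.DataFrame, mask: pd.Series) -> float:
    """Percent of the rows selected by `mask` that are flagged as concussions."""
    subset = frame[mask]
    if len(subset) == 0:
        return float("nan")  # no plays in this group
    return 100.0 * subset["concussion"].mean()

# Toy data (hypothetical): 4 deep punts with 1 concussion, 4 other punts with none.
toy = pd.DataFrame({
    "deep_punt":  [True, True, True, True, False, False, False, False],
    "concussion": [True, False, False, False, False, False, False, False],
})

deep_rate = concussion_rate(toy, toy["deep_punt"])
other_rate = concussion_rate(toy, ~toy["deep_punt"])
print(deep_rate, other_rate)
```

Writing the rate as a function of a mask keeps the numerator and denominator tied to the same selection, which is easy to get wrong when the filter is typed out twice.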

In [None]:
deep_mask = (df['Own_Side']==True) & (df['start_yardline']<=35) & (df['punt_length']>0)
deep_punt_risk = df[deep_mask & df['concussion']].shape[0]/float(df[deep_mask].shape[0])

deep_punt_risk = round(deep_punt_risk*100,4)

# deep punts that travel past midfield and are returned
deep_return_mask = (df['Own_Side']==True) & (df['start_yardline']<=35) & (df['punt_length']>0) & (df['deep']>50) & (df['return_length']>-90)
total_deep_punt_returns = df[deep_return_mask].shape[0]
deep_punt_return_risk = df[deep_return_mask & df['concussion']].shape[0]*100/float(total_deep_punt_returns)



print("Deep punt plays have a "+str(deep_punt_risk)+" percent chance of concussion")
print("Deep punt plays where the punt is returned have a "+str(round(deep_punt_return_risk,2))+" percent chance of concussion")

In [None]:
deep_punt_length= df[(df['Own_Side']==True) & (df['start_yardline']<=35) & (df['punt_length']>0)]['punt_length'].mean()
# parenthesize each comparison: & binds more tightly than ==
non_deep_length = df[(((df['Own_Side']==True) & (df['start_yardline']>35)) | (df['Own_Side']==False)) & (df['punt_length']>0)]['punt_length'].mean()

print("Average punt length for deep punts: "+str(round(deep_punt_length,2))+ " yards")
print("Average punt length for non-deep punts: "+str(round(non_deep_length,2))+ " yards")

In [None]:
deep_punt_return= df[(df['Own_Side']==True) & (df['start_yardline']<=35) & (df['punt_length']>0) &(df['return_length']>-90)].shape[0]/float(df[(df['Own_Side']==True) & (df['start_yardline']<=35) & (df['punt_length']>0)].shape[0])
# parenthesize each comparison: & binds more tightly than ==
non_deep_mask = (((df['Own_Side']==True) & (df['start_yardline']>35)) | (df['Own_Side']==False)) & (df['punt_length']>0)
non_deep_return = df[non_deep_mask & (df['return_length']>-90)].shape[0]/float(df[non_deep_mask].shape[0])

print("Deep punts are returned "+str(round(deep_punt_return*100,2))+" percent of the time")
print("Non-deep punts are returned "+ str(round(non_deep_return*100,2))+" percent of the time")


### Non-deep punts are more than 3 times as safe, from a concussion standpoint, as deep punts

In [None]:
# parenthesize each comparison: & binds more tightly than ==
non_deep_mask = (((df['Own_Side']==True) & (df['start_yardline']>35)) | (df['Own_Side']==False)) & (df['punt_length']>0)
non_deep_risk = df[non_deep_mask & df['concussion']].shape[0]/float(df[non_deep_mask].shape[0])

non_deep_risk = round(non_deep_risk*100,4)

print("Non-deep punt plays have a "+str(non_deep_risk)+" percent chance of concussion")

### Takeaway: Reducing the number of punt returns on deep punts will reduce concussions.

# Estimating the impact of the proposed rule change

Because the strategy of both teams may change, it is difficult to estimate the impact of the proposed rule change. However, we can bound the number of returns that would become fair catches. We estimate that this number lies between the number of deep punt returns that go for very short (at most 1 yard) or negative yardage and the number of deep punt returns of no more than 5 yards. Under the proposed rule, punt returners should only field a punt if they believe they can gain 5 or more yards.
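
The two bounds described above amount to counting returns under two thresholds. A minimal sketch, assuming a hypothetical series `returns` of return lengths for returned deep punts (the values are invented, not taken from the data):

```python
import pandas as pd

# Hypothetical return lengths (yards) for returned deep punts.
returns = pd.Series([-3, 0, 1, 2, 4, 5, 6, 9, 12, 30])

# Lower bound on converted returns: those that gained at most 1 yard.
converted_low = int((returns <= 1).sum())

# Upper bound on converted returns: those that gained at most 5 yards.
converted_high = int((returns <= 5).sum())

print(converted_low, converted_high)
```

Every return in the lower-bound group is also in the upper-bound group, so the first count can never exceed the second.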

In [None]:
super_short_returns = df[(df['Own_Side']==True) & (df['start_yardline']<=35) & (df['punt_length']>0) & (df['deep']>50) & (df['return_length']>-90) & (df['return_length']<=1)].shape[0]
short_returns = df[(df['Own_Side']==True) & (df['start_yardline']<=35) & (df['punt_length']>0) & (df['deep']>50) & (df['return_length']>-90) & (df['return_length']<=5)].shape[0]


print("Number of deep punts returned: "+ str(total_deep_punt_returns))
print("Number of deep punts returned for very short ( <=1 yard) or negative yardage: "+ str(super_short_returns))
print("Number of deep punts returned for short (<= 5 yards) yardage: "+ str(short_returns))

## With the proposed rule change, we estimate the overall number of concussions occurring on punt plays to be reduced by 5% to 19%

To arrive at these estimates, we assume that the risk of concussion on punt returns for deep punt plays remains the same, 1.1 percent, as calculated above. The proposed rule will decrease the number of punt returns overall by increasing the number of fair catches.
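
The percentages in the heading follow from simple ratios. The sketch below reuses the counts that appear in this notebook's own print statements (22 concussions on returned deep punts, 37 concussions on all punt plays, and an estimated 2 to 7 concussions avoided); the helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(avoided: int, baseline: int) -> float:
    """Concussions avoided as a percentage of the baseline count."""
    return round(100.0 * avoided / baseline, 2)

# Counts as stated in this notebook, not recomputed here.
deep_return_concussions = 22
all_punt_concussions = 37
avoided_low, avoided_high = 2, 7

deep_range = (percent_reduction(avoided_low, deep_return_concussions),
              percent_reduction(avoided_high, deep_return_concussions))
overall_range = (percent_reduction(avoided_low, all_punt_concussions),
                 percent_reduction(avoided_high, all_punt_concussions))

print(deep_range)     # (9.09, 31.82)
print(overall_range)  # (5.41, 18.92)
```

Rounded to whole numbers, the overall range gives the 5% to 19% quoted in the heading.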

In [None]:
# expected concussions remaining if only the very short returns become fair catches
low_est = deep_punt_return_risk*(total_deep_punt_returns-super_short_returns)/100
# expected concussions remaining if all returns of 5 yards or fewer become fair catches
high_est = deep_punt_return_risk*(total_deep_punt_returns-short_returns)/100

print("Total number of concussions occurring on punt plays: "+str(df[df['concussion']].shape[0])+" concussions")
print("Number of concussions from deep punt plays where the punt is returned: 22 concussions")
print("Estimated number of concussions from deep punt returns with proposed rule change: "+str(int(round(high_est)))+"-"+str(int(round(low_est)))+ " concussions")

# the counts below are hard-coded: 22 concussions on returned deep punts, 37 on all punt plays,
# and an estimated 2 to 7 concussions avoided
print("Estimated percent reduction in concussions on deep punt returns: "+str(round(2*100./22,2))+" to "+str(round(7*100./22,2))+" percent")
print("Estimated overall percent reduction in concussions on all punt plays: "+str(round(2*100./37,2))+" to "+str(round(7*100./37,2))+" percent")

### More Favorable Field Position for the Receiving Team is Possible

More favorable field position for the receiving team is possible on approximately 30 percent of returned deep punts, namely those returned no more than 5 yards.

In [None]:
short_punt_rate = df[(df['Own_Side']==True) & (df['start_yardline']<=35) & (df['punt_length']>0) & (df['deep']>50) & (df['return_length']>-90) & (df['return_length']<=5)].shape[0]/float(df[(df['Own_Side']==True) & (df['start_yardline']<=35) & (df['punt_length']>0) & (df['deep']>50) & (df['return_length']>-90)].shape[0])

print("Percentage of deep punts returned no more than 5 yards: "+str(round(short_punt_rate*100,2) )+ "%")

## Extra insight: Expanding the Definition of Blindside Block is Difficult

While many concussions seem to occur from blindside-type blocks, expanding the definition of such blocks would negatively affect the viewing experience of the game. Penalties already occur at a much higher rate on punt plays than on regular plays, so adding a penalty for certain types of blocks would push this rate even higher.

*Stats for penalties taken from http://www.nflpenalties.com/*

In [None]:
penalty_rate = df[df['penalty']==1].shape[0]/float(df.shape[0])

print("A penalty occurs on "+str(round(penalty_rate*100))+" percent of all punt plays")
print("A penalty occurs on "+str(round(3484*100./45840))+" percent of all plays (punt and non-punt)")