In [None]:
from IPython.display import HTML
style = """
<style>
    .header1 { font-family:'Arial';font-size:30px; color:Black; font-weight:800;}
    .header2 { 
        font-family:'Arial';
        font-size:18px; 
        color:Black; 
        font-weight:600;
        border-bottom: 1px solid; 
        margin-bottom: 8px;
        margin-top: 8px;
        width: 100%;
        
    }
    .header3 { font-family:'Arial';font-size:16px; color:Black; font-weight:600;}
    .para { font-family:'Arial';font-size:14px; color:Black;}
    .flex-columns {
        display: flex;
        flex-direction: row;
        flex-wrap: wrap;
    }
    .flex-container {
         padding: 20px;
    }
    
    .flex-container-large {
         padding: 20px;
         max-width: 40%;
    }
    
    .flex-container-small {
         padding: 20px;
         max-width: 17.5%;
    }
    
    .list-items {
        margin: 10px;
    }
    
    .list-items li {
        color: #3692CC;
        font-weight: 500;
    }
</style>
"""
HTML(style)

# Introduction
There are many ways to make the punt play safer. Concussions, fortunately, are rare events: across more than 6000 punts, there were 37 concussions. With data this sparse, it is easy to see signals that are not there. To address that problem, I took the following approach.

* Clean and label the data -- This kernel analyzes concussion data from 2016 and 2017. There were rule changes before the 2017 and 2018 seasons. I identify the concussions covered by the current rules and remove them from the analysis. This cleaning required detailed video analysis and review of the NFL rules. I used the NGS data to produce play diagrams and impact charts to help identify exactly where, when, and who was involved in each concussion. During this cleaning, I found and fixed multiple problems. After this process, I had a root cause determined for each concussion.

* Brainstorm a large list of possible rules -- Listing the possible rules first allowed me to apply each rule to each video and determine whether a particular rule would have prevented a particular concussion. For instance, if the problem was a blocking technique, there are multiple possible ways to prevent that technique.

* Visualizations -- Bar charts make differences between small numbers seem more significant. A difference between 2 and 4 concussions out of 6000 punts is well within chance variation, but a bar chart would make the 4 look twice as important as the 2. For that reason, I have used tables when summarizing concussions. When I used graphs, I focused on visualizations that represent all of the data instead of a summary.
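To put a number on the 2-versus-4 example, here is a minimal sketch of an exact conditional test. It assumes both counts come from the same number of punts (equal exposure); under that assumption, if the true rates are equal, each of the 2 + 4 = 6 concussions is equally likely to land in either group, so the split follows a Binomial(6, 0.5) distribution. The function name `two_count_pvalue` is illustrative, not part of this kernel.

```python
from math import comb


def two_count_pvalue(a: int, b: int) -> float:
    """Two-sided exact p-value that counts a and b share one underlying rate.

    Assumes equal exposure (same number of punts behind each count), so
    conditional on n = a + b events, a ~ Binomial(n, 0.5).
    """
    n = a + b
    probs = [comb(n, k) / 2 ** n for k in range(n + 1)]
    observed = probs[a]
    # Sum the probability of every split that is at least as unlikely as the observed one
    return sum(p for p in probs if p <= observed + 1e-12)


print(two_count_pvalue(2, 4))
```

For a 2 vs 4 split the sketch returns 44/64, roughly 0.69: a split at least that lopsided would happen by chance about two times in three.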

### Audience
There is a variety of people who will hopefully be reading this kernel. I tried to include enough data analysis to keep data scientists happy and enough domain knowledge for everyone else. For those who want to skip the analytics, I have included a summary that highlights the key takeaways from each section.


### Rule Types
An overview of the rules that will be evaluated. This overview is deliberately brief: after analyzing the data, I will focus on a few rules and provide more detail at that point.

**Reduce Punts**
* CFL Rouge/Single - https://en.wikipedia.org/wiki/Single_(football)
* Fair Catch Bonus Yards - A fair catch is treated as a 5- or 10-yard return
* Encourage teams to keep kicks out of play
    * Out of Bounds Bonus Yards - Converse of the previous rule. Out-of-bounds punts are rewarded
    * Reduce touchback distance
* No Preseason Punts
* Encourage going for it on 4th down - On missed FGs, keep the ball at the line of scrimmage


**Make Punts Safer**
* Formation (borrow from Kickoff Rules)
    * Limit receiving team players outside the numbers
    * Require same number of players on each side of the ball
* Tackling
    * Stricter targeting rules
* Blocking
    * No Wedge Blocks
    * Expand Blindside block rules
* Returner Protection
    * Play calls dead early
        * On muffed punts
        * On fair catch
    * CFL buffer rule
    * Fair Catch even when ball bounces

**Eliminate Punts**
* Automatic Punt - Instead of punting, move the ball as if there was a 40-yard net punt.
* Half Punt - If a team fails on 4th down, treat it like a 20-yard net punt.

### Integrity of the game
Rules can address three phases of the play:
1. Before
    * Formation rules, illegal player downfield
    * Attempt to prevent bad situations from happening
    * Impacts all plays to prevent bad plays
2. During
    * Blindside blocks, targeting, landing on the QB, CFL 5 yard buffer
    * Target very specific behavior
    * Biggest impact on both concussions and the integrity of the game.
3. End
    * Touchbacks, spot of the ball after missed field goal, CFL rouge
    * Encourages players to avoid bad situations
    * Selectively applied if player chooses it
    
All other things being equal, a rule that is applied at the end of the play changes the game the least.





# Load Data and Basic EDA

In [None]:
import glob
import os

import numpy as np
import pandas as pd
import plotly.graph_objs as go
from plotly import offline
from IPython.display import IFrame, Image, display

#Display options ('display.max_columns' replaces the earlier duplicate 'max.columns' setting)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
offline.init_notebook_mode()
config = dict(showLink=False)

#Load provided files
play_information = pd.read_csv('../input/NFL-Punt-Analytics-Competition/play_information.csv')
video_review = pd.read_csv('../input/NFL-Punt-Analytics-Competition/video_review.csv')
video_footage_injury = pd.read_csv('../input/NFL-Punt-Analytics-Competition/video_footage-injury.csv')
video_footage_control = pd.read_csv('../input/NFL-Punt-Analytics-Competition/video_footage-control.csv')
game_data = pd.read_csv('../input/NFL-Punt-Analytics-Competition/game_data.csv')
player_jersey = pd.read_csv('../input/NFL-Punt-Analytics-Competition/player_punt_data.csv')
player_role = pd.read_csv('../input/NFL-Punt-Analytics-Competition/play_player_role_data.csv')

### Parse information from play_information.csv
Play_information.csv has information on all punts for 2016-2017.

The play description column in play_information has a wealth of information but requires some basic text extraction. The play description contains the result of the punt, the punt distance, and the return distance.

The punt location was recorded in the standard team yard-line format (ATL 48 or SF 35). For analysis, I converted these into yards to the end zone, i.e. the distance the punting team still needed to score a TD.
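A per-row sketch of that conversion, assuming `YardLine` values look like `ATL 48` (team abbreviation, space, yard number) and that a bare `50` marks midfield. `yards_to_td` is a hypothetical helper; the kernel's own code below computes two vectorized columns and takes their `max` instead.

```python
def yards_to_td(yard_line: str, poss_team: str) -> int:
    """Yards the punting team still needs to score, from a 'TEAM NN' yard line.

    Assumes the 'ATL 48' format and that a bare '50' means midfield;
    both are assumptions about the YardLine column, not guarantees.
    """
    parts = yard_line.split()
    yard = int(parts[-1])
    if len(parts) == 1:  # e.g. '50' with no team prefix
        return yard
    team = parts[0]
    # Ball on the punting team's own side: the opponent's end zone is 100 - yard away
    return 100 - yard if team == poss_team else yard


print(yards_to_td('ATL 48', 'ATL'))  # own 48-yard line
print(yards_to_td('SF 35', 'ATL'))   # opponent's 35-yard line
```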

For yard metrics, I used pandas' `qcut` to divide the data into 5 equal-sized groups: 0-20th percentile, 20-40th percentile, etc. I combined that overall data with the concussion data to help see if concussions were more frequent in certain groups. For instance, were there more concussions on the longest punts? The benefit of this technique is that every variable gets a simple 5-row table. At a quick glance, I can see if the rows are roughly equal or if one row has most of the concussions. For instance, if return yards were correlated with concussions, I would expect the 80th-100th percentile group to have the most concussions.
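A minimal sketch of the quintile approach on synthetic data: the 6000 punt distances and the 37 randomly placed concussion flags are made up for illustration. `pd.qcut(..., 5)` cuts at the 20th/40th/60th/80th percentiles, so each bin holds about one fifth of all punts and the concussion counts per bin can be compared directly.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: 6000 punts, 37 of them flagged as concussions at random
rng = np.random.default_rng(0)
punts = pd.DataFrame({'PuntYards': rng.normal(45, 8, size=6000)})
punts['Concussion'] = False
punts.loc[rng.choice(len(punts), size=37, replace=False), 'Concussion'] = True

# qcut cuts at the 20th/40th/60th/80th percentiles, so each bin holds ~1/5 of all punts
punts['PuntYards_qcut'] = pd.qcut(punts['PuntYards'], 5)

# One row per quintile: total punts and how many of them had a concussion
summary = punts.groupby('PuntYards_qcut', observed=True).agg(
    Punts=('Concussion', 'size'),
    Concussions=('Concussion', 'sum'),
)
print(summary)
```

Because the flags here are random, any unevenness in the `Concussions` column is pure noise, which is a useful reference when reading the real tables.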

In [None]:
# replace ". " with "," to make the descriptions easier to separate
play_information.PlayDescription=play_information.PlayDescription.str.replace('. ',',',regex=False)

#Look for key words.  0 = not found, 1 = found
play_information['Center']=np.where(play_information.PlayDescription.str.find('Center')==-1,0,1)
play_information['Penalty']=np.where(play_information.PlayDescription.str.find('PENALTY')==-1,0,1)
play_information['Downed']=np.where(play_information.PlayDescription.str.find('downed')==-1,0,1)
play_information['Touchback']=np.where(play_information.PlayDescription.str.find('Touchback')==-1,0,1)
play_information['OutofBounds']=np.where(play_information.PlayDescription.str.find('out of bounds')==-1,0,1)
play_information['FairCatch']=np.where(play_information.PlayDescription.str.find('fair catch')==-1,0,1)
play_information['TD']=np.where(play_information.PlayDescription.str.find('TOUCHDOWN')==-1,0,1)
play_information['Injury']=np.where(play_information.PlayDescription.str.find('njur')==-1,0,1)

#Divide the play description into 4 parts -- Punter, Center, Returner, Extra 
play_information[['Punter','Center','Returner','Extra']]=play_information.PlayDescription.str.split(',', expand=True, n=3)
play_information.PlayDescription=play_information.PlayDescription.str.replace('(Punt formation)','',regex=False)
play_information['ReturnYards'] = play_information.Returner.str.extract(r'for.([\w]+)')
play_information['PuntYards'] = play_information.Punter.str.extract(r'punts.([\w]+)')

#Clean up errors
play_information['PuntYards'] = pd.to_numeric(play_information['PuntYards'], errors='coerce')
play_information['ReturnYards'] = pd.to_numeric(play_information['ReturnYards'].str.replace('no','0',regex=False), errors='coerce')
play_information['PuntYards_qcut'] = pd.qcut(play_information['PuntYards'],5)
play_information['ReturnYards_qcut'] = pd.qcut(play_information['ReturnYards'],5)

#Calculate how close the punting team was to a TD when they punted.
#If the ball is on the possession team's side, this calculates the distance to the endzone (100 - yardline).
#If the ball is on the receiving team's side, the team prefix is not removed, so to_numeric returns NaN.
play_information['YardsforTD_possession']=100-pd.to_numeric(play_information.apply(lambda x : x['YardLine'].replace((x['Poss_Team']+' '),''),1), errors='coerce')
play_information['YardsforTD_receive']=play_information.YardLine.str.extract(r'([\d]+)')
play_information.YardsforTD_receive = pd.to_numeric(play_information.YardsforTD_receive)
#Find the larger of the two to get the real value.  Probably could have done this in one line, but 
#I think it's clearer this way
play_information['YardsforTD']= play_information[['YardsforTD_possession','YardsforTD_receive']].max(axis=1)
play_information['YardsforTD_qcut'] = pd.qcut(play_information['YardsforTD'],5)
#Calculate netyards by starting with punt yards and subtracting either return or touchback
play_information['NetYards'] = play_information.PuntYards
play_information['NetYards'] = play_information['NetYards'] - play_information.Touchback.apply(lambda x: 20 if x > 0 else 0)
play_information['NetYards'] =  play_information['NetYards'] - play_information.ReturnYards.apply(lambda x: 0 if pd.isna(x) else x)
play_information['NetYards_qcut'] = pd.qcut(play_information['NetYards'],5)
#Make the score usable
play_information[['Score_Home','Score_Visiting']] = play_information.Score_Home_Visiting.str.split('-',expand=True,n=2)
play_information['Score_Diff'] = abs(pd.to_numeric(play_information.Score_Home) - pd.to_numeric(play_information.Score_Visiting))
play_information['Score_Diff_qcut'] = pd.qcut(play_information['Score_Diff'],4)

#Determine receiving team
play_information['Rec_Team'] =play_information.apply(lambda x : x['Home_Team_Visit_Team'].replace(x['Poss_Team'],''),1)
play_information['Rec_Team'] = play_information.apply(lambda x : x['Rec_Team'].replace('-',''),1)

#Delete intermediate variables
play_information.drop(['Punter','Center','Returner','Extra','YardsforTD_possession','YardsforTD_receive'], axis=1, inplace=True)

#play_information.head()

## Combine video footage and video review
I use video_footage-injury and video_footage-control only to get the URLs of the plays.
The details of each play come from play_information.
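Before relying on the merge, it can help to check that every concussion play actually found its video row. This sketch uses toy frames with hypothetical values and the `indicator=True` and `validate='one_to_one'` options of `pd.merge`; the one-to-one check assumes at most one concussion per play, which is an assumption about the data.

```python
import pandas as pd

# Hypothetical toy frames standing in for video_footage_injury and video_review
footage = pd.DataFrame({'GameKey': [1, 2, 3], 'PlayID': [10, 20, 30],
                        'URL': ['u1', 'u2', 'u3']})
review = pd.DataFrame({'GameKey': [1, 2, 4], 'PlayID': [10, 20, 40],
                       'GSISID': [100, 200, 400]})

# validate='one_to_one' raises a MergeError if either side has duplicate keys;
# indicator=True adds a _merge column saying where each row came from.
check = pd.merge(footage, review, on=['GameKey', 'PlayID'], how='outer',
                 validate='one_to_one', indicator=True)
unmatched = check[check['_merge'] != 'both']
print(unmatched[['GameKey', 'PlayID', '_merge']])
```

An empty `unmatched` frame on the real data would confirm that the inner merge used below loses no concussion plays.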

In [None]:
#Injury Data --  Need only URL column.  All other columns are in play_information
video_footage_injury = video_footage_injury[['season','gamekey','playid','PREVIEW LINK (5000K)']]
video_footage_injury.columns = ['Season','GameKey','PlayID','URL']

#Control data -- Need only URL column.  All other columns are in play_information
video_footage_control = video_footage_control[['season','gamekey','playid','Preview Link']]
video_footage_control.columns = ['Season','GameKey','PlayID','URL']

#for injury data, add in video review columns
video = pd.merge(video_footage_injury,video_review, on=['GameKey','PlayID'])
#video.head()

### Format Data to Analyze Formations

I created a file to summarize the position data. There are 8 different types of defensive linemen, but they all play essentially the same position. By looking at them as a group, I can better detect patterns.

Note: There is a nice symmetry between the roles. Each role has a counterpart on the other team.


Coverage Members

| Role     | Description                                                                |
|----------|----------------------------------------------------------------------------|
|Gunner    | Lines up outside and tries to reach punt returner before the ball arrives|
|OLine     | Offensive Line.  5 members must be on the line of scrimmage                  |
|Wing      | Line up on the sides of the Oline                                          |
|Backfield | Off the line of scrimmage                                                  |
|Punter    | Lines up 15 yards behind line of scrimmage                                 |

Returning Members

| Role       | Description                                                                        |
|------------|:-----------------------------------------------------------------------------------|
|Jammer      | Blocks Gunner                                                                      |
|DLine       | Defensive Line. Line up on line of scrimmage                                       |
|LineBacker  | Lines up behind the line of scrimmage                                              |
|Front Block | Lines up back near the returner to block or catch short punts                      |
|Returner    | Lines up 40 yards behind line of scrimmage. Chooses to return or signal fair catch |


In [None]:
Image("../input/images/Players_Roles.png")

One of the new Kickoff rules deals with formations:
* Kicking team players must have 5 on either side of the ball, at least 2 between hashmarks and numbers, at least 2 outside numbers
* Receiving team must have 8 players within the “setup zone” (15-yd zone between 10-25 yards from the ball)
https://operations.nfl.com/the-rules/nfl-video-rulebook/kickoff/

A FG rule also addresses defense formation:

*  No more than six Team B (defense) players may be on the line of scrimmage on either side of the snapper at the snap.
https://operations.nfl.com/the-rules/nfl-video-rulebook/illegal-formation/


I will use role and side data to determine whether particular formations lead to more concussions. In punting terms, limiting the team to 2 players outside the numbers would be equivalent to limiting a team to 2 Jammers.
Requiring 8 players in the setup zone would also eliminate front blockers and more than 2 gunners.

For line balance, both FGs and kickoffs restrict the number of players that can be on one side of the ball.
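A sketch of how a jammer limit could be checked against the role data, assuming a hypothetical rule that allows at most 2 jammers (the punting analogue of the kickoff limit on players outside the numbers). `MAX_JAMMERS` and the toy rows are illustrative only.

```python
import pandas as pd

MAX_JAMMERS = 2  # hypothetical rule: at most 2 receiving-team players outside the numbers

# Toy per-player rows; 'Group' mirrors the role grouping used in this notebook
players = pd.DataFrame({
    'GameKey': [1, 1, 1, 2, 2, 2, 2],
    'PlayID':  [5, 5, 5, 9, 9, 9, 9],
    'Group':   ['Jammer', 'Jammer', 'DLine', 'Jammer', 'Jammer', 'Jammer', 'Jammer'],
})

# Count jammers per play; plays with no jammers simply do not appear (they cannot violate)
jammers = (players[players['Group'] == 'Jammer']
           .groupby(['GameKey', 'PlayID']).size()
           .rename('Jammers').reset_index())
jammers['Violates'] = jammers['Jammers'] > MAX_JAMMERS
print(jammers)
```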

In [None]:
#File I created that adds team (coverage vs. return), side (left, right, center)
# and Group (Gunner, Jammer, Dline, Wing)
expanded_role = pd.read_csv('../input/extradata/Expanded_Roles.csv')

#Players rarely change positions but they frequently change numbers (pre-season vs regular season, 2016 vs 2017)
player_jersey = player_jersey.groupby('GSISID').agg({'Number': ', '.join, 
                             'Position': 'first' }).reset_index()

#Merge 3 DFs into one DF
player = pd.merge(player_role,player_jersey,on='GSISID')
player = pd.merge(player,expanded_role,on="Role")

#Formation - How players are lined up
RoleGroup = player.groupby(['Season_Year','GameKey','PlayID','Cov_Ret','Group']).agg({'Role': 'count'}).reset_index()
Role = RoleGroup.sort_values(['Season_Year','GameKey','PlayID','Role'])
Role['RoleCount'] = Role.Group + '-' + Role.Role.map(str)
Role =  Role.groupby(['Season_Year','GameKey','PlayID','Cov_Ret']).agg({'RoleCount': ', '.join}).reset_index()
Role.columns = ['Season_Year','GameKey','PlayID','Cov_Ret','Formation']

### Player Roles
Let's start by looking for connections between player roles and concussions.

In [None]:
#Add Concussion column to Video to help separate concussion plays
video['Concussion']='YES'
Role_Con = pd.merge(Role,video,on=['Season_Year','GameKey','PlayID'],how='left')
Role_Con.Concussion = Role_Con.Concussion.fillna('NO')

#Group by formation.  Separate formations by coverage and return teams
Role_Agg = Role_Con.groupby(['Cov_Ret','Formation','Concussion'])['Season_Year'].agg('count').reset_index()

#Separate Yes and No's and join them to compare them.
#Many formations had no concussions.  We may revisit them later.
Role_Yes = Role_Agg[Role_Agg.Concussion=='YES']
Role_No = Role_Agg[Role_Agg.Concussion=='NO']

Role_Percent=pd.merge(Role_Yes,Role_No,on=['Cov_Ret','Formation'],suffixes=['_Yes','_No'],how='left')

#The second calculation is a little non-intuitive.  I could have hard coded, but I 
# wanted the code to work if the data changed
# Len(Role_Con) = 2*plays because each play has a coverage formation and return formation
Role_Percent['Season_Year_Yes_Percent'] = Role_Percent.Season_Year_Yes/len(video)
Role_Percent.Season_Year_No = Role_Percent.Season_Year_No/(len(Role_Con)/2-len(video))

#High numbers indicate formations over-represented in concussion plays
Role_Percent['Ratio'] = Role_Percent.Season_Year_Yes_Percent/Role_Percent.Season_Year_No

#Drop unnecessary columns and clean up names
Role_Percent = Role_Percent[['Cov_Ret','Formation','Season_Year_Yes','Season_Year_No','Season_Year_Yes_Percent','Ratio']]
Role_Percent.columns = ['Cov_Ret','Formation','Concussion_Count','Not_Concussion_Percent','Concussion_Percent','Ratio']
display(Role_Percent)

The coverage team used the same formation on every concussion play. This formation is also by far the most popular one. From this point on, I will focus only on the return team's formation.
The return team used a wide variety of formations, but a few positions seemed important.

### Jammers & Front Blockers
Jammers look like they are important: formations with more than 2 jammers appear to be associated with more concussions.

Front blockers might be a problem as well, but the numbers are small (3). Teams rarely use a front blocker. If a rule increased the use of front blockers, that might be a problem; at the very least, such a rule would need to be evaluated in the preseason to determine whether it inadvertently increased concussions.

Let's rerun the analysis focusing only on the return team's positions and the number of players in each role. I ignore the returner because there was exactly 1 in all cases. (Theoretically, a team could use two punt returners, but I did not see that formation in the concussion data.)

In [None]:
#Fortunately I already have that data.  In hindsight, I should have looked at this first. 
RoleGroup_Ret = RoleGroup[(RoleGroup.Cov_Ret=='Return') & (RoleGroup.Group != 'Returner')]

#Very similar to above, but with a few tweaks
Role_Con = pd.merge(RoleGroup_Ret,video,on=['Season_Year','GameKey','PlayID'],how='left')
Role_Con.Concussion = Role_Con.Concussion.fillna('NO')

#Count plays by group, number of players in that group, and concussion flag
Role_Agg = Role_Con.groupby(['Group','Role','Concussion'])['Season_Year'].agg('count').reset_index()
Role_Agg.columns = ['Group','Role','Concussion','Count']
#Separate Yes and No's and join them to compare them.
#Many group/count combinations had no concussions.  We may revisit them later.
Role_Yes = Role_Agg[Role_Agg.Concussion=='YES']
Role_No = Role_Agg[Role_Agg.Concussion=='NO']

Role_Percent=pd.merge(Role_Yes,Role_No,on=['Group','Role'],suffixes=['_Yes','_No'],how='left')
Role_Percent['Count_Yes_Percent'] = Role_Percent.Count_Yes/len(video)
Role_Percent['Count_No_Percent'] = Role_Percent.Count_No/(len(play_information)-len(video))

#High numbers indicate player counts over-represented in concussion plays
Role_Percent['Ratio'] = Role_Percent.Count_Yes_Percent/Role_Percent.Count_No_Percent
Role_Percent=Role_Percent.rename(columns = {'Role':'Players'})
display(Role_Percent[['Group','Players','Count_Yes','Count_No','Ratio']])

Ratio compares the % of concussion plays with a given number of players in a group to the % of non-concussion plays with that number. Jammer and Front Block have the most interesting data. A low number of defensive linemen and linebackers would be associated with both additional jammers and using a front blocker. From here on, I will focus on Jammers.
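Because the concussion counts are small, each ratio is noisy. A minimal sketch of a ratio of two proportions with a Katz (log-scale) 95% confidence interval, a standard method that the kernel itself does not use; the counts in the usage line are hypothetical, not results from this dataset.

```python
import math


def ratio_with_ci(yes_count, n_yes, no_count, n_no, z=1.96):
    """Ratio of two proportions with a Katz log-scale confidence interval.

    yes_count / n_yes: concussion plays with the feature / all concussion plays
    no_count / n_no:   non-concussion plays with the feature / all non-concussion plays
    Assumes both counts are > 0 (the log interval is undefined at zero).
    """
    ratio = (yes_count / n_yes) / (no_count / n_no)
    se = math.sqrt(1 / yes_count - 1 / n_yes + 1 / no_count - 1 / n_no)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


# Hypothetical counts for illustration only
ratio, lower, upper = ratio_with_ci(5, 37, 300, 5963)
print(f"ratio {ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With only a handful of concussion plays behind the numerator, the interval is wide, which is a concrete way to see why ratios built on 3 or 5 plays should be read cautiously.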

### Side Data 
In this section, I'll compare balanced lines (equal numbers of players on the left and right) with unbalanced ones.
For each play, I take the side data and compute Left - Right to measure overloaded/lopsided lines:
* 0 = perfectly balanced
* +1 = one more player on the left
* -1 = one more player on the right

This also removes the problem with center players. A player in the center is not counted on either side, so he does not affect the balance.
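One way to compute Delta is a cross-tabulation of plays by side, which fills a side with no players as 0 instead of dropping the play. The toy rows below are hypothetical; only `Left` and `Right` enter the formula, so `Center` players are ignored as described above.

```python
import pandas as pd

# Toy rows: one row per DLine/Linebacker player, with the side he lined up on
side = pd.DataFrame({
    'PlayID': [1, 1, 1, 2, 2, 2],
    'Side':   ['Left', 'Right', 'Center', 'Left', 'Left', 'Left'],
})

# Players per play and side; a side with nobody on it is counted as 0
counts = pd.crosstab(side['PlayID'], side['Side'])
# Left minus Right; .get falls back to 0 if a side never appears at all
counts['Delta'] = counts.get('Left', 0) - counts.get('Right', 0)
print(counts)
```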

In [None]:
#Calculate how players are lined up
#Based on above will only look at Dline and Linebacker
SidePlayer = player[player.Group.isin(['DLine','Linebacker'])]
SideGroup = SidePlayer.groupby(['Season_Year','GameKey','PlayID','Side']).agg({'Role': 'count'}).reset_index()
Side_Left = SideGroup[SideGroup.Side=='Left']
Side_Right = SideGroup[SideGroup.Side=='Right']
#Note: the inner join keeps only plays with at least one DLine/Linebacker on each side
Side_Con = pd.merge(Side_Left,Side_Right,on=['Season_Year','GameKey','PlayID'],how='inner',suffixes=['_Left','_Right'])
Side_Con['Delta']= Side_Con.Role_Left-Side_Con.Role_Right
Side_Con = Side_Con[['Season_Year','GameKey','PlayID','Delta']]
Side_Con = pd.merge(Side_Con,video,on=['Season_Year','GameKey','PlayID'],how='left')
Side_Con.Concussion = Side_Con.Concussion.fillna('NO')

#Count plays by line imbalance (Delta) and concussion flag
Side_Agg = Side_Con.groupby(['Delta','Concussion'])['Season_Year'].agg('count').reset_index()
Side_Agg.columns = ['Delta','Concussion','Count']
#Separate Yes and No's and join them to compare them.
#Many Delta values had no concussions.  We may revisit them later.
Side_Yes = Side_Agg[Side_Agg.Concussion=='YES']
Side_No = Side_Agg[Side_Agg.Concussion=='NO']

Side_Percent=pd.merge(Side_Yes,Side_No,on=['Delta'],suffixes=['_Yes','_No'],how='left')
Side_Percent['Count_Yes_Percent'] = Side_Percent.Count_Yes/len(video)
Side_Percent['Count_No_Percent'] = Side_Percent.Count_No/(len(play_information)-len(video))

#High numbers indicate line imbalances over-represented in concussion plays
Side_Percent['Ratio'] = Side_Percent.Count_Yes_Percent/Side_Percent.Count_No_Percent
display(Side_Percent[['Delta','Count_Yes','Count_No','Ratio']])

* Breaking out by Delta is too granular. I will compare Delta == 0 (completely balanced) with Delta != 0 (not balanced).

In [None]:
Side_Con['Balanced'] = Side_Con.Delta == 0
Side_Agg = Side_Con.groupby(['Balanced','Concussion'])['Season_Year'].agg('count').reset_index()
Side_Agg.columns = ['Balanced','Concussion','Count']
#Separate Yes and No's and join them to compare them.
#Many formations had no concussions.  We may revisit them later.
Side_Yes = Side_Agg[Side_Agg.Concussion=='YES']
Side_No = Side_Agg[Side_Agg.Concussion=='NO']

Side_Percent=pd.merge(Side_Yes,Side_No,on=['Balanced'],suffixes=['_Yes','_No'],how='left')
Side_Percent['Count_Yes_Percent'] = Side_Percent.Count_Yes/len(video)
Side_Percent['Count_No_Percent'] = Side_Percent.Count_No/(len(play_information)-len(video))

#High numbers indicate line types over-represented in concussion plays
Side_Percent['Ratio'] = Side_Percent.Count_Yes_Percent/Side_Percent.Count_No_Percent
display(Side_Percent[['Balanced','Count_Yes','Count_No','Ratio']])

Balanced lines look slightly worse. A rule forcing a non-balanced line would be very different from all other formation rules.
With such sparse data, this small difference is not significant.
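To check a claim like this on a 2x2 table (balanced vs. not balanced, concussion vs. no concussion), one option is Fisher's exact test, which stays valid for small counts. The sketch below is a dependency-free version built on `math.comb`; `scipy.stats.fisher_exact` provides the same two-sided test if SciPy is available. The function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows: feature present / absent; columns: concussion / no concussion.
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability of x in the top-left cell given the fixed row/column totals
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so ties with the observed table are counted
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1)
                        if prob(x) <= observed * (1 + 1e-7)))
```

Usage would be `fisher_exact_two_sided(balanced_yes, balanced_no, unbalanced_yes, unbalanced_no)` with the counts from the table above (these variable names are placeholders). A large p-value means the difference is consistent with chance.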

## Game Data
A rule cannot address StadiumType, Turf, GameWeather, Temperature, or OutdoorWeather. If a particular turf or weather condition causes more concussions, the remedy would lie in equipment or stadium design rather than in the rules.

# Player Role Summary
The number of Jammers, the players who block gunners, appears to be important. Line imbalance does not.

# EDA - quick look at multiple small tables
Let's take a quick look at some variables that might be important. I'll use simple tables because the purpose of this analysis is to get a feel for the data. For numerical values, I'm using qcut with 5 bins. This approach bins the data using all 6000+ punts and then counts how many concussions fall into each bin. If all the bins are about the same, the variable is likely not important. If there are wide variations, it will be worth exploring.
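What does "about the same" look like for 37 concussions? A minimal simulation sketch, assuming concussions are unrelated to the variable so that each quintile is equally likely: it drops 37 events into 5 bins many times and records the gap between the fullest and the emptiest bin. The seed and trial count are arbitrary choices.

```python
import random


def max_bin_spread(n_events=37, n_bins=5, trials=10_000, seed=0):
    """Simulate how uneven equal-probability bins look purely by chance.

    Drops n_events into n_bins uniformly at random, records max - min bin
    count for each trial, and returns the sorted spreads.
    """
    rng = random.Random(seed)
    spreads = []
    for _ in range(trials):
        counts = [0] * n_bins
        for _ in range(n_events):
            counts[rng.randrange(n_bins)] += 1
        spreads.append(max(counts) - min(counts))
    return sorted(spreads)


spreads = max_bin_spread()
median = spreads[len(spreads) // 2]
p95 = spreads[int(0.95 * len(spreads))]
print(f"median spread: {median}, 95th percentile: {p95}")
```

If the gap between bins in one of the tables below is no larger than the typical simulated gap, the variable is probably not telling us much.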

In [None]:
#Combine the data to use
conc_player = pd.merge(video_review,player,on=['Season_Year','GameKey','PlayID','GSISID'],how='left')
conc_player = pd.merge(conc_player,play_information,on=['Season_Year','GameKey','PlayID'] ,how='left')
#conc_player.head()

In [None]:
display(conc_player.groupby('Player_Activity_Derived')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('Evenly split between blocking and tackling')

display(conc_player.groupby('Primary_Impact_Type')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('Most concussions involve hitting another helmet or body')

display(conc_player.groupby('Score_Diff_qcut')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('This is an unlikely variable, but I can get a sense of what random looks like for this dataset.')

display(conc_player.groupby('YardsforTD_qcut')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('Very evenly split.  Interesting that we can see that 80% of returns are <15 yards.  Only 20% are "big" returns. ')

display(conc_player.groupby('NetYards_qcut')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('Bad punt/punt coverage has slightly more concussions.')

display(conc_player.groupby('PuntYards_qcut')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('"Outkicking the coverage" is associated with long returns.  The longest punts are about equivalent for concussions.')

display(conc_player.groupby('Group')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('Offensive Line has by far the most concussions. Second is Wing, which is a very similar position. One factor to take into account is that there are 5 OL, 2 Wings, 2 gunners, and 1 Returner on each punt.')

display(conc_player.groupby('Position')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('Comparing this table with the previous table raises an obvious question: "What happened to the offensive linemen?" Not a single offensive lineman (by Position) received a concussion, even though 14 players lined up on the offensive line were concussed. Looking into this in more detail revealed that, to improve punt coverage, teams play non-linemen on the offensive line.')

display(conc_player.groupby('Cov_Ret')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('The coverage team has 3x the concussions of the return team. This finding was definitely unexpected. The difference is large enough that it is likely significant even at these low numbers.')

display(conc_player.groupby('Side')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('No difference seen.')

display(conc_player.groupby('Season_Type')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('There are 4 preseason games and 16 regular season games.  On a per game basis, preseason has twice the concussions of regular season.')

display(conc_player.groupby('Season_Year')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('Despite the 2017 rule change, no obvious progress')

display(conc_player.groupby('Friendly_Fire')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('Most concussions do not involve friendly fire')

display(conc_player.groupby('Penalty')['GSISID'].count().reset_index().rename(columns = {'GSISID':'count'}))
print('Over a quarter of concussions are on penalty plays.  I will review the video to determine if the penalty and concussion are related. ')

## EDA Summary
The quick EDA gave an idea of some variables to keep an eye on.  I'll use the videos to validate the data and then revisit. 
The ones that stood out to me are:
* **season_type** - preseason has more injuries than regular season
* **position_group** - The offensive line has the most concussions. Although these players line up on the offensive line, they are actually linebackers, cornerbacks, receivers, and running backs
* **coverage team** - players covering punts are 3x more likely to be concussed

# Load and Clean NGS Data
Before reviewing the videos, I will use NGS data to create diagrams of the plays and identify the speed at impact.
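For the speed part, a minimal sketch, assuming the NGS positions are sampled at 10 Hz and that `dis` is the distance in yards moved since the previous sample, so speed is `dis / 0.1` yards per second. The toy frame and the helper name `add_speed` are hypothetical.

```python
import pandas as pd

FRAME_SECONDS = 0.1                   # assumption: NGS positions sampled at 10 Hz
YARDS_PER_SEC_TO_MPH = 3600 / 1760    # 1760 yards per mile, 3600 seconds per hour


def add_speed(ngs: pd.DataFrame) -> pd.DataFrame:
    """Add speed columns derived from 'dis' (yards moved since the previous frame)."""
    out = ngs.copy()
    out['speed_yps'] = out['dis'] / FRAME_SECONDS
    out['speed_mph'] = out['speed_yps'] * YARDS_PER_SEC_TO_MPH
    return out


frames = pd.DataFrame({'GSISID': [1, 1, 1], 'dis': [0.5, 0.8, 0.3]})
speeds = add_speed(frames)
# Peak speed per player, e.g. as a rough proxy for speed going into an impact
print(speeds.groupby('GSISID')['speed_mph'].max())
```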

## Load NGS Data
To save space, I only load plays with concussions. I could save more space by loading only the concussed players' GSISIDs, but I want to see what the other players are doing during the play.
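If memory is tight, the filtering can also happen while reading. This sketch, assuming the play keys are `Season_Year`, `GameKey`, and `PlayID` as in the code below, reads a CSV in chunks with `pd.read_csv(..., chunksize=...)` and keeps only rows for the wanted plays; the helper name `read_filtered` and the toy CSV are illustrative.

```python
import io

import pandas as pd

KEYS = ['Season_Year', 'GameKey', 'PlayID']


def read_filtered(csv_source, wanted: pd.DataFrame, chunksize=100_000) -> pd.DataFrame:
    """Read a large CSV in chunks, keeping only rows whose play keys are in `wanted`."""
    wanted = wanted[KEYS].drop_duplicates()  # avoid duplicating rows for repeated plays
    parts = []
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # Inner merge drops rows for plays that are not in `wanted`
        parts.append(chunk.merge(wanted, on=KEYS, how='inner'))
    return pd.concat(parts, ignore_index=True)


csv_text = """Season_Year,GameKey,PlayID,GSISID,x
2016,1,10,100,5.0
2016,1,11,101,6.0
2016,2,10,102,7.0
"""
wanted = pd.DataFrame({'Season_Year': [2016], 'GameKey': [1], 'PlayID': [10]})
subset = read_filtered(io.StringIO(csv_text), wanted, chunksize=2)
print(subset)
```

Only one chunk is held in memory at a time, so peak memory depends on `chunksize` rather than on the full file.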

In [None]:
NGS_Key = video[['Season_Year','GameKey','PlayID']]
# gets all csv with NGS in their filename
path = "../input/NFL-Punt-Analytics-Competition/"
NGS_csvs = [path+file for file in os.listdir(path) if 'NGS' in file]

frames = [] # collect the filtered pieces, then concatenate once

# read each csv and keep only the concussion plays
for path_csv in NGS_csvs:
    _df = pd.read_csv(path_csv,low_memory=False)
    frames.append(pd.merge(NGS_Key,_df,how='left', on=['Season_Year','GameKey','PlayID']))
    del _df # deletes the _df to free up memory
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
NGS = pd.concat(frames,ignore_index=True)
del frames
NGS = NGS.dropna(subset=['GSISID','x','y','dis','o','dir'])

Note: 2 of the plays do not have NGS data. That will not have a major impact on the analysis.
The NGS data covers a long timespan for each play. For analysis, I will narrow the window to run from the line_set event to 2 seconds after the play ends (tackle, out of bounds, fair catch, touchdown, downed punt).
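A sketch of that narrowing step, assuming the end-of-play events are labeled `tackle`, `out_of_bounds`, `fair_catch`, `touchdown`, and `punt_downed` and the start is `line_set` (the event labels are an assumption about the NGS data). Plays missing either event are dropped. The helper name and toy frame are hypothetical.

```python
import pandas as pd

PLAY_KEYS = ['Season_Year', 'GameKey', 'PlayID']
# Assumed NGS event labels that mark the end of a punt play
END_EVENTS = {'tackle', 'out_of_bounds', 'fair_catch', 'touchdown', 'punt_downed'}


def trim_to_play_window(ngs: pd.DataFrame, pad_seconds=2.0) -> pd.DataFrame:
    """Keep frames from the first 'line_set' to pad_seconds after the first end event."""
    start = (ngs[ngs['Event'] == 'line_set']
             .groupby(PLAY_KEYS)['Time'].min().rename('WindowStart'))
    end = (ngs[ngs['Event'].isin(END_EVENTS)]
           .groupby(PLAY_KEYS)['Time'].min().rename('WindowEnd'))
    bounds = pd.concat([start, end], axis=1).reset_index()
    out = ngs.merge(bounds, on=PLAY_KEYS, how='inner')
    # Comparisons against a missing (NaT) bound are False, so such plays drop out
    in_window = ((out['Time'] >= out['WindowStart']) &
                 (out['Time'] <= out['WindowEnd'] + pd.Timedelta(seconds=pad_seconds)))
    return out[in_window].drop(columns=['WindowStart', 'WindowEnd'])


t0 = pd.Timestamp('2016-09-01 12:00:00')
toy = pd.DataFrame({
    'Season_Year': 2016, 'GameKey': 1, 'PlayID': 10,
    'Time': [t0 + pd.Timedelta(seconds=s) for s in [0, 1, 2, 6, 7, 8.5, 9]],
    'Event': [None, 'line_set', 'ball_snap', None, 'tackle', None, None],
})
trimmed = trim_to_play_window(toy)
print(trimmed['Time'] - t0)
```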

In [None]:
NGS['Time'] = pd.to_datetime(NGS.Time)
#sort_values returns a new DataFrame, so the result must be assigned
NGS = NGS.sort_values(['Season_Year','GameKey','PlayID','Time'])
NGS['Event'] = NGS.groupby(['Season_Year','GameKey','PlayID','Time'])['Event'].ffill()
NGS['Event'] = NGS.groupby(['Season_Year','GameKey','PlayID','Time'])['Event'].bfill()

In [None]:
def probplot(df,groupCol,varCol):
    """Plot the empirical CDF (dense percentile rank) of varCol, one line per value of groupCol."""
    g = df[[groupCol,varCol]].dropna().copy()
    #Percentile rank within each group; rank() keeps the original index, so it aligns directly
    g[varCol + '_pct'] = g.groupby(groupCol)[varCol].rank(pct=True,method='dense')
    g = g.sort_values([groupCol,varCol])
    traces = []
    for hue in sorted(g[groupCol].unique()):
        subset = g[g[groupCol]==hue]
        traces.append(go.Scatter(x=subset[varCol], y=subset[varCol + '_pct'],
                                 name=str(hue), showlegend=True))
    layout = go.Layout(autosize=False, width=500, height=500)
    fig = dict(data=traces, layout=layout)
    offline.iplot(fig)

In [None]:
#Attach the concussed player's GSISID (as GSISID_Conc) to every NGS row of that play.
#All NGS players are kept: an inner join with player would drop players missing from
#the role data (see the debugging note below).
Tackle = pd.merge(NGS,video[['Season_Year','GameKey','PlayID','GSISID']],
                  on=['Season_Year','GameKey','PlayID'],suffixes=['','_Conc'])

del NGS # frees up memory
#Run for Debugging
#GSISIDGroup = player.groupby(['GameKey','PlayID']).agg({'GSISID': 'count'}).reset_index()
#Note: GameKey=89, PLayID=4662 has 1 GSISID GameKey=319, PlayID=3019 has 16 GSISID

#Copy Concussed GSISID Data to a new column
Tackle['Conc_X'] = np.where(Tackle.GSISID==Tackle.GSISID_Conc,Tackle.x,np.nan)
Tackle['Conc_Y'] = np.where(Tackle.GSISID==Tackle.GSISID_Conc,Tackle.y,np.nan)
Tackle['Conc_Dis'] = np.where(Tackle.GSISID==Tackle.GSISID_Conc,Tackle.dis,np.nan)
Tackle['Conc_Dir'] = np.where(Tackle.GSISID==Tackle.GSISID_Conc,Tackle.dir,np.nan)

#Copy the concussed player's values to every row with the same play and time.
#Within each group only the concussed player's row is non-null, and
#transform('first') broadcasts the first non-null value to the whole group.
#GSISID_Conc is part of the key so plays with two concussions stay separate.
conc_keys = ['Season_Year','GameKey','PlayID','GSISID_Conc','Time']
for col in ['Conc_X','Conc_Y','Conc_Dis','Conc_Dir']:
    Tackle[col] = Tackle.groupby(conc_keys)[col].transform('first')

#How far each player is from the concussed player (the concussed player is always 0)
Tackle['Distance']= ((Tackle.x-Tackle.Conc_X)**2+(Tackle.y-Tackle.Conc_Y)**2)**0.5

#Sort so that each player's samples are in time order before differencing
Tackle.sort_values(['Season_Year','GameKey','PlayID','GSISID','Time'],inplace=True)
#Speed: change in distance to the concussed player per 0.1 s sample (negative = closing in)
Tackle['Speed'] = Tackle.groupby(['Season_Year','GameKey','PlayID','GSISID'])['Distance'].diff(1)
Tackle['Acceleration'] = Tackle.groupby(['Season_Year','GameKey','PlayID','GSISID'])['Speed'].diff(1)


#Calculate beginning of each play
minSeconds = Tackle.groupby(['Season_Year','GameKey','PlayID'])['Time'].min().reset_index()
minSeconds.columns = ['Season_Year','GameKey','PlayID','PlayStart']

#Merge with Tackle DF and calculate seconds for each play
Tackle = pd.merge(Tackle,minSeconds,on=['Season_Year','GameKey','PlayID'],how='left')
Tackle['seconds'] = (Tackle['Time']-Tackle.PlayStart).dt.total_seconds()

#Sort again
Tackle.sort_values(['Season_Year','GameKey','PlayID','GSISID','seconds'],inplace=True)

#delta seconds & shift_seconds are used to address gaps in NGS data
#If there is a gap in the previous 2 records, then acceleration will be wrong.
Tackle['delta_seconds'] = Tackle.groupby(['GameKey','PlayID','GSISID'])['seconds'].diff(1)
Tackle['shift_seconds'] = Tackle.groupby(['GameKey','PlayID','GSISID'])['delta_seconds'].shift(1)


#Flag rows that follow a gap of more than 0.15 s in either of the two previous intervals;
#these rows are dropped together with the speed outliers further down
Tackle['Good_time'] = np.where((Tackle['delta_seconds']>0.15)|(Tackle['shift_seconds']>0.15),False,True)

#Change in 'dis' (yards moved per sample) between consecutive samples, for the concussed
#player (Conc_) and for each player; the A/B columns hold the previous/next change
Tackle['Conc_DIS_diff'] = Tackle.groupby(['GameKey','PlayID','GSISID'])['Conc_Dis'].diff(1)
Tackle['Conc_DIS_diffA'] = Tackle.groupby(['GameKey','PlayID','GSISID'])['Conc_DIS_diff'].shift(1)
Tackle['Conc_DIS_diffB'] = Tackle.groupby(['GameKey','PlayID','GSISID'])['Conc_DIS_diff'].shift(-1)
Tackle['DIS_diff'] = Tackle.groupby(['GameKey','PlayID','GSISID'])['dis'].diff(1)
Tackle['DIS_diffA'] = Tackle.groupby(['GameKey','PlayID','GSISID'])['DIS_diff'].shift(1)
Tackle['DIS_diffB'] = Tackle.groupby(['GameKey','PlayID','GSISID'])['DIS_diff'].shift(-1)
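Broadcasting one player's value to every row that shares a timestamp can be done either with a group-wise forward fill followed by a group-wise back fill, or with `groupby(...).transform('first')`, which takes the first non-null value in each group. The self-contained check below uses made-up rows (the IDs and coordinates are hypothetical) to show that the two agree when each group holds at most one non-null value.

```python
import numpy as np
import pandas as pd

# Made-up tracking rows: two timestamps, three players; player 7 plays the concussed player
df = pd.DataFrame({
    'Time':   [0, 0, 0, 1, 1, 1],
    'GSISID': [5, 7, 9, 5, 7, 9],
    'x':      [10.0, 20.0, 30.0, 11.0, 21.0, 31.0],
})
conc_id = 7
df['Conc_X'] = np.where(df.GSISID == conc_id, df.x, np.nan)

# Approach 1: group-wise forward fill, then a separate group-wise back fill
# (chaining .bfill() straight onto the ffill result would fill across groups)
filled = df.groupby('Time')['Conc_X'].ffill()
filled = filled.groupby(df['Time']).bfill()

# Approach 2: broadcast the first non-null value of each group
broadcast = df.groupby('Time')['Conc_X'].transform('first')

print(filled.tolist())
print(broadcast.tolist())
```

If a group could ever contain two different non-null values, the two approaches would no longer be guaranteed to match, so the one-row-per-player-per-timestamp assumption matters.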

During video analysis, I noticed that the NGS data showed players moving implausibly fast while they were buried under piles of other players. To fix this problem, I looked at the change in speed between samples and removed the outliers, using a probplot to find them.
Since many people are not familiar with probplots or Q-Q plots, here is a brief overview. The `probplot` function defined above sorts each group's values and plots every value (x-axis) against its percentile rank within the group (y-axis), which is essentially a cumulative histogram. Putting many groups on one chart makes it easy to compare their distributions and to spot outliers, which show up as long, nearly flat tails at the ends of a curve. In a normal Q-Q plot, where the percentile axis is scaled to normal quantiles, normally distributed data forms a straight line. For more info, see https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot
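A minimal sketch of the normal Q-Q idea, using only NumPy and the standard library: sort the data, pair each value with the normal quantile at its plotting position, and measure how straight the resulting line is with a correlation coefficient. The plotting-position formula `(i - 0.5) / n` is one common convention, and the helpers `normal_qq_points` and `qq_straightness` are hypothetical, not part of this notebook.

```python
import numpy as np
from statistics import NormalDist

def normal_qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    observed = np.sort(np.asarray(values, dtype=float))
    n = len(observed)
    # Plotting positions (i - 0.5) / n for i = 1..n, one common convention
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, observed

def qq_straightness(values):
    """Correlation of the Q-Q points; 1.0 means a perfectly straight line."""
    theoretical, observed = normal_qq_points(values)
    return float(np.corrcoef(theoretical, observed)[0, 1])

rng = np.random.default_rng(0)
normal_data = rng.normal(loc=0.0, scale=1.0, size=500)
skewed_data = rng.exponential(scale=1.0, size=500)
print(qq_straightness(normal_data))
print(qq_straightness(skewed_data))
```

For a normally distributed sample the correlation is close to 1, while skewed data such as the exponential sample bends away from the line and scores lower.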

In [None]:
Image("../input/images2/threegraphs.png")

The above image shows the exact same data in 3 different graphs: probplot, stacked histogram, and boxplot. The probplot provides more detail than the other two and can show many more groups without losing information.
As an added bonus, it also reveals the shape of each distribution, such as whether it looks normal. The dummy data was randomly generated, and that is obvious only from the probplot.

In [None]:
probplot(Tackle,'GameKey','DIS_diff')

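From the probplot it is obvious that there are severe outliers. I set the limits at -0.2 and 0.2: a row is removed if its change in `dis`, or the change just before or after it, falls outside that range for either the player or the concussed player.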
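The ±0.2 cutoff can be sketched on a single made-up track: compute the sample-to-sample change in `dis`, look at the previous and next change as well, and keep a row only if none of the three exceeds the cutoff. This mirrors the idea behind the `DIS_diff`, `DIS_diffA` and `DIS_diffB` columns, but the data and the helper name `flag_jumps` are hypothetical.

```python
import pandas as pd

def flag_jumps(dis, limit=0.2):
    """True for rows whose own, previous, and next change in `dis` all stay within +/-limit."""
    change = dis.diff(1)
    window = pd.concat([change, change.shift(1), change.shift(-1)], axis=1)
    # NaN comparisons evaluate to False, so missing neighbours never trigger removal
    return ~(window.abs() > limit).any(axis=1)

# One made-up player track; the 1.5 yard value mimics a player "jumping" under a pile
dis = pd.Series([0.50, 0.55, 0.60, 1.50, 0.62, 0.60, 0.58])
good = flag_jumps(dis)
print(good.tolist())
```

A single bad sample removes itself and its neighbours, because the jump shows up in the change into the bad sample and the change out of it.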

In [None]:

#Drop a row if the player's or the concussed player's change in 'dis' (current, previous,
#or next sample) falls outside +/-0.2; NaN comparisons evaluate to False
dis_cols = ['DIS_diff','DIS_diffA','DIS_diffB','Conc_DIS_diff','Conc_DIS_diffA','Conc_DIS_diffB']
Tackle['Good_dis'] = ~(Tackle[dis_cols].abs() > 0.2).any(axis=1)
Tackle = Tackle[Tackle.Good_time & Tackle.Good_dis].copy()
#Next sample's speed for the same player (shift within the group, not across players)
Tackle['Shift_Speed'] = Tackle.groupby(['Season_Year','GameKey','PlayID','GSISID'])['Speed'].shift(-1)
#A sign change in Speed marks a local minimum or maximum of the distance; within 5 yards,
#record the speed there as the hit speed (yards per 0.1 s sample)
Tackle['Hit_Speed'] = np.where((Tackle['Shift_Speed']*Tackle['Speed']<0) &
                               (Tackle['Distance']<5), abs(Tackle['Speed']), 0)

In [None]:
probplot(Tackle,'GameKey','DIS_diff')

After removing the outliers, the probplots look much cleaner.

In [None]:

#Add Partner where available
conc_player.Primary_Partner_GSISID = pd.to_numeric(conc_player.Primary_Partner_GSISID,errors='coerce') 
conc_pair = pd.merge(conc_player,player,how='left', left_on=['Season_Year','GameKey','PlayID','Primary_Partner_GSISID'],
                    right_on=['Season_Year','GameKey','PlayID','GSISID'], suffixes=('','_Partner'))

#Add in Jammers based on EDA
Role_Add = Role_Con[Role_Con.Group=='Jammer'][['Season_Year','GameKey','PlayID','Role']].rename(columns = {'Role':'Jammers'})
conc_pair = pd.merge(conc_pair,Role_Add, on=['Season_Year','GameKey','PlayID'], how='left')

#Add in Balance based on EDA
conc_pair = pd.merge(conc_pair,Side_Con[['Season_Year','GameKey','PlayID','Balanced']], on=['Season_Year','GameKey','PlayID'], how='left')

conc_pair = pd.merge(conc_pair,video[['Season_Year','GameKey','PlayID','URL']], on=['Season_Year','GameKey','PlayID'], how='left')

# Videos and Graphs

## Graph Layout
I reused the field layout from the starter kernel because it works well.


In [None]:
def load_layout():
    """
    Returns a dict for a football-themed Plot.ly layout
    """
    def field_line(x0, y0, x1, y1, width=None):
        # White line drawn beneath the player traces
        line = dict(color='white')
        if width is not None:
            line['width'] = width
        return dict(type='line', layer='below', x0=x0, y0=y0, x1=x1, y1=y1, line=line)

    # Sidelines (the field is 53.3 yards wide)
    shapes = [field_line(0, 0, 120, 0, width=2),
              field_line(0, 53.3, 120, 53.3, width=2)]
    # Goal lines at x=10 and x=110 (thick) and yard lines every 10 yards in between
    for x in range(10, 111, 10):
        shapes.append(field_line(x, 0, x, 53.3, width=10 if x in (10, 110) else None))

    axis_style = dict(autorange=False, showgrid=False, zeroline=False, showline=True,
                      linecolor='black', linewidth=1, mirror=True, ticks='')
    layout = dict(
        title = "Player Activity",
        plot_bgcolor='darkseagreen',
        showlegend=False,
        width=640,
        height=400,
        margin = dict(t=1),
        xaxis=dict(range=[0, 120],
                   tickmode='array',
                   tickvals=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110],
                   ticktext=['Goal', 10, 20, 30, 40, 50, 40, 30, 20, 10, 'Goal'],
                   showticklabels=True,
                   **axis_style),
        yaxis=dict(title='', range=[-3.3, 56.3], showticklabels=False, **axis_style),
        shapes=shapes
    )
    return layout

end_where = ((Tackle.Event=='tackle') | 
             (Tackle.Event=='punt_downed') |
             (Tackle.Event=='out_of_bounds') |
             (Tackle.Event=='touchdown') |
             (Tackle.Event=='fair_catch'))

In [None]:
# Loading and plotting functions
def plot_play(play_number):
    """
    Plots player movements on the field for one concussion play (row play_number of
    conc_pair), highlighting the concussed player and, where available, the partner
    """
    GSISID = conc_pair.loc[play_number]['GSISID']
    GameKey = conc_pair.loc[play_number]['GameKey']
    PlayID = conc_pair.loc[play_number]['PlayID']
    play_description = conc_pair.iloc[play_number]["PlayDescription"]
    URL = conc_pair.iloc[play_number]["URL"]
    Second = conc_pair.loc[play_number]['GSISID_Partner']
    
    print ('Season_Type:', conc_pair.iloc[play_number]['Season_Type'],
           '  Season_Year:', conc_pair.iloc[play_number]['Season_Year'],
           '  GameKey:', conc_pair.iloc[play_number]['GameKey'],
           '  PlayID:', conc_pair.iloc[play_number]['PlayID'],
           '  Jammers:', conc_pair.iloc[play_number]['Jammers'],
           '  Balanced:', conc_pair.iloc[play_number]['Balanced'], 
           '  Penalty:', conc_pair.iloc[play_number]['Penalty'])
    display ()
    print ("GSISID:", GSISID,
               "  Role:", conc_pair.iloc[play_number]['Group'],
               "  Number:", conc_pair.iloc[play_number]['Number'],
               "  Position:", conc_pair.iloc[play_number]['Position'],
               "  Activity:", conc_pair.iloc[play_number]['Player_Activity_Derived'])
    if pd.isna(Second)==False:
        display()
        print ("Partner GSISID:", Second,
           "  Role:", conc_pair.iloc[play_number]['Group_Partner'],
           "  Number:", conc_pair.iloc[play_number]['Number_Partner'],
           "  Position:", conc_pair.iloc[play_number]['Position_Partner'],
           "  Activity:", conc_pair.iloc[play_number]['Primary_Partner_Activity_Derived'],
           "  Friendly Fire:", conc_pair.iloc[play_number]['Friendly_Fire'])
    display (play_description)

    game_df = Tackle[(Tackle.PlayID==PlayID) & (Tackle.GameKey==GameKey)].sort_values("Time")
    playstart = game_df[game_df.Event == "line_set"]["Time"].min()

    end_where = ((game_df.Event=='tackle') | 
             (game_df.Event=='punt_downed') |
             (game_df.Event=='out_of_bounds') |
             (game_df.Event=='touchdown') |
             (game_df.Event=='fair_catch'))
    
    playend = game_df[end_where]["Time"].min() + pd.to_timedelta(2, unit='s')
    
    game_df = game_df[(game_df.Time > playstart) & (game_df.Time < playend)]

    #Fall back to the whole play if the line_set or end events are missing
    if len(game_df)==0:
        game_df = Tackle[(Tackle.PlayID==PlayID) & (Tackle.GameKey==GameKey)].sort_values("Time")
    #Work on a copy so the column added below does not modify a slice of Tackle
    game_df = game_df.copy()
    
    traces=[]   
    game_df['Delta'] = game_df.Time - game_df.Time.min()
    game_df.Delta = game_df.Delta.dt.total_seconds()   
    
    playerid = int(GSISID)
    playernumber = player[(player.GSISID==playerid)&(player.PlayID==PlayID) & 
                              (player.GameKey==GameKey)]['Number'].values[0]
    playerGroup = player[(player.GSISID==playerid)&(player.PlayID==PlayID) & 
                              (player.GameKey==GameKey)]['Group'].values[0]
    trace = go.Scatter(
        x = game_df[game_df.GSISID==playerid].x,
        y = game_df[game_df.GSISID==playerid].y,
        name ='Position: '+str(playerGroup) + ' Number: '+str(playernumber),
        mode='markers',
        marker = dict(
        size = np.minimum(game_df[game_df.GSISID==playerid].Delta+6,10),
            color = 'rgba(255,255,0, .8)'))
    traces.append(trace)
    
    #Partner
    if pd.isna(Second)==False:
        playerid = int(Second)
        playernumber = player[(player.GSISID==playerid)&(player.PlayID==PlayID) & 
                                  (player.GameKey==GameKey)]['Number'].values[0]
        playerGroup = player[(player.GSISID==playerid)&(player.PlayID==PlayID) & 
                                  (player.GameKey==GameKey)]['Group'].values[0]
        trace = go.Scatter(
            x = game_df[game_df.GSISID==playerid].x,
            y = game_df[game_df.GSISID==playerid].y,
            name ='Position: '+str(playerGroup) + ' Number: '+str(playernumber),
            mode='markers',
            marker = dict(
            size = np.minimum(game_df[game_df.GSISID==playerid].Delta+6,10),
                color = 'rgba(0,255,0, .8)'))
        traces.append(trace)
    
    #get coverage    
    for playerid in pd.unique(player[(player.PlayID==PlayID) & (player.GameKey==GameKey) & 
                                    (player.Cov_Ret=='Coverage')]['GSISID']):
        playerid = int(playerid)
        playernumber = player[(player.GSISID==playerid)&(player.PlayID==PlayID) & 
                              (player.GameKey==GameKey)]['Number'].values[0]
        playerGroup = player[(player.GSISID==playerid)&(player.PlayID==PlayID) & 
                              (player.GameKey==GameKey)]['Group'].values[0]
        trace = go.Scatter(
            x = game_df[game_df.GSISID==playerid].x,
            y = game_df[game_df.GSISID==playerid].y,
            name ='Position: '+str(playerGroup) + ' Number: '+str(playernumber),
            mode='markers',
            marker = dict(
            size = 2,
            color = 'rgba(0, 0, 255, .8)'))
        traces.append(trace)
    
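    #get return team players
    for playerid in pd.unique(player[(player.PlayID==PlayID) & (player.GameKey==GameKey) & 
                                     (player.Cov_Ret=='Return')]['GSISID']):
        playerid = int(playerid)
        #Look up this player's own number and group for the legend
        playernumber = player[(player.GSISID==playerid)&(player.PlayID==PlayID) & 
                              (player.GameKey==GameKey)]['Number'].values[0]
        playerGroup = player[(player.GSISID==playerid)&(player.PlayID==PlayID) & 
                              (player.GameKey==GameKey)]['Group'].values[0]
        trace = go.Scatter(
            x = game_df[game_df.GSISID==playerid].x,
            y = game_df[game_df.GSISID==playerid].y,
            name ='Position: '+str(playerGroup) + ' Number: '+str(playernumber),
            mode='markers',
            marker = dict(
            size = 2,
            color = 'rgba(255,0,0, .8)'))
        traces.append(trace)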

    
    layout = load_layout()
    fig = dict(data=traces, layout=layout)
    offline.iplot(fig, config=config)

    return URL


In [None]:
# Loading and plotting functions
def plot_acceleration(play_number):
    """
    Plots hit speed (MPH) vs time for each player whose maximum hit speed on the
    play exceeds 0.1 yards per 0.1 s sample
    """
    GSISID = conc_pair.loc[play_number]['GSISID']
    GameKey = conc_pair.loc[play_number]['GameKey']
    PlayID = conc_pair.loc[play_number]['PlayID']   
    
    game_df = Tackle[(Tackle.PlayID==PlayID) & (Tackle.GameKey==GameKey)].sort_values("Time")
    playstart = game_df[game_df.Event == "line_set"]["Time"].min()

    end_where = ((game_df.Event=='tackle') | 
             (game_df.Event=='punt_downed') |
             (game_df.Event=='out_of_bounds') |
             (game_df.Event=='touchdown') |
             (game_df.Event=='fair_catch'))
    
    playend = game_df[end_where]["Time"].min() + pd.to_timedelta(2, unit='s')
    
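    game_df = game_df[(game_df.Time > playstart) & (game_df.Time < playend)]

    #Fall back to the whole play if the line_set or end events are missing
    if len(game_df)==0:
        game_df = Tackle[(Tackle.PlayID==PlayID) & (Tackle.GameKey==GameKey)].sort_values("Time")
    #Work on a copy so the column added below does not modify a slice of Tackle
    game_df = game_df.copy()
    game_df['Delta'] = game_df.Time - game_df.Time.min()
    game_df.Delta = game_df.Delta.dt.total_seconds()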
    Fast_Hits = game_df.groupby('GSISID')['Hit_Speed'].max().reset_index()
    Fast_Hits = Fast_Hits[Fast_Hits.Hit_Speed> 0.1]
    if len(Fast_Hits)==0:
        print("No Fast Hits")
        return
    traces=[]  
    for playerid in pd.unique(Fast_Hits.GSISID):
        playerid = int(playerid)
        playernumber = player[(player.GSISID==playerid)&(player.PlayID==PlayID) & 
                              (player.GameKey==GameKey)]['Number'].values[0]
        playerGroup = player[(player.GSISID==playerid)&(player.PlayID==PlayID) & 
                              (player.GameKey==GameKey)]['Group'].values[0]

        trace = go.Scatter(
            x = game_df[game_df.GSISID==playerid].Delta,
            #yards per 0.1 s sample -> MPH: x10 samples/s, x3600 s/h, /1760 yards/mile
            y = game_df[game_df.GSISID==playerid].Hit_Speed*36000/1760,
            name ='Position: '+str(playerGroup) + ' Number: '+str(playernumber),
            mode='lines+markers',
            marker = dict(size = 10))
        traces.append(trace)
    layout = go.Layout(
        title='Hit Speed vs Time',
        width=800,
        height=350,
        showlegend=True,
        yaxis=dict(title='Hit Speed (MPH)'))
    fig = go.Figure(data=traces,layout=layout)
    offline.iplot(fig)
    return


## Main Graph
The code below was integral to my video analysis. This view included all the information I needed to review each video: the time, speed, location, and participants for each concussion, plus where everyone else on the field was during the play.

Below is an example of a play.  

### Text
The first 4 lines of text give all the key information about the play, including the variables I flagged in the EDA above.

### Play Diagram
The concussed player is shown in yellow and the partner in green. Blue indicates coverage (punting team) players and red indicates return team players. Dots are 0.1 seconds apart, so tightly packed dots indicate a slow-moving player and widely spaced dots indicate high speed.

### Impact Graph
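For the concussed player, the distance to every other player was calculated using the x,y data. The change in that distance between samples was then used as speed. The impact speed is the speed at a sample where it changes sign, i.e. where the distance stops shrinking and starts growing (closest approach) or vice versa, counted only when the two players are within 5 yards of each other. One side effect of this approach was that near misses and collisions were both recorded.

Distance Between Players: $ D = \sqrt{(X_1 - X_2)^2 + (Y_1 - Y_2)^2} $

Speed (yards per second; samples are 0.1 s apart): $ (D_{t=2}-D_{t=1})/0.1 $

Impact Speed: Speed at the sample where it changes sign, when Distance < 5 yards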
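A minimal NumPy sketch of the impact-speed calculation for two made-up tracks: distance from the formula above, speed from successive distances divided by the 0.1 s sample interval, and an impact recorded wherever the speed changes sign while the players are within 5 yards. The 5-yard cutoff comes from the `Hit_Speed` code earlier in this notebook; the function name `impact_speeds` and the data are hypothetical.

```python
import numpy as np

SAMPLE_INTERVAL_S = 0.1  # NGS samples are 0.1 s apart

def impact_speeds(x1, y1, x2, y2, max_distance=5.0):
    """Speeds (yards/s) at samples where the distance speed changes sign within max_distance yards."""
    d = np.hypot(np.asarray(x1, float) - np.asarray(x2, float),
                 np.asarray(y1, float) - np.asarray(y2, float))
    speed = np.diff(d) / SAMPLE_INTERVAL_S  # speed[i] = (d[i+1] - d[i]) / 0.1
    hits = []
    for i in range(len(speed) - 1):
        # Sign change: d[i+1] is a local minimum (closest approach) or maximum
        if speed[i] * speed[i + 1] < 0 and d[i + 1] < max_distance:
            hits.append(float(abs(speed[i])))
    return hits

# Made-up tracks: player 1 runs along y=0 at 1 yard per sample past a stationary player 2
x1 = [0, 1, 2, 3, 4, 5, 6]
y1 = [0] * 7
x2 = [3.4] * 7
y2 = [0] * 7
print(impact_speeds(x1, y1, x2, y2))
```

The pass-by records one impact at about 10 yards per second even though the players never touch, which is the near-miss side effect described above.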

The speed was converted to miles per hour to give a common frame of reference. As a quick refresher, 6 MPH is a 10-minute mile, or a light jog; 10 MPH is a 6-minute mile, or a run; 15 MPH and above is a sprint. Because this speed measures how fast the distance between two elite athletes is changing, there are values of 30 MPH or above. That would occur when two players were running at full speed directly at each other.
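The conversion from tracking units to miles per hour is a short chain of unit factors: yards per 0.1 s sample, times 10 samples per second, times 3,600 seconds per hour, divided by 1,760 yards per mile. That product is the `36000/1760` factor used in `plot_acceleration` above. The helper names below are hypothetical.

```python
YARDS_PER_MILE = 1760
SECONDS_PER_HOUR = 3600

def yards_per_sample_to_mph(yards_per_sample, sample_interval_s=0.1):
    """Convert a per-sample distance change (yards per 0.1 s) to miles per hour."""
    yards_per_second = yards_per_sample / sample_interval_s
    return yards_per_second * SECONDS_PER_HOUR / YARDS_PER_MILE

def mph_to_minutes_per_mile(mph):
    """Running pace for a given speed, as in the 6 MPH = 10-minute-mile refresher."""
    return 60.0 / mph

print(yards_per_sample_to_mph(1.0))   # the 36000/1760 factor, roughly 20.45 MPH
print(mph_to_minutes_per_mile(6))     # 10.0
print(mph_to_minutes_per_mile(10))    # 6.0
```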

The x-axis shows time in seconds, where 0 corresponds to the line_set event in the NGS data. I used that event because it aligned very closely with most of the videos.
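Aligning every play to the `line_set` event can be sketched with a group-wise transform: find the first `line_set` timestamp per play and subtract it from every sample. This sketch subtracts the `line_set` time itself, whereas the plotting functions above start the clock at the first sample after it, a difference of about one 0.1 s sample. The column names follow the NGS data used here; the rows are made up.

```python
import pandas as pd

rows = pd.DataFrame({
    'GameKey': [1, 1, 1, 1, 2, 2, 2],
    'PlayID':  [10, 10, 10, 10, 20, 20, 20],
    'Time': pd.to_datetime([
        '2017-09-10 13:00:00.0', '2017-09-10 13:00:00.1',
        '2017-09-10 13:00:00.2', '2017-09-10 13:00:00.3',
        '2017-09-17 16:00:05.0', '2017-09-17 16:00:05.1',
        '2017-09-17 16:00:05.2']),
    'Event': [None, 'line_set', None, 'ball_snap', 'line_set', None, 'punt'],
})

# Timestamp of the first line_set event in each play (NaT where a play has none)
line_set_time = rows['Time'].where(rows['Event'] == 'line_set')
rows['LineSet'] = line_set_time.groupby([rows['GameKey'], rows['PlayID']]).transform('min')
rows['seconds_from_line_set'] = (rows['Time'] - rows['LineSet']).dt.total_seconds()
print(rows[['GameKey', 'PlayID', 'Event', 'seconds_from_line_set']])
```

Samples recorded before `line_set` get negative times, which makes it easy to drop pre-snap noise with a simple filter.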

### Video
The URL and the video are provided so that the video can be watched either in the notebook or on a second monitor.

### Running the cell
Each of the 37 plays can be displayed by passing its row number. I did this to make sure I did not overlook any play. A for loop can also run all 37 plays; for submission, I did not run the loop because I did not want to overwhelm reviewers.

In [None]:
#for play_no in range(len(conc_pair)):   # loop over all 37 plays
play_no=12
URL=plot_play(play_no)
plot_acceleration(play_no)
print(URL)
HTML("""<video width="840" height="350" controls><source src="{0}" type="video/mp4"></video>""".format(URL))

### Speed vs. concussions

In [None]:
#Keep hits faster than 0.4 yards per 0.1 s sample (about 8 MPH) and mark whether the
#other player was the concussed player's primary partner
SpeedChart = pd.merge(Tackle[(Tackle.Hit_Speed>0.4)],conc_pair[['Season_Year','GameKey','PlayID','Primary_Partner_GSISID']],
                      how='left',left_on=['Season_Year','GameKey','PlayID','GSISID'],
                       right_on=['Season_Year','GameKey','PlayID','Primary_Partner_GSISID'])
SpeedChart['Partner'] = np.where(pd.isna(SpeedChart.Primary_Partner_GSISID),'NotPartner','Partner')
#Convert yards per 0.1 s sample to MPH
SpeedChart['Hit_Speed'] = SpeedChart['Hit_Speed']*36000/1760 

In [None]:
probplot(SpeedChart,'Partner','Hit_Speed')

The probplot shows that collision speed is not a key differentiator between concussions and non-concussions: collisions that caused concussions and those that did not had the same range of speeds. I had expected speed to be an important variable. After watching the videos alongside the hit speed chart, I found that even relatively low-speed hits can cause concussions. The most consistent problem I found was group tackles: with multiple bodies colliding, one player would get hit in the head.

### Video Review Key Summary
There were many small groups of reasons for concussions:
1. Concussions covered by current rules
2. Concussions not during a punt return (fakes and before the punt return)
3. Group tackles
4. Secondary collisions: one player would be blocked into another
5. Slipping
6. Punt muffs
7. Short punts: the returner would catch the ball on the run and be immediately hit

**Notably, speed of collision and blindside blocks were not a problem.**

For further analysis, I grouped the data into three groups of about equal size:
* Exclude from further analysis: already covered and not punt returns
* Group tackles
* Everything else

# Punt return concussions not covered by existing rules
* The 2018 rulebook was used to determine whether a penalty should have been called. Most of the violations were flagged by the on-field officials, and I deferred to their judgment. There is a very fine distinction between a legal block and an illegal block, so I was conservative: I only added calls that were obvious and, preferably, were not yet in the official rules at the time of the play. On one play, I saw what appeared to be a missed illegal block. That block happened well before the punt returner received the ball. Because that one play violated both of my conditions, I felt comfortable removing it.

* 5 plays were removed because the concussion did not occur on a punt return: one was on a fake, three were before the punt, and one is the play mentioned above.
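Dropping the excluded plays amounts to a left anti-join. The notebook does this by merging the exclusion table onto the concussions and keeping rows whose `Reason` is missing; the sketch below shows the same idea on made-up rows using pandas' `indicator=True` option, which labels each row with where it matched and so does not depend on `Reason` being filled in. The IDs and reason text are hypothetical.

```python
import pandas as pd

keys = ['Season_Year', 'GameKey', 'PlayID']
# Made-up concussion plays and a made-up exclusion list
concussions = pd.DataFrame({'Season_Year': [2016, 2016, 2017],
                            'GameKey': [1, 2, 3],
                            'PlayID': [100, 200, 300],
                            'GSISID': [11, 22, 33]})
excluded = pd.DataFrame({'Season_Year': [2016], 'GameKey': [2], 'PlayID': [200],
                         'Reason': ['Covered by current rules']})

# indicator=True adds a _merge column: 'both' if the play is on the exclusion list
merged = concussions.merge(excluded, on=keys, how='left', indicator=True)
remaining = merged[merged['_merge'] == 'left_only'].drop(columns=['_merge', 'Reason'])
print(remaining)
```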


In [None]:
bad = pd.read_csv('../input/extradata/Existing_Penalties.csv')
display(bad)
Explained = pd.merge(bad,conc_player,on=['Season_Year','GameKey','PlayID'],how='right')
print('The table shows the specific reason each play was excluded from further analysis')

In [None]:
Explained=Explained[pd.isna(Explained.Reason)]

In [None]:
def Display_Explained(strColumn):
    #Count concussions per category after cleaning (Clean) and before cleaning (Raw)
    clean = Explained.groupby(strColumn)['GSISID'].count().reset_index().rename(columns = {'GSISID':'Clean'})
    data = conc_player.groupby(strColumn)['GSISID'].count().reset_index().rename(columns = {'GSISID':'Raw'})
    #Right join keeps categories whose concussions were all removed
    merged = pd.merge(clean,data,on=strColumn,how='right')
    merged['Clean'] = merged['Clean'].fillna(0).astype(int)
    merged['Removed']=merged.Raw-merged.Clean
    display(merged)
    

The analysis above is repeated with the cleaned dataset.  After cleaning the data, are there different patterns?

In [None]:
Display_Explained('Player_Activity_Derived')
print("Tackling is the biggest remaining issue. Looking at the videos, the main problem was multiple players tackling at once, which caused players' heads to bump into each other. I cannot think of a rule that would prevent group tackles and maintain the integrity of the game.")

Display_Explained('Primary_Impact_Type')
print('Helmet-to-helmet hits were the most reduced by rules changes')

Display_Explained('YardsforTD_qcut')
print('Almost all concussions occur when the punter is punting for distance')

Display_Explained('Cov_Ret')
print('The initial finding of 3x more injuries for the coverage team is unchanged.')

Display_Explained('Season_Type')
print('The problem with preseason games appears even worse after cleaning the data. Remember there are 4x more regular season games than preseason games.')

Display_Explained('Season_Year')
print('2017 now looks much worse than 2016 but the numbers are so low the difference is not meaningful')

Display_Explained('Penalty')
print('5 concussions occurred on plays that were nullified for other penalties.')

In [None]:
#Calculate how players are lined up
#Based on above will only look at Dline and Linebacker
SidePlayer = player[player.Group.isin(['DLine','Linebacker'])]
SideGroup = SidePlayer.groupby(['Season_Year','GameKey','PlayID','Side']).agg({'Role': 'count'}).reset_index()
SideGroup = pd.merge(bad,SideGroup,on=['Season_Year','GameKey','PlayID'],how='right')
SideGroup = SideGroup[pd.isna(SideGroup.Reason)]
Side_Left = SideGroup[SideGroup.Side=='Left']
Side_Right = SideGroup[SideGroup.Side=='Right']
Side_Con = pd.merge(Side_Left,Side_Right,on=['Season_Year','GameKey','PlayID'],how='inner',suffixes=['_Left','_Right'])
Side_Con['Delta']= Side_Con.Role_Left-Side_Con.Role_Right
Side_Con = Side_Con[['Season_Year','GameKey','PlayID','Delta']]
Side_Con = pd.merge(Side_Con,video,on=['Season_Year','GameKey','PlayID'],how='left')
Side_Con.Concussion = Side_Con.Concussion.fillna('NO')

#Count plays by line imbalance (Delta) and concussion outcome
Side_Agg = Side_Con.groupby(['Delta','Concussion'])['Season_Year'].agg('count').reset_index()
Side_Agg.columns = ['Delta','Concussion','Count']
#Separate Yes and No's and join them to compare them.
#Many formations had no concussions.  We may revisit them later.
Side_Yes = Side_Agg[Side_Agg.Concussion=='YES']
Side_No = Side_Agg[Side_Agg.Concussion=='NO']

Side_Percent=pd.merge(Side_Yes,Side_No,on=['Delta'],suffixes=['_Yes','_No'],how='left')
Side_Percent['Count_Yes_Percent'] = Side_Percent.Count_Yes/len(Explained)
Side_Percent['Count_No_Percent'] = Side_Percent.Count_No/(len(play_information)-len(Explained))

#Ratio > 1: the formation is more common on concussion plays than on other plays
Side_Percent['Ratio'] = Side_Percent.Count_Yes_Percent/Side_Percent.Count_No_Percent
Side_Percent
Side_Con['Balanced']=np.where(Side_Con.Delta==0,True,False)
Side_Agg = Side_Con.groupby(['Balanced','Concussion'])['Season_Year'].agg('count').reset_index()
Side_Agg.columns = ['Balanced','Concussion','Count']
#Separate Yes and No's and join them to compare them.
#Many formations had no concussions.  We may revisit them later.
Side_Yes = Side_Agg[Side_Agg.Concussion=='YES']
Side_No = Side_Agg[Side_Agg.Concussion=='NO']

Side_Percent=pd.merge(Side_Yes,Side_No,on=['Balanced'],suffixes=['_Yes','_No'],how='left')
Side_Percent['Count_Yes_Percent'] = Side_Percent.Count_Yes/len(Explained)
Side_Percent['Count_No_Percent'] = Side_Percent.Count_No/(len(play_information)-len(Explained))

#Ratio > 1: the formation is more common on concussion plays than on other plays
Side_Percent['Ratio'] = Side_Percent.Count_Yes_Percent/Side_Percent.Count_No_Percent
Side_Percent[['Balanced','Count_Yes','Count_No','Ratio']]

In [None]:
#Fortunately I already have that data.  In hindsight, I should have looked at this first. 
RoleGroup_Ret = RoleGroup[(RoleGroup.Cov_Ret=='Return') & (RoleGroup.Group != 'Returner')]
RoleGroup_Ret= pd.merge(bad,RoleGroup_Ret,on=['Season_Year','GameKey','PlayID'],how='right')
RoleGroup_Ret = RoleGroup_Ret[pd.isna(RoleGroup_Ret.Reason)]
#Very similar to above, but with a few tweaks
Role_Con = pd.merge(RoleGroup_Ret,video,on=['Season_Year','GameKey','PlayID'],how='left')
Role_Con.Concussion = Role_Con.Concussion.fillna('NO')

#Count return team plays by role and concussion outcome
Role_Agg = Role_Con.groupby(['Group','Role','Concussion'])['Season_Year'].agg('count').reset_index()
Role_Agg.columns = ['Group','Role','Concussion','Count']
#Separate Yes and No's and join them to compare them.
#Many formations had no concussions.  We may revisit them later.
Role_Yes = Role_Agg[Role_Agg.Concussion=='YES']
Role_No = Role_Agg[Role_Agg.Concussion=='NO']

Role_Percent=pd.merge(Role_Yes,Role_No,on=['Group','Role'],suffixes=['_Yes','_No'],how='left')
Role_Percent['Count_Yes_Percent'] = Role_Percent.Count_Yes/len(Explained)
Role_Percent['Count_No_Percent'] = Role_Percent.Count_No/(len(play_information)-len(Explained))

#Ratio > 1: the formation is more common on concussion plays than on other plays
Role_Percent['Ratio'] = Role_Percent.Count_Yes_Percent/Role_Percent.Count_No_Percent
Role_Percent[Role_Percent.Group=='Jammer'][['Role','Count_Yes','Count_No','Ratio']].rename(columns = {'Role':'Jammers'})

The conclusions about jammers and line balance are unchanged.

## Types of Punts
Based on the analysis above, it is now worthwhile to look at the two main types of punts.
1. Distance Punt - When the punting team is far from the end zone it is attacking, the kicker can focus on distance and not worry about kicking the ball into the end zone.
2. Pinning Punt - As the punting team approaches the opponent's end zone, the kicker focuses on pinning the other team inside its 10-yard line. This punt tends to be higher and shorter and is seldom returned.

Almost all concussions occur on the distance punt. This is not surprising, but it has major implications for which rules will help reduce concussions.
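Labeling each punt by type needs only the `YardsforTD` field and a 65-yard boundary, the same split this notebook uses when it filters distance punts with `YardsforTD > 65`. In this sketch, exactly 65 yards counts as a pinning punt, and `YardsforTD` is assumed to be the punting team's distance to the opponent's end zone; the rows are made up.

```python
import numpy as np
import pandas as pd

# Made-up plays; YardsforTD assumed to be the punting team's distance to the end zone
plays = pd.DataFrame({'PlayID': [1, 2, 3, 4], 'YardsforTD': [80, 66, 65, 40]})
# More than 65 yards to go: a distance punt; otherwise a pinning punt
plays['PuntType'] = np.where(plays['YardsforTD'] > 65, 'Distance', 'Pinning')
print(plays)
```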

## Clean Data Analysis Summary
A rule change should target one or more of the following areas:
1. Tackling
    * Fair Catch bonus yards (5 or 10 yards)
    * Kicking out of bounds bonus (5 or 10 yards)
    * Eliminating Punts
2. Distance Punts (>65 Yards for a TD)
3. Jammers
    * Limit teams to 2 Jammers
4. Coverage Team injuries
5. Preseason
    * Eliminate punts during preseason
    * Experiment with punt rules during preseason

Recent rule changes have helped. The following areas are lower priority: 
1. Blocking
    * Changes to Blindside block definition
    * Changes to defenseless player definition
    * Wedge block rule from kickoff rules
    * Speed differential between players
2. Pinning Punts (<65 Yards for a TD)
    * Changes in touchback distance
    * Encouraging teams to go for it on 4th down
    * Removing the rule that places missed field goals at the spot of the kick
3. Line Balance
    * Certain number of players on the line
    * Limit line imbalance
    * Setup Zone rules
4. Returner Injuries
    * CFL 5-yard buffer zone
    * Fair catch results in play ending
    * Changes in muff rules
5. Targeting/Helmet to Helmet Blows
    * Any enhancements to the targeting rule

## Rules & Rule Impacts

### Limit Jammers
My first thought is to limit jammers to 2. That change would be transparent to the casual fan and would reduce concussions.
Further analysis shows that it could have an impact on the return game, and it also calls into question the conclusion that jammers should be limited.

In [None]:
print('Short Returns on Distance Punts')
# Concussion plays where the Jammer group was involved. `bad` holds plays already
# explained by existing rules, so a missing Reason_x marks an unexplained concussion.
good = pd.merge(bad, Role_Con[(Role_Con.Group == "Jammer") & (Role_Con.Concussion == "YES")],
                on=['Season_Year', 'GameKey', 'PlayID'], how='right')
good = good[pd.isna(good.Reason_x)]
# Attach play details for distance punts only; other plays end up with a missing Season_Type
good = pd.merge(good, play_information[play_information.YardsforTD > 65],
                on=['Season_Year', 'GameKey', 'PlayID'], how='left')
good = good[~pd.isna(good.Season_Type) & (good.ReturnYards < 10)]
display(good.groupby('Role')['ReturnYards'].agg(['mean', 'count']).reset_index()
        .rename(columns={'Role': 'Jammers', 'mean': 'Return Yards'}).dropna())
#display(good.groupby('Role')['PuntYards'].agg(['mean','count']).reset_index().rename(columns = {'Role':'Jammers', 'mean':'Punt Yards'}).dropna())

# Baseline: all distance punts with 2 or more jammers
good = pd.merge(Role_Con[Role_Con.Group == 'Jammer'], play_information[play_information.YardsforTD > 65],
                on=['Season_Year', 'GameKey', 'PlayID'])
good = good[(good.Role > 1) & (good.ReturnYards < 10)]
display(good.groupby('Role')['ReturnYards'].agg(['mean', 'count']).reset_index()
        .rename(columns={'Role': 'Jammers', 'mean': 'Return Yards'}).dropna())
print('On short returns, more jammers do not appear to have an impact on concussions, but the numbers are very small')
print('')
print('Long Returns on Distance Punts')
good = pd.merge(bad, Role_Con[(Role_Con.Group == "Jammer") & (Role_Con.Concussion == "YES")],
                on=['Season_Year', 'GameKey', 'PlayID'], how='right')
good = good[pd.isna(good.Reason_x)]
good = pd.merge(good, play_information[play_information.YardsforTD > 65],
                on=['Season_Year', 'GameKey', 'PlayID'], how='left')
good2 = good.copy()  # all unexplained jammer concussions, kept for the combined view below
# >= 10 so that 10-yard returns land in the long group instead of being dropped from both groups
good = good[~pd.isna(good.Season_Type) & (good.ReturnYards >= 10)]
display(good.groupby('Role')['PuntYards'].agg(['mean', 'count']).reset_index()
        .rename(columns={'Role': 'Jammers', 'mean': 'Punt Yards'}).dropna())

# Baseline: all distance punts with 2 or more jammers
good = pd.merge(Role_Con[Role_Con.Group == 'Jammer'], play_information[play_information.YardsforTD > 65],
                on=['Season_Year', 'GameKey', 'PlayID'])
good = good[(good.Role > 1) & (good.ReturnYards >= 10)]
display(good.groupby('Role')['ReturnYards'].agg(['mean', 'count']).reset_index()
        .rename(columns={'Role': 'Jammers', 'mean': 'Return Yards'}).dropna())
print('On long returns, more jammers appear to have an impact on concussions, but the numbers are very small')

good2 = good2[good2.Role > 1]
display(good2.groupby('Role')['ReturnYards'].agg(['mean', 'count']).reset_index()
        .rename(columns={'Role': 'Jammers', 'mean': 'Return Yards'}).dropna())
print('Looking at all of the concussion data, there is a much larger difference in return yards than would be expected')

In [None]:
probplot(good,'Role','ReturnYards')
probplot(good2,'Role','ReturnYards')

There is a small but consistent increase in return yards associated with more jammers in the overall data, but a large difference in the concussion data.  



### Do more jammers correlate to concussions?  
My concern with this data is that the difference in return yards within the concussion subsample is heavily skewed compared to the general population. In the overall data, more jammers are associated with about a 3-yard increase in return yards. In the concussion data, the difference between 2 and 4 jammers is over 20 yards.
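One way to gauge whether a gap this large in a handful of concussion plays could arise by chance is a permutation test: shuffle the jammer labels many times and count how often a gap at least as large appears. This is a standard technique swapped in here for illustration, not part of the analysis above. `permutation_p_value` is a hypothetical helper, and it assumes the two inputs are return yards for plays with two different jammer counts.

```python
import numpy as np

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference mean(a) - mean(b) and the share of random
    relabelings whose absolute difference is at least as large.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_permutations):
        # Reassign the group labels at random, keeping the group sizes fixed
        shuffled = rng.permutation(pooled)
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the estimate from ever being exactly zero
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value
```

For example, `permutation_p_value(good2[good2.Role == 4].ReturnYards.dropna(), good2[good2.Role == 2].ReturnYards.dropna())` would compare 4-jammer and 2-jammer concussion plays from the cell above. With very small groups there are only a limited number of distinct relabelings, so even a large observed gap may not produce a small p-value.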


### Increase Fair Catches 

Award the receiving team 5 yards for a fair catch to decrease returns. This rule would eliminate short, "useless" returns. The analysis below focuses only on distance punts.

In [None]:
down = len(play_information[play_information.Downed==1])
touchback = len(play_information[play_information.Touchback==1])
OOB = len(play_information[play_information.OutofBounds==1])
faircatch = len(play_information[play_information.FairCatch==1])
returns = len(play_information.ReturnYards.dropna())
punts = len(play_information)
print ('Downed: {:.1%} Touchback: {:.1%} OutofBounds: {:.1%} Fair Catch: {:.1%} Returns: {:.1%}'.format(down/punts,touchback/punts,OOB/punts,faircatch/punts,returns/punts))

Today, only 38% of punts are returned.  A good rule change will decrease returns but not eliminate them.

In [None]:
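# Distance punts only (more than 65 yards to a touchdown for the kicking team).
# .copy() avoids pandas' SettingWithCopyWarning when flag columns are added below.
long_punts = play_information[(play_information.YardsforTD>65)].copy()
#long_punts.ReturnYards = long_punts.ReturnYards.fillna(0)
long_punts = long_punts.dropna(subset=['YardsforTD'])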

In [None]:
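# Flag plays a bonus would likely turn into fair catches: existing fair catches plus
# returns shorter than the bonus. Missing ReturnYards compare as False.
long_punts['ten'] = np.where((long_punts['ReturnYards']<10) | (long_punts.FairCatch==1),1,0)
long_punts['five'] = np.where((long_punts['ReturnYards']<5) | (long_punts.FairCatch==1),1,0)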

In [None]:
display(long_punts.groupby('FairCatch')['NetYards'].mean().reset_index())
display(long_punts.groupby('FairCatch')['PuntYards'].mean().reset_index())
print('Fair catches occur on shorter punts but result in fewer net yards')
display(long_punts.groupby('five')['NetYards'].mean().reset_index())
print('With a five yard bonus, fair catches and other punts have the same net yards')
display(long_punts.groupby('ten')['NetYards'].mean().reset_index())
print('With a ten yard bonus, fair catches are better than other punts')
display(long_punts.groupby('OutofBounds')['NetYards'].mean().reset_index())
print('Kicking out of bounds is slightly better than not, but not as good as forcing a fair catch')

In [None]:
print('Average Return Yards: ', play_information.ReturnYards.mean())

It is difficult to precisely predict the impact of the rule change.  Today, a fair catch results in 4 fewer yards in field position.  The 5 yard bonus effectively eliminates the disparity between fair catches and returns.  On average, a team could fair catch all punts and not be disadvantaged.  
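The grouping by `five` above compares current net yards for plays that a 5-yard bonus would likely turn into fair catches; it does not subtract the bonus itself. A minimal sketch of that adjustment follows. It assumes `NetYards` is measured from the kicking team's side, so yards awarded to the receiving team reduce it, and that `FairCatch` is a 1/0 flag. `net_yards_with_bonus` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def net_yards_with_bonus(punts: pd.DataFrame, bonus: int = 5) -> pd.Series:
    """Kicking team's net yards if the receiving team gained `bonus` yards on each fair catch.

    Assumes NetYards is from the kicking team's point of view and FairCatch is 1/0.
    """
    # Only fair catches are adjusted; every other outcome keeps its recorded net yards
    return punts['NetYards'] - np.where(punts['FairCatch'] == 1, bonus, 0)
```

Applied to the data, `long_punts.assign(AdjNet=net_yards_with_bonus(long_punts)).groupby('FairCatch')['AdjNet'].mean()` would show the fair-catch gap after the bonus is applied, rather than before.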


### Why 5 yards and not 10?
1. Start small; the bonus can always be increased later. A 5-yard bonus would eliminate 30% of returns, and a 10-yard bonus would eliminate another 30%.
2. A 10-yard bonus causes problems on pinning punts. Today, returners stand on the 10-yard line and ignore everything over their heads. With a 10-yard bonus, they could fair catch anything and guarantee their team a starting position outside the 10-yard line.
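One way to estimate how many returns each bonus would eliminate is to count the returns that gain less than the bonus, on the assumption that a returner would take the fair catch instead. This mirrors the `ReturnYards < 5` and `ReturnYards < 10` flags used earlier. `share_of_returns_below` is a hypothetical helper, and missing `ReturnYards` values (punts that were not returned) are ignored.

```python
import numpy as np

def share_of_returns_below(return_yards, bonus):
    """Fraction of returned punts that gained fewer yards than `bonus`.

    Missing values (punts that were not returned) are excluded before counting.
    """
    yards = np.asarray(return_yards, dtype=float)
    yards = yards[~np.isnan(yards)]
    if yards.size == 0:
        return float('nan')
    return float((yards < bonus).mean())
```

`share_of_returns_below(long_punts.ReturnYards, 10) - share_of_returns_below(long_punts.ReturnYards, 5)` would give the additional share of returns a 10-yard bonus removes beyond a 5-yard bonus.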


### Punt Bonus
A punt bonus (awarding bonus yards for kicking out of bounds) would decrease returns in the same way as a fair catch bonus. This rule would be awkward because it would be the exact opposite of the kickoff rule that penalizes out-of-bounds kicks. It would also reward teams for avoiding good returners, and the punting team already has two ways to do that: kick out of bounds or kick it short.

### Eliminating Punts
Eliminating punts, even just during the preseason, would be a major change to the game. It would be better to use the preseason to validate any punt rule changes.
A change that large is not needed at the NFL level at this time. High schools and youth leagues might want to explore these rules.

## Integrity of the Game
### Fair Catch
Historically, the NFL has used small yardage nudges to change behavior. Kickoff touchbacks were increased by 5 yards. Kickoff starting points have moved multiple times in 5-yard increments. In an attempt to decrease long field goal attempts, missed field goals were placed at the spot of the kick. That rule was essentially an 8-yard "penalty" on missed field goals.

Awarding 5 yards to discourage dangerous play is the exact kind of rule change that the NFL has done before.  The rule would also be transparent to the casual fan.  After a TV timeout, the ball would be advanced 5 yards from where the ball was caught.  

College football has also paved the way for this rule change by allowing teams to fair catch kickoffs inside the 25 and receive a touchback. Likewise, this rule change would set a positive example for college and high school teams. Changes that improve NFL player safety are quickly adopted and expanded at other levels. If the NFL adopted 5 yards for a fair catch, it would not be surprising to see college or high school football adopt 10 yards.

### Limit Jammers
Formation changes are also a popular technique to reduce concussions.  Limiting jammers would be consistent with recent kickoff rules.  It would be easy to referee.  

## Reaction to 5 yards for Fair Catch
How might kicking and receiving teams react to the new rule? Would any of these reactions result in more injuries?
There are 4 ways the rule may impact strategy:
1. Teams try to punt farther to recover the 5 yards. Longer punts are associated with longer returns, so receiving teams may end up returning the ball more often. In addition, receiving teams may use more jammers if they return more punts. Because receiving teams have an effective counter, I believe teams will not consistently go for long punts.

2. Teams go for the block more often. If the returner is going to fair catch anyway, the receiving team can send more players after the punt. Very few concussions were associated with blocking for the punt (3 of 37), so any increase in those concussions should be more than offset by the decrease in concussions from increased fair catches. In addition, teams could not go for an all-out block without opening themselves up to a fake.

3. Teams may punt out of bounds more often. Today, kicking out of bounds is slightly worse than a fair catch. With a 5-yard bonus, teams would be encouraged to kick out of bounds. Taken to an extreme, this would remove the excitement of punt returns. This rule should be evaluated in the preseason to determine whether teams choose to always kick the ball out of bounds.

4. Pinning a team inside the 10-yard line would be more difficult. The returner can fair catch the ball at the 5 and get the ball at the 10. Balls that land past the 5-yard line are more difficult to down. Pinning punts are not a problem, so this rule unnecessarily makes them harder. I resisted the temptation to make the rule more complicated to deal with this case.
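To make the pinning-punt trade-off concrete, the hypothetical helper below computes where the receiving team would start under the proposed rule, measuring yard lines from the receiving team's own goal line. It assumes the bonus is simply added to the spot of the catch, as described earlier, and it ignores penalties, returns, and catches near the opponent's goal line.

```python
def next_snap_yardline(catch_yardline: int, fair_catch: bool, bonus: int = 5) -> int:
    """Receiving team's starting yard line, measured from its own goal line.

    Hypothetical helper: a fair catch is advanced `bonus` yards from the spot of the catch;
    any other outcome is returned unchanged here (return yardage is out of scope).
    """
    if not 0 < catch_yardline < 100:
        raise ValueError('catch_yardline must be in the field of play (1-99)')
    return catch_yardline + bonus if fair_catch else catch_yardline
```

A fair catch at the 5 becomes a start at the 10 with a 5-yard bonus. With a 10-yard bonus, a catch at the 3 would start at the 13, which is why a 10-yard bonus guarantees a start outside the 10-yard line.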

## Reaction to Limit Jammers
How might kicking and receiving teams react to the new rule? Would any of these reactions result in more injuries?

* Limiting jammers would impact all punt plays, so it is more likely to have unforeseen consequences.
* With only 2 jammers, teams may use more front blockers. Front blockers are rarely used right now, but they are associated with a higher rate of concussions. A rule limiting jammers would also need to address the number of players near the line of scrimmage.
* Punts are not distinct plays like kickoffs. On punts, defenses need to protect against fake attempts, and restrictions on punt return formations might make defenses more vulnerable. If a team lines up in a traditional punt formation, the rule is easy to enforce. But what happens if kicking teams explore different formations? Would the receiving team be unable to counter them because of these rule changes?

# Summary

In this kernel, I presented an analytical framework for cleaning data and evaluating rule changes to increase player safety. This analysis showed that concussions fall into three roughly equal groups:
1. Covered by existing rules or not punt return related
2. Group tackling on long punt returns
3. Long tail of many other factors.

Group tackling is a fundamental part of football, and there were no obvious technique problems. Any new rule needs to decrease the frequency of long returns. After reviewing 17 rules, I found two leading candidates: limiting jammers and awarding bonus yards for fair catches. Both have pros and cons. Because of data sparsity, it is unclear whether the connection between jammers and concussions is real; it could easily be a correlation with long returns. Because a limit on jammers would impact all punt plays, it should not be implemented until the connection is clearer. Awarding 5 yards for a fair catch is better because it will impact only a subset of punts. It is also a simpler rule to officiate and can easily be adjusted in the future and at different levels.
