In [None]:
#Import packages used for analysis

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import dask.dataframe as dd #dealing with complexities of joins, faster table operations
from IPython.display import Image #used to display the images found throughout the kernel

# Input data files are available in the read-only "../input/" directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Data Overview

All data used in this analysis can be found in the "Input" section of the Kernel.

* Game Data: Game-level data that specifies the type of season (pre, reg, post), week, and host city and team. Each game is uniquely identified across all seasons by its GameKey.
* Play Information: Play-level data that describes the type of play, possession team, score, and a brief narrative of each play. PlayIDs are not unique on their own, so each play is uniquely identified by its PlayID together with the corresponding GameKey.
* Player Punt Data: Player-level data that specifies the traditional football position for each player. Each player is identified by his GSISID.
* Play Player Role Data: Play- and player-level data that specifies a punt-specific role for every player who took part in each play. A player’s role in a play is uniquely defined by the combination of GameKey, PlayID, and GSISID.
* Video Review: Injury-level data that provides a detailed description of each concussion-producing event, including a brief narrative of the play. Video Review data are only available when the injury play can be identified. Each video review case is identified by the combination of GameKey, PlayID, and GSISID.
* NFL_ScrapeR Data: A curated dataset of detailed play-by-play information for every NFL game played from 2009 to 2018. The data is maintained by the Carnegie Mellon Sports Analytics Club and is freely available on Kaggle.
* Player Roles Data: A manually created dataset that acts as a lookup table for player classification. Each unique role found in the "Play-Player-Role" dataset is further categorized into Team (Punt vs. Return), Area (left/right side player position), and Type (Oline, Dline, Gunner, etc.).
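Because several of these tables are keyed on a combination of columns rather than a single ID, it can help to check a key before joining on it. The sketch below uses a small made-up frame in place of play_information.csv, and the helper `is_unique_key` is hypothetical rather than part of the kernel.

```python
import pandas as pd

# Made-up rows standing in for play_information.csv.
plays = pd.DataFrame({
    "GameKey": [1, 1, 2, 2],
    "PlayID": [10, 20, 10, 30],
})


def is_unique_key(df: pd.DataFrame, cols: list) -> bool:
    """Return True if no two rows share the same values in `cols` (hypothetical helper)."""
    return not df.duplicated(subset=cols).any()


print(is_unique_key(plays, ["PlayID"]))             # False: PlayID 10 occurs in two games
print(is_unique_key(plays, ["GameKey", "PlayID"]))  # True
```

pandas can also enforce this expectation at join time through the `validate` argument of `DataFrame.merge` (for example `validate="many_to_one"`), which raises a `MergeError` when the key is not unique on the side where it should be.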


> It should be noted that although NGS (Next-Gen Statistics) data were provided, they were not used in this analysis. Although the data is interesting, it offers little support for "regulating" the game or justifying regulations, because it is mainly used to calculate player speed and direction. There is likely a correlation between player speed and concussion risk, but that relationship is difficult to influence through rule changes. For example, a "Top Speed" limit for a punt "Gunner" would be extremely difficult to enforce, and the same is true of player directions or pursuit angles in a live game environment. Replay could aid enforcement in both cases, but doing so would greatly slow down the game, a sub-optimal outcome.
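For context, the speed and direction mentioned above would be derived from consecutive position samples. A minimal sketch, assuming a trace with a timestamp column `Time` and coordinates `x`, `y` in yards; these column names are assumptions about the NGS layout, the rows are made up, and the heading convention used here is likewise an assumption that may not match any direction field shipped with NGS.

```python
import numpy as np
import pandas as pd

# Made-up NGS-style trace for one player on one play (x, y in yards).
trace = pd.DataFrame({
    "Time": pd.to_datetime([
        "2017-09-10 13:05:00.0",
        "2017-09-10 13:05:00.1",
        "2017-09-10 13:05:00.2",
    ]),
    "x": [10.0, 10.6, 11.4],
    "y": [20.0, 20.8, 21.4],
})


def add_speed_and_direction(df: pd.DataFrame) -> pd.DataFrame:
    """Estimate speed (yards/second) and heading (degrees) between consecutive samples."""
    out = df.sort_values("Time").copy()
    dx = out["x"].diff()
    dy = out["y"].diff()
    dt = out["Time"].diff().dt.total_seconds()
    # Straight-line distance between samples divided by elapsed time.
    out["speed"] = np.hypot(dx, dy) / dt
    # Heading measured counter-clockwise from the +x axis; the first row is NaN.
    out["heading"] = np.degrees(np.arctan2(dy, dx))
    return out


print(add_speed_and_direction(trace)[["speed", "heading"]])
```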

# Data Preparation 
> Credit to "Kaggle Master" John Miller (JPMILLER) for his excellent Kernel on handling this complex dataset. *See the References section for a link to the Kernel.*

The datasets listed above will be consolidated into:
* Player-Level Data 
* Play-Level Data


Further, the following dataset will be imported from the NFL_ScrapeR collection:
* NFL_ScrapeR Data

# Methods - CRISP-DM + Agile Planning

Each of the forthcoming pieces of “Evidence” in this presentation was treated as a “sprint”, in which the entire CRISP-DM process was repeated to support our rule change recommendations.

This methodology combines the data mining best practices embodied by the Cross Industry Standard Process for Data Mining (CRISP-DM) with the best practices of Agile software development. This rigorous research design and analysis process ensures that all aspects of the analytics project are carefully detailed and validated. The basic premise of the Agile Data Science process is to conduct a series of sprints, completing the entire CRISP-DM process during each sprint.


# Player-Level Data Prep

In [None]:
# get base data and role types
players = pd.read_csv('../input/NFL-Punt-Analytics-Competition/play_player_role_data.csv')
descrs = pd.read_csv('../input/player-roles/datasets_98965_237949_roledscrps.csv')
players = players.merge(descrs, on='Role', how='left').drop('Season_Year',\
                        axis=1)

# tag players involved in concussion events
revs = pd.read_csv('../input/NFL-Punt-Analytics-Competition/video_review.csv', 
                   usecols=['GameKey', 'PlayID', 
            'GSISID', 'Primary_Partner_GSISID'], na_values=['Unclear'])\
            .fillna(-99).astype(int)
# sort must be the boolean False; the string 'False' is truthy and would sort
players = players.merge(revs, how='left', on=['GameKey', 'PlayID', 'GSISID'],
                        sort=False)
# a matched review row means this player was the concussed player
players['concussed'] = np.where(players.Primary_Partner_GSISID.isnull(), 0, 1)

# join again on the partner ID to flag the player who delivered the impact
players = players.merge(revs, how='left', left_on=['GameKey', 'PlayID', 
            'GSISID'], right_on=['GameKey', 'PlayID', 
            'Primary_Partner_GSISID'], suffixes=("", "_dupe"), sort=False)
players['concussor'] = np.where(players.Primary_Partner_GSISID_dupe.isnull(),
                                0, 1)

# add jersey numbers (a player with several numbers gets them space-joined)
playas = pd.read_csv('../input/NFL-Punt-Analytics-Competition/player_punt_data.csv')
playas_agg = playas.groupby('GSISID')['Number'].apply(
    lambda s: ' '.join(s.astype(str))).to_frame()
players = players.merge(playas_agg, on='GSISID', how='left')

drops = ['Primary_Partner_GSISID'] + players.columns[players.columns.str\
        .contains('dupe')].tolist()
players = players.drop(drops, axis=1).sort_values(['GSISID', 'GameKey', 
          'PlayID']).set_index('GSISID').reset_index()

players.to_parquet('players.parq')
players.to_csv('players.csv')

# Play-Level Data Prep

In [None]:
# join play level and game level data
plays = pd.read_csv('../input/NFL-Punt-Analytics-Competition/play_information.csv', 
                    index_col=['GameKey', 'PlayID'])

games = pd.read_csv('../input/NFL-Punt-Analytics-Competition/game_data.csv', 
                    index_col=['GameKey'])
games['Temperature'] = games['Temperature'].fillna(-999)  # -999 flags a missing reading
plays_all = plays.join(games, rsuffix='_dupe', sort=False)

revs = pd.read_csv('../input/NFL-Punt-Analytics-Competition/video_review.csv', 
                   index_col=['GameKey', 'PlayID'])
revs.loc[revs.Primary_Partner_GSISID == 'Unclear', 'Primary_Partner_GSISID']\
             = np.nan
revs['Primary_Partner_GSISID'] = pd.to_numeric(revs.Primary_Partner_GSISID)
plays_all = plays_all.join(revs, rsuffix='_dupe2', sort=False)


# merge player numbers and positions for concussions
playernums = pd.read_csv('../input/NFL-Punt-Analytics-Competition/player_punt_data.csv')
playernums = playernums.groupby('GSISID').agg(' '.join) #combine dupes

plays_all = plays_all.reset_index().merge(playernums, how='left', on='GSISID', 
            sort=False)
plays_all = plays_all.merge(playernums, how='left', 
            left_on='Primary_Partner_GSISID', right_on='GSISID', 
            suffixes=("_player", "_partner"), sort=False)


# merge player level data for concussions
roles = pd.read_csv('../input/NFL-Punt-Analytics-Competition/play_player_role_data.csv')
roles_all = roles.merge(descrs, on='Role', how='left').drop('Season_Year', 
                        axis=1)

plays_all = plays_all.merge(roles_all, how='left', on=['GameKey', 'PlayID', 
            'GSISID'], sort=False)
plays_all = plays_all.merge(roles_all, how='left', left_on=['GameKey', 
            'PlayID', 'Primary_Partner_GSISID'], right_on=['GameKey', 
            'PlayID', 'GSISID'], suffixes=("_player", "_partner"), sort=False)

plays_all.set_index(['GameKey', 'PlayID'], inplace=True)


# merge aggregated player level data for all plays
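roles_enc = pd.get_dummies(roles_all, columns=['Team', 'Area', 'Type'])
dummy_cols = [c for c in roles_enc.columns
              if c.startswith(('Team_', 'Area_', 'Type_'))]
# per play: number of players, number of distinct roles, and a count of
# players in each Team/Area/Type category (every dummy column is summed)
aggdict = {'GSISID': 'size', 'Role': pd.Series.nunique}
aggdict.update({c: 'sum' for c in dummy_cols})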


roles_agg = roles_enc.groupby(['GameKey', 'PlayID']).agg(aggdict)
roles_agg.columns = [r + '_agg' for r in roles_agg.columns]

plays_all = plays_all.join(roles_agg, rsuffix="_roles")


# make simple features
plays_all['yard_number'] = plays_all.YardLine.str.split().str[1].astype(int)
# "TEAM NN": on its own half the team is 100 - NN yards from the opponent's goal
plays_all['dist_togoal'] = np.where(plays_all.Poss_Team == plays_all.YardLine
            .str.split().str[0], 100 - plays_all.yard_number, 
            plays_all.yard_number)
plays_all['Rec_team'] = np.where(plays_all.Poss_Team == plays_all.HomeTeamCode, 
             plays_all.VisitTeamCode, plays_all.HomeTeamCode)
plays_all['home_score'] = plays_all.Score_Home_Visiting.str.split(" - ")\
            .str[0].astype(int)
plays_all['visit_score'] = plays_all.Score_Home_Visiting.str.split(" - ")\
            .str[1].astype(int)
plays_all['concussion'] = np.where(plays_all.Primary_Impact_Type.isnull(), 
                                    0, 1)

# clean up 
drops = ['YardLine',
         'Play_Type',
         'Home_Team_Visit_Team',
         'Primary_Partner_GSISID',
         'Score_Home_Visiting']\
         + plays_all.columns[plays_all.columns.str.contains('dupe')].tolist()
plays_all.drop(drops, axis=1, inplace=True)

# -99 marks plays without a concussed player / partner
plays_all['GSISID_player'] = plays_all.GSISID_player.fillna(-99).astype(int)
plays_all['GSISID_partner'] = plays_all.GSISID_partner.fillna(-99).astype(int)

floatcols = plays_all.select_dtypes('float').columns
for f in floatcols:
    plays_all[f] = plays_all[f].fillna(-99).astype(int)

plays_all.fillna('unspecified', inplace=True)
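# the San Diego Chargers (SD) moved to Los Angeles (LAC) in 2017;
# \b limits the regex to whole tokens so other text is not altered
plays_all.replace(r'\bSD\b', 'LAC', inplace=True, regex=True)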
plays_all['Game_Date'] = pd.to_datetime(plays_all.Game_Date, format='%m/%d/%Y')

plays_all.sort_index(inplace=True)
plays_all.reset_index(inplace=True)  # avoid a multi-index for parquet

plays_all.to_parquet('plays.parq')
plays_all.to_csv('plays.csv')
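The distance-to-goal feature depends on reading the "TEAM NN" yard-line string relative to the possessing team. A minimal standalone sketch of that logic, assuming the same format the cell above splits on (for example "NE 30" for New England's 30-yard line); `distance_to_goal` is a hypothetical helper, not part of the original kernel.

```python
def distance_to_goal(yard_line: str, poss_team: str) -> int:
    """Yards the possessing team must cover to reach the opponent's goal line."""
    side, number = yard_line.split()
    yards = int(number)
    # On its own side of the field, the team must cover the rest of its own
    # half plus the opponent's entire half: 100 - yards.
    if side == poss_team:
        return 100 - yards
    # On the opponent's side, the yard-line number is the distance remaining.
    return yards


print(distance_to_goal("NE 30", "NE"))   # 70
print(distance_to_goal("NYJ 40", "NE"))  # 40
```

Applied row by row (for example with `DataFrame.apply(..., axis=1)` before `YardLine` is dropped), this gives the same values as a vectorized `np.where` over the split columns; the vectorized form is faster on the full table.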

# NFL_ScrapeR Data Prep

In [None]:
# get 2009-2018 season play-by-play
# low_memory=False infers each column's dtype from the whole file at once
nfl_scrape = pd.read_csv('../input/nfl-scraper/nfl_data_09_18.csv', low_memory=False)
nfl_scrape.to_csv('nfl_scrapeR.csv')

# Additional Data Refinement and Charts

In the data preparation cells above, each cleansed dataset is written to a CSV file in the output section. In some instances the data required further cleansing, for example categorizing punt play outcomes. Although this task could be accomplished in Python, the team decided it was more efficient to handle it in Excel or Microsoft Power BI.

> ***As a result, some of the charts and graphs in this Kernel were produced outside of the Kernel, but they remain reproducible. Any such charts or graphs can be found in their native format in the "ZIP" file submitted as required by the Project Submission Instructions. Since the goal of this project is "Applied Analysis" rather than "Computer Programming", our team felt this approach was justified.***

# Evidence #1 : Punts are on the Decline across the NFL

One of the evaluation criteria is preserving the integrity of the game. Over the past 10 seasons, "Punt" plays have become a less frequent part of the game. As a result, any rule change affecting punts will touch only a small percentage of total plays, minimally impacting the integrity of the game.
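The punt trend in the charts below can be recomputed from the NFL_ScrapeR play-by-play. A minimal sketch on made-up rows, assuming the export has `game_id`, `game_date` and `play_type` columns with punts labelled "punt" (check these against nfl_data_09_18.csv before reuse); games played in January or February are assigned to the previous season, a simplification that covers late-season and playoff games.

```python
import pandas as pd

# Made-up play-by-play rows; column names are assumed to match the export.
pbp = pd.DataFrame({
    "game_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "game_date": ["2009-09-13"] * 5 + ["2011-01-02"] * 3,
    "play_type": ["run", "punt", "pass", "punt", "pass", "pass", "run", "punt"],
})


def punt_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Punts per game and punts as a share of all plays, by NFL season."""
    dates = pd.to_datetime(df["game_date"])
    # Games played in January/February belong to the previous season.
    season = dates.dt.year - (dates.dt.month < 3).astype(int)
    is_punt = df["play_type"].eq("punt")
    grouped = pd.DataFrame({"season": season, "game_id": df["game_id"],
                            "is_punt": is_punt}).groupby("season")
    return pd.DataFrame({
        "punts_per_game": grouped["is_punt"].sum() / grouped["game_id"].nunique(),
        "punt_share": grouped["is_punt"].mean(),
    })


print(punt_summary(pbp))
```

The `punt_share` column expresses the idea in the paragraph above directly: the fraction of all plays that a punt rule change would touch.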

In [None]:
Image("../input/punt-data/raw numbers punts.JPG")

In [None]:
Image("../input/return1/punts_per_game.JPG")

> # Average Punt Return Yards are also in Decline

> Even after accounting for the lower frequency of punts, average punt return yards are also down. This could be attributed to several causes, such as an increase in the number of "Fair Catches".
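"Controlling for frequency" here means comparing yards per return rather than total return yards, so that fewer punts do not by themselves drag the number down. A minimal sketch on made-up rows; the columns `returned`, `return_yards` and `fair_catch` are hypothetical stand-ins for the categorized punt outcomes.

```python
import pandas as pd

# Made-up punt rows; one row per punt.
punts = pd.DataFrame({
    "season": [2009, 2009, 2009, 2018, 2018, 2018],
    "returned": [True, True, False, True, False, False],
    "return_yards": [12, 6, 0, 5, 0, 0],
    "fair_catch": [False, False, True, False, True, True],
})


def return_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Average yards per actual return, and the fair-catch rate per punt, by season."""
    per_return = df[df["returned"]].groupby("season")["return_yards"].mean()
    fair_catch_rate = df.groupby("season")["fair_catch"].mean()
    return pd.DataFrame({"avg_yards_per_return": per_return,
                         "fair_catch_rate": fair_catch_rate})


print(return_metrics(punts))
```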

In [None]:
Image("../input/punt-data/punt return yards.JPG")

In [None]:
Image("../input/return/avg_return_yards.JPG")

Despite decline, 

# Evidence #2 : Punt Play Outcomes - "No Play" Exceeds "Returns"
A valid concern about altering punt play rules is that doing so may negatively impact a team's ability to "return" the punt. 

However, the table and chart below show that over 50% of punt plays result in an outcome other than a "Return".

> > * For the purposes of this analysis, a "No Play" is a punt that goes "Out of Bounds" or results in a "Fair Catch" or a "Touchback". A "Return" is when the Punt Returner catches the ball and makes a football move, regardless of the yardage gained or lost from the point of possession. Note that "Penalties" may result in a re-kick; however, because play stops and is reset, we still consider these plays a "No Play". One justification comes from Chapter 18 of *Mathletics*, "What Makes NFL Teams Win?", which examines the work of Bud Goode, who found that team efficiency is a major contributing factor to win/loss outcomes. Because the number of plays is a constrained resource, how teams use those plays matters: by spending a play on a punt that results in a "No Play" as defined here, teams reduce their efficiency. 

> > * To create this categorization, the "Play Description" field was parsed for specific keywords related to the outcome. For example, if a punt was kicked out of bounds, the play description includes the key phrase "Out of Bounds", and plays that resulted in a penalty contain the keyword "PENALTY". Similar logic was used for the other categories to determine punt play "outcomes". 
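The keyword parsing described above was done outside the Kernel. A minimal Python sketch of the same idea follows; the rule list, labels and helper names are hypothetical, only the "Out of Bounds" and "PENALTY" phrases come from the text, and "Downed" is an assumed example of the other categories, left outside both the "No Play" and "Return" groups.

```python
# Keyword rules checked in order; the first match wins and matching is
# case-insensitive. "out of bounds" and "penalty" come from the text above;
# the other phrases are assumptions about how descriptions are worded.
OUTCOME_RULES = [
    ("Penalty", "penalty"),
    ("Out of Bounds", "out of bounds"),
    ("Fair Catch", "fair catch"),
    ("Touchback", "touchback"),
    ("Downed", "downed"),
]

# "No Play" outcomes as defined above (penalties included).
NO_PLAY = {"Penalty", "Out of Bounds", "Fair Catch", "Touchback"}


def classify_outcome(description: str) -> str:
    """Map a play description to a punt outcome label (hypothetical helper)."""
    text = description.lower()
    for outcome, phrase in OUTCOME_RULES:
        if phrase in text:
            return outcome
    # Simplification: anything unmatched is treated as a return. A real pass
    # would also need rules for muffs, blocked punts and fumbles.
    return "Return"


def is_no_play(outcome: str) -> bool:
    return outcome in NO_PLAY


desc = "J.Doe punts 45 yards to NE 20, Center-A.Smith. fair catch by B.Jones."
print(classify_outcome(desc), is_no_play(classify_outcome(desc)))  # Fair Catch True
```

Checking penalties first mirrors the definition above, where a penalty makes the play a "No Play" even if the punt itself was returned.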

In [None]:
Image("../input/return2/returns_non_returns.JPG")

# Further, although Returns account for only 45% of Punt outcomes, they account for 68% of observed concussion cases in 2016 & 2017

> * As shown in the chart above, most punt outcomes do not result in a "Return". However, plays that do end in a "Return" are where most punt-related injuries take place; the table below details this risk by counting punt injuries by outcome type.
Although “Returns” account for only 45% of punt outcomes, they account for 68% of observed concussion cases. Further, when broken down by player type, Offensive Linemen have the highest rate of injury, followed by Gunners and then Returners, and every player type experienced some concussions. Since concussions affect all players on the field and are not confined to a single position, any rule change should focus on shifting outcomes away from “Returns” rather than on a specific role.
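The 45% and 68% figures can be turned into a single comparison. If returns are a share o of punt outcomes and produce a share c of concussions, the concussion rate on returns relative to all other outcomes is (c / o) / ((1 - c) / (1 - o)). A short sketch of that arithmetic, using only the two shares quoted above and assuming every concussion is assigned to exactly one outcome category; the function names are illustrative.

```python
def overrepresentation(conc_share: float, outcome_share: float) -> float:
    """Share of concussions divided by share of outcomes; above 1 means over-represented."""
    return conc_share / outcome_share


def relative_risk(conc_share: float, outcome_share: float) -> float:
    """Concussion rate of one outcome relative to all other outcomes combined."""
    if not (0 < outcome_share < 1 and 0 <= conc_share < 1):
        raise ValueError("shares must be proportions strictly below 1")
    return overrepresentation(conc_share, outcome_share) / overrepresentation(
        1 - conc_share, 1 - outcome_share)


print(round(overrepresentation(0.68, 0.45), 2))  # 1.51
print(round(relative_risk(0.68, 0.45), 2))       # 2.6
```

Taking the quoted shares at face value, a returned punt is roughly 2.6 times as likely to produce a concussion as a punt with any other outcome.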


In [None]:
Image("../input/return3/return_conc.JPG")

In [None]:
Image("../input/play-type-conc/conc_play_type.png")

# Evidence #3 : Injury Risk is Greater in the Pre-Season than Regular Season

The incidence rate of concussions in “Pre-Season” games is nearly double that in “Regular Season” games. 
While the primary driver of this difference can be debated (for example, lack of conditioning or “Rookie” players during the Pre-Season), the important takeaway is that the Pre-Season carries a higher risk.
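The incidence rate here is concussions per punt play within each season type, and "nearly double" is the ratio of the two rates. A minimal sketch on made-up rows, assuming a play-level table like the one built in the Play-Level Data Prep cell, with a `Season_Type` column (values "Pre", "Reg", "Post" are an assumption about game_data.csv) and the 0/1 `concussion` flag; the toy counts are chosen for round numbers and are not the study's data.

```python
import pandas as pd

# Made-up punt plays: 4 preseason, 8 regular season, one concussion in each.
plays = pd.DataFrame({
    "Season_Type": ["Pre"] * 4 + ["Reg"] * 8,
    "concussion": [1, 0, 0, 0] + [1] + [0] * 7,
})


def incidence_by_season_type(df: pd.DataFrame) -> pd.DataFrame:
    """Punt plays, concussions, and concussions per punt play for each season type."""
    return df.groupby("Season_Type")["concussion"].agg(
        plays="count", concussions="sum", rate="mean")


rates = incidence_by_season_type(plays)
print(rates)
print(rates.loc["Pre", "rate"] / rates.loc["Reg", "rate"])  # 2.0
```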

In [None]:
Image("../input/return4/pre_reg.JPG")

# Rule Changes Section

# Rule Change #1: Change Touchback ball placement

> Current Rule: After a touchback, the team that has been awarded the touchback next snaps the ball at its 20-yard line from any point on or between the inbound lines, unless the touchback results from a free kick, in which case the ball shall be placed at the team’s 25-yard line.

> Proposed Rule: After a touchback, the team that has been awarded the touchback next snaps the ball at its 15-yard line from any point on or between the inbound lines, unless the touchback results from a free kick, in which case the ball shall be placed at the team’s 25-yard line.

> Justification: Punt plays that resulted in Touchbacks produced no concussions in any of the seasons examined. Because “Returns” have been shown to be the outcome most likely to result in injury, this rule change seeks to incentivize non-return play outcomes (“No Play”). By rewarding the kicking team with better field position after a Touchback, the rule makes kicking teams more likely to “kick for touch”, a behavior already observable in “Kickoff” strategy.

> Game Integrity and Enforcement: As demonstrated in our evidence, Touchbacks account for 6% of punt outcomes. Therefore, shifting the distribution of existing punt outcomes toward “safer” features has minimal impact on game integrity, given that no “new” features are introduced. The rule is easy to enforce, since Officials currently spot the ball after the Touchback; this simply changes the spot of the ball.
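The change amounts to a single lookup in ball placement. A minimal sketch encoding the current and proposed touchback spots described above, measured in yards from the receiving team's own goal line; `touchback_spot` is a hypothetical helper for illustration.

```python
def touchback_spot(free_kick: bool, proposed: bool = False) -> int:
    """Yard line the receiving team snaps from after a touchback."""
    if free_kick:
        return 25  # unchanged by the proposal
    return 15 if proposed else 20


# Yards of field position the kicking team gains from a punt touchback.
gain = touchback_spot(free_kick=False) - touchback_spot(free_kick=False, proposed=True)
print(gain)  # 5
```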


# Rule Change #2 : Advancement on a Fair catch

> Current Rule: After a fair catch is made, or is awarded as the result of fair-catch interference, the receiving team has the option of putting the ball in play by either:
*  fair-catch kick (drop kick or placekick without a tee) from the spot of the catch (or the succeeding spot after enforcement of any applicable penalties) 
*  snap from the spot of the catch (or the succeeding spot after enforcement of any applicable penalties).

> Proposed Rule: After a fair catch is made, or is awarded as the result of fair-catch interference, the receiving team has the option of putting the ball in play by either:
*  fair-catch kick (drop kick or placekick without a tee) from the spot of the catch (or the succeeding spot after enforcement of any applicable penalties) 
*  snap from 10-yards beyond the spot of the catch (or the succeeding spot after enforcement of any applicable penalties).

> Justification: Punt plays that resulted in a Fair Catch produced only one concussion in the seasons examined, making them far safer than “Returning” the punt. As demonstrated in our “Key Findings”, the average punt return is 7.6 yards. By giving the receiving team more yardage than their expected return, more teams may call for a Fair Catch, reducing the number of returns in favor of a safer alternative. Because fewer Returns means fewer high-risk plays, this change also increases safety for all athletes on the field, regardless of position.

> Game Integrity and Enforcement: Fair Catches are already the second most likely outcome of a Punt play (22%). Increasing their occurrence in pursuit of player safety will not affect the integrity of the game, as they are already a common feature. Enforcing this change is simple, because Officials already spot the ball after a fair catch; this simply changes the placement.
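The justification compares a fixed 10-yard advance with the 7.6-yard average return. A minimal sketch of that comparison, with spots measured in yards from the receiving team's own goal line; the helper names are hypothetical, and the sketch ignores a catch within 10 yards of the opponent's goal line, which the proposal does not address.

```python
AVG_RETURN_YARDS = 7.6   # average punt return cited in the justification
FAIR_CATCH_ADVANCE = 10  # proposed advance after a fair catch


def fair_catch_snap_spot(catch_spot: int, proposed: bool = True) -> int:
    """Snap spot after a fair catch, in yards from the receiving team's goal line."""
    return catch_spot + FAIR_CATCH_ADVANCE if proposed else catch_spot


def edge_over_average_return(catch_spot: int) -> float:
    """Yards gained by fair-catching instead of taking an average return."""
    return fair_catch_snap_spot(catch_spot) - (catch_spot + AVG_RETURN_YARDS)


print(fair_catch_snap_spot(30))                # 40
print(round(edge_over_average_return(30), 1))  # 2.4
```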


# Rule Change #3 : Eliminate punt in preseason

> Current Rule: Team A may attempt a punt, drop kick, or placekick from on or behind the line of scrimmage.

> Proposed Rule: Team A may attempt a punt, drop kick, or placekick from on or behind the line of scrimmage.
* During Preseason, Punt attempts are prohibited. In place of a Punt, Team A may elect to have the ball spotted as if a Touchback has occurred.

> Justification: As demonstrated in our Key Findings, preseason punt formations have a concussion rate almost twice that of the regular season. Given this large difference, there is no basis for keeping punts in the preseason if player safety is the primary goal. 

> Game Integrity and Enforcement: Removing an element of the game is the most impactful change to the game’s integrity our team could make. However, the preseason is not itself integral to the game, as evidenced by the NFLPA’s proposal to eliminate the preseason altogether. Therefore, removing an element from the game in the preseason only will have minimal impact on the game as a whole and is an acceptable trade-off in pursuit of player safety. Enforcement is simple, as Punts will be eliminated altogether from every preseason game.

# References

* https://operations.nfl.com/stats-central/chart-the-data/
* https://operations.nfl.com/the-rules/2020-nfl-rulebook/
* https://operations.nfl.com/the-rules/evolution-of-the-nfl-rules/
* https://www.kaggle.com/jpmiller/nfl-data-preparation