- This analysis aims to build a picture of what happens on punts that result in a concussion. We'll look primarily at who is concussed and at the primary partner of the concussed player.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from IPython.display import Image, display

## VIDEO SECTION
#### VIDEO REVIEW
- <b>Player Punt Data</b>: Player level data that specifies the traditional football position for each player. Each player is identified using his GSISID.
- <b>Play Player Role Data</b>: Play and player level data that specifies a punt-specific player role. This dataset specifies each player who played in each play. A player's role in a play is uniquely defined by the GameKey, PlayID, and GSISID.
- <b>Video Review</b>: Injury level data that provides a detailed description of the concussion-producing event. Video Review data are only available in cases in which the injury play can be identified. Each video review case can be identified using a combination of GameKey, PlayID, and GSISID. A brief narrative of the play events is provided.
- <b>Primary Impact Type</b>: gives the source of the trauma; 'Unclear' means the source could not be determined
- <b>Primary_Partner_GSISID</b>: the player on the other end of the impact (any usual suspects?)
    - If both the player and the partner are concussed, then each of them will also be listed as a player
- <b>37 Concussions</b> Identified out of 6681 punts = 0.5538%
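
As a quick check on the rate quoted above, here is a minimal sketch of the arithmetic. It uses only the two counts stated in the list (37 reviewed concussions, 6681 punts); the variable names are just for illustration.

```python
# Counts quoted in the list above
concussions = 37
punts = 6681

rate = concussions / punts
print(f"{rate:.4%} of punts had a reviewed concussion")
print(f"roughly 1 in {punts / concussions:.0f} punts")
```

Framing the rate as "1 in N punts" can be easier to reason about than a small percentage.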

In [None]:
player_df = pd.read_csv('../input/NFL-Punt-Analytics-Competition/player_punt_data.csv')
play_player_role_df = pd.read_csv('../input/NFL-Punt-Analytics-Competition/play_player_role_data.csv')
video_review_df = pd.read_csv('../input/NFL-Punt-Analytics-Competition/video_review.csv')
print(video_review_df.shape)

In [None]:
# Combine Relevant Player Information, Position, Role, Number
master_player_df = pd.merge(player_df, play_player_role_df,
                          how='inner',
                          on=['GSISID']).drop(columns=['Season_Year'])
master_player_df.head()

In [None]:
# Check summary counts to see if any are even needed
print(video_review_df['Player_Activity_Derived'].value_counts())
print('---')
print(video_review_df['Turnover_Related'].value_counts())
print('---')
print(video_review_df['Primary_Impact_Type'].value_counts())
print('---')
print(video_review_df['Primary_Partner_Activity_Derived'].value_counts())
print('---')
print(video_review_df['Friendly_Fire'].value_counts())

- Observations:
    - Everything involves tackling or blocking (you need contact of some sort to have a concussion).
    - No plays are flagged as turnover related, although play (GameKey: 274, PlayID: 3609) was turnover related by my idea of a turnover. Maybe they classify based on whether there was an actual turnover, or on whether a fumble/muff/interception was going on.
    - The majority of concussions involve a helmet hitting some other mass, apart from the one 'Unclear' instance.
    - The majority of concussions are not a result of friendly fire.
    - There are 4 plays with no designated primary partner: 2 were helmet-to-ground, 1 was unclear, and 1 was helmet-to-helmet (H2H).
        - If you review the videos for the latter two, the 'unclear' case is a clear helmet hit to the back of the head of #44 by Houston player #55/56 (not sure of the number; it could probably be verified with the GSISID from the NGS data for the play), and for the H2H you can see Dolphins #34 make a diving head tackle on Baltimore #21.
            - You can deduce this by looking at the play video later in the analysis. <b>I MIGHT REPLACE THESE NAN AND UNCLEAR VALUES LATER WITH THE CORRECT DATA IF IT SEEMS USEFUL FOR MY OWN ANALYSIS</b>
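
One way to list the plays without an identified partner is sketched below. The toy frame only stands in for `video_review_df`: its IDs and rows are invented for illustration, and it assumes the partner column holds ID strings, missing values, and the literal text 'Unclear'.

```python
import numpy as np
import pandas as pd

# Toy stand-in for video_review_df; every value here is made up
toy_review = pd.DataFrame({
    'GameKey': [1, 2, 3, 4],
    'PlayID': [100, 200, 300, 400],
    'Primary_Impact_Type': ['Helmet-to-helmet', 'Helmet-to-ground',
                            'Unclear', 'Helmet-to-body'],
    'Primary_Partner_GSISID': ['11111', np.nan, 'Unclear', '22222'],
})

# Coerce non-numeric entries such as 'Unclear' to NaN, then keep the gaps
partner_numeric = pd.to_numeric(toy_review['Primary_Partner_GSISID'],
                                errors='coerce')
no_partner = toy_review[partner_numeric.isna()]
print(no_partner[['GameKey', 'PlayID', 'Primary_Impact_Type']])
```

Coercing with `errors='coerce'` treats 'Unclear' and true missing values the same way, which is convenient when the goal is simply to find the plays that need a manual video check.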

In [None]:
# Now that we have a descriptive idea of what's going on, I'm gonna just drop these columns
# and also clear up the unclear designation and convert it to NaN
droppers = ['Player_Activity_Derived', 'Turnover_Related', 'Primary_Partner_Activity_Derived', 
            'Friendly_Fire', 'Season_Year']
video_review_df.drop(columns=droppers, inplace=True)

# Replace the 'Unclear' partner designation (row 33) with a real missing value
video_review_df.loc[33, 'Primary_Partner_GSISID'] = np.nan

## VIDEO FOOTAGE

In [None]:
'''Control Video Footage'''
video_footage_control_df = pd.read_csv('../input/NFL-Punt-Analytics-Competition/video_footage-control.csv')
print(video_footage_control_df.shape)
video_footage_control_df.tail(1)

In [None]:
# # Use for printing out video links
# for i in range(len(video_footage_control_df)):
#     print(video_footage_control_df.loc[i, 'Preview Link'])

In [None]:
'''Concussion Video Footage'''
video_footage_injury_df = pd.read_csv('../input/NFL-Punt-Analytics-Competition/video_footage-injury.csv')
print(video_footage_injury_df.shape)
video_footage_injury_df.head(1)

In [None]:
# # Injury Video Links; search for player who is injured; watch the film, be the film....
# for i in range(len(injured_players)):
#     print(i, injured_players['PREVIEW LINK (5000K)'][i])

In [None]:
# Preprocess to allow easier join between video review data and the actual footage data
rename_columns = {'gamekey': 'GameKey', 'playid': 'PlayID', 'season': 'Season_Year'}
video_footage_injury_df.rename(columns=rename_columns, inplace=True)

# Combine Video Review and Video Injury DataFrames to have the injured player and partner player data
injury_play = pd.merge(video_review_df, video_footage_injury_df, 
                       how='inner', 
                       on=['GameKey', 'PlayID'])
injury_play.head(1)

In [None]:
# Drop more columns that aren't critical for getting a feel for the data
droppers = ['Season_Year', 'Type', 'Week', 'Home_team', 'Visit_Team', 'Qtr']
injury_play.drop(columns=droppers, inplace=True)

In [None]:
# Join Info (their jersey number, position, role) on Players Who are Injured
injured_players = pd.merge(injury_play, master_player_df,
                           how='inner',
                           on=['GSISID', 'GameKey', 'PlayID'])

print('Shape:', injured_players.shape)
injured_players.head(1)

- I pull the concussed player's information to identify their jersey number in the video. Because a player can have multiple jersey numbers associated with them, I needed this to focus on the events that led up to the concussion. I watched the film and, for plays where the player's number was unclear, cross-referenced game recaps (googled) to identify the concussed player's number.
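
Rows that need this kind of manual check can be surfaced before watching any film. The sketch below uses a toy frame in place of `injured_players` (all values invented) and assumes the jersey number lives in a `Number` column.

```python
import pandas as pd

# Toy stand-in for injured_players; every value here is made up
toy_injured = pd.DataFrame({
    'GameKey': [5, 5, 6],
    'PlayID': [10, 10, 20],
    'GSISID': [111, 111, 222],
    'Number': ['21', '38', '44'],  # player 111 appears with two numbers
})

keys = ['GameKey', 'PlayID', 'GSISID']
# keep=False marks every row of a repeated key, not only the later copies
ambiguous = toy_injured[toy_injured.duplicated(subset=keys, keep=False)]
print(ambiguous)
```

Each group of rows in `ambiguous` is one concussed player on one play with more than one candidate jersey number, which is where the video has to settle the question.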

In [None]:
# Drop Certain Rows After Identifying Concussed Players Jersey Number from video
drop_rows = [1, 7, 17, 20, 23, 26, 27, 28, 29, 32, 33, 36, 38, 39, 45, 46, 47, 50, 52, 55, 57]
injured_players.drop(labels=drop_rows, inplace=True)
injured_players.reset_index(drop=True, inplace=True)

## NGS DATA: CONCUSSION PLAY ANALYSIS
- <b>Next Gen Stats</b>: player level data that describes the movement of each player during a play. NGS data is processed by BIOCORE to produce relevant speed and direction data. The NGS data is identified using GameKey, PlayID, and GSISID. Player data for each play is provided as a function of time (Time) for the duration of the play.
- Players are recorded at every <b>10th of a second or 100 milliseconds</b>
- Field dimensions: 120 yards by 53.3 yards
- Speed can be calculated from dis (the distance covered per sample) and the time between samples
- <b>Event</b> records fairly routine play events
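
A minimal sketch of the speed calculation, assuming `dis` is the distance in yards covered between consecutive samples and that samples are exactly 0.1 seconds apart. The toy track and the `speed_yds_per_sec` column name are invented for illustration.

```python
import pandas as pd

SAMPLE_SECONDS = 0.1  # one NGS row every 100 milliseconds

# Toy track for a single player; the values are made up
toy_track = pd.DataFrame({
    'Time': pd.date_range('2017-01-01', periods=3, freq='100ms'),
    'dis': [0.50, 0.75, 0.90],
})

# Sort by time first so each 'dis' value lines up with its own interval
toy_track = toy_track.sort_values('Time').reset_index(drop=True)
toy_track['speed_yds_per_sec'] = toy_track['dis'] / SAMPLE_SECONDS
print(toy_track)
```

If samples were ever dropped, dividing by a fixed 0.1 would overstate speed for that row; dividing by the actual difference between consecutive `Time` values would be a more defensive variant.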

In [None]:
# Read in only concussion data (dataset formed in a different notebook)
# Contains NGS data for plays involving a concussion
ngs_concussion = pd.read_csv('../input/ngsconcussion/NGS-concussion.csv')
ngs_concussion.head()

In [None]:
# Convert Primary_Partner_GSISID from str to float
injured_players['Primary_Partner_GSISID'] = injured_players['Primary_Partner_GSISID'].astype('float')

### Reasoning for analyzing speeds
- Again, this notebook is mostly for exploring parts of the play. I wanted to understand what speeds the concussed player and the partner player were running at throughout the play, mostly to gauge the severity of the impact, so I care more about distributions than averages. I plot their routes with the corresponding speeds. I do not label each player because the plot is only used as a reference for what's in the actual videos. I do not evaluate the 'o' and 'dir' variables because previous analyses of them didn't give me any useful information. Maybe they help if you're a biomechanical engineer.

In [None]:
from IPython.display import display, HTML

def make_html(game_key, play_id):
    return '<img src="{}" style="display:inline;margin:1px"/>'\
    .format('../input/ngsconcussion/' + str(game_key) + '_' + str(play_id) + '.gif')

In [None]:
# Map Routes of concussed player and partner player
# and give approximate speeds throughout their route
for i in range(len(injured_players)):
    # Get necessary values for query of NGS data
    game_key = injured_players.loc[i, 'GameKey']
    play_id = injured_players.loc[i, 'PlayID']
    concussed_id = injured_players.loc[i, 'GSISID']
    partner_id = injured_players.loc[i, 'Primary_Partner_GSISID']
    print('GameKey:', game_key, 'PlayID:', play_id)
    print('Play Description:', injured_players.loc[i,'PlayDescription'])
    print('Primary Impact Type:', injured_players.loc[i, 'Primary_Impact_Type'])
    print('Concussed:', concussed_id, 'Role:', injured_players.loc[i, 'Role'])
    print('Partner:', partner_id)
    # Visualize the play with its .gif file (make_html already returns a string)
    display(HTML(make_html(game_key, play_id)))
    
    # Concussed player
    where_condition = (
        (ngs_concussion['GameKey'] == game_key) &
        (ngs_concussion['PlayID'] == play_id) &
        (ngs_concussion['GSISID'] == concussed_id))
    concussion = ngs_concussion[where_condition].copy()
    # Reorder by Time and reset index
    concussion.sort_values(by=['Time'], inplace=True)
    concussion.reset_index(drop=True, inplace=True)
    
    # Partner player
    where_condition = (
        (ngs_concussion['GameKey'] == game_key) &
        (ngs_concussion['PlayID'] == play_id) &
        (ngs_concussion['GSISID'] == partner_id))
    partner = ngs_concussion[where_condition].copy()
    partner.sort_values(by=['Time'], inplace=True)
    partner.reset_index(drop=True, inplace=True)

    # Variables for Mapping
    concussion_x = concussion['x']
    concussion_y = concussion['y']
    partner_x = partner['x']
    partner_y = partner['y']
    speed1 = concussion['dis'] / 0.1
    speed2 = partner['dis'] / 0.1
    
    # Mapping of play
    sns.set()
    plt.figure(figsize=(10,5))
    cmap = plt.get_cmap('coolwarm')
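    # Fix the color scale on both scatters so they share the 0-12 yards/sec range
    plt.scatter(concussion_x, concussion_y, c=speed1, cmap=cmap, alpha=0.5, vmin=0, vmax=12)
    # partner_id is a float after the astype('float') conversion, so test for NaN
    if pd.notna(partner_id):
        plt.scatter(partner_x, partner_y, c=speed2, cmap=cmap, alpha=0.5, vmin=0, vmax=12)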
    plt.colorbar(label='yards/sec')
    # Normal length of field is 120 yards
    plt.xlim(-10, 130)
    plt.xticks(np.arange(0, 130, step=10),
               ['End', 'Goal Line', '10', '20', '30', '40', '50', '40', '30', '20', '10', 'Goal Line', 'End'])
    # Normal width is 53.3 yards
    plt.ylim(-10, 65)
    plt.yticks(np.arange(0, 65, 53.3), ['Sideline', 'Sideline'])
    plt.title('Playing Field')
    plt.xlabel('yardline')
    plt.ylabel('width of field')
    plt.show()
    print('---')

- All mentions of a play are in this format: (GameKey, PlayID)
- The video link for play <b>(281, 1526)</b> is incorrect.
    - The play can be found at 13:52 to 14:03 of this video (https://www.youtube.com/watch?v=NsBDbLWcLyM)
    - It looks like an illegal block above the waist (https://operations.nfl.com/the-rules/nfl-video-rulebook/illegal-block-above-the-waist/) that was not called
- Note: the fastest 40-yard dash times range from 4.22 to 4.30 seconds, which is equivalent to an average of 9.30 to 9.48 yards/sec
    - Reference: https://en.wikipedia.org/wiki/40-yard_dash
- At moments, some players reach higher speeds than the average yards/sec of these amazing 40-yard sprinters. Just guessing, but there might be shoves in the back on some plays, and maybe competition pushes your speed a bit :P.
- Some plays result in high-impact collisions; others happen at speeds similar to an average punt play (analysis in a different notebook for a different time). Just hoping this notebook might be useful to someone who likes to see player routes with speed. Cheers!
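
The 40-yard dash conversion above is just distance divided by time. Here is a small sketch using the two times quoted in the list; the `benchmarks` name is only for illustration.

```python
DASH_YARDS = 40

# Average speed over the full dash for the two quoted times
benchmarks = {seconds: DASH_YARDS / seconds for seconds in (4.22, 4.30)}

for seconds, speed in benchmarks.items():
    print(f"{seconds:.2f} s -> {speed:.2f} yards/sec on average")
```

These are averages over 40 yards from a standing start, so a player's peak speed during the dash is higher than the figure shown. That is one more reason instantaneous NGS speeds can exceed these benchmarks.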

<img src="../input/ngsconcussion/149_3663.gif" width='250px'/>
 

<img src="https://i.imgur.com/xgKeain.jpg" width="250px"/>