# NFL Punt Analytics Competition - Starter EDA

![punt](https://nbcprofootballtalk.files.wordpress.com/2018/12/gettyimages-1067771316-e1543962247286.jpg?w=560&h=316&crop=1)

In this competition we are tasked with analyzing punt plays for player safety and proposing rule changes.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import os
plt.style.use('ggplot')

# For interactive plots
from plotly import offline
import plotly.graph_objs as go


pd.set_option('display.max_columns', None)  # full option name; the short pattern can be ambiguous in newer pandas
offline.init_notebook_mode()
config = dict(showLink=False)

## Data Overview
- **Game Data**: Game level data that specifies the type of season (pre, reg, post), week, and hosting city and team. Each game is uniquely identified across all seasons using `GameKey`.
- **Play Information**: Play level data that describes the type of play, possession team, score, and a brief narrative of each play. A play is uniquely identified by its `PlayID` together with the corresponding `GameKey`; `PlayID` values on their own are not unique.
- **Player Punt Data**: Player level data that specifies the traditional football position for each player. Each player is identified using his `GSISID`.
- **Play Player Role Data**: Play and player level data that specifies a punt-specific player role. This dataset lists each player who took part in each play. A player’s role in a play is uniquely defined by the `GameKey`, `PlayID`, and `GSISID`.
- **Video Review**: Injury level data that provides a detailed description of the concussion-producing event. Video Review data are only available in cases in which the injury play can be identified. Each video review case can be identified using a combination of `GameKey`, `PlayID`, and `GSISID`. A brief narrative of the play events is provided.
- **NGS**: Next Gen Stats – player level data that describes the movement of each player during a play. NGS data is processed by BIOCORE to produce relevant speed and direction data. The NGS data is identified using `GameKey`, `PlayID`, and `GSISID`. Player data for each play is provided as a function of time (`Time`) for the duration of the play.

![](https://storage.googleapis.com/kaggle-media/competitions/NFL%20player%20safety%20analytics/key_variables.jpg)
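
Since `PlayID` repeats across games, joins between these tables need the composite key. Here is a minimal sketch of that idea on toy data: the frames and values below are made up for illustration and are not taken from the competition files.

```python
import pandas as pd

# Toy stand-ins for play_information.csv and video_review.csv (values are made up).
plays = pd.DataFrame({
    "GameKey": [1, 1, 2],
    "PlayID": [100, 200, 100],  # PlayID 100 appears in two different games
    "Play_Type": ["Punt", "Punt", "Punt"],
})
reviews = pd.DataFrame({
    "GameKey": [2],
    "PlayID": [100],
    "GSISID": [12345],
})

# PlayID alone is ambiguous, but the (GameKey, PlayID) pair is unique.
playid_repeats = plays["PlayID"].duplicated().any()
pair_repeats = plays.duplicated(subset=["GameKey", "PlayID"]).any()
print(playid_repeats, pair_repeats)

# Join on the composite key; validate= raises if the right side has duplicate keys.
review_plays = reviews.merge(
    plays, on=["GameKey", "PlayID"], how="left", validate="many_to_one"
)
print(review_plays)
```

Passing `validate="many_to_one"` is a cheap guard: the merge fails loudly if the key assumed to be unique turns out not to be.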

## Let's explore the smaller datasets

In [None]:
# Read the input data
ppd = pd.read_csv('../input/player_punt_data.csv')
gd = pd.read_csv('../input/game_data.csv')
pprd = pd.read_csv('../input/play_player_role_data.csv')
vr = pd.read_csv('../input/video_review.csv')
vfi = pd.read_csv('../input/video_footage-injury.csv')
pi = pd.read_csv('../input/play_information.csv')

# Game Data `gd` for 666 games

* Season_Year: NFL Season (YYYY)
* Season_Type: Preseason (Pre), Regular season (Reg), Post season (Post)
* GameKey: Numeric game identifier, unique across seasons (####)
* Game_Date: Date of game (MM/DD/YYYY)
* Week: Week number, defined within Season_Type (##)
* Game_Day: Day of the week (text)
* Game_Site: Hosting city (text)
* Start_Time: Time at start of game (24:00:00)
* Home_Team: Long format of hosting team (text)
* HomeTeamCode: Team abbreviation (ABC)
* Visit_Team: Long format of visiting team (text)
* VisitTeamCode: Team abbreviation (ABC)
* Stadium: Stadium Name (text)
* StadiumType: Indoor, outdoor or closure type - semi categorical (text)
* Turf: Natural or type of synthetic turf (text)
* GameWeather: Description of indoor weather, same as "OutdoorWeather" for open stadiums (text)
* Temperature: Stadium temperature (Degrees F)
* OutdoorWeather: Description of outdoor weather (free-form text)

In [None]:
gd.head()

In [None]:
gd.plot(kind='scatter', x='Week', y='Temperature', figsize=(15, 5), title='NFL Game Data Week vs. Temperature')
plt.show()

In [None]:
gd['count'] = 1
gd.groupby('Turf') \
    .count()[['count']] \
    .sort_values('count', ascending=False) \
    .plot(kind='bar', figsize=(15, 5), rot=85, title='Count of Games by Turf Type')
plt.show()

In [None]:
gd.groupby('Start_Time') \
    .count()[['count']] \
    .plot(kind='bar', figsize=(15, 5), rot=85, title='Count of Games by Start Time', color='g')
plt.show()
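
The start-time chart above has one bar per distinct time string. A possible variation is to bucket games by kickoff hour, and `value_counts()` does the counting without the helper `count` column. This is only a sketch on toy data; the `toy_gd` frame and its values are made up, and the real `Start_Time` strings may be formatted differently.

```python
import pandas as pd

# Toy stand-in for game_data.csv (made-up values; only Start_Time is used).
toy_gd = pd.DataFrame({"Start_Time": ["13:00", "13:05", "16:25", "20:30", "13:00"]})

# Take the text before the first ':' as the hour, then count games per hour.
start_hour = toy_gd["Start_Time"].str.split(":").str[0].astype(int)
games_per_hour = start_hour.value_counts().sort_index()
print(games_per_hour)
```

The resulting Series can be passed to `.plot(kind='bar')` in the same way as the grouped frames above.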

# Video Review Data (*concussion-producing event*)
- 37 Video Review plays
- 6 involve friendly fire, 28 do not, and 1 is unclear
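
When category tallies like these do not add up to the number of rows, missing values are one possible explanation. The sketch below shows how `value_counts(dropna=False)` would surface them. It uses a made-up toy frame, and the column name `Friendly_Fire` is an assumption about how `vr` labels this field.

```python
import pandas as pd

# Toy stand-in for video_review.csv; the column name and values are assumptions.
toy_vr = pd.DataFrame({"Friendly_Fire": ["No", "Yes", "No", "Unclear", None]})

# dropna=False keeps missing values as their own row in the tally.
tally = toy_vr["Friendly_Fire"].value_counts(dropna=False)
n_missing = int(toy_vr["Friendly_Fire"].isna().sum())
print(tally)
print("missing:", n_missing)
```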

In [None]:
vr.shape

In [None]:
vr.head()

In [None]:
vr['count'] = 1
vr.groupby('Player_Activity_Derived') \
    .count()[['count']] \
    .sort_values('count', ascending=False) \
    .plot(kind='barh', figsize=(15, 5), title='Count of Player Activity Derived')

plt.show()

In [None]:
vr['count'] = 1
vr.groupby('Primary_Partner_Activity_Derived') \
    .count()[['count']] \
    .sort_values('count', ascending=False) \
    .plot(kind='barh', figsize=(15, 5), title='Count of Primary Partner Activity Derived', color='g')

plt.show()

In [None]:
vr['count'] = 1
vr.groupby('Primary_Impact_Type') \
    .count()[['count']] \
    .sort_values('count', ascending=False) \
    .plot(kind='barh', figsize=(15, 5), title='Count of Primary Impact Type', color='b')
plt.show()

# Play information Dataset
- 6681 Plays with information
- Team names and Score
- Position on the field
- Time of play
- All plays are punts

In [None]:
pi.head()

In [None]:
pi['count'] = 1
pi.groupby('Poss_Team').count()[['count']] \
    .sort_values('count', ascending=False) \
    .plot(kind='bar', figsize=(15, 5), title='Count of punts per team', color='k')
plt.show()

In [None]:
# They are all punts!!! :D
pi['Play_Type'].unique()

# Using @crawford's starter code
## Using provided starter code from here: https://www.kaggle.com/crawford/nfl-punt-analytics-starter-kernel

In [None]:
# Loading and plotting functions

def load_plays_for_game(GameKey):
    """
    Returns a dataframe of play data for a given game (GameKey)
    """
    play_information = pd.read_csv('../input/play_information.csv')
    play_information = play_information[play_information['GameKey'] == GameKey]
    return play_information


def load_game_and_ngs(ngs_file=None, GameKey=None):
    """
    Returns a dataframe of player movements (NGS data) for a given game
    """
    if ngs_file is None:
        print("Specify an NGS file.")
        return None
    if GameKey is None:
        print('Specify a GameKey')
        return None
    # Merge play data with NGS data    
    plays = load_plays_for_game(GameKey)
    ngs = pd.read_csv(ngs_file, low_memory=False)
    merged = pd.merge(ngs, plays, how="inner", on=["GameKey", "PlayID", "Season_Year"])
    return merged


def plot_play(game_df, PlayID, player1=None, player2=None, custom_layout=False):
    """
    Plots player movements on the field for a given game, play, and two players
    """
    # Check the inputs first: the YardLine lookup below depends on player1
    if player1 is None or player2 is None:
        print("Specify GSISIDs for player1 and player2")
        return None

    game_df = game_df[game_df.PlayID==PlayID]

    GameKey=str(pd.unique(game_df.GameKey)[0])
    HomeTeam = pd.unique(game_df.Home_Team_Visit_Team)[0].split("-")[0]
    VisitingTeam = pd.unique(game_df.Home_Team_Visit_Team)[0].split("-")[1]
    YardLine = game_df[game_df.GSISID==player1]['YardLine'].iloc[0]

    # One scatter trace per requested player
    traces=[]
    game_df = game_df[(game_df['GSISID']==player1) | (game_df['GSISID']==player2)]
    for player in pd.unique(game_df.GSISID):
        player = int(player)
        trace = go.Scatter(
            x = game_df[game_df.GSISID==player].x,
            y = game_df[game_df.GSISID==player].y,
            name='GSISID '+str(player),
            mode='markers'
        )
        traces.append(trace)

    # Define the layout up front so it exists even when custom_layout is True
    layout = load_layout()
    if custom_layout is not True:
        layout['title'] =  HomeTeam + \
        ' vs. ' + VisitingTeam + \
        '<br>Possession: ' + \
        YardLine.split(" ")[0] +'@'+YardLine.split(" ")[1]
    data = traces
    fig = dict(data=data, layout=layout)
    play_description = game_df[game_df.GSISID==player1].iloc[0]["PlayDescription"]
    print("\n\n\t",play_description)
    offline.iplot(fig, config=config)

    
def load_layout():
    """
    Returns a dict for a Football themed Plot.ly layout 
    """
    layout = dict(
        title = "Player Activity",
        plot_bgcolor='darkseagreen',
        showlegend=True,
        xaxis=dict(
            autorange=False,
            range=[0, 120],
            showgrid=False,
            zeroline=False,
            showline=True,
            linecolor='black',
            linewidth=1,
            mirror=True,
            ticks='',
            tickmode='array',
            tickvals=[10,20, 30, 40, 50, 60, 70, 80, 90, 100, 110],
            ticktext=['Goal', 10, 20, 30, 40, 50, 40, 30, 20, 10, 'Goal'],
            showticklabels=True
        ),
        yaxis=dict(
            title='',
            autorange=False,
            range=[-3.3,56.3],
            showgrid=False,
            zeroline=False,
            showline=True,
            linecolor='black',
            linewidth=1,
            mirror=True,
            ticks='',
            showticklabels=False
        ),
        shapes=[
            dict(
                type='line',
                layer='below',
                x0=0,
                y0=0,
                x1=120,
                y1=0,
                line=dict(
                    color='white',
                    width=2
                )
            ),
            dict(
                type='line',
                layer='below',
                x0=0,
                y0=53.3,
                x1=120,
                y1=53.3,
                line=dict(
                    color='white',
                    width=2
                )
            ),
            dict(
                type='line',
                layer='below',
                x0=10,
                y0=0,
                x1=10,
                y1=53.3,
                line=dict(
                    color='white',
                    width=10
                )
            ),
            dict(
                type='line',
                layer='below',
                x0=20,
                y0=0,
                x1=20,
                y1=53.3,
                line=dict(
                    color='white'
                )
            ),
            dict(
                type='line',
                layer='below',
                x0=30,
                y0=0,
                x1=30,
                y1=53.3,
                line=dict(
                    color='white'
                )
            ),
            dict(
                type='line',
                layer='below',
                x0=40,
                y0=0,
                x1=40,
                y1=53.3,
                line=dict(
                    color='white'
                )
            ),
            dict(
                type='line',
                layer='below',
                x0=50,
                y0=0,
                x1=50,
                y1=53.3,
                line=dict(
                    color='white'
                )
            ),
            dict(
                type='line',
                layer='below',
                x0=60,
                y0=0,
                x1=60,
                y1=53.3,
                line=dict(
                    color='white'
                )
            ),dict(
                type='line',
                layer='below',
                x0=70,
                y0=0,
                x1=70,
                y1=53.3,
                line=dict(
                    color='white'
                )
            ),dict(
                type='line',
                layer='below',
                x0=80,
                y0=0,
                x1=80,
                y1=53.3,
                line=dict(
                    color='white'
                )
            ),
            dict(
                type='line',
                layer='below',
                x0=90,
                y0=0,
                x1=90,
                y1=53.3,
                line=dict(
                    color='white'
                )
            ),dict(
                type='line',
                layer='below',
                x0=100,
                y0=0,
                x1=100,
                y1=53.3,
                line=dict(
                    color='white'
                )
            ),
            dict(
                type='line',
                layer='below',
                x0=110,
                y0=0,
                x1=110,
                y1=53.3,
                line=dict(
                    color='white',
                    width=10
                )
            )
        ]
    )
    return layout

layout = load_layout()
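
The `shapes` list in `load_layout()` repeats the same dict for every yard line. One possible way to shorten it is to generate the shapes in a loop. The helper names below (`yard_line`, `sideline`, `field_shapes`) are hypothetical and not part of the starter kernel; the coordinates, colors, and widths are the ones used in the layout above.

```python
def yard_line(x, width=None):
    """Vertical white line at x, running from sideline to sideline."""
    line = dict(color='white')
    if width is not None:
        line['width'] = width  # only the goal lines set an explicit width
    return dict(type='line', layer='below', x0=x, y0=0, x1=x, y1=53.3, line=line)


def sideline(y):
    """Horizontal white line at y, spanning the full 120 yards."""
    return dict(type='line', layer='below', x0=0, y0=y, x1=120, y1=y,
                line=dict(color='white', width=2))


def field_shapes():
    """Two sidelines, then a line every 10 yards; goal lines (x=10, 110) are thick."""
    shapes = [sideline(0), sideline(53.3)]
    for x in range(10, 111, 10):
        shapes.append(yard_line(x, width=10 if x in (10, 110) else None))
    return shapes


shapes = field_shapes()
print(len(shapes))
```

With this in place, `load_layout()` could use `shapes=field_shapes()` instead of the long literal list.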

In [None]:
# Load the movements of players in GameKey 280. 
game280 = load_game_and_ngs('../input/NGS-2016-reg-wk13-17.csv',GameKey=280)
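
`load_game_and_ngs` reads the whole NGS file before the inner merge narrows it to one game. If memory becomes a problem, one option is to read the file in chunks and keep only the rows for the game of interest. This is a sketch under stated assumptions: `read_game_in_chunks` is a hypothetical helper, and the small in-memory CSV is made up to keep the example self-contained.

```python
import io

import pandas as pd

# Made-up CSV standing in for an NGS file.
toy_csv = io.StringIO(
    "GameKey,PlayID,GSISID,x,y\n"
    "280,2918,32864,50.0,20.0\n"
    "281,100,11111,30.0,10.0\n"
    "280,2918,32725,55.0,25.0\n"
    "282,200,22222,70.0,40.0\n"
)


def read_game_in_chunks(csv_source, game_key, chunksize=2):
    """Read a CSV piece by piece, keeping only the rows for one GameKey."""
    kept = [
        chunk[chunk["GameKey"] == game_key]
        for chunk in pd.read_csv(csv_source, chunksize=chunksize)
    ]
    return pd.concat(kept, ignore_index=True)


game_rows = read_game_in_chunks(toy_csv, game_key=280)
print(game_rows)
```

For the real files a much larger `chunksize` would make sense; the value 2 here only forces several chunks on the toy input.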

In [None]:
# Plot a single play, with two players
plot_play(game_df=game280, PlayID=2918, player1=32864, player2=32725)

## Try modifying to see all players in a play

In [None]:
def plot_play_all_players(game_df, PlayID, custom_layout=False):
    """
    Plots player movements on the field for a given game and play, one trace per player
    """
    game_df = game_df[game_df.PlayID==PlayID]
    
    GameKey=str(pd.unique(game_df.GameKey)[0])
    HomeTeam = pd.unique(game_df.Home_Team_Visit_Team)[0].split("-")[0]
    VisitingTeam = pd.unique(game_df.Home_Team_Visit_Team)[0].split("-")[1]
    player1 = game_df[(game_df.PlayID==PlayID)]['GSISID'].values[0]
    YardLine = game_df[(game_df.PlayID==PlayID) & (game_df.GSISID==player1)]['YardLine'].iloc[0]
    
    traces=[]   
    for player in pd.unique(game_df.GSISID):
        player = int(player)
        trace = go.Scatter(
            x = game_df[game_df.GSISID==player].x,
            y = game_df[game_df.GSISID==player].y,
            name='GSISID '+str(player),
            mode='markers'
        )
        traces.append(trace)
    # Define the layout up front so it exists even when custom_layout is True
    layout = load_layout()
    if custom_layout is not True:
        layout['title'] =  HomeTeam + \
        ' vs. ' + VisitingTeam + \
        '<br>Possession: ' + \
        YardLine.split(" ")[0] +'@'+YardLine.split(" ")[1]
    data = traces
    fig = dict(data=data, layout=layout)
    play_description = game_df[(game_df.PlayID==PlayID) & (game_df.GSISID==player1)].iloc[0]["PlayDescription"]
    print("\n\n\t",play_description)
    offline.iplot(fig, config=config)

In [None]:
plot_play_all_players(game_df=game280, PlayID=2918)

A few notes after plotting all players:
- It looks like anyone who stepped on the field during the play is picked up.
- These could just be players standing close to the sideline, or players coming on for the next play?
- I think our main concern would be with the players most likely to take a high impact or the players injured.
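
One rough way to act on these notes is to drop players whose tracked points mostly fall outside the field boundary. The sketch below uses the sideline positions from `load_layout()` (y = 0 and y = 53.3, x from 0 to 120). The helper name, the 0.5 threshold, and the toy data are all assumptions for illustration.

```python
import pandas as pd

FIELD_LENGTH = 120.0  # x range used by load_layout(), end zones included
FIELD_WIDTH = 53.3    # y position of the far sideline in load_layout()


def players_mostly_in_bounds(play_df, min_share=0.5):
    """Return GSISIDs with at least min_share of their points inside the field."""
    in_bounds = (
        play_df["x"].between(0, FIELD_LENGTH)
        & play_df["y"].between(0, FIELD_WIDTH)
    )
    share = in_bounds.groupby(play_df["GSISID"]).mean()
    return share[share >= min_share].index.tolist()


# Made-up tracking points: player 1 is on the field, player 2 stands past the sideline.
toy_play = pd.DataFrame({
    "GSISID": [1, 1, 2, 2],
    "x": [50.0, 60.0, 50.0, 51.0],
    "y": [20.0, 25.0, -2.0, -2.5],
})
on_field = players_mostly_in_bounds(toy_play)
print(on_field)
```

A share-based cutoff is used rather than requiring every point to be in bounds, since a player taking part in the play can still run out of bounds briefly.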

## Add player position to plots

In [None]:
pprd.head()

In [None]:

# inputs
custom_layout = False
game_df=game280
PlayID=2918


# Function code
game_df = game_df[game_df.PlayID==PlayID]

GameKey=str(pd.unique(game_df.GameKey)[0])
HomeTeam = pd.unique(game_df.Home_Team_Visit_Team)[0].split("-")[0]
VisitingTeam = pd.unique(game_df.Home_Team_Visit_Team)[0].split("-")[1]
player1 = game_df[(game_df.PlayID==PlayID)]['GSISID'].values[0]
YardLine = game_df[(game_df.PlayID==PlayID) & (game_df.GSISID==player1)]['YardLine'].iloc[0]

traces=[]   
for player in pd.unique(game_df.GSISID):
    player = int(player)
    trace = go.Scatter(
        x = game_df[game_df.GSISID==player].x,
        y = game_df[game_df.GSISID==player].y,
        name='GSISID '+str(player),
        mode='markers'
    )
    traces.append(trace)
if custom_layout is not True:
    layout = load_layout()
    layout['title'] =  HomeTeam + \
    ' vs. ' + VisitingTeam + \
    '<br>Possession: ' + \
    YardLine.split(" ")[0] +'@'+YardLine.split(" ")[1]
data = traces
fig = dict(data=data, layout=layout)
play_description = game_df[(game_df.PlayID==PlayID) & (game_df.GSISID==player1)].iloc[0]["PlayDescription"]
print("\n\n\t",play_description)
offline.iplot(fig, config=config)
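
To get roles onto the plot, the tracking rows can be merged with the role data on `GameKey`, `PlayID`, and `GSISID`, and the role used in each trace name. The sketch below shows only the merge and labeling step on toy frames. The column name `Role`, the role codes, and the `label` column are assumptions for illustration.

```python
import pandas as pd

# Toy stand-ins for the NGS rows of one play and for play_player_role_data.csv.
toy_ngs = pd.DataFrame({
    "GameKey": [280, 280, 280, 280],
    "PlayID": [2918, 2918, 2918, 2918],
    "GSISID": [32864, 32864, 32725, 99999],
    "x": [50.0, 51.0, 60.0, 5.0],
    "y": [20.0, 21.0, 30.0, -1.0],
})
toy_roles = pd.DataFrame({
    "GameKey": [280, 280],
    "PlayID": [2918, 2918],
    "GSISID": [32864, 32725],
    "Role": ["P", "PR"],  # made-up role codes
})

# Left merge keeps every tracking row; players without a role get a placeholder.
with_roles = toy_ngs.merge(toy_roles, on=["GameKey", "PlayID", "GSISID"], how="left")
with_roles["Role"] = with_roles["Role"].fillna("unknown")

# Label that could replace 'GSISID '+str(player) as the trace name.
with_roles["label"] = (
    with_roles["Role"] + " (GSISID " + with_roles["GSISID"].astype(str) + ")"
)
print(with_roles[["GSISID", "Role", "label"]].drop_duplicates())
```

Players who were tracked but have no role for the play show up as `unknown`, which may also help spot the sideline players noted earlier.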

## Player Role Data

In [None]:
pprd.shape

In [None]:
pprd.head()