# NFL 1st and Future 2019

tl;dr:

**In this challenge, you're tasked with investigating the relationship between the playing surface and the injury and performance of National Football League (NFL) athletes, and with examining factors that may contribute to lower extremity injuries.**

Submissions will be judged by the NFL based on how well they address:
- Representation of player movement, including, but not limited to, the development of novel metrics that characterize player movement on the field.
- Identification of specific variables that present an elevated risk of injury.
- Evaluation of differences in player movement between playing surfaces.

Submissions will be scored using the following rubric:
- Creativity and Presentation (5 points)
- Methodology (5 points)
- Application (5 points)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.patches as patches
sns.set_style("whitegrid")

# Data:
There are three files provided in the dataset, as described below:

1. **Injury Record:** The injury record file in .csv format contains information on 105 lower-limb injuries that occurred during regular season games over the two seasons. Injuries can be linked to specific records in a player history using the PlayerKey, GameID, and PlayKey fields.

2. **Play List:** The play list file contains the details for the 267,005 player-plays that make up the dataset. Each play is indexed by PlayerKey, GameID, and PlayKey fields. Details about the game and play include the player’s assigned roster position, stadium type, field type, weather, play type, position for the play, and position group.

3. **Player Track Data:** Player-level data that describes the location, orientation, speed, and direction of each player during a play, recorded at 10 Hz (i.e. 10 observations per second).
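
Because the tracking data is sampled at 10 Hz, consecutive rows within a play are 0.1 s apart, so a per-step speed can be approximated from position differences. The following is a minimal sketch on made-up coordinates, assuming columns named `PlayKey`, `x` and `y`; the `toy_trk` frame and the `step_dist` and `step_speed` columns are illustrative names, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy tracking data (made up): one play, moving 0.5 units per sample along x
toy_trk = pd.DataFrame({
    'PlayKey': ['A-1-1'] * 4,
    'x': [10.0, 10.5, 11.0, 11.5],
    'y': [20.0, 20.0, 20.0, 20.0],
})

SAMPLE_RATE_HZ = 10  # 10 observations per second

# Position change between consecutive samples, computed within each play
step = toy_trk.groupby('PlayKey')[['x', 'y']].diff()
toy_trk['step_dist'] = np.hypot(step['x'], step['y'])

# Distance per sample times samples per second gives distance per second
toy_trk['step_speed'] = toy_trk['step_dist'] * SAMPLE_RATE_HZ
```

The first row of each play has no previous sample, so its `step_speed` is NaN. Grouping by `PlayKey` before calling `diff` keeps the last point of one play from being differenced against the first point of the next.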



In [None]:
playlist = pd.read_csv('../input/nfl-playing-surface-analytics/PlayList.csv')
inj = pd.read_csv('../input/nfl-playing-surface-analytics/InjuryRecord.csv')
trk = pd.read_csv('../input/nfl-playing-surface-analytics/PlayerTrackData.csv')

## Injury Data
- PlayerKey, GameID, PlayKey
- BodyPart
- Surface
- DM_M1, DM_M7, DM_M28, DM_M42 - One-hot encoded flags for the number of days missed due to injury
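
If the four `DM_M*` flags are cumulative (an assumption here: a player who missed 42 or more days would have all four flags set to 1), they can be collapsed into a single severity label. This is a sketch on made-up rows; `toy_inj`, `labels` and `Severity` are illustrative names.

```python
import pandas as pd

# Toy injury rows (made up), one per severity level
toy_inj = pd.DataFrame({
    'DM_M1':  [1, 1, 1, 1],
    'DM_M7':  [0, 1, 1, 1],
    'DM_M28': [0, 0, 1, 1],
    'DM_M42': [0, 0, 0, 1],
})

flag_cols = ['DM_M1', 'DM_M7', 'DM_M28', 'DM_M42']
labels = {1: '1+ days', 2: '7+ days', 3: '28+ days', 4: '42+ days'}

# With cumulative flags, the row sum counts how many thresholds were reached
toy_inj['Severity'] = toy_inj[flag_cols].sum(axis=1).map(labels)
```

A single label like this is easier to group and plot than four separate columns, but it only makes sense if the cumulative assumption holds for the real file.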

In [None]:
inj.groupby('BodyPart').count()['PlayerKey'] \
    .sort_values() \
    .plot(kind='bar', figsize=(15, 5), title='Count of injuries by Body Part')
plt.show()

In [None]:
inj.groupby('Surface').count()['PlayerKey'] \
    .sort_values() \
    .plot(kind='bar', figsize=(15, 5), title='Count of injuries by Surface')
plt.show()

In [None]:
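# inj_detailed is not built until the merge further down, so merge here.
# Summing the DM_M42 flag counts injuries with 42+ days missed per play type.
inj.merge(playlist).groupby('PlayType')['DM_M42'].sum() \
    .sort_values() \
    .plot(figsize=(15, 5), kind='bar', title='Injuries with 42+ days missed by PlayType')
plt.show()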

## Playlist Data

In [None]:
# Number of unique plays in the playlist dataset
playlist['PlayKey'].nunique()

In [None]:
playlist[['PlayKey', 'PlayType']].drop_duplicates() \
    .groupby('PlayType').count()['PlayKey'] \
    .sort_values() \
    .plot(kind='barh',
          figsize=(12, 6),
          color='black',
          title='Number of plays provided by type')
plt.show()

## Match Player info with injury data
- Only 77 of the 105 injury records link up with the player info in the playlist
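
To see which injury records fail to link, a left merge with `indicator=True` keeps every injury row and flags whether a matching play was found. This is a sketch with made-up keys; `toy_inj`, `toy_playlist`, `linked`, `n_linked` and `n_unlinked` are illustrative names, and the missing `PlayKey` in the third row is only one possible reason a record might not match.

```python
import numpy as np
import pandas as pd

# Toy injury records (made up); the third one has no PlayKey
toy_inj = pd.DataFrame({
    'PlayerKey': [1, 2, 3],
    'GameID': ['1-1', '2-1', '3-1'],
    'PlayKey': ['1-1-1', '2-1-5', np.nan],
    'BodyPart': ['Knee', 'Ankle', 'Foot'],
})

# Toy play list (made up)
toy_playlist = pd.DataFrame({
    'PlayerKey': [1, 2, 2],
    'GameID': ['1-1', '2-1', '2-1'],
    'PlayKey': ['1-1-1', '2-1-4', '2-1-5'],
    'PlayType': ['Pass', 'Rush', 'Pass'],
})

keys = ['PlayerKey', 'GameID', 'PlayKey']

# how='left' keeps unmatched injuries; indicator=True adds a '_merge' column
linked = toy_inj.merge(toy_playlist, on=keys, how='left', indicator=True)

n_linked = (linked['_merge'] == 'both').sum()
n_unlinked = (linked['_merge'] == 'left_only').sum()
```

Filtering `linked` on `_merge == 'left_only'` then shows exactly which injuries have no play-level detail, which an inner merge would silently drop.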

In [None]:
inj_detailed = inj.merge(playlist)

In [None]:
inj_detailed.groupby('RosterPosition').count()['PlayerKey'] \
    .sort_values() \
    .plot(figsize=(15, 5), kind='bar', title='Injured Players by Position')
plt.show()

In [None]:
inj_detailed.groupby('BodyPart').count()['PlayerKey'] \
    .sort_values() \
    .plot(figsize=(15, 5), kind='bar', title='Injured Players by BodyPart')
plt.show()

In [None]:
inj_detailed.groupby('PlayType').count()['PlayerKey'] \
    .sort_values() \
    .plot(figsize=(15, 5), kind='bar', title='Injured Players by PlayType')
plt.show()

# Plotting Plays
Check out my kernel, linked below, where I provide a function for creating and plotting an NFL football field.

https://www.kaggle.com/robikscube/nfl-big-data-bowl-plotting-player-position

In [None]:
def create_football_field(linenumbers=True,
                          endzones=True,
                          highlight_line=False,
                          highlight_line_number=50,
                          highlighted_name='Line of Scrimmage',
                          fifty_is_los=False,
                          figsize=(12, 6.33)):
    """
    Function that plots the football field for viewing plays.
    Allows for showing or hiding endzones.
    """
    rect = patches.Rectangle((0, 0), 120, 53.3, linewidth=0.1,
                             edgecolor='r', facecolor='darkgreen', zorder=0)

    fig, ax = plt.subplots(1, figsize=figsize)
    ax.add_patch(rect)

    plt.plot([10, 10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60, 70, 70, 80,
              80, 90, 90, 100, 100, 110, 110, 120, 0, 0, 120, 120],
             [0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3,
              53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 53.3, 0, 0, 53.3],
             color='white')
    if fifty_is_los:
        plt.plot([60, 60], [0, 53.3], color='gold')
        plt.text(62, 50, '<- Player Yardline at Snap', color='gold')
    # Endzones
    if endzones:
        ez1 = patches.Rectangle((0, 0), 10, 53.3,
                                linewidth=0.1,
                                edgecolor='r',
                                facecolor='blue',
                                alpha=0.2,
                                zorder=0)
        ez2 = patches.Rectangle((110, 0), 10, 53.3,  # width is 10 yards, like ez1
                                linewidth=0.1,
                                edgecolor='r',
                                facecolor='blue',
                                alpha=0.2,
                                zorder=0)
        ax.add_patch(ez1)
        ax.add_patch(ez2)
    plt.xlim(0, 120)
    plt.ylim(-5, 58.3)
    plt.axis('off')
    if linenumbers:
        for x in range(20, 110, 10):
            numb = x
            if x > 50:
                numb = 120 - x
            plt.text(x, 5, str(numb - 10),
                     horizontalalignment='center',
                     fontsize=20,  # fontname='Arial',
                     color='white')
            plt.text(x - 0.95, 53.3 - 5, str(numb - 10),
                     horizontalalignment='center',
                     fontsize=20,  # fontname='Arial',
                     color='white', rotation=180)
    if endzones:
        hash_range = range(11, 110)
    else:
        hash_range = range(1, 120)

    for x in hash_range:
        ax.plot([x, x], [0.4, 0.7], color='white')
        ax.plot([x, x], [53.0, 52.5], color='white')
        ax.plot([x, x], [22.91, 23.57], color='white')
        ax.plot([x, x], [29.73, 30.39], color='white')

    if highlight_line:
        hl = highlight_line_number + 10
        plt.plot([hl, hl], [0, 53.3], color='yellow')
        plt.text(hl + 2, 50, '<- {}'.format(highlighted_name),
                 color='yellow')
    return fig, ax

In [None]:
example_play_id = inj['PlayKey'].values[0]

In [None]:
trk.shape

## Plot path of injured player

In [None]:
fig, ax = create_football_field()
trk.query('PlayKey == @example_play_id').plot(kind='scatter', x='x', y='y', ax=ax, color='orange')
plt.show()

In [None]:
# Drop missing PlayKeys so the list only holds plays that can be looked up
inj_play_list = inj['PlayKey'].dropna().tolist()

## Plotting every route of injured players
- Too much info to draw conclusions, but fun to plot for context.

In [None]:
# Loop through every injury play that has tracking data
fig, ax = create_football_field()
for playkey, inj_play in trk.query('PlayKey in @inj_play_list').groupby('PlayKey'):
    inj_play.plot(kind='scatter', x='x', y='y', ax=ax, color='orange', alpha=0.2)
plt.show()

## Plotting routes of some non-injured players

In [None]:
trk['PlayKey'].nunique()

In [None]:
trk.head()

In [None]:
# Plot the plays found in the first 50,000 tracking rows that are not injury plays
fig, ax = create_football_field()
for playkey, play in trk.query('PlayKey not in @inj_play_list').head(50000).groupby('PlayKey'):
    play.plot(kind='scatter', x='x', y='y', ax=ax, color='red', alpha=0.2)
plt.show()

# Look at the top of each data file

In [None]:
inj.head()

In [None]:
trk.head()

In [None]:
playlist.head()