Football is one of the most popular sports worldwide. Injury frequency in football is estimated at approximately 10 to 35 injuries per 1000 playing hours, and most injuries occur in the lower extremities, mainly the knees and ankles. Given the health and financial consequences of injury, a thorough data analysis of the risk factors, aimed at reducing the incidence of injuries, is urgently needed. This notebook therefore analyses intrinsic (person-related) and extrinsic (environment-related) risk factors using the datasets provided by the NFL.
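Incidence figures such as the 10 to 35 quoted above are exposure-normalised rates: the injury count divided by the hours played, scaled to 1000 hours. A minimal sketch of that arithmetic; the function name and the counts are hypothetical and chosen only for illustration.

```python
def incidence_per_1000h(n_injuries: int, exposure_hours: float) -> float:
    """Injuries per 1000 hours of exposure (hypothetical helper)."""
    if exposure_hours <= 0:
        raise ValueError("exposure_hours must be positive")
    # Multiply first so exact inputs give an exact result
    return n_injuries * 1000 / exposure_hours


# Hypothetical squad: 6 injuries over 400 playing hours
print(incidence_per_1000h(6, 400))  # 15.0
```

The same normalisation idea (events divided by exposure) is what turns raw injury counts into comparable risks later in the notebook.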

***Table of Contents***
* 1.1 Import Libraries
* 1.2 Read Data
    * Injuries Record
    * Play List
    * Player Track
* 2.1 Characterize Movement
    * Metrics of Speed
    * Directional Changes
    * Acceleration/Deceleration
* 2.2 Identify Variables
    * Speed Analysis
    * Direction Analysis
    * Spatial Analysis
* 2.3 Exploratory Data Analysis



**1.1 Import Libraries** 

In [None]:
import pandas as pd
import numpy as np
import math 
import seaborn as sns
import matplotlib.pyplot as plt

**1.2 Read Data**

In [None]:
playlist = pd.read_csv('../input/nfl-playing-surface-analytics/PlayList.csv')
playertrack = pd.read_csv('../input/nfl-playing-surface-analytics/PlayerTrackData.csv')
injuries = pd.read_csv('../input/nfl-playing-surface-analytics/InjuryRecord.csv')

**Injuries Record**

Information about the injuries incurred.
Identifiers: PlayerKey; GameID (PlayerKey-X); PlayKey (GameID-Y, e.g. 31070-3-7).
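Because the identifiers are nested, the components of a PlayKey can be recovered by splitting on hyphens. A minimal sketch using three PlayKeys that appear later in this notebook; the output column names `GameNumber` and `PlayNumber` are hypothetical.

```python
import pandas as pd

keys = pd.Series(["31070-3-7", "33337-8-15", "39956-2-14"], name="PlayKey")

# Split "PlayerKey-GameNumber-PlayNumber" into three integer columns
parts = keys.str.split("-", expand=True).astype(int)
parts.columns = ["PlayerKey", "GameNumber", "PlayNumber"]

# GameID is the PlayKey without its last component
parts["GameID"] = keys.str.rsplit("-", n=1).str[0]
print(parts)
```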

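BodyPart, Surface: character strings (the injured body part and the playing surface).

DM_M1, DM_M7, DM_M28, DM_M42: indicator variables flagging whether the player missed at least 1, 7, 28 or 42 days because of the injury.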
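Since each indicator flags a minimum number of days missed, a single severity value can be derived by taking the largest threshold that is flagged. A sketch on made-up rows, assuming the indicators are the four columns DM_M1, DM_M7, DM_M28 and DM_M42 and that they are cumulative (a 28-day absence also sets DM_M1 and DM_M7); `min_days_missed` is a hypothetical column name.

```python
import pandas as pd

# Hypothetical rows mimicking the days-missed indicator columns
inj = pd.DataFrame({
    "DM_M1":  [1, 1, 1, 1],
    "DM_M7":  [0, 1, 1, 1],
    "DM_M28": [0, 0, 1, 1],
    "DM_M42": [0, 0, 0, 1],
})

thresholds = {"DM_M1": 1, "DM_M7": 7, "DM_M28": 28, "DM_M42": 42}

# Multiply each indicator by its threshold (aligned on column names) and
# keep the largest: the minimum number of days the player is known to have missed
inj["min_days_missed"] = inj[list(thresholds)].mul(pd.Series(thresholds)).max(axis=1)
print(inj["min_days_missed"].tolist())  # [1, 7, 28, 42]
```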




In [None]:
injuries.head()

**Play List**

The play list file contains information about each player-play in the dataset, including the player's assigned roster position, stadium type, field type, weather, play type, position for the play, and position group.

PlayerDay: an integer sequence that reflects the timeline of a player's participation in games.

PlayerGame: an integer identifying each of the player's games.

PlayerGamePlay: a running count of the plays the player has participated in during the game.
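Because PlayerDay places each game on a common timeline, the gap between a player's consecutive games can be derived as a possible intrinsic risk factor (rest between games). A minimal sketch on hypothetical rows, assuming differences in PlayerDay correspond to days; the column name `days_since_last_game` is made up for illustration, and on the real data `plays` would be `playlist`.

```python
import pandas as pd

# Hypothetical play-list rows: one row per play, several plays per game
plays = pd.DataFrame({
    "PlayerKey": [1, 1, 1, 1, 2, 2],
    "GameID":    ["1-1", "1-1", "1-2", "1-3", "2-1", "2-2"],
    "PlayerDay": [1, 1, 8, 12, 3, 10],
})

# Collapse to one row per game, then difference PlayerDay within each player
games = plays.drop_duplicates("GameID").sort_values(["PlayerKey", "PlayerDay"])
games["days_since_last_game"] = games.groupby("PlayerKey")["PlayerDay"].diff()
print(games)
```

The first game of each player has no predecessor, so its gap is NaN.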


In [None]:
playlist.head()

**Player Track**

The player track file in .csv format includes player position, direction, and orientation data for each player during the entire course of the play collected using the Next Gen Stats (NGS) system. This data is indexed by PlayKey (which includes information about the player and game), with the time variable providing a temporal index within an individual play.

Note that the orientation variable should not be considered to be a reliable indicator of the actual direction a player is facing. The records for this study come from multiple seasons of the NFL during which different systems were used to calculate and record a player’s orientation. The orientation variable can be used to characterize how much a player is turning or pivoting during the course of a play. The “geography” of the direction variable does remain consistent across the study horizon.

In [None]:
playertrack.head()

**2.1 Characterize Movement**

The first step is to represent player movement, including novel metrics that characterize how players move on the field:

*   Speed
*   Directional changes
*   Acceleration/Deceleration
*   Distance
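Distance is listed above but not computed elsewhere in this notebook. One way to sketch it is to sum the Euclidean step lengths between consecutive samples of each play. The toy frame below is hypothetical; on the real data `track` would be `playertrack2`, assuming rows are ordered by time within each play.

```python
import numpy as np
import pandas as pd

# Hypothetical tracking rows for two plays
track = pd.DataFrame({
    "PlayKey": ["a", "a", "a", "b", "b"],
    "x": [0.0, 3.0, 3.0, 10.0, 10.0],
    "y": [0.0, 4.0, 10.0, 5.0, 6.0],
})

# Step length between consecutive samples of the same play (NaN for the first sample)
step = np.hypot(track.groupby("PlayKey")["x"].diff(),
                track.groupby("PlayKey")["y"].diff())

# Total distance covered per play (NaN steps are skipped by sum)
distance = step.groupby(track["PlayKey"]).sum()
print(distance)
```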





In [None]:
playertrack.count()

**Metrics of Speed**

We define speed as the Euclidean distance between two consecutive locations divided by the time elapsed between them.
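In formula form, with $(x_i, y_i)$ the position at time $t_i$ within a single play, the definition above reads as follows; assuming positions in yards and time in seconds, $v_i$ is in yards per second. The last sample of each play has no successor, so $v_i$ is undefined there.

$$
v_i = \frac{\sqrt{(x_{i+1}-x_i)^2 + (y_{i+1}-y_i)^2}}{t_{i+1}-t_i}
$$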

How many PlayKeys are there?

In [None]:
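# Keep only the tracking data of plays in which an injury occurred
# (some injury records have no PlayKey, so missing values are dropped)
InjuryKeys = injuries['PlayKey'].dropna().unique()
print(len(InjuryKeys))

pt_filter = playertrack['PlayKey'].isin(InjuryKeys)
playertrack2 = playertrack[pt_filter].copy()  # copy avoids SettingWithCopyWarning later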

In [None]:
PlayKeys = playertrack2['PlayKey'].unique()
len(PlayKeys)

In [None]:
# Sort so that consecutive rows are consecutive samples of the same play
playertrack2 = playertrack2.sort_values(['PlayKey', 'time'])
g = playertrack2.groupby('PlayKey')

# Euclidean distance to the next sample divided by the elapsed time
dx = g['x'].shift(-1) - playertrack2['x']
dy = g['y'].shift(-1) - playertrack2['y']
dt = g['time'].shift(-1) - playertrack2['time']
speed = np.sqrt(dx**2 + dy**2) / dt

# The last sample of each play has no successor; mark it with the
# sentinel value 1000 used throughout this notebook
playertrack2['speed'] = speed.fillna(1000)

**Directional Changes**

We define directional change as the absolute change in the direction variable (dir) between consecutive samples of the same PlayKey.
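One subtlety: direction is an angle, so a plain absolute difference overstates changes that cross 0°/360° (a move from 350° to 10° is a 20° turn, not 340°). A minimal sketch of a wrapped difference; `angle_change` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np


def angle_change(prev_deg, next_deg):
    """Smallest absolute difference between two headings, in degrees (0 to 180)."""
    d = np.abs(np.asarray(next_deg) - np.asarray(prev_deg)) % 360
    return np.minimum(d, 360 - d)


print(angle_change(350, 10))               # 20
print(angle_change([0, 90], [180, 45]))
```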

In [None]:
g = playertrack2.groupby('PlayKey')

# Absolute change in direction to the next sample, wrapped so that
# e.g. 350 -> 10 degrees counts as 20 degrees rather than 340
d = (g['dir'].shift(-1) - playertrack2['dir']).abs() % 360
dc = np.minimum(d, 360 - d)

# Last sample of each play: sentinel 1000, as for speed
playertrack2['dc'] = dc.fillna(1000)

**Acceleration/Deceleration**

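We define acceleration/deceleration as the change in the speed variable created above between consecutive samples of the same play; positive values indicate acceleration and negative values deceleration.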
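To turn frame-level acceleration values into event counts, one option is to count samples whose acceleration exceeds a cut-off in either direction. The 3 yd/s² threshold and the toy speeds below are hypothetical; the sketch assumes speeds in yards per second sampled every 0.1 s.

```python
import pandas as pd

# Hypothetical speeds (yards/s) sampled every 0.1 s within a single play
speed = pd.Series([2.0, 2.5, 3.0, 3.0, 2.2, 1.4])
dt = 0.1

# Forward difference: change to the next sample divided by the time step
accel = speed.diff().shift(-1) / dt

THRESHOLD = 3.0  # hypothetical cut-off in yards/s^2
n_accel = int((accel >= THRESHOLD).sum())
n_decel = int((accel <= -THRESHOLD).sum())
print(n_accel, n_decel)
```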

In [None]:
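g = playertrack2.groupby('PlayKey')

# Rate of change of speed to the next sample (positive = acceleration,
# negative = deceleration). Sentinel speeds are masked first so they do
# not leak into the differences.
spd = playertrack2['speed'].replace(1000, np.nan)
dt = g['time'].shift(-1) - playertrack2['time']
ac = (spd.groupby(playertrack2['PlayKey']).shift(-1) - spd) / dt

# Rows without a valid next speed (the last two samples of each play) get the sentinel 1000
playertrack2['ac'] = ac.fillna(1000)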

In [None]:
playertrack2.head(10)

**2.2 Identify Variables**

Identification of specific variables that present an elevated risk of injury:

*   Are there specific movement patterns that correlate with the acute onset of injury (in general or by specific injury location)?
*   Are there summary metrics of player movement which influence risk of injury?
*   How do playing surface, game scenario, player movement, and weather interact to influence the risk of injury?
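For the second question, frame-level metrics can be summarised per play. A sketch of one such summary on hypothetical rows; on the real data `frames` would be `pt_sp3` with the 1000 sentinel values removed, and the choice of statistics is illustrative rather than the notebook's own.

```python
import pandas as pd

# Hypothetical frame-level metrics for two plays (speed, dc and ac as defined above)
frames = pd.DataFrame({
    "PlayKey": ["a", "a", "a", "b", "b"],
    "speed":   [1.0, 4.0, 6.0, 2.0, 3.0],
    "dc":      [5.0, 40.0, 130.0, 10.0, 20.0],
    "ac":      [30.0, 20.0, -10.0, 10.0, -5.0],
})

summary = frames.groupby("PlayKey").agg(
    max_speed=("speed", "max"),
    mean_speed=("speed", "mean"),
    max_dir_change=("dc", "max"),
    n_sharp_turns=("dc", lambda s: int((s >= 120).sum())),  # 120-degree cut-off
    max_abs_accel=("ac", lambda s: s.abs().max()),
)
print(summary)
```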

In [None]:
# Use seaborn style defaults and set the default figure size
sns.set(rc={'figure.figsize':(16, 5)})

In [None]:
# Drop the sentinel rows (last sample of each play, which has no speed)
pt_sp = playertrack2[playertrack2['speed'] != 1000]

In [None]:
# Longest time observed within a play (time is in seconds since the start of the play)
pt_sp['time'].max()

**Speed Analysis**

In [None]:
ax = sns.lineplot(x="time", y="speed", data=pt_sp)
ax.set_ylabel('Speed')
ax.set_xlabel('Time since start of play (s)')
ax.set_title("Lineplot of Speed by Time")

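The mean speed appears to fluctuate more between roughly 50 and 70 seconds after the start of a play, suggesting more acceleration/deceleration in that range. However, tracking data for fewer plays likely extends that far, so this part of the curve rests on fewer observations. Next, we join the other datasets to inspect risk factors such as playing surface and weather.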
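Seaborn's `lineplot` draws the mean (with a confidence band) at each time value, so stretches covered by few plays can swing more. A quick check is to count how many distinct plays contribute to each time window. A sketch on hypothetical data in which most plays end within 8 s and one runs to 60 s; on the real data `frames` would be `pt_sp`, and the 10-second bins are an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: 20 short plays and one long play, sampled every 0.1 s
frames = pd.concat(
    [pd.DataFrame({"PlayKey": f"p{i}", "time": np.arange(0, 8, 0.1)}) for i in range(20)]
    + [pd.DataFrame({"PlayKey": "long", "time": np.arange(0, 60, 0.1)})],
    ignore_index=True,
)

# Number of distinct plays contributing to each 10-second window
bins = pd.cut(frames["time"], bins=np.arange(0, 70, 10), right=False)
plays_per_bin = frames.groupby(bins, observed=False)["PlayKey"].nunique()
print(plays_per_bin)
```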

In [None]:
pt_sp2 = pd.merge(pt_sp, injuries, on = "PlayKey")

In [None]:
pt_sp3 = pd.merge(pt_sp2, playlist, on = "PlayKey")

In [None]:
pt_sp3.head()

In [None]:
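fig, ax = plt.subplots(figsize=(16, 10))
# whis=(0, 100) extends the whiskers over the full data range
# (the older whis="range" option is no longer accepted by matplotlib)
sns.boxplot(data=pt_sp3, x='PlayKey', y='speed', whis=(0, 100),
            hue='Surface', hue_order=['Natural', 'Synthetic'], dodge=False, ax=ax)
ax.set_ylabel('Speed')
ax.set_xlabel('Injury play (PlayKey)')
ax.set_title('Boxplot of Speed for Injured Players by Surface')
ax.tick_params(axis='x', labelrotation=90)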

In [None]:
pt_sp2.groupby('Surface').agg({"speed":['mean','max','min','var']})

Across the injured players, there is no clear difference in speed between surfaces, although speed varies more on synthetic surfaces.
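"No clear difference" can be made more precise with a formal test. Consecutive 0.1 s samples within a play are strongly correlated, so a sketch that treats thousands of frames as independent would overstate significance; aggregating to one mean speed per injury play first is safer. Below is a two-sided permutation test for a difference in means, a technique swapped in here rather than one the original analysis uses; the function name and the sample values are hypothetical. On the real data, the inputs could be the per-play means from `pt_sp3.groupby(['PlayKey', 'Surface'])['speed'].mean()`, split by surface.

```python
import numpy as np


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means between samples a and b."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled values
        diff = pooled[:len(a)].mean() - pooled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly positive
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical per-play mean speeds (one value per injury play)
natural = [2.1, 2.5, 1.8, 3.0, 2.7]
synthetic = [2.4, 2.0, 3.3, 1.5, 2.9]
print(permutation_test(natural, synthetic))
```

With only a handful of plays per surface, a large p-value mainly reflects limited power, not evidence of no effect.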

In [None]:
# Collapse the free-text Weather field into a few broad categories.
# Misspellings such as "Coudy" and "change of rain" are as they appear in the data.
# "Fair" describes clear weather; "Controlled Climate" describes an indoor stadium.
weather_groups = {
    "Sunny": {"Sunny", "Clear and warm", "Clear Skies", "Clear skies",
              "Mostly Sunny", "Mostly sunny", "Clear", "Fair"},
    "Cloudy": {"Partly Cloudy", "Cloudy", "Mostly cloudy", "Sun & clouds",
               "Coudy", "Cloudy and Cool"},
    "Rain": {"Rain", "Light Rain", "Rain shower",
             "Cloudy with periods of rain, thunder possible. Winds shifting to WNW, 10-20 mph.",
             "Cloudy, 50% change of rain"},
    "Indoors": {"Indoors", "Indoor", "Controlled Climate"},
    "Cold": {"Cold"},
}

def weather_group(wea):
    for group, labels in weather_groups.items():
        if wea in labels:
            return group
    return "NA"  # any other label, or a missing value

w = [weather_group(wea) for wea in pt_sp3['Weather']]

In [None]:
pt_sp3['wea'] = w

In [None]:
fig, ax = plt.subplots(figsize=(16, 10))
# whis=(0, 100) extends the whiskers over the full data range
sns.boxplot(data=pt_sp3, x='PlayKey', y='speed', whis=(0, 100),
            hue='wea', palette='vlag', dodge=False, ax=ax)
ax.set_ylabel('Speed')
ax.set_xlabel('Injury play (PlayKey)')
ax.set_title('Boxplot of Speed for Injured Players by Weather')
ax.tick_params(axis='x', labelrotation=90)

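Speed shows no clear difference across weather conditions for the injured players. More injury plays occurred on sunny and cloudy days than on rainy days, but these counts are not adjusted for how many plays took place in each weather condition, so they do not by themselves indicate a higher risk.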

**Direction Analysis**

In [None]:
pt_sp_N = pt_sp3[pt_sp3['Surface']=="Natural"]
pt_sp_S = pt_sp3[pt_sp3['Surface']=="Synthetic"]

In [None]:
pt_sp_N['PlayKey'].unique()

In [None]:
p31070_3_7 = pt_sp3[pt_sp3['PlayKey'] == "31070-3-7"]
p33337_8_15 = pt_sp3[pt_sp3['PlayKey'] == "33337-8-15"]
p33474_19_7 = pt_sp3[pt_sp3['PlayKey'] == "33474-19-7"]
p34347_5_9 = pt_sp3[pt_sp3['PlayKey'] == "34347-5-9"]
p35570_15_35 = pt_sp3[pt_sp3['PlayKey'] == "35570-15-35"]
p36559_12_65 = pt_sp3[pt_sp3['PlayKey'] == "36559-12-65"]
p36621_13_58 = pt_sp3[pt_sp3['PlayKey'] == "36621-13-58"]
p38192_8_8 = pt_sp3[pt_sp3['PlayKey'] == "38192-8-8"]
p38876_29_14 = pt_sp3[pt_sp3['PlayKey'] == "38876-29-14"]
p39956_2_14 = pt_sp3[pt_sp3['PlayKey'] == "39956-2-14"]

In [None]:
court = plt.imread("../input/football2/football.jpg")

In [None]:
plt.figure(figsize=(15, 11.5))

# Injury plays on natural surfaces selected above
natural_keys = ["31070-3-7", "33337-8-15", "33474-19-7", "34347-5-9", "35570-15-35",
                "36559-12-65", "36621-13-58", "38192-8-8", "38876-29-14", "39956-2-14"]
plays = pt_sp3[pt_sp3['PlayKey'].isin(natural_keys)]

# Plot each play's path; colour encodes time since the start of the play on a
# shared scale, so the single colour bar applies to every play
for _, play in plays.groupby('PlayKey'):
    plt.scatter(play.x, play.y, c=play.time, cmap=plt.cm.Blues,
                vmin=plays['time'].min(), vmax=plays['time'].max(), s=50, zorder=1)

# With the Blues colormap, darker points are later in the play
cbar = plt.colorbar(orientation="horizontal")
cbar.set_label('Time since start of play (s)')

plt.imshow(court, zorder=0, extent=[0, 120, -10, 60])
plt.show()

In [None]:
pt_sp_S['PlayKey'].unique()

In [None]:
p35611_7_42 = pt_sp3[pt_sp3['PlayKey'] == "35611-7-42"]
p36557_1_70 = pt_sp3[pt_sp3['PlayKey'] == "36557-1-70"]
p36607_16_19 = pt_sp3[pt_sp3['PlayKey'] == "36607-16-19"]
p38228_1_4 = pt_sp3[pt_sp3['PlayKey'] == "38228-1-4"]
p38364_5_23 = pt_sp3[pt_sp3['PlayKey'] == "38364-5-23"]
p39656_2_38 = pt_sp3[pt_sp3['PlayKey'] == "39656-2-38"]
p39678_2_1 = pt_sp3[pt_sp3['PlayKey'] == "39678-2-1"]
p39850_9_2 = pt_sp3[pt_sp3['PlayKey'] == "39850-9-2"]
p39873_4_32 = pt_sp3[pt_sp3['PlayKey'] == "39873-4-32"]
p40474_1_8 = pt_sp3[pt_sp3['PlayKey'] == "40474-1-8"]

In [None]:
plt.figure(figsize=(15, 11.5))

# Injury plays on synthetic surfaces selected above
synthetic_keys = ["35611-7-42", "36557-1-70", "36607-16-19", "38228-1-4", "38364-5-23",
                  "39656-2-38", "39678-2-1", "39850-9-2", "39873-4-32", "40474-1-8"]
plays = pt_sp3[pt_sp3['PlayKey'].isin(synthetic_keys)]

# Plot each play's path; colour encodes time since the start of the play on a
# shared scale, so the single colour bar applies to every play
for _, play in plays.groupby('PlayKey'):
    plt.scatter(play.x, play.y, c=play.time, cmap=plt.cm.Blues,
                vmin=plays['time'].min(), vmax=plays['time'].max(), s=50, zorder=1)

# With the Blues colormap, darker points are later in the play
cbar = plt.colorbar(orientation="horizontal")
cbar.set_label('Time since start of play (s)')

plt.imshow(court, zorder=0, extent=[0, 120, -10, 60])
plt.show()

On natural surfaces, the plotted injury plays appear to include more turns and direction changes than on synthetic surfaces. With only ten plays per surface, however, further analysis is needed to determine whether the surface affects player movement.
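The visual impression can be checked numerically across all injury plays. The sketch below, on hypothetical rows, computes for each play the share of samples whose direction change reaches the notebook's 120-degree cut-off, then averages per surface; on the real data `frames` would be `pt_sp3`. Reporting the number of plays alongside the mean keeps the small sample size visible.

```python
import pandas as pd

# Hypothetical frame-level rows with surface information already merged in
frames = pd.DataFrame({
    "PlayKey": ["n1"] * 4 + ["n2"] * 3 + ["s1"] * 4,
    "Surface": ["Natural"] * 7 + ["Synthetic"] * 4,
    "dc":      [10, 130, 150, 5, 20, 125, 30, 15, 10, 140, 5],
})

SHARP = 120  # same cut-off the notebook uses to split large and small changes

# Share of sharp-turn samples within each play, then mean and count per surface
per_play = (frames.assign(sharp=frames["dc"] >= SHARP)
                  .groupby(["Surface", "PlayKey"])["sharp"].mean())
by_surface = per_play.groupby(level="Surface").agg(["mean", "count"])
print(by_surface)
```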

In [None]:
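# Split samples by whether the direction changed by at least 120 degrees
# between consecutive samples, then compare the mean speed of each group
p_largeDC = pt_sp3[pt_sp3['dc'] >= 120]
p_smallDC = pt_sp3[pt_sp3['dc'] < 120]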

In [None]:
p_largeDC['speed'].mean()

In [None]:
p_smallDC['speed'].mean()

**Spatial Analysis** 

Where on the field are the injured players during the injury plays?

In [None]:
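# Hexbin density of positions; `rug` is not accepted by the histogram
# marginals in current seaborn, so only the bin count is passed
sns.jointplot(x=pt_sp2["x"], y=pt_sp2["y"], kind='hex', marginal_kws=dict(bins=50))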

In [None]:
# Kernel density estimate of positions; `space` sets the gap between
# the joint plot and the marginal plots
sns.jointplot(x=pt_sp2["x"], y=pt_sp2["y"], kind='kde', color="grey", space=3)


The injured players' positions cluster around the middle of the field.

**2.3 Exploratory Data Analysis**

In [None]:
ff = playlist['PlayKey'].isin(injuries['PlayKey'])
in_list = playlist[ff]
out_list = playlist[~ff]

In [None]:
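# Flag injury plays; assign() returns new frames, which avoids hard-coded
# row counts and SettingWithCopyWarning
in_list = in_list.assign(injuried="Y")
out_list = out_list.assign(injuried="N")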

In [None]:
merged = pd.concat([in_list, out_list])
merged.head()

In [None]:
# Broad weather categories, encoded as integers for plotting:
# 1 = Sunny, 2 = Cloudy, 3 = Rain, 4 = Indoors, 5 = Cold, 6 = other/missing.
# "Fair" describes clear weather; "Controlled Climate" describes an indoor stadium.
weather_codes = {
    1: {"Sunny", "Clear and warm", "Clear Skies", "Clear skies",
        "Mostly Sunny", "Mostly sunny", "Clear", "Fair"},
    2: {"Partly Cloudy", "Cloudy", "Mostly cloudy", "Sun & clouds",
        "Coudy", "Cloudy and Cool"},
    3: {"Rain", "Light Rain", "Rain shower",
        "Cloudy with periods of rain, thunder possible. Winds shifting to WNW, 10-20 mph.",
        "Cloudy, 50% change of rain"},
    4: {"Indoors", "Indoor", "Controlled Climate"},
    5: {"Cold"},
}

def weather_code(wea):
    for code, labels in weather_codes.items():
        if wea in labels:
            return code
    return 6  # any other label, or a missing value

merged['wea'] = merged['Weather'].map(weather_code)

In [None]:
def hexbin(x, y, color, **kwargs):
    cmap = sns.light_palette(color, as_cmap=True)
    plt.hexbin(x, y, gridsize=10, cmap=cmap, **kwargs)

with sns.axes_style("dark"):
    g = sns.FacetGrid(merged, col="FieldType", height=4)
g.map(hexbin, "wea", "Temperature", extent=[0, 10, 0, 100]);

In [None]:
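# Drop rows with missing temperature (missing values appear as -999 in PlayList.csv);
# note that this also drops any genuine readings at or below 0 °F
merged = merged[merged['Temperature'] > 0]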

In [None]:
sns.catplot(x="wea", y="Temperature", hue="FieldType", kind="box", data=merged)

In [None]:
sns.catplot(x="FieldType", col="wea", col_wrap=6, kind="count", palette="ch:.25", data=merged)

Natural fields are used mostly on sunny and cloudy days with higher temperatures, whereas synthetic fields are used mostly on cloudy days with relatively cooler temperatures (weather codes: 1 = Sunny, 2 = Cloudy, 3 = Rain, 4 = Indoors, 5 = Cold, 6 = other/missing).

In [None]:
sns.catplot(x="FieldType", col="wea", col_wrap=6, kind="count", palette="ch:.25", data=merged[merged['injuried']=="Y"])

Although natural surfaces are used far more than synthetic surfaces on sunny and cloudy days, more injuries occurred on synthetic surfaces on cloudy days. This suggests that synthetic surfaces may carry a higher risk of injury in these conditions, although the number of injury plays is small and the comparison should be confirmed with injury rates per play.
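The count plots show raw injury counts; to compare risk, each count can be divided by the number of plays in the same surface and weather combination. A minimal sketch on hypothetical rows that mirror the notebook's `merged` frame (`FieldType`, integer `wea` codes and the `injuried` flag); the output column names are illustrative.

```python
import pandas as pd

# Hypothetical play list with the notebook's 'injuried' flag and weather codes
plays = pd.DataFrame({
    "FieldType": ["Natural"] * 6 + ["Synthetic"] * 4,
    "wea":       [2, 2, 2, 2, 1, 1, 2, 2, 2, 1],
    "injuried":  ["N", "Y", "N", "N", "N", "N", "Y", "Y", "N", "N"],
})

# Plays, injuries and injury rate for every surface/weather combination
rates = (plays.assign(injured=plays["injuried"].eq("Y"))
              .groupby(["FieldType", "wea"])["injured"]
              .agg(plays="size", injuries="sum", rate="mean"))
rates["per_1000_plays"] = rates["rate"] * 1000
print(rates)
```

With very few injuries per cell, these rates are unstable, so they are best read alongside the `plays` and `injuries` columns.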