# Summary

The goal of this project was to predict the likelihood of injury for a given NFL player. Per-game injury reports, player statistics, and team statistics were gathered from the Sportradar US API. After the JSON responses were imported and decoded, the data was assembled into dataframes, and the dependent variable (the output) was added to the player stats dataframe. As a first pass, the data was modeled using logistic regression.

# Importing the Data

A series of functions was written to call the API and unwrap the nested dictionaries from the JSON responses.
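
The internals of the NFL_api helpers are not shown here, so the following is only a minimal sketch of the general idea of unwrapping nested dictionaries into flat rows. The function name `flatten_record` and the field names in `sample_response` are hypothetical and are not taken from the Sportradar response format or from NFL_api.

```python
def flatten_record(record, parent_key="", sep="_"):
    """Flatten a nested dict into a single-level dict with joined keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, prefixing their keys
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical response fragment; the structure and names are illustrative only
sample_response = {
    "id": "game-1",
    "home_team": {
        "id": "AAA",
        "players": [
            {"name": "Player One", "defense": {"tackle": 5, "sack": 1.0}},
            {"name": "Player Two", "defense": {"tackle": 2, "sack": 0.0}},
        ],
    },
}

# Build one flat row per player and attach the game-level identifiers
rows = []
for player in sample_response["home_team"]["players"]:
    row = flatten_record(player)
    row["game_id"] = sample_response["id"]
    row["team_id"] = sample_response["home_team"]["id"]
    rows.append(row)
```

A list of flat dictionaries like `rows` can be passed directly to `pd.DataFrame(rows)` to get one row per player.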

This code imports only the defensive statistics; other statistics will be imported and modeled later (the API call quota for the given time period ran out).

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import NFL_api
import time
import pickle

In [3]:


season_type = 'REG'
year = 2015
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
injuries_df = pd.DataFrame()
stat_keys = ['defense']
player_stat = pd.DataFrame()
team_stat = pd.DataFrame()

for i in weeks:
    sched = NFL_api.api_get_weekly_schedule(year, season_type, i)
    sched = NFL_api.flatten_schedule(sched)
    time.sleep(1)
    
    
    for n in range(len(sched)):
        # Select the n-th game by position (.ix has been removed from pandas)
        home_team = sched.iloc[n]['home']
        away_team = sched.iloc[n]['away']
        injuries = NFL_api.api_get_injuries(year, season_type, i, away_team, home_team)
        injuries = NFL_api.flatten_injuries(injuries)
        injuries['week_num']=i
        injuries['year_num']=year
        injuries_df = pd.concat([injuries_df, injuries])
        time.sleep(1)
        
        for m in stat_keys:
            stat = NFL_api.api_get_stats(year, season_type, i, away_team, home_team)
            player, team = NFL_api.flatten_game_stats(stat, m)
            time.sleep(1)
            player['week_num']=i
            team['week_num']=i
            player['year_num']=year
            team['year_num']=year
            player_stat = pd.concat([player_stat, player])
            team_stat = pd.concat([team_stat, team])

# Save each dataframe to disk; the with-blocks close the files automatically
with open('player_def_2015_1-8.pickle', 'wb') as f:
    pickle.dump(player_stat, f)

with open('team_def_2015_1-8.pickle', 'wb') as g:
    pickle.dump(team_stat, g)

with open('injuries_2015_1-8.pickle', 'wb') as h:
    pickle.dump(injuries_df, h)

# Adding a dependent variable to be modeled

This code combines the saved player statistics with the injury data. A first attempt used a simple True/False injury flag as the dependent variable; a more informative definition is still being worked out.

In [2]:
# Open the files in binary mode so pickle reads the raw bytes
with open('player_def_2015_1-4.pickle', 'rb') as f:
    player_stat = pickle.load(f)

with open('team_def_2015_1-4.pickle', 'rb') as g:
    team_stat = pickle.load(g)

with open('injuries_2015_1-4.pickle', 'rb') as h:
    injuries_df = pickle.load(h)

The current player stats dataframe has 2515 rows for weeks 1-4 of the 2015 season. It has 36 columns, which include identifying fields such as game_id, name, and team_name alongside the defensive statistics.

In [3]:
player_stat.keys()

Index([       u'game_id',      u'scheduled',            u'ast',
                   u'bk',           u'comb',      u'force_fum',
              u'fum_rec',         u'fum_td',             u'id',
                  u'int',         u'int_lg',         u'int_td',
              u'int_yds',         u'jersey',         u'market',
             u'misc_ast',      u'misc_comb', u'misc_force_fum',
         u'misc_fum_rec',    u'misc_tackle',           u'name',
                   u'pd',       u'position',             u'qh',
                 u'sack',       u'sack_yds',           u'sfty',
             u'sfty_1pt',         u'sp_ast',        u'sp_comb',
         u'sp_force_fum',     u'sp_fum_rec',      u'sp_tackle',
               u'tackle',      u'team_name',          u'tlost'],
      dtype='object')

To create a simple dependent variable to model, the code checks whether each player's name appears in the injury dataset.

In [8]:
injured = player_stat['name'].isin(injuries_df['name_full'])
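
To make the behaviour of this label concrete, here is a small sketch on made-up rows (the frames `toy_stats` and `toy_injuries` and their values are illustrative, not project data). Because the match is on name only, a player who appears in any week's injury report is flagged in every one of their rows, including weeks before the injury was reported.

```python
import pandas as pd

# Made-up player rows: Player One appears in weeks 1 and 2
toy_stats = pd.DataFrame({
    "name": ["Player One", "Player One", "Player Two"],
    "week_num": [1, 2, 1],
    "tackle": [5, 3, 2],
})

# Made-up injury report: Player One is listed in week 2 only
toy_injuries = pd.DataFrame({
    "name_full": ["Player One"],
    "week_num": [2],
})

# Same name-only membership test as in the cell above
toy_injured = toy_stats["name"].isin(toy_injuries["name_full"])
print(toy_injured.tolist())  # [True, True, False]
```

Both of Player One's rows are marked True even though the injury report is for week 2, which is one reason the choice of dependent variable may need revisiting.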

# Modeling the data

The first attempt to model the data used logistic regression on the number of tackles and sacks by each player.
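
For reference, a logistic regression on these two features models the probability of the injury flag as a logistic function of a linear combination of the inputs. The coefficient symbols $\beta_0, \beta_1, \beta_2$ are notation introduced here for illustration:

$$
P(\text{injured} = 1 \mid \text{tackle}, \text{sack}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1\,\text{tackle} + \beta_2\,\text{sack})}}
$$

Note that scikit-learn's LogisticRegression applies L2 regularization by default, so the fitted coefficients are shrunk somewhat relative to an unpenalized fit.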

In [10]:
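from sklearn import linear_model
from sklearn import model_selection  # replaces the removed sklearn.cross_validation module

In [14]:
# Hold out 20% of the rows for testing; random_state fixes the split
x_train, x_test, y_train, y_test = model_selection.train_test_split(player_stat[['tackle', 'sack']], injured, train_size=0.8, random_state=1)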

rgr = linear_model.LogisticRegression()
rgr.fit(x_train, y_train)
print("The accuracy of the classifier is: %f" % rgr.score(x_test, y_test))

The accuracy of the classifier is: 0.675944
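
Accuracy on its own is hard to interpret when the two classes are unbalanced, and the share of injured rows is not reported above. One common reference point is the accuracy obtained by always predicting the most frequent label. The sketch below computes that baseline on made-up labels; the helper name `majority_baseline_accuracy` and the data are illustrative only.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / float(len(labels))


# Made-up labels, not the project's data: 7 uninjured rows, 3 injured rows
toy_labels = [False] * 7 + [True] * 3
baseline = majority_baseline_accuracy(toy_labels)
print(baseline)  # 0.7
```

Applied to the real test labels (for example `majority_baseline_accuracy(list(y_test))`), this would give a number to compare against the classifier's score.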


# Conclusions

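* The classifier reaches about 68% accuracy on the held-out 20% of rows, which is a reasonable start but has not yet been compared against a baseline
* However, I need to consider whether I am using the correct dependent variable
* I also need to add more data: offensive statistics, previous seasons, etc.
* I will also try more sophisticated models and more features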