# Who should you take in the NFL draft? - QB Edition

## Quarterback

In this notebook I try to predict who will become a Pro Bowl quarterback based on Combine metrics. Let's see if it works.

In [1]:
import pandas as pd
import numpy as np

from sklearn.model_selection import cross_val_score  # sklearn.cross_validation no longer exists in current scikit-learn
from sklearn import linear_model, ensemble, decomposition
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
sns.set()

from imblearn.over_sampling import SMOTE

%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

%config InlineBackend.figure_format='retina'
matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)

In [2]:
df = pd.read_csv('/Users/richard/data/NFL.csv', index_col='idx')

In [10]:
df_qb = df[df['Pos'] == 'QB']
#df_qb = df_qb.drop(columns=['BenchReps']).dropna() # most QBs don't do bench reps!
feature_cols = ['Wt', '40YD', 'Vertical', 'Broad Jump', '3Cone', 'Shuttle', 'Height_inches']
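
The commented-out line above drops `BenchReps` before calling `dropna()`, since a column that is mostly empty would wipe out most rows. A minimal sketch of how one might check that, using a tiny made-up frame (the values are invented for illustration and are not taken from `NFL.csv`; `toy_qb` is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the QB rows of NFL.csv
toy_qb = pd.DataFrame({
    'Wt': [225, 230, 218, 240],
    '40YD': [4.8, 4.9, np.nan, 4.7],
    'BenchReps': [np.nan, np.nan, np.nan, 21],
})

# Share of missing values per column
missing_share = toy_qb.isna().mean()
print(missing_share)

# Rows that survive dropna() with and without the sparse column
rows_all_cols = len(toy_qb.dropna())
rows_without_bench = len(toy_qb.drop(columns=['BenchReps']).dropna())
print(rows_all_cols, rows_without_bench)
```

On the real data the same calls would run on `df_qb` instead of the toy frame.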

In [28]:
df.groupby('School').size()

School
Abilene Christian       5
Akron                   3
Alabama                68
Alabama State           3
Alabama-Birmingham      6
Alcorn State            1
Appalachian State       5
Arizona                28
Arizona State          42
Arkansas               39
Arkansas State         10
Arkansas-Pine Bluff     3
Army                    1
Auburn                 49
BYU                    26
Ball State              4
Baylor                 18
Bethune-Cookman         1
Bloomsburg              1
Boise State            23
Boston College         31
Bowling Green           3
Brown                   1
Buffalo                 4
Cal Poly                4
California             47
California-Davis        3
Central Arkansas        1
Central Florida        20
Central Michigan        7
                       ..
Towson                  3
Troy                   12
Tulane                  8
Tulsa                   2
Tusculum                1
Tuskegee                1
UCLA                   34
UNLV 

It turns out that quarterbacks hardly ever do bench reps at the Combine! Next, how many players we have data on at each position:

In [9]:
df.groupby('Pos').size()

Pos
C      101
CB     409
DE     319
DT     318
FB      78
ILB    180
K        2
LS       7
OLB    304
OT     281
P       46
QB     168
RB     339
SS     147
TE     221
WR     502
dtype: int64

13 players out of our 90 made it to the Pro Bowl. Holy smokes, Batman, I don't think we have enough data!

In [83]:
cutoff_year = 2010

df_train = df[df['Year'] < cutoff_year]
df_test = df[df['Year'] >= cutoff_year]

X_train = df_train[feature_cols]
y_train = df_train.Probowl

X_test = df_test[feature_cols]
y_test = df_test.Probowl

print(len(y_train), len(y_test))

789 624


In [84]:
1 - df_train.groupby('Probowl').size()[1]/df_train.groupby('Probowl').size()[0]

0.8581765557163531
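
For reference, the expression above evaluates to 1 - n1/n0, where n0 and n1 are the numbers of non-Pro-Bowl and Pro Bowl rows in the training set. The accuracy of a model that always predicts "no Pro Bowl" is a slightly different quantity, n0/(n0 + n1). A small sketch of both follows. The helper name `majority_baseline` is hypothetical, and the class counts (691 and 98) are an inference from the two printed numbers above (789 rows, 0.858...), not values read from the data:

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of always predicting the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / sum(counts.values())

# Assumed counts: 691 + 98 = 789 rows, and 1 - 98/691 reproduces 0.858...
labels = [0] * 691 + [1] * 98

ratio_expr = 1 - labels.count(1) / labels.count(0)
baseline = majority_baseline(labels)
print(round(ratio_expr, 4), round(baseline, 4))
```

Under these assumed counts the always-negative baseline is about 0.876, which is the figure to hold the cross-validated accuracies reported further down (0.86 to 0.90) against. `majority_baseline` also accepts a pandas Series such as `y_train`.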

In [87]:
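lr = linear_model.LogisticRegression()  # defined for comparison, not used below
rf = ensemble.RandomForestClassifier(n_jobs=8)
# 10-fold cross-validated accuracy of the random forest on the training years
scores = cross_val_score(rf, X_train, y_train, cv=10, scoring='accuracy')
print(np.round(scores, 2))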

[ 0.88  0.87  0.87  0.86  0.89  0.89  0.89  0.87  0.87  0.9 ]
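
Since Pro Bowl rows are a small minority of the training set, accuracy alone says little about whether the model finds any of them. One possible variation, sketched here on synthetic data from `make_classification` (not the Combine data), is to score the same kind of cross-validation with ROC AUC. `class_weight='balanced'` and `n_estimators=100` are assumptions made for this sketch, not settings the notebook uses. For classifiers, `cross_val_score` already stratifies when `cv` is an integer; the explicit `StratifiedKFold` below only adds shuffling with a fixed seed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in for X_train / y_train (about 12% positives)
X, y = make_classification(n_samples=800, n_features=7, n_informative=4,
                           weights=[0.88, 0.12], random_state=0)

rf_balanced = RandomForestClassifier(n_estimators=100, class_weight='balanced',
                                     random_state=0)

# Stratified folds keep the class ratio similar in every fold
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# ROC AUC ranks predicted probabilities, so it is not inflated by the majority class
auc_scores = cross_val_score(rf_balanced, X, y, cv=cv, scoring='roc_auc')
print(np.round(auc_scores, 2))
```

An AUC near 0.5 would mean the features carry little ranking signal, however high the accuracy looks.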


In [88]:
rf.fit(X_train, y_train)


RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=8,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

In [90]:
# Class probabilities, computed once (column 0: no Pro Bowl, column 1: Pro Bowl)
proba = np.round(rf.predict_proba(X_test), 2)

arr = np.zeros((len(y_test), 4))
arr[:, 0] = proba[:, 0]
arr[:, 1] = proba[:, 1]
arr[:, 2] = rf.predict(X_test)
arr[:, 3] = np.array(y_test)

results = pd.DataFrame(arr, columns=['non probowl prob', 'probowl prob', 'prediction', 'actual'])

# DataFrame.sort is gone from current pandas; sort_values is the replacement
results.sort_values('actual', ascending=False)

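| | non probowl prob | probowl prob | prediction | actual |
|---|---|---|---|---|
| 623 | 1.0 | 0.0 | 0.0 | 1.0 |
| 488 | 0.9 | 0.1 | 0.0 | 1.0 |
| 235 | 0.9 | 0.1 | 0.0 | 1.0 |
| 237 | 1.0 | 0.0 | 0.0 | 1.0 |
| 49 | 0.8 | 0.2 | 0.0 | 1.0 |
| 95 | 0.9 | 0.1 | 0.0 | 1.0 |
| 209 | 0.7 | 0.3 | 0.0 | 1.0 |
| 369 | 1.0 | 0.0 | 0.0 | 1.0 |
| 240 | 0.8 | 0.2 | 0.0 | 1.0 |
| 552 | 0.7 | 0.3 | 0.0 | 1.0 |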