# Predicting shots made per game by Kobe Bryant

In this lab you'll use the regularization techniques Ridge, Lasso, and Elastic Net to predict how many shots Kobe Bryant made per game over his career.

---

### 1. Load packages and data

In [44]:
import numpy as np
import pandas as pd
import patsy

from sklearn.linear_model import Ridge, Lasso, ElasticNet, LinearRegression, RidgeCV, LassoCV, ElasticNetCV
from sklearn.model_selection import cross_val_score  # sklearn.cross_validation was removed in scikit-learn 0.20

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')

from sklearn.preprocessing import MinMaxScaler, StandardScaler

%config InlineBackend.figure_format = 'retina'
%matplotlib inline

In [38]:
kobe = pd.read_csv('/Users/ryandunlap/Desktop/DSI-SF-2/datasets/kobe_bryant/kobe_superwide_games.csv')

---

### 2. Examine the data

- How many columns are there?
- Infer what the observations (rows) and columns represent.
- Why might regularization be particularly useful for this data?
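One quick way to infer what the columns represent is to group their names by the prefix before the first colon. The helper below, `summarize_column_prefixes`, is a hypothetical illustration (not part of the lab); it is shown on a handful of names copied from the `kobe.columns` listing in this section, and on the real data you would call `summarize_column_prefixes(kobe.columns)`.

```python
import pandas as pd

def summarize_column_prefixes(columns):
    """Count column names by the text before the first ':' (hypothetical helper)."""
    prefixes = pd.Series([str(name).split(":", 1)[0] for name in columns])
    return prefixes.value_counts()

# A few column names copied from the kobe.columns listing
sample_columns = [
    "SHOTS_MADE", "AWAY_GAME",
    "SEASON_OPPONENT:atl:1996-97", "SEASON_OPPONENT:atl:1997-98",
    "ACTION_TYPE:tip_shot", "ACTION_TYPE:turnaround_hook_shot",
    "ACTION_TYPE:turnaround_jump_shot", "SEASON_GAME_NUMBER",
]
counts = summarize_column_prefixes(sample_columns)
print(counts)
```

On the full dataset, a few prefixes accounting for hundreds of columns is a hint that most predictors are sparse indicator-style features, which is the kind of wide data where regularization tends to help.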

In [39]:
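# Each row is one game; the output below shows 645 columns in total.
# Besides the target SHOTS_MADE, they include AWAY_GAME, season/opponent indicators
# (SEASON_OPPONENT:<team>:<season>), shot-type columns (ACTION_TYPE:<shot type>),
# and the game counters SEASON_GAME_NUMBER and CAREER_GAME_NUMBER.
print(kobe.columns)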

Index([u'SHOTS_MADE', u'AWAY_GAME', u'SEASON_OPPONENT:atl:1996-97',
       u'SEASON_OPPONENT:atl:1997-98', u'SEASON_OPPONENT:atl:1999-00',
       u'SEASON_OPPONENT:atl:2000-01', u'SEASON_OPPONENT:atl:2001-02',
       u'SEASON_OPPONENT:atl:2002-03', u'SEASON_OPPONENT:atl:2003-04',
       u'SEASON_OPPONENT:atl:2004-05',
       ...
       u'ACTION_TYPE:tip_layup_shot', u'ACTION_TYPE:tip_shot',
       u'ACTION_TYPE:turnaround_bank_shot',
       u'ACTION_TYPE:turnaround_fadeaway_bank_jump_shot',
       u'ACTION_TYPE:turnaround_fadeaway_shot',
       u'ACTION_TYPE:turnaround_finger_roll_shot',
       u'ACTION_TYPE:turnaround_hook_shot',
       u'ACTION_TYPE:turnaround_jump_shot', u'SEASON_GAME_NUMBER',
       u'CAREER_GAME_NUMBER'],
      dtype='object', length=645)


---

### Make predictor and target variables. Normalize the predictors.

Why is normalization necessary for regularized regressions?

There is a class in sklearn.preprocessing called `StandardScaler`. Look it up and figure out how to use it to normalize your predictor matrix. 
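A minimal sketch of what `StandardScaler` does, on a toy matrix (an assumption, not the Kobe data) that mixes a 0/1 indicator with a game counter. Regularization adds a penalty on the size of every coefficient, so a predictor measured in large units gets a tiny coefficient that is barely penalized while a 0/1 predictor is penalized heavily; standardizing puts every column on the same footing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy predictors on very different scales: a 0/1 indicator and a game counter
X_toy = np.array([[0, 1], [1, 250], [0, 500], [1, 1000]], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_toy)  # subtract each column's mean, divide by its std

print(X_scaled.mean(axis=0))  # each column now has mean 0
print(X_scaled.std(axis=0))   # and standard deviation 1
```

`scaler.mean_` and `scaler.scale_` keep the fitted column means and standard deviations, so `scaler.transform` can apply the same scaling to new rows.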

In [48]:
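y = kobe['SHOTS_MADE']
cols = [c for c in kobe.columns if c != 'SHOTS_MADE']

X_raw = kobe.loc[:, cols]

# Standardize each predictor to mean 0 and standard deviation 1
scaler = StandardScaler()
scaled_data = scaler.fit_transform(X_raw)
# Pass cols directly (not [cols]) so the columns keep their original flat names
X_stand = pd.DataFrame(scaled_data, columns=cols, index=X_raw.index)

X_stand.head()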

| | AWAY_GAME | SEASON_OPPONENT:atl:1996-97 | SEASON_OPPONENT:atl:1997-98 | SEASON_OPPONENT:atl:1999-00 | SEASON_OPPONENT:atl:2000-01 | SEASON_OPPONENT:atl:2001-02 | SEASON_OPPONENT:atl:2002-03 | SEASON_OPPONENT:atl:2003-04 | SEASON_OPPONENT:atl:2004-05 | SEASON_OPPONENT:atl:2005-06 | ... | ACTION_TYPE:tip_layup_shot | ACTION_TYPE:tip_shot | ACTION_TYPE:turnaround_bank_shot | ACTION_TYPE:turnaround_fadeaway_bank_jump_shot | ACTION_TYPE:turnaround_fadeaway_shot | ACTION_TYPE:turnaround_finger_roll_shot | ACTION_TYPE:turnaround_hook_shot | ACTION_TYPE:turnaround_jump_shot | SEASON_GAME_NUMBER | CAREER_GAME_NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.001285 | -0.035852 | -0.035852 | -0.025343 | -0.025343 | -0.035852 | -0.035852 | -0.025343 | -0.025343 | -0.035852 | ... | -0.035852 | -0.281806 | -0.183922 | -0.025343 | -0.342591 | -0.035746 | -0.088428 | -0.643218 | -1.610867 | -1.733044 |
| 1 | 0.998717 | -0.035852 | -0.035852 | -0.025343 | -0.025343 | -0.035852 | -0.035852 | -0.025343 | -0.025343 | -0.035852 | ... | -0.035852 | -0.281806 | -0.183922 | -0.025343 | -0.342591 | -0.035746 | -0.088428 | -0.643218 | -1.572464 | -1.730821 |
| 2 | 0.998717 | -0.035852 | -0.035852 | -0.025343 | -0.025343 | -0.035852 | -0.035852 | -0.025343 | -0.025343 | -0.035852 | ... | -0.035852 | -0.281806 | -0.183922 | -0.025343 | -0.342591 | -0.035746 | -0.088428 | -0.643218 | -1.534062 | -1.728597 |
| 3 | 0.998717 | -0.035852 | -0.035852 | -0.025343 | -0.025343 | -0.035852 | -0.035852 | -0.025343 | -0.025343 | -0.035852 | ... | -0.035852 | -0.281806 | -0.183922 | -0.025343 | -0.342591 | -0.035746 | -0.088428 | -0.643218 | -1.495659 | -1.726374 |
| 4 | -1.001285 | 27.892651 | -0.035852 | -0.025343 | -0.025343 | -0.035852 | -0.035852 | -0.025343 | -0.025343 | -0.035852 | ... | -0.035852 | -0.281806 | -0.183922 | -0.025343 | -0.342591 | -0.035746 | -0.088428 | -0.643218 | -1.457256 | -1.724151 |


---

### Build a linear regression predicting `SHOTS_MADE` from the rest of the columns.

Cross-validate the $R^2$ of a linear regression model with 10 cross-validation folds.

How does it perform?

In [49]:
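lm = LinearRegression()

# cross_val_score uses R^2 as the default score for regressors.
# Every fold below has a hugely negative R^2 (between roughly -2.6e27 and -8.3e28):
# plain least squares performs extremely badly on this data.
scores = cross_val_score(lm, X_stand, y, cv=10)
scores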

array([ -8.25243667e+28,  -2.63331405e+27,  -2.89603499e+28,
        -3.98953019e+27,  -1.27962923e+28,  -1.00353712e+28,
        -4.64132550e+27,  -9.68379957e+27,  -7.66818584e+27,
        -3.76696784e+28])
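One plausible contributor to scores like these is near-collinearity among hundreds of predictors, which makes the least-squares solution numerically unstable. The sketch below uses synthetic data (an assumption, not the Kobe columns) with two almost identical predictors: ordinary least squares tends to assign them huge coefficients of opposite sign, while Ridge spreads a modest weight across both.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 1e-6 * rng.normal(size=n)     # an almost exact copy of x1
X_demo = np.column_stack([x1, x2])
y_demo = x1 + 0.1 * rng.normal(size=n)  # only x1 truly matters

print("condition number:", np.linalg.cond(X_demo))

ols = LinearRegression().fit(X_demo, y_demo)
ridge = Ridge(alpha=1.0).fit(X_demo, y_demo)
print("OLS coefficients:  ", ols.coef_)    # typically huge, with opposite signs
print("Ridge coefficients:", ridge.coef_)  # shared roughly equally between x1 and x2
```

Coefficients that explode like this fit the training folds but produce wild predictions on held-out folds, which is how $R^2$ can fall far below zero.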

---

### Find an optimal value for Ridge regression alpha using RidgeCV

[Go to the documentation and read how RidgeCV works.](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html)

Hint: once the RidgeCV is fit, the attribute `.alpha_` contains the best alpha parameter it found through cross-validation.

Recall that Ridge alphas are best searched over a logarithmic grid (`np.logspace`), because a useful penalty strength can lie anywhere across several orders of magnitude.
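A minimal sketch of a logarithmic alpha grid with `RidgeCV`, run on synthetic data from `make_regression` as a stand-in for `X_stand` and `y`. `np.logspace(-6, 6, num=13)` yields one alpha per power of ten, and after fitting, `.alpha_` is whichever grid value scored best in cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

alphas = np.logspace(-6, 6, num=13)  # 1e-6, 1e-5, ..., 1e6

# Synthetic stand-in for the standardized Kobe predictors
X, y = make_regression(n_samples=200, n_features=30, noise=10.0, random_state=0)

ridge_cv = RidgeCV(alphas=alphas, cv=10).fit(X, y)
print(ridge_cv.alpha_)  # the best alpha from the grid
```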


In [68]:
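# normalize= was removed from RidgeCV in scikit-learn 1.2; X_stand is already standardized
rlm = RidgeCV(alphas=np.logspace(-6, 6, num=13), cv=10, fit_intercept=True)

new_model = rlm.fit(X_stand, y)
new_predictions = rlm.predict(X_stand)  # predict on the same standardized matrix
new_score = rlm.score(X_stand, y)       # in-sample R^2, not cross-validated

print(new_model.alpha_)
#print(new_model.coef_)
#print(new_model.intercept_)
print(new_score)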

1000.0
0.771842520583


---

### Cross-validate the Ridge $R^2$ with the optimal alpha.

Is it better than the linear regression? If so, why might that be?
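A minimal sketch of cross-validating Ridge at a fixed alpha, on synthetic stand-in data. `cross_val_score` refits a fresh copy of the estimator on each training fold and scores it on the held-out fold, so unlike `rlm.score(X_stand, y)` above it does not measure fit on the training data. `best_alpha` is a placeholder for `rlm.alpha_`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for X_stand and y
X, y = make_regression(n_samples=200, n_features=30, noise=10.0, random_state=0)

best_alpha = 1.0  # placeholder: use rlm.alpha_ from the RidgeCV step
ridge = Ridge(alpha=best_alpha)

scores = cross_val_score(ridge, X, y, cv=10)  # default scoring for regressors is R^2
print(scores.mean(), scores.std())
```

Because alpha was itself chosen by cross-validation on the same data, this estimate can be slightly optimistic; nested cross-validation avoids that at extra cost.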

---

### Find an optimal value for Lasso regression alpha using LassoCV

[Go to the documentation and read how LassoCV works.](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html) It is very similar to RidgeCV.

Hint: again, once the LassoCV is fit, the attribute `.alpha_` contains the best alpha parameter it found through cross-validation.

You can pass Lasso your own grid of alphas, for example a linear one built with `np.linspace`. However, you can also let LassoCV decide itself which alphas to use by setting the keyword argument `n_alphas=` to however many alphas you want it to search over.
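A minimal sketch of letting `LassoCV` build its own grid, on synthetic stand-in data. With no `alphas` argument it generates 100 values (its default) spaced logarithmically from the smallest alpha that zeroes every coefficient down to 1/1000 of that value (the default `eps=1e-3`), and stores the grid in `.alphas_` in decreasing order.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic stand-in: 50 predictors, only 5 of which actually matter
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)

lasso_cv = LassoCV(cv=10).fit(X, y)  # automatic grid of alphas

grid = lasso_cv.alphas_
print("largest alpha:", grid[0])
print("smallest alpha:", grid[-1])
print("chosen alpha:", lasso_cv.alpha_)
```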

In [73]:
rlm2 = LassoCV(alphas=np.logspace(-2, 6, num=13), cv=10)

new_model = rlm2.fit(X_stand, y)
new_predictions = rlm2.predict(X_stand)  # predict on the same standardized matrix
new_score = rlm2.score(X_stand, y)       # in-sample R^2, not cross-validated

print(new_score)
print(new_model.alpha_)

0.742340716866
0.0464158883361


---

### Cross-validate the Lasso $R^2$ with the optimal alpha.

Is it better than the linear regression? Is it better than Ridge? In each case, why might that be?

Depending on whether Ridge or Lasso achieves the higher $R^2$, what can you infer about the primary issue in the data?
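To compare Ridge and Lasso fairly, a sketch like the one below scores both on identical folds by passing the same shuffled `KFold` splitter to `cross_val_score`. The data and alphas are placeholders (substitute `X_stand`, `y`, and the tuned alphas). As a rough heuristic, Lasso winning suggests many predictors contribute little (a sparsity problem), while Ridge winning suggests many small, correlated effects (a collinearity problem).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for X_stand and y
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)

# The same splitter (fixed random_state) gives both models identical folds
folds = KFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in [("ridge", Ridge(alpha=1.0)), ("lasso", Lasso(alpha=0.1))]:
    results[name] = cross_val_score(model, X, y, cv=folds).mean()
print(results)
```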

---

### Look at the coefficients for variables in the Lasso.

1. Show the coefficients for the variables, ordered from largest to smallest by absolute value.
2. What percentage of the variables in the original dataset does the Lasso "zero out"?
3. What are the most important predictors of how many shots Kobe made in a game?

Note: `cross_val_score` fits copies of the estimator, so if you only fit the Lasso within `cross_val_score`, you will have to refit it outside of that function to pull out the coefficients.

ValueError: absolute is an unrecognized kind of sort
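The error above appears to come from passing `kind='absolute'` to a sort. In NumPy and pandas, `kind` selects the sorting algorithm (`'quicksort'`, `'mergesort'`, `'heapsort'`, `'stable'`), not a sort key. The sketch below instead sorts the absolute values and reindexes the signed coefficients by that order. It uses a synthetic Lasso fit and placeholder names `x0`, `x1`, ...; on the real data you would build the Series as `pd.Series(rlm2.coef_, index=X_stand.columns)`.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic stand-in: 50 predictors, only 5 of which actually matter
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)
feature_names = [f"x{i}" for i in range(X.shape[1])]  # placeholder column names

lasso = Lasso(alpha=1.0).fit(X, y)
coef = pd.Series(lasso.coef_, index=feature_names)

# Order by absolute value: sort the magnitudes, then reindex the signed coefficients
order = coef.abs().sort_values(ascending=False).index
coef_sorted = coef.reindex(order)

pct_zeroed = 100.0 * (coef == 0).mean()
print(coef_sorted.head(10))
print(f"{pct_zeroed:.1f}% of coefficients are exactly zero")
```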

---

### Find an optimal value for Elastic Net regression alpha using ElasticNetCV

[Go to the documentation and read how ElasticNetCV works.](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html)

Note here that you will be optimizing both the alpha parameter and the l1_ratio:

    alpha: strength of regularization
    l1_ratio: amount of ridge vs. lasso (0 = all ridge, 1 = all lasso)
    
Do not include 0 in the search for l1_ratio: ElasticNetCV cannot build its automatic alpha grid for a pure ridge penalty, so it raises an error.

Rather than setting alpha values yourself, you can pass `n_alphas` and let ElasticNetCV build the alpha grid. This is highly recommended!

Also, be careful not to search too many l1_ratios across the cross-validation folds. Every extra l1_ratio multiplies the number of fits (l1_ratios × alphas × folds), so too many combinations can take a very long time, and for the most part there are diminishing returns in this data.
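A minimal sketch of searching alpha and l1_ratio together with `ElasticNetCV`, on synthetic stand-in data. The l1_ratio list is an illustrative choice (weighted toward Lasso-like values and excluding 0); the alphas come from the automatic grid. `.mse_path_` holds one cross-validated error per l1_ratio, alpha, and fold, which makes the cost of each extra l1_ratio visible.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Synthetic stand-in for X_stand and y
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)

l1_ratios = [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1.0]  # illustrative grid, no 0
enet_cv = ElasticNetCV(l1_ratio=l1_ratios, cv=5).fit(X, y)

print("alpha:", enet_cv.alpha_)
print("l1_ratio:", enet_cv.l1_ratio_)
print(enet_cv.mse_path_.shape)  # (n_l1_ratios, n_alphas, n_folds)
```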

In [91]:
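# With no l1_ratio argument, ElasticNetCV keeps its default l1_ratio=0.5 and tunes only alpha
encv = ElasticNetCV(cv=10)

modelen = encv.fit(X_stand, y)
predictionen = encv.predict(X_stand)
scoreen = encv.score(X_stand, y)  # in-sample R^2, not cross-validated

print(scoreen)
print(modelen.alpha_)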


0.717492487221
0.121267665872


---

### Cross-validate the ElasticNet $R^2$ with the optimal alpha and l1_ratio.

How does it compare to the other regularized regressions?
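A minimal sketch of carrying both tuned hyperparameters into a plain `ElasticNet` and cross-validating it, on synthetic stand-in data. On the real data, `enet_cv` corresponds to `encv` fitted on `X_stand`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, ElasticNetCV
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for X_stand and y
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)

# Step 1: let ElasticNetCV pick alpha and l1_ratio
enet_cv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)

# Step 2: freeze both hyperparameters in a plain ElasticNet and cross-validate it
enet = ElasticNet(alpha=enet_cv.alpha_, l1_ratio=enet_cv.l1_ratio_)
scores = cross_val_score(enet, X, y, cv=10)
print(enet_cv.alpha_, enet_cv.l1_ratio_, scores.mean())
```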

---

### Plot the residuals for the ridge, lasso, and elastic net on histograms

This is another way to look at the performance of your model.

The tighter the distribution of residuals around zero, the better your model has performed!
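A sketch of the three residual histograms side by side, on synthetic stand-in data with placeholder alphas (substitute the tuned values). It uses `cross_val_predict` to get out-of-fold predictions, a choice made here so the residuals reflect unseen rows; residuals from `.predict` on the training data would look tighter than the model deserves.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for X_stand and y
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)

# Placeholder alphas; substitute the values found by RidgeCV, LassoCV and ElasticNetCV
models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

fig, axes = plt.subplots(1, len(models), figsize=(12, 3), sharey=True)
residuals = {}
for ax, (name, model) in zip(axes, models.items()):
    # Out-of-fold predictions: each row is predicted by a model that did not see it
    predicted = cross_val_predict(model, X, y, cv=10)
    residuals[name] = y - predicted
    ax.hist(residuals[name], bins=30)
    ax.axvline(0, color="black", linewidth=1)  # reference line at zero error
    ax.set_title(name)
    ax.set_xlabel("residual (actual - predicted)")
axes[0].set_ylabel("count")
fig.tight_layout()
```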

In [92]:
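# Toy example (overwrites X and y): the second column is exactly twice the first,
# so the two predictors are perfectly collinear
X = pd.DataFrame([[1, 2], [2, 4]])
y = pd.DataFrame([3, 6])

print(X)
print(y)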

   0  1
0  1  2
1  2  4
   0
0  3
1  6


In [93]:
from sklearn.linear_model import LinearRegression
lm = LinearRegression()

model = lm.fit(X,y)
prediction = lm.predict(X)
score = lm.score(X,y)

In [94]:
print(score)

1.0


In [95]:
# Infinitely many coefficient pairs fit perfectly; least squares returns the minimum-norm one
print(model.coef_)

[[ 0.6  1.2]]
