<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">


# Predicting Shots Made Per Game by Kobe Bryant

_Authors: Kiefer Katovich (SF)_

---

In this lab you'll use regularized regression penalties (ridge, lasso, and elastic net) to try to predict how many shots Kobe Bryant made per game during his career.

The Kobe Shots data set contains hundreds of columns representing different characteristics of each basketball game. Fitting an ordinary linear regression using every predictor would dramatically overfit the model, considering the limited number of observations (games) we have available. Plus, many of the predictors have significant multicollinearity. 
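
To make the overfitting concern concrete, here is a minimal sketch on synthetic data (not the Kobe data set; every `demo_` name and all sizes are made up for illustration). It fits ordinary least squares with almost as many predictors as training rows and compares the error on the training rows with the error on fresh rows.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_features = 60, 200, 50

# Only the first three predictors actually influence the target.
demo_true_coef = np.zeros(n_features)
demo_true_coef[:3] = [2.0, -1.0, 0.5]

demo_X_train = rng.normal(size=(n_train, n_features))
demo_X_test = rng.normal(size=(n_test, n_features))
demo_y_train = demo_X_train @ demo_true_coef + rng.normal(size=n_train)
demo_y_test = demo_X_test @ demo_true_coef + rng.normal(size=n_test)

# Ordinary least squares using every predictor.
demo_coef, *_ = np.linalg.lstsq(demo_X_train, demo_y_train, rcond=None)

demo_train_mse = np.mean((demo_y_train - demo_X_train @ demo_coef) ** 2)
demo_test_mse = np.mean((demo_y_test - demo_X_test @ demo_coef) ** 2)
print(f"train MSE: {demo_train_mse:.3f}, test MSE: {demo_test_mse:.3f}")
```

The training error looks small because the model has enough free coefficients to chase the noise; the error on unseen rows is where that shows up.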


**Warning:** Some of these calculations are computationally expensive and may take a while to execute. It may be worthwhile to only use a portion of the data to perform these calculations, especially if you've experienced kernel issues in the past.

---

### 1) Load packages and data.

In [1]:
import numpy as np
import pandas as pd
import patsy

from sklearn.linear_model import Ridge, Lasso, ElasticNet, LinearRegression, RidgeCV, LassoCV, ElasticNetCV
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')

from sklearn import metrics
from sklearn.metrics import mean_squared_error

%config InlineBackend.figure_format = 'retina'
%matplotlib inline

In [2]:
kobe = pd.read_csv('./datasets/kobe_superwide_games.csv')
df = kobe

---

### 2) Examine the data.

- How many columns are there?
- Examine what the observations (rows) and columns represent.
- Why might regularization be particularly useful for modeling this data?

In [3]:
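# A: 645 columns. The displayed index runs from 0 to 1556, so there are 1,557 rows (confirm with df.shape).
# A: Each row is one game. The columns are SEASON_OPPONENT indicators, AWAY_GAME, ACTION_TYPE shot-type columns, and game numbers.
# A: With hundreds of predictors, many of them collinear, an unregularized model has high variance; regularization shrinks the coefficients to control it.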
df

Unnamed: 0,SHOTS_MADE,AWAY_GAME,SEASON_OPPONENT:atl:1996-97,SEASON_OPPONENT:atl:1997-98,SEASON_OPPONENT:atl:1999-00,SEASON_OPPONENT:atl:2000-01,SEASON_OPPONENT:atl:2001-02,SEASON_OPPONENT:atl:2002-03,SEASON_OPPONENT:atl:2003-04,SEASON_OPPONENT:atl:2004-05,...,ACTION_TYPE:tip_layup_shot,ACTION_TYPE:tip_shot,ACTION_TYPE:turnaround_bank_shot,ACTION_TYPE:turnaround_fadeaway_bank_jump_shot,ACTION_TYPE:turnaround_fadeaway_shot,ACTION_TYPE:turnaround_finger_roll_shot,ACTION_TYPE:turnaround_hook_shot,ACTION_TYPE:turnaround_jump_shot,SEASON_GAME_NUMBER,CAREER_GAME_NUMBER
0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,1,1
1,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,2,2
2,2.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,3,3
3,2.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,4,4
4,0.0,0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,5,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1553,4.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.105263,0.0,0.000000,0.052632,62,1555
1554,4.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,63,1556
1555,9.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.045455,64,1557
1556,3.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.166667,0.0,0.000000,0.000000,65,1558


---

### 3) Create predictor and target variables. Standardize the predictors.

Why is normalization necessary for regularized regressions?

Use the `sklearn.preprocessing` class `StandardScaler` to standardize the predictors.
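
Regularization penalizes coefficient size, and coefficient size depends on the units of each predictor, so the predictors should share a common scale before fitting. The sketch below uses made-up data (the `demo_` names and values are illustrative, not from the Kobe data set) to show the usual pattern: learn the means and standard deviations from the training rows only, then apply them to the test rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Two hypothetical predictors on very different scales.
demo_train = np.column_stack([
    rng.normal(loc=50.0, scale=10.0, size=200),  # a count-like feature
    rng.normal(loc=0.1, scale=0.02, size=200),   # a proportion-like feature
])
demo_test = np.column_stack([
    rng.normal(loc=50.0, scale=10.0, size=50),
    rng.normal(loc=0.1, scale=0.02, size=50),
])

demo_scaler = StandardScaler()
demo_Z_train = demo_scaler.fit_transform(demo_train)  # learn mean/std from the training rows
demo_Z_test = demo_scaler.transform(demo_test)        # reuse the training mean/std

print(demo_Z_train.mean(axis=0).round(6))  # approximately 0 for each column
print(demo_Z_train.std(axis=0).round(6))   # approximately 1 for each column
```

The test rows are transformed with the training statistics, so their column means and standard deviations will be close to, but not exactly, 0 and 1.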

In [4]:
ss = StandardScaler()
lr = LinearRegression()

In [5]:
df.columns

Index(['SHOTS_MADE', 'AWAY_GAME', 'SEASON_OPPONENT:atl:1996-97',
       'SEASON_OPPONENT:atl:1997-98', 'SEASON_OPPONENT:atl:1999-00',
       'SEASON_OPPONENT:atl:2000-01', 'SEASON_OPPONENT:atl:2001-02',
       'SEASON_OPPONENT:atl:2002-03', 'SEASON_OPPONENT:atl:2003-04',
       'SEASON_OPPONENT:atl:2004-05',
       ...
       'ACTION_TYPE:tip_layup_shot', 'ACTION_TYPE:tip_shot',
       'ACTION_TYPE:turnaround_bank_shot',
       'ACTION_TYPE:turnaround_fadeaway_bank_jump_shot',
       'ACTION_TYPE:turnaround_fadeaway_shot',
       'ACTION_TYPE:turnaround_finger_roll_shot',
       'ACTION_TYPE:turnaround_hook_shot', 'ACTION_TYPE:turnaround_jump_shot',
       'SEASON_GAME_NUMBER', 'CAREER_GAME_NUMBER'],
      dtype='object', length=645)

In [6]:
X = df.drop(columns='SHOTS_MADE')
type(X)

pandas.core.frame.DataFrame

In [7]:
y = df['SHOTS_MADE']
type(y)

pandas.core.series.Series

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    shuffle=True,      # This is the default. Shuffles data
                                                    test_size=0.25     # This is the default. 
                                                   )

In [9]:
Z_train = ss.fit_transform(X_train)

In [10]:
Z_test = ss.transform(X_test)

---

### 4) Build a linear regression predicting `SHOTS_MADE` from the rest of the columns.

Cross-validate the $R^2$ of an ordinary linear regression model with 10 cross-validation folds.

How does it perform?

In [11]:
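# A: The mean R^2 across 10 folds is a huge negative number (about -1.6e28).
# With 644 highly collinear predictors, the unregularized coefficients are unstable,
# so predictions on the held-out folds are wildly off.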
cross_val_score(lr,
                Z_train,
                y_train,
                cv=10
               ).mean()

-1.5798832635798682e+28

In [12]:
lr.fit(Z_train, y_train)
print(metrics.mean_squared_error(y_train, lr.predict(Z_train)))
print(metrics.mean_squared_error(y_test, lr.predict(Z_test)))

2.439028783829903
3.546093760339451e+27


---

### 5) Find an optimal value for the ridge regression alpha using `RidgeCV`.

Go to the documentation and [read how RidgeCV works](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html).

> *Hint: Once the RidgeCV is fit, the attribute `.alpha_` contains the best alpha parameter it found through cross-validation.*

Recall that ridge performs best when searching alphas through logarithmic space (`np.logspace`). This may take a while to fit.
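
As a small illustration of a logarithmic alpha search, the sketch below runs `RidgeCV` on synthetic data from `make_regression` (a stand-in, not the Kobe data set; the `demo_` names and sizes are assumptions). The generated predictors are already on a common scale, so no standardization step is shown.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Synthetic stand-in data: 100 rows, 30 predictors, 5 of them informative.
demo_X, demo_y = make_regression(
    n_samples=100, n_features=30, n_informative=5, noise=10.0, random_state=0
)

# Six candidate alphas spaced evenly in log10 space, from 10**0 to 10**5.
demo_alphas = np.logspace(0, 5, 6)

demo_ridge = RidgeCV(alphas=demo_alphas, cv=5)
demo_ridge.fit(demo_X, demo_y)

print(demo_alphas)
print(demo_ridge.alpha_)  # the candidate with the best mean cross-validation score
```

A coarse grid like this one is cheap to fit; a finer grid (the lab uses 100 values) narrows down the alpha at the cost of more fitting time.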


In [13]:
# A:
r_alphas = np.logspace(0, 5, 100)

ridge_cv = RidgeCV(
    alphas=r_alphas,
    cv=5,
    scoring='neg_mean_squared_error'
)

In [14]:
ridge_cv.fit(Z_train, y_train)

RidgeCV(alphas=array([1.00000000e+00, 1.12332403e+00, 1.26185688e+00, 1.41747416e+00,
       1.59228279e+00, 1.78864953e+00, 2.00923300e+00, 2.25701972e+00,
       2.53536449e+00, 2.84803587e+00, 3.19926714e+00, 3.59381366e+00,
       4.03701726e+00, 4.53487851e+00, 5.09413801e+00, 5.72236766e+00,
       6.42807312e+00, 7.22080902e+00, 8.11130831e+00, 9.11162756e+00,
       1.02353102e+01, 1.14975700e+0...
       6.89261210e+03, 7.74263683e+03, 8.69749003e+03, 9.77009957e+03,
       1.09749877e+04, 1.23284674e+04, 1.38488637e+04, 1.55567614e+04,
       1.74752840e+04, 1.96304065e+04, 2.20513074e+04, 2.47707636e+04,
       2.78255940e+04, 3.12571585e+04, 3.51119173e+04, 3.94420606e+04,
       4.43062146e+04, 4.97702356e+04, 5.59081018e+04, 6.28029144e+04,
       7.05480231e+04, 7.92482898e+04, 8.90215085e+04, 1.00000000e+05]),
        cv=5, scoring='neg_mean_squared_error')

In [15]:
ridge_cv.alpha_

1072.2672220103232

In [16]:
ridge_dict = dict(zip(X.columns, ridge_cv.coef_))
ridge_dict

{'AWAY_GAME': -0.015920026909846837,
 'SEASON_OPPONENT:atl:1996-97': -0.02450223416741238,
 'SEASON_OPPONENT:atl:1997-98': 0.004845435734698708,
 'SEASON_OPPONENT:atl:1999-00': 0.0,
 'SEASON_OPPONENT:atl:2000-01': -0.012559889577027205,
 'SEASON_OPPONENT:atl:2001-02': 0.03412520938315459,
 'SEASON_OPPONENT:atl:2002-03': 0.0006916465723863039,
 'SEASON_OPPONENT:atl:2003-04': 0.02852228528783237,
 'SEASON_OPPONENT:atl:2004-05': 0.0022754880766064868,
 'SEASON_OPPONENT:atl:2005-06': 0.005536867623525109,
 'SEASON_OPPONENT:atl:2006-07': 0.025472406244620157,
 'SEASON_OPPONENT:atl:2007-08': -0.01135672250576294,
 'SEASON_OPPONENT:atl:2008-09': -0.009333606045793945,
 'SEASON_OPPONENT:atl:2009-10': 0.047401089066052324,
 'SEASON_OPPONENT:atl:2010-11': -0.018698325878156185,
 'SEASON_OPPONENT:atl:2011-12': -0.02836221912510312,
 'SEASON_OPPONENT:atl:2012-13': 0.017918658520187803,
 'SEASON_OPPONENT:atl:2013-14': -0.037161380483628784,
 'SEASON_OPPONENT:atl:2014-15': 0.0,
 'SEASON_OPPONENT:atl

---

### 6) Cross-validate the ridge regression $R^2$ with the optimal alpha.

Is it better than the linear regression? If so, why might this be?
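
One way to cross-validate $R^2$ at a fixed alpha is to pass a plain `Ridge` to `cross_val_score`. The sketch below does this on synthetic data; `demo_alpha` is a placeholder value, and all `demo_` names are assumptions rather than part of the lab.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data.
demo_X, demo_y = make_regression(
    n_samples=120, n_features=40, n_informative=5, noise=10.0, random_state=1
)

demo_alpha = 10.0  # placeholder; in the lab this would be the alpha found by RidgeCV

# With no scoring argument, cross_val_score uses the estimator's own score method,
# which is R^2 for regressors.
demo_scores = cross_val_score(Ridge(alpha=demo_alpha), demo_X, demo_y, cv=10)

print(demo_scores.mean(), demo_scores.std())
```

In this lab the equivalent call would be `cross_val_score(Ridge(alpha=ridge_cv.alpha_), Z_train, y_train, cv=10)`.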

In [17]:
# A: This is far better than the linear regression.
# A: The ridge penalty shrinks all coefficients toward zero (without removing any),
# which stabilizes the fit when predictors are numerous and collinear.
# RidgeCV chose the penalty strength (alpha) by cross-validation.

ridge_train_mse = mean_squared_error(y_train, ridge_cv.predict(Z_train))
ridge_test_mse = mean_squared_error(y_test, ridge_cv.predict(Z_test))

print(ridge_train_mse)
print(ridge_test_mse)
# The :.2% format spec multiplies the ratio by 100 before printing.
print(f'Difference of {ridge_test_mse/ridge_train_mse - 1:.2%} MSE')

2.7022582945481
4.649126534852373
Difference of 72.05% MSE


In [18]:
# Finding the R^2

ridge_train_r2 = ridge_cv.score(Z_train, y_train)
ridge_test_r2 = ridge_cv.score(Z_test, y_test)

print(ridge_train_r2)
print(ridge_test_r2)
# Relative change from train to test, printed as a percentage.
print(f'Difference of {ridge_test_r2/ridge_train_r2 - 1:.2%}')

0.7783102563449675
0.5966804241472863
Difference of -23.34%


---

### 7) Find an optimal value for lasso regression alpha using `LassoCV`.

Go to the documentation and [read how LassoCV works](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html). It is very similar to `RidgeCV`.

> *Hint: Again, once the `LassoCV` is fit, the attribute `.alpha_` contains the best alpha parameter it found through cross-validation.*

Recall that lasso, unlike ridge, performs best when searching for alpha through linear space (`np.linspace`). However, you can actually let the LassoCV decide what alphas to use itself by setting the keyword argument `n_alphas=` to however many alphas you want it to search over. We recommend letting scikit-learn choose the range of alphas.

_**Tip:** If you find your CV taking a long time and you're not sure if it's working, set `verbose=1`._
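
The sketch below shows what letting `LassoCV` choose its own alphas looks like, again on synthetic data (the `demo_` names and sizes are assumptions). No alphas are passed, so the grid falls back to the default size of 100, which matches `n_alphas=100`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic stand-in data.
demo_X, demo_y = make_regression(
    n_samples=150, n_features=40, n_informative=5, noise=5.0, random_state=2
)

# No alphas are supplied, so LassoCV builds its own grid from the data.
demo_lasso = LassoCV(cv=5, max_iter=50000)
demo_lasso.fit(demo_X, demo_y)

print(len(demo_lasso.alphas_))                             # size of the automatic grid
print(demo_lasso.alphas_.max(), demo_lasso.alphas_.min())  # range that was searched
print(demo_lasso.alpha_)                                   # value chosen by cross-validation
print(int(np.sum(demo_lasso.coef_ == 0)))                  # coefficients set exactly to zero
```

Inspecting `alphas_` after fitting is a quick check that the chosen `alpha_` is not sitting at either end of the searched range.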

In [19]:
# A:
# A manual grid should contain only positive values, e.g. np.linspace(0.01, 10, 200).
# Here LassoCV picks its own alphas instead.

lasso_cv = LassoCV(
    n_alphas=100,       # This is the default, which sets 100 alphas
    cv=5,
    max_iter=50000,
)

lasso_cv.fit(Z_train, y_train)

lasso_cv.alpha_

0.10025153681740488

---

### 8) Cross-validate the lasso $R^2$ with the optimal alpha.

Is it better than the linear regression? Is it better than ridge? What do the differences in results imply about the issues with the data set?

In [20]:
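# A: Lasso generalizes better than ridge here: test R^2 is about 0.632 vs. 0.597,
# and the train/test gap is much smaller. Both are far better than the unregularized regression.
# This suggests that many of the predictors are redundant or uninformative.
lasso_train_r2 = lasso_cv.score(Z_train, y_train)
lasso_test_r2 = lasso_cv.score(Z_test, y_test)
print(lasso_train_r2)
print(lasso_test_r2)
# Relative change from train to test, printed as a percentage.
print(f'Difference of {lasso_test_r2/lasso_train_r2 - 1:.2%}')

0.6967968922217581
0.632005236276323
Difference of -9.30%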


In [21]:
X_train

Unnamed: 0,AWAY_GAME,SEASON_OPPONENT:atl:1996-97,SEASON_OPPONENT:atl:1997-98,SEASON_OPPONENT:atl:1999-00,SEASON_OPPONENT:atl:2000-01,SEASON_OPPONENT:atl:2001-02,SEASON_OPPONENT:atl:2002-03,SEASON_OPPONENT:atl:2003-04,SEASON_OPPONENT:atl:2004-05,SEASON_OPPONENT:atl:2005-06,...,ACTION_TYPE:tip_layup_shot,ACTION_TYPE:tip_shot,ACTION_TYPE:turnaround_bank_shot,ACTION_TYPE:turnaround_fadeaway_bank_jump_shot,ACTION_TYPE:turnaround_fadeaway_shot,ACTION_TYPE:turnaround_finger_roll_shot,ACTION_TYPE:turnaround_hook_shot,ACTION_TYPE:turnaround_jump_shot,SEASON_GAME_NUMBER,CAREER_GAME_NUMBER
183,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.071429,0.000000,0.0,0.0,0.0,0.0,0.000000,21,185
590,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,5,592
1040,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.086957,31,1042
1379,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,7,1381
1119,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.033333,0.0,0.3,0.0,0.0,0.033333,5,1121
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
354,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.038462,0.000000,0.0,0.0,0.0,0.0,0.000000,46,356
1224,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,14,1226
1504,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.050000,13,1506
357,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,49,359


In [22]:
lasso_cv.coef_

array([-0.00000000e+00, -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        0.00000000e+00, -0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00, -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00, -0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
        0.00000000e+00, -0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00, -5.84475910e-02,  0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        0.00000000e+00, -

---

### 9) Look at the coefficients for variables in the lasso.

1. Show the coefficients for the variables, ordered from largest to smallest by absolute value.
2. What percent of the variables in the original data set are "zeroed-out" by the lasso?
3. What are the most important predictors for how many shots Kobe made in a game?

> **Note:** If you only fit the lasso within `cross_val_score`, you'll have to refit it outside of that function to pull out the coefficients.
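
A minimal sketch of the ranking and the zeroed-out percentage, using a handful of made-up coefficients. The `demo_features` and `demo_coefs` values are hypothetical stand-ins for `X_train.columns` and `lasso_cv.coef_`.

```python
import numpy as np
import pandas as pd

# Hypothetical feature names and coefficients.
demo_features = ['feat_a', 'feat_b', 'feat_c', 'feat_d', 'feat_e']
demo_coefs = np.array([0.0, -1.5, 0.0, 0.25, -0.0])

demo_series = pd.Series(demo_coefs, index=demo_features)

# Reorder by absolute value, largest first, keeping the original signs.
demo_order = demo_series.abs().sort_values(ascending=False).index
demo_ranked = demo_series.reindex(demo_order)

# Share of coefficients that are exactly zero (-0.0 == 0.0 evaluates to True).
demo_pct_zero = (demo_series == 0).mean() * 100

print(demo_ranked.head(2))
print(demo_pct_zero)
```

Keeping the signs in the ranked output makes it easy to see which predictors push the prediction up and which push it down.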

In [23]:
# A:
dict(zip(X_train.columns, lasso_cv.coef_))

{'AWAY_GAME': -0.0,
 'SEASON_OPPONENT:atl:1996-97': -0.0,
 'SEASON_OPPONENT:atl:1997-98': 0.0,
 'SEASON_OPPONENT:atl:1999-00': 0.0,
 'SEASON_OPPONENT:atl:2000-01': -0.0,
 'SEASON_OPPONENT:atl:2001-02': 0.0,
 'SEASON_OPPONENT:atl:2002-03': 0.0,
 'SEASON_OPPONENT:atl:2003-04': 0.0,
 'SEASON_OPPONENT:atl:2004-05': 0.0,
 'SEASON_OPPONENT:atl:2005-06': 0.0,
 'SEASON_OPPONENT:atl:2006-07': 0.0,
 'SEASON_OPPONENT:atl:2007-08': -0.0,
 'SEASON_OPPONENT:atl:2008-09': -0.0,
 'SEASON_OPPONENT:atl:2009-10': 0.0,
 'SEASON_OPPONENT:atl:2010-11': -0.0,
 'SEASON_OPPONENT:atl:2011-12': -0.0,
 'SEASON_OPPONENT:atl:2012-13': 0.0,
 'SEASON_OPPONENT:atl:2013-14': -0.0,
 'SEASON_OPPONENT:atl:2014-15': 0.0,
 'SEASON_OPPONENT:atl:2015-16': -0.0,
 'SEASON_OPPONENT:bkn:2012-13': -0.0,
 'SEASON_OPPONENT:bkn:2015-16': -0.0,
 'SEASON_OPPONENT:bos:1996-97': 0.0,
 'SEASON_OPPONENT:bos:1997-98': 0.0,
 'SEASON_OPPONENT:bos:1999-00': 0.0,
 'SEASON_OPPONENT:bos:2001-02': -0.0,
 'SEASON_OPPONENT:bos:2002-03': 0.0,
 'SEASO

In [24]:
Z_train

array([[-0.96632025, -0.04141577, -0.04141577, ..., -0.65100587,
        -0.82834623, -1.3413468 ],
       [ 1.03485361, -0.04141577, -0.04141577, ..., -0.65100587,
        -1.43612536, -0.42982248],
       [-0.96632025, -0.04141577, -0.04141577, ...,  1.10983775,
        -0.44848428,  0.57800539],
       ...,
       [-0.96632025, -0.04141577, -0.04141577, ...,  0.36147921,
        -1.1322358 ,  1.6171879 ],
       [ 1.03485361, -0.04141577, -0.04141577, ..., -0.65100587,
         0.23526724, -0.95165335],
       [ 1.03485361, -0.04141577, -0.04141577, ...,  0.09898308,
         0.15929485, -0.54628259]])

In [25]:
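# Count the coefficients that the lasso set exactly to zero.
count = 0
for i in lasso_cv.coef_:
    if i == 0.0:
        count += 1
print(count)

# Percent of predictors the lasso KEPT (nonzero coefficients).
# The percent zeroed out is 100 minus this value, roughly 89.13%.
((-count/X_train.shape[1]) + 1)*100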

574


10.869565217391308

---

### 10) Find an optimal value for elastic net regression alpha using `ElasticNetCV`.

Go to the documentation and [read how ElasticNetCV works](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html).

Note that here you'll be optimizing both the alpha parameter and the l1_ratio:
- `alpha`: Strength of regularization.
- `l1_ratio`: Amount of ridge vs. lasso (0 = all ridge, 1 = all lasso).
    
Do not include 0 in the search for `l1_ratio`: `ElasticNetCV` cannot generate its automatic alpha grid when `l1_ratio` is 0 and will raise an error.

You can use `n_alphas` for the alpha parameters instead of setting your own values, which we highly recommend.

Also, be careful setting too many l1_ratios over cross-validation folds in your search. It can take a long time if you choose too many combinations and, for the most part, there are diminishing returns in this data.
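
For reference, the objective that scikit-learn documents for its elastic net can be written as follows, where $n$ is the number of samples, $w$ is the coefficient vector, $\alpha$ is the `alpha` parameter, and $\rho$ is the `l1_ratio` (the symbol names are chosen here for illustration):

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 \;+\; \alpha \, \rho \, \lVert w \rVert_1 \;+\; \frac{\alpha \, (1 - \rho)}{2} \lVert w \rVert_2^2
$$

Setting $\rho = 1$ removes the squared term and leaves the lasso penalty; setting $\rho = 0$ leaves only the squared (ridge-style) penalty.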

In [26]:
# A:
# Set up a list of alphas to check.
enet_alphas = np.linspace(0.1, 3, 100)

# Set up our l1 ratio. (What does this do?)
# If l1 ratio = "1" ==> Lasso, if "0" ==> Ridge
enet_ratio = [0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# Instantiate model.
enet_model = ElasticNetCV(alphas=enet_alphas, 
                          l1_ratio=enet_ratio, 
                          cv=5,
                          max_iter=50000
                         )

# Fit the model. ElasticNetCV searches the alpha and l1_ratio grids during fitting.
enet_model = enet_model.fit(Z_train, y_train)

In [27]:
enet_model.l1_ratio_

1.0

In [28]:
enet_model.alpha_

0.1

---

### 11) Cross-validate the elastic net $R^2$ with the optimal alpha and l1_ratio.

How does it compare to the ridge and lasso regularized regressions?

In [29]:
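# A: The search chose l1_ratio = 1.0 (pure lasso) with alpha = 0.1, the smallest value in the grid,
# so the elastic net scores match the lasso almost exactly. Both beat ridge on the test set.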

# Generate predictions.
enet_model_preds = enet_model.predict(Z_test)
enet_model_preds_train = enet_model.predict(Z_train)

# Compare R^2 on the training and test sets for all three models.
print("R2 Comparison")
print("\nEnet:")
print(enet_model.score(Z_train, y_train))
print(enet_model.score(Z_test, y_test))
print("\nLasso:")
print(lasso_cv.score(Z_train, y_train))
print(lasso_cv.score(Z_test, y_test))
print("\nRidge:")
print(ridge_cv.score(Z_train, y_train))
print(ridge_cv.score(Z_test, y_test))

R2 Comparison

Enet:
0.697044795320598
0.6320068445473157

Lasso:
0.6967968922217581
0.632005236276323

Ridge:
0.7783102563449675
0.5966804241472863


In [30]:
# Evaluate model.
print("MSE Comparison")

print("\nEnet:")
print(mean_squared_error(y_train, enet_model.predict(Z_train)))
print(mean_squared_error(y_test, enet_model.predict(Z_test)))
print(f'Diff: {100*(mean_squared_error(y_test, enet_model.predict(Z_test))/mean_squared_error(y_train, enet_model.predict(Z_train))-1)}%')

print("\nLasso:")
print(mean_squared_error(y_train, lasso_cv.predict(Z_train)))
print(mean_squared_error(y_test, lasso_cv.predict(Z_test)))
print(f'Diff: {100*(mean_squared_error(y_test, lasso_cv.predict(Z_test))/mean_squared_error(y_train, lasso_cv.predict(Z_train))-1)}%')

print("\nRidge:")
print(mean_squared_error(y_train, ridge_cv.predict(Z_train)))
print(mean_squared_error(y_test, ridge_cv.predict(Z_test)))
print(f'Difference of {100*(ridge_test_mse/ridge_train_mse-1)}% MSE')

MSE Comparison

Enet:
3.692833061304536
4.241913475293112
Diff: 14.868812233678575%

Lasso:
3.6958548438824748
4.2419320140792145
Diff: 14.775395497488987%

Ridge:
2.7022582945481
4.649126534852373
Difference of 72.04597148363452% MSE


In [32]:
enet_model.coef_

array([-0.00000000e+00, -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        0.00000000e+00, -0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00, -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00, -0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
        0.00000000e+00, -0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00, -5.87090202e-02,  0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        0.00000000e+00, -

---

### 12) [Bonus] Compare the residuals for ridge and lasso visually.
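
One possible approach, sketched on synthetic data: plot the two models' test residuals against each other and overlay their histograms. The alphas below are placeholders and the `demo_` and `d`-prefixed names are assumptions; in the lab, the fitted `ridge_cv` and `lasso_cv` models with `Z_test` and `y_test` would take their place.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in data and split.
demo_X, demo_y = make_regression(
    n_samples=200, n_features=30, n_informative=5, noise=10.0, random_state=3
)
dX_train, dX_test, dy_train, dy_test = train_test_split(demo_X, demo_y, random_state=3)

# Placeholder penalty strengths, chosen only for illustration.
demo_ridge = Ridge(alpha=10.0).fit(dX_train, dy_train)
demo_lasso = Lasso(alpha=1.0).fit(dX_train, dy_train)

# Residual = actual minus predicted, on the held-out rows.
ridge_resid = dy_test - demo_ridge.predict(dX_test)
lasso_resid = dy_test - demo_lasso.predict(dX_test)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Left panel: residuals of the two models plotted against each other.
axes[0].scatter(ridge_resid, lasso_resid, alpha=0.6)
axes[0].set_xlabel('Ridge residual')
axes[0].set_ylabel('Lasso residual')

# Right panel: overlaid histograms of the two residual distributions.
axes[1].hist(ridge_resid, bins=20, alpha=0.5, label='Ridge')
axes[1].hist(lasso_resid, bins=20, alpha=0.5, label='Lasso')
axes[1].set_xlabel('Residual')
axes[1].legend()

fig.tight_layout()
```

Points near the diagonal of the left panel are games where both models miss by about the same amount; points far from it are where the two penalties disagree.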


In [31]:
# A: Maybe a jointplot?