## Ridge and Lasso
At this point we've seen a number of criteria and algorithms for fitting regression models to data. We've seen simple linear regression using ordinary least squares, and its generalization to polynomial regression. We've also seen how we can arbitrarily overfit models to data using kernel methods or feature engineering. With all of that, we began to explore other tools for analyzing the general problem of overfitting versus underfitting: train and test splits, bias and variance, and cross validation.

Now we're going to take a look at another way to tune our models. These methods all modify the mean squared error function that we have been minimizing by adding a penalty for large coefficient weights in the resulting model. If we think back to feature engineering, we can see how this penalty counteracts our ability to make a model look more accurate on the training data simply by adding more features.

In general, all of these penalties are based on $L^p$ norms.

## $L^p$ norm of x
In order to help account for underfitting and overfitting, we often use what are called $L^p$ norms.   
The **$L^p$ norm of x** is defined as:  

### $||x||_p  =  \big(\sum_{i} |x_i|^p\big)^\frac{1}{p}$
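
As a quick check of the definition, here is a minimal sketch that computes the $L^p$ norm directly with NumPy, taking absolute values so that the result is well defined for negative entries. The helper name `lp_norm` is an assumption made for illustration; NumPy's own `np.linalg.norm` gives the same result for a 1-D vector.

```python
import numpy as np

def lp_norm(x, p):
    """Return the L^p norm of a 1-D array x (intended for p >= 1)."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])
l1 = lp_norm(x, 1)  # |3| + |-4|
l2 = lp_norm(x, 2)  # sqrt(3^2 + 4^2)
print(l1, l2)

# Cross-check against NumPy's built-in vector norm
print(np.linalg.norm(x, 1), np.linalg.norm(x, 2))
```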

## 1. Ridge (L2)
One common form of regularization is called Ridge Regression, which penalizes the squared $L^2$ norm (also known as the Euclidean norm) of the coefficients, using the norm defined above.   
The ridge coefficients minimize a penalized residual sum of squares:    
    $ \sum_i(\hat{y}_i-y_i)^2 + \lambda \sum_j w_j^2$

Write this loss function for performing ridge regression.

In [3]:
import numpy as np

In [4]:
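def ridge_loss(y, y_hat, coeff_weights, lam = 0.8):
    # Residual sum of squares between predictions and targets
    rss = np.sum((y_hat-y)**2)
    # Squared L2 norm of the coefficient weights (expects a NumPy array)
    norm = np.sum(coeff_weights**2)
    # Apply the penalty weight lambda exactly once
    l2_err = rss + lam*norm
    return l2_err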
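
To see what minimizing this penalized loss does to the coefficients, here is a minimal sketch under some assumptions: synthetic data, no intercept term, and a hypothetical helper `ridge_closed_form` that is not part of this lesson. Without an intercept, the minimizer of the penalized residual sum of squares has the closed form $w = (X^TX + \lambda I)^{-1}X^Ty$, and the norm of $w$ shrinks as $\lambda$ grows.

```python
import numpy as np

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y_toy = X_toy @ true_w + rng.normal(scale=0.1, size=50)

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_unpenalized = ridge_closed_form(X_toy, y_toy, lam=0.0)   # ordinary least squares
w_penalized = ridge_closed_form(X_toy, y_toy, lam=100.0)   # heavily penalized

print('lam=0   :', w_unpenalized, 'norm:', np.linalg.norm(w_unpenalized))
print('lam=100 :', w_penalized, 'norm:', np.linalg.norm(w_penalized))
```

With `lam=0.0` the solution coincides with ordinary least squares; the larger penalty pulls every coefficient toward zero without, in general, making any of them exactly zero.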

## 2. Lasso (L1)
Another common form of regularization is called Lasso Regression, which uses the $L^1$ norm.   
The lasso coefficients minimize a penalized residual sum of squares:    
    $ \sum_i(\hat{y}_i-y_i)^2 + \lambda \sum_j |w_j|$

Write this loss function for performing lasso regression.

In [5]:
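def lasso_loss(y, y_hat, coeff_weights, lam = 0.8):
    # Residual sum of squares between predictions and targets
    rss = np.sum((y_hat-y)**2)
    # L1 norm of the coefficient weights
    norm = np.sum(np.abs(coeff_weights))
    # Apply the penalty weight lambda exactly once
    l1_err = rss + lam*norm
    return l1_err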
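
The two penalties treat the same weights quite differently. The following small sketch compares them for one made-up weight vector and the default `lam` of 0.8; the values are chosen only for illustration.

```python
import numpy as np

w = np.array([0.5, -2.0, 0.0])  # made-up coefficient weights
lam = 0.8

l1_penalty = lam * np.sum(np.abs(w))  # 0.8 * (0.5 + 2.0 + 0.0)
l2_penalty = lam * np.sum(w ** 2)     # 0.8 * (0.25 + 4.0 + 0.0)

print('L1 penalty:', l1_penalty)
print('L2 penalty:', l2_penalty)

# Per-weight contributions before multiplying by lam
print('L1 terms:', np.abs(w))
print('L2 terms:', w ** 2)
```

The squared penalty is dominated by the large weight, while the absolute-value penalty keeps pressing on the small one in proportion to its size. This is one common intuition for why lasso tends to push small coefficients all the way to zero, whereas ridge mostly shrinks the large ones.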

## 3. Run + Compare your Results
Run ridge, lasso, and unpenalized regressions on the dataset below.
Although we have practiced writing the precursors to a full ridge regression, we'll import the models from scikit-learn for now.
Then, answer the following questions:
* Which model do you think created better results overall? 
* Comment on the differences between the coefficients of the resulting models

In [6]:
import pandas as pd

In [7]:
df = pd.read_excel('movie_data_detailed.xlsx')
df.head()

Unnamed: 0,budget,domgross,title,Response_Json,Year,imdbRating,Metascore,imdbVotes
0,13000000,25682380,21 & Over,0,2008,6.8,48,206513
1,45658735,13414714,Dredd 3D,0,2012,0.0,0,0
2,20000000,53107035,12 Years a Slave,0,2013,8.1,96,537525
3,61000000,75612460,2 Guns,0,2013,6.7,55,173726
4,40000000,95020213,42,0,2013,7.5,62,74170


In [9]:
X = df[['budget', 'imdbRating',
       'Metascore', 'imdbVotes']]
y = df['domgross']

In [10]:
from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.model_selection import train_test_split

# Perform train-test split
X_train , X_test, y_train, y_test = train_test_split(X, y)

ridge_reg = Ridge()
ridge_reg.fit(X_train, y_train)

lasso_reg = Lasso()
lasso_reg.fit(X_train, y_train)

lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

def rss(residual_col):
    return sum(residual_col.astype(float).map(lambda x: x**2))

print('Train Error  Ridge Model', rss(y_train - ridge_reg.predict(X_train)))
print('Test Error Ridge Model', rss(y_test - ridge_reg.predict(X_test)))
print('\n')

print('Train Error Lasso Model', rss(y_train - lasso_reg.predict(X_train)))
print('Test Error Lasso Model', rss(y_test - lasso_reg.predict(X_test)))
print('\n')

print('Train Error Unpenalized Linear Model', rss(y_train - lin_reg.predict(X_train)))
print('Test Error Unpenalized  Linear Model', rss(y_test - lin_reg.predict(X_test)))

Train Error  Ridge Model 1.517045946179938e+17
Test Error Ridge Model 9895338708281942.0


Train Error Lasso Model 1.5170347498409187e+17
Test Error Lasso Model 9865064824965426.0


Train Error Unpenalized Linear Model 1.5170347498409184e+17
Test Error Unpenalized  Linear Model 9865064713195520.0


In [13]:
print(lasso_reg.coef_)
print(ridge_reg.coef_)
print(lin_reg.coef_)

[ 6.95395319e-01 -7.74512236e+06  4.62976126e+05  2.87412686e+02]
[ 6.95597610e-01 -7.59839284e+06  4.52541873e+05  2.87602028e+02]
[ 6.95395319e-01 -7.74512281e+06  4.62976163e+05  2.87412685e+02]
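
One plausible reason the three coefficient vectors above are nearly identical is that the default `alpha=1` is tiny relative to the scale of the target and features (budgets and grosses in the tens of millions), so the penalty barely registers next to the residual term. A common remedy is to standardize the features before fitting. The sketch below illustrates this on synthetic stand-in data, not the movie dataset; the feature scales and the value `alpha=3e6` are assumptions picked only to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in features on very different scales (NOT the movie data)
big = rng.normal(loc=4e7, scale=2e7, size=n)    # budget-like scale
small = rng.normal(loc=6.5, scale=1.0, size=n)  # rating-like scale
irrelevant = rng.normal(size=n)                 # unrelated to the target
X_demo = np.column_stack([big, small, irrelevant])
y_demo = 1.5 * big + 5e6 * small + rng.normal(scale=1e7, size=n)

# Standardize so that one alpha penalizes every coefficient on the same scale
model = make_pipeline(StandardScaler(), Lasso(alpha=3e6, max_iter=10000))
model.fit(X_demo, y_demo)

coefs = model.named_steps['lasso'].coef_
print(coefs)  # coefficients are in units of "per standard deviation" of each feature
```

After standardization the coefficients are directly comparable, and a sufficiently large alpha can shrink weakly related features heavily, possibly to exactly zero.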


# Altering Alpha

Remember that we can also change our regularization coefficient, alpha (scikit-learn's name for the penalty weight $\lambda$ above), to adjust the strength of the regularization.
Iterate over the set **np.linspace(start=0.1, stop=2.5, num=13)** in order to find an optimal alpha.

In [14]:
# numpy was imported above as np
# Track the lowest test RSS seen so far and the alpha that produced it
min_test_error_ridge = float('inf')
min_test_error_lasso = float('inf')
optimal_ridge_alpha = None
optimal_lasso_alpha = None
for a in np.linspace(start=0.1, stop=2.5, num=13):
    ridge_reg = Ridge(alpha=a)
    ridge_reg.fit(X_train, y_train)

    lasso_reg = Lasso(alpha=a)
    lasso_reg.fit(X_train, y_train)

    ridge_train_rss = rss(y_train - ridge_reg.predict(X_train))
    ridge_test_rss = rss(y_test - ridge_reg.predict(X_test))
#     print('Train Error  Ridge Model', ridge_train_rss)
#     print('Test Error Ridge Model', ridge_test_rss)
#     print('\n')
    
    lasso_train_rss = rss(y_train - lasso_reg.predict(X_train))
    lasso_test_rss = rss(y_test - lasso_reg.predict(X_test))
#     print('Train Error Lasso Model', lasso_train_rss)
#     print('Test Error Lasso Model', lasso_test_rss)
#     print('\n')
    
    # Keep the alpha with the smallest test RSS (ties keep the earlier alpha)
    if ridge_test_rss < min_test_error_ridge:
        min_test_error_ridge = ridge_test_rss
        optimal_ridge_alpha = a
    if lasso_test_rss < min_test_error_lasso:
        min_test_error_lasso = lasso_test_rss
        optimal_lasso_alpha = a
print('Minimum Ridge Test RSS: {}, Best alpha: {}'.format(min_test_error_ridge, optimal_ridge_alpha))
print('Minimum Lasso Test RSS: {}, Best alpha: {}'.format(min_test_error_lasso, optimal_lasso_alpha))

Minimum Ridge Test RSS: 9868134088747096.0, Best alpha: 0.1
Minimum Lasso Test RSS: 9865064724371076.0, Best alpha: 0.1
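
Picking alpha by test-set error, as above, means the test set is no longer an unbiased check of the final model. An alternative that ties back to cross validation is to choose alpha using only the training data. The sketch below uses scikit-learn's `RidgeCV` over the same alpha grid; the data come from `make_regression` and are purely synthetic stand-ins, so the chosen alpha says nothing about the movie dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the movie dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=4,
                                 noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# Same grid of candidate alphas as in the loop above
alphas = np.linspace(start=0.1, stop=2.5, num=13)

# 5-fold cross validation on the training data selects alpha
ridge_cv = RidgeCV(alphas=alphas, cv=5)
ridge_cv.fit(X_tr, y_tr)

print('Alpha chosen by 5-fold CV:', ridge_cv.alpha_)
# The test set is touched only once, to evaluate the selected model
print('Held-out R^2:', ridge_cv.score(X_te, y_te))
```

`LassoCV` offers the same pattern for lasso.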
