# Regularized Regression Practice

Why regularize?

- Reduce complexity
- Reduce the chance of overfitting
- Reduce model variance at the cost of introducing a small amount of bias
- Increase model interpretability

What even are L1 and L2?
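In short, "L1" and "L2" name the norm used to measure the size of the coefficient vector $w$ inside the penalty. Taking scikit-learn's documented objectives as the reference point (since that is the library used below), LASSO minimizes $\frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1$ and Ridge minimizes $\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_2^2$, where $\lVert w\rVert_1 = \sum_j |w_j|$ and $\lVert w\rVert_2^2 = \sum_j w_j^2$. A minimal NumPy sketch of the two norms, on a made-up coefficient vector:

```python
import numpy as np

# A made-up coefficient vector, purely for illustration
w = np.array([3.0, -4.0, 0.0])

l1_norm = np.sum(np.abs(w))   # L1: sum of absolute values -> 7.0
l2_squared = np.sum(w ** 2)   # squared L2 (the Ridge penalty): sum of squares -> 25.0
l2_norm = np.linalg.norm(w)   # L2: square root of the sum of squares -> 5.0

print(l1_norm, l2_squared, l2_norm)  # 7.0 25.0 5.0
```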

### Review:

What is L1 Regularization (LASSO) good for?

- 


What is L2 Regularization (Ridge) good for?

- 


In [None]:
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error

![baby penguin gif from Giphy](https://media.giphy.com/media/RiJuDMqd6vDgfPrZN2/giphy.gif)

Let's hang out with penguins some more:

In [None]:
data = sns.load_dataset('penguins')

In [None]:
data.head()

In [None]:
data.info()

Let's clean up this dataset: two rows are missing most of their values, and 11 rows in total have no value for `sex`, so let's drop every row that contains any null:
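A minimal sketch of the drop-then-check pattern, using a small made-up DataFrame as a stand-in for `data` (on the real penguins data the same idea is `data = data.dropna()` followed by a null count):

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the penguins data: row 1 is missing everything, row 2 only `sex`
toy = pd.DataFrame({
    "bill_length_mm": [39.1, np.nan, 40.3],
    "body_mass_g": [3750.0, np.nan, 3250.0],
    "sex": ["Male", np.nan, np.nan],
})

# Drop every row that has at least one null value
toy_clean = toy.dropna()

# Sanity check: no nulls remain, and only the fully populated row survives
print(toy_clean.isna().sum().sum())  # 0
print(len(toy_clean))                # 1
```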

In [None]:
# Drop nulls here


In [None]:
# Sanity check


In [None]:
data.info()

### Encoding Our Data

In [None]:
data[['species','island','sex']].describe()

In order to use the `sex`, `species`, or `island` data, we need to convert those strings to numbers. Since there are only 2-3 unique values per column, let's simply one-hot encode those columns (aka turn each column into a set of binary indicator columns, one per unique value).

Using Pandas' `get_dummies`: https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.get_dummies.html
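A minimal sketch of `get_dummies` on a tiny made-up frame with the same three string columns. The `columns=` argument limits encoding to those columns, and `dtype=int` is optional (recent pandas versions return booleans by default):

```python
import pandas as pd

# Made-up rows mimicking the penguins' string columns plus the numeric target
toy = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap"],
    "island": ["Torgersen", "Biscoe", "Dream"],
    "sex": ["Male", "Female", "Male"],
    "body_mass_g": [3750.0, 5000.0, 3800.0],
})

# One-hot encode only the three string columns; numeric columns pass through untouched
toy_num = pd.get_dummies(toy, columns=["species", "island", "sex"], dtype=int)
print(list(toy_num.columns))
```

Non-encoded columns come first, followed by one indicator column per (sorted) level of each encoded column.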

In [None]:
# One hot encode our three 'object' columns
data_num = None

In [None]:
data_num.head()

You'll note that Pandas' `get_dummies` does not automatically drop one of the columns. The two `sex` columns, `sex_Female` and `sex_Male`, are simply inverses of each other, so either one alone carries the same information as both together, yet `get_dummies` keeps both. (Passing `drop_first=True` would drop the first level of each encoded column.) That's fine for now.
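A quick check of that redundancy on a made-up `sex` column, comparing the default output with `drop_first=True`:

```python
import pandas as pd

toy = pd.DataFrame({"sex": ["Male", "Female", "Male"]})

both = pd.get_dummies(toy, columns=["sex"], dtype=int)
one = pd.get_dummies(toy, columns=["sex"], dtype=int, drop_first=True)

# The two sex columns are exact complements: every row sums to 1
print((both["sex_Female"] + both["sex_Male"]).tolist())  # [1, 1, 1]
# drop_first=True removes the first (alphabetical) level, leaving one indicator
print(list(one.columns))  # ['sex_Male']
```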

In [None]:
sns.heatmap(data_num.corr().abs())
plt.show()

In [None]:
sns.pairplot(data_num)

Our goal is to predict body mass, `body_mass_g`, so let's define our X and y and perform a train/test split:
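A minimal sketch of defining X and y and splitting, on a made-up stand-in for `data_num`. The `test_size=0.25` and `random_state=42` values are illustrative assumptions, not values prescribed by the lesson:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Made-up stand-in for data_num: two feature columns plus the target
toy_num = pd.DataFrame({
    "flipper_length_mm": rng.uniform(170, 230, size=20),
    "sex_Male": rng.integers(0, 2, size=20),
    "body_mass_g": rng.uniform(2700, 6300, size=20),
})

# Features: every column except the target. Target: body_mass_g
X_toy = toy_num.drop(columns="body_mass_g")
y_toy = toy_num["body_mass_g"]

# Hold out 25% of rows for testing; random_state makes the split reproducible
X_toy_train, X_toy_test, y_toy_train, y_toy_test = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42
)
print(X_toy_train.shape, X_toy_test.shape)  # (15, 2) (5, 2)
```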

In [None]:
# Define X and y
X = None
y = None

In [None]:
# Perform a train/test split


### Scaling our Data:

When we introduced feature scaling last week, we talked about how some models require that we scale or standardize variables before using them. Ridge and LASSO regression are two of those models!

Why? Because both Ridge and LASSO penalize the size of a linear regression model's coefficients, and a linear model's coefficients depend heavily on the scale of each feature: the same measurement expressed in meters instead of millimeters needs a coefficient 1000 times larger. Without scaling, the penalty falls hardest on features that merely happen to be measured in small units. Scaling first makes sure our model penalizes columns that are actually uninformative, instead of columns that only look that way because the data isn't properly scaled.
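A small sketch of this effect on made-up data, assuming one hypothetical feature recorded once in millimeters and once in meters. Unpenalized `LinearRegression` just rescales its coefficient with the units, while `Ridge` with the same `alpha` shrinks the meters version far more:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Hypothetical feature: flipper length in millimeters, and the same values in meters
x_mm = rng.uniform(150, 250, size=100).reshape(-1, 1)
x_m = x_mm / 1000
y_units = 20 * x_mm.ravel() + rng.normal(0, 50, size=100)  # made-up linear relationship

# Unpenalized: the coefficient simply rescales with the units (1000x larger in meters)
ols_mm = LinearRegression().fit(x_mm, y_units).coef_[0]
ols_m = LinearRegression().fit(x_m, y_units).coef_[0]
print(ols_m / ols_mm)  # ~1000

# Same penalty strength: the meters version is shrunk far more, purely because of its units
ridge_mm = Ridge(alpha=1.0).fit(x_mm, y_units).coef_[0]
ridge_m = Ridge(alpha=1.0).fit(x_m, y_units).coef_[0]
print(ridge_mm / ols_mm)  # close to 1: barely shrunk
print(ridge_m / ols_m)    # well below 1: heavily shrunk
```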

In [None]:
# Instantiate a scaler


In [None]:
# Fit our scaler on the training data only, then use it to transform both training and testing data
X_train_scaled = None
X_test_scaled = None
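A minimal sketch of the fit-on-train, transform-test pattern on tiny made-up arrays (the `_demo` names are hypothetical, chosen so they don't overwrite the notebook's own variables):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Tiny made-up splits with a single feature
X_train_demo = np.array([[1.0], [2.0], [3.0]])
X_test_demo = np.array([[4.0]])

scaler_demo = StandardScaler()
# fit_transform learns the mean and std from the training data, then applies them
X_train_demo_scaled = scaler_demo.fit_transform(X_train_demo)
# transform reuses the *training* mean and std; the scaler never sees test data while fitting
X_test_demo_scaled = scaler_demo.transform(X_test_demo)

print(scaler_demo.mean_, scaler_demo.scale_)  # [2.] [0.81649658]
print(X_test_demo_scaled)                     # [[2.44948974]]
```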

### Baseline Linear Regression Model

In [None]:
# Instantiate a linear regression model
lr = None

In [None]:
# Fit our model on our scaled data


In [None]:
# Evaluate
y_train_pred = lr.predict(X_train_scaled)
y_test_pred = lr.predict(X_test_scaled)

print("Training Scores:")
print(f"R2: {r2_score(y_train, y_train_pred)}")
print(f"Mean Absolute Error: {mean_absolute_error(y_train, y_train_pred)}")
print("---")
print("Testing Scores:")
print(f"R2: {r2_score(y_test, y_test_pred)}")
print(f"Mean Absolute Error: {mean_absolute_error(y_test, y_test_pred)}")
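The same evaluation block repeats for every model below. One way to avoid copy-pasting is a small helper; `score_model` is a hypothetical name, and the made-up data at the bottom only stands in for the scaled penguin splits:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error


def score_model(model, X_tr, y_tr, X_te, y_te):
    """Return train and test R2 and MAE for an already-fitted regression model."""
    scores = {}
    for split, (X_part, y_part) in {"train": (X_tr, y_tr), "test": (X_te, y_te)}.items():
        pred = model.predict(X_part)
        scores[split] = {"R2": r2_score(y_part, pred),
                         "MAE": mean_absolute_error(y_part, pred)}
    return scores


# Usage on made-up data: 40 rows to train, 10 to test
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, size=50)
demo_model = LinearRegression().fit(X_demo[:40], y_demo[:40])
demo_scores = score_model(demo_model, X_demo[:40], y_demo[:40], X_demo[40:], y_demo[40:])
print(demo_scores)
```

In the notebook, a call like `score_model(lr, X_train_scaled, y_train, X_test_scaled, y_test)` would replace the printed block, assuming those variables have been filled in.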

### L1 Norm: LASSO

In [None]:
# Instantiate a lasso regression model
lasso = None

In [None]:
# Fit your new L1 model on the scaled data


In [None]:
# Evaluate
y_train_pred_l1 = lasso.predict(X_train_scaled)
y_test_pred_l1 = lasso.predict(X_test_scaled)

print("Training Scores:")
print(f"R2: {r2_score(y_train, y_train_pred_l1)}")
print(f"Mean Absolute Error: {mean_absolute_error(y_train, y_train_pred_l1)}")
print("---")
print("Testing Scores:")
print(f"R2: {r2_score(y_test, y_test_pred_l1)}")
print(f"Mean Absolute Error: {mean_absolute_error(y_test, y_test_pred_l1)}")

Remember - what's the benefit of using LASSO?

In [None]:
# Feature names, in the same order as the model coefficients below
X.columns

In [None]:
print("Unpenalized Linear Regression Coefficients: {}".format(lr.coef_))
print("Unpenalized Linear Regression Intercept: {}".format(lr.intercept_))
print("---")
print("LASSO Regression Coefficients: {}".format(lasso.coef_))
print("LASSO Regression Intercept: {}".format(lasso.intercept_))
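To see LASSO's main benefit (setting some coefficients to exactly zero, i.e. built-in feature selection) on data where we know the answer, here is a sketch on made-up data in which only two of five features actually matter. `alpha=0.5` mirrors the value used in this lesson; the data and names are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Made-up data: 5 features, but only the first two drive the target
X_sparse = StandardScaler().fit_transform(rng.normal(size=(200, 5)))
y_sparse = 3 * X_sparse[:, 0] - 2 * X_sparse[:, 1] + rng.normal(0, 0.5, size=200)

lr_sparse = LinearRegression().fit(X_sparse, y_sparse)
lasso_sparse = Lasso(alpha=0.5).fit(X_sparse, y_sparse)

print("Unpenalized:", np.round(lr_sparse.coef_, 3))
print("LASSO:      ", np.round(lasso_sparse.coef_, 3))
# Count how many coefficients LASSO set to exactly 0 (here: the three noise features)
print("Coefficients LASSO zeroed out:", int(np.sum(lasso_sparse.coef_ == 0)))
```

The unpenalized model gives every noise feature a small non-zero weight, while LASSO drops them entirely and also shrinks the two real coefficients somewhat toward zero.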

### L2 Norm: Ridge

In [None]:
# Instantiate a ridge regression model
ridge = None

In [None]:
# Fit your new L2 model on the scaled data


In [None]:
# Evaluate
y_train_pred_l2 = ridge.predict(X_train_scaled)
y_test_pred_l2 = ridge.predict(X_test_scaled)

print("Training Scores:")
print(f"R2: {r2_score(y_train, y_train_pred_l2)}")
print(f"Mean Absolute Error: {mean_absolute_error(y_train, y_train_pred_l2)}")
print("---")
print("Testing Scores:")
print(f"R2: {r2_score(y_test, y_test_pred_l2)}")
print(f"Mean Absolute Error: {mean_absolute_error(y_test, y_test_pred_l2)}")

In [None]:
print("Unpenalized Linear Regression Coefficients: {}".format(lr.coef_))
print("Unpenalized Linear Regression Intercept: {}".format(lr.intercept_))
print("---")
print("Ridge Regression Coefficients: {}".format(ridge.coef_))
print("Ridge Regression Intercept: {}".format(ridge.intercept_))
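Ridge rarely zeroes anything out; instead it shrinks and spreads weight across correlated features. A simplified sketch of that, assuming two perfectly duplicated made-up columns (a stronger form of redundancy than the real `sex_Female`/`sex_Male` pair, which are complements rather than copies):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
x = rng.normal(size=100)
# Two identical (made-up) columns carrying exactly the same information
X_dup = np.column_stack([x, x])
y_dup = 4 * x + rng.normal(0, 0.1, size=100)

ridge_dup = Ridge(alpha=1.0).fit(X_dup, y_dup)
# Ridge splits the effect evenly between the two copies instead of picking one
print(ridge_dup.coef_)        # two (nearly) equal values
print(ridge_dup.coef_.sum())  # close to the true total effect of 4
```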

All together:

In [None]:
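# Collect each model's coefficients, keyed by the feature they belong to.
# Coefficients line up with the columns of X (not data_num, which also holds the target).
coef_dict = {}
for loc, col in enumerate(X.columns):
    coef_dict[col] = {"Unpenalized": lr.coef_[loc],
                      "LASSO": lasso.coef_[loc],
                      "Ridge": ridge.coef_[loc]}
pd.DataFrame.from_dict(coef_dict)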

### Alpha Levels??

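We started with the **hyperparameter** alpha set to `0.5` for both our LASSO and Ridge models (scikit-learn's default for both is `alpha=1.0`). Larger values of alpha mean a stronger penalty and more shrinkage: now let's play around with it!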
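A minimal sketch of an alpha sweep for LASSO on made-up data (only the first two of five features matter), counting how many coefficients survive at each alpha. The alpha grid is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Made-up data: 5 features, only the first two actually matter
X_alpha = StandardScaler().fit_transform(rng.normal(size=(200, 5)))
y_alpha = 3 * X_alpha[:, 0] - 2 * X_alpha[:, 1] + rng.normal(0, 0.5, size=200)

alphas = [0.01, 0.1, 0.5, 1.0, 10.0, 100.0]
nonzero_counts = []
for alpha in alphas:
    lasso_alpha = Lasso(alpha=alpha).fit(X_alpha, y_alpha)
    # Larger alpha = stronger L1 penalty = more coefficients pushed to exactly 0
    nonzero_counts.append(int(np.sum(lasso_alpha.coef_ != 0)))
    print(f"alpha={alpha:>6}: {nonzero_counts[-1]} non-zero coefficients")
```

On the penguin data, the same loop could record training and testing R2 for each alpha instead. Choosing alpha by its test score leaks information from the test set; scikit-learn's `LassoCV` and `RidgeCV` choose alpha by cross-validation on the training data instead.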

### Resources

- [Stats course resource from Penn State](https://online.stat.psu.edu/stat508/lesson/5), going into detail about regression shrinkage methods, aka regularization. It is fairly technical and the code is in R, but it explains in good detail why we regularize and how it works.