## Class 3 - L1 and L2 regularization
In our lecture, we discussed some foundational concepts in data science, namely *overfitting*, the *bias-variance* trade-off, and *regularization*.
Today, we will apply some of these concepts to interpret and improve the performance of our predictive models, focusing especially on regularization.
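As a reminder of what the two penalties look like, these are the objectives minimized by `Ridge` and `Lasso`. They follow the formulas given in the scikit-learn documentation. Here $w$ are the coefficients, $X$ the inputs, $y$ the target and $n$ the number of samples:

$$
\text{Ridge (L2):} \quad \min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

$$
\text{Lasso (L1):} \quad \min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

A larger `alpha` means stronger shrinkage, and `alpha = 0` reduces both objectives to ordinary least squares. The Lasso's squared-error term is scaled by $1/(2n)$ and the Ridge's is not. As a result, the same `alpha` value does not correspond to the same strength of regularization in the two estimators.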

As usual, we will do all of this using `scikit-learn`.

**Note**: As with last week, under `nbs/class_03` you will find a notebook called `example.ipynb`, where I provide an example of how to run today's exercise on sample data.
Some of it repeats things done as part of `class_02`, for the sake of completeness.
But you can also:
- just keep working on the same notebook as last week
- work in a new notebook, and load the data splits and models you saved last week
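If you go for the second option, one way to persist splits and models between notebooks is `joblib`, which is installed as a dependency of `scikit-learn`. The sketch below shows the round trip. It assumes you saved your objects with `joblib.dump`; the toy data and file names are hypothetical stand-ins for your own:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data standing in for last week's training split (hypothetical)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
y_train = X_train @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

model = LinearRegression().fit(X_train, y_train)

# Hypothetical file names in a temporary folder; use your own paths instead
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "linear_model.joblib")
splits_path = os.path.join(out_dir, "splits.joblib")

# Save the fitted model and the data splits to disk
joblib.dump(model, model_path)
joblib.dump({"X_train": X_train, "y_train": y_train}, splits_path)

# Later, e.g. in a new notebook, load them back
loaded_model = joblib.load(model_path)
splits = joblib.load(splits_path)
print(np.allclose(loaded_model.coef_, model.coef_))  # True
```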

### Today's exercise
Gather in the same group as last week, and please go through the following steps:
1. Look back at last week's notebook. If you did not have time to apply any transformations to your inputs, spend some time thinking about whether it would make sense to do so. You can find relevant transformations in `scikit-learn`: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing. You will probably be most interested in `StandardScaler` and `MinMaxScaler`.
2. Look at the performance of the models you fitted last week: what is the best model? Do you see any evidence of overfitting?
3. Fit your maximal models with `Lasso` (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso) and `Ridge` (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge) estimators instead of `LinearRegression`. Make sure you read the documentation to understand what these do. Fit several models with different values of `alpha` and store the outputs.
4. Plot the performance of these models against that of your linear and KNN models from last week. Does the performance of the model on the validation set improve with regularization?
5. For both `Lasso` and `Ridge` models, plot the value of the coefficients as a function of `alpha`. You can access the coefficients of a fitted `model` through `model.coef_`. What do you notice about how Lasso and Ridge behave? (Look at `example.ipynb` for inspiration.)
6. Finally, if any models are doing better than the linear model without regularization, select the best `Ridge` and the best `Lasso` model, and plot their coefficients alongside the coefficients from the simple linear model. How do the estimates change with regularization? Which values have changed the most? Do you have any hypothesis as to why?
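Steps 3 to 5 can be sketched as a single loop over `alpha` values. This is a minimal version under a few assumptions:

- Synthetic data from `make_regression` stands in for your own splits.
- The `alphas` grid is arbitrary.
- Scaling is done inside a `Pipeline`, which is one reasonable choice rather than the required one.

Each fit stores its validation MSE, the number of coefficients that are exactly zero, and the coefficients themselves:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for your own data (assumption): 8 features, only 3 informative
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

alphas = np.logspace(-2, 2, 9)  # 9 values from 0.01 to 100, evenly spaced on a log scale
rows = []

for name, Estimator in [("lasso", Lasso), ("ridge", Ridge)]:
    for alpha in alphas:
        # Scaling inside the pipeline means the scaler is fit on training data only
        pipe = make_pipeline(StandardScaler(), Estimator(alpha=alpha))
        pipe.fit(X_train, y_train)
        coefs = pipe[-1].coef_  # coefficients of the fitted Lasso/Ridge step
        rows.append({
            "model": name,
            "alpha": alpha,
            "val_mse": mean_squared_error(y_val, pipe.predict(X_val)),
            "n_zero_coefs": int(np.sum(coefs == 0)),
            **{f"coef_{i}": c for i, c in enumerate(coefs)},
        })

results = pd.DataFrame(rows)
# Lasso drives some coefficients exactly to zero as alpha grows;
# Ridge shrinks them towards zero without (in general) reaching it
print(results[["model", "alpha", "val_mse", "n_zero_coefs"]].round(2))
```

To draw the coefficient paths for step 5, plot each `coef_*` column against `alpha` separately for each model. A log-scaled x-axis makes the paths easier to read.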

### Once you have done this
Please submit a pull request to my repository that adds the notebook you worked on under `nbs/class_03/group-x`.
If you kept working on last week's notebook, push that one.

```python
import pandas as pd
import os

# importing bike data
path = os.path.join("..", "..", "_data", "bike_sharing_dataset", "day.csv")
bike_data = pd.read_csv(path)

bike_data.head()
```

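|   | instant | dteday | season | yr | mnth | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 6 | 0 | 2 | 0.344167 | 0.363625 | 0.805833 | 0.160446 | 331 | 654 | 985 |
| 1 | 2 | 2011-01-02 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 0.363478 | 0.353739 | 0.696087 | 0.248539 | 131 | 670 | 801 |
| 2 | 3 | 2011-01-03 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.196364 | 0.189405 | 0.437273 | 0.248309 | 120 | 1229 | 1349 |
| 3 | 4 | 2011-01-04 | 1 | 0 | 1 | 0 | 2 | 1 | 1 | 0.2 | 0.212122 | 0.590435 | 0.160296 | 108 | 1454 | 1562 |
| 4 | 5 | 2011-01-05 | 1 | 0 | 1 | 0 | 3 | 1 | 1 | 0.226957 | 0.22927 | 0.436957 | 0.1869 | 82 | 1518 | 1600 |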
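To get a feel for the transformations in step 1, the sketch below applies both scalers to a few columns of the five rows printed above, typed in by hand. On your real data, fit the scaler on the training split only (for example inside a `Pipeline`) and reuse it to transform the validation and test sets:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# A subset of columns from the five rows printed above, typed in by hand
df = pd.DataFrame({
    "weekday": [6, 0, 1, 2, 3],
    "temp": [0.344167, 0.363478, 0.196364, 0.2, 0.226957],
    "hum": [0.805833, 0.696087, 0.437273, 0.590435, 0.436957],
    "windspeed": [0.160446, 0.248539, 0.248309, 0.160296, 0.1869],
})

# StandardScaler: each column to mean 0 and (population) standard deviation 1
standardized = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)
# MinMaxScaler: each column rescaled to the range [0, 1]
minmaxed = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

print(standardized.round(2))
print(minmaxed.round(2))
```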


```python
import pandas as pd
import os

# importing bike data
path = os.path.join("..", "..", "_data", "bike_sharing_dataset", "day.csv")
bike_data = pd.read_csv(path)

# print(bike_data.describe())

# reformatting dteday to datetime
bike_data['dteday'] = pd.to_datetime(bike_data['dteday'])
```

```python
# imports for fitting and evaluating a linear regression model
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

from sklearn.metrics import mean_squared_error
```
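The cell above only gathers the imports for the unregularized baseline. The sketch below shows how those three pieces fit together, with toy data in place of the bike features and `cnt` target (an assumption). Comparing training and validation MSE is the check step 2 asks for, and `lr.coef_` is what step 6 compares against the regularized models:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Toy data standing in for the bike features and target (assumption)
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=1)

lr = LinearRegression().fit(X_train, y_train)
train_mse = mean_squared_error(y_train, lr.predict(X_train))
val_mse = mean_squared_error(y_val, lr.predict(X_val))

# A validation error much larger than the training error hints at overfitting
print(f"train MSE: {train_mse:.3f}, validation MSE: {val_mse:.3f}")
print("coefficients:", lr.coef_.round(2))
```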