# Regularization

We generate some noisy synthetic data and deliberately overfit a model to it.

Let's create data that follows this equation:


$y = \sqrt x$

We add Gaussian noise (mean 0, standard deviation 1) so that the model has something to overfit.

In [None]:
import numpy as np

In [None]:
np.random.seed(42)

X = np.linspace(0, 60, 1000)
y = np.sqrt(X) + np.random.normal(loc=0.0, scale=1.0, size=X.shape[0])

In [None]:
import matplotlib.pyplot as plt

plt.scatter(X, y)

---

### Overfit the model

**Now let's intentionally overfit: we fit a degree-8 polynomial, which is flexible enough to follow the noise in the training examples.**

In [None]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

Xtrain = X.reshape(-1, 1)  # scikit-learn expects a 2D feature matrix
ytrain = y

poly = PolynomialFeatures(degree=8)  # columns 1, x, x^2, ..., x^8
m = LinearRegression()

# call fit_transform() only on training data; use poly.transform() on test data
Xpoly = poly.fit_transform(Xtrain)

m.fit(Xpoly, ytrain)

ypred = m.predict(Xpoly)

In [None]:
Xtrain.shape, ytrain.shape

In [None]:
Xpoly.shape

In [None]:
ypred.shape

In [None]:
plt.scatter(X, y)
plt.plot(X, ypred, 'r-')
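
The plot only shows the fit on the data the model was trained on. One common way to quantify overfitting is to hold out part of the data and compare errors. The following is a minimal sketch, not part of the original notebook: the 70/30 split, `random_state=0`, and the names `X_tr`, `X_te`, `mse_train`, `mse_test` are assumptions chosen for illustration. A test error clearly above the training error is the usual sign of overfitting; how large the gap is depends on the noise and on the split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Recreate the synthetic data used above: y = sqrt(x) plus Gaussian noise
np.random.seed(42)
X = np.linspace(0, 60, 1000)
y = np.sqrt(X) + np.random.normal(loc=0.0, scale=1.0, size=X.shape[0])

# Hold out 30% of the points (split ratio and seed are illustrative assumptions)
X_tr, X_te, y_tr, y_te = train_test_split(
    X.reshape(-1, 1), y, test_size=0.3, random_state=0
)

poly = PolynomialFeatures(degree=8)
Xpoly_tr = poly.fit_transform(X_tr)  # fit the transformer on training data only
Xpoly_te = poly.transform(X_te)      # reuse the fitted transformer on held-out data

model = LinearRegression().fit(Xpoly_tr, y_tr)

mse_train = mean_squared_error(y_tr, model.predict(Xpoly_tr))
mse_test = mean_squared_error(y_te, model.predict(Xpoly_te))
print(f"train MSE: {mse_train:.3f}, test MSE: {mse_test:.3f}")
```

Note that `fit_transform` is called on the training split only, while the held-out split goes through `transform`.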

### Regularize

Reduce the model's effective complexity by penalizing large coefficients (regularization).
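
As a sketch of what the penalty looks like, Ridge regression in scikit-learn minimizes the squared error plus an L2 penalty on the coefficients. The symbols here are our own notation, not the notebook's: $X$ is the (scaled polynomial) feature matrix, $w$ the coefficient vector, and $\alpha$ the `alpha` argument of `Ridge`.

$\min_w \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2$

With $\alpha = 0$ this reduces to ordinary least squares; larger values of $\alpha$ shrink the coefficients toward zero.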

#### Exercise: Increase alpha and observe what happens

In [None]:
from sklearn.linear_model import Ridge
from sklearn.preprocessing import MinMaxScaler

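# Scale the polynomial features to [0, 1] so the penalty treats them comparably
Xscaled = MinMaxScaler().fit_transform(Xpoly)

m = LinearRegression()  # unregularized baseline
r = Ridge(alpha=0.001)  # alpha: strength of the regularization penalty

m.fit(Xscaled, y)
r.fit(Xscaled, y)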

ypred = m.predict(Xscaled)
ypred_ridge = r.predict(Xscaled)

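plt.figure(figsize=(12, 8))
plt.scatter(X, y, label='Data')
plt.plot(X, ypred, label='No regularization')
plt.plot(X, ypred_ridge, label='Ridge regularization')
plt.plot(X, np.sqrt(X), label='True function (sqrt)')
plt.legend()  # labels come from the label= arguments, so each curve is named correctly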
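
One possible way to approach the exercise without re-running the plot by hand is to loop over several values of `alpha` and watch the size of the fitted coefficients. This is a sketch under assumptions: the particular `alpha` values and the `coef_norms` dictionary are chosen here for illustration and are not from the original notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Recreate the data and the scaled polynomial features used above
np.random.seed(42)
X = np.linspace(0, 60, 1000)
y = np.sqrt(X) + np.random.normal(loc=0.0, scale=1.0, size=X.shape[0])

Xpoly = PolynomialFeatures(degree=8).fit_transform(X.reshape(-1, 1))
Xscaled = MinMaxScaler().fit_transform(Xpoly)

# Fit one Ridge model per alpha and record the L2 norm of its coefficients
coef_norms = {}
for alpha in [0.001, 0.1, 10.0, 1000.0]:  # illustrative values
    r = Ridge(alpha=alpha).fit(Xscaled, y)
    coef_norms[alpha] = np.linalg.norm(r.coef_)
    print(f"alpha={alpha:>8}: ||coef|| = {coef_norms[alpha]:.4f}")
```

As `alpha` grows, the coefficient norm shrinks and the fitted curve flattens; very large values push the prediction toward a constant.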

In [None]:
# also try:
from sklearn.linear_model import Lasso
# Lasso is used in the same way as Ridge
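
A minimal sketch of the Lasso variant, following the same steps as the Ridge cell. The values `alpha=0.01` and `max_iter=100_000` are assumptions picked for illustration, not settings from the original notebook. Unlike Ridge, Lasso uses an L1 penalty, which can set some coefficients to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Recreate the data and the scaled polynomial features used above
np.random.seed(42)
X = np.linspace(0, 60, 1000)
y = np.sqrt(X) + np.random.normal(loc=0.0, scale=1.0, size=X.shape[0])

Xpoly = PolynomialFeatures(degree=8).fit_transform(X.reshape(-1, 1))
Xscaled = MinMaxScaler().fit_transform(Xpoly)

# alpha and max_iter are illustrative assumptions; tune them for your data
lasso = Lasso(alpha=0.01, max_iter=100_000)
lasso.fit(Xscaled, y)
ypred_lasso = lasso.predict(Xscaled)

# Count how many polynomial terms the L1 penalty removed entirely
n_zero = int(np.sum(lasso.coef_ == 0))
print(f"{n_zero} of {lasso.coef_.size} coefficients are exactly zero")
```

The predictions in `ypred_lasso` can be plotted against `X` in the same way as `ypred_ridge`.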