# Regularisation (Lasso)
## Cambridge ML Commando Course

In this notebook you should **(by completing bits of code left as "...")**:
- create noisy data based on a pure signal
- create regressors with various non-linear features
- test their fits and plot them, along with printing their cross-validation scores
- implement lasso regression to reduce the overfitting, and inspect how it works

Once you've done that, you can (no code needed):
- plot a validation curve for our lasso regressor
- plot learning curves for all our regressors
- check the correlation matrix of the data for clues about performance
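
For reference, the lasso is least squares with an added L1 penalty on the weights. In the form given in scikit-learn's `Lasso` documentation, with $n$ samples, feature matrix $X$, targets $y$ and weight vector $w$, the quantity being minimised is:

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

Here $\alpha$ is the *alpha* parameter used throughout this notebook: at $\alpha = 0$ the penalty vanishes and the problem reduces to ordinary least squares, while larger values push more weights towards zero.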


In [None]:
%matplotlib inline
%pylab inline

import platform

import IPython
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
import sklearn
from sklearn import preprocessing


print('Python version:', platform.python_version())
print('IPython version:', IPython.__version__)
print('numpy version:', np.__version__)
print('scikit-learn version:', sklearn.__version__)
print('matplotlib version:', matplotlib.__version__)

In [None]:
X_pure = np.arange(0,2*3.1415,0.1)

true_fun = lambda x : np.sin(x) # Why not try a different function?

np.random.seed(666)
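X = np.sort(np.random.choice(X_pure, size=30, replace=False)) # use np.random explicitly rather than relying on %pylab's namespace

y_pure = true_fun(X_pure) # follows true_fun, so changing the function above changes this too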
y = true_fun(X) + np.random.randn(len(X))*0.25
print("X values:\n",X)
print("y values:\n",y)
plt.plot(X_pure, true_fun(X_pure))
plt.scatter(X,y)

In [None]:
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures
print(X.shape, y.shape)

X = X.reshape(-1,1)
X_pure = X_pure.reshape(-1,1)

plt.ylim(-1.6, 1.5)
plt.plot(X_pure,true_fun(X_pure), linestyle="--", label="true")


scaler15 = StandardScaler()
poly15 = PolynomialFeatures(15)
steps = [
    ("poly",poly15),
    ("scale",scaler15),
    ("reg",LinearRegression())
]
reg15 = Pipeline( steps )

reg15.fit(X,y)
plt.plot(X_pure, reg15.predict(X_pure), label="quindecic (15)")
scores15 = cross_val_score(reg15, X, y, scoring="neg_mean_squared_error", cv=10)

lasso_regressor = ... #Create your estimator here.

steps = [
    ("poly",poly15),
    ("scale",scaler15),
    (...) # put your lasso regressor here
]
lasso15 = Pipeline( steps )
lasso15.fit(X,y)
plt.plot(X_pure, lasso15.predict(X_pure), label="lasso (15)")
lasso_scores = cross_val_score(lasso15, X, y, scoring="neg_mean_squared_error", cv=10)

plt.legend()

print("Quindecic model")
print(-np.mean(scores15), np.std(scores15))

print("Lasso regression regularisation")
print(-np.mean(lasso_scores), np.std(lasso_scores))

plt.gcf().set_size_inches(10,10)

## Checkpoint
How did that work? Was it what you were expecting to see?
Maybe it's time to investigate what effect changing *alpha* will have...

PS: If you are getting convergence warnings, you may need to increase the Lasso object's tolerance (`tol`) to some larger value, or raise its iteration limit (`max_iter`).
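
As a rough illustration of what the tolerance does, the sketch below fits the same `Lasso` twice on a small made-up dataset and compares how many coordinate-descent passes each fit used (`n_iter_`). The data and the names `X_toy`, `y_toy`, `tight` and `loose` are assumptions for this example only; they are not the notebook's data or the answer to the exercise.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical toy data (not the notebook's X and y):
# 40 samples, 8 columns that are all noisy copies of one signal.
rng = np.random.default_rng(0)
base = rng.normal(size=(40, 1))
X_toy = base + 0.01 * rng.normal(size=(40, 8))
y_toy = X_toy[:, 0] + 0.1 * rng.normal(size=40)

# Same alpha, two different tolerances.
tight = Lasso(alpha=1e-4, tol=1e-8, max_iter=100_000).fit(X_toy, y_toy)
loose = Lasso(alpha=1e-4, tol=1e-2, max_iter=100_000).fit(X_toy, y_toy)

print("passes with tol=1e-8:", tight.n_iter_)
print("passes with tol=1e-2:", loose.n_iter_)
```

A larger `tol` lets the solver stop earlier, which silences the warning at the cost of a less precise solution; a larger `max_iter` instead gives it more time to reach the stricter target.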

In [None]:
alphas = ... # implement a sequence over a log or linear space - you will probably need quite small values!
ax = plt.figure().gca()
ax.scatter(X, y)
for a in alphas:
    steps = [
        ("poly",poly15),
        ("scale",scaler15),
        (...) # put your lasso regressor here - don't forget to set its "alpha=" keyword!
    ]
    new_lasso = Pipeline( steps )
    new_lasso.fit(X,y)
    ax.plot(X_pure, new_lasso.predict(X_pure), label=a) # predict with the lasso pipeline fitted above
plt.ylim(-1.5,1.5)
plt.legend(title="alpha")
plt.gcf().set_size_inches(10,10)
plt.title('Lasso (15): Fit to datapoints under increasing regularisation')
plt.axis('tight')
plt.show()

In [None]:
coefs = []
for a in alphas: # you can use the alphas you defined earlier
    steps = [
        ("poly",poly15),
        ("scale",scaler15),
        (...) # lasso regressor here
    ]
    new_lasso = Pipeline( steps )
    new_lasso.fit(X,y)
    coefs.append(new_lasso.named_steps["reg"].coef_)

plt.gcf().set_size_inches(10,10)
ax = plt.gca()

ax.plot(alphas, coefs)
ax.set_xscale('log')
plt.xlabel('alpha')
plt.ylabel('weights')
plt.title('Lasso coefficients as a function of the regularisation param')
plt.axis('tight')
plt.show()

Can you see how coefficients are being dropped to zero as you increase alpha?  This removes the corresponding features from the model, making it *sparse*.
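
One way to see why the weights hit exactly zero rather than just getting small: in the special case of orthonormal features, the lasso solution is the unpenalised least-squares weights passed through a *soft-thresholding* operator, with a threshold that grows with alpha. The sketch below shows that operator on its own; the function name `soft_threshold` and the example weights are assumptions made up for illustration, not values from this notebook.

```python
import numpy as np

def soft_threshold(z, threshold):
    """Shrink each entry of z towards zero by `threshold`;
    entries whose magnitude is below the threshold become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - threshold, 0.0)

# Hypothetical unpenalised weights, for illustration only.
ols_weights = np.array([2.0, -0.5, 0.1, -1.5])

for threshold in (0.0, 0.2, 1.0):
    shrunk = soft_threshold(ols_weights, threshold)
    print(threshold, shrunk, "non-zero:", np.count_nonzero(shrunk))
```

Every weight is pulled towards zero by the same amount, so the smallest ones are eliminated first as the threshold rises. This matches the order in which the curves in the plot above flatten out.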

Have a look at some more characteristics of the lasso regressor below, using validation and learning curves. You don't need to fill in any of this code, but you might want to take the best *alpha* value from the validation curve, apply it to your code above, and see whether it improves the fit and the cross-validation scores.

In [None]:
from sklearn.model_selection import validation_curve
def plot_validation_curve(estimator, title, X, y, ylim=None, cv=None,
                        n_jobs=None, param_name="C", param_range = np.logspace(-3, 5, 10)):

    plt.figure()
    plt.title(title)
    if ylim is not None:
        plt.ylim(*ylim)
    
    train_scores, test_scores = validation_curve(
    estimator, X, y, param_name=param_name, scoring="neg_mean_squared_error", param_range=param_range,
    cv=cv, n_jobs=n_jobs)

    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
    test_scores_std = np.std(test_scores, axis=1)
            
    plt.grid()

    plt.xlabel(param_name)
    plt.ylabel("Score")
    lw = 2
    plt.semilogx(param_range, train_scores_mean, label="Training score",
                 color="darkorange", lw=lw)
    plt.fill_between(param_range, train_scores_mean - train_scores_std,
                     train_scores_mean + train_scores_std, alpha=0.2,
                     color="darkorange", lw=lw)
    plt.semilogx(param_range, test_scores_mean, label="Cross-validation score",
                 color="navy", lw=lw)
    plt.fill_between(param_range, test_scores_mean - test_scores_std,
                     test_scores_mean + test_scores_std, alpha=0.2,
                     color="navy", lw=lw)

    plt.gcf().set_size_inches(10,10)
    plt.legend(loc="best")
    return plt


In [None]:
X15=scaler15.transform(poly15.transform(X)) # here we have to transform the X values ourselves, without relying on a pipeline
plot_validation_curve(Lasso(tol=0.05), "Lasso15", X15, y, (-1.5,0.5), cv=10, n_jobs=-1, param_name="alpha", param_range=np.logspace(-4, 1, 30))


In [None]:
from sklearn.model_selection import learning_curve
def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None,
                        n_jobs=None, train_sizes=np.linspace(.1, 1.0, 5)):
    plt.figure()
    plt.title(title)
    if ylim is not None:
        plt.ylim(*ylim)
    plt.xlabel("Training examples")
    plt.ylabel("Score")
    train_sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, cv=cv, n_jobs=n_jobs, scoring="neg_mean_squared_error", train_sizes=train_sizes)
    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
#     print(test_scores_mean)
    test_scores_std = np.std(test_scores, axis=1)
    plt.grid()

    plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
                     train_scores_mean + train_scores_std, alpha=0.1,
                     color="r")
    plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
                     test_scores_mean + test_scores_std, alpha=0.1, color="g")
    plt.plot(train_sizes, train_scores_mean, 'o-', color="r", alpha=0.5,
             label="Training score")
    plt.plot(train_sizes, test_scores_mean, 'o-', color="g", alpha=0.5,
             label="Cross-validation score")

    plt.legend(loc="best")
    plt.gcf().set_size_inches(10,10)
    return plt

In [None]:
from sklearn.model_selection import ShuffleSplit
cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)

# train_sizes = np.linspace(1,15,15).astype(int)
train_sizes = np.linspace(0.1, 1.0, 20)
print("Using following proportions of data:",train_sizes)
plot_learning_curve(reg15, "Overfit (15)", X, y, (-.2e8,.5e7), cv=cv, n_jobs=4, train_sizes=train_sizes)
plot_learning_curve(lasso15, "Lasso (15)", X, y, (-4,1), cv=cv, n_jobs=4, train_sizes=train_sizes)

## End note
It seems like the lasso regressor struggles a bit with this data. You need to set *alpha* very small; otherwise too many coefficients are forced to zero and the model underfits. With such a small *alpha* the solver has trouble converging, so we also need to increase the estimator's tolerance (`tol`).

Can you think why this might be?  Have a look at the correlation matrix (below) for clues:

In [None]:
np.corrcoef(X15[:,1:].T) # drop the first feature (the constant bias column, zero after scaling) and transpose (one feature per row)
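
As a smaller, separate illustration of what to look for in that matrix, the sketch below builds the first five powers of an evenly spaced input and prints their correlations. The names `x_demo` and `powers` and the choice of 30 evenly spaced points are assumptions for this example; the notebook's own `X15` is scaled and goes up to degree 15.

```python
import numpy as np

# Hypothetical stand-in for the notebook's inputs:
# 30 evenly spaced points on [0, 2*pi].
x_demo = np.linspace(0, 2 * np.pi, 30)

# Columns are x, x^2, ..., x^5 (no constant column,
# because the correlation of a constant is undefined).
powers = np.column_stack([x_demo ** d for d in range(1, 6)])

corr = np.corrcoef(powers.T)  # one feature per row, as in the cell above
np.set_printoptions(precision=3, suppress=True)
print(corr)
```

Standardising each column does not change these values, since correlation is unaffected by shifting and rescaling, so the `StandardScaler` step in the pipeline does not remove this structure.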