# Lasso Regression

Lasso is an alternative to Ridge for regularizing linear regression. Like ridge regression, the lasso restricts coefficients to be close to zero, but it does so in a slightly different way, called L1 regularization: it penalizes the sum of the absolute values of the coefficients. As a consequence, some coefficients become exactly zero, which means the model ignores those features entirely. This can be seen as a form of automatic feature selection. Having some coefficients be exactly zero often makes a model easier to interpret and can reveal which features matter most to the model.
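
The difference between the two penalties is easiest to see by counting coefficients that are exactly zero. The following is a minimal sketch on synthetic data, not on the dataset used below; the data from `make_regression`, the variable names, and the choice of `alpha=1.0` for both models are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (an assumption for illustration): 20 features, only 3 informative.
X_demo, y_demo = make_regression(
    n_samples=100, n_features=20, n_informative=3, noise=1.0, random_state=0
)

# Fit both models with the same penalty strength.
ridge_demo = Ridge(alpha=1.0).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=1.0).fit(X_demo, y_demo)

# Count coefficients that are exactly zero under each penalty.
n_zero_ridge = int(np.sum(ridge_demo.coef_ == 0))
n_zero_lasso = int(np.sum(lasso_demo.coef_ == 0))
print("Ridge coefficients equal to zero:", n_zero_ridge)
print("Lasso coefficients equal to zero:", n_zero_lasso)
```

Ridge typically shrinks the uninformative coefficients to small nonzero values, whereas the lasso tends to set many of them to exactly zero.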

# Boston Housing Dataset

In [3]:
from sklearn.linear_model import Lasso
import mglearn
import numpy as np
from sklearn.model_selection import train_test_split

# Load the Boston housing data with derived interaction features, then split it
X, y = mglearn.datasets.load_extended_boston()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [4]:
lasso = Lasso().fit(X_train, y_train)
print("Training set score: {:.2f}".format(lasso.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lasso.score(X_test, y_test)))
print("Number of features used: {}".format(np.sum(lasso.coef_ != 0)))

Training set score: 0.29
Test set score: 0.21
Number of features used: 4


As you can see, Lasso does quite badly on both the training and the test set. This indicates that we are underfitting, and we find that it used only 4 of the 104 features. Similarly to Ridge, Lasso has a regularization parameter, alpha, that controls how strongly coefficients are pushed toward zero; the run above used the default, alpha=1.0. To reduce underfitting, let's try decreasing alpha. When we do this, we also need to increase max_iter (the maximum number of iterations to run) above its default.

## alpha = 0.01

In [5]:
lasso001 = Lasso(alpha=0.01, max_iter=100000).fit(X_train, y_train)
print("Training set score : {:.2f}".format(lasso001.score(X_train, y_train)))
print("Test set score : {:.2f}".format(lasso001.score(X_test, y_test)))
print("Number of features used : {}".format(np.sum(lasso001.coef_ != 0)))

Training set score : 0.90
Test set score : 0.77
Number of features used : 33


## alpha = 0.0001

In [6]:
lasso00001 = Lasso(alpha=0.0001, max_iter=100000).fit(X_train, y_train)
print("Training set score : {:.2f}".format(lasso00001.score(X_train, y_train)))
print("Test set score : {:.2f}".format(lasso00001.score(X_test, y_test)))
print("Number of features used : {}".format(np.sum(lasso00001.coef_ != 0)))

Training set score : 0.95
Test set score : 0.64
Number of features used : 96
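
In the runs above, lowering alpha from 0.01 to 0.0001 raised the training score from 0.90 to 0.95 but dropped the test score from 0.77 to 0.64, which is the pattern one would expect from overfitting. The same trade-off can be explored by sweeping alpha in a loop. The sketch below does this on synthetic stand-in data so that it runs on its own; the data from `make_regression`, the grid of alpha values, and the variable names are assumptions for illustration, and the numbers it prints will not match the Boston results.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the Boston set used above).
X_syn, y_syn = make_regression(
    n_samples=200, n_features=50, n_informative=5, noise=10.0, random_state=0
)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(X_syn, y_syn, random_state=0)

alphas = [0.0001, 0.01, 1.0, 100.0]  # illustrative grid, weakest to strongest penalty
n_used = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=100000).fit(Xs_train, ys_train)
    # Record how many features keep a nonzero coefficient at this alpha.
    n_used.append(int(np.sum(model.coef_ != 0)))
    print("alpha={:<8} train R^2: {:.2f}  test R^2: {:.2f}  features used: {}".format(
        alpha,
        model.score(Xs_train, ys_train),
        model.score(Xs_test, ys_test),
        n_used[-1],
    ))
```

Comparing the training and test scores row by row gives a rough picture of where the model moves from underfitting (large alpha, few features) to overfitting (tiny alpha, nearly all features).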


# Conclusion

In practice, ridge regression is usually the first choice between these two models. However, if you have a large number of features and expect only a few of them to be important, Lasso might be a better choice. Similarly, if you would like a model that is easy to interpret, Lasso will provide one that is easier to understand, as it selects only a subset of the input features. scikit-learn also provides the ElasticNet class, which combines the penalties of Lasso and Ridge. In practice, this combination often works best, though at the price of having two parameters to adjust: one for L1 regularization and one for L2 regularization.
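
As a rough sketch of what using ElasticNet looks like: in scikit-learn the two knobs are exposed as `alpha` (overall penalty strength) and `l1_ratio` (the mix between the L1 and L2 penalties). The synthetic data, the variable names, and the values `alpha=0.1` and `l1_ratio=0.5` below are assumptions chosen only for illustration, not tuned settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): 50 features, 5 informative.
X_en, y_en = make_regression(
    n_samples=200, n_features=50, n_informative=5, noise=10.0, random_state=0
)
Xe_train, Xe_test, ye_train, ye_test = train_test_split(X_en, y_en, random_state=0)

# alpha sets the overall penalty strength; l1_ratio sets the L1/L2 mix
# (l1_ratio=1.0 is a pure L1 penalty, i.e. the lasso).
enet = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=100000).fit(Xe_train, ye_train)

enet_train_score = enet.score(Xe_train, ye_train)
enet_test_score = enet.score(Xe_test, ye_test)
print("Training set score: {:.2f}".format(enet_train_score))
print("Test set score: {:.2f}".format(enet_test_score))
print("Number of features used: {}".format(np.sum(enet.coef_ != 0)))
```

Both parameters would normally be chosen together, for example by evaluating a small grid of `alpha` and `l1_ratio` values on held-out data.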