# Lasso Regression - L1 Regularization

This notebook demonstrates Lasso regression on the UCI Energy Efficiency dataset.

Dataset source: https://archive.ics.uci.edu/ml/machine-learning-databases/00242/
Metadata: https://archive.ics.uci.edu/ml/datasets/Energy+efficiency

Both Y1 (heating load) and Y2 (cooling load) can be used as target variables.
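
Because there are two targets, a model can be fit to both columns at once or to one column at a time. A minimal sketch of the difference, using a small made-up frame in place of the real spreadsheet (the column names follow the dataset's metadata; the values and the names `demo`, `features`, `both_targets`, and `heating_only` are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the ENB2012 spreadsheet; the values are made up.
demo = pd.DataFrame({
    "X1": [0.98, 0.90, 0.86, 0.82],
    "X2": [514.5, 563.5, 588.0, 612.5],
    "Y1": [15.55, 20.84, 19.50, 17.05],
    "Y2": [21.33, 28.28, 27.30, 23.77],
})

features = demo.filter(regex=r"^X\d+$")  # feature columns only
both_targets = demo[["Y1", "Y2"]]        # 2-D target: one output per column
heating_only = demo["Y1"]                # 1-D target: a single output

print(features.shape, both_targets.shape, heating_only.shape)
```

With a 2-D target, scikit-learn's `Lasso` returns one row of coefficients and one intercept per target column; with a 1-D target it returns a single coefficient vector.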

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

In [2]:
enb = pd.read_excel('ENB2012_data.xlsx')

In [3]:
# Select the feature (X*) and target (Y*) columns by regex
X = enb.filter(regex="X.*")
y = enb.filter(regex="Y.*")

In [4]:
scaler = MinMaxScaler()

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X,y)

In [6]:
# Fit the scaler on the training split only, then reuse it for the test split
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [20]:
lasso = Lasso(alpha = 100)

In [21]:
# Note: this fits on the unscaled X_train, while the scoring cell below uses the scaled features
lasso.fit(X_train, y_train)

Lasso(alpha=100, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [22]:
print('lasso regression linear model intercept: {}'
     .format(lasso.intercept_))
print('lasso regression linear model coeff:\n{}'
     .format(lasso.coef_))

lasso regression linear model intercept: [41.92125544 46.43899977]
lasso regression linear model coeff:
[[ 0.         -0.06241453  0.07730316 -0.01332252  0.          0.
   0.          0.        ]
 [ 0.         -0.06002131  0.06393122 -0.01101798  0.          0.
   0.          0.        ]]


In [23]:
print('R-squared score (training): {:.3f}'
     .format(lasso.score(X_train_scaled, y_train)))
print('R-squared score (test): {:.3f}'
     .format(lasso.score(X_test_scaled, y_test)))

R-squared score (training): -4.532
R-squared score (test): -4.332
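
One plausible reading of these negative scores is that the model was fit on unscaled features but scored on scaled ones, so the learned coefficients are applied to inputs in a different range. A very large `alpha` can also hurt the fit. The sketch below shows a consistent workflow on synthetic data; the generated data, `alpha=0.1`, `random_state=0`, and all `demo`-style names are assumptions for illustration, not settings or results from this notebook:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic regression problem: 8 features, only 3 of them informative.
rng = np.random.RandomState(0)
X_demo = rng.uniform(0, 100, size=(200, 8))
true_coef = np.array([0.5, -0.3, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0])
y_demo = X_demo @ true_coef + rng.normal(scale=1.0, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# Learn the scaling from the training split, then apply it to both splits.
demo_scaler = MinMaxScaler()
X_tr_s = demo_scaler.fit_transform(X_tr)
X_te_s = demo_scaler.transform(X_te)

# Fit and score on the same (scaled) representation.
demo_lasso = Lasso(alpha=0.1).fit(X_tr_s, y_tr)
r2_train = demo_lasso.score(X_tr_s, y_tr)
r2_test = demo_lasso.score(X_te_s, y_te)

# Scoring the same model on unscaled inputs reproduces the mismatch.
r2_mismatch = demo_lasso.score(X_te, y_te)

print('R-squared (train, scaled): {:.3f}'.format(r2_train))
print('R-squared (test, scaled): {:.3f}'.format(r2_test))
print('R-squared (test, unscaled inputs): {:.3f}'.format(r2_mismatch))
```

The key design choice is that the scaler and the model both see the training split first, and every later call (`transform`, `score`, `predict`) uses the same representation the model was fit on.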


In [25]:
print('Number of non-zero coefficients (across both targets): {}'
     .format(np.sum(lasso.coef_ != 0)))

Number of non-zero coefficients (across both targets): 6
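
The count above pools the coefficients of both targets. How many coefficients survive depends on `alpha`: a larger L1 penalty pushes more of them to exactly zero. A rough sketch of that effect on synthetic data (the data, the list of `alpha` values, and the name `nonzero_counts` are assumptions for illustration, not taken from this notebook):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data with features already in [0, 1]; 3 of 8 are informative.
rng = np.random.RandomState(0)
X_demo = rng.uniform(0, 1, size=(200, 8))
true_coef = np.array([5.0, -3.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.1, size=200)

# Refit with increasing penalties and count the surviving coefficients.
nonzero_counts = {}
for alpha in [0.001, 0.01, 0.1, 1.0]:
    model = Lasso(alpha=alpha).fit(X_demo, y_demo)
    nonzero_counts[alpha] = int(np.sum(model.coef_ != 0))
    print('alpha={:<6} non-zero coefficients: {}'
          .format(alpha, nonzero_counts[alpha]))
```

For features scaled to [0, 1], a penalty as large as the `alpha = 100` used above would be expected to zero out every coefficient, so a sweep like this is one way to pick a more useful value.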
