- We know Lasso can shrink coefficients all the way to zero, but we haven't yet looked closely at how or why that happens.
- This becomes clearer once we learn about Elastic Net, which combines the Lasso and Ridge penalties!


<img src='rreg14.png' width=700>

- We are minimizing the residual sum of squares (RSS) subject to a constraint, and that constraint is the penalty term: it must stay within some budget s. The appropriate value of s depends on the set of features you're working with.
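Written out in the usual textbook notation (assumed here; the slide may label things differently), with $n$ observations, $p$ features and budget $s$, the two constrained problems are:

$$
\text{Lasso:}\quad \min_{\beta_0,\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 \quad\text{subject to}\quad \sum_{j=1}^{p}|\beta_j|\le s
$$

$$
\text{Ridge:}\quad \min_{\beta_0,\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 \quad\text{subject to}\quad \sum_{j=1}^{p}\beta_j^2\le s
$$

A smaller $s$ is a tighter budget and corresponds to a larger penalty weight $\lambda$ in the penalized form. With $p = 2$, the Lasso region $|\beta_1| + |\beta_2| \le s$ is a diamond and the Ridge region $\beta_1^2 + \beta_2^2 \le s$ is a disk.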

---

So let's start with a simple thought experiment.
<br>We're going to work with a very simple set of features: there are only two of them.

<img src='rreg15.png' width=700>

---

<img src='rreg16.png' width=700>

---

<img src='rreg17.png' width=700>

---

<img src='rreg18.png' width=700>

---

<img src='rreg19.png' width=700>

---

<img src='rreg20.png' width=700>

---

<img src='rreg21.png' width=700>

---

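Geometrically, as you minimize the residual sum of squares subject to the Lasso constraint, you are looking for the point where the RSS contours first touch the constraint region. That region has sharp corners that sit on the axes (in two dimensions it is a square rotated 45°, i.e. a diamond; in higher dimensions it is a cross-polytope), so the contours are highly likely to touch it at a corner. Every corner lies on an axis, so at a corner all coefficients except one are exactly zero; in two dimensions, that means one of the two coefficients is zero.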
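The same effect has a well-known closed form in the special case of an orthonormal design ($X^TX = I$, an assumption made only for this sketch): Lasso soft-thresholds each least-squares estimate, so small ones land exactly on zero, while Ridge only rescales them. The function names below are hypothetical:

```python
import numpy as np

# Assumption for this sketch: orthonormal design (X^T X = I), so each
# coefficient decouples and b = X^T y holds the least-squares estimates.

def lasso_orthonormal(b, lam):
    """Minimizer of 0.5*||y - X beta||^2 + lam*||beta||_1: soft-thresholding."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

def ridge_orthonormal(b, lam):
    """Minimizer of 0.5*||y - X beta||^2 + 0.5*lam*||beta||^2: uniform rescaling."""
    return b / (1.0 + lam)

b = np.array([3.0, 0.8, -0.5, -2.0])          # hypothetical least-squares estimates
lasso_coefs = lasso_orthonormal(b, lam=1.0)   # estimates smaller than lam become exactly zero
ridge_coefs = ridge_orthonormal(b, lam=1.0)   # every estimate is halved, none reaches zero

print("Lasso zeros:", np.sum(lasso_coefs == 0))  # 2
print("Ridge zeros:", np.sum(ridge_coefs == 0))  # 0
```

Soft-thresholding is the algebraic counterpart of hitting a corner of the diamond; proportional shrinkage is the counterpart of touching the smooth disk.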

<img src='rreg22.png' width=700>

---

<img src='rreg23.png' width=700>

---

<img src='rreg24.png' width=700>

---

We can see here that if alpha were equal to zero, we would only be considering the squared coefficients (the Ridge penalty), and if alpha were equal to one, we would only be considering their absolute values (the Lasso penalty).
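A minimal sketch of the combined penalty. It assumes the form $\lambda\left[\alpha\sum_j|\beta_j| + \frac{1-\alpha}{2}\sum_j\beta_j^2\right]$, which is how scikit-learn parameterizes `ElasticNet` (its `alpha` plays the role of λ and its `l1_ratio` plays the role of α); the slide may scale the terms slightly differently. The function name `elastic_net_penalty` is hypothetical:

```python
import numpy as np

def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * L1 norm + (1 - alpha) / 2 * squared L2 norm).

    Assumed form (scikit-learn's ElasticNet parameterization); the slide's
    constant factors may differ.
    """
    beta = np.asarray(beta, dtype=float)
    l1 = np.sum(np.abs(beta))   # Lasso part: sum of absolute values
    l2 = np.sum(beta ** 2)      # Ridge part: sum of squares
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)

beta = np.array([2.0, -1.0, 0.5])
print(elastic_net_penalty(beta, lam=1.0, alpha=0.0))  # pure Ridge: 0.5 * 5.25 = 2.625
print(elastic_net_penalty(beta, lam=1.0, alpha=1.0))  # pure Lasso: 3.5
print(elastic_net_penalty(beta, lam=1.0, alpha=0.1))  # 10% L1, 90% L2 mix
```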

---

<img src='rreg25.png' width=700>

---

In [3]:
import numpy as np
import pandas as pd

In [4]:
df = pd.read_csv('Advertising.csv')

In [5]:
df.head()

|   | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |


In [6]:
X = df.drop('sales', axis=1)
y = df['sales']

In [7]:
from sklearn.preprocessing import PolynomialFeatures

In [8]:
poly_converter = PolynomialFeatures(degree=3, include_bias=False)

In [9]:
poly_features = poly_converter.fit_transform(X)

In [10]:
poly_features

array([[2.30100000e+02, 3.78000000e+01, 6.92000000e+01, ...,
        9.88757280e+04, 1.81010592e+05, 3.31373888e+05],
       [4.45000000e+01, 3.93000000e+01, 4.51000000e+01, ...,
        6.96564990e+04, 7.99365930e+04, 9.17338510e+04],
       [1.72000000e+01, 4.59000000e+01, 6.93000000e+01, ...,
        1.46001933e+05, 2.20434291e+05, 3.32812557e+05],
       ...,
       [1.77000000e+02, 9.30000000e+00, 6.40000000e+00, ...,
        5.53536000e+02, 3.80928000e+02, 2.62144000e+02],
       [2.83600000e+02, 4.20000000e+01, 6.62000000e+01, ...,
        1.16776800e+05, 1.84062480e+05, 2.90117528e+05],
       [2.32100000e+02, 8.60000000e+00, 8.70000000e+00, ...,
        6.43452000e+02, 6.50934000e+02, 6.58503000e+02]])

In [11]:
poly_features.shape

(200, 19)

In [12]:
from sklearn.model_selection import train_test_split

In [13]:
X_train, X_test, y_train, y_test = train_test_split(poly_features, y, test_size=0.3, random_state=101)

In [14]:
from sklearn.preprocessing import StandardScaler

In [15]:
scaler = StandardScaler()

In [16]:
scaler.fit(X_train)

StandardScaler()

In [17]:
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
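The expansion, train-only scaling and model fit can also be bundled into a scikit-learn `Pipeline`, which guarantees the scaler is only ever fit on training data (including inside each cross-validation fold). A minimal sketch, using synthetic data in place of `Advertising.csv` (the data generation is purely illustrative):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the three advertising columns (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))
y = 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

# Expansion -> scaling -> CV-tuned elastic net, fit as a single estimator.
# Raise max_iter (as the notebook does) if convergence warnings appear.
model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    StandardScaler(),
    ElasticNetCV(l1_ratio=[.1, .5, .7, .9, .95, .99, 1], max_iter=10000),
)
model.fit(X_train, y_train)
test_predictions = model.predict(X_test)
```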

---

In [18]:
from sklearn.linear_model import ElasticNetCV

In [19]:
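elastic_model = ElasticNetCV(l1_ratio=[.1, .5, .7, .9, .95, .99, 1],  # candidate L1/L2 mixes
                             eps=0.001,         # smallest alpha = eps * largest alpha
                             n_alphas=100,      # number of alphas tried per l1_ratio
                             max_iter=1000000)  # generous iteration cap for convergence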

An __l1_ratio__ of 0.1 means the penalty is 10% Lasso (L1) and 90% Ridge (L2); an __l1_ratio__ of 1 is pure Lasso.

In [21]:
elastic_model.fit(X_train, y_train)

ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1], max_iter=1000000)

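__n_alphas__ is not the alpha from the equation in the slides. scikit-learn's `alpha` is the lambda outside the brackets (the overall penalty strength), and __n_alphas__ is how many values of it are tried for each l1_ratio.
<br>__l1_ratio__ is the alpha from the equation (the mix between the L1 and L2 penalties).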
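To make `n_alphas` and `eps` concrete: for each `l1_ratio`, the candidate `alpha` values are spread on a log scale from a largest value, `alpha_max`, down to `eps * alpha_max`. `alpha_max` is the smallest `alpha` at which every coefficient is zero, and for centered data it has a closed form. The helper `alpha_grid` below is a hypothetical re-implementation on synthetic data for illustration; scikit-learn's internal grid may differ in small details:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def alpha_grid(X, y, l1_ratio, eps=1e-3, n_alphas=100):
    """Log-spaced alpha candidates from alpha_max down to eps * alpha_max."""
    Xc = X - X.mean(axis=0)   # fitting an intercept is equivalent to centering X and y
    yc = y - y.mean()
    n_samples = X.shape[0]
    # Smallest alpha for which the all-zero coefficient vector is optimal
    alpha_max = np.max(np.abs(Xc.T @ yc)) / (n_samples * l1_ratio)
    return np.geomspace(alpha_max, alpha_max * eps, num=n_alphas)

rng = np.random.default_rng(0)
X = rng.normal(size=(140, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(size=140)

alphas = alpha_grid(X, y, l1_ratio=0.5)
print(len(alphas), alphas[0] / alphas[-1])  # 100 candidates spanning a factor of 1/eps

# Just above alpha_max every coefficient is zero; well below it, some are not
at_max = ElasticNet(alpha=alphas[0] * 1.001, l1_ratio=0.5).fit(X, y)
below = ElasticNet(alpha=alphas[0] * 0.5, l1_ratio=0.5).fit(X, y)
print(np.count_nonzero(at_max.coef_), np.count_nonzero(below.coef_))
```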

In [22]:
elastic_model.l1_ratio # what we've tried

[0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1]

In [25]:
elastic_model.l1_ratio_ # best performing l1_ratio

1.0

The best l1_ratio is 1.0, so cross-validation picked a 100% L1 (pure Lasso) model.
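Since `l1_ratio=1` removes the L2 term entirely, an `ElasticNet` with `l1_ratio=1` should produce the same coefficients as `Lasso` at the same `alpha`. A quick check on synthetic data (illustrative only, not the advertising data), which also counts how many coefficients were set exactly to zero:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(140, 10))
# Only the first three features actually influence y
y = X[:, :3] @ np.array([4.0, -3.0, 2.0]) + rng.normal(scale=0.5, size=140)

enet = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print(np.allclose(enet.coef_, lasso.coef_))  # True: same objective
print(np.sum(enet.coef_ == 0), "of", enet.coef_.size, "coefficients are exactly zero")
```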

In [27]:
elastic_model.alpha_

0.004943070909225827

In [28]:
test_predictions = elastic_model.predict(X_test)

In [29]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [30]:
MAE = mean_absolute_error(y_test, test_predictions)

In [31]:
MAE

0.43350346185900673

In [32]:
RMSE = np.sqrt(mean_squared_error(y_test, test_predictions))

In [33]:
RMSE

0.6063140748984039
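Both metrics are simple averages over the test residuals. As a sketch, computing them directly with NumPy on made-up predictions (the true values are the first five sales figures from `df.head()`; the predictions are invented) gives the same values as the scikit-learn functions used above:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([22.1, 10.4, 9.3, 18.5, 12.9])
y_pred = np.array([21.5, 10.9, 9.0, 19.2, 12.4])  # made-up predictions

residuals = y_true - y_pred
mae = np.mean(np.abs(residuals))          # average absolute error
rmse = np.sqrt(np.mean(residuals ** 2))   # square root of the average squared error

print(np.isclose(mae, mean_absolute_error(y_true, y_pred)))           # True
print(np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred))))  # True
```

RMSE can never be smaller than MAE; it rises further above MAE when a few errors are much larger than the rest, which is why it is useful to report both (0.61 vs 0.43 above).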