### Required Assignment 9.4: LASSO and Sequential Feature Selection


**Expected Time: 60 Minutes**

**Total Points: 50**

This assignment compares a second regularized regression method -- the LASSO -- with sequential feature selection.  The LASSO will be briefly discussed below, and you will use the scikit-learn implementation.  Rather than only using the LASSO as a model, you are to compare it to the `SequentialFeatureSelector` transformer as a method to select important features for a regression model. 
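
For reference, the objective that scikit-learn's `Lasso` estimator minimizes can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the course videos: $n$ is the number of samples, $X$ the feature matrix, $y$ the target, $w$ the coefficient vector, and $\alpha$ the regularization strength. The intercept is fit separately and is not penalized.

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

The $\ell_1$ penalty is what allows individual coefficients to be driven exactly to zero, which is why the LASSO can double as a feature selector.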


#### Index

- [Problem 1](#Problem-1)
- [Problem 2](#Problem-2)
- [Problem 3](#Problem-3)
- [Problem 4](#Problem-4)
- [Problem 5](#Problem-5)

In [1]:
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SequentialFeatureSelector, SelectFromModel
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn import set_config
set_config(display="diagram")


import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px

### The Data

For this exercise, you will revisit the automotive data.  The goal is again to predict the `mpg` column using the other numeric features.  You will build a polynomial model of degree 3 and compare the results of a `Lasso` model with those of a `LinearRegression` model. Finally, you will use the `Lasso` estimator to select features in a pipeline with `SelectFromModel`. 

Below, the train and test data are created for you as `auto_X_train`, `auto_X_test`, `auto_y_train`, and `auto_y_test`.

In [2]:
auto = pd.read_csv('data/auto.csv')

In [3]:
auto.head()

| | mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | 1 | ford torino |


In [4]:
#generate train/test data for auto
auto_X = auto.drop(['mpg', 'name'], axis = 1)
auto_y = auto['mpg']
auto_X_train, auto_X_test, auto_y_train, auto_y_test = train_test_split(auto_X, auto_y, 
                                                                       test_size = 0.3,
                                                                       random_state = 42)

[Back to top](#Index:) 

### Problem 1

#### The auto data

**10 Points**

To start, build a `Pipeline` named `auto_pipe` with named steps `polyfeatures`, `scaler`, and `lasso` that use `PolynomialFeatures`, `StandardScaler`, and the `Lasso` estimator, respectively, with the following parameters:

- `degree = 3` in `PolynomialFeatures`
- `include_bias = False` in `PolynomialFeatures`
- `random_state = 42` in `Lasso`

Fit the pipeline on `auto_X_train` and `auto_y_train` data given.  Extract the lasso coefficients from the pipeline and assign them as an array to `lasso_coefs` below.  

**HINT**: Use `.named_steps['lasso']` to extract the fitted `Lasso` estimator from the pipeline, then use its `.coef_` attribute to access the model coefficients.
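
As a sanity check on the size of the coefficient array, the number of columns produced by the polynomial expansion can be counted by hand. The sketch below assumes the 7 input columns left after dropping `mpg` and `name` (`cylinders`, `displacement`, `horsepower`, `weight`, `acceleration`, `year`, `origin`); the helper `n_poly_features` is a hypothetical name introduced here, not part of scikit-learn.

```python
from math import comb


def n_poly_features(n_inputs: int, degree: int, include_bias: bool = False) -> int:
    """Count the monomials of total degree <= `degree` in `n_inputs` variables."""
    total = comb(n_inputs + degree, degree)  # this count includes the constant term
    return total if include_bias else total - 1


# 7 inputs expanded to degree 3 without the bias column
n_features = n_poly_features(n_inputs=7, degree=3)
print(n_features)  # 119
```

Under that assumption, `lasso_coefs` should hold 119 entries, one per expanded feature.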

In [7]:
### GRADED

auto_pipe = Pipeline([
    ("polyfeatures", PolynomialFeatures(degree=3, include_bias=False)),
    ("scaler", StandardScaler()),
    ("lasso", Lasso(random_state=42)),
])
auto_pipe.fit(auto_X_train, auto_y_train)
lasso_coefs = auto_pipe.named_steps['lasso'].coef_

# YOUR CODE HERE
#raise NotImplementedError()

# Answer check
print(type(lasso_coefs))
print(lasso_coefs)
auto_pipe

<class 'numpy.ndarray'>
[-0.         -0.         -0.         -3.06660503  0.          0.
  0.         -0.         -0.         -0.         -0.         -0.
 -0.          0.         -0.         -0.         -0.         -0.0880862
 -0.         -0.         -0.         -0.         -1.42250731 -0.
 -0.         -0.         -0.         -0.          0.          0.
  0.          0.          0.          0.          0.         -0.
 -0.         -0.         -0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.         -0.          0.
 -0.          0.          0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.         -0.          0.
 -0.         -0.         -0.         -0.         -0.       

[Back to top](#Index:) 

### Problem 2

#### Error in `Lasso` model

**10 Points**

Now, compute the mean squared error of the LASSO model on both the train and test data, `auto_X_train` and `auto_X_test`, respectively.  Assign the results as floats to `lasso_train_mse` and `lasso_test_mse`, respectively.  

In [8]:
### GRADED

# mean_squared_error expects (y_true, y_pred); MSE is symmetric, so the values are unchanged
lasso_train_mse = mean_squared_error(auto_y_train, auto_pipe.predict(auto_X_train))
lasso_test_mse = mean_squared_error(auto_y_test, auto_pipe.predict(auto_X_test))

# YOUR CODE HERE
#raise NotImplementedError()

# Answer check
print(lasso_train_mse)
print(lasso_test_mse)

11.860728888695974
8.984776169896323


[Back to top](#Index:) 

### Problem 3

#### Non-zero coefficients

**10 Points**

Using `lasso_coefs`, determine the number of features with non-zero coefficients and the names of those features as produced by the polynomial feature transformation.  

To do this, access the `named_steps['polyfeatures']` step of the `auto_pipe` pipeline and call its `get_feature_names_out()` method to get the feature names. Assign the result to `feature_names`.

Next, create a DataFrame named `lasso_df` below that has two columns -- `feature` and `coef`.  To the `feature` column assign `feature_names`. To the `coef` column assign `lasso_coefs`.

In [16]:
### GRADED
feature_names = auto_pipe.named_steps['polyfeatures'].get_feature_names_out()
lasso_df = pd.DataFrame({"feature":feature_names, "coef":lasso_coefs})


# YOUR CODE HERE
#raise NotImplementedError()

# Answer check
print(type(feature_names))
lasso_df.loc[lasso_df['coef'] != 0]

<class 'numpy.ndarray'>


| | feature | coef |
|---|---|---|
| 3 | weight | -3.066605 |
| 17 | displacement acceleration | -0.088086 |
| 22 | horsepower acceleration | -1.422507 |
| 111 | acceleration^2 origin | 0.689516 |
| 113 | acceleration year origin | 0.423159 |
| 115 | year^3 | 1.87257 |
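
One way to build intuition for why most coefficients are exactly zero rather than merely small: under the simplifying assumption of orthonormal features (which does not hold for this dataset, and ignoring scaling conventions), the LASSO solution is the soft-thresholded version of the unpenalized least-squares coefficients. The sketch below illustrates that operator on made-up numbers; `soft_threshold` and `ols_coefs` are hypothetical names and values, not outputs from the auto data.

```python
def soft_threshold(z: float, alpha: float) -> float:
    """Shrink z toward zero by alpha; return exactly 0.0 when |z| <= alpha."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0


# Hypothetical unpenalized coefficients, chosen only for illustration
ols_coefs = [-3.5, 0.5, -0.25, 2.0]
shrunk = [soft_threshold(z, alpha=1.0) for z in ols_coefs]
print(shrunk)  # [-2.5, 0.0, 0.0, 1.0]
```

Coefficients whose magnitude is below `alpha` are set to zero, while the larger ones survive but are shrunk, which mirrors the sparse pattern in the table above.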


[Back to top](#Index:) 

### Problem 4

#### Comparing `Lasso` to `SequentialFeatureSelector`

**10 Points**

As seen above, the Lasso model effectively eliminated all but 6 features from the cubic polynomial example.  Now, you are to build a `Pipeline` object called `sequential_pipe` with named steps `poly_features`, `selector`, and `linreg` with `PolynomialFeatures`, `SequentialFeatureSelector`, and `LinearRegression` respectively that uses the following parameters:

- `degree = 3` in `PolynomialFeatures` step `poly_features`
- `include_bias = False` in `PolynomialFeatures` step `poly_features`
- `n_features_to_select = 6` in `selector`, with a `LinearRegression` estimator

Assign this pipeline object to `sequential_pipe`.

Next, use the `fit` function on `sequential_pipe` to train your model on `auto_X_train` and `auto_y_train`. 

Use the `mean_squared_error` function to compute the MSE between `auto_y_train` and `sequential_pipe.predict(auto_X_train)`. Assign your result to `sequential_train_mse`.

Use the `mean_squared_error` function to compute the MSE between `auto_y_test` and `sequential_pipe.predict(auto_X_test)`. Assign your result to `sequential_test_mse`.
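
Before wiring the selector into a pipeline, it can help to see it on its own. The following is a minimal sketch on synthetic data, not the auto dataset; `X_demo`, `y_demo`, and the choice of 8 features with 3 informative ones are assumptions made purely for illustration. With its default settings, `SequentialFeatureSelector` adds features one at a time (forward selection), scoring each candidate by cross-validation.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic regression data (illustrative assumption): 8 features, 3 informative
X_demo, y_demo = make_regression(n_samples=200, n_features=8, n_informative=3,
                                 noise=5.0, random_state=0)

# Greedily add features until 3 have been selected
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3)
sfs.fit(X_demo, y_demo)

selected_mask = sfs.get_support()    # boolean mask over the 8 input columns
X_selected = sfs.transform(X_demo)   # keeps only the selected columns
print(selected_mask.sum(), X_selected.shape)
```

Unlike the LASSO, the number of retained features is fixed in advance by `n_features_to_select` rather than determined by a penalty strength.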



In [20]:
### GRADED
sequential_pipe = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3, include_bias=False)),
    ("selector", SequentialFeatureSelector(estimator=LinearRegression(), n_features_to_select=6)),
    ("linreg", LinearRegression()),
])
sequential_pipe.fit(auto_X_train, auto_y_train)
# mean_squared_error expects (y_true, y_pred)
sequential_train_mse = mean_squared_error(auto_y_train, sequential_pipe.predict(auto_X_train))
sequential_test_mse = mean_squared_error(auto_y_test, sequential_pipe.predict(auto_X_test))


# YOUR CODE HERE
#raise NotImplementedError()

# Answer check
print(sequential_train_mse)
print(sequential_test_mse)
sequential_pipe

7.673332944806711
7.145098433686857


[Back to top](#Index:) 

### Problem 5

#### Using `Lasso` as a feature selector

**10 Points**

Rather than using the `Lasso` as the estimator, you can use the results of the `Lasso` to select features that are subsequently used in a `LinearRegression` estimator.  To do so, scikit-learn provides a transformer in the `feature_selection` module called `SelectFromModel` that selects features based on the fitted model's coefficients.  

As such, using the `Lasso` estimator to select features would involve instantiating the `SelectFromModel` transformer and selecting features as:

```python
selector = SelectFromModel(Lasso())
# The selector must be fit on the features and target before it can transform
selector.fit_transform(auto_X_train, auto_y_train)
```



From here, the selector can be used in a `Pipeline` after transforming the features and before building a regression model.  Such a pipeline is given below and you are to use this to fit on the training data and score on the testing data.  Which model performs better, the model with sequential feature selection or that using the `Lasso` to select the features?  

Assign your train and test error using the `model_selector_pipe` as `selector_train_mse` and `selector_test_mse` below.

For more information and examples on `SelectFromModel` see [here](https://scikit-learn.org/stable/modules/feature_selection.html#select-from-model).
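
To see which columns `SelectFromModel` keeps, the fitted selector can be inspected directly. This is a minimal sketch on synthetic data rather than the auto dataset; `X_demo`, `y_demo`, and `alpha=1.0` are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic regression data (illustrative assumption): 10 features, 3 informative
X_demo, y_demo = make_regression(n_samples=200, n_features=10, n_informative=3,
                                 noise=5.0, random_state=0)

selector = SelectFromModel(Lasso(alpha=1.0))
X_kept = selector.fit_transform(X_demo, y_demo)  # fit the Lasso, then drop columns

kept_mask = selector.get_support()    # True for columns that were kept
fitted_lasso = selector.estimator_    # the Lasso fitted inside the selector
print(kept_mask.sum(), X_kept.shape)
print(fitted_lasso.coef_[kept_mask])  # coefficients of the surviving columns
```

The kept columns are those whose Lasso coefficient is large enough in magnitude; the downstream `LinearRegression` then refits on just those columns without the shrinkage penalty.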

In [22]:
model_selector_pipe = Pipeline([('poly_features', PolynomialFeatures(degree = 3, include_bias = False)),
                                ('scaler', StandardScaler()),
                                ('selector', SelectFromModel(Lasso())),
                                ('linreg', LinearRegression())])

In [23]:
### GRADED
model_selector_pipe.fit(auto_X_train, auto_y_train)
# mean_squared_error expects (y_true, y_pred)
selector_train_mse = mean_squared_error(auto_y_train, model_selector_pipe.predict(auto_X_train))
selector_test_mse = mean_squared_error(auto_y_test, model_selector_pipe.predict(auto_X_test))


# YOUR CODE HERE
#raise NotImplementedError()

# Answer check
print(selector_train_mse)
print(selector_test_mse)

9.93192512323042
8.899543694287043


Further work could involve grid searching parameters of both the transformers and estimators, as well as including a `Ridge` regressor in the mix.  For now, you should be getting comfortable using the scikit-learn `Pipeline` object to combine transformers and estimators. This module introduced techniques that can mitigate overfitting in a regression context.  It is important to note that no single strategy is always best for a modeling problem.  Instead, you should consider multiple approaches and let your goals for the model guide you in determining which model best suits your performance metric.
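
As one possible starting point for that further work, the sketch below grid searches the `alpha` parameter of a `Lasso` inside a pipeline. It uses synthetic data rather than the auto dataset, and the data, the degree, and the candidate `alpha` values are all assumptions chosen for illustration, not recommended settings.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic regression data (illustrative assumption)
X_demo, y_demo = make_regression(n_samples=150, n_features=4, noise=10.0,
                                 random_state=0)

pipe = Pipeline([
    ("polyfeatures", PolynomialFeatures(degree=2, include_bias=False)),
    ("scaler", StandardScaler()),
    ("lasso", Lasso(random_state=42, max_iter=10_000)),
])

# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {"lasso__alpha": [0.01, 0.1, 1.0, 10.0]}  # hypothetical candidates
grid = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error", cv=5)
grid.fit(X_demo, y_demo)

best_alpha = grid.best_params_["lasso__alpha"]
print(best_alpha, -grid.best_score_)  # chosen alpha and its cross-validated MSE
```

The same `step__parameter` naming would let a grid also vary, for example, the polynomial degree or the number of features a selector keeps.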