### Codio Activity 9.7: Ridge vs. Sequential Feature Selection

This activity compares the results of a `Ridge` regression model with those of a `LinearRegression` model built using `SequentialFeatureSelector`.  Both approaches seek to limit the complexity of the model.  The `Ridge` estimator applies a penalty that shrinks the coefficients of the model, while the `SequentialFeatureSelector` selects a subset of features with which to build the model.  
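To make the shrinkage idea concrete, here is a minimal sketch on synthetic data (not the insurance data). It assumes the closed-form ridge solution without an intercept; the helper `ridge_coefs` and the data are hypothetical and used only for illustration.

```python
import numpy as np

# Synthetic data (hypothetical): 100 rows, 5 features, two of them irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_coefs = np.array([4.0, 0.0, -3.0, 0.0, 2.0])
y = X @ true_coefs + rng.normal(scale=0.5, size=100)

def ridge_coefs(X, y, alpha):
    """Closed-form ridge solution without an intercept: (X'X + alpha*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

alphas = [0.0, 10.0, 100.0, 1000.0]
norms = [np.linalg.norm(ridge_coefs(X, y, a)) for a in alphas]

# The overall size of the coefficient vector shrinks as alpha grows.
for a, norm in zip(alphas, norms):
    print(f"alpha = {a:>6}: coefficient norm = {norm:.3f}")
```

Ridge keeps every feature but pulls all coefficients toward zero, whereas a feature selector keeps only some features and leaves their coefficients unpenalized.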

In [1]:
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn import set_config
set_config(display="diagram")

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

### The Insurance Data

For this example, we return to the insurance data with cubic features.  Below, the train and test data are loaded and split into features and target.  Recall that the target has had the logarithm applied to it.  

In [2]:
train = pd.read_csv('codio_9_7_solution/data/train_cubic.csv')
test = pd.read_csv('codio_9_7_solution/data/test_cubic.csv')

In [3]:
X_train,y_train = train.drop('target_log',axis = 1), train['target_log']
X_train

Unnamed: 0,age,bmi,children,age^2,age bmi,age children,bmi^2,bmi children,children^2,age^3,age^2 bmi,age^2 children,age bmi^2,age bmi children,age children^2,bmi^3,bmi^2 children,bmi children^2,children^3
0,61.0,31.160,0.0,3721.0,1900.760,0.0,970.945600,0.00,0.0,226981.0,115946.360,0.0,59227.681600,0.00,0.0,30254.664896,0.0000,0.00,0.0
1,46.0,27.600,0.0,2116.0,1269.600,0.0,761.760000,0.00,0.0,97336.0,58401.600,0.0,35040.960000,0.00,0.0,21024.576000,0.0000,0.00,0.0
2,54.0,31.900,3.0,2916.0,1722.600,162.0,1017.610000,95.70,9.0,157464.0,93020.400,8748.0,54950.940000,5167.80,486.0,32461.759000,3052.8300,287.10,27.0
3,55.0,30.685,0.0,3025.0,1687.675,0.0,941.569225,0.00,0.0,166375.0,92822.125,0.0,51786.307375,0.00,0.0,28892.051669,0.0000,0.00,0.0
4,25.0,45.540,2.0,625.0,1138.500,50.0,2073.891600,91.08,4.0,15625.0,28462.500,1250.0,51847.290000,2277.00,100.0,94445.023464,4147.7832,182.16,8.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
931,18.0,31.350,4.0,324.0,564.300,72.0,982.822500,125.40,16.0,5832.0,10157.400,1296.0,17690.805000,2257.20,288.0,30811.485375,3931.2900,501.60,64.0
932,39.0,23.870,5.0,1521.0,930.930,195.0,569.776900,119.35,25.0,59319.0,36306.270,7605.0,22221.299100,4654.65,975.0,13600.574603,2848.8845,596.75,125.0
933,58.0,25.175,0.0,3364.0,1460.150,0.0,633.780625,0.00,0.0,195112.0,84688.700,0.0,36759.276250,0.00,0.0,15955.427234,0.0000,0.00,0.0
934,37.0,47.600,2.0,1369.0,1761.200,74.0,2265.760000,95.20,4.0,50653.0,65164.400,2738.0,83833.120000,3522.40,148.0,107850.176000,4531.5200,190.40,8.0


In [4]:
X_test, y_test = test.drop('target_log',axis = 1), test['target_log']

### Problem 1

#### Feature Selection Pipeline

- Define a dictionary `param_dict` with key `selector__n_features_to_select` and value `[2, 3, 4, 5]`.
- Use `GridSearchCV` to construct a grid search over the `n_features_to_select` parameter of the `selector_pipe` estimator defined below, using `param_dict` as the parameter grid. Assign your result to `selector_grid` and fit it on `X_train` and `y_train`.
- Use the `predict` function on `selector_grid` to compute the predictions on `X_train`. Assign your result to `train_preds`.
- Use the `predict` function on `selector_grid` to compute the predictions on `X_test`. Assign your result to `test_preds`.
- Use the `mean_squared_error` function to compute the MSE between `y_train` and `train_preds`. Assign your result to `selector_train_mse`.
- Use the `mean_squared_error` function to compute the MSE between `y_test` and `test_preds`. Assign your result to `selector_test_mse`.
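The key `selector__n_features_to_select` follows the pipeline naming convention of step name, double underscore, parameter name. As a small sketch, the valid keys can be listed with `get_params()` (the variable names `pipe` and `selector_keys` are only illustrative):

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

pipe = Pipeline([('selector', SequentialFeatureSelector(LinearRegression())),
                 ('model', LinearRegression())])

# Every tunable parameter is listed as '<step name>__<parameter name>'.
selector_keys = [key for key in pipe.get_params() if key.startswith('selector__')]
print(selector_keys)
```

Any key printed here can be used in the parameter grid passed to `GridSearchCV`.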


In [5]:
selector_pipe = Pipeline([('selector', SequentialFeatureSelector(LinearRegression())),
                         ('model', LinearRegression())])
selector_pipe

In [6]:
param_dict = {'selector__n_features_to_select': [2,3,4,5]}
param_dict

{'selector__n_features_to_select': [2, 3, 4, 5]}

In [7]:
selector_grid = GridSearchCV(selector_pipe, param_grid = param_dict)
selector_grid

In [8]:
selector_grid.fit(X_train,y_train)

In [9]:
train_preds = selector_grid.predict(X_train)

In [10]:
test_preds = selector_grid.predict(X_test)

In [11]:
selector_train_mse = mean_squared_error(y_train, train_preds)
selector_train_mse

0.6031734290034885

In [12]:
selector_test_mse = mean_squared_error(y_test, test_preds)
selector_test_mse

0.5655875591380699

In [13]:
print(f'Train MSE: {selector_train_mse}')
print(f'Test MSE: {selector_test_mse}')

Train MSE: 0.6031734290034885
Test MSE: 0.5655875591380699


### Problem 2

#### Ridge Grid

- Define a parameter dictionary named `ridge_param_dict` for the grid search. For this, use `np.logspace(0, 10, 50)` to create a range of alpha values `ridge__alpha`. This function generates values evenly spaced in logarithmic scale from 1 to 10^10. The parameter dictionary is specified as follows: `ridge_param_dict = {'ridge__alpha': np.logspace(0, 10, 50)}`.
- Next, construct a `Pipeline` that contains two steps -- `scaler` and `ridge` -- that first standard scales the data and then builds a ridge regression model.  Assign your pipeline as `ridge_pipe`.  Use this to execute the grid search over the `alpha` hyperparameter of the `Ridge` estimator using the training data. Determine the mean squared error on the train and test data. 
- Use `GridSearchCV` to construct a grid search over the `ridge_pipe` estimator with `ridge_param_dict` as the parameter grid. Assign your result to `ridge_grid`.
- Use the `predict` function on `ridge_grid` to compute the predictions on `X_train`. Assign your result to `ridge_train_preds`.
- Use the `predict` function on `ridge_grid` to compute the predictions on `X_test`. Assign your result to `ridge_test_preds`.
- Use the `mean_squared_error` function to compute the MSE between `y_train` and `ridge_train_preds`. Assign your result to `ridge_train_mse`.
- Use the `mean_squared_error` function to compute the MSE between `y_test` and `ridge_test_preds`. Assign your result to `ridge_test_mse`.
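As a quick check of what `np.logspace(0, 10, 50)` produces, the sketch below confirms the endpoints and shows that consecutive values differ by a constant ratio rather than a constant step (the name `ratios` is only illustrative):

```python
import numpy as np

alphas = np.logspace(0, 10, 50)

# Evenly spaced on a log scale means a constant ratio between neighbors.
ratios = alphas[1:] / alphas[:-1]

print(alphas[0], alphas[-1], len(alphas))
print(ratios.min(), ratios.max())
```

Each alpha is about 1.6 times the previous one, so the grid covers ten orders of magnitude with only 50 candidates.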


In [34]:
# Note: the key is '<step name>__<parameter>' with a double underscore, i.e. 'ridge__alpha'
ridge_param_dict = {'ridge__alpha': np.logspace(0,10,50)}
ridge_param_dict

{'ridge__alpha': array([1.00000000e+00, 1.59985872e+00, 2.55954792e+00, 4.09491506e+00,
        6.55128557e+00, 1.04811313e+01, 1.67683294e+01, 2.68269580e+01,
        4.29193426e+01, 6.86648845e+01, 1.09854114e+02, 1.75751062e+02,
        2.81176870e+02, 4.49843267e+02, 7.19685673e+02, 1.15139540e+03,
        1.84206997e+03, 2.94705170e+03, 4.71486636e+03, 7.54312006e+03,
        1.20679264e+04, 1.93069773e+04, 3.08884360e+04, 4.94171336e+04,
        7.90604321e+04, 1.26485522e+05, 2.02358965e+05, 3.23745754e+05,
        5.17947468e+05, 8.28642773e+05, 1.32571137e+06, 2.12095089e+06,
        3.39322177e+06, 5.42867544e+06, 8.68511374e+06, 1.38949549e+07,
        2.22299648e+07, 3.55648031e+07, 5.68986603e+07, 9.10298178e+07,
        1.45634848e+08, 2.32995181e+08, 3.72759372e+08, 5.96362332e+08,
        9.54095476e+08, 1.52641797e+09, 2.44205309e+09, 3.90693994e+09,
        6.25055193e+09, 1.00000000e+10])}

In [30]:
ridge_pipe = Pipeline([('scaler',StandardScaler()),
                       ('ridge', Ridge())])
#ridge_pipe

In [31]:
ridge_grid = GridSearchCV(ridge_pipe, param_grid = ridge_param_dict)
ridge_grid.fit(X_train, y_train)

In [32]:
ridge_train_preds = ridge_grid.predict(X_train)
ridge_test_preds = ridge_grid.predict(X_test)
ridge_train_mse = mean_squared_error(y_train, ridge_train_preds)
ridge_test_mse = mean_squared_error(y_test, ridge_test_preds)

In [25]:
# Professor's solution
ridge_param_dict = {'ridge__alpha': np.logspace(0, 10, 50)}
ridge_pipe = Pipeline([('scaler', StandardScaler()), 
                      ('ridge', Ridge())])
ridge_grid = GridSearchCV(ridge_pipe, param_grid=ridge_param_dict)
ridge_grid.fit(X_train, y_train)
ridge_train_preds = ridge_grid.predict(X_train)
ridge_test_preds = ridge_grid.predict(X_test)
ridge_train_mse = mean_squared_error(y_train, ridge_train_preds)
ridge_test_mse = mean_squared_error(y_test, ridge_test_preds)

In [33]:
print(f'Train MSE: {ridge_train_mse}')
print(f'Test MSE: {ridge_test_mse}')
ridge_pipe

Train MSE: 0.5870277750390861
Test MSE: 0.5532169282339873


### Problem 3

#### Examining the "best" model

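Your results should suggest that the two models perform similarly: in the outputs above, the `Ridge` test MSE (about 0.553) is slightly lower than that of the sequential feature selector with `LinearRegression` (about 0.566), while the selector model uses far fewer features and is easier to interpret.  The selector model was fit with the object `selector_grid`.  One question we may have is: what was the optimal number of features selected, and what were they?  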

Use the `selector_grid` to extract both the feature names and their associated coefficients.  This will involve:

- `.best_estimator_`: extract the best estimator/selector pair from your grid search
- `.named_steps['selector']`: extract the selector from the pipeline
- `.named_steps['model']`: extract the model from the pipeline
- `.get_support()`: extract the best features from the selector.  This returns a boolean mask indicating whether each feature was selected; we can use it to index the columns of our train data.  

```python
X_train.columns[best_selector.get_support()]
```

- `.coef_`: coefficients from best model
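The steps above can be sketched on a small synthetic dataset. The column names `a`, `b`, `c` and the target are hypothetical; the target is built from `a` and `c` only, so a selector limited to two features should keep those two.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Hypothetical data: the target depends on columns 'a' and 'c' only.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=['a', 'b', 'c'])
y = 3 * X['a'] - 2 * X['c'] + rng.normal(scale=0.01, size=200)

selector = SequentialFeatureSelector(LinearRegression(), n_features_to_select=2)
selector.fit(X, y)

# Boolean mask over the columns, then the matching column names.
mask = selector.get_support()
selected = list(X.columns[mask])

# Refit a model on the selected columns and pair names with coefficients.
model = LinearRegression().fit(X[selected], y)
coef_table = pd.Series(model.coef_, index=selected)
print(mask)
print(coef_table)
```

Pairing the mask with the column names is what turns a bare coefficient array into something interpretable.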

In [35]:
best_estimator = selector_grid.best_estimator_
best_estimator

In [36]:
best_selector = best_estimator.named_steps['selector']
best_selector

In [37]:
best_model = selector_grid.best_estimator_.named_steps['model']
best_model

In [38]:
feature_names = X_train.columns[best_selector.get_support()]
feature_names

Index(['age', 'bmi children'], dtype='object')

In [39]:
coefs = best_model.coef_
coefs

array([0.03285171, 0.00368017])

In [40]:
print(best_estimator)
print(f'Features from best selector: {feature_names}.')
print('Coefficient values: ')
print('===================')
pd.DataFrame([coefs.T], columns = feature_names, index = ['model'])

Pipeline(steps=[('selector',
                 SequentialFeatureSelector(estimator=LinearRegression(),
                                           n_features_to_select=2)),
                ('model', LinearRegression())])
Features from best selector: Index(['age', 'bmi children'], dtype='object').
Coefficient values: 


Unnamed: 0,age,bmi children
model,0.032852,0.00368


### Problem 4

#### Comparing observations 

According to your model, predict the billed costs for person 1 and person 2 below:

- **Person 1**: Age = 30, bmi = 40, children = 0
- **Person 2**: Age = 45, bmi = 50, children = 2

Use the information from **Problem 3** and the model coefficients to make these predictions.

Note that you will want to transform your predictions.  The model's predictions are in terms of the logarithm of cost.  To transform the logarithm back to the actual value, use `np.exp`, the inverse of the natural logarithm. Assign your predictions as floats to `person1` and `person2` below.  Your solution will be checked to two decimal places. 
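Because the model is linear in the logarithm of cost, the ratio of two predicted costs depends only on the coefficients and the differences in features; the intercept cancels. A minimal sketch, assuming the two coefficients reported in Problem 3 (the names `coef_age` and `coef_bmi_children` are illustrative):

```python
import numpy as np

# Coefficients copied from the Problem 3 output above (rounded as displayed).
coef_age, coef_bmi_children = 0.03285171, 0.00368017

# Differences in the two selected features between Person 2 and Person 1.
d_age = 45 - 30
d_bmi_children = 50 * 2 - 40 * 0

# exp(log(cost2) - log(cost1)) = cost2 / cost1; the intercept drops out.
ratio = np.exp(coef_age * d_age + coef_bmi_children * d_bmi_children)
print(f"Predicted cost ratio (Person 2 / Person 1): {ratio:.3f}")
```

This gives a way to sanity-check the two individual predictions: dividing `person2` by `person1` should reproduce this ratio.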

In [41]:
age = [30,45]
bmi = [40,50]
children  = [0,2]


In [42]:
person1 = np.exp(best_model.intercept_ + coefs[0] * age[0] + coefs[1] * (bmi[0] * children[0]))
person1

np.float64(5898.780939517866)

In [44]:
person2 = np.exp(best_model.intercept_ + coefs[0] * age[1] + coefs[1] * (bmi[1] * children[1]))
person2

np.float64(13950.816694851044)

In [45]:
print(f'The difference between Person 1 and Person 2 is {person2 - person1: .2f}')

The difference between Person 1 and Person 2 is  8052.04


The models here could be revisited, with further feature encoding and different polynomial terms incorporated.  More important is understanding how to construct the pipelines and interrogate the resulting models to understand what they say about your data.  For example, does having a higher body mass matter if one does not have children?  Does this seem reasonable?
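One way to probe the closing question is to look at the only term through which `bmi` enters the selected model, the `bmi children` interaction. The sketch below assumes the coefficient reported in Problem 3; the BMI values are arbitrary illustrations.

```python
import numpy as np

coef_bmi_children = 0.00368017  # coefficient of 'bmi children' reported in Problem 3

bmi_values = np.array([20.0, 30.0, 40.0, 50.0])  # arbitrary example values

# Contribution of the interaction term to the log of predicted cost.
no_children = coef_bmi_children * bmi_values * 0
two_children = coef_bmi_children * bmi_values * 2

print(no_children)   # identical for every BMI
print(two_children)  # grows with BMI
```

Under this model, BMI has no effect on the prediction for someone without children, which is a consequence of the features the selector kept rather than a finding about insurance costs.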

### Codio Activity 9.8: LASSO and Sequential Feature Selection

This assignment introduces the `Ridge` regression estimator from scikit-learn.  You will revisit the insurance data from the previous assignment and experiment with varying the `alpha` parameter discussed in Video 9.4. Your work here is a basic introduction where complexity in the preprocessing steps will be added to scale your data.  For now, you are just to familiarize yourself with the `Ridge` regression estimator and its `alpha` parameter. 

This assignment compares a second regularized regression method -- the LASSO -- with sequential feature selection.  The LASSO will be briefly discussed below, and you will use the scikit-learn implementation.  Rather than using the LASSO as a model, you are to compare it to the `SequentialFeatureSelector` transformer as a method to select important features for a regression model. 


#### Index

- [Problem 1](#Problem-1)
- [Problem 2](#Problem-2)
- [Problem 3](#Problem-3)
- [Problem 4](#Problem-4)
- [Problem 5](#Problem-5)

In [46]:
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SequentialFeatureSelector, SelectFromModel
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn import set_config
set_config(display="diagram")

import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px

### The Data

For this exercise you will revisit the automotive data.  The goal is again to predict the `mpg` column using the other numeric features.  You will build a polynomial model of degree 3 to compare the results of a `Lasso` and that of a `LinearRegression` model. Finally, you will use the `Lasso` estimator to select features in a pipeline with `SelectFromModel`. 

Below, the train and test data is created for you as `auto_X_train`, `auto_X_test`, `auto_y_train`, and `auto_y_test`.

In [49]:
auto = pd.read_csv('auto.csv')

In [50]:
auto.head()

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,18.0,8,307.0,130.0,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165.0,3693,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150.0,3436,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150.0,3433,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140.0,3449,10.5,70,1,ford torino


In [52]:
#generate train/test data for auto
auto_X = auto.drop(['mpg','name'],axis = 1)
auto_X.head()

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,year,origin
0,8,307.0,130.0,3504,12.0,70,1
1,8,350.0,165.0,3693,11.5,70,1
2,8,318.0,150.0,3436,11.0,70,1
3,8,304.0,150.0,3433,12.0,70,1
4,8,302.0,140.0,3449,10.5,70,1


In [53]:
auto_y = auto['mpg']

In [55]:
auto_X_train, auto_X_test,auto_y_train,auto_y_test = train_test_split(auto_X, auto_y, 
                                                                      test_size = 0.3, 
                                                                      random_state = 42)
auto_X_train.head()

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,year,origin
109,4,108.0,94.0,2379,16.5,73,3
17,6,200.0,85.0,2587,16.0,70,1
318,4,119.0,92.0,2434,15.0,80,3
24,6,199.0,90.0,2648,15.0,70,1
126,6,250.0,100.0,3336,17.0,74,1


### Problem 1

#### The auto data

To start, build a `Pipeline` named `auto_pipe` with named steps `polyfeatures`, `scaler`, and `lasso` that use `PolynomialFeatures`, `StandardScaler`, and the `Lasso` estimator, respectively, with the following parameters:

- `degree = 3` in `PolynomialFeatures`
- `include_bias = False` in `PolynomialFeatures`
- `random_state = 42` in `Lasso`

Fit the pipeline on `auto_X_train` and `auto_y_train` data given.  Extract the lasso coefficients from the pipeline and assign them as an array to `lasso_coefs` below.  

**HINT**: Use the `.named_steps['lasso']` to extract that lasso estimator and use the `.coef_` attribute after fitting to access the model coefficients.

In [56]:
auto_pipe = Pipeline([('polyfeatures', PolynomialFeatures(degree = 3, include_bias = False)),
                     ('scaler', StandardScaler()),
                     ('lasso', Lasso(random_state = 42))])
auto_pipe

In [57]:
auto_pipe.fit(auto_X_train, auto_y_train)

In [59]:
lasso_coefs = auto_pipe.named_steps['lasso'].coef_
lasso_coefs

array([-0.        , -0.        , -0.        , -3.06660503,  0.        ,
        0.        ,  0.        , -0.        , -0.        , -0.        ,
       -0.        , -0.        , -0.        ,  0.        , -0.        ,
       -0.        , -0.        , -0.0880862 , -0.        , -0.        ,
       -0.        , -0.        , -1.42250731, -0.        , -0.        ,
       -0.        , -0.        , -0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
       -0.        , -0.        , -0.        , -0.        , -0.        ,
       -0.        , -0.        , -0.        , -0.        , -0.        ,
       -0.        , -0.        , -0.        , -0.        , -0.        ,
       -0.        , -0.        , -0.        , -0.        , -0.        ,
       -0.        , -0.        , -0.        , -0.        ,  0.        ,
       -0.        ,  0.        ,  0.        , -0.        , -0.        ,
       -0.        , -0.        , -0.        , -0.        , -0.  

In [60]:
print(type(lasso_coefs))
print(lasso_coefs)
auto_pipe

<class 'numpy.ndarray'>
[-0.         -0.         -0.         -3.06660503  0.          0.
  0.         -0.         -0.         -0.         -0.         -0.
 -0.          0.         -0.         -0.         -0.         -0.0880862
 -0.         -0.         -0.         -0.         -1.42250731 -0.
 -0.         -0.         -0.         -0.          0.          0.
  0.          0.          0.          0.          0.         -0.
 -0.         -0.         -0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.         -0.          0.
 -0.          0.          0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.         -0.          0.
 -0.         -0.         -0.         -0.         -0.       

### Problem 2

#### Error in `Lasso` model

Now, compute the mean squared error of the LASSO model on both the train and test data, `auto_X_train` and `auto_X_test`, respectively.  Assign these as floats to `lasso_train_mse` and `lasso_test_mse`, respectively.  
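As a reminder of what `mean_squared_error` computes, the sketch below reproduces it by hand on a few values. The targets are the first five `mpg` values shown earlier; the predictions are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([18.0, 15.0, 18.0, 16.0, 17.0])  # first five mpg values shown above
y_pred = np.array([17.0, 16.0, 18.5, 15.0, 17.5])  # hypothetical predictions

# MSE is the average of the squared residuals.
manual_mse = np.mean((y_true - y_pred) ** 2)
sklearn_mse = mean_squared_error(y_true, y_pred)

print(manual_mse, sklearn_mse)
```

Since the residuals are squared, swapping the two arguments does not change the value, although the documented order is `y_true` first and `y_pred` second.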

In [61]:
lasso_train_mse = mean_squared_error(auto_y_train, auto_pipe.predict(auto_X_train))
lasso_train_mse

11.860728888695974

In [62]:
lasso_test_mse = mean_squared_error(auto_y_test, auto_pipe.predict(auto_X_test))
lasso_test_mse

8.984776169896321

### Problem 3

#### Non-zero coefficients

Using the `lasso_coefs` determine the number of features with non-zero coefficients and determine the name of those features as a result of the polynomial feature transformation.  

To do this, access the `named_steps['polyfeatures']` step of the `auto_pipe` pipeline and call its `get_feature_names_out()` method to get the feature names. Assign the result to `feature_names`.

Next, create a DataFrame named `lasso_df` below that has two columns -- `feature` and `coef`.  To the `feature` column assign `feature_names`. To the `coef` column assign `lasso_coefs`.

In [63]:
feature_names = auto_pipe.named_steps['polyfeatures'].get_feature_names_out()
feature_names

array(['cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration', 'year', 'origin', 'cylinders^2',
       'cylinders displacement', 'cylinders horsepower',
       'cylinders weight', 'cylinders acceleration', 'cylinders year',
       'cylinders origin', 'displacement^2', 'displacement horsepower',
       'displacement weight', 'displacement acceleration',
       'displacement year', 'displacement origin', 'horsepower^2',
       'horsepower weight', 'horsepower acceleration', 'horsepower year',
       'horsepower origin', 'weight^2', 'weight acceleration',
       'weight year', 'weight origin', 'acceleration^2',
       'acceleration year', 'acceleration origin', 'year^2',
       'year origin', 'origin^2', 'cylinders^3',
       'cylinders^2 displacement', 'cylinders^2 horsepower',
       'cylinders^2 weight', 'cylinders^2 acceleration',
       'cylinders^2 year', 'cylinders^2 origin',
       'cylinders displacement^2', 'cylinders displacement horsepower',
       'cylinde

In [64]:
lasso_df = pd.DataFrame({'feature':feature_names, 'coef':lasso_coefs})
lasso_df

Unnamed: 0,feature,coef
0,cylinders,-0.000000
1,displacement,-0.000000
2,horsepower,-0.000000
3,weight,-3.066605
4,acceleration,0.000000
...,...,...
114,acceleration origin^2,0.000000
115,year^3,1.872570
116,year^2 origin,0.000000
117,year origin^2,0.000000


In [65]:
lasso_df[lasso_df['coef'] != 0]

Unnamed: 0,feature,coef
3,weight,-3.066605
17,displacement acceleration,-0.088086
22,horsepower acceleration,-1.422507
111,acceleration^2 origin,0.689516
113,acceleration year origin,0.423159
115,year^3,1.87257


### Problem 4

#### Comparing `Lasso` to `SequentialFeatureSelector`

As seen above, the Lasso model effectively eliminated all but 6 features from the cubic polynomial example.  Now, you are to build a `Pipeline` object called `sequential_pipe` with named steps `poly_features`, `selector`, and `linreg` that use `PolynomialFeatures`, `SequentialFeatureSelector`, and `LinearRegression`, respectively, with the following parameters:

- `degree = 3` in `PolynomialFeatures` step `poly_features`
- `include_bias = False` in `PolynomialFeatures` step `poly_features`
- `n_features_to_select = 6` in `selector`

Assign this pipeline object to `sequential_pipe`.

Next, use the `fit` function on `sequential_pipe` to train your model on `auto_X_train` and `auto_y_train`. 

Use the `mean_squared_error` function to compute the MSE between `auto_y_train` and `sequential_pipe.predict(auto_X_train)`. Assign your result to `sequential_train_mse`.

Use the `mean_squared_error` function to compute the MSE between `auto_y_test` and `sequential_pipe.predict(auto_X_test)`. Assign your result to `sequential_test_mse`.

In [68]:
sequential_pipe = Pipeline([('poly_features', PolynomialFeatures(degree = 3, include_bias = False)),
                            ('selector',SequentialFeatureSelector(LinearRegression(), n_features_to_select = 6)), 
                            ('linreg',LinearRegression())])

In [69]:
sequential_pipe.fit(auto_X_train, auto_y_train)

In [71]:
sequential_train_mse = mean_squared_error(auto_y_train, sequential_pipe.predict(auto_X_train))
sequential_train_mse

7.67333294480673

In [72]:
sequential_test_mse = mean_squared_error(auto_y_test, sequential_pipe.predict(auto_X_test))
sequential_test_mse

7.145098435566002

In [73]:
sequential_pipe

### Problem 5

#### Using `Lasso` as a feature selector

Rather than using the `Lasso` as the estimator, you can use the results of the `Lasso` to select features that are subsequently used in a `LinearRegression` estimator.  To do so, scikit-learn provides a transformer in the `feature_selection` module called `SelectFromModel` that selects features based on the fitted model's coefficients.  

As such, using the `Lasso` estimator to select features would involve instantiating the `SelectFromModel` transformer and selecting features as:

```python
selector = SelectFromModel(Lasso())
# The selector must be fit (which fits the Lasso) before it can transform
selector.fit_transform(auto_X_train, auto_y_train)
```



From here, the selector can be used in a `Pipeline` after transforming the features and before building a regression model.  Such a pipeline is given below and you are to use this to fit on the training data and score on the testing data.  Which model performs better, the model with sequential feature selection or that using the `Lasso` to select the features?  

Assign your train and test error using the `model_selector_pipe` as `selector_train_mse` and `selector_test_mse` below.

For more information and examples on `SelectFromModel` see [here](https://scikit-learn.org/stable/modules/feature_selection.html#select-from-model).
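As a minimal sketch of how `SelectFromModel` behaves with a `Lasso`, the example below uses synthetic data in which only the first and third columns drive the target. The data and the choice `alpha=0.1` are assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Hypothetical data: the target depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Fitting the selector fits the Lasso; features with zeroed coefficients are dropped.
selector = SelectFromModel(Lasso(alpha=0.1))
X_selected = selector.fit_transform(X, y)

print(selector.get_support())     # boolean mask of kept columns
print(selector.estimator_.coef_)  # coefficients of the fitted Lasso
print(X_selected.shape)
```

Inside a `Pipeline`, the same fit-then-transform sequence happens automatically, and the downstream `LinearRegression` sees only the kept columns.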

In [75]:
model_selector_pipe = Pipeline([('poly_features', PolynomialFeatures(degree = 3, include_bias = False)),
                               ('scaler',StandardScaler()),
                               ('selector', SelectFromModel(Lasso())),
                               ('linreg', LinearRegression())])

In [76]:
model_selector_pipe.fit(auto_X_train, auto_y_train)

In [77]:
selector_train_mse = mean_squared_error(auto_y_train, model_selector_pipe.predict(auto_X_train))
selector_test_mse = mean_squared_error(auto_y_test, model_selector_pipe.predict(auto_X_test))

# Answer check
print(selector_train_mse)
print(selector_test_mse)

9.93192512323042
8.899543694287045


Further work could involve grid searching parameters of both the transformers and estimators, as well as including a `Ridge` regressor in the mix.  For now, you should be getting comfortable using the scikit-learn `Pipeline` object to combine transformers and estimators. This module introduced examples of techniques that can mitigate overfitting in a regression context.  It is important to note that no one strategy is always best for a modeling problem.  Instead, you should consider multiple approaches and let your goals for the model guide you in determining which model performs best on your chosen metric.