### Codio Activity 9.7: Ridge vs. Sequential Feature Selection

**Expected Time: 60 Minutes**

**Total Points: 40**

This activity compares a `Ridge` regression model with a `LinearRegression` model built using `SequentialFeatureSelector`.  Both approaches seek to limit the complexity of the model.  The `Ridge` estimator applies a penalty that shrinks the model's coefficients, while the `SequentialFeatureSelector` selects a subset of features with which to build the model.
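
As a rough illustration of that difference, the sketch below fits both approaches to synthetic data. The data, `alpha=10.0`, and `n_features_to_select=2` are assumptions made for the example and are not values from this activity. Ridge keeps every feature with a shrunken coefficient, while the selector keeps only some of the columns.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (an assumption for illustration): only the first two
# of the five features actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Ridge keeps all five features but shrinks the coefficient vector
# relative to ordinary least squares.
ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
print("OLS coefficient norm:  ", np.linalg.norm(ols.coef_))
print("Ridge coefficient norm:", np.linalg.norm(ridge.coef_))

# The selector keeps a subset of the columns and drops the rest entirely.
selector = SequentialFeatureSelector(LinearRegression(), n_features_to_select=2)
selector.fit(X, y)
print("Columns kept by the selector:", selector.get_support())
```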

#### Index

- [Problem 1](#Problem-1)
- [Problem 2](#Problem-2)
- [Problem 3](#Problem-3)
- [Problem 4](#Problem-4)


In [2]:
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn import set_config
set_config(display="diagram")

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

### The Insurance Data

For this example, we return to the insurance data with cubic features.  Below, the train and test data are loaded and each is split into features and target.  Recall that the target has had the logarithm applied to it.

In [5]:
train = pd.read_csv('data/train_cubic.csv')
test = pd.read_csv('data/test_cubic.csv')

In [7]:
X_train, y_train = train.drop('target_log', axis = 1), train['target_log']
X_test, y_test = test.drop('target_log', axis = 1), test['target_log']

### Problem 1

#### Feature Selection Pipeline

**10 Points**

- Define a dictionary `param_dict` with key `selector__n_features_to_select` and value `[2, 3, 4, 5]`.
- Use `GridSearchCV` to construct a grid search over the `n_features_to_select` parameter of the `selector_pipe` estimator defined below, and fit it on the training data. Assign your result to `selector_grid`.
- Use the `predict` function on `selector_grid` to compute the predictions on `X_train`. Assign your result to `train_preds`.
- Use the `predict` function on `selector_grid` to compute the predictions on `X_test`. Assign your result to `test_preds`.
- Use the `mean_squared_error` function to compute the MSE between `y_train` and `train_preds`. Assign your result to `selector_train_mse`.
- Use the `mean_squared_error` function to compute the MSE between `y_test` and `test_preds`. Assign your result to `selector_test_mse`.


In [10]:
selector_pipe = Pipeline([('selector', SequentialFeatureSelector(LinearRegression())),
                         ('model', LinearRegression())])
selector_pipe

In [14]:
### GRADED

# YOUR CODE HERE
param_dict = {'selector__n_features_to_select': [2, 3, 4, 5]}
selector_grid = GridSearchCV(estimator=selector_pipe, param_grid=param_dict, scoring='neg_mean_squared_error')
selector_grid.fit(X_train, y_train)

train_preds = selector_grid.predict(X_train)
test_preds = selector_grid.predict(X_test)
selector_train_mse = mean_squared_error(y_train, train_preds)
selector_test_mse = mean_squared_error(y_test, test_preds)

# ANSWER CHECK
print(f'Train MSE: {selector_train_mse}')
print(f'Test MSE: {selector_test_mse}')

Train MSE: 0.6031734290034885
Test MSE: 0.5655875591380699
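
The MSE values alone do not show which `n_features_to_select` the search preferred. A fitted `GridSearchCV` object exposes this through `best_params_` and `cv_results_`. The following is a minimal sketch on synthetic data; the data and the names `X`, `y`, `pipe`, and `grid` are assumptions for illustration and stand in for the insurance data and the objects above.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the training data (assumption for illustration).
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(150, 6)), columns=[f"x{i}" for i in range(6)])
y = 2 * X["x0"] + X["x3"] + rng.normal(scale=0.3, size=150)

pipe = Pipeline([("selector", SequentialFeatureSelector(LinearRegression())),
                 ("model", LinearRegression())])
grid = GridSearchCV(pipe, {"selector__n_features_to_select": [2, 3, 4, 5]},
                    scoring="neg_mean_squared_error")
grid.fit(X, y)

# The winning hyperparameter value.
print(grid.best_params_)

# One row per candidate value; scores are negated MSE, so values
# closer to zero are better.
results = pd.DataFrame(grid.cv_results_)
print(results[["param_selector__n_features_to_select", "mean_test_score"]])
```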


In [18]:
# Assuming the model is fitted using selector_grid
# Retrieve the best estimator from the grid search
best_model = selector_grid.best_estimator_

# Retrieve the coefficients from the linear model within the pipeline
coefficients = best_model.named_steps['model'].coef_

# Get the positions of the features kept by the SequentialFeatureSelector step
selected_features = best_model.named_steps['selector'].get_support(indices=True)
feature_names = X_train.columns[selected_features]

# Create a DataFrame with feature names and their corresponding coefficients
coef_df = pd.DataFrame({
    'Feature': feature_names,
    'Coefficient': coefficients
})

# Display the DataFrame
coef_df

|   | Feature | Coefficient |
|---|---------|-------------|
| 0 | age | 0.032852 |
| 1 | bmi children | 0.00368 |


### Problem 2

#### Ridge Grid

**10 Points**

- Define a parameter dictionary named `ridge_param_dict` for the grid search. Use `np.logspace(0, 10, 50)` to create the range of values for `ridge__alpha`; this generates 50 values evenly spaced on a logarithmic scale from 1 to 10^10. The parameter dictionary is specified as follows: `ridge_param_dict = {'ridge__alpha': np.logspace(0, 10, 50)}`.
- Next, construct a `Pipeline` with two steps, `scaler` and `ridge`, that first standard-scales the data and then builds a ridge regression model.  Assign your pipeline to `ridge_pipe`.  You will use it to run the grid search over the `alpha` hyperparameter of the `Ridge` estimator on the training data and to determine the mean squared error on the train and test data.
- Use `GridSearchCV` to construct a grid search over `ridge_param_dict` for the `ridge_pipe` estimator, and fit it on the training data. Assign your result to `ridge_grid`.
- Use the `predict` function on `ridge_grid` to compute the predictions on `X_train`. Assign your result to `ridge_train_preds`.
- Use the `predict` function on `ridge_grid` to compute the predictions on `X_test`. Assign your result to `ridge_test_preds`.
- Use the `mean_squared_error` function to compute the MSE between `y_train` and `ridge_train_preds`. Assign your result to `ridge_train_mse`.
- Use the `mean_squared_error` function to compute the MSE between `y_test` and `ridge_test_preds`. Assign your result to `ridge_test_mse`.
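
To make the spacing of the `alpha` grid concrete, the short check below shows that `np.logspace(0, 10, 50)` runs from 10^0 to 10^10 and that each value is a constant multiple of the previous one. The variable names `alphas` and `ratios` are chosen for this illustration only.

```python
import numpy as np

alphas = np.logspace(0, 10, 50)

# Endpoints are 10**0 and 10**10.
print(alphas[0], alphas[-1])

# Consecutive values differ by a constant factor rather than a constant step.
ratios = alphas[1:] / alphas[:-1]
print(np.allclose(ratios, ratios[0]))

# Equivalently, the exponents are evenly spaced between 0 and 10.
print(np.allclose(np.log10(alphas), np.linspace(0, 10, 50)))
```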


In [20]:
### GRADED

# YOUR CODE HERE
ridge_param_dict = {'ridge__alpha': np.logspace(0, 10, 50)}

ridge_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('ridge', Ridge())
])

ridge_grid = GridSearchCV(estimator=ridge_pipe, param_grid=ridge_param_dict, scoring='neg_mean_squared_error')
ridge_grid.fit(X_train, y_train)

ridge_train_preds = ridge_grid.predict(X_train)
ridge_test_preds = ridge_grid.predict(X_test)

ridge_train_mse = mean_squared_error(y_train, ridge_train_preds)
ridge_test_mse = mean_squared_error(y_test, ridge_test_preds)

# ANSWER CHECK
print(f'Train MSE: {ridge_train_mse}')
print(f'Test MSE: {ridge_test_mse}')
ridge_pipe

Train MSE: 0.5870277750390882
Test MSE: 0.5532169282339894


In [22]:
# Assuming the grid search object is ridge_grid and the training data is X_train

# Retrieve the best estimator (pipeline with the best alpha)
best_model = ridge_grid.best_estimator_

# Extract the coefficients from the Ridge model in the pipeline
coefficients = best_model.named_steps['ridge'].coef_

# Get the feature names (assuming X_train is a DataFrame with named columns)
feature_names = X_train.columns

# Create a DataFrame to organize and display the coefficients
coef_df = pd.DataFrame({
    'Feature': feature_names,
    'Coefficient': coefficients
})

# Display the DataFrame
coef_df

|   | Feature | Coefficient |
|---|---------|-------------|
| 0 | age | 0.609536 |
| 1 | bmi | 0.151901 |
| 2 | children | 0.4383 |
| 3 | age^2 | 0.029123 |
| 4 | age bmi | 0.092288 |
| 5 | age children | -0.685879 |
| 6 | bmi^2 | 0.151619 |
| 7 | bmi children | -0.189619 |
| 8 | children^2 | 0.460391 |
| 9 | age^3 | -0.03081 |
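
One caveat when reading these values: because the pipeline standardizes the features before fitting `Ridge`, the coefficients are expressed per standard deviation of each feature, not per original unit. The sketch below, on synthetic data (the data and all variable names are assumptions for illustration), shows one way to convert such coefficients back to original units by dividing by the scaler's `scale_`, and checks that the converted form reproduces the pipeline's predictions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature data on very different scales (assumption).
rng = np.random.default_rng(3)
X = rng.normal(loc=[40, 30], scale=[12, 6], size=(200, 2))
y = 0.03 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(scale=0.1, size=200)

pipe = Pipeline([("scaler", StandardScaler()), ("ridge", Ridge(alpha=1.0))])
pipe.fit(X, y)

scaler = pipe.named_steps["scaler"]
ridge = pipe.named_steps["ridge"]

# Coefficients per standard deviation (what coef_ reports after scaling).
scaled_coefs = ridge.coef_

# Coefficients per original unit, with the matching intercept.
original_unit_coefs = scaled_coefs / scaler.scale_
original_intercept = ridge.intercept_ - np.sum(original_unit_coefs * scaler.mean_)

# The converted coefficients give the same predictions as the pipeline.
manual = original_intercept + X @ original_unit_coefs
print(np.allclose(manual, pipe.predict(X)))
```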


### Problem 3

#### Examining the "best" model

**10 Points**

Your results should suggest that the model using the sequential feature selector and `LinearRegression` estimator performs comparably to the ridge model while using far fewer features.  This model was fit with the object `selector_grid`.  Two natural questions are: what was the optimal number of features selected, and which features were they?

Use the `selector_grid` to extract both the feature names and their associated coefficients.  This will involve:

- `.best_estimator_`: extract the best estimator/selector pair from your grid search
- `.named_steps['selector']`: extract the selector from the pipeline
- `.named_steps['model']`: extract the model from the pipeline
- `.get_support()`: extract the best features from the selector.  This returns a boolean for each feature indicating whether it was selected; we can use this to slice the columns of our train data.

```python
X_train.columns[best_selector.get_support()]
```

- `.coef_`: coefficients from best model
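
The two forms of `get_support` can be compared directly. In the sketch below, which uses synthetic data and hypothetical column names `a` through `d`, the default call returns a boolean mask and `indices=True` returns integer positions; both select the same column names.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic data with hypothetical column names (assumption for illustration).
rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(120, 4)), columns=["a", "b", "c", "d"])
y = 4 * X["a"] - 3 * X["c"] + rng.normal(scale=0.2, size=120)

selector = SequentialFeatureSelector(LinearRegression(), n_features_to_select=2)
selector.fit(X, y)

mask = selector.get_support()              # boolean array, one entry per column
idx = selector.get_support(indices=True)   # integer positions of the kept columns

print(mask)
print(idx)
print(X.columns[mask].tolist())
print(X.columns[idx].tolist())
```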

In [44]:
### GRADED

# YOUR CODE HERE
best_estimator = selector_grid.best_estimator_
best_selector = best_estimator.named_steps['selector']
best_model = best_estimator.named_steps['model']
feature_names = X_train.columns[best_selector.get_support()]
coefs = best_estimator.named_steps['model'].coef_

# Answer check
print(best_estimator)
print(f'Features from best selector: {feature_names}.')
print('Coefficient values: ')
print('===================')
pd.DataFrame([coefs.T], columns = feature_names, index = ['model'])

Pipeline(steps=[('selector',
                 SequentialFeatureSelector(estimator=LinearRegression(),
                                           n_features_to_select=2)),
                ('model', LinearRegression())])
Features from best selector: Index(['age', 'bmi children'], dtype='object').
Coefficient values: 


|       | age | bmi children |
|-------|-----|--------------|
| model | 0.032852 | 0.00368 |


### Problem 4

#### Comparing observations 

**10 Points**

According to your model, predict the billed costs for person 1 and person 2 below:

- **Person 1**: Age = 30, bmi = 40, children = 0
- **Person 2**: Age = 45, bmi = 50, children = 2

Use the information from **Problem 3** and the model coefficients to make these predictions.

Note that you will need to transform your predictions.  The model predicts the logarithm of cost, so to recover the actual value, apply `np.exp`, the inverse of the natural logarithm. Assign your predictions as floats to `person1` and `person2` below.  Your solution will be checked to two decimal places.
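
Because the model is linear on the log scale, the ratio of two predicted costs does not depend on the intercept. The sketch below uses the rounded coefficients displayed in Problem 3 to compute that ratio by hand; the helper name `log_cost_without_intercept` is hypothetical. The dollar predictions themselves still require the fitted model's intercept, which is not shown here.

```python
import math

# Rounded coefficients as displayed in Problem 3.
coef_age = 0.032852
coef_bmi_children = 0.00368

def log_cost_without_intercept(age, bmi, children):
    """Linear part of the log-cost prediction, leaving out the intercept."""
    return coef_age * age + coef_bmi_children * bmi * children

p1 = log_cost_without_intercept(30, 40, 0)  # bmi has no effect when children = 0
p2 = log_cost_without_intercept(45, 50, 2)

# exp(b0 + p2) / exp(b0 + p1) = exp(p2 - p1), so the intercept b0 cancels.
ratio = math.exp(p2 - p1)
print(f"Person 2 is predicted to cost about {ratio:.2f} times as much as Person 1")
```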

In [42]:
### GRADED

# YOUR CODE HERE
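ages = [30, 45]
bmis = [40, 50]
childrens = [0, 2]
# best_model is the LinearRegression step from Problem 3; its two inputs
# are the selected features: age and bmi * children.
# predict returns a length-1 array, so take element 0 before converting to float.
person1 = float(np.exp(best_model.predict([[ages[0], bmis[0] * childrens[0]]])[0]))
person2 = float(np.exp(best_model.predict([[ages[1], bmis[1] * childrens[1]]])[0]))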

# Answer check
print(f'The difference between Person 1 and Person 2 is {person2 - person1: .2f}')

The difference between Person 1 and Person 2 is  8052.04




The models here could be revisited, with further encoding of features and different polynomial terms incorporated.  More important is understanding how to construct the pipelines and interrogate the resulting models to understand what they say about your data.  According to this model, does having a higher body mass matter if one does not have children?  Does this seem reasonable?