# Regularization in Machine Learning

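- Regularization is a technique in ML that prevents or reduces overfitting and so improves how well the model generalizes.
- It adds a penalty term to the loss function the model minimizes, not to the prediction formula. For the linear model $y = B_0 + B_1X_1 + \epsilon$ (where $\epsilon$ is the error term), the regularized loss is $\sum_i (y_i - \hat{y}_i)^2 + \lambda \cdot \text{penalty}(B)$, where $\lambda$ sets the penalty level.
- The penalty level is a hyperparameter (called `alpha` in scikit-learn's `Ridge`, `Lasso`, and `ElasticNet`). There is no one-size-fits-all value; you control the intensity through hyperparameter tuning.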
- Types of regularization:
    - L1 Regularization (Lasso)
        - Technique: the penalty is the sum of the absolute values of the coefficients, $\lambda \sum_j |B_j|$
        - When to use: when deriving and interpreting coefficients is important, since L1 can shrink some coefficients to exactly zero
    - L2 Regularization (Ridge)
        - Technique: the penalty is the sum of the squared coefficients, $\lambda \sum_j B_j^2$
        - When to use: when analyzing coefficients is not the primary concern and reducing model complexity for better generalization matters more
    - ElasticNet
        - Technique: combines the L1 and L2 penalties as a weighted sum
        - When to use: when neither L1 nor L2 alone gives good results
- Applying regularization depends on the model:
    - `LinearRegression` has no regularization option, so you need to switch to `Ridge`, `Lasso`, or `ElasticNet`
    - For other models, regularization is applied through hyperparameters. For example, `LogisticRegression()` has a `penalty` parameter documented as `{'l1', 'l2', 'elasticnet', None}`, `default='l2'`
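
The three penalty terms described above can be sketched in a few lines of plain Python. The function names and example coefficients are illustrative assumptions, and the elastic-net mix is a simple weighted sum; libraries such as scikit-learn scale the individual terms somewhat differently.

```python
# Illustrative penalty terms; names and values are assumptions, not from the notebook.

def l1_penalty(coefs, alpha):
    """Lasso-style penalty: alpha times the sum of absolute coefficients."""
    return alpha * sum(abs(b) for b in coefs)


def l2_penalty(coefs, alpha):
    """Ridge-style penalty: alpha times the sum of squared coefficients."""
    return alpha * sum(b ** 2 for b in coefs)


def elastic_net_penalty(coefs, alpha, l1_ratio):
    """Weighted mix of both penalties (simplified; real libraries scale the terms differently)."""
    return l1_ratio * l1_penalty(coefs, alpha) + (1 - l1_ratio) * l2_penalty(coefs, alpha)


coefs = [3.0, -4.0]   # hypothetical slope coefficients
alpha = 0.5           # penalty level

print(l1_penalty(coefs, alpha))                 # 3.5
print(l2_penalty(coefs, alpha))                 # 12.5
print(elastic_net_penalty(coefs, alpha, 0.5))   # 8.0
```

Because the L2 term squares each coefficient, large coefficients are penalized much more heavily than small ones, while the L1 term grows linearly with coefficient size.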

![reg](https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F17277811%2F75f6d401b8efcc9329cde3ffe0bf6d71%2Fridge2.png?generation=1723038136194204&alt=media)

## Automating Multiple Regression Models with Hyperparameter Tuning

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet

#model evaluation
from sklearn.metrics import mean_squared_error, r2_score 

In [2]:
path = '/Users/bassel_instructor/Documents/Datasets/'

df = pd.read_csv(path+'insurance.csv')
df.head()

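| | age | sex | bmi | children | smoker | region | expenses |
|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.9 | 0 | yes | southwest | 16884.92 |
| 1 | 18 | male | 33.8 | 1 | no | southeast | 1725.55 |
| 2 | 28 | male | 33.0 | 3 | no | southeast | 4449.46 |
| 3 | 33 | male | 22.7 | 0 | no | northwest | 21984.47 |
| 4 | 32 | male | 28.9 | 0 | no | northwest | 3866.86 |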


> Expenses is our target/dependent column. It represents the annual medical expenses.

In [3]:
df.isna().sum()

age         0
sex         0
bmi         0
children    0
smoker      0
region      0
expenses    0
dtype: int64

In [4]:
for col in ['sex', 'smoker', 'region']:
    print(f'{col}: {df[col].unique()}')

sex: ['female' 'male']
smoker: ['yes' 'no']
region: ['southwest' 'southeast' 'northwest' 'northeast']


In [5]:
df_org = df.copy()

In [6]:
df = pd.get_dummies(data=df, columns=['region'], dtype=int)
df.head()

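| | age | sex | bmi | children | smoker | expenses | region_northeast | region_northwest | region_southeast | region_southwest |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.9 | 0 | yes | 16884.92 | 0 | 0 | 0 | 1 |
| 1 | 18 | male | 33.8 | 1 | no | 1725.55 | 0 | 0 | 1 | 0 |
| 2 | 28 | male | 33.0 | 3 | no | 4449.46 | 0 | 0 | 1 | 0 |
| 3 | 33 | male | 22.7 | 0 | no | 21984.47 | 0 | 1 | 0 | 0 |
| 4 | 32 | male | 28.9 | 0 | no | 3866.86 | 0 | 1 | 0 | 0 |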


In [7]:
df['sex'] = pd.factorize(df['sex'])[0]

In [9]:
df['smoker'] = pd.factorize(df['smoker'])[0]
df.head()

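| | age | sex | bmi | children | smoker | expenses | region_northeast | region_northwest | region_southeast | region_southwest |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19 | 0 | 27.9 | 0 | 0 | 16884.92 | 0 | 0 | 0 | 1 |
| 1 | 18 | 1 | 33.8 | 1 | 1 | 1725.55 | 0 | 0 | 1 | 0 |
| 2 | 28 | 1 | 33.0 | 3 | 1 | 4449.46 | 0 | 0 | 1 | 0 |
| 3 | 33 | 1 | 22.7 | 0 | 1 | 21984.47 | 0 | 1 | 0 | 0 |
| 4 | 32 | 1 | 28.9 | 0 | 1 | 3866.86 | 0 | 1 | 0 | 0 |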


> Additional data preprocessing and feature engineering are recommended, but we'll skip them for now.
> Suggestions: outlier treatment, normalization, feature selection, binning, etc.
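
Of these suggestions, normalization matters most for regularized models, because the penalty acts on coefficient size and coefficient size depends on each feature's scale. As a minimal sketch, here is standardization applied to the `age` and `bmi` values from the five rows shown above; the array name `features` is an assumption, and in a real workflow the mean and standard deviation would come from the training split only.

```python
import numpy as np

# age and bmi for the five rows printed by df.head()
features = np.array([
    [19, 27.9],
    [18, 33.8],
    [28, 33.0],
    [33, 22.7],
    [32, 28.9],
], dtype=float)

# Standardize each column: subtract its mean and divide by its standard deviation
col_mean = features.mean(axis=0)
col_std = features.std(axis=0)
scaled = (features - col_mean) / col_std

print(np.round(scaled, 2))
```

After scaling, both columns have mean 0 and standard deviation 1, so the penalty treats them on equal footing.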

In [13]:
X = df.drop(columns=['expenses'])  # features: every column except the target
y = df['expenses']

**Objective:** train and evaluate all 4 models with hyperparameter tuning.

Step 1: Build the model and hyperparameter grid

In [33]:
# using a list of dictionaries

models = [
    {'name': 'Linear Regression', 'model':LinearRegression()}, # basic linear regression no hyperparameters - baseline
    {'name': 'Ridge Regression', 'model':Ridge(), 'params':{'alpha':[0.01, 0.1, 1, 10, 100]}},
    {'name': 'Lasso Regression', 'model':Lasso(), 'params':{'alpha':[0.01, 0.1, 1, 10, 100]}},
    {'name': 'ElasticNet Regression', 'model':ElasticNet(), 'params':{'alpha':[0.01, 0.1, 1], 'l1_ratio':[0.2,0.3,0.6, 0.7]}}
]

In [34]:
for model_info in models:
    print(model_info)

{'name': 'Linear Regression', 'model': LinearRegression()}
{'name': 'Ridge Regression', 'model': Ridge(), 'params': {'alpha': [0.01, 0.1, 1, 10, 100]}}
{'name': 'Lasso Regression', 'model': Lasso(), 'params': {'alpha': [0.01, 0.1, 1, 10, 100]}}
{'name': 'ElasticNet Regression', 'model': ElasticNet(), 'params': {'alpha': [0.01, 0.1, 1], 'l1_ratio': [0.2, 0.3, 0.6, 0.7]}}


We can approach this with 2 different methods:
- **Method 1**: quick and easy, using `GridSearchCV` on the whole dataset
    - No pre-split
    - Choose a specific scoring metric
    - Let `GridSearchCV` run the cross-validated search based on that metric
    - Get the `best_estimator_`

- **Method 2**: double evaluation with multiple metrics. It works best when the dataset is large enough to support multiple splits.
    - Pre-split the data (train vs test)
    - Let `GridSearchCV` pick the best hyperparameters on the training split using its scoring metric
    - After it's done, take the best model and run an additional evaluation on the test subset
    - For regression, we can use 2 metrics:
        - `mean_squared_error`
        - `r2_score`
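
Method 1 is not run in this notebook, so here is a minimal sketch of it for a single model. It uses synthetic stand-in data (`X_demo`, `y_demo`, both assumptions) so that it runs on its own; with the insurance data you would pass `X` and `y` instead.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption; the notebook's X and y would go here)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=200)

# No pre-split: the search cross-validates on the whole dataset
param_grid = {'alpha': [0.01, 0.1, 1, 10, 100]}
gs = GridSearchCV(estimator=Ridge(), param_grid=param_grid, cv=5, scoring='r2')
gs.fit(X_demo, y_demo)

# best_estimator_ is refit on all of the data using the winning hyperparameters
best_model = gs.best_estimator_
print(gs.best_params_, round(gs.best_score_, 3))
```

The trade-off is that `best_score_` is the only performance estimate you get; there is no untouched test subset for a second opinion, which is what Method 2 adds.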

### Method 2

In [35]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=20)

In [36]:
#track and store the parameters for a final evaluation 
model_name = []
best_parameters = []
mean_sq_err_scores = []
r_sq_scores = []


#iterate over each model
for model_info in models:
    model_gs = GridSearchCV(estimator=model_info['model'],
                            param_grid=model_info.get('params', {}),  # empty grid for the baseline model
                            cv=5,
                            scoring='r2'
                            )

    model_gs.fit(X_train, y_train)  # tune on the training split only; the test split is held out for the final check

    #additional evaluation with test data
    #calculate the predicted values for the evaluation
    gs_best_model = model_gs.best_estimator_ #pick the model with the best hyperparameters
    y_pred = gs_best_model.predict(X_test)

    #calculate the evaluation metrics
    mse_val = mean_squared_error(y_test, y_pred)
    r2_val = r2_score(y_test, y_pred)

    #append metadata to the empty lists
    model_name.append(model_info['name'])
    best_parameters.append(model_gs.best_params_)
    mean_sq_err_scores.append(mse_val)
    r_sq_scores.append(r2_val)
    

In [42]:
5 * 1 + 5 * 5 + 5 * 5 + 5 * 3 * 4

115

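> We expect 115 model fits during the search: 5 folds × (1 + 5 + 5 + 12) candidate settings. `GridSearchCV` also refits each best model once on the full training split, since `refit=True` by default.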
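
The same count can be derived from the grids themselves rather than typed by hand. This is a small sketch in plain Python; the dictionary `param_grids` restates the grids from the `models` list and is named here only for illustration.

```python
from math import prod

cv_folds = 5

# Hyperparameter grids restated from the models list (empty grid = one candidate)
param_grids = {
    'Linear Regression': {},
    'Ridge Regression': {'alpha': [0.01, 0.1, 1, 10, 100]},
    'Lasso Regression': {'alpha': [0.01, 0.1, 1, 10, 100]},
    'ElasticNet Regression': {'alpha': [0.01, 0.1, 1], 'l1_ratio': [0.2, 0.3, 0.6, 0.7]},
}

# Candidates per model = product of the grid sizes; fits = candidates x folds
fits_per_model = {
    name: cv_folds * prod(len(values) for values in grid.values())
    for name, grid in param_grids.items()
}
total_fits = sum(fits_per_model.values())

print(fits_per_model)
print(total_fits)  # 115
```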

In [38]:
model_name

['Linear Regression',
 'Ridge Regression',
 'Lasso Regression',
 'ElasticNet Regression']

In [39]:
best_parameters

[{}, {'alpha': 1}, {'alpha': 100}, {'alpha': 0.01, 'l1_ratio': 0.7}]

In [40]:
result_dict = {'model_names':model_name,
               'best_parameters':best_parameters,
               'mean_sq_err_scores':mean_sq_err_scores,
               'r2_sq_scores': r_sq_scores}

In [41]:
pd.DataFrame(result_dict)

| | model_names | best_parameters | mean_sq_err_scores | r2_sq_scores |
|---|---|---|---|---|
| 0 | Linear Regression | {} | 30685750.0 | 0.795936 |
| 1 | Ridge Regression | {'alpha': 1} | 30717450.0 | 0.795725 |
| 2 | Lasso Regression | {'alpha': 100} | 30989120.0 | 0.793918 |
| 3 | ElasticNet Regression | {'alpha': 0.01, 'l1_ratio': 0.7} | 30807560.0 | 0.795126 |
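
MSE is expressed in squared expense units, which makes the values above hard to read. As a rough aid, this sketch takes the square root of each reported MSE to get RMSE, which is in the same units as `expenses`. The dictionary names are assumptions; the numbers are copied from the results table.

```python
from math import sqrt

# Test-set MSE values copied from the results table
mse_by_model = {
    'Linear Regression': 30685750.0,
    'Ridge Regression': 30717450.0,
    'Lasso Regression': 30989120.0,
    'ElasticNet Regression': 30807560.0,
}

# RMSE is the square root of MSE, so it reads in expense units
rmse_by_model = {name: sqrt(mse) for name, mse in mse_by_model.items()}

for name, rmse in rmse_by_model.items():
    print(f'{name}: RMSE = {rmse:,.2f}')

# Model with the lowest test error
best_name = min(rmse_by_model, key=rmse_by_model.get)
print(best_name)
```

All four models land within about 30 expense units of each other in RMSE, which suggests regularization changes little on this dataset with these features.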


## Using RandomForestRegressor

In [47]:
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor


In [49]:
rfr_model = RandomForestRegressor(n_estimators=100, random_state=30)

In [50]:
#perform 5-fold cross-validation; scikit-learn negates MSE so that a higher score is always better
rfr_scores = cross_val_score(rfr_model, X, y, cv=5, scoring='neg_mean_squared_error')

In [51]:
rfr_scores

array([-23136126.24721017, -29038337.15155824, -18939268.93835339,
       -25692547.71442701, -22219908.07322259])

In [53]:
cross_val_outcome = np.mean(rfr_scores)
cross_val_outcome

-23805237.62495428
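
The scores are negative because `neg_mean_squared_error` flips the sign of MSE. A short sketch of turning them back into readable error values follows, using the fold scores printed above; the variable names other than `rfr_scores` are assumptions.

```python
import numpy as np

# Fold scores copied from the cross_val_score output above (negated MSE)
rfr_scores = np.array([-23136126.24721017, -29038337.15155824, -18939268.93835339,
                       -25692547.71442701, -22219908.07322259])

mse_per_fold = -rfr_scores               # flip the sign to get ordinary MSE
rmse_per_fold = np.sqrt(mse_per_fold)    # same units as expenses

mean_mse = mse_per_fold.mean()
rmse_of_mean = np.sqrt(mean_mse)

print(f'mean MSE: {mean_mse:,.2f}')
print(f'RMSE of mean MSE: {rmse_of_mean:,.2f}')
print('per-fold RMSE:', np.round(rmse_per_fold, 2))
```

The mean MSE of about 23.8 million is lower than the roughly 30.7 million reported for the linear models, but the comparison is not like-for-like: this figure is cross-validated on the full dataset, while the linear models were scored on a single held-out test split.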