# ML Regression Cheatsheet

1. [Linear Regression](#linear)<br>
2. [Regularized Regression](#regularized)<br>
    2.1 [LASSO (L1 Norm)](#lasso)<br>
    2.2 [RIDGE (L2 Norm)](#ridge)<br>
    2.3 [ELASTIC-NET](#elastic)<br>
3. [Random Forest Regression](#rf)<br>
4. [Gradient Boosting](#gb)<br>
5. [XGBoost Regression](#xgboost)<br>

In [1]:
# Load general modules
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

# Display all rows and columns
pd.set_option( 'display.max_columns', None )

# Suppress warnings
import warnings
warnings.filterwarnings( 'ignore' )

In [2]:
# Import Boston data (load_boston was removed in scikit-learn 1.2, so this cell needs an older release)
import statsmodels.api as sm
from sklearn.datasets import load_boston

data = load_boston() # sklearn.utils.Bunch datatype
X = pd.DataFrame( data.data, columns=data.feature_names ) # feature df
y = data.target # target ['MEDV']

print( data.DESCR ) #view description of data


.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pu

**Create Preprocessing Pipeline**

In [3]:
# Dummify (n-1) categorical feature
def prepro_dummy( X_mat ):
    df = X_mat.copy()
    rad = pd.get_dummies(df['RAD'],      # dummy RAD feature
                         prefix='RAD',       # prefix of new columns
                         prefix_sep='__')    # separates prefix and rest of name
    rad = rad.drop( 'RAD__1.0', axis=1 )   # drop one level (n-1 dummies) to avoid redundant, collinear columns
    df = pd.concat( [df, rad], axis=1 )    # combine original and dummified DFs
    df.drop( 'RAD', axis=1, inplace=True ) # remove original RAD categorical feature

    return df
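
The manual drop of `RAD__1.0` above can also be expressed with the `drop_first` argument of `pd.get_dummies`. A minimal sketch on a made-up `RAD` column (the `toy` frame and its values are assumptions, not the Boston data):

```python
import pandas as pd

# Toy frame standing in for the Boston features (values are made up)
toy = pd.DataFrame({'RAD': [1.0, 2.0, 4.0, 1.0]})

# drop_first=True drops the first level (RAD__1.0 here), leaving n-1 dummy columns
rad_dummies = pd.get_dummies(toy['RAD'], prefix='RAD', prefix_sep='__', drop_first=True)

print(rad_dummies.columns.tolist())
```

This avoids hard-coding the name of the dropped column, which would fail if the level `1.0` happened to be absent from a split.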

    

In [4]:
# Convert ZN to bool due to imbalance
def prepro_ZN( X_mat ):
    df = X_mat.copy()
    df['ZN'] = df['ZN'].apply( lambda x: 1 if x > 0 else 0 )
    
    return df

In [5]:
# Remove CHAS feature
def prepro_CHAS( X_mat ):
    df = X_mat.copy()
    df.drop('CHAS', axis=1, inplace=True)
    
    return df

In [6]:
# Standardize Scale of Columns
from sklearn.preprocessing import StandardScaler
def prepro_scale( X_mat ):
    df = X_mat.copy()                            # copy data
    scale = StandardScaler()                     # instantiate model
    trans = scale.fit_transform( df )            # scale feature df
    df = pd.DataFrame( trans, columns=df.columns ) # convert to df
    
    return df

In [7]:
# Combine into preprocessing pipeline for future testing
def preprocess_data( X_mat ):
    df = X_mat.copy()
    df = prepro_dummy(df) # dummify categorical data
    df = prepro_ZN(df)    # convert ZN to boolean
    df = prepro_CHAS(df)  # remove CHAS feature
    df = prepro_scale(df) # scale feature matrix
    
    return df

In [8]:
# Train test split data for machine learning
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X,             
                                                    y,              
                                                    test_size = 0.3,  # size of test split
                                                    random_state = 0) # fixed seed so the split is reproducible

# Process feature data
x_train_pro = preprocess_data( x_train )
x_test_pro = preprocess_data( x_test )
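
In the cell above, `preprocess_data` is called separately on each split, so the test set is scaled with its own mean and standard deviation and can end up with different dummy columns than the training set. One common alternative, sketched here on synthetic data (all `toy_*` names and values are hypothetical), is to learn the scaling from the training split only and to align the test columns to the training columns:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for x_train / x_test (values are made up)
rng = np.random.default_rng(0)
toy_train = pd.DataFrame(rng.normal(5, 2, size=(100, 3)), columns=['a', 'b', 'c'])
toy_test = pd.DataFrame(rng.normal(5, 2, size=(40, 3)), columns=['a', 'b', 'c'])

scaler = StandardScaler().fit(toy_train)   # learn mean/std from the training split only

toy_train_scaled = pd.DataFrame(scaler.transform(toy_train), columns=toy_train.columns)
toy_test_scaled = pd.DataFrame(scaler.transform(toy_test), columns=toy_test.columns)  # reuse train statistics

# Dummy columns: a level missing from the test split is added back as an all-zero column
train_dummies = pd.get_dummies(pd.Series([1.0, 2.0, 4.0]), prefix='RAD', prefix_sep='__')
test_dummies = pd.get_dummies(pd.Series([1.0, 2.0]), prefix='RAD', prefix_sep='__')
test_aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)

print(toy_train_scaled.mean().round(2).tolist())  # ~0 by construction
print(test_aligned.columns.tolist())              # same columns as the training dummies
```

The scaled test means are close to, but not exactly, zero, which is expected when the test split is treated as unseen data.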


<p><a name="linear"></a></p>

## 1. Linear Regression

**Create Linear Regression Model**

In [9]:
from sklearn.linear_model import LinearRegression

linear = LinearRegression()                  # instantiate general model
linear = linear.fit( x_train_pro, y_train )  # fit model with data


**Investigate Model Attributes**

`linear.score(X, y)` = coeff of determination (R^2)

`linear.coef_` = model feature coefficients

`linear.intercept_` = y-intercept
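
As an illustration of these three attributes, the sketch below fits a model to noise-free toy data with a known relationship (the data and the `toy_linear` name are assumptions, not part of the housing example), so the recovered values can be checked by eye:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free toy data: y = 2*x0 - 1*x1 + 3 (made-up relationship)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 2))
y_toy = 2 * X_toy[:, 0] - 1 * X_toy[:, 1] + 3

toy_linear = LinearRegression().fit(X_toy, y_toy)

print(toy_linear.coef_)                 # approximately [ 2. -1.]
print(toy_linear.intercept_)            # approximately 3.0
print(toy_linear.score(X_toy, y_toy))   # approximately 1.0 (no noise, so the fit is exact)
```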

**Evaluate Model Performance/Accuracy**

In [10]:
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import r2_score 

# Predict train and test targets
y_pred_train = linear.predict( x_train_pro )   # predict training target
y_pred_test = linear.predict( x_test_pro )     # predict test target

# Print results
print('LINEAR REGRESSION')
print('--------------------------------------')
print('Train RMSE: %0.2f' % np.sqrt( MSE(y_train, y_pred_train) ))
print('Test RMSE: %0.2f' % np.sqrt( MSE(y_test, y_pred_test) ))
print("")
print('Train R^2: %0.2f' % linear.score(x_train_pro, y_train))
print('Test R^2: %0.2f' % linear.score(x_test_pro, y_test))



LINEAR REGRESSION
--------------------------------------
Train RMSE: 4.47
Test RMSE: 5.21

Train R^2: 0.76
Test R^2: 0.67
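
For reference, the two metrics printed above can be written out directly with NumPy. The sketch uses four made-up target/prediction pairs (not the housing data) so the arithmetic is easy to follow:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # made-up targets
y_hat = np.array([2.0, 5.0, 8.0, 9.0])    # made-up predictions

rmse = np.sqrt(np.mean((y_true - y_hat) ** 2))   # root of the mean squared residual
ss_res = np.sum((y_true - y_hat) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot                         # coefficient of determination

print(round(rmse, 4), round(r2, 4))  # 0.7071 0.9
```

RMSE is in the units of the target, while R^2 is the unitless share of target variance that the predictions account for.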


**RMSE of train set < test set**

*Model shows signs of high variance (overfitting)*
- High variance = model fits noise in the training data, so it does not generalize well
- Low bias = model is flexible enough to fit the training data closely
    
*Want to introduce more bias in model*
- Reduce variance (i.e. overfitting)
- Improve RMSE_test


<p><a name="regularized"></a></p>

## 2. Regularized Regression
- Slope (coefficient) of model describes how sensitive the prediction is to an input feature
    - Penalize slopes to introduce bias (thus reducing variance)
    
- Amount of penalization determined by a hyperparameter (`alpha` in scikit-learn)
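
To make the shrinkage effect concrete, here is a minimal NumPy sketch of the closed-form ridge (L2) solution on made-up data. The helper `ridge_slopes` is hypothetical and omits the intercept for brevity; it is an illustration of the idea, not what scikit-learn runs internally:

```python
import numpy as np

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 5))                     # made-up features
true_w = np.array([4.0, -3.0, 2.0, 0.0, 0.0])         # made-up slopes
y_toy = X_toy @ true_w + rng.normal(scale=0.5, size=100)

def ridge_slopes(X, y, alpha):
    """Closed-form ridge solution without intercept: (X'X + alpha*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

alphas = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = [np.linalg.norm(ridge_slopes(X_toy, y_toy, a)) for a in alphas]

for a, n in zip(alphas, norms):
    print(f'alpha={a:>7}: ||w|| = {n:.3f}')   # the slope vector shrinks as alpha grows
```

With `alpha=0` this reduces to ordinary least squares; larger values pull every slope toward zero.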

<p><a name="lasso"></a></p>

### 2.1 Lasso (L1 Norm) Regularized Regression


**Create LASSO Model**

In [11]:
from sklearn.linear_model import Lasso

# Instantiate general model
lasso = Lasso()


**Perform Grid Search to tune hyperparameter**

In [12]:
from sklearn.model_selection import GridSearchCV

param_grid = {'alpha': [0.1, 0.5, 1, 2, 10],             # penalization term
              'fit_intercept': [True, False]}            # calculate intercept or not (beta0)

# Instantiate GridSearchCV model
grid = GridSearchCV(estimator=lasso,
                    param_grid=param_grid,
                    cv=5)

# Fit training data to GridSearchCV model
grid.fit( x_train_pro, y_train)

# Assign tuned parameters to model 
lasso = grid.best_estimator_ # best model


**Investigate GridSearch Attributes**

`grid.best_estimator_` = returns the best model, refit on the full training data

`grid.best_params_` = returns only params searched (i.e. param_grid keys)

`grid.best_estimator_.coef_` = returns best model's coeffs

`grid.best_estimator_.intercept_` = returns best model's intercept

`grid.best_score_` = returns best mean cross-validated score (R^2 by default; set `scoring` to choose another metric)
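
As a sketch of choosing a different metric, the example below passes scikit-learn's `'neg_root_mean_squared_error'` scorer to `GridSearchCV`. It runs on synthetic data from `make_regression`, and the `toy_grid` name and the reduced `alpha` list are assumptions made for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem standing in for the housing data
X_toy, y_toy = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

toy_grid = GridSearchCV(estimator=Lasso(),
                        param_grid={'alpha': [0.1, 1, 10]},
                        scoring='neg_root_mean_squared_error',  # higher is better, so RMSE is negated
                        cv=5)
toy_grid.fit(X_toy, y_toy)

print(toy_grid.best_params_)
print('CV RMSE: %0.2f' % -toy_grid.best_score_)  # flip the sign back to a positive RMSE
```

Scorers follow a "greater is better" convention, which is why error metrics appear with a `neg_` prefix.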


In [13]:
grid.best_params_ 

{'alpha': 0.1, 'fit_intercept': True}

**Investigate Model Attributes**

`lasso.score(X, y)` = coeff of determination (R^2)

`lasso.coef_` = model feature coefficients

`lasso.intercept_` = y-intercept

**Feature Importance**

In [14]:
cols = x_train_pro.columns.tolist() # list features
coefs_df = pd.DataFrame(lasso.coef_, index=cols, columns=['coeff_value']) # df of coefficients
coefs_df[ coefs_df['coeff_value'] !=0 ].sort_values(by='coeff_value',ascending=False) # important features


| feature | coeff_value |
|---|---|
| RM | 3.722896 |
| CHAS | 0.964789 |
| RAD | 0.20469 |
| ZN | 0.047251 |
| B | 0.007958 |
| TAX | -0.012944 |
| AGE | -0.021431 |
| INDUS | -0.039925 |
| CRIM | -0.113118 |
| LSTAT | -0.523924 |
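
The `coeff_value != 0` filter in the cell above relies on the L1 penalty setting some coefficients to exactly zero. A small sketch on made-up data, where only two of ten features drive the target, contrasts this with Ridge (the data, `alpha=0.5`, and the `toy_*` names are assumptions chosen for illustration):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 10))   # 10 made-up features
# Only the first two features influence the target
y_toy = 3 * X_toy[:, 0] - 2 * X_toy[:, 1] + rng.normal(scale=0.1, size=200)

toy_lasso = Lasso(alpha=0.5).fit(X_toy, y_toy)
toy_ridge = Ridge(alpha=0.5).fit(X_toy, y_toy)

print('Lasso non-zero coeffs:', np.count_nonzero(toy_lasso.coef_))
print('Ridge non-zero coeffs:', np.count_nonzero(toy_ridge.coef_))
```

Lasso keeps only the informative features, whereas Ridge shrinks the irrelevant slopes toward zero without removing them.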


**Evaluate Model Performance/Accuracy**

In [16]:
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import r2_score 

# Predict train and test targets
y_pred_train = lasso.predict( x_train_pro )   # predict training target
y_pred_test = lasso.predict( x_test_pro )     # predict test target

# Print results
print('LASSO (L1) REGRESSION')
print('--------------------------------------')
print('Train RMSE: %0.2f' % np.sqrt( MSE(y_train, y_pred_train) ))
print('Test RMSE: %0.2f' % np.sqrt( MSE(y_test, y_pred_test) ))
print("")
print('Train R^2: %0.2f' % lasso.score(x_train_pro, y_train))
print('Test R^2: %0.2f' % lasso.score(x_test_pro, y_test))


LASSO (L1) REGRESSION
--------------------------------------
Train RMSE: 4.57
Test RMSE: 5.37

Train R^2: 0.75
Test R^2: 0.65


<p><a name="ridge"></a></p>

### 2.2 Ridge (L2 Norm) Regularized Regression


**Create RIDGE Model**

In [17]:
from sklearn.linear_model import Ridge

# Instantiate general model
ridge = Ridge()


**Perform Grid Search to tune hyperparameter**

In [18]:
from sklearn.model_selection import GridSearchCV

param_grid = {'alpha': [0.1, 0.5, 1, 2, 10],             # penalization term
              'fit_intercept': [True, False]}            # calculate intercept or not (beta0)

# Instantiate GridSearchCV model
grid = GridSearchCV(estimator=ridge,
                    param_grid=param_grid,
                    cv=5)

# Fit training data to GridSearchCV model
grid.fit(x_train_pro, y_train)

# Assign tuned parameters to model 
ridge = grid.best_estimator_ # best model


**Investigate GridSearch Attributes**

`grid.best_estimator_` = returns the best model, refit on the full training data

`grid.best_params_` = returns only params searched (i.e. param_grid keys)

`grid.best_estimator_.coef_` = returns best model's coeffs

`grid.best_estimator_.intercept_` = returns best model's intercept

`grid.best_score_` = returns best mean cross-validated score (R^2 by default; set `scoring` to choose another metric)


In [19]:
grid.best_params_

{'alpha': 0.1, 'fit_intercept': True}

**Investigate Model Attributes**

`ridge.score(X, y)` = coeff of determination (R^2)

`ridge.coef_` = model feature coefficients

`ridge.intercept_` = y-intercept

**Evaluate Model Performance/Accuracy**

In [21]:
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import r2_score 

# Predict train and test targets
y_pred_train = ridge.predict( x_train_pro )   # predict training target
y_pred_test = ridge.predict( x_test_pro )     # predict test target

# Print results
print('RIDGE (L2) REGRESSION')
print('--------------------------------------')
print('Train RMSE: %0.2f' % np.sqrt( MSE(y_train, y_pred_train) ))
print('Test RMSE: %0.2f' % np.sqrt( MSE(y_test, y_pred_test) ))
print("")
print('Train R^2: %0.2f' % ridge.score(x_train_pro, y_train))
print('Test R^2: %0.2f' % ridge.score(x_test_pro, y_test))



RIDGE (L2) REGRESSION
--------------------------------------
Train RMSE: 4.47
Test RMSE: 5.22

Train R^2: 0.76
Test R^2: 0.67


<p><a name="elastic"></a></p>

### 2.3 Elastic-Net Regularized Regression


**Create ELASTIC NET Model**

In [22]:
from sklearn.linear_model import ElasticNet

# Instantiate general model
elastic = ElasticNet()


**Perform Grid Search to tune hyperparameters**

In [23]:
from sklearn.model_selection import GridSearchCV

# Define possible hyperparameter values
param_grid = {'alpha': [0.1, 0.5, 1, 2, 10],             # penalization term
              'l1_ratio': [0, 0.1, 0.25, 0.5, 0.75, 1],  # L1/L2 mix (0 = pure L2/ridge, 1 = pure L1/lasso)
              'fit_intercept': [True, False]}            # calculate intercept or not (beta0)

# Instantiate GridSearchCV model
grid = GridSearchCV(estimator = elastic,
                    param_grid = param_grid,
                    cv = 5)

# Fit training data to GridSearchCV model
grid.fit( x_train_pro, y_train) # evaluates every combination of hyperparameter values

# Assign tuned parameters to ElasticNet model 
elastic = grid.best_estimator_ # assign best model


**Investigate GridSearch Attributes**

`grid.best_estimator_` = returns best model

`grid.best_params_` = returns params of best model

`grid.best_estimator_.coef_` = returns best model's coefficients

`grid.best_estimator_.intercept_` = returns best model's intercept

`grid.best_score_` = returns best model's mean cross-validated score


In [24]:
grid.best_params_

{'alpha': 0.1, 'fit_intercept': True, 'l1_ratio': 0.1}
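
To illustrate what `l1_ratio` controls, the sketch below checks on synthetic data that `ElasticNet` with `l1_ratio=1.0` reproduces the `Lasso` fit, and then fits a half-and-half mix for comparison (the data and `toy_*` names are assumptions):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

X_toy, y_toy = make_regression(n_samples=200, n_features=8, n_informative=3,
                               noise=5.0, random_state=0)

toy_lasso = Lasso(alpha=1.0).fit(X_toy, y_toy)
toy_enet_l1 = ElasticNet(alpha=1.0, l1_ratio=1.0).fit(X_toy, y_toy)   # pure L1 penalty
toy_enet_mix = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X_toy, y_toy)  # half L1, half L2

print(np.allclose(toy_lasso.coef_, toy_enet_l1.coef_))  # the two fits coincide
print(np.count_nonzero(toy_enet_l1.coef_), np.count_nonzero(toy_enet_mix.coef_))
```

Values of `l1_ratio` between 0 and 1 trade Lasso-style sparsity against Ridge-style shrinkage.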

**Investigate Model Attributes**

`elastic.score(X, y)` = coeff of determination (R^2)

`elastic.coef_` = model feature coefficients

`elastic.intercept_` = y-intercept

**Evaluate Model Performance/Accuracy**

In [25]:
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import r2_score  

# Predict train and test targets
y_pred_train = elastic.predict( x_train_pro )   # predict training target
y_pred_test = elastic.predict( x_test_pro )     # predict test target

# Print results
print('Elastic-Net REGRESSION')
print('--------------------------------------')
print('Train RMSE: %0.2f' % np.sqrt( MSE(y_train, y_pred_train) ))
print('Test RMSE: %0.2f' % np.sqrt( MSE(y_test, y_pred_test) ))
print("")
print('Train R^2: %0.2f' % elastic.score(x_train_pro, y_train))
print('Test R^2: %0.2f' % elastic.score(x_test_pro, y_test))


Elastic-Net REGRESSION
--------------------------------------
Train RMSE: 4.60
Test RMSE: 5.38

Train R^2: 0.75
Test R^2: 0.65


<p><a name="rf"></a></p>

## 3. Random Forest Regression

Links:
- [Analytics Vidhya](https://www.analyticsvidhya.com/blog/2020/03/beginners-guide-random-forest-hyperparameter-tuning/)
- [RF Boston Housing](https://towardsdatascience.com/predicting-housing-prices-using-a-scikit-learns-random-forest-model-e736b59d56c5)

Background:
- Ensemble algorithm
    - Combines multiple decision trees, each trained on a bootstrap sample, and averages their predictions
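
The ensemble idea can be checked directly: a fitted `RandomForestRegressor` exposes its trees in `estimators_`, and its prediction equals the mean of the individual tree predictions. A minimal sketch on synthetic data (the data and `toy_forest` name are assumptions):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X_toy, y_toy = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

toy_forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X_toy, y_toy)

# Each fitted tree lives in estimators_; average their individual predictions
per_tree = np.stack([tree.predict(X_toy) for tree in toy_forest.estimators_])
manual_mean = per_tree.mean(axis=0)

print(np.allclose(manual_mean, toy_forest.predict(X_toy)))
```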

**Create Random Forest Model**

In [57]:
from sklearn.ensemble import RandomForestRegressor

# Instantiate general model
rforrest = RandomForestRegressor(criterion='mse',  # split quality measure (named 'squared_error' from scikit-learn 1.0)
                                 bootstrap = True, # if False, all data used to build each tree
                                 oob_score = True, # use OOB samples to estimate R^2 on unseen data
                                 n_jobs=-1)        # use all available CPU cores
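
With `oob_score=True`, the fitted forest exposes an `oob_score_` attribute: an R^2 estimate in which each row is scored only by trees that did not see it during training. A sketch on synthetic data (the data, `n_estimators=200`, and the `toy_forest` name are assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X_toy, y_toy = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

toy_forest = RandomForestRegressor(n_estimators=200, bootstrap=True, oob_score=True,
                                   random_state=0).fit(X_toy, y_toy)

print('Train R^2: %0.2f' % toy_forest.score(X_toy, y_toy))  # optimistic: trees have seen these rows
print('OOB R^2:   %0.2f' % toy_forest.oob_score_)           # rows scored by trees that did not train on them
```

The OOB figure is usually a more realistic guide to test performance than the training score, without needing a separate validation split.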


**Baseline Model (Default Hyperparameters)**

In [58]:
rforrest.fit(x_train_pro, y_train) # fit preprocessed data to base model
y_pred_train = rforrest.predict(x_train_pro)

print('Random Forest Baseline Model')
print('--------------------------------------')
print('Train RMSE: %0.2f' % np.sqrt( MSE(y_train, y_pred_train) ))
print('Train R^2: %0.2f' % rforrest.score( x_train_pro, y_train ))


Random Forest Baseline Model
--------------------------------------
Train RMSE: 1.22
Train R^2: 0.98


**Perform Grid Search to tune hyperparameters**

`max_depth (def=None)` - max depth of each tree (too large, overfit)

`min_samples_split (def=2)` - min # samples required to split a node (too small, overfit)

`max_leaf_nodes (def=None)` - max num. terminal nodes (too large, overfit)

`min_samples_leaf (def=1)` - min num. samples required in each leaf after splitting (too small, overfit)

`n_estimators (def=100)` - number of trees in forest

`max_samples (def=None)` - num. or frac. of samples drawn to train each tree (when bootstrap=True)

`max_features (def=auto)` - num. features considered at each split
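
The defaults listed above can be read back from an estimator with `get_params()`. A short sketch follows; note that the reported default for `max_features` depends on the installed scikit-learn version, so it is printed but not assumed:

```python
from sklearn.ensemble import RandomForestRegressor

defaults = RandomForestRegressor().get_params()  # dict of constructor arguments

for name in ['max_depth', 'min_samples_split', 'max_leaf_nodes',
             'min_samples_leaf', 'n_estimators', 'max_samples', 'max_features']:
    print(name, '=', defaults[name])
```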


In [55]:
from sklearn.model_selection import RandomizedSearchCV
# GridSearchCV evaluates ALL combinations (computationally expensive); RandomizedSearchCV samples n_iter of them

# Define possible hyperparameter values
param_grid = {'n_estimators': [*range(100,5000,200)],  # n trees per forest
              'max_features': ['auto', 'sqrt','log2'], # max features considered at split
              'max_depth': [*range(10,60,10)],         # max depth of each tree
              'min_samples_split': [*range(2,22,2)],   # min samples required for node split
              'min_samples_leaf': [*range(1,23,3)]}    # min samples at each leaf
    

# Instantiate RandomizedSearchCV model
search = RandomizedSearchCV(estimator = rforrest,
                            param_distributions = param_grid,
                            n_iter = 50,
                            cv = 3,
                            n_jobs=-1,
                            verbose=True,
                            random_state=42)

# Fit training data to RandomizedSearchCV model
search.fit( x_train_pro, y_train ) # samples n_iter random combinations of hyperparameter values

# Assign tuned parameters to Random Forest model 
rforrest = search.best_estimator_ # assign best model


Fitting 3 folds for each of 50 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   33.9s
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  2.9min finished
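
To put the 150 fits in perspective, the sketch below counts how many fits an exhaustive grid over the same search space would need. The dictionary is copied from the cell above under the hypothetical name `rf_space`:

```python
import math

# Same search space as the RandomizedSearchCV cell (copied under a different name)
rf_space = {'n_estimators': [*range(100, 5000, 200)],
            'max_features': ['auto', 'sqrt', 'log2'],
            'max_depth': [*range(10, 60, 10)],
            'min_samples_split': [*range(2, 22, 2)],
            'min_samples_leaf': [*range(1, 23, 3)]}

n_combos = math.prod(len(values) for values in rf_space.values())  # 25 * 3 * 5 * 10 * 8
cv_folds = 3
n_iter = 50

print('Exhaustive grid fits:', n_combos * cv_folds)  # 90000
print('Randomized fits:     ', n_iter * cv_folds)    # 150
```

The randomized search therefore covers 50 of the 30,000 possible combinations, trading completeness for run time.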


**Investigate RandomizedSearchCV Attributes**

`search.best_estimator_` = returns best model

`search.best_params_` = returns params of best model

`search.best_estimator_.feature_importances_` = returns best model's impurity-based feature importances

`search.best_estimator_.oob_score_` = returns best model's out-of-bag R^2 (requires `oob_score=True`)

`search.best_score_` = returns best mean cross-validated score


In [56]:
search.best_params_

{'n_estimators': 4100,
 'min_samples_split': 2,
 'min_samples_leaf': 1,
 'max_features': 'log2',
 'max_depth': 50}

**Investigate Model Attributes**

[Link](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html)
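
One attribute worth inspecting is `feature_importances_`, the forest's counterpart to the linear models' coefficients. A minimal sketch on synthetic data where only two of six features are informative (the data, the `f0`..`f5` column names, and `toy_forest` are assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X_toy, y_toy = make_regression(n_samples=300, n_features=6, n_informative=2,
                               noise=5.0, random_state=0)
feature_names = [f'f{i}' for i in range(X_toy.shape[1])]  # hypothetical column names

toy_forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_toy, y_toy)

# Impurity-based importances are normalized to sum to 1
importances = pd.Series(toy_forest.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
```

Unlike regression coefficients, these values carry no sign; they only rank how much each feature contributed to the splits.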


**Evaluate Model Performance/Accuracy**

In [60]:
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import r2_score  

# Predict train and test targets
y_pred_train = rforrest.predict( x_train_pro )   # predict training target
y_pred_test = rforrest.predict( x_test_pro )     # predict test target

# Print results
print('RANDOM FOREST REGRESSION')
print('--------------------------------------')
print('Train RMSE: %0.2f' % np.sqrt( MSE(y_train, y_pred_train) ))
print('Test RMSE: %0.2f' % np.sqrt( MSE(y_test, y_pred_test) ))
print("")
print('Train R^2: %0.2f' % r2_score( y_train, y_pred_train ))
print('Test R^2: %0.2f' % r2_score( y_test, y_pred_test ))


RANDOM FOREST REGRESSION
--------------------------------------
Train RMSE: 1.22
Test RMSE: 3.78

Train R^2: 0.98
Test R^2: 0.83
