# Capstone Section 6: Recall Score / Stacking
<br>

## Task
<br>

1) In this section, I will calculate the recall score on the test set for several machine learning models, using data scaled with Standard Scaler and data scaled with Min Max Scaler. [Recall Score](#Section-1:-Recall-Score)
<br>
<br>
2) After calculating the recall score for each algorithm, I will stack the models into a single ensemble. I will do this separately for the Standard Scaler data and for the Min Max Scaler data.

## Importing Libraries

In [1]:
import os
from lightgbm import LGBMClassifier
import joblib, pandas as pd, numpy as np
from IPython.display import display, Javascript
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import recall_score  # plot_confusion_matrix (unused here) was removed in scikit-learn 1.2
from sklearn.ensemble import StackingClassifier, RandomForestClassifier

## Reading Data Files

In [2]:
# Standard Scaler Data
X_train_ss = pd.read_pickle('./Data/Standard Scaler/X_train_ss.pkl')
X_test_ss = pd.read_pickle('./Data/Standard Scaler/X_test_ss.pkl')

# Min Max Scaler Data
X_train_mms = pd.read_pickle('./Data/Min Max Scaler/X_train_mms.pkl')
X_test_mms = pd.read_pickle('./Data/Min Max Scaler/X_test_mms.pkl')

# Target Label
y_train = pd.read_pickle('./Data/y_train.pkl')
y_test = pd.read_pickle('./Data/y_test.pkl')

## Reading Machine Learning Model Files

#### Logistic Regression

In [3]:
gs_ss_lg = joblib.load('./Data/Machine Learning Model/Logistic Regression/gs_ss_lg.pkl')
gs_mms_lg = joblib.load('./Data/Machine Learning Model/Logistic Regression/gs_mms_lg.pkl')

#### Light Gradient Boosting Machine

In [4]:
# Forward slashes work on every OS; a backslash separator only resolves on Windows
gs_ss_lgbm = joblib.load(
    './Data/Machine Learning Model/Light Gradient Boosting Machine/gs_ss_lgbm.pkl')
gs_mms_lgbm = joblib.load(
    './Data/Machine Learning Model/Light Gradient Boosting Machine/gs_mms_lgbm.pkl')
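Hard-coding separators inside path strings is fragile: a backslash such as the one before `gs_ss_lgbm.pkl` only acts as a separator on Windows. A minimal sketch of building the model paths with `pathlib` instead; `MODEL_DIR` and `model_path` are hypothetical names, and the folder layout is assumed to match the paths used in this notebook.

```python
from pathlib import Path

# Hypothetical root folder; assumed to match the layout used in this notebook.
MODEL_DIR = Path('./Data') / 'Machine Learning Model'


def model_path(model_folder: str, file_name: str) -> Path:
    """Join path parts using the separator of the current operating system."""
    return MODEL_DIR / model_folder / file_name


lgbm_ss_path = model_path('Light Gradient Boosting Machine', 'gs_ss_lgbm.pkl')
print(lgbm_ss_path.as_posix())
```

`joblib.load` accepts a `pathlib.Path` in recent joblib versions, so the result can be passed to it directly.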

#### Random Forest

In [5]:
gs_ss_rfc = joblib.load('./Data/Machine Learning Model/Random Forest/gs_ss_rfc.pkl')
gs_mms_rfc = joblib.load('./Data/Machine Learning Model/Random Forest/gs_mms_rfc.pkl')

# Section 1: Recall Score
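Recall is the fraction of actual positives that the model labels as positive, $\text{recall} = \frac{TP}{TP + FN}$; false positives do not affect it. A minimal pure-Python sketch of the quantity that `recall_score` reports for binary labels with its default `pos_label=1`; `manual_recall` is a hypothetical helper and the labels are made up for illustration.

```python
def manual_recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for a single positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    # With no actual positives recall is undefined; return 0.0 in that case
    return tp / (tp + fn) if (tp + fn) else 0.0


y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]  # 3 TP, 1 FN, 1 FP (the FP is ignored)
print(manual_recall(y_true, y_pred))  # 0.75
```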

# Standard Scaler Data

In [6]:
y_test_ss_lg_pred = gs_ss_lg.predict(X_test_ss)
y_test_ss_lgbm_pred = gs_ss_lgbm.predict(X_test_ss)
y_test_ss_rfc_pred = gs_ss_rfc.predict(X_test_ss)

#### Recall Score for Logistic Regression

In [7]:
recall_score(y_test, y_test_ss_lg_pred)

0.7608695652173914

#### Recall Score for Light Gradient Boosting Machine

In [8]:
recall_score(y_test, y_test_ss_lgbm_pred)

0.9627329192546584

#### Recall Score for Random Forest

In [9]:
recall_score(y_test, y_test_ss_rfc_pred)

0.9006211180124224
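The three cells above repeat the same call for each model. A minimal sketch of collecting the scores in one loop, assuming each entry exposes `predict` (as the loaded `GridSearchCV` objects do). Two quickly fitted scikit-learn models on synthetic data stand in for `gs_ss_lg`, `gs_ss_lgbm` and `gs_ss_rfc`, and `recall_by_model` is a hypothetical helper.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split


def recall_by_model(models, X, y):
    """Return the recall of each fitted model on (X, y), highest first."""
    scores = {name: recall_score(y, model.predict(X)) for name, model in models.items()}
    return pd.Series(scores).sort_values(ascending=False)


# Synthetic, imbalanced stand-in data (about 80% negatives); not the capstone data
X, y = make_classification(n_samples=500, weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

fitted = {
    'lg': LogisticRegression(class_weight='balanced', max_iter=1000).fit(X_tr, y_tr),
    'rfc': RandomForestClassifier(class_weight='balanced', random_state=0).fit(X_tr, y_tr),
}
print(recall_by_model(fitted, X_te, y_te).round(4))
```

With the notebook's objects, the call would be `recall_by_model({'lg': gs_ss_lg, 'lgbm': gs_ss_lgbm, 'rfc': gs_ss_rfc}, X_test_ss, y_test)`.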

# Stacking

In [10]:
gs_ss_lg.best_params_

{'Cs': 1, 'class_weight': 'balanced', 'max_iter': 10000, 'solver': 'liblinear'}
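In `LogisticRegressionCV`, `Cs` is not the regularization strength itself. When it is an integer, scikit-learn searches a grid of that many `C` values spaced logarithmically between 1e-4 and 1e4. So `Cs = 1` leaves a single candidate, `C = 1e-4` (very strong regularization), while the `Cs = 10` chosen for the Min Max Scaler data searches ten values. Since 10 is also the default, the fitted `stacking_mms` printout further down shows no `Cs` argument. The grid can be reproduced with NumPy as below. For a fitted model, the selected value is stored in its `C_` attribute (e.g. `gs_ss_lg.best_estimator_.C_`, assuming `gs_ss_lg` is a refitted `GridSearchCV`).

```python
import numpy as np

# Grid that LogisticRegressionCV builds when Cs is an integer
grid_ss = np.logspace(-4, 4, 1)    # Cs = 1 (Standard Scaler search)
grid_mms = np.logspace(-4, 4, 10)  # Cs = 10 (Min Max Scaler search)

print(grid_ss)
print(grid_mms)
```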

In [11]:
gs_ss_lgbm.best_params_

{'class_weight': 'balanced',
 'max_depth': 20,
 'n_estimators': 500,
 'num_leaves': 25,
 'reg_alpha': 0.2}

In [12]:
gs_ss_rfc.best_params_

{'class_weight': 'balanced',
 'max_depth': 150,
 'max_features': 20,
 'min_samples_leaf': 50,
 'min_samples_split': 30}
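All three searches settled on `class_weight='balanced'`. In scikit-learn (and, per its documentation, LightGBM), this mode weights each class by `n_samples / (n_classes * np.bincount(y))`, so errors on the rarer class cost more during training. A small worked example on made-up labels (8 negatives, 2 positives), checked against scikit-learn's `compute_class_weight`:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 8 + [1] * 2)  # made-up labels: 8 negatives, 2 positives

# 'balanced' formula: n_samples / (n_classes * count per class)
manual = len(y) / (2 * np.bincount(y))
print(manual)  # 0.625 for class 0, 2.5 for class 1

sk = compute_class_weight(class_weight='balanced', classes=np.array([0, 1]), y=y)
print(np.allclose(manual, sk))  # True
```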

In [13]:
models = [
    # Hyperparameters copied from the gs_ss_* best_params_ shown above
    ('lg', LogisticRegressionCV(
        Cs=1, class_weight='balanced', max_iter=10000, solver='liblinear')),
    ('lgbm', LGBMClassifier(
        class_weight='balanced', max_depth=20, n_estimators=500,
        num_leaves=25, reg_alpha=0.2)),
    ('rfc', RandomForestClassifier(
        class_weight='balanced', max_depth=150, max_features=20,
        min_samples_leaf=50, min_samples_split=30)),
]
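Retyping the tuned hyperparameters by hand is error-prone. Since the loaded objects expose `best_params_`, they are presumably refitted `GridSearchCV` objects, and each one's `best_estimator_` already carries those settings. The sketch below assumes this and builds the estimator list with `sklearn.base.clone`, which copies hyperparameters but not the fitted state. Small grid searches on synthetic data stand in for the notebook's `gs_ss_*` objects, and `stack_members` is a hypothetical helper.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Stand-ins for the notebook's loaded grid searches (refit=True is the default)
gs_lr = GridSearchCV(LogisticRegression(max_iter=1000), {'C': [0.1, 1.0]}, cv=3).fit(X, y)
gs_dt = GridSearchCV(DecisionTreeClassifier(random_state=0), {'max_depth': [2, 5]}, cv=3).fit(X, y)


def stack_members(searches):
    """Unfitted copies of each search's best estimator, as (name, estimator) pairs."""
    return [(name, clone(gs.best_estimator_)) for name, gs in searches.items()]


members = stack_members({'lr': gs_lr, 'dt': gs_dt})
stack = StackingClassifier(estimators=members, cv=3).fit(X, y)
print(members[0][1].get_params()['C'] == gs_lr.best_params_['C'])  # True
```

With the notebook's objects, the call would be `stack_members({'lg': gs_ss_lg, 'lgbm': gs_ss_lgbm, 'rfc': gs_ss_rfc})`.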

In [14]:
stacking_ss = StackingClassifier(estimators=models, cv=3, n_jobs=-1)
stacking_ss.fit(X_train_ss, y_train)

StackingClassifier(cv=3,
                   estimators=[('lg',
                                LogisticRegressionCV(Cs=1,
                                                     class_weight='balanced',
                                                     max_iter=10000,
                                                     solver='liblinear')),
                               ('lgbm',
                                LGBMClassifier(class_weight='balanced',
                                               max_depth=20, n_estimators=500,
                                               num_leaves=25, reg_alpha=0.2)),
                               ('rfc',
                                RandomForestClassifier(class_weight='balanced',
                                                       max_depth=150,
                                                       max_features=20,
                                                       min_samples_leaf=50,
                                                 

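`StackingClassifier` does not simply average the base models. With `cv=3`, each base estimator produces out-of-fold predictions, by default from `predict_proba`, keeping only the positive-class column for a binary target. A final estimator, which is `LogisticRegression()` unless `final_estimator` is set, is then trained on those predictions, and the base estimators are refitted on the full training set for use at prediction time. The hand-rolled sketch below shows these steps on synthetic data, for illustration only and not as a replacement for the class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, weights=[0.8], random_state=0)
base = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(max_depth=3, random_state=0)]

# 1) Out-of-fold positive-class probabilities, one column per base model
meta_X = np.column_stack([
    cross_val_predict(est, X, y, cv=3, method='predict_proba')[:, 1] for est in base
])

# 2) Meta-learner trained on those columns (StackingClassifier's default is LogisticRegression())
meta_model = LogisticRegression().fit(meta_X, y)

# 3) Base models refitted on all training rows, then used to build features for new rows
fitted_base = [est.fit(X, y) for est in base]
X_new = X[:5]
new_meta = np.column_stack([est.predict_proba(X_new)[:, 1] for est in fitted_base])
print(meta_model.predict(new_meta))
```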
In [15]:
joblib.dump(stacking_ss, './Data/Machine Learning Model/Stacking/Standard Scaler/stacking_ss.pkl')

['./Data/Machine Learning Model/Stacking/Standard Scaler/stacking_ss.pkl']
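`joblib.dump` opens the target file directly and does not create missing folders, so the `Stacking/Standard Scaler` directory has to exist before this cell runs. A minimal sketch of a save helper that creates the parent folder first and then round-trips the object; `save_model` is a hypothetical name, and a temporary directory stands in for `./Data`.

```python
import os
import tempfile

import joblib


def save_model(obj, path):
    """Create the parent folder if needed, then pickle obj with joblib."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return joblib.dump(obj, path)


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, 'Machine Learning Model', 'Stacking',
                        'Standard Scaler', 'stacking_ss.pkl')
    save_model({'cv': 3}, path)  # any picklable object works as a stand-in
    restored = joblib.load(path)
    print(restored)  # {'cv': 3}
```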

#### Recall Score for Stacking

In [16]:
stacking_ss = joblib.load('./Data/Machine Learning Model/Stacking/Standard Scaler/stacking_ss.pkl')

In [17]:
y_test_ss_stacking_pred = stacking_ss.predict(X_test_ss)

In [18]:
recall_score(y_test, y_test_ss_stacking_pred)

0.765527950310559
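The stacked model scores below LGBM and Random Forest even though every base model was tuned with `class_weight='balanced'`. One factor worth checking, offered as a hypothesis rather than a diagnosis, is that the default `final_estimator` is an unweighted `LogisticRegression()`, so the meta-learner itself is not class-weighted. A minimal sketch comparing the default with a balanced meta-learner on synthetic imbalanced data. The data and base models are stand-ins, and results on the capstone data may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic data with about 10% positives; not the capstone data
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

base = [
    ('lg', LogisticRegression(class_weight='balanced', max_iter=1000)),
    ('rfc', RandomForestClassifier(class_weight='balanced', random_state=0)),
]

results = {}
for label, final in [('default', LogisticRegression()),
                     ('balanced', LogisticRegression(class_weight='balanced'))]:
    # StackingClassifier clones the base estimators, so reusing `base` is safe
    stack = StackingClassifier(estimators=base, final_estimator=final, cv=3)
    results[label] = recall_score(y_te, stack.fit(X_tr, y_tr).predict(X_te))

print(results)
```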

# Min Max Scaler Data

In [19]:
y_test_mms_lg_pred = gs_mms_lg.predict(X_test_mms)
y_test_mms_lgbm_pred = gs_mms_lgbm.predict(X_test_mms)
y_test_mms_rfc_pred = gs_mms_rfc.predict(X_test_mms)

#### Recall Score for Logistic Regression

In [20]:
recall_score(y_test, y_test_mms_lg_pred)

0.8866459627329193

#### Recall Score for Light Gradient Boosting Machine

In [21]:
recall_score(y_test, y_test_mms_lgbm_pred)

0.9611801242236024

#### Recall Score for Random Forest

In [22]:
recall_score(y_test, y_test_mms_rfc_pred)

0.889751552795031

# Stacking

In [23]:
gs_mms_lg.best_params_

{'Cs': 10,
 'class_weight': 'balanced',
 'max_iter': 10000,
 'solver': 'liblinear'}

In [24]:
gs_mms_lgbm.best_params_

{'class_weight': 'balanced',
 'max_depth': 10,
 'n_estimators': 500,
 'num_leaves': 20,
 'reg_alpha': 0.1}

In [25]:
gs_mms_rfc.best_params_

{'class_weight': 'balanced',
 'max_depth': 200,
 'max_features': 25,
 'min_samples_leaf': 50,
 'min_samples_split': 30}

In [26]:
models = [
    # Hyperparameters copied from the gs_mms_* best_params_ shown above
    ('lg', LogisticRegressionCV(
        Cs=10, class_weight='balanced', max_iter=10000, solver='liblinear')),
    ('lgbm', LGBMClassifier(
        class_weight='balanced', max_depth=10, n_estimators=500,
        num_leaves=20, reg_alpha=0.1)),
    ('rfc', RandomForestClassifier(
        class_weight='balanced', max_depth=200, max_features=25,
        min_samples_leaf=50, min_samples_split=30)),
]

In [27]:
stacking_mms = StackingClassifier(estimators=models, cv=3, n_jobs=-1)
stacking_mms.fit(X_train_mms, y_train)

StackingClassifier(cv=3,
                   estimators=[('lg',
                                LogisticRegressionCV(class_weight='balanced',
                                                     max_iter=10000,
                                                     solver='liblinear')),
                               ('lgbm',
                                LGBMClassifier(class_weight='balanced',
                                               max_depth=10, n_estimators=500,
                                               num_leaves=20, reg_alpha=0.1)),
                               ('rfc',
                                RandomForestClassifier(class_weight='balanced',
                                                       max_depth=200,
                                                       max_features=25,
                                                       min_samples_leaf=50,
                                                       min_samples_split=30))],
                   n_jobs=-1

In [28]:
joblib.dump(stacking_mms, './Data/Machine Learning Model/Stacking/Min Max Scaler/stacking_mms.pkl')

['./Data/Machine Learning Model/Stacking/Min Max Scaler/stacking_mms.pkl']

#### Recall Score for Stacking

In [29]:
stacking_mms = joblib.load('./Data/Machine Learning Model/Stacking/Min Max Scaler/stacking_mms.pkl')

In [30]:
y_test_mms_stacking_pred = stacking_mms.predict(X_test_mms)

In [31]:
recall_score(y_test, y_test_mms_stacking_pred)

0.7065217391304348

## Comments:
<br>
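1) In this setup, stacking lowered the recall score. Using the same tuned base models and cv=3, the stacked model reached 0.7655 on Standard Scaler data and 0.7065 on Min Max Scaler data. Both scores are well below LGBM and Random Forest on either scaler, and on Min Max Scaler data stacking also scored below Logistic Regression.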
<br>
<br>
2) Test-set recall scores in descending order (rounded to 4 d.p.):
<br>

- 1 - LGBM on Standard Scaler Data (0.9627)
- 2 - LGBM on Min Max Scaler Data (0.9612)
- 3 - Random Forest on Standard Scaler Data (0.9006)
- 4 - Random Forest on Min Max Scaler Data (0.8898)
- 5 - Logistic Regression on Min Max Scaler Data (0.8866)
- 6 - Stacking on Standard Scaler Data (0.7655)
- 7 - Logistic Regression on Standard Scaler Data (0.7609)
- 8 - Stacking on Min Max Scaler Data (0.7065)
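The ranking above can be regenerated from the raw scores instead of being typed by hand, which keeps it in sync if a model is retrained. A minimal sketch using the test-set recall values printed in this notebook, listed in cell order; the dictionary keys are only labels.

```python
# Test-set recall scores as printed by the cells above, in notebook order
scores = {
    'Logistic Regression on Standard Scaler Data': 0.7608695652173914,
    'LGBM on Standard Scaler Data': 0.9627329192546584,
    'Random Forest on Standard Scaler Data': 0.9006211180124224,
    'Stacking on Standard Scaler Data': 0.765527950310559,
    'Logistic Regression on Min Max Scaler Data': 0.8866459627329193,
    'LGBM on Min Max Scaler Data': 0.9611801242236024,
    'Random Forest on Min Max Scaler Data': 0.889751552795031,
    'Stacking on Min Max Scaler Data': 0.7065217391304348,
}

# Sort by recall, highest first, and print in the same "rank - name (score)" form
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for rank, (name, recall) in enumerate(ranked, start=1):
    print(f'{rank} - {name} ({recall:.4f})')
```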

In [32]:
display(Javascript('IPython.notebook.save_checkpoint();'))

<IPython.core.display.Javascript object>

In [33]:
#os.system('shutdown -s')