# Ensemble ML Models

## In this lesson you will learn

 - Ensemble models: Bagging, Boosting and Stacking
     - RandomForest, AdaBoost, XGBoost
     
 - Support Vector Machine
 


## Ensemble models


Ensemble models in machine learning combine the predictions from multiple individual models to produce a more accurate and robust prediction. The fundamental idea is that by aggregating the predictions of several models, the ensemble often performs better than any individual model.

The three main classes of ensemble learning methods are bagging, stacking, and boosting:

 - **Bagging** involves fitting many decision trees on different samples of the same dataset and averaging the predictions. An example is RandomForest (regressor or classifier).


 - **Stacking** involves fitting many different model types on the same data and using another model to learn how to best combine the predictions.


 - **Boosting** involves adding ensemble members sequentially, each one correcting the predictions made by prior models, and outputting a weighted average of the predictions.

https://machinelearningmastery.com/tour-of-ensemble-learning-algorithms/

### Bagging models

Bootstrap aggregation, or bagging for short, is an ensemble learning method that seeks a diverse group of ensemble members by varying the training data.

This typically involves using a single machine learning algorithm, almost always an unpruned decision tree, and training each model on a different sample of the same training dataset. 

The predictions made by the ensemble members are then combined using simple statistics, such as voting or averaging.




We can **summarize** the key elements of bagging as follows:

 - Different training dataset for each ensembled model.
 - Unpruned decision trees fit on each sample.
 - Simple voting or averaging of predictions.
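
The three elements above can be sketched by hand. This is a minimal illustration and not how `RandomForestClassifier` is implemented: the synthetic data from `make_classification`, the number of trees, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25  # odd number, so a majority vote between two classes cannot tie
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw rows with replacement, same size as the training set
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0)  # max_depth=None, so the tree is unpruned
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Majority vote over the individual tree predictions (labels are 0/1)
votes = np.stack([tree.predict(X_test) for tree in trees])  # shape: (n_trees, n_test)
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int)

print("Bagged accuracy:", (bagged_pred == y_test).mean().round(4))
```

A Random Forest adds one more source of diversity on top of this: each split only considers a random subset of the features.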


<div style="display: flex; justify-content: space-between;">
  <img src="../../assets/Bagging_ensemble.png" alt="First Image" style="width: 50%;"/>
</div>

Differences between the `gini` and `entropy` split criteria:

https://www.geeksforgeeks.org/gini-impurity-and-entropy-in-decision-tree-ml/
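
Both criteria measure how mixed the classes are inside a node: 0 means a pure node, and the maximum is reached when the classes are evenly split. The sketch below computes them from class counts. The function names and the example counts are assumptions made for illustration.

```python
import math

def gini(class_counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def entropy(class_counts):
    """Entropy (in bits) of a node, given the number of samples in each class."""
    total = sum(class_counts)
    return sum(-(c / total) * math.log2(c / total) for c in class_counts if c > 0)

# An evenly mixed node, a mostly pure node and a pure node
for counts in ([5, 5], [9, 1], [10, 0]):
    print(counts, "gini =", round(gini(counts), 4), "entropy =", round(entropy(counts), 4))
```

For two classes, Gini impurity ranges from 0 to 0.5 and entropy from 0 to 1. In practice both usually lead to similar trees.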

### Stacking models

Stacking involves combining the predictions from multiple machine learning models fit on the same dataset.

Stacking has its own nomenclature, where ensemble members are referred to as level-0 models, and the model that uses predictions to weight the level-0 models is referred to as a level-1 model.

 - Level-0 Models (Base-Models): Models fit on the training data and whose predictions are compiled.


 - Level-1 Model (Meta-Model): Model that learns how to best combine the predictions of the base models. The meta-model is trained on the predictions made by base models on out-of-sample data.

Unlike bagging, in stacking, the models are typically different (e.g. not all decision trees) and fit on the same dataset (e.g. instead of samples of the training dataset).

Unlike boosting, in stacking, a single model is used to learn how to best combine the predictions from the contributing models (e.g. instead of a sequence of models that correct the predictions of prior models).

We can **summarize** the key elements of stacking as follows:

 - Unchanged training dataset.
 - Different machine learning algorithms for each ensemble member.
 - Machine learning model to learn how to best combine predictions.
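
The idea of training the meta-model on out-of-sample predictions can be sketched with `cross_val_predict`. This is a simplified illustration under stated assumptions (synthetic data from `make_regression`, three arbitrary base models, 5 folds), not a replacement for scikit-learn's `StackingRegressor`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR

# Synthetic data, used only for illustration
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)

# Level-0: different algorithms, all fit on the same (unchanged) dataset
level0 = [LinearRegression(), RandomForestRegressor(n_estimators=50, random_state=0), SVR()]

# Out-of-fold predictions: every row is predicted by a model that did not see it during fitting
oof = np.column_stack([cross_val_predict(model, X, y, cv=5) for model in level0])

# Level-1: the meta-model learns how to combine the three columns of level-0 predictions
level1 = LinearRegression().fit(oof, y)
print("Meta-model weights:", level1.coef_.round(3))

# For new data, the level-0 models are refit on all the training data
fitted = [model.fit(X, y) for model in level0]
X_new = X[:5]
stacked_pred = level1.predict(np.column_stack([m.predict(X_new) for m in fitted]))
print("Stacked predictions:", stacked_pred.round(2))
```

Using out-of-fold predictions matters because in-sample predictions of a flexible base model (for example a Random Forest) are almost perfect, which would lead the meta-model to trust it too much.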

<div style="display: flex; justify-content: space-between;">
  <img src="../../assets/Stacking_ensemble.png" alt="First Image" style="width: 50%;"/>
</div>

### Boosting models



Boosting is an ensemble method that seeks to change the training data to focus attention on the examples that previously fitted models have gotten wrong.

The key property of boosting ensembles is the idea of correcting prediction errors. The models are fit and added to the ensemble sequentially such that the second model attempts to correct the predictions of the first model, the third corrects the second model, and so on.

This typically involves the use of very simple decision trees that only make a single or a few decisions, referred to in boosting as weak learners. The predictions of the weak learners are combined using voting or averaging, with each contribution weighted according to the learner's performance.





<div style="display: flex; justify-content: space-between;">
  <img src="../../assets/Boosting_ensemble.png" alt="First Image" style="width: 50%;"/>
</div>

#### AdaBoost

AdaBoost is one of the most important Boosting ML algorithms.

Three ideas behind AdaBoost:

 - It combines several Decision Trees called weak learners (with depth of one or two)
 
 
 - In the final decision, some Trees have more importance than others
 
 
 - The errors a Tree makes are considered in the next Tree
 
The algorithm works as follows:

 - The first Tree is trained and the quality of the model is calculated (Accuracy for classification and MAPE for regression)


 - Then the importance of that Tree is calculated


 - In the first Tree, each observation has the same weight. In the second Tree, the observations that were predicted wrongly carry more weight, so the training of the second Tree is influenced by the quality of the first one. The better the previous model predicts, the more the weight of each error increases, but the fewer errors there are.


 - The input data for the second Tree is sampled according to those weights, so the misclassified observations have a higher probability of being selected to train the second Tree.


 - Then the process is repeated until every instance is well classified or predicted, or the maximum number of Trees set in the AdaBoost parameters is reached


 - Finally, we have several Trees and each one makes its own prediction for a new observation. The final decision is made by averaging the individual Trees' decisions, weighted by the importance of each Tree
 
 
https://www.youtube.com/watch?v=LsK-xG1cLYA
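
The steps above can be sketched for binary classification. This is a minimal version of discrete AdaBoost under stated assumptions: the data is synthetic, the labels are recoded to -1/+1, and the observation weights are passed to each stump through `sample_weight` instead of resampling the data, which is a common variant of the sampling step described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration
X, y01 = make_classification(n_samples=400, n_features=6, random_state=0)
y = np.where(y01 == 1, 1, -1)  # labels in {-1, +1} simplify the formulas

n = len(y)
weights = np.full(n, 1 / n)  # first Tree: every observation has the same weight
stumps, importances = [], []

for _ in range(20):
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y, sample_weight=weights)
    pred = stump.predict(X)

    # Weighted error rate of this Tree, clipped to avoid division by zero
    err = np.clip(weights[pred != y].sum(), 1e-10, 1 - 1e-10)
    # Importance of the Tree: large when the error is small
    alpha = 0.5 * np.log((1 - err) / err)

    # Misclassified observations (y * pred == -1) gain weight, the rest lose it
    weights = weights * np.exp(-alpha * y * pred)
    weights = weights / weights.sum()

    stumps.append(stump)
    importances.append(alpha)

# Final decision: sign of the importance-weighted sum of the Trees' votes
scores = sum(a * s.predict(X) for a, s in zip(importances, stumps))
ensemble_pred = np.sign(scores)
print("First stump accuracy:", (stumps[0].predict(X) == y).mean().round(4))
print("Ensemble accuracy:   ", (ensemble_pred == y).mean().round(4))
```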

#### Gradient Boosting

Gradient Boosting (GB) also combines several Tree models **sequentially**, improving the prediction of the dependent variable model after model.

 - It combines several Decision Trees 
 
 
 - Initially uses the mean (or the log of the odds in the case of classification) to make the first prediction, and calculates the residuals
 
 
 - Then a Tree is trained to predict those residuals
 
 
 - The predicted residuals are added to the prediction of the dependent variable, improving it, and reducing the initial residuals
 
 
 - The residual prediction is multiplied by a Learning Rate factor (usually equal to 0.1) to avoid overfitting
 
 
 - The tree-fitting and update steps are then repeated. Each new Tree is trained on the residuals left by the previous ones and its prediction is added to the dependent variable prediction, until we reach the maximum number of Trees
 
 
 
 - The final algorithm has multiple sequential Trees, each one adding some value to the previous prediction. When a new individual is passed through the model, each Tree adds a residual prediction that depends on the values of its features. During prediction or validation, these residual predictions are added to the initial mean calculated during training.

https://www.youtube.com/watch?v=3CC4N4z3GJc
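
For regression with squared error, the procedure above can be sketched in a few lines. The synthetic data, the tree depth, the number of Trees and the helper name `predict` are assumptions made for illustration; `GradientBoostingRegressor` is more general than this sketch.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only for illustration
X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

learning_rate = 0.1
initial_prediction = y.mean()  # first prediction: the mean of the target
prediction = np.full(len(y), initial_prediction)
trees = []
mae_history = [np.abs(y - prediction).mean()]

for _ in range(100):
    residuals = y - prediction  # what is still unexplained
    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # small step towards the target
    trees.append(tree)
    mae_history.append(np.abs(y - prediction).mean())

def predict(X_new):
    """Initial mean plus the scaled residual predictions of every Tree."""
    return initial_prediction + learning_rate * sum(t.predict(X_new) for t in trees)

print("Training MAE before boosting:", mae_history[0].round(3))
print("Training MAE after 100 Trees:", mae_history[-1].round(3))
```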

#### XGBoost

eXtreme Gradient Boosting (XGBoost) also sequences several Trees, adding residual predictions to the initial prediction of the dependent variable.

 - Initially uses the mean (or the log of the odds in the case of classification) to make the first prediction of the dependent variable, and calculates the residuals
 
 - A similarity score is calculated to estimate the quality of that initial prediction
 
 - Then a Tree is trained to predict the residuals based on the most significant feature and threshold. This significance is calculated through the Information Gain, which depends on the similarity score
 
 - Then new nodes are added to the Tree, using new features or a different threshold of the same feature as above. The maximum depth is usually set between 3 and 5.
 
 - Lambda and gamma (regularization parameters) reduce overfitting: lambda shrinks the similarity scores, and gamma is the minimum Information Gain a split must reach to be kept.

https://www.youtube.com/watch?v=OtD8wVaFm6E
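
The similarity score and the gain can be computed by hand for a single candidate split. The formulas below are the ones usually presented for squared-error regression. The residual values, the value of gamma and the function names are hypothetical and chosen only for illustration.

```python
def similarity(residuals, lam):
    """Similarity score of a node: (sum of residuals)^2 / (number of residuals + lambda)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)

def gain(left, right, lam):
    """Gain of a split: similarity of both children minus similarity of the parent."""
    return similarity(left, lam) + similarity(right, lam) - similarity(left + right, lam)

# Hypothetical residuals of four observations, split into two candidate leaves
left, right = [-10.5, -7.5], [6.5, 7.5]

gamma = 200
for lam in (0, 1):
    g = gain(left, right, lam)
    # A split is kept only when its gain exceeds gamma
    print(f"lambda={lam}: gain={g:.2f}, keep split: {g - gamma > 0}")
```

With these numbers, raising lambda from 0 to 1 lowers the gain enough for the split to be pruned, which shows how both parameters make the Trees more conservative.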

## Complex models with Python

First, let's prepare the data:

 - Divide it into train and test datasets
 - Standardize continuous variables
 - When working with Trees, creating dummies for categorical variables is not always recommended, so we won't do it. However, we have to encode categorical variables as numbers; otherwise the models will raise an error
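
Encoding a categorical variable as numbers without creating dummies can be sketched with pandas category codes. The toy DataFrame and its column names are hypothetical; the datasets used below already store their categorical column (`origin`) as integers.

```python
import pandas as pd

# Hypothetical toy data; the column names are only for illustration
df = pd.DataFrame({"fuel": ["gas", "diesel", "gas", "electric"],
                   "weight": [1200, 1500, 1100, 1700]})

# One integer code per category (categories are sorted alphabetically)
df["fuel_code"] = df["fuel"].astype("category").cat.codes
print(df)
```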


#### Import libraries

In [107]:
#!pip install xgboost



In [284]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn import metrics  # used below as metrics.mean_absolute_error, metrics.r2_score, etc.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_curve, roc_auc_score
from imblearn.over_sampling import RandomOverSampler
# RepeatedStratifiedKFold for classification
# RepeatedKFold for regression

from sklearn.model_selection import RepeatedKFold, RepeatedStratifiedKFold, GridSearchCV, cross_val_score
from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor, GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.ensemble import StackingClassifier, StackingRegressor, RandomForestClassifier, RandomForestRegressor
from xgboost import XGBClassifier, XGBRegressor
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.svm import SVC, SVR


from tabulate import tabulate

import warnings
warnings.filterwarnings("ignore")

### Diabetes dataset

The dataset contains more than 700 patients with several independent variables and one dependent variable, Outcome, which is binary (a classification problem). Outcome = 1 means diabetes = Yes.

In [264]:
diabetes = pd.read_csv('../data/pima-indians-diabetes.csv', sep = ';')
diabetes.head(3)

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1


#### Diabetes Train-Test split

In [265]:
diab_X_train, diab_X_test, diab_y_train, diab_y_test = train_test_split(diabetes[['Pregnancies','Glucose','BloodPressure','SkinThickness','Insulin','BMI',
                                              'DiabetesPedigreeFunction','Age']], 
                                                    diabetes['Outcome'], train_size = 0.8, random_state = 0)
diab_X_train.head(3)

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age
603,7,150,78,29,126,35.2,0.692,54
118,4,97,60,23,0,28.2,0.443,22
247,0,165,90,33,680,52.3,0.427,23


### Over Sampling

The value_counts method shows that the classes are imbalanced (500 negative vs. 268 positive cases), so let's fix it by oversampling the training set.

In [266]:
diabetes.Outcome.value_counts()

0    500
1    268
Name: Outcome, dtype: int64

In [267]:
ros = RandomOverSampler(random_state=42)
diab_X_train, diab_y_train= ros.fit_resample(diab_X_train, diab_y_train)
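
What random oversampling does can be sketched with pandas and NumPy: rows of the minority class are repeated at random until every class has as many rows as the largest one. The toy data and variable names are hypothetical, and this is only an approximation of the idea behind `RandomOverSampler`, not its implementation. As in the cell above, only the training split should be resampled, so that the test set keeps the real class proportions.

```python
import numpy as np
import pandas as pd

# Hypothetical imbalanced training data: six negatives and two positives
y = pd.Series([0, 0, 0, 0, 0, 0, 1, 1], name="Outcome")
X = pd.DataFrame({"Glucose": [85, 90, 99, 101, 110, 95, 160, 183]})

rng = np.random.default_rng(42)
majority_size = y.value_counts().max()

idx = []
for label, group in y.groupby(y):
    # Keep every original row and draw the missing ones with replacement
    extra = rng.choice(group.index, size=majority_size - len(group), replace=True)
    idx.extend(group.index.tolist() + extra.tolist())

X_balanced, y_balanced = X.loc[idx], y.loc[idx]
print(y_balanced.value_counts())
```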

In [268]:
scaler = StandardScaler()
sc = scaler.fit(diab_X_train)

train_sc = sc.transform(diab_X_train)
diab_X_train_sc = pd.DataFrame(train_sc)
diab_X_train_sc.columns = ['Pregnancies_st','Glucose_st','BloodPressure_st','SkinThickness_st','Insulin_st','BMI_st',
       'DiabetesPedigreeFunction_st','Age_st']

test_sc = sc.transform(diab_X_test)
diab_X_test_sc = pd.DataFrame(test_sc)
diab_X_test_sc.columns = ['Pregnancies_st','Glucose_st','BloodPressure_st','SkinThickness_st','Insulin_st','BMI_st',
       'DiabetesPedigreeFunction_st','Age_st']

diab_X_test_sc.head(3)

Unnamed: 0,Pregnancies_st,Glucose_st,BloodPressure_st,SkinThickness_st,Insulin_st,BMI_st,DiabetesPedigreeFunction_st,Age_st
0,-0.892614,2.215251,0.318781,1.32409,-0.703392,1.353269,2.715018,-1.061804
1,-0.606949,-0.508176,0.217092,0.536939,0.133295,0.141652,-0.226186,-0.978028
2,-0.035617,-1.425852,-0.393042,-1.279564,-0.703392,0.193765,-0.264808,-0.810476


### Car Miles per Gallon dataset

Contains information on about 1300 cars. The goal is to predict mpg.



In [233]:
cars = pd.read_csv('../data/car_miles_per_galon.csv', sep = ';')
cars.head(3)

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model year,origin,mpg
0,6,225.0,95,3264,16.0,75,1,19.0
1,6,250.0,88,3139,14.5,71,1,18.0
2,4,98.0,80,2164,15.0,72,1,28.0


#### Car Train test split



In [239]:
cars_X_train, cars_X_test, cars_y_train, cars_y_test = train_test_split(cars[['cylinders','displacement','horsepower',
                                                                              'weight','acceleration','model year','origin']], 
                                                    cars['mpg'], train_size = 0.8, random_state = 0)
cars_X_train.head(3)

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model year,origin
1161,4,86.0,64,1875,16.4,81,1
567,4,98.0,68,2135,16.6,78,3
1270,4,141.0,71,3190,24.8,79,2


We don't scale origin, because it is a categorical variable encoded as numbers.

In [102]:
cars['origin'].value_counts()

1    830
3    280
2    219
Name: origin, dtype: int64

In [240]:
column_to_keep_train = cars_X_train['origin']
cars_X_train = cars_X_train.drop(columns=['origin'])
original_index_train = cars_X_train.index

column_to_keep_test = cars_X_test['origin']
cars_X_test = cars_X_test.drop(columns=['origin'])
original_index_test = cars_X_test.index

scaler = StandardScaler()
car_scaler = scaler.fit(cars_X_train)
train_car_scaler = car_scaler.transform(cars_X_train)
test_car_scaler = car_scaler.transform(cars_X_test)

# Create a new DataFrame with the standardized features
cars_X_train_car_scaler = pd.DataFrame(train_car_scaler, columns=cars_X_train.columns, index = original_index_train)
cars_X_test_car_scaler = pd.DataFrame(test_car_scaler, columns=cars_X_test.columns, index = original_index_test)

# Concatenate the unstandardized column back onto the DataFrame
cars_X_train_car_scaler = pd.concat([cars_X_train_car_scaler, column_to_keep_train], axis=1)
cars_X_test_car_scaler = pd.concat([cars_X_test_car_scaler, column_to_keep_test], axis=1)
cars_X_train_car_scaler.head(3)

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model year,origin
1161,-0.861621,-1.020848,-1.050466,-1.288559,0.275634,1.345204,1
567,-0.861621,-0.904822,-0.943163,-0.978212,0.347677,0.529952,3
1270,-0.861621,-0.489063,-0.862686,0.281082,3.30144,0.801703,2


## Random Forest Regressor

In [None]:
RF_Reg = RandomForestRegressor()

grid = dict()
grid['n_estimators'] = [100, 500] # number of trees
grid['criterion'] = ['squared_error', 'absolute_error'] # 'gini' and 'entropy' are only valid for classifiers



# define the evaluation procedure
cv = RepeatedKFold(n_splits = 5, n_repeats = 3, random_state = 1)

# define the grid search procedure
grid_search = GridSearchCV(estimator = RF_Reg, param_grid = grid, n_jobs = -1, cv = cv, scoring = 'neg_mean_absolute_error')

# execute the grid search
grid_result = grid_search.fit(cars_X_train_car_scaler, cars_y_train)

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

# summarize all scores that were evaluated
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

In [None]:
RF_Reg = RandomForestRegressor(n_estimators = 100) # Random Forest has no learning_rate parameter
RF_Reg.fit(cars_X_train_car_scaler, cars_y_train)

In [None]:
cars_X_test_car_scaler['RF_Reg'] = RF_Reg.predict(cars_X_test_car_scaler)
cars_X_train_car_scaler['RF_Reg'] = RF_Reg.predict(cars_X_train_car_scaler)

In [None]:
print("MAE: ", metrics.mean_absolute_error(cars_y_train, cars_X_train_car_scaler['RF_Reg']).round(4))
print("MSE: ", metrics.mean_squared_error(cars_y_train, cars_X_train_car_scaler['RF_Reg']).round(4))
print("RMSE: ", np.sqrt(metrics.mean_squared_error(cars_y_train, cars_X_train_car_scaler['RF_Reg'])).round(4))
print("MAPE: ", metrics.mean_absolute_percentage_error(cars_y_train, cars_X_train_car_scaler['RF_Reg']).round(4))
print("R2: ", metrics.r2_score(cars_y_train, cars_X_train_car_scaler['RF_Reg']).round(4))

In [None]:
print("MAE: ", metrics.mean_absolute_error(cars_y_test, cars_X_test_car_scaler['RF_Reg']).round(4))
print("MSE: ", metrics.mean_squared_error(cars_y_test, cars_X_test_car_scaler['RF_Reg']).round(4))
print("RMSE: ", np.sqrt(metrics.mean_squared_error(cars_y_test, cars_X_test_car_scaler['RF_Reg'])).round(4))
print("MAPE: ", metrics.mean_absolute_percentage_error(cars_y_test, cars_X_test_car_scaler['RF_Reg']).round(4))
print("R2: ", metrics.r2_score(cars_y_test, cars_X_test_car_scaler['RF_Reg']).round(4))

## Random Forest Classifier

In [None]:
RF_Cl = RandomForestClassifier()

# define the grid of values to search
grid = dict()
grid['n_estimators'] = [100, 500]
grid['criterion'] = ['gini','entropy']

# define the evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

# define the grid search procedure
grid_search = GridSearchCV(estimator=RF_Cl, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy')

# execute the grid search
grid_result = grid_search.fit(diab_X_train_sc, diab_y_train)

# summarize the best score and configuration
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

# summarize all scores that were evaluated
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
    


In [None]:
RF_Cl = RandomForestClassifier(n_estimators = 500) # Random Forest has no learning_rate parameter
RF_Cl.fit(diab_X_train_sc, diab_y_train)

In [None]:
print("Train set score (Accuracy) =", RF_Cl.score(diab_X_train_sc, diab_y_train).round(4))
print("Test set score (Accuracy) =", RF_Cl.score(diab_X_test_sc, diab_y_test).round(4))

conf_mat = confusion_matrix(diab_y_test, RF_Cl.predict(diab_X_test_sc))
print(tabulate(conf_mat,headers = ['pred Diab No','pred Diab Yes'], showindex = ['real Diab No','real Diab Yes'], 
               tablefmt = 'fancy_grid'))

print(classification_report(diab_y_test, RF_Cl.predict(diab_X_test_sc)))

In [None]:
diab_X_test_sc['RF_Cl'] = RF_Cl.predict(diab_X_test_sc)
diab_X_train_sc['RF_Cl'] = RF_Cl.predict(diab_X_train_sc)

## Stacking Regression with Python


In [285]:
level0 = list()
level0.append(('lr', LinearRegression()))
level0.append(('RF', RandomForestRegressor()))
level0.append(('svr', SVR()))

level1 = LinearRegression()

# define the stacking ensemble
St_reg = StackingRegressor(estimators = level0, final_estimator = level1)

# fit the model on all available data
St_reg.fit(cars_X_train_car_scaler, cars_y_train)



In [286]:
cars_X_test_car_scaler['St_reg'] = St_reg.predict(cars_X_test_car_scaler)
cars_X_train_car_scaler['St_reg'] = St_reg.predict(cars_X_train_car_scaler)

In [287]:
print("MAE: ", metrics.mean_absolute_error(cars_y_train, cars_X_train_car_scaler['St_reg']).round(4))
print("MSE: ", metrics.mean_squared_error(cars_y_train, cars_X_train_car_scaler['St_reg']).round(4))
print("RMSE: ", np.sqrt(metrics.mean_squared_error(cars_y_train, cars_X_train_car_scaler['St_reg'])).round(4))
print("MAPE: ", metrics.mean_absolute_percentage_error(cars_y_train, cars_X_train_car_scaler['St_reg']).round(4))
print("R2: ", metrics.r2_score(cars_y_train, cars_X_train_car_scaler['St_reg']).round(4))

MAE:  0.0054
MSE:  0.0001
RMSE:  0.0089
MAPE:  0.0002
R2:  1.0


In [288]:
print("MAE: ", metrics.mean_absolute_error(cars_y_test, cars_X_test_car_scaler['St_reg']).round(4))
print("MSE: ", metrics.mean_squared_error(cars_y_test, cars_X_test_car_scaler['St_reg']).round(4))
print("RMSE: ", np.sqrt(metrics.mean_squared_error(cars_y_test, cars_X_test_car_scaler['St_reg'])).round(4))
print("MAPE: ", metrics.mean_absolute_percentage_error(cars_y_test, cars_X_test_car_scaler['St_reg']).round(4))
print("R2: ", metrics.r2_score(cars_y_test, cars_X_test_car_scaler['St_reg']).round(4))


MAE:  0.2709
MSE:  1.3632
RMSE:  1.1676
MAPE:  0.0131
R2:  0.9747


## Stacking Classifier with Python

In [289]:
level0 = list()
level0.append(('lr', LogisticRegression()))
level0.append(('RF', RandomForestClassifier()))
level0.append(('svc', SVC()))

level1 = LogisticRegression()

# define the stacking ensemble
St_Cl = StackingClassifier(estimators = level0, final_estimator = level1)

# fit the model on all available data
St_Cl.fit(diab_X_train_sc, diab_y_train)




In [290]:
print("Train set score (Accuracy) =", St_Cl.score(diab_X_train_sc, diab_y_train).round(4))
print("Test set score (Accuracy) =", St_Cl.score(diab_X_test_sc, diab_y_test).round(4))

conf_mat = confusion_matrix(diab_y_test, St_Cl.predict(diab_X_test_sc))
print(tabulate(conf_mat,headers = ['pred Diab No','pred Diab Yes'], showindex = ['real Diab No','real Diab Yes'], 
               tablefmt = 'fancy_grid'))

print(classification_report(diab_y_test, St_Cl.predict(diab_X_test_sc)))


Train set score (Accuracy) = 1.0
Test set score (Accuracy) = 0.7987
╒═══════════════╤════════════════╤═════════════════╕
│               │   pred Diab No │   pred Diab Yes │
╞═══════════════╪════════════════╪═════════════════╡
│ real Diab No  │             94 │              13 │
├───────────────┼────────────────┼─────────────────┤
│ real Diab Yes │             18 │              29 │
╘═══════════════╧════════════════╧═════════════════╛
              precision    recall  f1-score   support

           0       0.84      0.88      0.86       107
           1       0.69      0.62      0.65        47

    accuracy                           0.80       154
   macro avg       0.76      0.75      0.76       154
weighted avg       0.79      0.80      0.80       154



In [None]:
diab_X_test_sc['St_Cl'] = St_Cl.predict(diab_X_test_sc)
diab_X_train_sc['St_Cl'] = St_Cl.predict(diab_X_train_sc)

## AdaBoost Regression with Python

In [161]:
AB_Reg = AdaBoostRegressor()

grid = dict()
grid['n_estimators'] = [10, 100, 500] # number of trees
grid['learning_rate'] = [ 0.001, 0.01, 0.1, 1.0]



# define the evaluation procedure
cv = RepeatedKFold(n_splits = 5, n_repeats = 3, random_state = 1)

# define the grid search procedure
grid_search = GridSearchCV(estimator = AB_Reg, param_grid = grid, n_jobs = -1, cv = cv, scoring = 'neg_mean_absolute_error')

# execute the grid search
grid_result = grid_search.fit(cars_X_train_car_scaler, cars_y_train)

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

# summarize all scores that were evaluated
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: -1.767374 using {'learning_rate': 1.0, 'n_estimators': 100}
-1.918695 (0.100473) with: {'learning_rate': 0.001, 'n_estimators': 10}
-1.899432 (0.093246) with: {'learning_rate': 0.001, 'n_estimators': 100}
-1.894002 (0.098157) with: {'learning_rate': 0.001, 'n_estimators': 500}
-1.911295 (0.096290) with: {'learning_rate': 0.01, 'n_estimators': 10}
-1.885670 (0.095171) with: {'learning_rate': 0.01, 'n_estimators': 100}
-1.851997 (0.084398) with: {'learning_rate': 0.01, 'n_estimators': 500}
-1.896395 (0.103773) with: {'learning_rate': 0.1, 'n_estimators': 10}
-1.831037 (0.079944) with: {'learning_rate': 0.1, 'n_estimators': 100}
-1.816237 (0.084577) with: {'learning_rate': 0.1, 'n_estimators': 500}
-1.847538 (0.093567) with: {'learning_rate': 1.0, 'n_estimators': 10}
-1.767374 (0.086779) with: {'learning_rate': 1.0, 'n_estimators': 100}
-1.771633 (0.079390) with: {'learning_rate': 1.0, 'n_estimators': 500}


The best model has 100 Trees and a learning rate of 1, using Mean Absolute Error as the scoring metric. Try Mean Absolute Percentage Error instead: it could change the best combination of parameters!

Once we know which is the best combination let's train the definitive model and make the predictions

In [168]:
AB_Reg = AdaBoostRegressor(learning_rate = 1, n_estimators = 100)
AB_Reg.fit(cars_X_train_car_scaler, cars_y_train)

In [169]:
cars_X_test_car_scaler['AB_Reg'] = AB_Reg.predict(cars_X_test_car_scaler)
cars_X_train_car_scaler['AB_Reg'] = AB_Reg.predict(cars_X_train_car_scaler)

In [170]:
print("MAE: ", metrics.mean_absolute_error(cars_y_train, cars_X_train_car_scaler['AB_Reg']).round(4))
print("MSE: ", metrics.mean_squared_error(cars_y_train, cars_X_train_car_scaler['AB_Reg']).round(4))
print("RMSE: ", np.sqrt(metrics.mean_squared_error(cars_y_train, cars_X_train_car_scaler['AB_Reg'])).round(4))
print("MAPE: ", metrics.mean_absolute_percentage_error(cars_y_train, cars_X_train_car_scaler['AB_Reg']).round(4))
print("R2: ", metrics.r2_score(cars_y_train, cars_X_train_car_scaler['AB_Reg']).round(4))

MAE:  1.6165
MSE:  3.7673
RMSE:  1.9409
MAPE:  0.0714
R2:  0.9368


In [171]:
print("MAE: ", metrics.mean_absolute_error(cars_y_test, cars_X_test_car_scaler['AB_Reg']).round(4))
print("MSE: ", metrics.mean_squared_error(cars_y_test, cars_X_test_car_scaler['AB_Reg']).round(4))
print("RMSE: ", np.sqrt(metrics.mean_squared_error(cars_y_test, cars_X_test_car_scaler['AB_Reg'])).round(4))
print("MAPE: ", metrics.mean_absolute_percentage_error(cars_y_test, cars_X_test_car_scaler['AB_Reg']).round(4))
print("R2: ", metrics.r2_score(cars_y_test, cars_X_test_car_scaler['AB_Reg']).round(4))


MAE:  1.6143
MSE:  4.0474
RMSE:  2.0118
MAPE:  0.0745
R2:  0.925


## AdaBoost Classifier with Python

In [172]:
AB_cl = AdaBoostClassifier()

# define the grid of values to search
grid = dict()
grid['n_estimators'] = [10, 100, 500]
grid['learning_rate'] = [0.001, 0.01, 0.1, 1.0]

# define the evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

# define the grid search procedure
grid_search = GridSearchCV(estimator=AB_cl, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy')

# execute the grid search
grid_result = grid_search.fit(diab_X_train_sc, diab_y_train)

# summarize the best score and configuration
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

# summarize all scores that were evaluated
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
    


Best: 0.747554 using {'learning_rate': 0.01, 'n_estimators': 500}
0.713324 (0.033152) with: {'learning_rate': 0.001, 'n_estimators': 10}
0.713324 (0.033152) with: {'learning_rate': 0.001, 'n_estimators': 100}
0.723648 (0.035688) with: {'learning_rate': 0.001, 'n_estimators': 500}
0.713324 (0.033152) with: {'learning_rate': 0.01, 'n_estimators': 10}
0.731281 (0.031052) with: {'learning_rate': 0.01, 'n_estimators': 100}
0.747554 (0.034162) with: {'learning_rate': 0.01, 'n_estimators': 500}
0.730201 (0.032379) with: {'learning_rate': 0.1, 'n_estimators': 10}
0.745942 (0.037197) with: {'learning_rate': 0.1, 'n_estimators': 100}
0.737825 (0.034146) with: {'learning_rate': 0.1, 'n_estimators': 500}
0.730170 (0.042411) with: {'learning_rate': 1.0, 'n_estimators': 10}
0.718819 (0.038249) with: {'learning_rate': 1.0, 'n_estimators': 100}
0.697094 (0.028260) with: {'learning_rate': 1.0, 'n_estimators': 500}


In [271]:
AB_cl = AdaBoostClassifier(learning_rate = 0.01, n_estimators = 500)
AB_cl.fit(diab_X_train_sc, diab_y_train)

# diab_X_train, diab_X_test, diab_y_train, diab_y_test

In [272]:
print("Train set score (Accuracy) =", AB_cl.score(diab_X_train_sc, diab_y_train).round(4))
print("Test set score (Accuracy) =", AB_cl.score(diab_X_test_sc, diab_y_test).round(4))

conf_mat = confusion_matrix(diab_y_test, AB_cl.predict(diab_X_test_sc))
print(tabulate(conf_mat,headers = ['pred Diab No','pred Diab Yes'], showindex = ['real Diab No','real Diab Yes'], 
               tablefmt = 'fancy_grid'))

print(classification_report(diab_y_test, AB_cl.predict(diab_X_test_sc)))

Train set score (Accuracy) = 0.7863
Test set score (Accuracy) = 0.7987
╒═══════════════╤════════════════╤═════════════════╕
│               │   pred Diab No │   pred Diab Yes │
╞═══════════════╪════════════════╪═════════════════╡
│ real Diab No  │             86 │              21 │
├───────────────┼────────────────┼─────────────────┤
│ real Diab Yes │             10 │              37 │
╘═══════════════╧════════════════╧═════════════════╛
              precision    recall  f1-score   support

           0       0.90      0.80      0.85       107
           1       0.64      0.79      0.70        47

    accuracy                           0.80       154
   macro avg       0.77      0.80      0.78       154
weighted avg       0.82      0.80      0.80       154



In [None]:
diab_X_test_sc['AB_Cl'] = AB_cl.predict(diab_X_test_sc)
diab_X_train_sc['AB_Cl'] = AB_cl.predict(diab_X_train_sc)

## Gradient Boosting Regressor with Python

In [211]:
BG_Reg = GradientBoostingRegressor()

# define the grid of values to search
grid = dict()
grid['n_estimators'] = [100, 500]
grid['learning_rate'] = [0.01, 0.1, 1.0]
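grid['max_depth'] = [3, 5, 8]
grid['subsample'] = [0.5, 0.7, 1.0] # fraction of rows used to fit each tree, as reported in the output below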

# define the evaluation procedure
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)

# define the grid search procedure
grid_search = GridSearchCV(estimator=BG_Reg, param_grid=grid, n_jobs=-1, cv=cv, scoring='neg_mean_absolute_error')
# execute the grid search
grid_result = grid_search.fit(cars_X_train_car_scaler, cars_y_train)

# summarize the best score and configuration
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

# summarize all scores that were evaluated
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))


Best: -0.170705 using {'learning_rate': 0.1, 'max_depth': 8, 'n_estimators': 500, 'subsample': 0.5}
-2.756249 (0.171753) with: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 100, 'subsample': 0.5}
-2.757431 (0.169782) with: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 100, 'subsample': 0.7}
-2.759255 (0.171752) with: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 100, 'subsample': 1.0}
-1.215482 (0.082850) with: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 500, 'subsample': 0.5}
-1.218314 (0.083073) with: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 500, 'subsample': 0.7}
-1.232304 (0.083389) with: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 500, 'subsample': 1.0}
-2.645642 (0.143866) with: {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 100, 'subsample': 0.5}
-2.636124 (0.142124) with: {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 100, 'subsample': 0.7}
-2.628138 (0.140084) with: {'learning_rate': 0.01, '

In [251]:
BG_Reg = GradientBoostingRegressor(learning_rate = 1, n_estimators = 100, max_depth = 2)
BG_Reg.fit(cars_X_train_car_scaler, cars_y_train)

In [252]:
cars_X_test_car_scaler['BG_Reg'] = BG_Reg.predict(cars_X_test_car_scaler)
cars_X_train_car_scaler['BG_Reg'] = BG_Reg.predict(cars_X_train_car_scaler)

In [253]:
print("MAE: ", metrics.mean_absolute_error(cars_y_train, cars_X_train_car_scaler['BG_Reg']).round(4))
print("MSE: ", metrics.mean_squared_error(cars_y_train, cars_X_train_car_scaler['BG_Reg']).round(4))
print("RMSE: ", np.sqrt(metrics.mean_squared_error(cars_y_train, cars_X_train_car_scaler['BG_Reg'])).round(4))
print("MAPE: ", metrics.mean_absolute_percentage_error(cars_y_train, cars_X_train_car_scaler['BG_Reg']).round(4))
print("R2: ", metrics.r2_score(cars_y_train, cars_X_train_car_scaler['BG_Reg']).round(4))

MAE:  0.0799
MSE:  0.012
RMSE:  0.1094
MAPE:  0.0037
R2:  0.9998


In [254]:
print("MAE: ", metrics.mean_absolute_error(cars_y_test, cars_X_test_car_scaler['BG_Reg']).round(4))
print("MSE: ", metrics.mean_squared_error(cars_y_test, cars_X_test_car_scaler['BG_Reg']).round(4))
print("RMSE: ", np.sqrt(metrics.mean_squared_error(cars_y_test, cars_X_test_car_scaler['BG_Reg'])).round(4))
print("MAPE: ", metrics.mean_absolute_percentage_error(cars_y_test, cars_X_test_car_scaler['BG_Reg']).round(4))
print("R2: ", metrics.r2_score(cars_y_test, cars_X_test_car_scaler['BG_Reg']).round(4))

MAE:  0.3473
MSE:  1.4328
RMSE:  1.197
MAPE:  0.0167
R2:  0.9734
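
The same five metrics are printed for every model, once for the training set and once for the test set. A small helper keeps those cells short and makes the train and test numbers easy to compare. This is a sketch: the function name `regression_report` and the toy arrays are assumptions, and only `sklearn.metrics` functions already used in this lesson are called.

```python
import numpy as np
from sklearn import metrics

def regression_report(y_true, y_pred):
    """Return the regression metrics used in this lesson as a dict."""
    mse = metrics.mean_squared_error(y_true, y_pred)
    return {
        "MAE": round(float(metrics.mean_absolute_error(y_true, y_pred)), 4),
        "MSE": round(float(mse), 4),
        "RMSE": round(float(np.sqrt(mse)), 4),
        "MAPE": round(float(metrics.mean_absolute_percentage_error(y_true, y_pred)), 4),
        "R2": round(float(metrics.r2_score(y_true, y_pred)), 4),
    }

# Toy values for illustration only
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 39.0])

report = regression_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value}")
```

In the run above, the test MAE (0.3473) is roughly four times the training MAE (0.0799). A gap of that size usually indicates that the model fits the training data more closely than it generalizes.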


## Gradient Boosting Classifier with Python


In [180]:
BG_cl = GradientBoostingClassifier()

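# define the grid of values to search
# Note: the output shown below lists other values (learning_rate=0.001, n_estimators=10, subsample),
# so it appears to come from an earlier run with a different grid
grid = dict()
grid['n_estimators'] = [100, 500]
grid['learning_rate'] = [0.01, 0.1, 1.0]
grid['max_depth'] = [3, 5, 8]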

# define the evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

# define the grid search procedure
grid_search = GridSearchCV(estimator=BG_cl, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy')
# execute the grid search

grid_result = grid_search.fit(diab_X_train_sc, diab_y_train)

# summarize the best score and configuration
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

# summarize all scores that were evaluated
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.750842 using {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 500, 'subsample': 0.7}
0.640064 (0.003169) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 10, 'subsample': 0.5}
0.640064 (0.003169) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 10, 'subsample': 0.7}
0.640064 (0.003169) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 10, 'subsample': 1.0}
0.640064 (0.003169) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 100, 'subsample': 0.5}
0.640064 (0.003169) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 100, 'subsample': 0.7}
0.640064 (0.003169) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 100, 'subsample': 1.0}
0.718806 (0.033561) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 500, 'subsample': 0.5}
0.714461 (0.031336) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 500, 'subsample': 0.7}
0.703616 (0.025514) with: {'learning_rate': 0.001, 'max

In [255]:
BG_cl = GradientBoostingClassifier(learning_rate = 0.01, max_depth = 5, n_estimators = 500)
BG_cl.fit(diab_X_train_sc, diab_y_train)

In [209]:
print("Train set score (Accuracy) =", BG_cl.score(diab_X_train_sc, diab_y_train).round(4))
print("Test set score (Accuracy) =", BG_cl.score(diab_X_test_sc, diab_y_test).round(4))

conf_mat = confusion_matrix(diab_y_test, BG_cl.predict(diab_X_test_sc))
print(tabulate(conf_mat,headers = ['pred Diab No','pred Diab Yes'], showindex = ['real Diab No','real Diab Yes'], 
               tablefmt = 'fancy_grid'))

print(classification_report(diab_y_test, BG_cl.predict(diab_X_test_sc)))

# store the Gradient Boosting predictions (the original cell called the AdaBoost model AB_cl here)
diab_X_test_sc['BG_cl'] = BG_cl.predict(diab_X_test_sc)
diab_X_train_sc['BG_cl'] = BG_cl.predict(diab_X_train_sc)

Train set score (Accuracy) = 0.9847
Test set score (Accuracy) = 0.7987
╒═══════════════╤════════════════╤═════════════════╕
│               │   pred Diab No │   pred Diab Yes │
╞═══════════════╪════════════════╪═════════════════╡
│ real Diab No  │             88 │              19 │
├───────────────┼────────────────┼─────────────────┤
│ real Diab Yes │             12 │              35 │
╘═══════════════╧════════════════╧═════════════════╛
              precision    recall  f1-score   support

           0       0.88      0.82      0.85       107
           1       0.65      0.74      0.69        47

    accuracy                           0.80       154
   macro avg       0.76      0.78      0.77       154
weighted avg       0.81      0.80      0.80       154
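
Every number in the classification report can be derived from the four cells of the confusion matrix. The sketch below recomputes accuracy, precision, recall and F1 for the positive class (`1`, "Diab Yes") from the matrix printed above. The variable names are chosen for this illustration.

```python
import numpy as np

# Confusion matrix from the output above: rows are real classes, columns are predictions
conf_mat = np.array([[88, 19],
                     [12, 35]])

# For a binary problem, ravel() returns the cells in this order
tn, fp, fn, tp = conf_mat.ravel()

accuracy = (tp + tn) / conf_mat.sum()   # correct predictions / all predictions
precision_1 = tp / (tp + fp)            # of all predicted "Yes", how many are real "Yes"
recall_1 = tp / (tp + fn)               # of all real "Yes", how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print("accuracy   :", round(accuracy, 4))      # 0.7987
print("precision 1:", round(precision_1, 2))   # 0.65
print("recall 1   :", round(recall_1, 2))      # 0.74
print("f1-score 1 :", round(f1_1, 2))          # 0.69
```

The model misses 12 of the 47 real diabetes cases, which the overall accuracy of 0.80 does not show on its own.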



## XGBoost Regressor with Python


In [256]:
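# Note: this cell uses scikit-learn's GradientBoostingRegressor, not the xgboost library.
# The separate xgboost package provides XGBRegressor with a scikit-learn compatible interface.
XG_Reg = GradientBoostingRegressor()

# define the grid of values to search
# Note: the output shown below lists n_estimators of 100 and 500,
# so it appears to come from an earlier run with a different grid
grid = dict()
grid['n_estimators'] = [50, 100]
grid['learning_rate'] = [0.01, 0.1, 1.0]
grid['max_depth'] = [3, 5, 8]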

# define the evaluation procedure
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)

# define the grid search procedure
grid_search = GridSearchCV(estimator=XG_Reg, param_grid=grid, n_jobs=-1, cv=cv, scoring='neg_mean_absolute_error')
# execute the grid search
grid_result = grid_search.fit(cars_X_train_car_scaler, cars_y_train)

# summarize the best score and configuration
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

# summarize all scores that were evaluated
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))


Best: -0.013461 using {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 500}
-2.409940 (0.134989) with: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 100}
-0.065114 (0.006873) with: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 500}
-2.378757 (0.125581) with: {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 100}
-0.058029 (0.007673) with: {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 500}
-2.378105 (0.123946) with: {'learning_rate': 0.01, 'max_depth': 8, 'n_estimators': 100}
-0.055142 (0.008199) with: {'learning_rate': 0.01, 'max_depth': 8, 'n_estimators': 500}
-0.031272 (0.004521) with: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100}
-0.017524 (0.004248) with: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 500}
-0.018695 (0.005417) with: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 100}
-0.013461 (0.005263) with: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 500}
-0.014311 (0.007493) with: {'learning_rate

In [257]:
# Note: the grid-search output above reports n_estimators=500 as best; this cell uses 100
XG_Reg = GradientBoostingRegressor(learning_rate = 0.1, n_estimators = 100, max_depth = 5)
XG_Reg.fit(cars_X_train_car_scaler, cars_y_train)

In [258]:
cars_X_test_car_scaler['XG_Reg'] = XG_Reg.predict(cars_X_test_car_scaler)
cars_X_train_car_scaler['XG_Reg'] = XG_Reg.predict(cars_X_train_car_scaler)

In [259]:
print("MAE: ", metrics.mean_absolute_error(cars_y_train, cars_X_train_car_scaler['XG_Reg']).round(4))
print("MSE: ", metrics.mean_squared_error(cars_y_train, cars_X_train_car_scaler['XG_Reg']).round(4))
print("RMSE: ", np.sqrt(metrics.mean_squared_error(cars_y_train, cars_X_train_car_scaler['XG_Reg'])).round(4))
print("MAPE: ", metrics.mean_absolute_percentage_error(cars_y_train, cars_X_train_car_scaler['XG_Reg']).round(4))
print("R2: ", metrics.r2_score(cars_y_train, cars_X_train_car_scaler['XG_Reg']).round(4))

MAE:  0.0054
MSE:  0.0001
RMSE:  0.0097
MAPE:  0.0002
R2:  1.0


In [260]:
print("MAE: ", metrics.mean_absolute_error(cars_y_test, cars_X_test_car_scaler['XG_Reg']).round(4))
print("MSE: ", metrics.mean_squared_error(cars_y_test, cars_X_test_car_scaler['XG_Reg']).round(4))
print("RMSE: ", np.sqrt(metrics.mean_squared_error(cars_y_test, cars_X_test_car_scaler['XG_Reg'])).round(4))
print("MAPE: ", metrics.mean_absolute_percentage_error(cars_y_test, cars_X_test_car_scaler['XG_Reg']).round(4))
print("R2: ", metrics.r2_score(cars_y_test, cars_X_test_car_scaler['XG_Reg']).round(4))

MAE:  0.271
MSE:  1.3648
RMSE:  1.1683
MAPE:  0.0131
R2:  0.9747
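
The cells in this section call scikit-learn's `GradientBoostingRegressor`. XGBoost itself lives in the separate `xgboost` package, whose `XGBRegressor` follows the scikit-learn estimator interface and accepts the same `n_estimators`, `learning_rate`, `max_depth` and `subsample` arguments. The sketch below shows the swap on synthetic data. It assumes `xgboost` may or may not be installed and falls back to the scikit-learn estimator so the example still runs. The alias `Booster` and the variable `backend` are names made up for this illustration.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

try:
    # Separate package, typically installed with: pip install xgboost
    from xgboost import XGBRegressor as Booster
    backend = "xgboost"
except ImportError:
    # Fallback so the sketch runs without xgboost (this is NOT XGBoost)
    from sklearn.ensemble import GradientBoostingRegressor as Booster
    backend = "sklearn"

# Synthetic stand-in for the cars data (assumption, for illustration only)
X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# These four hyperparameters exist under the same names in both estimators
model = Booster(n_estimators=100, learning_rate=0.1, max_depth=3,
                subsample=0.7, random_state=1)
model.fit(X_train, y_train)

test_mae = mean_absolute_error(y_test, model.predict(X_test))
print("backend:", backend, "| test MAE:", round(float(test_mae), 4))
```

Because both estimators share the scikit-learn interface, the `GridSearchCV` setup used above should work unchanged with either one.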


## XGBoost Classifier with Python

In [261]:
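# Note: this cell uses scikit-learn's GradientBoostingClassifier, not the xgboost library.
# The separate xgboost package provides XGBClassifier with a scikit-learn compatible interface.
XG_cl = GradientBoostingClassifier()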

# define the grid of values to search
grid = dict()
grid['n_estimators'] = [50, 100]
grid['learning_rate'] = [0.01, 0.1, 1.0]
grid['max_depth'] = [3, 5, 8]

# define the evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

# define the grid search procedure
grid_search = GridSearchCV(estimator=XG_cl, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy')
# execute the grid search

grid_result = grid_search.fit(diab_X_train_sc, diab_y_train)

# summarize the best score and configuration
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

# summarize all scores that were evaluated
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.838848 using {'learning_rate': 1.0, 'max_depth': 8, 'n_estimators': 50}
0.758263 (0.030810) with: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 50}
0.766317 (0.029425) with: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 100}
0.775240 (0.041400) with: {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 50}
0.785842 (0.030856) with: {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 100}
0.802795 (0.039673) with: {'learning_rate': 0.01, 'max_depth': 8, 'n_estimators': 50}
0.808743 (0.040271) with: {'learning_rate': 0.01, 'max_depth': 8, 'n_estimators': 100}
0.787957 (0.027168) with: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 50}
0.801521 (0.025535) with: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100}
0.814679 (0.023642) with: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 50}
0.823153 (0.026228) with: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 100}
0.825703 (0.036398) with: {'learning_rate': 0.1, 'max_depth

In [262]:
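# Note: these settings differ from the best configuration reported by the grid search above
# (learning_rate=1.0, max_depth=8, n_estimators=50)
XG_cl = GradientBoostingClassifier(learning_rate = 0.01, max_depth = 5, n_estimators = 500, subsample = 0.7)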
XG_cl.fit(diab_X_train_sc, diab_y_train)

In [263]:
print("Train set score (Accuracy) =", XG_cl.score(diab_X_train_sc, diab_y_train).round(4))
print("Test set score (Accuracy) =", XG_cl.score(diab_X_test_sc, diab_y_test).round(4))

conf_mat = confusion_matrix(diab_y_test, XG_cl.predict(diab_X_test_sc))
print(tabulate(conf_mat,headers = ['pred Diab No','pred Diab Yes'], showindex = ['real Diab No','real Diab Yes'], 
               tablefmt = 'fancy_grid'))

print(classification_report(diab_y_test, XG_cl.predict(diab_X_test_sc)))

diab_X_test_sc['XG_cl'] = XG_cl.predict(diab_X_test_sc)
diab_X_train_sc['XG_cl'] = XG_cl.predict(diab_X_train_sc)

Train set score (Accuracy) = 0.9847
Test set score (Accuracy) = 0.7987
╒═══════════════╤════════════════╤═════════════════╕
│               │   pred Diab No │   pred Diab Yes │
╞═══════════════╪════════════════╪═════════════════╡
│ real Diab No  │             87 │              20 │
├───────────────┼────────────────┼─────────────────┤
│ real Diab Yes │             11 │              36 │
╘═══════════════╧════════════════╧═════════════════╛
              precision    recall  f1-score   support

           0       0.89      0.81      0.85       107
           1       0.64      0.77      0.70        47

    accuracy                           0.80       154
   macro avg       0.77      0.79      0.77       154
weighted avg       0.81      0.80      0.80       154
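
The training accuracy (0.9847) is much higher than the test accuracy (0.7987), which suggests the boosted model keeps fitting the training data after it has stopped improving on unseen data. One way to inspect this is `staged_predict`, which yields the ensemble's predictions after each added tree. The sketch below uses synthetic data from `make_classification` and a separate validation split, both assumptions made for the illustration. The number of trees should be chosen on a validation split, not on the final test set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes data (assumption, for illustration only)
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           flip_y=0.1, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

clf = GradientBoostingClassifier(learning_rate=0.1, max_depth=3,
                                 n_estimators=200, random_state=1)
clf.fit(X_train, y_train)

# Accuracy after 1, 2, ..., 200 trees on both splits
train_acc = [accuracy_score(y_train, pred) for pred in clf.staged_predict(X_train)]
val_acc = [accuracy_score(y_val, pred) for pred in clf.staged_predict(X_val)]

# Smallest number of trees that reaches the best validation accuracy
best_n = max(range(len(val_acc)), key=val_acc.__getitem__) + 1

print("best number of trees:", best_n)
print("validation accuracy at best_n:", round(val_acc[best_n - 1], 4))
print("train accuracy with all trees:", round(train_acc[-1], 4))
```

If the best validation accuracy is reached well before the last tree, a smaller `n_estimators` or a lower `learning_rate` is worth trying.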

