### Boosting
##### What is Boosting?
Definition: The term ‘Boosting’ refers to a family of algorithms that convert weak learners into a strong learner.

##### How Do Boosting Algorithms Work?
    
Step 1: The base learner is trained on the full training set, with equal weight (attention) assigned to every observation.

Step 2: If the first base learner makes prediction errors, the observations it got wrong receive higher weight. The next base learner is then trained on this reweighted data.

Step 3: Repeat Step 2 until the maximum number of base learners is reached or the desired accuracy is achieved.

Finally, the outputs of all the weak learners are combined into a single strong learner, which improves the predictive power of the model. In this way, boosting concentrates on the examples that the preceding weak rules misclassified or predicted with larger errors.
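The three steps above can be sketched in a few lines of Python. This is a minimal, from-scratch sketch, not a production implementation: it assumes binary labels encoded as -1/+1, uses discrete AdaBoost's reweighting rule as one concrete choice for Step 2, and runs on synthetic data from `make_classification` rather than the bank dataset. The helper names `boost` and `boosted_predict` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def boost(X, y, n_rounds=20):
    """Discrete AdaBoost-style loop; y must be encoded as -1/+1."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # Step 1: equal weight for every observation
    learners, alphas = [], []
    for _ in range(n_rounds):  # Step 3: repeat for a fixed number of rounds
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # guard against log(0) / division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # how much this learner's vote counts
        # Step 2: raise the weights of misclassified points, lower the others
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, np.array(alphas), w


def boosted_predict(learners, alphas, X):
    # Combine the weak learners: sign of the alpha-weighted vote
    scores = sum(a * m.predict(X) for a, m in zip(alphas, learners))
    return np.where(scores >= 0, 1, -1)


X_demo, y01 = make_classification(n_samples=300, random_state=0)
y_demo = np.where(y01 == 1, 1, -1)  # map {0, 1} labels to {-1, +1}
learners, alphas, w = boost(X_demo, y_demo)
print(len(learners), np.mean(boosted_predict(learners, alphas, X_demo) == y_demo))
```

Each round sees the same data but a different weighting of it, which is what lets later stumps focus on the points earlier stumps got wrong.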
    
##### Types of Boosting Algorithms
The underlying engine (the weak learner) used by a boosting algorithm can be almost anything: a decision stump, a margin-maximizing classification algorithm, and so on. Commonly used boosting algorithms include:

1. AdaBoost (Adaptive Boosting)
2. Gradient Tree Boosting
3. XGBoost

In [1]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
# Adaptive Boosting aka AdaBoost
from sklearn.ensemble import AdaBoostClassifier

In [2]:
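# Load the data
data = pd.read_csv("bank.csv")

# Encode every text (object) column as integers, including the 'deposit' target
# if it is stored as strings: the scikit-learn trees and XGBoost need numeric inputs
for col in data.select_dtypes(include='object').columns:
    data[col] = LabelEncoder().fit_transform(data[col])

X = data.drop('deposit', axis=1)
y = data['deposit']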

In [3]:
# Split the dataset into training (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.3)

In [4]:
# Fit a weak classifier
# A weak learner is a model that does only slightly better than random guessing;
# a depth-1 decision tree (a "decision stump") is the classic example
dt_clf = DecisionTreeClassifier(max_depth=1, random_state=0)
dt_clf.fit(X_train, y_train)
dt_score = dt_clf.score(X_test, y_test)
print("decision Tree score: ", dt_score)

decision Tree score:  0.7094655120931621


##### AdaBoost
1. Prepare a weak classifier, e.g. the decision stump from the cell above.
2. Calculate the (weighted) misclassification rate: the share of the total training weight that falls on wrongly classified data points.
3. Calculate the 'stage value', which weights this model's predictions in the final combination according to its misclassification rate.
4. Update the training weights: incorrectly predicted instances get more weight and correctly predicted instances get less.

Note: Before the first AdaBoost round, every instance gets the same weight, 1/n (where n is the total number of instances).
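Steps 2 to 4 for a single round can be worked through numerically. This sketch uses the stage-value formulation stage = ln((1 − error) / error) and multiplies the weights of misclassified points only by exp(stage), which is how the SAMME variant of AdaBoost behaves for two classes. The five labels and predictions are made up for illustration.

```python
import numpy as np

# Hypothetical toy round: 5 training points, the weak learner gets the third one wrong
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, 1, 1, -1, 1])

n = len(y_true)
w = np.full(n, 1.0 / n)                  # initial weights: 1/n each
miss = (y_pred != y_true).astype(float)  # 1 where misclassified, 0 otherwise

error = np.sum(w * miss) / np.sum(w)     # step 2: weighted misclassification rate
stage = np.log((1 - error) / error)      # step 3: stage value of this learner
w_new = w * np.exp(stage * miss)         # step 4: boost only the misclassified points
w_new /= w_new.sum()                     # renormalise so the weights sum to 1

print(error, stage, w_new)
```

Here the error is 0.2 and the stage value is ln 4. The single misclassified point goes from weight 0.2 to 0.5, while each correctly classified point drops from 0.2 to 0.125, so the next weak learner is pushed to get that point right.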

In [5]:
# Boost the weak classifier with AdaBoost
# (scikit-learn >= 1.2 calls this argument `estimator`; older releases used `base_estimator`)
ada_clf = AdaBoostClassifier(estimator=dt_clf, random_state=0)
ada_clf.fit(X_train, y_train)
# n_estimators: the (maximum) number of weak learners.
# learning_rate: scales each weak learner's contribution to the final combination; there is a trade-off between learning_rate and n_estimators.
# estimator: the weak learner to boost, which lets you plug in a different ML algorithm.


In [6]:
ada_score = ada_clf.score(X_test,y_test)
print("Ada boost Score: ",ada_score)

Ada boost Score:  0.8244252015527023
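The `learning_rate` / `n_estimators` trade-off mentioned in the AdaBoost cell's comments can be explored by fitting a few configurations side by side. This is a sketch on synthetic `make_classification` data (an assumption, not the bank dataset); it leaves the weak learner at scikit-learn's default depth-1 stump, and the particular pairs of values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, split 70/30 like the bank data above
X_syn, y_syn = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.3, random_state=0)

scores = {}
# Smaller learning rates shrink each learner's vote, so they usually need more learners
for learning_rate, n_estimators in [(1.0, 50), (0.5, 100), (0.1, 500)]:
    clf = AdaBoostClassifier(n_estimators=n_estimators, learning_rate=learning_rate,
                             random_state=0)
    clf.fit(X_tr, y_tr)
    scores[(learning_rate, n_estimators)] = clf.score(X_te, y_te)

for (lr, n), acc in scores.items():
    print(f"learning_rate={lr}, n_estimators={n}: test accuracy {acc:.3f}")
```

Which configuration wins depends on the data; in practice the two parameters are usually tuned together, e.g. with a grid search.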


##### Gradient Tree Boosting
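Gradient boosting approaches the problem differently. Instead of adjusting the weights of data points, it focuses on the difference between the prediction and the ground truth: each new tree is fit to the residual errors (more generally, the negative gradient of the loss function) of the ensemble built so far.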
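A from-scratch sketch makes the residual-fitting idea concrete. It assumes a regression problem with the squared-error loss ½(y − F)², for which the residual y − F is exactly the negative gradient; the noisy-sine data, the learning rate of 0.1 and the helper `gb_predict` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical 1-D regression data: a noisy sine wave
rng = np.random.RandomState(0)
X_reg = rng.uniform(-3, 3, size=(200, 1))
y_reg = np.sin(X_reg[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate, n_rounds = 0.1, 100
base = y_reg.mean()
pred = np.full_like(y_reg, base)  # start from a constant model
trees, mse_history = [], []

for _ in range(n_rounds):
    residuals = y_reg - pred  # negative gradient of the squared-error loss
    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_reg, residuals)
    pred += learning_rate * tree.predict(X_reg)  # take a small step towards the residuals
    trees.append(tree)
    mse_history.append(np.mean((y_reg - pred) ** 2))


def gb_predict(X_new):
    # Constant start plus the shrunken contribution of every tree
    return base + learning_rate * sum(t.predict(X_new) for t in trees)


print(mse_history[0], mse_history[-1])
```

Because each tree is fit to what the current ensemble still gets wrong, the training error shrinks round by round; with other losses (e.g. log loss for classification) the trees are fit to the negative gradient instead of the plain residual.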

In [7]:
from sklearn.ensemble import GradientBoostingClassifier

# The default base learner is a regression tree (max_depth=3), fit to the loss gradient at each stage

gb_clf = GradientBoostingClassifier(random_state=0)
gb_clf.fit(X_train,y_train)

GradientBoostingClassifier(criterion='friedman_mse', init=None,
                           learning_rate=0.1, loss='deviance', max_depth=3,
                           max_features=None, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=100,
                           n_iter_no_change=None, presort='auto',
                           random_state=0, subsample=1.0, tol=0.0001,
                           validation_fraction=0.1, verbose=0,
                           warm_start=False)

In [8]:
gb_score = gb_clf.score(X_test,y_test)
print("Gradient Boost Score: ",gb_score)

Gradient Boost Score:  0.8462227530606151


##### XGBoost
XGBoost (short for eXtreme Gradient Boosting) is an optimized gradient boosting library.

Classic gradient boosting machines can be slow to train because the models are built sequentially, each one correcting the errors of the previous ones. Hence, they do not scale well to large datasets.

The XGBoost library changed this. XGBoost is an implementation of gradient-boosted decision trees designed for speed and performance.

XGBoost performs better for several reasons:
1. Handling sparse data: Missing values or data processing steps like one-hot encoding make data sparse. XGBoost incorporates a sparsity-aware split finding algorithm to handle different types of sparsity patterns in the data.
2. Regularization: XGBoost has an option to penalize complex models using both L1 and L2 regularization. Regularization helps prevent overfitting.
3. Weighted quantile sketch: XGBoost has a distributed weighted quantile sketch algorithm to effectively handle weighted data.
4. Out-of-core computing: XGBoost can train on datasets too large to fit into memory by making efficient use of the available disk space.
5. Block structure for parallel learning: For faster computing, XGBoost can make use of multiple cores on the CPU.
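To see how the L1 and L2 penalties in item 2 act, consider the closed-form weight of a single leaf under XGBoost's second-order objective. With G and H the sums of the gradients and Hessians of the loss over the samples in the leaf, the L2 penalty λ gives w* = −G / (H + λ); the L1 penalty α is assumed here to act by soft-thresholding G first. In `XGBClassifier` these correspond to the `reg_lambda` and `reg_alpha` parameters. This is a simplified sketch: the helper `leaf_weight` is hypothetical, and the library applies further details (such as `max_delta_step`) that are omitted.

```python
import numpy as np


def leaf_weight(grad, hess, reg_lambda=1.0, reg_alpha=0.0):
    """Regularised optimal leaf weight (simplified sketch)."""
    G, H = np.sum(grad), np.sum(hess)
    G_shrunk = np.sign(G) * max(abs(G) - reg_alpha, 0.0)  # L1: soft-threshold the gradient sum
    return -G_shrunk / (H + reg_lambda)                   # L2: enlarge the denominator


# Logistic loss for 4 samples in one leaf, current predicted probability p = 0.5
y_leaf = np.array([1, 1, 1, 0])
p = np.full(4, 0.5)
grad = p - y_leaf    # gradient of log loss w.r.t. the raw score: p - y
hess = p * (1 - p)   # Hessian of log loss: p(1 - p)

for lam, alpha in [(0.0, 0.0), (1.0, 0.0), (1.0, 0.5)]:
    print(f"lambda={lam}, alpha={alpha}: leaf weight {leaf_weight(grad, hess, lam, alpha)}")
```

Here G = −1 and H = 1, so the unregularized weight is 1.0; λ = 1 halves it to 0.5, and adding α = 0.5 shrinks it further to 0.25. A large enough α sets the leaf weight to exactly zero, which is how L1 encourages simpler models.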

In [9]:
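# pip install xgboost
from xgboost import XGBClassifier

# XGBClassifier has no `base_estimator` parameter (it boosts its own trees),
# so only the random seed is passed here. Recent XGBoost versions also
# require the class labels to be integers 0..n_classes-1.
xgb_clf = XGBClassifier(random_state=0)
xgb_clf.fit(X_train, y_train)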


In [10]:
xgb_score = xgb_clf.score(X_test,y_test)
print("XGBoost Score: ",xgb_score)

XGBoost Score:  0.8504031054045984
