## Introduction to Boosting Algorithms

* In many machine learning competitions, the winning solutions use boosting algorithms, and gradient boosting algorithms are especially popular.
* Boosting is a method of converting weak learners into strong learners.
* In boosting, each new tree is fit on a modified version of the original dataset.
* One of the earliest boosting algorithms is AdaBoost. The other boosting algorithms become much easier to understand once AdaBoost is understood.


### AdaBoost Algorithm

* 1. The AdaBoost algorithm begins by training a decision tree in which each observation is assigned an equal weight.
* 2. After evaluating the first tree, we increase the weights of the observations that are difficult to classify and lower the weights of those that are easy to classify.
* 3. The second tree is therefore grown on this reweighted data. The idea is to improve upon the predictions of the first tree.
* 4. Our new model is therefore Tree 1 + Tree 2.
* 5. We then compute the classification error of this 2-tree ensemble, update the observation weights again, and grow a third tree on the newly reweighted data.
* 6. We repeat this process for a specified number of iterations. Subsequent trees help to classify observations that were not well classified by the previous trees.
* 7. The prediction of the final ensemble is therefore the weighted sum of the predictions made by the individual trees, where more accurate trees receive a larger weight.
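
The reweighting loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: labels are assumed to be encoded as -1/+1, the weak learners are one-level decision stumps, and the helper names (`fit_stump`, `adaboost_fit`, `adaboost_predict`) and the toy data are made up for this example.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Predict +1 on one side of the threshold and -1 on the other."""
    return np.where(polarity * (X[:, feature] - threshold) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best = (0, 0.0, 1, np.inf)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (feature, threshold, polarity, err)
    return best

def adaboost_fit(X, y, n_rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                      # equal weights to start with
    ensemble = []
    for _ in range(n_rounds):
        feature, threshold, polarity, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)     # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)    # vote of this weak learner
        pred = stump_predict(X, feature, threshold, polarity)
        w = w * np.exp(-alpha * y * pred)        # raise weights of mistakes
        w = w / w.sum()
        ensemble.append((alpha, feature, threshold, polarity))
    return ensemble

def adaboost_predict(X, ensemble):
    """Sign of the weighted sum of the stump predictions."""
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy data (hypothetical): no single stump separates these labels.
X_toy = np.arange(10, dtype=float).reshape(-1, 1)
y_toy = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

one_stump = adaboost_fit(X_toy, y_toy, n_rounds=1)
three_stumps = adaboost_fit(X_toy, y_toy, n_rounds=3)
acc_one = (adaboost_predict(X_toy, one_stump) == y_toy).mean()
acc_three = (adaboost_predict(X_toy, three_stumps) == y_toy).mean()
```

On this toy set the best single stump still misclassifies three points, while the weighted vote of three stumps classifies every point correctly.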

&nbsp;

### Gradient Boost Algorithm

* Gradient Boosting trains many models in a gradual, additive and sequential manner.
* While AdaBoost identifies the shortcomings of the current model through highly weighted data points, gradient boosting does the same through the gradients of the loss function. In a model such as y = ax + b + e, the error term e deserves a special mention, because it is the part that the next learner tries to explain.
* The loss function is a measure of how well the model’s coefficients fit the underlying data.
* One of the biggest motivations for using gradient boosting is that it allows one to optimise a user-specified cost function (any differentiable loss), instead of a fixed loss function that usually offers less control and does not necessarily correspond to real-world applications.
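
For the common special case of squared-error loss, the link between the error term and the gradient can be written out explicitly. The symbols $F$ (the current ensemble), $h_m$ (the tree added at step $m$) and $\nu$ (the learning rate) are notation introduced here for illustration, not taken from the text above.

$$
L\bigl(y, F(x)\bigr) = \tfrac{1}{2}\,\bigl(y - F(x)\bigr)^2
\quad\Longrightarrow\quad
-\frac{\partial L}{\partial F(x)} = y - F(x)
$$

$$
F_m(x) = F_{m-1}(x) + \nu\, h_m(x)
$$

In this case the negative gradient is exactly the residual, so fitting $h_m$ to the negative gradient amounts to fitting it to the errors left by the previous trees. For other losses the same recipe applies with a different gradient.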


![image.png](attachment:image.png)

### Gradient Boosting Parameters

* ```Tree-Specific Parameters```: These affect each individual tree in the model.
* ```Boosting Parameters```: These affect the boosting operation in the model.
* ```Miscellaneous Parameters```: Other parameters for overall functioning.

**Tree Specific Parameters**: 

* 1. ```min_samples_split```: 
    * Defines the minimum number of samples (or observations) which are required in a node to be considered for splitting.
    * Used to control over-fitting. Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.
    * Values that are too high can lead to under-fitting; hence, it should be tuned using CV.

* 2. ```min_samples_leaf```: 
    * Defines the minimum samples (or observations) required in a terminal node or leaf.
    * Used to control over-fitting similar to min_samples_split.
    * Generally lower values should be chosen for imbalanced class problems because the regions in which the minority class will be in majority will be very small.
    
* 3. ```min_weight_fraction_leaf```:
    * Similar to min_samples_leaf but defined as a fraction of the total number of observations instead of an integer.
    * Only one of min_samples_leaf and min_weight_fraction_leaf needs to be defined.

* 4. ```max_depth```:
    * The maximum depth of a tree.
    * Used to control over-fitting as higher depth will allow model to learn relations very specific to a particular sample.
    * Should be tuned using CV.
    
* 5. ```max_leaf_nodes```:
    * The maximum number of terminal nodes or leaves in a tree.
    * Can be defined in place of max_depth. Since binary trees are created, a depth of ‘n’ would produce a maximum of 2^n leaves.
    * If this is defined, GBM will ignore max_depth.
    
* 6. ```max_features```:
    * The number of features to consider while searching for the best split. These will be randomly selected.
    * As a rule of thumb, the square root of the total number of features works well, but we should check up to 30-40% of the total number of features.
    * Higher values can lead to over-fitting, but this depends on the case.
    
**Pseudo Code/Algorithm for Gradient Boosting**
1. Initialize the outcome
2. Iterate from 1 to total number of trees
    * 2.1 Update the targets based on the previous run (the errors of the current model, which weigh more heavily on the observations that were predicted badly)
    * 2.2 Fit the model on selected subsample of data
    * 2.3 Make predictions on the full set of observations
    * 2.4 Update the output with current results taking into account the learning rate
3. Return the final output.
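
The pseudo code above can be turned into a small runnable sketch. The version below is an illustration under several assumptions: it handles regression with squared-error loss on a single feature, uses one-split regression stumps as the trees, and fits every stump on the full data (that is, a subsample fraction of 1). The function names and the toy data are hypothetical.

```python
import numpy as np

def fit_regression_stump(x, target):
    """Least-squares split of a 1-D feature: returns (threshold, left_value, right_value)."""
    best = None
    for threshold in np.unique(x)[:-1]:          # keep both sides non-empty
        left, right = target[x <= threshold], target[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]

def gradient_boost_fit(x, y, n_trees=100, learning_rate=0.1):
    f0 = y.mean()                                # 1. initialize the outcome
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_trees):                     # 2. iterate over the trees
        residual = y - pred                      # 2.1 targets from the previous run
        thr, left_val, right_val = fit_regression_stump(x, residual)  # 2.2 fit
        update = np.where(x <= thr, left_val, right_val)              # 2.3 predict
        pred = pred + learning_rate * update     # 2.4 update with the learning rate
        stumps.append((thr, left_val, right_val))
    return f0, stumps                            # 3. return the final model

def gradient_boost_predict(x, f0, stumps, learning_rate=0.1):
    pred = np.full(len(x), f0, dtype=float)
    for thr, left_val, right_val in stumps:
        pred += learning_rate * np.where(x <= thr, left_val, right_val)
    return pred

# Toy data (hypothetical): a noiseless sine curve.
x_toy = np.linspace(0, 6, 40)
y_toy = np.sin(x_toy)

f0, stumps = gradient_boost_fit(x_toy, y_toy, n_trees=100, learning_rate=0.1)
mse_baseline = np.mean((y_toy - y_toy.mean()) ** 2)
mse_boosted = np.mean((y_toy - gradient_boost_predict(x_toy, f0, stumps)) ** 2)
```

Comparing `mse_boosted` with `mse_baseline` shows how much the sequence of small corrections improves on the constant initial estimate.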
    
    
**Boosting Parameters**

* 1. ```learning_rate```: 
    * This determines the impact of each tree on the final outcome (step 2.4). GBM works by starting with an initial estimate which is updated using the output of each tree. The learning rate controls the magnitude of this change in the estimates.
    * Lower values are generally preferred as they make the model robust to the specific characteristics of each tree, thus allowing it to generalize well.
    * Lower values require a higher number of trees to model all the relations and are therefore computationally expensive.

* 2. ```n_estimators```:
    * The number of sequential trees to be modeled (step 2).
    * Though GBM is fairly robust at a higher number of trees, it can still overfit at some point. Hence, this should be tuned using CV for a particular learning rate.

* 3. ```subsample```:
    * The fraction of observations to be selected for each tree. Selection is done by random sampling.
    * Values slightly less than 1 make the model robust by reducing the variance.
    * Typical values ~0.8 generally work fine but can be fine-tuned further.
    
**Miscellaneous Parameters**:

* 1. ```loss```:
    * It refers to the loss function to be minimized in each split.
    * It can have various values for classification and regression case. Generally the default values work fine. Other values should be chosen only if you understand their impact on the model.

* 2. ```init```:
    * This affects initialization of the output.
    * This can be used if we have made another model whose outcome is to be used as the initial estimates for GBM.

* 3. ```random_state```:
    * The random number seed so that same random numbers are generated every time.
    * This is important for parameter tuning. If we don’t fix the random number, then we’ll have different outcomes for subsequent runs on the same parameters and it becomes difficult to compare models.
    * It can potentially result in overfitting to a particular random sample selected. We can try running models for different random samples, which is computationally expensive and generally not used.
    
* 4. ```warm_start```:
    * This parameter has an interesting application and can help a lot if used judiciously.
    * Using this, we can fit additional trees on previous fits of a model. It can save a lot of time and you should explore this option for advanced applications.

* 5. ```presort```:
    * Select whether to presort data for faster splits.
    * It makes the selection automatically by default but it can be changed if needed. Note that recent scikit-learn releases have removed this parameter.
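
To see how these parameters fit together, here is a small sketch using scikit-learn's `GradientBoostingClassifier`. The synthetic dataset and the specific values are assumptions chosen for illustration only; they are not tuned and are not the settings used elsewhere in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, used only to make the example self-contained.
X_demo, y_demo = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

gbm = GradientBoostingClassifier(
    # boosting parameters
    learning_rate=0.05,
    n_estimators=200,
    subsample=0.8,
    # tree-specific parameters
    max_depth=3,
    min_samples_split=20,
    min_samples_leaf=5,
    max_features="sqrt",
    # miscellaneous parameters
    random_state=0,
)
gbm.fit(X_tr, y_tr)
test_accuracy = gbm.score(X_te, y_te)

# warm_start: keep the 200 fitted trees and add 100 more instead of refitting.
gbm.set_params(n_estimators=300, warm_start=True)
gbm.fit(X_tr, y_tr)
n_trees_after_warm_start = gbm.estimators_.shape[0]
```

In practice the values above would be chosen by cross-validation, for example by fixing the learning rate first and then tuning the number of trees and the tree-specific parameters.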
    

### XGBoost Algorithm

* ```XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework```.
* Let's look at the evolution of tree-based machine learning models to understand the origin of XGBoost.

#### Evolution of Tree Based Models

* 1. ```Decision Trees```: The most basic predictive model, built using split criteria such as the Gini index and information gain. It places the most important attribute at the root node, and less important attributes are used for splits further down the tree.

* 2. ```The Bagging Algorithm```: Bagging, or bootstrap aggregating, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. It draws n different bags (bootstrap samples) from the dataset, so that the model is exposed to many different versions of the data during training.

* 3. ```Random Forest Algorithm```: Random Forest is a bagging-based algorithm in which many decision trees are built on different bootstrap samples and random subsets of features, and their predictions are then aggregated to reach the result.

* 4. ```The Boosting Algorithm```: The term 'Boosting' refers to a family of algorithms which convert weak learners into strong learners. Boosting is an ensemble method for improving the model predictions of any given learning algorithm. The idea of boosting is to train weak learners sequentially, each trying to correct its predecessor.

* 5. ```The Gradient Boosting Algorithm```: A special case of boosting where errors are minimized by gradient descent algorithm. It relies on the intuition that the best possible next model, when combined with previous models, minimizes the overall prediction error. The key idea is to set the target outcomes for this next model in order to minimize the error.

* 6. ```The XGBoost Algorithm```: It is also known as Extreme Gradient Boosting. It combines software and hardware optimization techniques to yield superior results using fewer computing resources in less time.


![image.png](attachment:image.png)

```The major difference between AdaBoost and Gradient Boosting Algorithm is how the two algorithms identify the shortcomings of weak learners (eg. decision trees).``` 

## Parameter Tuning for XGBoost Model

**General Parameters**
* 1. booster: [default=gbtree]
    * Select the type of model to run at each iteration. It has 2 options:
        * gbtree: tree-based models
        * gblinear: linear models
* 2. silent: [default=0]:
    * Silent mode is activated if this is set to 1, i.e. no running messages will be printed.
    * It’s generally good to keep it 0 as the messages might help in understanding the model.
    * Newer XGBoost releases replace this parameter with verbosity.

* 3. nthread: [default to maximum number of threads available if not set]
    * This is used for parallel processing and number of cores in the system should be entered
    * If you wish to run on all cores, value should not be entered and algorithm will detect automatically


There are 2 more parameters which are set automatically by XGBoost, so there is no need to worry about them. Let's move on to the booster parameters.

**Boosting Parameters**

* 1. ```eta``` [default=0.3]
    * Analogous to learning rate in GBM
    * Makes the model more robust by shrinking the weights on each step
    * Typical final values to be used: 0.01-0.2

* 2. ```min_child_weight``` [default=1]
    * Defines the minimum sum of weights of all observations required in a child.
    * This is similar to min_samples_leaf in GBM but not exactly. This refers to min “sum of weights” of observations while GBM has min “number of observations”.
    * Used to control over-fitting. Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.
    * Values that are too high can lead to under-fitting; hence, it should be tuned using CV.

* 3. ```max_depth``` [default=6]
    * The maximum depth of a tree, same as GBM.
    * Used to control over-fitting as higher depth will allow model to learn relations very specific to a particular sample.
    * Should be tuned using CV.
    * Typical values: 3-10

* 4. ```max_leaf_nodes```
    * The maximum number of terminal nodes or leaves in a tree. XGBoost itself names this parameter max_leaves.
    * Can be defined in place of max_depth. Since binary trees are created, a depth of ‘n’ would produce a maximum of 2^n leaves.
    * If this is defined, GBM will ignore max_depth.

* 5. ```gamma``` [default=0]
    * A node is split only when the resulting split gives a positive reduction in the loss function. Gamma specifies the minimum loss reduction required to make a split.
    * Makes the algorithm conservative. The values can vary depending on the loss function and should be tuned.

* 6. ```max_delta_step``` [default=0]
    * The maximum delta step we allow each tree’s weight estimation to be. If the value is set to 0, there is no constraint. If it is set to a positive value, it can help make the update step more conservative.
    * Usually this parameter is not needed, but it might help in logistic regression when class is extremely imbalanced.

* 7. ```subsample``` [default=1]
    * Same as the subsample of GBM. Denotes the fraction of observations to be randomly sampled for each tree.
    * Lower values make the algorithm more conservative and prevent overfitting, but too small values might lead to under-fitting.
    * Typical values: 0.5-1
* 8. ```colsample_bytree``` [default=1]
    * Similar to max_features in GBM. Denotes the fraction of columns to be randomly sampled for each tree.
    * Typical values: 0.5-1

* 9. ```colsample_bylevel``` [default=1]
    * Denotes the subsample ratio of columns for each split, in each level.
    
* 10. ```lambda``` [default=1]
    * L2 regularization term on weights (analogous to Ridge regression)
    * This is used to handle the regularization part of XGBoost. Though many data scientists don’t use it often, it should be explored to reduce overfitting.

* 11. ```alpha``` [default=0]
    * L1 regularization term on weight (analogous to Lasso regression)
    * Can be used in case of very high dimensionality so that the algorithm runs faster when implemented

* 12. ```scale_pos_weight``` [default=1]
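    * A value greater than 0 should be used in case of high class imbalance as it helps in faster convergence. A common starting point is the ratio of the number of negative examples to the number of positive examples.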
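
To make the roles of gamma, lambda and min_child_weight more concrete, the sketch below evaluates a single candidate split using the regularized gain from the XGBoost formulation. It is a simplified illustration: alpha is assumed to be 0, `G` and `H` stand for the sums of first and second derivatives of the loss over the observations in a node, and the function names and numbers are made up for this example.

```python
def leaf_weight(G, H, reg_lambda=1.0):
    """Optimal leaf value for a node with gradient sum G and hessian sum H."""
    return -G / (H + reg_lambda)

def split_gain(G_left, H_left, G_right, H_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction of a split; the split is worthwhile only if this is positive."""
    def score(G, H):
        return G * G / (H + reg_lambda)
    parent = score(G_left + G_right, H_left + H_right)
    return 0.5 * (score(G_left, H_left) + score(G_right, H_right) - parent) - gamma

def children_heavy_enough(H_left, H_right, min_child_weight=1.0):
    """min_child_weight bounds the hessian sum ("sum of weights") in each child."""
    return H_left >= min_child_weight and H_right >= min_child_weight

# Hypothetical gradient statistics for one candidate split.
gain_no_penalty = split_gain(-4.0, 2.0, 3.0, 2.0, reg_lambda=1.0, gamma=0.0)
gain_with_gamma = split_gain(-4.0, 2.0, 3.0, 2.0, reg_lambda=1.0, gamma=5.0)
allowed = children_heavy_enough(2.0, 2.0, min_child_weight=3.0)
```

With these numbers the split has a positive gain when gamma is 0, is rejected once gamma is raised to 5, and is also rejected when min_child_weight is set to 3, which shows how each parameter makes the algorithm more conservative.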
    
**Learning Parameters**

* 1. ```objective``` [default=reg:linear, renamed reg:squarederror in newer XGBoost releases]
    * This defines the loss function to be minimized. Mostly used values are:
        * binary:logistic – logistic regression for binary classification, returns predicted probability (not class)
        * multi:softmax – multiclass classification using the softmax objective, returns predicted class (not probabilities)
        * multi:softprob – same as softmax, but returns predicted probability of each data point belonging to each class.
* 2. ```eval_metric``` [ default according to objective ]
    * The metric to be used for validation data.
    * The default values are rmse for regression and error for classification.
    * Typical values are:
        * rmse – root mean square error
        * mae – mean absolute error
        * logloss – negative log-likelihood
        * error – Binary classification error rate (0.5 threshold)
        * merror – Multiclass classification error rate
        * mlogloss – Multiclass logloss
        * auc – Area under the curve

* 3. ```seed``` [default=0]
    * The random number seed.
    * Can be used for generating reproducible results and also for parameter tuning.
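
As a rough guide to what some of the eval_metric values listed above measure, the following NumPy sketch computes rmse, logloss and error by hand. The function names and the sample predictions are illustrative assumptions; this is not XGBoost's own implementation.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def logloss(y_true, prob, eps=1e-15):
    """Negative log-likelihood of binary labels under predicted probabilities."""
    prob = np.clip(prob, eps, 1 - eps)           # avoid log(0)
    return float(-np.mean(y_true * np.log(prob) + (1 - y_true) * np.log(1 - prob)))

def error_rate(y_true, prob, threshold=0.5):
    """Share of observations misclassified when thresholding the probabilities."""
    labels = (prob > threshold).astype(int)
    return float(np.mean(labels != y_true))

# Hypothetical labels and predicted probabilities.
y_true_demo = np.array([1, 0, 1, 0])
prob_demo = np.array([0.9, 0.2, 0.4, 0.6])

demo_error = error_rate(y_true_demo, prob_demo)
demo_logloss = logloss(y_true_demo, prob_demo)
demo_rmse = rmse(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0]))
```

Note that error only looks at which side of the threshold a prediction falls on, while logloss also penalizes how confident a wrong prediction was.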

## Advantages:

**System Optimizations**:

* 1. ```Parallelization```: XGBoost approaches the process of sequential tree building using a parallelized implementation. The trees are still added one after another, but the search for the best split within each tree is parallelized.
* 2. ```Tree Pruning```: The stopping criterion for tree splitting within the GBM framework is greedy in nature and depends on the negative loss criterion at the point of split. XGBoost instead grows each tree up to the specified ‘max_depth’ first and then prunes it backward. This ‘depth-first’ approach improves computational performance significantly.

* 3. ```Regularization```:  It penalizes more complex models through both LASSO (L1) and Ridge (L2) regularization to prevent overfitting.
* 4. ```Sparsity Awareness```: XGBoost naturally admits sparse features for inputs by automatically ‘learning’ best missing value depending on training loss and handles different types of sparsity patterns in the data more efficiently.

* 5. ```Weighted Quantile Sketch```: XGBoost employs the distributed weighted Quantile Sketch algorithm to effectively find the optimal split points among weighted datasets.

* 6. ```Cross Validation```:  The algorithm comes with built-in cross-validation method at each iteration, taking away the need to explicitly program this search and to specify the exact number of boosting iterations required in a single run.


![image.png](attachment:image.png)

### Light GBM Algorithm

* Light GBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithm, used for ranking, classification and many other machine learning tasks.

Since it is based on decision tree algorithms, it splits the tree leaf-wise, choosing the leaf with the best fit, whereas most other boosting algorithms split the tree depth-wise or level-wise. When growing the same number of leaves, the leaf-wise algorithm can reduce more loss than the level-wise algorithm, which often results in better accuracy than other boosting algorithms achieve. It is also surprisingly fast, hence the word ‘Light’.


**Difference Between Light Gradient Boosting and Extreme Gradient Boosting**

![image.png](attachment:image.png)

Leaf-wise splits increase model complexity and may lead to overfitting. This can be overcome by specifying another parameter, max_depth, which limits the depth to which splitting will occur.

## Advantages of Light Gradient Boosting

* Faster training speed and higher efficiency
* Lower memory usage
* Often better accuracy than other boosting algorithms
* Compatibility with Large Datasets
* Parallel learning supported.

## Parameters for Light Gradient Boosting

* Light GBM uses leaf-wise splitting rather than depth-wise splitting, which enables it to converge much faster but can also lead to overfitting. Here is a quick guide to tuning the parameters in Light GBM.

**For Best Fitting**:

* num_leaves: This parameter sets the maximum number of leaves in a tree. Theoretically, the relation between num_leaves and max_depth is num_leaves = 2^(max_depth). However, this is not a good estimate in the case of Light GBM, since splitting takes place leaf-wise rather than depth-wise, and a leaf-wise tree is typically much deeper than a depth-wise tree with the same number of leaves. Hence num_leaves should be set smaller than 2^(max_depth), otherwise it may lead to overfitting, and the two parameters should be tuned together rather than derived from each other.

* min_data_in_leaf: This is also one of the important parameters in dealing with overfitting. Setting its value too small may cause overfitting, so it should be chosen according to the size of the data. For large datasets, its value should be in the hundreds or thousands.

* max_depth: This specifies the maximum depth or level up to which a tree can grow.


**For Faster Speed**:

* bagging_fraction: Trains each iteration on a random fraction of the data, which gives faster results. It takes effect only when bagging_freq is also set to a non-zero value.
* feature_fraction: Sets the fraction of the features to be used at each iteration.
* max_bin: A smaller value of max_bin can save much time, as the feature values are bucketed into fewer discrete bins, which is computationally inexpensive.
    
**For Better Accuracy**:

* Use bigger training data
* num_leaves: Setting it to a high value produces deeper trees with increased accuracy but can lead to overfitting. Hence very high values are not preferred.
* max_bin: Setting it to a high value has a similar effect to increasing num_leaves and also slows down the training procedure.
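
The relation between num_leaves and max_depth is easy to check with a line of arithmetic. The helper below is a hypothetical convenience function, not part of LightGBM; the example values mirror the LightGBM parameter dictionary used in the code cells of this notebook.

```python
def max_leaves_for_depth(max_depth):
    """Upper bound on the number of leaves of a binary tree of the given depth."""
    return 2 ** max_depth

# Illustrative values taken from the parameter dictionary used in this notebook.
lgb_params = {"num_leaves": 150, "max_depth": 7}

leaf_cap = max_leaves_for_depth(lgb_params["max_depth"])
num_leaves_exceeds_cap = lgb_params["num_leaves"] > leaf_cap
print(leaf_cap, num_leaves_exceeds_cap)   # prints: 128 True
```

A tree limited to depth 7 cannot have more than 128 leaves, so a num_leaves of 150 adds no capacity here; lowering num_leaves below the cap is what actually restricts the tree.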

In [9]:
# lets import the libraries

import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt

import xgboost as xgb
import lightgbm as lgb

In [25]:
data = pd.read_csv('cancer/data.csv')

In [39]:
data.shape

(569, 33)

In [26]:
data.head()

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
0,842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
1,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
2,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
3,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
4,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,


In [27]:
data.columns

Index(['id', 'diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean',
       'area_mean', 'smoothness_mean', 'compactness_mean', 'concavity_mean',
       'concave points_mean', 'symmetry_mean', 'fractal_dimension_mean',
       'radius_se', 'texture_se', 'perimeter_se', 'area_se', 'smoothness_se',
       'compactness_se', 'concavity_se', 'concave points_se', 'symmetry_se',
       'fractal_dimension_se', 'radius_worst', 'texture_worst',
       'perimeter_worst', 'area_worst', 'smoothness_worst',
       'compactness_worst', 'concavity_worst', 'concave points_worst',
       'symmetry_worst', 'fractal_dimension_worst', 'Unnamed: 32'],
      dtype='object')

In [28]:
data.isnull().sum()

id                           0
diagnosis                    0
radius_mean                  0
texture_mean                 0
perimeter_mean               0
area_mean                    0
smoothness_mean              0
compactness_mean             0
concavity_mean               0
concave points_mean          0
symmetry_mean                0
fractal_dimension_mean       0
radius_se                    0
texture_se                   0
perimeter_se                 0
area_se                      0
smoothness_se                0
compactness_se               0
concavity_se                 0
concave points_se            0
symmetry_se                  0
fractal_dimension_se         0
radius_worst                 0
texture_worst                0
perimeter_worst              0
area_worst                   0
smoothness_worst             0
compactness_worst            0
concavity_worst              0
concave points_worst         0
symmetry_worst               0
fractal_dimension_worst      0
Unnamed:

In [29]:
# encoding the target column

# M (malignant) becomes 1 and B (benign) becomes 0
data['diagnosis'] = data['diagnosis'].map({'M': 1, 'B': 0})

In [41]:
y = data['diagnosis']
x = data.drop(['diagnosis','Unnamed: 32', 'id'], axis = 1)

print("Shape of x:", x.shape)
print("Shape of y:", y.shape)

Shape of x: (569, 30)
Shape of y: (569,)


In [42]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)

print("Shape of x_train: ", x_train.shape)
print("Shape of x_test :", x_test.shape)

Shape of x_train:  (455, 30)
Shape of x_test : (114, 30)


## Applying XgBoost Algorithm

In [32]:
#The data is stored in a DMatrix object 
#label is used to define our outcome variable

dtrain = xgb.DMatrix(x_train, label = y_train)
dtest = xgb.DMatrix(x_test)

In [33]:
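#setting parameters for xgboost
#'eta' is an alias of 'learning_rate', so only one of the two is set
#'verbosity' replaces the deprecated 'silent' flag
parameters={'max_depth':7,
            'learning_rate':.05,
            'verbosity':0,
            'objective':'binary:logistic',
            'eval_metric':'auc'
        }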

In [34]:
#training our model 

num_round=50
from datetime import datetime 

start = datetime.now() 
xg=xgb.train(parameters,dtrain,num_round) 
stop = datetime.now()

In [36]:
#now predicting our model on test set 
ypred=xg.predict(dtest) 
ypred

array([0.9521347 , 0.09990706, 0.04416602, 0.08305214, 0.04416602,
       0.05473065, 0.05220229, 0.04416602, 0.04836307, 0.04416602,
       0.5177092 , 0.05878786, 0.04770402, 0.77660877, 0.40951762,
       0.78631026, 0.10438541, 0.95116407, 0.9527889 , 0.95439816,
       0.94031286, 0.9486958 , 0.08482296, 0.04416602, 0.93303514,
       0.04416602, 0.04416602, 0.81817454, 0.04416602, 0.95439816,
       0.04846001, 0.91073424, 0.06526709, 0.93903255, 0.04416602,
       0.9521347 , 0.04416602, 0.92709106, 0.04440524, 0.94389176,
       0.548067  , 0.05521864, 0.861944  , 0.04416602, 0.1562841 ,
       0.94948083, 0.04416602, 0.0476751 , 0.04416602, 0.95439816,
       0.9334919 , 0.9485579 , 0.93903255, 0.04416602, 0.04416602,
       0.04416602, 0.08347554, 0.07989693, 0.08308426, 0.95439816,
       0.9521347 , 0.94184697, 0.04416602, 0.04416602, 0.93857783,
       0.26097167, 0.941086  , 0.95439816, 0.94934565, 0.0484628 ,
       0.18572491, 0.95439816, 0.04416602, 0.46086285, 0.95139

In [43]:
#Converting probabilities into 1 or 0
#vectorized thresholding at .5, so the test-set size is not hard-coded
ypred = (ypred >= .5).astype(int)

In [47]:
#calculating accuracy of our model 
from sklearn.metrics import accuracy_score 
accuracy_xgb = accuracy_score(y_test,ypred) 
print("Accuracy of XG Boost Model is {:.2f}".format(accuracy_xgb*100))

Accuracy of XG Boost Model is 97.37


## Applying Light Gradient Boosting Algorithm

In [48]:
train_data=lgb.Dataset(x_train,label=y_train)

In [49]:
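#setting parameters for lightgbm
#note: with max_depth 7 a tree has at most 2^7 = 128 leaves, so max_depth is the binding limit here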

param = {'num_leaves':150,
         'objective':'binary',
         'max_depth':7,
         'learning_rate':.05,
         'max_bin':200}

param['metric'] = ['auc', 'binary_logloss']

In [50]:
#training our model using light gbm
num_round=50

start=datetime.now()
lgbm=lgb.train(param,train_data,num_round)
stop=datetime.now()

In [51]:
#predicting on test set
ypred2=lgbm.predict(x_test)
ypred2

array([0.95077592, 0.06309084, 0.0428505 , 0.04065144, 0.05354329,
       0.02876393, 0.03888023, 0.03460286, 0.04119295, 0.03096087,
       0.42142082, 0.08125836, 0.03156123, 0.66126046, 0.37117911,
       0.86087431, 0.08323739, 0.94256759, 0.94356544, 0.95026507,
       0.90099745, 0.93038069, 0.07906226, 0.02983562, 0.91134792,
       0.03019317, 0.02992174, 0.8426933 , 0.02890352, 0.95024847,
       0.04197821, 0.89322764, 0.05598539, 0.94128684, 0.03286804,
       0.94551573, 0.03408877, 0.92973154, 0.02999688, 0.95405579,
       0.42038599, 0.03107911, 0.73645446, 0.02994411, 0.25130094,
       0.94196308, 0.03286804, 0.04151666, 0.03201545, 0.94488095,
       0.94634761, 0.90247508, 0.93357408, 0.02894752, 0.02855055,
       0.03254383, 0.06682875, 0.08696983, 0.04987251, 0.94912874,
       0.94366834, 0.95364203, 0.02947393, 0.02898915, 0.9436154 ,
       0.1216132 , 0.94705685, 0.94844406, 0.94444268, 0.03900295,
       0.31644796, 0.94987952, 0.0300086 , 0.5207737 , 0.86685

In [53]:
#converting probabilities into 0 or 1
#vectorized thresholding at .5, so the test-set size is not hard-coded
ypred2 = (ypred2 >= .5).astype(int)

In [57]:
#calculating accuracy
accuracy_lgbm = accuracy_score(y_test,ypred2)   # true labels first, then predictions
print("Accuracy of Light Gradient Boosting Model is {0:.2f}".format(accuracy_lgbm*100))

Accuracy of Light Gradient Boosting Model is 97.37


## Let's check the ROC AUC Score for both of the Models

In [62]:
from sklearn.metrics import roc_auc_score

#calculating roc_auc_score for xgboost
#note: ypred and ypred2 hold the thresholded 0/1 labels at this point, not the raw probabilities
auc_xgb =  roc_auc_score(y_test,ypred)
print("ROC AUC Score for Xg Boost Model:", auc_xgb)

#calculating roc_auc_score for light gbm. 
auc_lgbm = roc_auc_score(y_test,ypred2)
print("ROC AUC Score for Lg Boost Model:", auc_lgbm)


ROC AUC Score for Xg Boost Model: 0.9744363289933312
ROC AUC Score for Lg Boost Model: 0.971260717688155
