## Ensemble Learning Techniques in Machine Learning

**Ensemble techniques are divided into two types:**
1. Simple ensemble techniques
2. Advanced ensemble techniques

### 1. Simple ensemble techniques:


#### A. Max Voting:
* This method is mainly used for classification problems.
* Multiple models make a prediction for each data point.
* Each model's prediction is treated as a vote.
* The prediction that receives the majority of the votes is used as the final prediction.
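A minimal sketch of max voting in plain Python, assuming each model has already produced one class label per data point. The labels and the helper name `max_vote` are made up for illustration; scikit-learn offers the same idea as `VotingClassifier(voting="hard")`.

```python
from collections import Counter


def max_vote(predictions_per_model):
    """Return the majority label for each data point.

    predictions_per_model: one list of predicted labels per model,
    all lists the same length (one entry per data point).
    """
    final = []
    # zip(*...) regroups the predictions so that `votes` holds every
    # model's label for a single data point.
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the most votes; on a tie,
        # Counter keeps the label that was encountered first.
        label, _count = Counter(votes).most_common(1)[0]
        final.append(label)
    return final


# Hypothetical predictions from three classifiers for four data points.
model_a = ["cat", "dog", "dog", "cat"]
model_b = ["cat", "cat", "dog", "dog"]
model_c = ["dog", "dog", "dog", "cat"]

final_predictions = max_vote([model_a, model_b, model_c])
print(final_predictions)  # ['cat', 'dog', 'dog', 'cat']
```

For the first data point, two of the three models say "cat", so "cat" wins even though `model_c` disagrees.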

#### B. Averaging:
* Averaging is similar to max voting: multiple models make a prediction for each data point.
* Here, the final prediction is the average of the predictions of all the models.
* It can be used for regression problems and, by averaging predicted class probabilities, for classification problems.
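A small NumPy sketch of averaging, assuming the predictions of each model are already available as arrays (the numbers are hypothetical). For classification, the same idea is applied to predicted class probabilities, which is what scikit-learn's `VotingClassifier(voting="soft")` does; `VotingRegressor` averages regression outputs.

```python
import numpy as np

# Hypothetical predictions from three regression models for four data points.
preds_model_1 = np.array([10.0, 20.0, 30.0, 40.0])
preds_model_2 = np.array([12.0, 18.0, 33.0, 39.0])
preds_model_3 = np.array([11.0, 22.0, 27.0, 41.0])

# Stack into shape (n_models, n_points) and average over the model axis.
final = np.mean([preds_model_1, preds_model_2, preds_model_3], axis=0)
print(final)  # [11. 20. 30. 40.]

# For classification: average the predicted class probabilities,
# then pick the class with the highest average probability.
proba_1 = np.array([[0.7, 0.3], [0.4, 0.6]])  # hypothetical model 1
proba_2 = np.array([[0.6, 0.4], [0.2, 0.8]])  # hypothetical model 2
avg_proba = (proba_1 + proba_2) / 2
avg_classes = avg_proba.argmax(axis=1)
print(avg_classes)  # [0 1]
```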

#### C. Weighted Average:
* This is an extension of the averaging method.
* Each model is assigned a different weight, which defines the importance of that model for the final prediction.
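A sketch of a weighted average with `np.average`, assuming three models whose weights (hypothetical values here) were chosen beforehand, for example from their validation scores. `np.average` divides by the sum of the weights, so they do not have to add up to 1.

```python
import numpy as np

# Hypothetical predictions: one row per model, one column per data point.
preds = np.array([
    [10.0, 20.0, 30.0],  # model 1
    [14.0, 22.0, 26.0],  # model 2
    [12.0, 18.0, 34.0],  # model 3
])
# Hypothetical importance of each model: model 1 counts twice as much.
weights = np.array([0.5, 0.25, 0.25])

weighted_final = np.average(preds, axis=0, weights=weights)
print(weighted_final)  # [11.5 20.  30. ]
```

For the first data point: 0.5 × 10 + 0.25 × 14 + 0.25 × 12 = 11.5.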

### 2. Advanced Ensemble techniques:

#### A. Stacking:
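* Stacking is an ensemble learning technique that uses the predictions of multiple models to build a new model.

For example:
* kNN, SVM and decision tree models can be used as base models, and their predictions are used as input features to build a new model (the meta-model). The meta-model then makes the final predictions on the test dataset. This makes it easy to combine several different kinds of models.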
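A sketch of the kNN + SVM + decision tree example using scikit-learn's `StackingClassifier` on the iris data used later in this notebook. The choice of logistic regression as the meta-model, the split sizes and `cv=5` are assumptions for illustration. `StackingClassifier` trains the meta-model on cross-validated predictions of the base models, so the meta-model never sees predictions made on data the base models were trained on.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Base models (level 0).
base_models = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("svm", SVC()),
    ("tree", DecisionTreeClassifier(random_state=42)),
]

# Meta-model (level 1), trained on the base models' 5-fold out-of-fold predictions.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"Stacking test accuracy: {accuracy:.3f}")
```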

#### B. Blending:
* Blending is very similar to stacking, but it uses only a holdout (validation) set taken from the training set to train the final model.
* The base models make predictions on the validation set; the validation set and these predictions are used to build a new model, and that model is then run on the test set.
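A minimal blending sketch on the iris data, following the description above: base models are fitted on the remaining training data, their predictions on the validation set are appended to the validation features, and a meta-model is trained on that. The split sizes, the two base models, logistic regression as the meta-model and the helper name `blend_features` are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out a test set, then carve a validation set out of the training data.
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.25, random_state=0,
    stratify=y_train_full)

# Base models are trained only on the reduced training set.
base_models = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)]
for m in base_models:
    m.fit(X_train, y_train)


def blend_features(X_part):
    # Original features plus each base model's predicted label as an extra column.
    preds = [m.predict(X_part) for m in base_models]
    return np.column_stack([X_part] + preds)


# The meta-model learns from the validation set and the base-model predictions.
meta = LogisticRegression(max_iter=1000)
meta.fit(blend_features(X_val), y_val)

# At test time, the same features are built from the test set.
test_accuracy = meta.score(blend_features(X_test), y_test)
print(f"Blending test accuracy: {test_accuracy:.3f}")
```

Compared with stacking, blending is simpler (no cross-validation) but the meta-model only sees the validation rows, here 30 observations.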

#### C. Bagging:
* Bagging combines the results of multiple models to get a generalized result.
* If all the models are trained on the same data, there is a high chance that they will all give the same result.
* That’s why we use bootstrap sampling to create multiple subsets from the original data, sampling with replacement.
* Bagging, or bootstrap aggregation, is an ensemble technique that trains models on these sample datasets, which are subsets of the original dataset.

**In bagging:**
1. Create multiple subsets by selecting observations from the original dataset with replacement.
2. Create a base model on each subset.
3. The models are independent of each other and can be trained in parallel.
4. Combine the predictions of all the models to get the final prediction.

![bagging%20%281%29.png](attachment:bagging%20%281%29.png)
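The four bagging steps can be sketched by hand with NumPy bootstrap indices and decision trees, assuming majority voting as the way to combine the models. The helper names `bagging_fit` and `bagging_predict` and the choice of 15 trees are hypothetical; scikit-learn provides the same procedure as `BaggingClassifier`, and `RandomForestClassifier` (used in the cells below) builds on it.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def bagging_fit(X, y, n_models=15, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Step 1: bootstrap subset, same size as the data, drawn with replacement.
        idx = rng.integers(0, n, size=n)
        # Step 2: base model on that subset. Each fit is independent of the
        # others (step 3), so this loop could run in parallel.
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return models


def bagging_predict(models, X):
    # Step 4: combine the models by majority vote over their predictions.
    all_preds = np.array([m.predict(X) for m in models])  # (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in all_preds.T])


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
models = bagging_fit(X_train, y_train)
bagged_accuracy = (bagging_predict(models, X_test) == y_test).mean()
print(f"Bagged trees test accuracy: {bagged_accuracy:.3f}")
```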

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [23]:
from sklearn.datasets import load_iris
iris_data=load_iris()
print(iris_data)

{'data': array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
     

In [24]:
data_input=iris_data.data
data_output=iris_data.target
print(data_input)

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 [4.8 3.  1.4 0.1]
 [4.3 3.  1.1 0.1]
 [5.8 4.  1.2 0.2]
 [5.7 4.4 1.5 0.4]
 [5.4 3.9 1.3 0.4]
 [5.1 3.5 1.4 0.3]
 [5.7 3.8 1.7 0.3]
 [5.1 3.8 1.5 0.3]
 [5.4 3.4 1.7 0.2]
 [5.1 3.7 1.5 0.4]
 [4.6 3.6 1.  0.2]
 [5.1 3.3 1.7 0.5]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.  3.4 1.6 0.4]
 [5.2 3.5 1.5 0.2]
 [5.2 3.4 1.4 0.2]
 [4.7 3.2 1.6 0.2]
 [4.8 3.1 1.6 0.2]
 [5.4 3.4 1.5 0.4]
 [5.2 4.1 1.5 0.1]
 [5.5 4.2 1.4 0.2]
 [4.9 3.1 1.5 0.2]
 [5.  3.2 1.2 0.2]
 [5.5 3.5 1.3 0.2]
 [4.9 3.6 1.4 0.1]
 [4.4 3.  1.3 0.2]
 [5.1 3.4 1.5 0.2]
 [5.  3.5 1.3 0.3]
 [4.5 2.3 1.3 0.3]
 [4.4 3.2 1.3 0.2]
 [5.  3.5 1.6 0.6]
 [5.1 3.8 1.9 0.4]
 [4.8 3.  1.4 0.3]
 [5.1 3.8 1.6 0.2]
 [4.6 3.2 1.4 0.2]
 [5.3 3.7 1.5 0.2]
 [5.  3.3 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.4 3.2 4.5 1.5]
 [6.9 3.1 4.

In [25]:
print(data_output)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]


In [26]:
from sklearn.model_selection import KFold
kf=KFold(n_splits=5,shuffle=True)
print(kf)

KFold(n_splits=5, random_state=None, shuffle=True)


In [27]:
for train_set,test_set in kf.split(data_input):
    print(train_set, test_set)

[  0   1   2   3   5   6   7   8  10  11  12  13  15  20  21  22  23  24
  25  26  27  28  30  33  36  38  39  40  41  42  43  45  46  47  48  49
  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  68
  69  72  73  74  75  76  77  78  79  80  81  82  84  85  86  87  89  90
  91  92  94  95  97  98  99 100 102 103 105 106 107 108 109 110 111 112
 113 114 115 116 118 119 120 121 122 123 124 125 126 127 129 130 131 132
 134 135 137 138 140 141 142 144 145 146 148 149] [  4   9  14  16  17  18  19  29  31  32  34  35  37  44  67  70  71  83
  88  93  96 101 104 117 128 133 136 139 143 147]
[  0   2   4   5   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  21  23  24  25  26  29  30  31  32  33  34  35  36  37  38  39  40  41
  42  44  47  48  49  51  52  54  57  59  60  61  62  63  65  66  67  68
  69  70  71  72  73  74  75  77  78  80  81  82  83  86  87  88  90  91
  92  93  94  95  96  97  99 100 101 102 103 104 105 106 107 108 109 110
 111 112 116 117 118 119

In [28]:
from sklearn.ensemble import RandomForestClassifier
rf_class=RandomForestClassifier(n_estimators=10)

from sklearn.model_selection import cross_val_score
cv=cross_val_score(rf_class,data_input,data_output,scoring='accuracy',cv=10)
print(cv)

[1.         0.93333333 1.         0.93333333 0.93333333 0.93333333
 0.8        1.         1.         1.        ]


In [32]:
accuracy=cv.mean()*100
print(accuracy)

95.33333333333334


#### D. Boosting:
* Boosting is a sequential process: each subsequent model tries to correct the errors of the previous model, so each model depends on the models before it.

**In boosting:**
1. Create a subset from the original dataset.
2. Initially, all data points are given equal weights.
3. Create a base model on this subset.
4. Use this model to make predictions on the whole dataset.
5. Calculate the errors from the actual and predicted values.
6. Observations that are predicted incorrectly are given higher weights.
7. Create another model that tries to correct the errors on these misclassified observations.
8. Repeat, creating multiple models, each correcting the errors of the previous model.
9. The final model (the strong learner) is the weighted mean of all the weak models.
10. Individual learners do not perform well on the entire dataset; each performs well only on part of it. That’s why boosting combines all the weak learners to form a strong learner.

![1_zTgGBTQIMlASWm5QuS2UpA.jpeg](attachment:1_zTgGBTQIMlASWm5QuS2UpA.jpeg)
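Steps 2, 5 and 6 can be made concrete with one round of re-weighting. This sketch assumes the AdaBoost-style update (labels coded as -1/+1, model weight α = ½·ln((1 − error)/error)); other boosting algorithms update differently. The labels and predictions are hypothetical.

```python
import numpy as np

y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])  # hypothetical base-model output: one mistake

n = len(y_true)
weights = np.full(n, 1 / n)            # step 2: equal weights, 0.2 each

wrong = y_pred != y_true               # step 5: compare actual vs predicted
err = weights[wrong].sum()             # weighted error = 0.2
alpha = 0.5 * np.log((1 - err) / err)  # model weight = 0.5 * ln(4) = ln(2)

# Step 6: multiply misclassified points by e^alpha (= 2) and correct ones
# by e^-alpha (= 0.5), then renormalise so the weights sum to 1.
weights = weights * np.exp(np.where(wrong, alpha, -alpha))
weights /= weights.sum()
print(weights.round(3))  # [0.125 0.5   0.125 0.125 0.125]
```

After one round, the single misclassified observation carries half of the total weight, so the next model is pushed to get it right.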

**Three popular machine learning boosting algorithms are:**
1. Adaptive Boosting (AdaBoost)
2. Gradient Boosting
3. XGBoost

#### 1. Adaptive Boosting (AdaBoost):
1. It fits a sequence of weak learners on differently weighted versions of the training data.
2. It starts by making predictions on the original dataset, giving equal weight to each observation.
3. If the first learner predicts an observation incorrectly, that observation is given a higher weight.
4. Being an iterative process, it keeps adding learners until a limit on the number of models or on accuracy is reached.
5. Decision stumps (decision trees with a single split) are most commonly used with AdaBoost, but any machine learning algorithm that accepts sample weights on the training data can be used as the base learner. AdaBoost can be used for both classification and regression problems.
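A from-scratch sketch of binary AdaBoost with decision stumps, assuming labels coded as -1/+1 and the classic discrete AdaBoost update. The stumps are scikit-learn `DecisionTreeClassifier(max_depth=1)` models, which accept `sample_weight` in `fit`, matching point 5 above. The helper names `adaboost_fit`/`adaboost_predict` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, n_estimators=30):
    """Binary AdaBoost sketch; y must contain the labels -1 and +1."""
    n = len(y)
    w = np.full(n, 1 / n)  # every observation starts with the same weight
    stumps, alphas = [], []
    for _ in range(n_estimators):
        # Weak learner: a one-split tree trained on the *weighted* data.
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)  # weighted error
        alpha = 0.5 * np.log((1 - err) / err)  # this learner's say in the vote
        # Misclassified points (y * pred == -1) get exp(+alpha), others exp(-alpha).
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas


def adaboost_predict(stumps, alphas, X):
    # Weighted vote of all weak learners; the sign gives the class.
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(score >= 0, 1, -1)


X, y01 = make_classification(n_samples=300, n_features=5, random_state=0)
y = 2 * y01 - 1  # map labels {0, 1} -> {-1, +1}

single = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
stump_acc = (single.predict(X) == y).mean()
stumps, alphas = adaboost_fit(X, y)
boosted_acc = (adaboost_predict(stumps, alphas, X) == y).mean()
print(f"single stump: {stump_acc:.3f}, AdaBoost ({len(stumps)} stumps): {boosted_acc:.3f}")
```

Both accuracies here are measured on the training data, so they show how the ensemble fits the data, not how well it generalises.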

Tune the parameters to optimize the performance of the algorithm, for example:
* n_estimators: controls the number of weak learners.
* learning_rate: controls the contribution of each weak learner to the final combination. There is a trade-off between learning_rate and n_estimators.
* estimator (called base_estimator in scikit-learn versions before 1.2): specifies the ML algorithm used as the base learner.
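A sketch of how these parameters are passed to scikit-learn's `AdaBoostClassifier`, comparing two hypothetical settings on synthetic data. The base learner is passed positionally so the code works whether the installed version names that parameter `estimator` or `base_estimator`. The specific values (50 vs 200 estimators, learning rates 1.0 vs 0.25) are assumptions chosen only to illustrate the trade-off.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=7)

results = {}
for n_estimators, learning_rate in [(50, 1.0), (200, 0.25)]:
    model = AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1),  # base learner: a decision stump
        n_estimators=n_estimators,            # number of weak learners
        learning_rate=learning_rate,          # shrinks each learner's contribution
        random_state=7,
    )
    results[(n_estimators, learning_rate)] = cross_val_score(model, X, y, cv=5).mean()

for params, score in results.items():
    print(params, round(score, 3))
```

A smaller learning_rate usually needs more estimators to reach a similar accuracy, which is why the two are tuned together.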

In [3]:
# raw string so the Windows backslashes are not treated as escape sequences
df=pd.read_csv(r'D:\PGP IN DATA SCIENCE with Careerera\Data Sets\ML Datasets\diabetes.csv')
df.head()

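|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |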


In [6]:
print("Shape", df.shape)
print("Size", df.size)
print("Data Types\n", df.dtypes)

Shape (768, 9)
Size 6912
Data Types
 Pregnancies                   int64
Glucose                       int64
BloodPressure                 int64
SkinThickness                 int64
Insulin                       int64
BMI                         float64
DiabetesPedigreeFunction    float64
Age                           int64
Outcome                       int64
dtype: object


In [7]:
df.isna().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64

Data set has no null values.

#### Create Model & Train the model

In [8]:
from sklearn import model_selection
from sklearn.ensemble import AdaBoostClassifier

In [10]:
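value=df.values
X=value[:,0:8]   # first 8 columns: input features
Y=value[:,8]     # last column: Outcome (target)
seed=7           # defined but not used by the cells below
num_trees=10     # number of weak learners (n_estimators)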

In [18]:
kfold=model_selection.KFold(n_splits=10)          # 10-fold CV, no shuffling
model=AdaBoostClassifier(n_estimators=num_trees)  # default weak learner: decision stump
results=model_selection.cross_val_score(model, X,Y, cv=kfold)  # accuracy per fold
print(results.mean())

0.7525974025974026


#### 2. Gradient Boosting:
1. Gradient boosting trains many models sequentially.
2. Each new model gradually minimizes the loss function of the whole system using gradient descent. (For a model y = ax + b + e, the error term e needs special attention: it is what the subsequent models try to reduce.)
3. The learning procedure consecutively fits new models to provide a more accurate estimate of the response variable.

* The principal idea behind this algorithm is to construct new base learners that are maximally correlated with the negative gradient of the loss function associated with the whole ensemble.
* It can be used for both regression and classification problems.
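A from-scratch sketch of gradient boosting for regression, assuming squared-error loss. With L = ½(y − F)², the negative gradient with respect to the current prediction F is the residual y − F, so each new tree is fitted to the residuals of the ensemble so far. The synthetic sine data, the tree depth and the helper name `gb_predict` are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_estimators = 100
init = y.mean()                 # start from a constant prediction
pred = np.full_like(y, init)
trees, mse_history = [], []
for _ in range(n_estimators):
    # Negative gradient of 0.5 * (y - F)**2 with respect to F: the residuals.
    residuals = y - pred
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    # Take a small step (learning_rate) in the direction the new tree suggests.
    pred = pred + learning_rate * tree.predict(X)
    trees.append(tree)
    mse_history.append(np.mean((y - pred) ** 2))


def gb_predict(X_new):
    # Initial constant plus the shrunken contribution of every tree.
    return init + learning_rate * sum(t.predict(X_new) for t in trees)


print(f"training MSE after 1 tree: {mse_history[0]:.3f}, "
      f"after {n_estimators}: {mse_history[-1]:.3f}")
```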

Tune the parameters to optimize the performance of the algorithm, for example:
* n_estimators: controls the number of weak learners.
* learning_rate: controls the contribution of each weak learner to the final combination. There is a trade-off between learning_rate and n_estimators.
* max_depth: the maximum depth of the individual regression estimators. It limits the number of nodes in each tree; the best value depends on the interaction of the input variables.
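To see the learning_rate / n_estimators trade-off, scikit-learn's `GradientBoostingRegressor.staged_predict` returns the predictions after each boosting stage, so one fitted model shows the test error for every number of trees. The synthetic dataset and the two learning rates compared are assumptions for illustration; no particular outcome is guaranteed, but typically the smaller learning rate needs more stages to reach its best test error.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

best = {}
for lr in (1.0, 0.1):
    gbr = GradientBoostingRegressor(
        n_estimators=300, learning_rate=lr, max_depth=3, random_state=0)
    gbr.fit(X_tr, y_tr)
    # Test MSE after 1, 2, ..., 300 trees.
    test_mse = [mean_squared_error(y_te, p) for p in gbr.staged_predict(X_te)]
    best[lr] = (int(np.argmin(test_mse)) + 1, min(test_mse))

for lr, (n_best, mse) in best.items():
    print(f"learning_rate={lr}: best n_estimators={n_best}, test MSE={mse:.1f}")
```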

#### 3. XGBoost:

#### Parameters in XGBoost Algorithm:
**1. General Parameters:**
  * **silent**: default 0. Set it to 1 for silent mode; 0 prints running messages. (Recent XGBoost versions replace this parameter with `verbosity`.)
  * **booster**: specifies which booster to use. The default, gbtree, is a tree-based booster; gblinear uses linear functions.
  * **num_pbuffer**: set automatically by XGBoost, so we do not need to set it explicitly.
  * **num_feature**: like num_pbuffer, set automatically by XGBoost, so we do not need to set it explicitly.

**2. Booster Parameters:**
  * These parameters control the tree booster.
  * **eta**: default 0.3. The step size shrinkage used in updates to prevent overfitting. After each boosting step we can directly get the weights of the new features, and eta shrinks these weights to make the boosting process more conservative. The value ranges from 0 to 1; a lower eta makes the model more robust to overfitting.
  * **gamma**: default 0. The minimum loss reduction required to make a further partition on a leaf node of the tree. A larger gamma makes the algorithm more conservative. Range: 0 to infinity.
  * **max_depth**: default 6. The maximum depth of a tree. Range: 0 to infinity.
  * **min_child_weight**: default 1. The minimum sum of instance weights needed in a child. If a tree partition step would result in a leaf node whose sum of instance weights is less than min_child_weight, the algorithm stops partitioning further. Range: 0 to infinity.
  * **max_delta_step**: default 0. The maximum delta step allowed for each tree's weight estimation. A value of 0 means there is no constraint; a positive value makes the update step more conservative. Values from 1 to 10 may help control the update. Range: 0 to infinity.
  * **subsample**: default 1. The subsample ratio of the training instances. For example, setting it to 0.5 means the algorithm randomly samples half of the training instances before growing each tree. Range: 0 to 1.
  * **colsample_bytree**: default 1. The subsample ratio of columns used when constructing each tree. Range: 0 to 1.

**3. Linear Booster Specific Parameters:**
  * **lambda and alpha**: the L2 (lambda) and L1 (alpha) regularization terms on the weights. lambda defaults to 1 and alpha defaults to 0.
  * **lambda_bias**: an L2 regularization term on the bias, with a default value of 0.

**4. Learning Task Parameters:**
 * **base_score**: default 0.5. The initial prediction score of all instances (the global bias).
 * **objective**: default reg:linear (renamed reg:squarederror in newer XGBoost versions). Specifies the learning task and its objective, such as linear regression, Poisson regression, and so on.
 * **eval_metric**: the evaluation metric for validation data. If it is not specified, a default metric is assigned according to the objective.
 * **seed**: the random number seed, used to reproduce the same results.

In [21]:
from xgboost import XGBClassifier

seed=7         # defined but not used below
num_trees=30   # number of boosted trees (n_estimators)

kfold=model_selection.KFold(n_splits=10)   # same 10-fold split as the AdaBoost cell
model=XGBClassifier(n_estimators=num_trees)
results=model_selection.cross_val_score(model, X,Y, cv=kfold)
print(results.mean())

0.7499487354750513
