# Ensemble learning
* Ensemble methods create multiple models and then combine them to produce improved results. They usually produce more accurate predictions than a single model would.
* The techniques fall into two groups, simple and advanced, covered in the sections below.

## Simple Ensemble Techniques
* Max Voting
* Averaging
* Weighted Averaging

### Max Voting
* Generally used for classification problems. In this technique, multiple models make a prediction for each data point, and each prediction counts as a ‘vote’. The prediction made by the majority of the models is used as the final prediction.
* For example, suppose you ask 5 of your colleagues to rate your movie (out of 5), and three of them rate it 4 while two give it a 5. Since the majority gave a rating of 4, the final rating is taken as 4. You can think of this as taking the mode of all the predictions.
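
The movie-rating example can be reproduced in a few lines. This is a minimal sketch using only the standard library; the `ratings` list and the variable names are illustrative and not part of the notebook.

```python
from collections import Counter

# Five colleagues rate the movie: three give it a 4, two give it a 5.
ratings = [5, 4, 5, 4, 4]

# Count the votes per rating and keep the rating with the most votes (the mode).
vote_counts = Counter(ratings)
final_rating, n_votes = vote_counts.most_common(1)[0]

print(final_rating, n_votes)
```

With an even number of voters a tie is possible, so a real implementation needs a tie-breaking rule.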

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values   #breast cancer dataset

print(dataset)

     Sample code number  Clump Thickness  Uniformity of Cell Size  \
0               1000025                5                        1   
1               1002945                5                        4   
2               1015425                3                        1   
3               1016277                6                        8   
4               1017023                4                        1   
..                  ...              ...                      ...   
678              776715                3                        1   
679              841769                2                        1   
680              888820                5                       10   
681              897471                4                        8   
682              897471                4                        8   

     Uniformity of Cell Shape  Marginal Adhesion  Single Epithelial Cell Size  \
0                           1                  1                            2   
1        

In [2]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

In [3]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [4]:
model1 = LogisticRegression(random_state=1)
model2 = DecisionTreeClassifier(random_state=1)
model = VotingClassifier(estimators=[('lr', model1), ('dt', model2)], voting='hard')
model.fit(X_train,y_train)
model.score(X_test,y_test)   # hard voting over the two classifiers gives about 93.6% accuracy, which is lower than both individual accuracies

0.935672514619883

### Averaging
* Similar to the max voting technique, multiple predictions are made for each data point in averaging. In this method, we take an average of predictions from all the models and use it to make the final prediction.
* Averaging can be used for making predictions in regression problems or while calculating probabilities for classification problems.
* For example, averaging the five movie ratings from the max voting example gives (5+4+5+4+4)/5 = 4.4.
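
As a small worked sketch, the snippet below averages the five ratings and then applies the same idea to class probabilities. The three probability pairs are made-up values for one hypothetical data point, not outputs of the models in this notebook.

```python
# Averaging the five movie ratings.
ratings = [5, 4, 5, 4, 4]
average_rating = sum(ratings) / len(ratings)
print(average_rating)

# Averaging class probabilities from three hypothetical models for one data point.
# Each inner list is [P(class 0), P(class 1)]; the numbers are illustrative only.
model_probs = [
    [0.9, 0.1],
    [0.6, 0.4],
    [0.3, 0.7],
]
n_models = len(model_probs)
avg_probs = [sum(p[k] for p in model_probs) / n_models for k in range(2)]

# The final class is the one with the highest averaged probability.
predicted_class = avg_probs.index(max(avg_probs))
print(avg_probs, predicted_class)
```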

In [5]:
from sklearn.neighbors import KNeighborsClassifier   #probabilities for classification problem
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values   #breast cancer dataset

print(dataset)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

model1 = DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3= LogisticRegression()

model1.fit(X_train,y_train)
model2.fit(X_train,y_train)
model3.fit(X_train,y_train)

pred1=model1.predict_proba(X_test)
pred2=model2.predict_proba(X_test)
pred3=model3.predict_proba(X_test)

#averaging
finalpred=(pred1+pred2+pred3)/3
finalpred   # binary classification: in each row, index 0 holds the probability of class 0 and index 1 the probability of class 1

array([[9.97246751e-01, 2.75324889e-03],
       [9.98345743e-01, 1.65425676e-03],
       [1.95145383e-02, 9.80485462e-01],
       [5.76087848e-03, 9.94239122e-01],
       [9.99196818e-01, 8.03181785e-04],
       [9.98450185e-01, 1.54981476e-03],
       [9.94336081e-01, 5.66391889e-03],
       [3.33457249e-01, 6.66542751e-01],
       [9.98358761e-01, 1.64123914e-03],
       [9.98638305e-01, 1.36169539e-03],
       [3.19601891e-04, 9.99680398e-01],
       [9.96141718e-01, 3.85828161e-03],
       [1.81935460e-05, 9.99981806e-01],
       [9.98222785e-01, 1.77721544e-03],
       [9.91762050e-01, 8.23795044e-03],
       [7.75861463e-01, 2.24138537e-01],
       [2.29171837e-03, 9.97708282e-01],
       [3.31288969e-04, 9.99668711e-01],
       [1.05529831e-04, 9.99894470e-01],
       [9.94313709e-01, 5.68629088e-03],
       [9.98406101e-01, 1.59389892e-03],
       [9.97094276e-01, 2.90572446e-03],
       [1.82783472e-01, 8.17216528e-01],
       [9.98883224e-01, 1.11677643e-03],
       [1.377996

### Weighted Averaging
* This is an extension of the averaging method. Each model is assigned a weight that defines its importance for the prediction. For instance, if two of your colleagues are critics while the others have no prior experience in this field, then the answers of these two colleagues are given more importance than those of the others.
* With a weight of 0.23 for each critic and 0.18 for each of the other three, the result is calculated as [(5\*0.23) + (4\*0.23) + (5\*0.18) + (4\*0.18) + (4\*0.18)] = 4.41.
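
The arithmetic above can be checked with a short sketch. The assignment of the 0.23 weights to the first two raters is an assumption read off the formula; the variable names are illustrative.

```python
ratings = [5, 4, 5, 4, 4]
# The first two raters are treated as the critics (weight 0.23 each);
# the remaining three get 0.18 each, so the weights sum to 1.
weights = [0.23, 0.23, 0.18, 0.18, 0.18]

weighted_rating = sum(r * w for r, w in zip(ratings, weights))
print(round(weighted_rating, 2))
```

Because the weights sum to 1, no extra normalisation is needed; with arbitrary weights the sum would have to be divided by `sum(weights)`.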

In [6]:
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values   #breast cancer dataset
print(dataset)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

model1 = DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3= LogisticRegression()

model1.fit(X_train,y_train)
model2.fit(X_train,y_train)
model3.fit(X_train,y_train)

pred1=model1.predict_proba(X_test)
pred2=model2.predict_proba(X_test)
pred3=model3.predict_proba(X_test)

finalpred=(pred1*0.3 + pred2*0.3 + pred3*0.4)   #weights to every model
finalpred

# the weighted average can give better predictions than the plain average when the weights reflect how reliable each model is

array([[9.96696101e-01, 3.30389867e-03],
       [9.98014892e-01, 1.98510812e-03],
       [2.34174460e-02, 9.76582554e-01],
       [6.91305417e-03, 9.93086946e-01],
       [9.99036182e-01, 9.63818142e-04],
       [9.98140222e-01, 1.85977771e-03],
       [9.93203297e-01, 6.79670267e-03],
       [3.00148698e-01, 6.99851302e-01],
       [9.98030513e-01, 1.96948697e-03],
       [9.98365966e-01, 1.63403447e-03],
       [3.83522269e-04, 9.99616478e-01],
       [9.95370062e-01, 4.62993793e-03],
       [2.18322552e-05, 9.99978168e-01],
       [9.97867341e-01, 2.13265853e-03],
       [9.90114459e-01, 9.88554052e-03],
       [7.71033755e-01, 2.28966245e-01],
       [2.75006204e-03, 9.97249938e-01],
       [3.97546762e-04, 9.99602453e-01],
       [1.26635797e-04, 9.99873364e-01],
       [9.93176451e-01, 6.82354906e-03],
       [9.98087321e-01, 1.91267870e-03],
       [9.96513131e-01, 3.48686935e-03],
       [1.99340166e-01, 8.00659834e-01],
       [9.98659868e-01, 1.34013171e-03],
       [1.653595

## Advanced Ensemble Techniques
* Stacking
* Blending
* Bagging
* Boosting

### Stacking
* Stacking is an ensemble learning technique that uses predictions from multiple models (for example, a decision tree and KNN) to build a new model. This new model is used for making predictions on the test set. Below are the steps for a simple stacked ensemble:
1. The train set is split into 10 parts.
2. A base model (say, a decision tree) is fitted on 9 parts and predictions are made for the 10th part. This is repeated so that each part of the train set is predicted once.
3. The base model is then fitted on the whole train set.
4. Using this model, predictions are made on the test set.
5. Steps 2 to 4 are repeated for another base model (say, KNN), resulting in another set of predictions for the train set and the test set.
6. The predictions for the train set are used as features to build a new model.
7. This model is used to make the final predictions, taking the base models' test-set predictions as input.
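
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses synthetic data from `make_classification` instead of `Data.csv`, a logistic regression as the new (meta) model, and class-1 probabilities as the stacked features. None of these choices come from the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption; the notebook reads Data.csv).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

base_models = [DecisionTreeClassifier(random_state=1), KNeighborsClassifier()]

train_meta, test_meta = [], []
for model in base_models:
    # Steps 1-2: 10-fold out-of-fold predictions for every training row.
    oof = cross_val_predict(
        model, X_train, y_train, cv=10, method="predict_proba"
    )[:, 1]
    train_meta.append(oof)
    # Steps 3-4: refit on the whole train set and predict the test set.
    model.fit(X_train, y_train)
    test_meta.append(model.predict_proba(X_test)[:, 1])

# Step 6: the base-model predictions become the features of the new model.
train_meta = np.column_stack(train_meta)
test_meta = np.column_stack(test_meta)

meta_model = LogisticRegression()
meta_model.fit(train_meta, y_train)

# Step 7: final predictions from the test-set predictions of the base models.
stack_accuracy = meta_model.score(test_meta, y_test)
print(stack_accuracy)
```

scikit-learn also ships `StackingClassifier` and `StackingRegressor` in `sklearn.ensemble`, which wrap a similar procedure; the manual version is shown only to mirror the listed steps.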

### Blending
* Blending follows the same approach as stacking but uses only a holdout (validation) set from the train set to make predictions. In other words, unlike stacking, the base-model predictions used to train the new model are made on the holdout set only. The holdout set and these predictions are used to build a model, which is then run on the test set. Below are the steps for the blending process:
1. The train set is split into training and validation sets.
2. Models are fitted on the training set.
3. Predictions are made on the validation set and the test set.
4. The validation set and its predictions are used as features to build a new model.
5. This model is used to make the final predictions on the test set, using the test features and the test-set predictions from step 3.
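
A minimal sketch of the blending steps, again on synthetic data rather than `Data.csv`. The 30% validation split, the choice of base models, and the logistic-regression meta model are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption; the notebook reads Data.csv).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_full, X_test, y_full, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: split the train set into training and validation (holdout) sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_full, y_full, test_size=0.3, random_state=0
)

base_models = [DecisionTreeClassifier(random_state=1), KNeighborsClassifier()]

val_preds, test_preds = [], []
for model in base_models:
    model.fit(X_train, y_train)                          # step 2
    val_preds.append(model.predict_proba(X_val)[:, 1])   # step 3
    test_preds.append(model.predict_proba(X_test)[:, 1])

# Step 4: validation features plus the base-model predictions on them.
val_features = np.column_stack([X_val] + val_preds)
test_features = np.column_stack([X_test] + test_preds)

meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(val_features, y_val)

# Step 5: final predictions on the test set.
blend_accuracy = meta_model.score(test_features, y_test)
print(blend_accuracy)
```

Compared with stacking, the meta model here sees only the holdout rows, which is simpler but uses less data for the second stage.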

### Bagging
* The idea behind bagging is to combine the results of multiple models (for instance, all decision trees) to get a generalized result. If you train all the models on the same dataset, it's likely that they will give the same result.
* To avoid this, we can use **bootstrapping**.
* Bootstrapping is a sampling technique in which we create subsets of observations from the dataset, with replacement. The size of each subset is the same as the size of the original set.
* Replacement means: if we have [1,2,3,4,5,6,7,8,9,10] and B=2, where B is the number of subsets, the result is two subsets that contain duplicates and whose union may or may not cover all the original values: [1,2,7,4,1,3,2,1,9,3], [4,10,9,2,6,10,3,1,1,10]. Some observations, like 5 and 8, are not included in either subset.
* These left-out observations are called Out-of-Bag (OOB) observations. Each Out-of-Bag observation is given, as an unseen sample, to every decision tree whose subset did not contain it. After each of these trees has made its prediction, the majority vote is taken as the prediction for that observation.
* Bagging (or Bootstrap Aggregating) uses these subsets (bags) to get a fair idea of the distribution of the complete set. The subsets created for bagging may also be smaller than the original set.
* Bagging works like this:
1. Multiple subsets are created from the original dataset, selecting observations with replacement.
2. A base model (weak model) is created on each of these subsets.
3. The models run in parallel and are independent of each other.
4. The final predictions are determined by combining the predictions from all the models.
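
The bootstrapping idea and the four bagging steps can be sketched by hand. This is an illustration under assumptions: the random seed, the synthetic dataset, and the choice of 25 trees are not from the notebook, and the bootstrap samples drawn here will differ from the two subsets quoted above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Bootstrapping the toy set: B=2 subsets of the same size, drawn with replacement.
data = np.arange(1, 11)
bags = [rng.choice(data, size=data.size, replace=True) for _ in range(2)]
# Values that appear in neither bag are the out-of-bag observations.
out_of_bag = sorted(set(data.tolist()) - set(np.concatenate(bags).tolist()))
print(bags, out_of_bag)

# Bagging decision trees on synthetic stand-in data (labels are 0 and 1).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

n_models = 25  # odd, so a majority vote over 0/1 labels cannot tie
all_preds = []
for _ in range(n_models):
    # Step 1: sample row indices with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Steps 2-3: each tree is fitted independently on its own bag.
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    all_preds.append(tree.predict(X_test))

# Step 4: majority vote across the trees.
vote_share = np.mean(all_preds, axis=0)
bagged_pred = (vote_share >= 0.5).astype(int)
bagging_accuracy = np.mean(bagged_pred == y_test)
print(bagging_accuracy)
```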

### Boosting
* If a data point is incorrectly predicted by every model, then combining the predictions won't give better results. Boosting addresses this problem.
* Boosting is a sequential process in which each model attempts to correct the errors of the previous model, so the succeeding models depend on the previous ones. Boosting steps:
1. A subset is created from the original dataset.
2. Initially, all data points are given equal weights.
3. A base model is created on this subset.
4. This model is used to make predictions on the whole dataset.
5. Errors are calculated using the actual values and predicted values.
6. The observations which are incorrectly predicted are given higher weights.
7. Another model is created and predictions are made on the dataset. This model tries to correct the errors from the previous model.
8. Similarly, multiple models are created, each correcting the errors of the previous model.
9. The final model (strong learner) is the weighted mean of all the models (weak learners).
* The boosting algorithm combines a number of weak learners to form a strong learner. The individual models would not perform well on the entire dataset, but they work well for some part of it. Thus, each model boosts the performance of the ensemble.
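
A rough sketch of such a reweighting loop is shown below. It follows the textbook discrete AdaBoost recipe with decision stumps, which is one concrete way to realise the steps; it is an assumption that this matches what the notes have in mind. As a simplification, every stump is fitted on the full training set using sample weights instead of on a subset, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with labels mapped to -1 / +1.
X, y01 = make_classification(n_samples=500, n_features=10, random_state=0)
y = 2 * y01 - 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

n_rounds = 20
weights = np.full(len(X_train), 1.0 / len(X_train))  # step 2: equal weights
learners, alphas = [], []

for _ in range(n_rounds):
    # Step 3: fit a weak learner (a depth-1 tree) using the current weights.
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_train, y_train, sample_weight=weights)
    # Steps 4-5: predict and compute the weighted error.
    miss = stump.predict(X_train) != y_train
    err = np.clip(np.sum(weights[miss]), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)  # weight of this learner in the final vote
    # Step 6: raise the weights of misclassified points, lower the others.
    weights = weights * np.exp(alpha * np.where(miss, 1.0, -1.0))
    weights = weights / weights.sum()
    learners.append(stump)
    alphas.append(alpha)

# Step 9: the strong learner is the weighted combination of the weak learners.
scores = sum(a * m.predict(X_test) for a, m in zip(alphas, learners))
boosted_pred = np.where(scores >= 0, 1, -1)
boosted_accuracy = np.mean(boosted_pred == y_test)
print(boosted_accuracy)
```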

## Algorithms based on Bagging and Boosting
### Bagging Algorithms
* Bagging meta-estimator
* Random forest

#### Bagging meta-estimator
* An ensembling algorithm that can be used for both classification (BaggingClassifier) and regression (BaggingRegressor) problems. It follows the typical bagging technique to make predictions.
* Random subsets are created from the original dataset (bootstrapping); each subset includes all features.
* A user-specified base model is fitted on each of these smaller sets, and the predictions from each model are combined to get the final result.

In [7]:
#Bagging Regressor

rdataset = pd.read_csv('Data.csv')
rX = rdataset.iloc[:, :-1].values
ry = rdataset.iloc[:, -1].values
print(rdataset)
#predicting the energy (dependent variable) using engine temp, exhaust vacuum, ambient pressure and relative humidity (independent variables)

rX_train, rX_test, ry_train, ry_test = train_test_split(rX, ry, test_size = 0.2, random_state = 0)

from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
rmodel = BaggingRegressor(DecisionTreeRegressor(random_state=1))
rmodel.fit(rX_train, ry_train)
rmodel.score(rX_test,ry_test)

0.8532367816091954

In [8]:
#Bagging Classifier

cdataset = pd.read_csv('Data.csv')
cX = cdataset.iloc[:, :-1].values
cy = cdataset.iloc[:, -1].values   #breast cancer dataset
print(cdataset)

cX_train, cX_test, cy_train, cy_test = train_test_split(cX, cy, test_size = 0.25, random_state = 0)

sc = StandardScaler()
cX_train = sc.fit_transform(cX_train)
cX_test = sc.transform(cX_test)

from sklearn.ensemble import BaggingClassifier
cmodel = BaggingClassifier(DecisionTreeClassifier(random_state=1))
cmodel.fit(cX_train, cy_train)
cmodel.score(cX_test,cy_test)

0.9239766081871345

#### Random Forest 
* An ensemble machine learning algorithm that follows the bagging technique. It is an extension of the bagging estimator algorithm. The base models in random forest are decision trees.
* Unlike the bagging meta-estimator, random forest randomly selects a subset of features, which is used to decide the best split at each node of the decision tree.
* In short, random forest randomly selects data points and features, and builds multiple trees (a forest).

In [9]:
#Random Forest Regressor

print(rdataset)
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators = 10, random_state = 0)
regressor.fit(rX_train, ry_train)
regressor.score(rX_test, ry_test)

0.8447333333333333

In [10]:
#Random Forest Classifier

print(cdataset)

from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(cX_train, cy_train)

classifier.score(cX_test, cy_test)

0.935672514619883

### Boosting Algorithms
* AdaBoost
* Gradient Boosting
* XGBoost
* Light GBM
* CatBoost

#### AdaBoost 
* Adaptive boosting or AdaBoost is one of the simplest boosting algorithms.
* Usually, decision trees are used for modelling. Multiple sequential models are created, each correcting the errors from the last model.
* AdaBoost assigns higher weights to the observations which are incorrectly predicted, and the next model works to predict these values correctly. The steps are:
1. All observations in the dataset are given equal weights.
2. A model is built on a subset of data.
3. Using this model, predictions are made on the whole dataset.
4. Errors are calculated by comparing the predictions and actual values.
5. While creating the next model, higher weights are given to the data points which were predicted incorrectly.
6. Weights can be determined using the error value. For instance, the higher the error, the more weight is assigned to the observation.
7. This process is repeated until the error function does not change, or the maximum limit of the number of estimators is reached.
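
To make steps 5 and 6 concrete, here is a small worked example of one weight update for five observations. It uses the textbook discrete AdaBoost formulas (`alpha = 0.5 * ln((1 - error) / error)`), which is an assumption; scikit-learn's implementation uses a related but not identical update. The `misclassified` flags are invented for illustration.

```python
import math

n = 5
weights = [1 / n] * n  # step 1: every observation starts with weight 0.2

# Hypothetical outcome of steps 2-4: only the third observation was predicted wrongly.
misclassified = [False, False, True, False, False]

# Step 4: weighted error of the model (here 0.2).
error = sum(w for w, m in zip(weights, misclassified) if m)

# How much say this model gets; lower error means larger alpha.
alpha = 0.5 * math.log((1 - error) / error)

# Steps 5-6: scale misclassified weights up, the others down, then normalise.
updated = [w * math.exp(alpha if m else -alpha) for w, m in zip(weights, misclassified)]
total = sum(updated)
updated = [w / total for w in updated]

print([round(w, 3) for w in updated])
```

After the update the single misclassified observation carries about half of the total weight, so the next model is pushed to get it right.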

In [11]:
#AdaBoost Regressor

print(rdataset)

from sklearn.ensemble import AdaBoostRegressor
rmodel = AdaBoostRegressor(random_state=1)
rmodel.fit(rX_train, ry_train)
rmodel.score(rX_test, ry_test)

0.892108932614837

In [12]:
#AdaBoost Classifier

print(cdataset)

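from sklearn.ensemble import AdaBoostClassifier
# AdaBoostClassifier is the estimator for classification; the score recorded below
# was produced with AdaBoostRegressor, so it is an R^2 value rather than an accuracy
cmodel = AdaBoostClassifier(random_state=1)
cmodel.fit(cX_train, cy_train)
cmodel.score(cX_test, cy_test)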

0.8258441883047928

#### Gradient Boosting (GBM)
* Gradient Boosting or GBM is another ensemble machine learning algorithm that works for both regression and classification problems. GBM uses the boosting technique, combining a number of weak learners to form a strong learner.
* Regression trees are used as the base learner, and each subsequent tree in the series is built on the errors calculated by the previous tree.
* To get a hold of how Gradient Boosting works, see https://www.youtube.com/watch?v=3CC4N4z3GJc for the regressor and https://www.youtube.com/watch?v=jxuNLH5dXCs for the classifier.
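
The idea of building each tree on the previous errors can be sketched for regression with squared error, where the errors are simply the residuals. The synthetic dataset, the learning rate of 0.1, the tree depth of 3, and the 100 rounds are all assumptions for illustration; this is not how `GradientBoostingRegressor` is implemented internally, only the same principle.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in regression data (assumption).
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

learning_rate = 0.1
n_trees = 100

# Start from a constant prediction: the mean of the training targets.
baseline = y_train.mean()
train_pred = np.full(len(y_train), baseline)
test_pred = np.full(len(y_test), baseline)
baseline_mse = np.mean((y_train - train_pred) ** 2)

for _ in range(n_trees):
    # Each new tree is fitted on the residuals left by the ensemble so far.
    residuals = y_train - train_pred
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X_train, residuals)
    # Add a damped version of the tree's correction to the running prediction.
    train_pred += learning_rate * tree.predict(X_train)
    test_pred += learning_rate * tree.predict(X_test)

final_mse = np.mean((y_train - train_pred) ** 2)
test_r2 = r2_score(y_test, test_pred)
print(baseline_mse, final_mse, test_r2)
```

The training error shrinks with every round, which is why the number of trees and the learning rate are usually tuned together to avoid overfitting.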

In [13]:
#Gradient Boost Regressor

print(rdataset)
from sklearn.ensemble import GradientBoostingRegressor
rmodel= GradientBoostingRegressor()
rmodel.fit(rX_train, ry_train)
rmodel.score(rX_test,ry_test)

0.8659570570388071

In [14]:
#Gradient Boost Classifier

print(cdataset)
from sklearn.ensemble import GradientBoostingClassifier
cmodel= GradientBoostingClassifier()
cmodel.fit(cX_train, cy_train)
cmodel.score(cX_test,cy_test)

0.9415204678362573

#### XGBoost
* XGBoost (eXtreme Gradient Boosting) is an advanced implementation of the gradient boosting algorithm. XGBoost has proved to be a highly effective ML algorithm and is extensively used in machine learning competitions and hackathons.
* XGBoost has high predictive power and is often cited as being almost 10 times faster than other gradient boosting implementations. It also includes several kinds of regularization, which reduce overfitting and improve overall performance. Hence it is also known as a 'regularized boosting' technique.
* XGBoost handles missing values itself, so you do not have to impute them.
* Get a clear view of XGBoost: https://www.youtube.com/watch?v=OtD8wVaFm6E and https://www.youtube.com/watch?v=8b1JEDvenQU.

In [15]:
!pip install xgboost

Collecting xgboost
  Using cached xgboost-1.4.2-py3-none-win_amd64.whl (97.8 MB)
Installing collected packages: xgboost
Successfully installed xgboost-1.4.2


In [16]:
#XGBoost Regressor

print(rdataset)
import xgboost as xgb
rmodel=xgb.XGBRegressor()
rmodel.fit(rX_train, ry_train)
rmodel.score(rX_test, ry_test)

0.8811209525035884

In [17]:
#XGBoost Classifier

print(cdataset)
cmodel=xgb.XGBClassifier(random_state=1,learning_rate=0.01)
cmodel.fit(cX_train, cy_train)
cmodel.score(cX_test, cy_test)

0.9473684210526315