# Ensemble Learning Algorithm

    Ensemble models in Machine Learning combine the predictions of multiple models to improve overall performance.
    
**This can be achieved in various ways:-**

## 1). Max Voting:

    a). The max voting method is generally used for classification problems.
    b). In this technique, multiple models are used to make a prediction for each data point.
    c). The prediction made by each model is considered a vote.
    d). The prediction that receives the most votes is used as the final prediction.
    
    Example:- When you ask your friends to suggest a mobile phone, they will suggest multiple options:
                
                Friend 1 -> iPhone
                Friend 2 -> Oppo
                Friend 3 -> One Plus
                Friend 4 -> One Plus
                Friend 5 -> Red Mi
                
    - In this case the maximum number of votes is for One Plus, so you decide to purchase the One Plus.
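
The friends example can be tallied in a few lines. This is a minimal sketch using only the standard library; the variable names are illustrative, and the phone names are spelled consistently so that both One Plus votes are counted together.

```python
from collections import Counter

# One vote per friend, taken from the example above
# (spelling normalised so both "One Plus" votes match).
votes = ["iPhone", "Oppo", "One Plus", "One Plus", "Red Mi"]

# most_common(1) returns a list holding the single (label, count)
# pair with the highest count.
winner, count = Counter(votes).most_common(1)[0]
print(winner, count)
```

With these five votes the winner is One Plus with two votes, matching the decision described in the example.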

### Hard Voting Vs Soft Voting

**Hard Voting:-**

    - In classification, a voting ensemble makes a prediction by combining the predictions of several models.
    - A hard voting ensemble sums the votes for crisp class labels, one vote per model.
    - It predicts the class that receives the most votes.
    - Typically used when the models output class labels rather than probabilities.

**Soft Voting:-**

    - A soft voting ensemble sums the predicted probabilities for each class label.
    - Typically used when every model can output class probabilities.
    - It predicts the class with the largest summed probability across the models.
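
The difference between the two schemes shows up when one model is very confident and the others are not. The sketch below uses made-up probabilities from three hypothetical models for a single sample; it is plain Python and only illustrates the arithmetic, not any library's implementation.

```python
# Hypothetical class probabilities from three models for ONE sample
# over two classes (index 0 and index 1); the numbers are made up.
probas = [
    [0.45, 0.55],  # model 1 -> votes for class 1
    [0.40, 0.60],  # model 2 -> votes for class 1
    [0.90, 0.10],  # model 3 -> votes for class 0
]

def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

# Hard voting: each model casts one vote for its most likely class.
hard_votes = [argmax(p) for p in probas]
hard_winner = max(set(hard_votes), key=hard_votes.count)

# Soft voting: sum the probabilities per class, then pick the largest sum.
summed = [sum(p[c] for p in probas) for c in range(2)]
soft_winner = argmax(summed)

print("hard:", hard_winner, "soft:", soft_winner)
```

Here hard voting picks class 1 (two votes against one), while soft voting picks class 0 because the third model's high confidence dominates the summed probabilities.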

### Example : Max Voting

In [1]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier

In [2]:
iris = load_iris()
X = iris.data[:,1:3]
Y = iris.target

In [3]:
clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = KNeighborsClassifier()
clf4 = GaussianNB()
clf5 = DecisionTreeClassifier(random_state=1)

In [4]:
labels = ['Logistic', 'Random Forest', 'KNN', 'Gaussian' ,'Decision Tree']
for clf,label in zip([clf1, clf2, clf3, clf4, clf5], labels):
    score = cross_val_score(clf, X, Y, cv=5, scoring='accuracy')
    
    print("Accuracy Score: %0.2f (+/- %0.2f)[%s]"%(score.mean(), score.std(),label))
    

Accuracy Score: 0.95 (+/- 0.04)[Logistic]
Accuracy Score: 0.94 (+/- 0.04)[Random Forest]
Accuracy Score: 0.95 (+/- 0.04)[KNN]
Accuracy Score: 0.91 (+/- 0.04)[Gaussian]
Accuracy Score: 0.91 (+/- 0.03)[Decision Tree]


In [5]:
voting_clf_hard = VotingClassifier(estimators=[
    (labels[0],clf1),
    (labels[1],clf2),
    (labels[2],clf3),
    (labels[3],clf4),
    (labels[4],clf5)],voting='hard')

voting_clf_soft = VotingClassifier(estimators=[
    (labels[0],clf1),
    (labels[1],clf2),
    (labels[2],clf3),
    (labels[3],clf4),
    (labels[4],clf5)],voting='soft')

In [6]:
labels_new = ['Logistics','Random Forest', 'KNN', 'Gaussian', 'Decision Tree', 'Hard Voting','Soft Voting']
for (clf,label) in zip([clf1, clf2, clf3, clf4, clf5, voting_clf_hard, voting_clf_soft],labels_new):
    scores = cross_val_score(clf, X, Y, cv=5, scoring='accuracy')
    
    print("Accuracy Score: %0.2f (+/- %0.2f)[%s]"%(scores.mean(),scores.std(),label))

Accuracy Score: 0.95 (+/- 0.04)[Logistics]
Accuracy Score: 0.94 (+/- 0.04)[Random Forest]
Accuracy Score: 0.95 (+/- 0.04)[KNN]
Accuracy Score: 0.91 (+/- 0.04)[Gaussian]
Accuracy Score: 0.91 (+/- 0.03)[Decision Tree]
Accuracy Score: 0.95 (+/- 0.04)[Hard Voting]
Accuracy Score: 0.95 (+/- 0.04)[Soft Voting]


## 2). Averaging:

    a). Similar to the max voting technique, multiple predictions are made for each data point in averaging.
    b). In this method we take the average of the predictions from all the models and use it as the final prediction.
    c). Averaging can be used for making predictions in regression problems or when calculating the probabilities for
        classification problems.
    
    Example:- You have asked 5 friends for their opinion on which mobile phone to purchase,
        
                F1 -> iPhone
                F2 -> Red Mi
                F3 -> One Plus
                F4 -> One Plus
                F5 -> Oppo
                
    - Here the average would be taken over the vote counts per phone (iPhone 1, Red Mi 1, One Plus 2, Oppo 1):-
    
                (1 + 1 + 2 + 1)/5
    
    - And the final value would be the predicted value of the ensemble.
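
Averaging is easiest to see with numeric predictions, as in a regression problem. The following sketch assumes five hypothetical models that each predict a number for the same sample; the values are invented purely for illustration.

```python
from statistics import mean

# Hypothetical numeric predictions from five models for one sample
# (for example, a predicted price); the values are made up.
predictions = [410.0, 395.0, 402.0, 398.0, 405.0]

# The ensemble prediction is the plain arithmetic mean.
final_prediction = mean(predictions)
print(final_prediction)
```

The same idea applies to classification when the averaged quantities are the predicted class probabilities instead of a single number.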

### Example : Average Technique

In [7]:
from sklearn.model_selection import train_test_split

In [8]:
X.shape

(150, 2)

In [9]:
Y.shape

(150,)

In [10]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

In [11]:
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

In [12]:
clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = KNeighborsClassifier()
clf4 = GaussianNB()

In [13]:
clf1.fit(X_train, Y_train)
clf2.fit(X_train, Y_train)
clf3.fit(X_train, Y_train)
clf4.fit(X_train, Y_train)

GaussianNB()

In [14]:
pred1 = clf1.predict_proba(X_test)
pred2 = clf2.predict_proba(X_test)
pred3 = clf3.predict_proba(X_test)
pred4 = clf4.predict_proba(X_test)

In [15]:
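# Note: pred4 (GaussianNB) is computed above but is not part of this average,
# so the result below averages three models only.
avg_pred = (pred1 + pred2 + pred3)/3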

In [16]:
avg_pred

array([[3.12350634e-04, 1.04379934e-01, 8.95307715e-01],
       [1.03183775e-02, 9.55733646e-01, 3.39479764e-02],
       [9.95859921e-01, 4.14003981e-03, 3.89841561e-08],
       [5.81762118e-07, 1.26046086e-02, 9.87394810e-01],
       [9.88272975e-01, 1.17268379e-02, 1.87143969e-07],
       [3.33850649e-03, 1.42575582e-02, 9.82403935e-01],
       [9.93268542e-01, 6.73140319e-03, 5.43266375e-08],
       [8.99935240e-03, 7.89636846e-01, 2.01363802e-01],
       [1.13878420e-03, 4.87553330e-01, 5.11307886e-01],
       [1.29232540e-02, 9.66617805e-01, 2.04589406e-02],
       [1.89470950e-05, 2.59611312e-02, 9.74019922e-01],
       [8.39371275e-03, 8.98879800e-01, 9.27264873e-02],
       [1.66164092e-03, 6.97460578e-01, 3.00877781e-01],
       [2.36134051e-03, 8.02217537e-01, 1.95421122e-01],
       [1.86097274e-03, 7.12657901e-01, 2.85481127e-01],
       [9.92359857e-01, 7.64005902e-03, 8.35284249e-08],
       [3.65264071e-03, 8.85346997e-01, 1.11000362e-01],
       [3.58341073e-03, 8.03657

## 3). Weighted Average:

    a). Weighted Average Technique is an extension of the average method.
    b). All models are assigned different weights, defining the importance of each model for prediction.
    
    - For instance, suppose 4 of your friends have prior experience of using a mobile phone while 3 of them have no
      prior experience.
    - In this scenario the opinions of the 4 experienced friends are weighted higher than those of the other 3.
        
            F1 - IPhone
            F2 - Redmi
            F3 - Oppo
            F4 - Vivo
            F5 - Samsung
            F6 - One Plus
            F7 - One Plus
            
                  F1   F2   F3   F4   F5   F6   F7
        Weight:  0.23 0.23 0.23 0.23 0.21 0.10 0.10
        Rating:   1    2    2    3    4    5    5
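
A weighted average for the table above can be worked out as follows. This is a small sketch in plain Python; it divides by the sum of the weights, which is an assumption made here so that the result stays on the rating scale even when the weights do not add up to exactly 1.

```python
# Weights and ratings copied from the table above (F1..F7).
weights = [0.23, 0.23, 0.23, 0.23, 0.21, 0.10, 0.10]
ratings = [1, 2, 2, 3, 4, 5, 5]

# Multiply each rating by its weight and add the products up.
weighted_sum = sum(w * r for w, r in zip(weights, ratings))

# Dividing by the total weight normalises the result.
weighted_avg = weighted_sum / sum(weights)
print(round(weighted_avg, 3))
```

Friends with larger weights pull the result towards their own rating, which is the whole point of the technique.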

### Example : Weighted Average

In [17]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

In [18]:
lr = LogisticRegression()
dtc = DecisionTreeClassifier()
knn = KNeighborsClassifier()

In [19]:
lr.fit(X_train, Y_train)
dtc.fit(X_train, Y_train)
knn.fit(X_train, Y_train)

KNeighborsClassifier()

In [20]:
voting = VotingClassifier(estimators=[
    ('lr',lr),
    ('dtc',dtc),
    ('knn',knn)
])

In [21]:
lr_pred = lr.predict_proba(X_test)
dtc_pred = dtc.predict_proba(X_test)
knn_pred = knn.predict_proba(X_test)

In [22]:
labels_new = ['Logistics', 'KNN', 'Decision Tree', 'Voting']
for (clf,label) in zip([lr, knn, dtc, voting],labels_new):
    scores = cross_val_score(clf, X, Y, cv=5, scoring='accuracy')
    
    print("Accuracy Score: %0.2f (+/- %0.2f)[%s]"%(scores.mean(),scores.std(),label))

Accuracy Score: 0.95 (+/- 0.04)[Logistics]
Accuracy Score: 0.95 (+/- 0.04)[KNN]
Accuracy Score: 0.91 (+/- 0.03)[Decision Tree]
Accuracy Score: 0.95 (+/- 0.04)[Voting]


     Model      Score      Weights
     
    LogisticR    0.95        0.4
      KNN        0.95        0.4
    DecisionT    0.91        0.2
                     Total = 1.0

In [23]:
w_avg_pred = (lr_pred*0.4 + dtc_pred*0.2 + knn_pred*0.4)

In [24]:
w_avg_pred

array([[3.74820761e-04, 1.13255921e-01, 8.86369258e-01],
       [8.38205301e-03, 9.62880375e-01, 2.87375717e-02],
       [9.95031905e-01, 4.96804777e-03, 4.67809873e-08],
       [6.98114541e-07, 3.12553028e-03, 9.96873772e-01],
       [9.85927570e-01, 1.40722055e-02, 2.24572763e-07],
       [6.20779225e-06, 9.10906982e-03, 9.90884722e-01],
       [9.91922251e-01, 8.07768382e-03, 6.51919650e-08],
       [2.79922288e-03, 7.63564215e-01, 2.33636562e-01],
       [1.36654104e-03, 4.45063996e-01, 5.53569463e-01],
       [1.55079048e-02, 9.59941366e-01, 2.45507288e-02],
       [2.27365140e-05, 2.71533574e-02, 9.72823906e-01],
       [6.07245530e-03, 8.98655760e-01, 9.52717848e-02],
       [1.99396911e-03, 6.76952694e-01, 3.21053337e-01],
       [2.83360862e-03, 7.86661045e-01, 2.10505347e-01],
       [2.23316729e-03, 6.79189481e-01, 3.18577352e-01],
       [9.90831829e-01, 9.16807083e-03, 1.00234110e-07],
       [4.38316885e-03, 8.94416396e-01, 1.01200435e-01],
       [4.30009287e-03, 8.32388

# Advanced Technique In Ensemble Learning Algorithm

## 1) Stacking:-

    a). Stacking, also known as "Stacked Generalization", is an ensemble technique that combines multiple classification
        or regression models via a meta-classifier or meta-regressor.
    b). The base-level models are trained on the complete training set, then the meta model is trained on features that
        are the outputs of the base-level models.
    c). The base level often consists of different learning algorithms, and therefore stacking ensembles are often
        heterogeneous (different).

**Meta Classifier:-**
        
    - It is a classifier that makes the final prediction by using the predictions of the base models as features.
    - It takes the classes predicted by the various classifiers and picks the final one as the result.
        
**Meta Regressor:-**
        
    - In stacking, a meta regressor is a regression model that makes the final numeric prediction by using the
      predictions of the base regressors as features.
    - It learns how much to trust each base model, so conflicting base predictions are reconciled into one value.
    - This is different from "meta-regression" in meta-analysis, which combines the findings of multiple research
      studies to identify an overall trend.
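
A stacked classifier can be sketched with scikit-learn's `StackingClassifier`. The choice of base learners, the meta classifier, the iris dataset and all variable names below are assumptions made for illustration. Note that this class trains the meta classifier on cross-validated predictions of the base learners, which differs slightly from the simpler description in point b) above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=0, stratify=y_iris
)

# Heterogeneous base learners (level 0).
base_learners = [
    ("knn", KNeighborsClassifier()),
    ("tree", DecisionTreeClassifier(random_state=0)),
]

# The meta classifier (level 1) uses the base learners' predictions as features.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_tr, y_tr)

stack_accuracy = stack.score(X_te, y_te)
print("Stacking accuracy:", stack_accuracy)
```

For a regression problem, `StackingRegressor` plays the same role with a meta regressor as the final estimator.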

## 2) Bagging:-

    a). Bagging is an Ensemble Learning technique which aims to reduce the learning error through the use of a
        set of homogeneous machine learning algorithms.
    b). The key idea of bagging is the use of multiple base learners, each trained separately on a random sample from
        the training set; their outputs are combined through voting or averaging, which produces a more stable and
        accurate model.

**There are 2 main components of bagging technique:-**

     i). Random sampling with replacement (bootstrapping),
    ii). A set of homogeneous machine learning algorithms (ensemble learning)

### The bagging process involves the following steps:

    1). "n" subsets are extracted from the training set,
    2). These subsets are used to train "n" base learners of the same type,
    3). To make predictions, each of the "n" learners is fed the test sample,
    4). The outputs of the learners are averaged (in case of regression) or voted on (in case of classification).
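
The four steps can be written out by hand to make them concrete. This is a rough sketch under stated assumptions, not how `BaggingClassifier` is implemented internally: it uses NumPy for the bootstrap sampling, shallow decision trees as the base learners, and illustrative variable names and sizes (5 learners, 50 samples each).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_bc, y_bc, test_size=0.25, random_state=23)

rng = np.random.default_rng(0)
n_learners, subset_size = 5, 50

learners = []
for _ in range(n_learners):
    # Step 1: draw a bootstrap subset (row indices sampled with replacement).
    idx = rng.integers(0, len(X_tr), size=subset_size)
    # Step 2: train one base learner of the same type on that subset.
    model = DecisionTreeClassifier(max_depth=3, random_state=0)
    learners.append(model.fit(X_tr[idx], y_tr[idx]))

# Step 3: feed the test samples to every learner.
all_preds = np.array([m.predict(X_te) for m in learners])  # (n_learners, n_test)

# Step 4: majority vote per test sample (labels are 0/1 and n_learners is odd).
bagged_pred = (all_preds.mean(axis=0) > 0.5).astype(int)
bagged_accuracy = (bagged_pred == y_te).mean()
print("Hand-rolled bagging accuracy:", bagged_accuracy)
```

Because each tree sees a different random subset, the individual trees disagree on some samples, and the vote smooths out those disagreements.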


### Example : Bagging

In [25]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

In [26]:
x, y = load_breast_cancer(return_X_y=True)

In [27]:
x.shape

(569, 30)

In [28]:
y.shape

(569,)

In [29]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=23)

In [30]:
tree = DecisionTreeClassifier(max_depth=3, random_state=23)

In [31]:
bagging = BaggingClassifier(base_estimator=tree, n_estimators=5, max_samples=50, bootstrap=True)

# base_estimator : the base learner, here a Decision Tree
#                  (scikit-learn 1.2+ names this parameter `estimator`),
# n_estimators   : number of base learners; 5 subsets are drawn to train 5 Decision Tree models,
# max_samples    : each subset contains 50 randomly drawn samples,
# bootstrap      : True means the samples are drawn with replacement

In [32]:
bagging.fit(x_train, y_train)

BaggingClassifier(base_estimator=DecisionTreeClassifier(max_depth=3,
                                                        random_state=23),
                  max_samples=50, n_estimators=5)

In [33]:
print("Training Data Accuracy :", bagging.score(x_train, y_train))
print("Testing Data Accuracy :", bagging.score(x_test, y_test))

Training Data Accuracy : 0.9366197183098591
Testing Data Accuracy : 0.9440559440559441


## 3) Boosting:-

    a). Boosting is an Ensemble Learning technique similar to bagging.
    b). It typically makes use of base learners to improve the stability and effectiveness of an ML model.
    c). The main concept behind boosting is the generation of sequential hypotheses, where each hypothesis
        tries to improve on or correct the mistakes (errors) made by the previous one.
    d). The main idea of boosting is to apply homogeneous ML algorithms in a sequential way, where each
        algorithm tries to improve the model by focusing on the errors made by the previous one.
    e). The key to boosting is that the errors of each base learner are corrected by the next base learner in the
        sequence.
        
**The most commonly used boosting techniques are:-**

     i). AdaBoost (Adaptive Boosting),
    ii). Gradient Boosting

## ADAPTIVE BOOSTING:

    1). AdaBoost is an algorithm based on the boosting technique.
    2). AdaBoost maintains a vector of sample weights: samples that were predicted incorrectly are penalized by
        increasing their weight, and samples that were predicted correctly have their weight decreased.
    3). Each base learner in the sequence is assigned a weight: the higher its performance, the higher its weight and
        the greater its impact on the final decision.
    4). To make a prediction, each base learner in the sequence is fed the test data, and the predictions are combined
        by a weighted vote (for classification) or averaged (in case of regression).
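
One round of the weight update can be traced by hand. The sketch below follows the classic binary AdaBoost formulas (weighted error, learner weight `alpha`, exponential re-weighting); the outcome list is hypothetical, and scikit-learn's `AdaBoostClassifier` differs in some details, so treat this as an illustration of the idea only.

```python
import math

# Hypothetical outcome of one weak learner on six training samples:
# True = predicted correctly, False = misclassified.
correct = [True, True, False, True, False, True]

# Every sample starts with the same weight.
n = len(correct)
weights = [1.0 / n] * n

# Weighted error of this learner and its weight ("alpha") in the final vote.
error = sum(w for w, ok in zip(weights, correct) if not ok)
alpha = 0.5 * math.log((1.0 - error) / error)

# Misclassified samples are scaled up, correct ones are scaled down,
# then the weights are renormalised so that they sum to 1.
updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
total = sum(updated)
weights = [w / total for w in updated]

print("alpha:", round(alpha, 4))
print("weights:", [round(w, 3) for w in weights])
```

After the update the two misclassified samples carry a weight of 0.25 each and the four correct ones 0.125 each, so the next learner in the sequence concentrates on the mistakes.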

### Example : ADA Boost

In [35]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import AdaBoostClassifier

In [36]:
iris = load_iris()

In [37]:
x = iris.data

In [38]:
y = iris.target

In [39]:
x.shape

(150, 4)

In [40]:
y.shape

(150,)

In [62]:
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.2, random_state=23)

In [63]:
X_train.shape

(120, 4)

In [64]:
X_test.shape

(30, 4)

In [65]:
# creating ADA BOOST Model

AdaModel = AdaBoostClassifier(n_estimators=100, learning_rate=1)

In [66]:
AdaModel.fit(X_train, Y_train)

AdaBoostClassifier(learning_rate=1, n_estimators=100)

In [67]:
Y_pred = AdaModel.predict(X_test)

In [68]:
Y_test

array([2, 2, 1, 0, 2, 1, 0, 2, 0, 1, 1, 0, 2, 0, 0, 2, 1, 1, 2, 0, 2, 0,
       0, 0, 2, 0, 0, 2, 1, 1])

In [69]:
Y_pred

array([2, 2, 1, 0, 2, 1, 0, 2, 0, 1, 1, 0, 2, 0, 0, 1, 1, 1, 2, 0, 2, 0,
       0, 0, 2, 0, 0, 2, 1, 1])

In [70]:
print("Accuracy is:",accuracy_score(Y_test, Y_pred))

Accuracy is: 0.9666666666666667


#### using Support Vector Machine model

In [71]:
from sklearn.svm import SVC

In [77]:
svc = SVC(probability=True, kernel='linear')

In [78]:
# scikit-learn 1.2+ names the base_estimator parameter `estimator`
adb = AdaBoostClassifier(base_estimator=svc, n_estimators=100, learning_rate=1)

In [79]:
adb.fit(X_train, Y_train)

AdaBoostClassifier(base_estimator=SVC(kernel='linear', probability=True),
                   learning_rate=1, n_estimators=100)

In [80]:
Y_pred1 = adb.predict(X_test)

In [81]:
print("Using SVC Accuracy is:",accuracy_score(Y_test, Y_pred1))

Using SVC Accuracy is: 0.9666666666666667


## GRADIENT BOOSTING:

    1). The Gradient Boosting method does not maintain a vector of sample weights like AdaBoost does.
    2). As the name implies, it uses the gradient of a loss function to optimize the model.
    3). Each base learner added to the sequence is fit to reduce the residuals (errors) left by the previous base
        learners.
    4). To make a prediction, each base learner in the sequence is fed the test data, and their outputs are summed
        (each scaled by the learning rate); for classification the summed score is converted into a class.
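
The residual-fitting loop can be sketched in plain Python for a regression problem with squared error, where the negative gradient is simply the residual. Everything here is an assumption for illustration: the toy data, the helper names `fit_stump` and `predict_stump`, and the use of one-split "stumps" as base learners. `GradientBoostingClassifier` uses a different loss and full regression trees.

```python
# Toy 1-D regression data; the numbers are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]

def fit_stump(xs, targets):
    """Best single-split regressor: returns (threshold, left_mean, right_mean)."""
    best = None
    for t in xs[:-1]:  # candidate splits: x <= t versus x > t
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

learning_rate = 0.5
base = sum(ys) / len(ys)      # initial prediction: the mean target
preds = [base] * len(xs)
stumps, mse_history = [], []

for _ in range(5):
    # For squared error, the negative gradient is the residual.
    residuals = [y - p for y, p in zip(ys, preds)]
    stump = fit_stump(xs, residuals)
    stumps.append(stump)
    # Add a damped correction from the new learner to the running prediction.
    preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    mse_history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

print([round(m, 4) for m in mse_history])
```

The mean squared error shrinks with every added stump, which is the sequential error correction described in point 3).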

### Example : Gradient Boost

In [82]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

In [83]:
X, Y = load_breast_cancer(return_X_y=True)

In [84]:
X.shape

(569, 30)

In [85]:
Y.shape

(569,)

In [87]:
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.25, random_state=30)

In [88]:
X_Train.shape

(426, 30)

In [89]:
X_Test.shape

(143, 30)

In [90]:
gbmodel = GradientBoostingClassifier(n_estimators=5, learning_rate=1, max_depth=2,random_state=30)

# the base learner is a Decision Tree (regression tree) by default
# n_estimators is the number of boosting stages, i.e. trees added in sequence
# max_depth: the depth of each decision tree is 2
# learning_rate: the contribution of each tree in the sequence is scaled by 1

In [91]:
gbmodel.fit(X_Train, Y_Train)

GradientBoostingClassifier(learning_rate=1, max_depth=2, n_estimators=5,
                           random_state=30)

In [92]:
Y_Pred = gbmodel.predict(X_Test)

In [98]:
print("Accuracy Score:",accuracy_score(Y_Test, Y_Pred))

Accuracy Score: 0.9370629370629371


In [101]:
Y_Test - Y_Pred

array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, -1,  0,
        0,  0,  0,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  1, -1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
       -1,  0,  0,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, -1,  0,
        0,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0])

In [102]:
from sklearn.metrics import confusion_matrix, classification_report

In [103]:
confusion_matrix(Y_Test, Y_Pred)

array([[46,  6],
       [ 3, 88]], dtype=int64)

In [105]:
print(classification_report(Y_Test, Y_Pred))

              precision    recall  f1-score   support

           0       0.94      0.88      0.91        52
           1       0.94      0.97      0.95        91

    accuracy                           0.94       143
   macro avg       0.94      0.93      0.93       143
weighted avg       0.94      0.94      0.94       143
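
The figures in the report can be checked directly against the confusion matrix printed earlier. This short sketch recomputes the accuracy and the class 1 precision and recall from those four counts; the variable names are illustrative.

```python
# Confusion matrix from the test-set output above:
# rows = true class, columns = predicted class.
cm = [[46, 6],
      [3, 88]]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total

# Precision for class 1: of everything predicted as 1, how much was truly 1.
precision_1 = cm[1][1] / (cm[0][1] + cm[1][1])
# Recall for class 1: of everything truly 1, how much was predicted as 1.
recall_1 = cm[1][1] / (cm[1][0] + cm[1][1])

print(round(accuracy, 2), round(precision_1, 2), round(recall_1, 2))
```

The rounded values agree with the accuracy row and the class 1 row of the classification report.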



In [110]:
Y_Train - gbmodel.predict(X_Train)

array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0

In [111]:
confusion_matrix(Y_Train, gbmodel.predict(X_Train))

array([[158,   2],
       [  1, 265]], dtype=int64)

In [112]:
print(classification_report(Y_Train, gbmodel.predict(X_Train)))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99       160
           1       0.99      1.00      0.99       266

    accuracy                           0.99       426
   macro avg       0.99      0.99      0.99       426
weighted avg       0.99      0.99      0.99       426

