An ensemble method combines several classifiers into a **meta-classifier** that typically has **better generalization** performance than each individual classifier alone.
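To make the generalization claim concrete, here is a minimal sketch of the error rate of a majority-vote ensemble. It assumes a binary task, an odd number of classifiers, and that all classifiers make independent errors at the same rate; none of these assumptions is guaranteed in practice. The function name `ensemble_error` is a hypothetical helper for illustration only.

```python
from math import comb

def ensemble_error(n_classifiers, error):
    """Probability that a majority of independent base classifiers is wrong.

    Assumes a binary task, an odd n_classifiers, and identical,
    independent error rates (an idealized setting).
    """
    # The ensemble is wrong when more than half of the classifiers are wrong.
    k_start = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k) * error**k * (1 - error) ** (n_classifiers - k)
        for k in range(k_start, n_classifiers + 1)
    )

# 11 base classifiers, each wrong 25 percent of the time
print(round(ensemble_error(11, 0.25), 3))
```

Under these idealized assumptions the ensemble error is well below the 0.25 error of each base classifier. When the base classifiers are no better than random guessing (error 0.5), the ensemble does not help.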

## Majority Voting

Majority voting means that we select the class label that receives more than **50 percent** of the votes. Strictly speaking, majority voting refers to the **binary class** setting, but the principle generalizes easily to the **multi-class** setting, where it is called **plurality voting**: we select the class label that receives the most votes.
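The voting rule described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the scikit-learn implementation; `plurality_vote` and the `votes` array are hypothetical names and data, and ties are resolved in favor of the smallest label.

```python
import numpy as np

def plurality_vote(predictions):
    """Return the most frequent label for each sample.

    predictions: array-like of shape (n_classifiers, n_samples)
    holding non-negative integer class labels.
    """
    predictions = np.asarray(predictions)
    # np.bincount counts how often each label occurs in one column;
    # argmax returns the label with the most votes (smallest label on ties).
    return np.apply_along_axis(
        lambda votes: np.bincount(votes).argmax(), axis=0, arr=predictions
    )

# Three classifiers (rows) voting on four samples (columns)
votes = [[0, 1, 2, 1],
         [0, 1, 1, 2],
         [1, 1, 2, 2]]
print(plurality_vote(votes))  # prints [0 1 2 2]
```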

## Implementation of Majority Voting:

In [94]:
#imports
import pandas as pd 
import numpy as np 
from sklearn import datasets
from sklearn.model_selection import train_test_split,cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

In [95]:
#We will use the iris data set
iris = datasets.load_iris()


In [96]:
#We select sepal width and petal length, for Iris-versicolor and Iris-virginica
X = iris.data[50:,[1,2]]
y = iris.target[50:]


In [97]:
# Encode the class labels as 0 and 1 rather than 1 and 2
le = LabelEncoder()
y =le.fit_transform(y)
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [98]:
#Splitting the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
  X, y, test_size=0.5, stratify =y ,random_state=1)

In [99]:
#Now we will set up three different classifiers (they are fitted later, during cross-validation)
clf1 = LogisticRegression(penalty='l2', C=0.001, random_state= 1)

In [101]:
#Decision tree classifier
clf2 = DecisionTreeClassifier(max_depth=1, criterion='entropy',random_state=0)

In [102]:
#KNeighborsClassifier (must be defined, because pipe3 below refers to clf3)
clf3 = KNeighborsClassifier(n_neighbors=2,p=2,metric='minkowski')

In [103]:
#Now we put the logistic regression classifier in a pipeline with a standard scaler
pipe1 = Pipeline([['sc' ,StandardScaler()],
                 ['clf' ,clf1]])

**Note**: We do not need a pipeline for the decision tree, because a **tree classifier** does not require standardized features.
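As a quick sanity check of the note above, the following sketch fits the same decision stump once on the raw features and once on standardized features, then compares the predictions on the samples used for fitting. The variable names `tree_raw`, `tree_std` and `same` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Same subset as above: sepal width and petal length, two classes
iris = datasets.load_iris()
X = iris.data[50:, [1, 2]]
y = iris.target[50:]

# Standardize the features (zero mean, unit variance per column)
X_std = StandardScaler().fit_transform(X)

# Fit identical decision stumps on raw and on standardized features
tree_raw = DecisionTreeClassifier(max_depth=1, criterion='entropy',
                                  random_state=0).fit(X, y)
tree_std = DecisionTreeClassifier(max_depth=1, criterion='entropy',
                                  random_state=0).fit(X_std, y)

# Compare the predictions on the samples the trees were fitted on
same = np.array_equal(tree_raw.predict(X), tree_std.predict(X_std))
print(same)
```

Standardization is a monotonic rescaling of each feature, so it changes the split thresholds but not the way the samples are partitioned.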

In [111]:
pipe3 =Pipeline([['sc' ,StandardScaler()],
                 ['clf' ,clf3]])

In [120]:
class_labels = ['LogisticRegression', 'Decision Tree']

In [121]:
print('10-fold cross validation:\n')

10-fold cross validation:



In [122]:
for clf , label in zip([pipe1,clf2],class_labels):
    scores = cross_val_score(estimator=clf,
                            X =X_train,
                            y=y_train,
                            cv =10,
                            scoring='roc_auc')
    print("Roc AUC: %0.2f (+/- %0.2f) [%s]"
         % (scores.mean(),scores.std(),label))

Roc AUC: 0.87 (+/- 0.17) [LogisticRegression]
Roc AUC: 0.89 (+/- 0.16) [Decision Tree]




In [131]:
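#Majority voting ensemble (voting='soft' averages the predicted class probabilities rather than counting label votes)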
from sklearn.ensemble import VotingClassifier
mv_clf =VotingClassifier(estimators=[('lr', pipe1), ('DT', clf2)],voting='soft')
print(mv_clf)

VotingClassifier(estimators=[('lr', Pipeline(memory=None,
     steps=[['sc', StandardScaler(copy=True, with_mean=True, with_std=True)], ['clf', LogisticRegression(C=0.001, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalt...         min_weight_fraction_leaf=0.0, presort=False, random_state=0,
            splitter='best'))],
         flatten_transform=None, n_jobs=None, voting='soft', weights=None)
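The `voting='soft'` setting used above combines predicted class probabilities instead of counting label votes. A minimal NumPy sketch of that idea follows; the probability arrays are made-up illustrative values, not outputs of the classifiers in this notebook, and equal weights are assumed.

```python
import numpy as np

# Hypothetical predicted probabilities from two classifiers for three samples
# (rows = samples, columns = P(class 0), P(class 1))
proba_lr = np.array([[0.90, 0.10],
                     [0.40, 0.60],
                     [0.55, 0.45]])
proba_dt = np.array([[0.60, 0.40],
                     [0.50, 0.50],
                     [0.20, 0.80]])

# Soft voting: average the probabilities across classifiers (equal weights),
# then pick the class with the highest average probability.
avg_proba = np.average([proba_lr, proba_dt], axis=0, weights=[1, 1])
soft_pred = avg_proba.argmax(axis=1)

print(avg_proba)
print(soft_pred)  # prints [0 1 1]
```

For the third sample the two classifiers disagree on the label, so a label vote would be tied, while the averaged probabilities still favor class 1.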


In [132]:
class_labels += ['VotingClassifier']
all_clf =[pipe1,clf2,mv_clf]
for clf , label in zip(all_clf,class_labels):
    scores =cross_val_score(estimator=clf,
                            X =X_train,
                            y=y_train,
                            cv=10,
                            scoring='roc_auc')
    print("ROC AUC: %0.2f (+/- %0.2f) [%s]"
         % (scores.mean(),scores.std(),label))

ROC AUC: 0.87 (+/- 0.17) [LogisticRegression]
ROC AUC: 0.89 (+/- 0.16) [Decision Tree]
ROC AUC: 0.92 (+/- 0.14) [VotingClassifier]




In [130]:
#class_labels.remove('VotingClassifier')