# Voting Classifier

A group of models trained on the same data set and combined to make a single prediction is called an ensemble, and the technique is called ensemble learning. Combining several different models often performs better than relying on any one of them alone.

Voting is one of the simplest ways of combining the predictions of multiple machine learning algorithms. A voting classifier isn't an actual classifier but a wrapper around a set of different classifiers that are trained and evaluated in parallel, in order to exploit the different strengths of each algorithm.
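A minimal sketch of what "wrapper" means in scikit-learn's `VotingClassifier`, assuming a small `make_moons` sample (the sample size, `n_estimators` and seeds are arbitrary choices for illustration). The wrapper fits clones of the estimators it is given and exposes the fitted copies through `named_estimators_`; its `n_jobs` parameter controls whether those fits run in parallel.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression

# Illustrative data; sample size and seeds are arbitrary
X, y = make_moons(n_samples=200, noise=0.30, random_state=0)

log_clf = LogisticRegression()
rf_clf = RandomForestClassifier(n_estimators=50, random_state=0)

# The wrapper holds (name, estimator) pairs; pass n_jobs=-1 to fit them in parallel
voting_clf = VotingClassifier(estimators=[("lr", log_clf), ("rf", rf_clf)], voting="hard")
voting_clf.fit(X, y)

# fit() trains *clones* of the estimators; the fitted copies are stored by name
fitted_rf = voting_clf.named_estimators_["rf"]
print(type(fitted_rf).__name__)  # RandomForestClassifier
print(fitted_rf is rf_clf)       # False

# The original object passed in stays unfitted
try:
    rf_clf.predict(X)
except NotFittedError:
    print("rf_clf itself was never fitted")
```

Because the wrapper works on clones, the same estimator objects can be reused elsewhere (as the loops below do) without interfering with the ensemble.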

We can train on the data set using different algorithms and then ensemble them to predict the final output. The final prediction is decided by a vote, using one of two strategies:

**Hard voting/Majority voting:** Each classifier predicts a class label, and the class that receives the most votes is chosen.
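A minimal NumPy sketch of hard voting, using hypothetical label predictions from three classifiers (rows) on five samples (columns). `np.bincount` counts the votes for each class in a column and `argmax` picks the winner; on a tie, `argmax` returns the smallest label.

```python
import numpy as np

# Hypothetical predicted labels: one row per classifier, one column per sample
predictions = np.array([
    [0, 1, 1, 0, 1],  # classifier A
    [0, 1, 0, 0, 1],  # classifier B
    [1, 1, 0, 1, 0],  # classifier C
])

def hard_vote(preds):
    """Return the most frequent label in each column (ties go to the smallest label)."""
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), axis=0, arr=preds)

print(hard_vote(predictions))  # [0 1 0 0 1]
```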

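**Soft voting:** In this case, each classifier outputs a probability for every class, and these probability vectors are averaged across all classifiers. The winning class is the one with the highest average probability. Every classifier must therefore be able to estimate class probabilities (which is why `SVC` needs `probability=True` below), and this strategy is only recommended if the classifiers are well calibrated.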
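A sketch of the averaging step with NumPy, using made-up class probabilities for a single sample from three hypothetical classifiers. It also shows how the two strategies can disagree: two of the three classifiers lean slightly towards class 1, but the one confident classifier pulls the average towards class 0.

```python
import numpy as np

# Hypothetical predict_proba rows for ONE sample, columns = classes [0, 1]
probas = np.array([
    [0.90, 0.10],  # classifier A: confident in class 0
    [0.40, 0.60],  # classifier B: leans towards class 1
    [0.45, 0.55],  # classifier C: leans towards class 1
])

avg = probas.mean(axis=0)        # average probability per class
winner = int(np.argmax(avg))     # soft vote: class with the highest average

# For comparison, hard voting counts each classifier's top class
hard_winner = int(np.bincount(probas.argmax(axis=1)).argmax())

print(avg, winner, hard_winner)  # soft vote picks 0, hard vote picks 1
```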

In [1]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [2]:
X_train.shape, X_test.shape

((375, 2), (125, 2))

In [3]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC


# hard voting

log_clf = LogisticRegression(solver="lbfgs")
rf_clf = RandomForestClassifier()
svm_clf = SVC(gamma="scale")

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rf_clf), ('svc', svm_clf)],
    voting='hard'
)

In [4]:
voting_clf.fit(X_train, y_train)

VotingClassifier(estimators=[('lr', LogisticRegression()),
                             ('rf', RandomForestClassifier()), ('svc', SVC())])

In [5]:
from sklearn.metrics import accuracy_score

for clf in (log_clf, rf_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.864
RandomForestClassifier 0.904
SVC 0.896
VotingClassifier 0.912
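The wrapper's hard vote can be checked by hand. This sketch repeats the notebook's setup, with `random_state=42` added to the random forest (an assumption, so that runs are repeatable), collects each fitted clone's predictions from `estimators_`, and takes the per-sample majority. `VotingClassifier` trains its clones on label-encoded targets; with the 0/1 labels of `make_moons` that encoding is the identity, so the manual vote can be compared directly.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier(random_state=42)),
                ("svc", SVC())],
    voting="hard",
).fit(X_train, y_train)

# Stack each fitted clone's predictions: shape (n_classifiers, n_samples)
individual = np.array([est.predict(X_test) for est in voting_clf.estimators_])

# Majority vote for every test sample (one column per sample)
manual = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, individual)

print((manual == voting_clf.predict(X_test)).all())  # True
```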


In [6]:
# soft voting

log_clf = LogisticRegression(solver="lbfgs")
rf_clf = RandomForestClassifier()
svm_clf = SVC(gamma="scale", probability=True)

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rf_clf), ('svc', svm_clf)],
    voting='soft'
)

In [7]:
voting_clf.fit(X_train, y_train)

VotingClassifier(estimators=[('lr', LogisticRegression()),
                             ('rf', RandomForestClassifier()),
                             ('svc', SVC(probability=True))],
                 voting='soft')

In [8]:
for clf in (log_clf, rf_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.864
RandomForestClassifier 0.888
SVC 0.896
VotingClassifier 0.912
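Likewise, the soft vote is just the mean of the clones' `predict_proba` outputs. This sketch uses the same setup with fixed seeds on the random forest and on `SVC` (assumptions for repeatability; `SVC` uses randomness when fitting its probability model) and checks both the averaged probabilities and the resulting labels against the wrapper.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier(random_state=42)),
                ("svc", SVC(probability=True, random_state=42))],
    voting="soft",
).fit(X_train, y_train)

# Average the (n_samples, n_classes) probability matrices of the fitted clones
avg_proba = np.mean([est.predict_proba(X_test) for est in voting_clf.estimators_], axis=0)

print(np.allclose(avg_proba, voting_clf.predict_proba(X_test)))         # True
print((avg_proba.argmax(axis=1) == voting_clf.predict(X_test)).all())  # True
```

Passing `weights` to `VotingClassifier` would turn this plain mean into a weighted average.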


Ensemble methods work best when the predictors are as independent from one another as possible. One way to get diverse classifiers is to train them using very different algorithms. This increases the chance that they will make very different types of errors, which improves the ensemble's accuracy.
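To make the independence argument concrete, here is an idealized calculation that is not part of the notebook: assume `n` binary classifiers, each correct with probability `p`, whose errors are completely independent. The majority vote is then correct whenever more than half of them are right, which is a binomial tail probability. `majority_accuracy` is a hypothetical helper; real classifiers trained on the same data make correlated errors, so actual gains are smaller than this bound suggests.

```python
from math import comb

def majority_accuracy(n_classifiers: int, p: float) -> float:
    """P(majority vote is correct) for n independent binary classifiers of accuracy p.

    Assumes n_classifiers is odd, so a strict majority always exists.
    """
    k_min = n_classifiers // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_classifiers, k) * p**k * (1 - p) ** (n_classifiers - k)
        for k in range(k_min, n_classifiers + 1)
    )

# Each classifier alone is only 60% accurate
for n in (1, 3, 11, 101):
    print(n, round(majority_accuracy(n, 0.6), 3))
```

With three classifiers the vote is already right 64.8% of the time, and the figure keeps rising as more independent voters are added.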