# Combining Different Models Into a Voting Classifier

So far, we have seen how to combine different instances of the same classifier or regressor
into an ensemble. In this section, we take this idea a step further and combine
conceptually different classifiers into what is known as a **voting classifier**.

The idea behind voting classifiers is that the individual learners in the ensemble don't
necessarily need to be of the same type. After all, no matter how the individual classifiers
arrived at their prediction, in the end, we are going to apply a decision rule that integrates
all the votes of the individual classifiers. This is also known as a **voting scheme**.

Two different voting schemes are common among voting classifiers:
- In **hard voting** (also known as **majority voting**), every individual classifier votes for a class, and the majority wins. In statistical terms, the predicted target label of the ensemble is the mode of the distribution of individually predicted labels.
- In **soft voting**, every individual classifier provides a probability value that a specific data point belongs to a particular target class. The predictions are weighted by the classifier's importance and summed up. Then the target label with the greatest sum of weighted probabilities wins the vote.

You can find an example of how these voting schemes work in practice in the book.
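To make the two schemes concrete, here is a minimal NumPy sketch for a single data point. The probability values and the equal weights are made up for illustration; they are not taken from any model in this chapter.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for ONE sample
# (columns: class 0, class 1). These numbers are invented for illustration.
probas = np.array([
    [0.55, 0.45],  # classifier A leans slightly towards class 0
    [0.60, 0.40],  # classifier B leans slightly towards class 0
    [0.10, 0.90],  # classifier C is very confident in class 1
])
weights = np.array([1.0, 1.0, 1.0])  # equal importance (an assumption)

# Hard voting: every classifier votes for its most likely class,
# and the most frequent vote (the mode) wins.
votes = probas.argmax(axis=1)
hard_label = np.bincount(votes).argmax()

# Soft voting: weight each classifier's probabilities, sum them up,
# and pick the class with the greatest weighted sum.
weighted_sum = (weights[:, np.newaxis] * probas).sum(axis=0)
soft_label = weighted_sum.argmax()

print('votes:', votes, 'hard:', hard_label, 'soft:', soft_label)
```

With these numbers the two schemes disagree: two of the three classifiers vote for class 0, so hard voting picks class 0, whereas the single very confident classifier tips the weighted sum in favor of class 1 under soft voting.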

## Implementing a Voting Classifier

Let's look at a simple example of a voting classifier that combines three different algorithms:
- A logistic regression classifier from [Chapter 3](../Chapter03/03.00-First-Steps-in-Supervised-Learning.ipynb), *First Steps in Supervised Learning*
- A Gaussian naive Bayes classifier from [Chapter 7](../Chapter07/07.00-Implementing-a-Spam-Filter-with-Bayesian-Learning.ipynb), *Implementing a Spam Filter with Bayesian Learning*
- A random forest classifier from this chapter

We can combine these three algorithms into a voting classifier and apply it to the breast
cancer dataset with the following steps.

Before running any code, we import the `warnings` module and tell it to ignore all warnings for the rest of the session. Such warnings typically stem from deprecated functionality in an older package version or from a slight mismatch in module dependencies, and they would otherwise clutter the cell output.

In [1]:
import warnings
warnings.filterwarnings('ignore')

Load the dataset and split it into training and test sets:

In [27]:
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

In [28]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=13)

Instantiate the individual classifiers:

In [29]:
from sklearn.linear_model import LogisticRegression
model1 = LogisticRegression(random_state=13)

In [33]:
from sklearn.naive_bayes import GaussianNB
model2 = GaussianNB()

In [34]:
from sklearn.ensemble import RandomForestClassifier
model3 = RandomForestClassifier(random_state=13)

Assign the individual classifiers to the voting ensemble. Here, we need to pass a
list of tuples (`estimators`), where every tuple consists of a short string that
names the classifier (such as `'lr'` for logistic regression) and the model object.
The voting scheme can be either `voting='hard'` or `voting='soft'`. For now, we will
choose `voting='hard'`:

In [44]:
from sklearn.ensemble import VotingClassifier
vote = VotingClassifier(estimators=[('lr', model1), ('gnb', model2), ('rfc', model3)],
                        voting='hard')

Fit the ensemble to the training data and score it on the test data:

In [45]:
vote.fit(X_train, y_train)
vote.score(X_test, y_test)

0.951048951048951
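As an aside, switching to the other voting scheme only requires `voting='soft'`, provided every estimator implements `predict_proba` (all three used here do). The following self-contained sketch repeats the setup from above; `max_iter=5000` is an addition made here so that logistic regression has room to converge on the unscaled features, and the resulting score is not claimed to match the hard-voting result.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data and split as in the hard-voting example
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=13)

soft_vote = VotingClassifier(
    estimators=[
        # max_iter=5000 is an assumption added for this sketch
        ('lr', LogisticRegression(random_state=13, max_iter=5000)),
        ('gnb', GaussianNB()),
        ('rfc', RandomForestClassifier(random_state=13)),
    ],
    voting='soft',  # combine predicted class probabilities instead of labels
    # weights=None (the default) gives every classifier equal importance
)
soft_vote.fit(X_train, y_train)
soft_score = soft_vote.score(X_test, y_test)
print(soft_score)
```

Passing a list such as `weights=[2, 1, 1]` would give the first classifier twice the influence of the others; which weighting works best is something to determine by validation rather than assume.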

To judge whether the 95.1% achieved by the hard-voting ensemble is a good accuracy score,
we can compare it to the performance of each individual classifier. We do this by fitting
each classifier to the training data on its own and scoring it on the test data. The
logistic regression model achieves 94.4% accuracy on its own:

In [46]:
model1.fit(X_train, y_train)
model1.score(X_test, y_test)

0.9440559440559441

Similarly, the naive Bayes classifier achieves 93.0% accuracy:

In [47]:
model2.fit(X_train, y_train)
model2.score(X_test, y_test)

0.9300699300699301

Last but not least, the random forest classifier also achieves 94.4% accuracy:

In [48]:
model3.fit(X_train, y_train)
model3.score(X_test, y_test)

0.9440559440559441
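The individual models had to be fitted separately because `VotingClassifier` fits clones of the estimators it is given and leaves the original objects untouched. The fitted clones can be reached through the `named_estimators_` attribute, so the same comparison can be sketched in one loop. The block below is self-contained and repeats the earlier setup; `max_iter=5000` is an assumption added here, so the logistic regression score may differ slightly from the one shown above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=13)

vote = VotingClassifier(
    estimators=[
        # max_iter=5000 is an assumption added for this sketch
        ('lr', LogisticRegression(random_state=13, max_iter=5000)),
        ('gnb', GaussianNB()),
        ('rfc', RandomForestClassifier(random_state=13)),
    ],
    voting='hard',
)
vote.fit(X_train, y_train)

# Score every fitted clone on the test set. The labels here are already
# 0 and 1, so the ensemble's internal label encoding leaves them unchanged.
scores = {name: est.score(X_test, y_test)
          for name, est in vote.named_estimators_.items()}
scores['ensemble'] = vote.score(X_test, y_test)

for name, acc in scores.items():
    print(f'{name}: {acc:.3f}')
```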

All in all, we gained about 0.7 percentage points in accuracy over the best individual
classifier (one additional correctly classified sample out of the 143 in the test set) by
combining three unrelated classifiers into an ensemble. Each of these classifiers might make
different mistakes on the test set, but that's OK because, for any given data point, we need
just two out of three classifiers to be correct.
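The two-out-of-three argument can be turned into a short calculation. The sketch below assumes a binary problem, three classifiers that are each correct with the same hypothetical probability `p`, and errors that are fully independent. Real classifiers trained on the same data are correlated, so this is an optimistic upper bound rather than a prediction, which is consistent with the much smaller gain observed above.

```python
# Hypothetical accuracy of each individual classifier (an assumption)
p = 0.94

# The majority vote is correct if all three classifiers are correct ...
all_three = p ** 3
# ... or if exactly two are correct (three ways to pick the one that errs)
exactly_two = 3 * p ** 2 * (1 - p)

majority_correct = all_three + exactly_two
print(round(majority_correct, 4))  # 0.9896
```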