<img src="./images/banner.png" width="800">

# Voting Classifiers

Ensembling is a powerful concept in machine learning that combines multiple models (often referred to as *weak learners* or *base estimators*) to produce a final prediction. **Voting** is one of the simplest yet effective ensemble methods where each model “votes” for a particular outcome, and the aggregated vote forms the final prediction. This approach can often outperform individual models by reducing variance and leveraging the complementary strengths of different learning algorithms.


<img src="./images/voting.png" width="800">

Voting can be intuitive when you consider how committees make decisions. By combining *a variety of opinions*, the group’s decision is often more reliable than any single individual’s perspective. The same principle applies to machine learning:

- **Improved Accuracy**: A mix of models helps reduce overfitting and stabilizes predictions.
- **Model Diversity**: Different algorithms capture different patterns, and their combined synergy can boost performance.


💡 **Tip:** Even simple models like *Logistic Regression* and *Decision Trees* can form a strong ensemble if they make *different errors*, so that one model's mistakes can be outvoted by the others.


The concept of voting in ensemble methods can be understood in two primary forms: **hard voting** and **soft voting**.


- **Hard Voting**: Each classifier outputs a predicted label, and the final label is chosen by majority vote. Mathematically, if we have classifiers $ C_1, C_2, \ldots, C_k $ producing labels $ \hat{y}_1, \hat{y}_2, \ldots, \hat{y}_k $, then the final label $ \hat{y} $ is:
  $$
  \hat{y} = \operatorname{mode} \{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_k\}
  $$


- **Soft Voting**: Each classifier outputs a probability distribution over possible labels, and the final label is decided by averaging these probabilities. Specifically, if $ p_{i,j} $ is the predicted probability of the $ i $-th classifier for class $ j $, then the combined probability for class $ j $ is:
  $$
  \bar{p}_j = \frac{1}{k}\sum_{i=1}^{k} p_{i,j}
  $$
  and the predicted class is $ \hat{y} = \arg \max_j \bar{p}_j $.


❗️ **Important Note:** Soft voting often achieves better performance than hard voting *if* probabilities are well-calibrated.
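
The two rules above can disagree on the same inputs. The following is a minimal sketch with made-up probabilities for three hypothetical classifiers and two classes (the numbers are illustrative, not taken from any fitted model). One confident classifier is outvoted under hard voting but dominates the average under soft voting:

```python
import numpy as np

# Hypothetical predicted probabilities: rows are classifiers, columns are classes 0 and 1
probas = np.array([
    [0.90, 0.10],  # classifier 1: very confident in class 0
    [0.40, 0.60],  # classifier 2: mildly prefers class 1
    [0.45, 0.55],  # classifier 3: mildly prefers class 1
])

# Hard voting: each classifier votes for its most probable class, then take the mode
votes = probas.argmax(axis=1)
hard_label = np.bincount(votes).argmax()

# Soft voting: average the probabilities per class, then take the arg max
mean_proba = probas.mean(axis=0)
soft_label = mean_proba.argmax()

print("votes:", votes, "-> hard voting picks class", hard_label)
print("mean probabilities:", mean_proba.round(3), "-> soft voting picks class", soft_label)
```

Here hard voting picks class 1 (two votes to one), while soft voting picks class 0 because the averaged probability for class 0 is about 0.583.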


Voting is straightforward to implement and can significantly improve model performance under the right conditions. However, **not all problems benefit equally.** Here are some key points:

- **Advantages**:
  - Simple to implement using libraries like *scikit-learn*.
  - Naturally reduces variance when combining diverse models.
  - Often boosts performance even if individual models are relatively weak.

- **Challenges**:
  - Requires multiple models, increasing computational cost.
  - Performance gains depend on the diversity and quality of base estimators.
  - Hard voting can ignore *confidence information* if models differ in predictive certainty.


Below is a quick illustrative snippet showing how to set up a basic voting ensemble in Python using scikit-learn:


```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Create base estimators
log_clf = LogisticRegression()
tree_clf = DecisionTreeClassifier()
svm_clf = SVC(probability=True)  # probability=True for soft voting

# Combine them into a voting classifier
voting_clf = VotingClassifier(
    estimators=[
        ('lr', log_clf),
        ('dt', tree_clf),
        ('svm', svm_clf)
    ],
    voting='soft'  # change to 'hard' for majority voting
)

# voting_clf can now be fit and used for predictions like any other scikit-learn model
```


By understanding these foundational ideas of *why* we use voting, *how* it works, and *what* challenges it comes with, you will be well-prepared to explore more advanced ensemble techniques later in this lecture series.

**Table of contents**<a id='toc0_'></a>    
- [Types of Voting Techniques](#toc1_)    
  - [Hard Voting](#toc1_1_)    
  - [Soft Voting](#toc1_2_)    
  - [Weighted Voting](#toc1_3_)    
  - [Choosing the Right Voting Approach](#toc1_4_)    
- [Practical Implementation and Examples](#toc2_)    
  - [Implementation Steps and Workflow](#toc2_1_)    
  - [Common Libraries and Functions](#toc2_2_)    
  - [Basic Voting Example in Python](#toc2_3_)    
  - [Hyperparameter Tuning for Voting](#toc2_4_)    
- [Summary](#toc3_)    


## <a id='toc1_'></a>[Types of Voting Techniques](#toc0_)

In machine learning, **voting** generally refers to how multiple base estimators (classifiers or regressors) merge their individual predictions into one final output. While the concept is straightforward, there are practical variations that can have a profound impact on performance. In this section, we’ll explore the most common approaches: *Hard Voting*, *Soft Voting*, and *Weighted Voting*, and then discuss **Choosing the Right Voting Approach** based on your problem requirements.


### <a id='toc1_1_'></a>[Hard Voting](#toc0_)

Hard voting is the simplest form of voting. Each classifier predicts a class label, and the **final decision** is made by a majority vote.


For example, if you have three classifiers $(C_1, C_2, C_3)$ outputting the labels $(\hat{y}_1, \hat{y}_2, \hat{y}_3)$, the final prediction $\hat{y}$ becomes:
$$
\hat{y} = \operatorname{mode}\{\hat{y}_1, \hat{y}_2, \hat{y}_3\}.
$$


**Practical Insight:**  
• Hard voting is easy to explain and implement, but since it discards the confidence level of each prediction, it may not always yield the best results.  
• In simple tasks (or when you’re just getting started with ensemble methods), hard voting can be a quick solution that already improves upon individual models.


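Below is a brief Python sketch of a simple majority voting procedure. It assumes `models` is a list of already fitted classifiers and `X_test` is the feature matrix to predict on:


```python
from collections import Counter

# One array of predicted labels per model
predictions = [model.predict(X_test) for model in models]
final_predictions = []

# zip(*predictions) yields, for each sample, the tuple of labels from all models
for votes in zip(*predictions):
    # most_common(1) returns the most frequent label; on a tie, the label
    # that appears first among the votes wins
    final_label = Counter(votes).most_common(1)[0][0]
    final_predictions.append(final_label)
```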


### <a id='toc1_2_'></a>[Soft Voting](#toc0_)

In contrast to hard voting, **soft voting** takes into account the predicted probabilities from each classifier. Instead of seeing only the final labels, we look at how confident each model is about its predictions.


1. Each model outputs probabilities for each class: $p_{i,j}$ for the $i$-th model and class $j$.  
2. We then average these probabilities to obtain $\bar{p}_j$:  
   $$
   \bar{p}_j = \frac{1}{k}\sum_{i=1}^{k} p_{i,j}
   $$
3. The final class label is the one with the highest average probability:
   $$
   \hat{y} = \arg \max_j \bar{p}_j
   $$


**Practical Insight:**  
• Soft voting usually outperforms hard voting when probabilities are well-calibrated.  
• Implementing it in libraries like scikit-learn simply requires enabling `voting='soft'` and ensuring each model can produce probability estimates (e.g., using `SVC(probability=True)`).


❗️ **Important Note:** If your models are **not** well-calibrated, they might output misleading probabilities, rendering soft voting less effective.
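
One possible way to address poor calibration is to wrap a base model in scikit-learn's `CalibratedClassifierCV` before adding it to the ensemble. The sketch below is an illustration under stated assumptions: the Iris dataset, the choice of a decision tree as the model to calibrate, sigmoid calibration, and the estimator name `dt_cal` are not taken from the text above.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# A fully grown tree typically reports probabilities of exactly 0 or 1,
# so it is wrapped in a calibrator that is fitted with 5-fold cross-validation
calibrated_tree = CalibratedClassifierCV(
    DecisionTreeClassifier(random_state=0), method="sigmoid", cv=5
)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt_cal", calibrated_tree),  # hypothetical name for the calibrated tree
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

# Averaged class probabilities for the first three test samples
proba = ensemble.predict_proba(X_test)
print(proba[:3].round(3))
```

Whether calibration actually helps depends on the data and the models, so it is worth comparing the ensemble with and without it.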


### <a id='toc1_3_'></a>[Weighted Voting](#toc0_)

Weighted voting is an extension of **soft voting**, though it can also be applied to hard voting, where each model's vote is counted in proportion to its weight. Instead of giving each model an equal say, you assign weights to each model's prediction to reflect its relative importance or historical performance.


Suppose model $ i $ has weight $ w_i $. In a soft-voting context, the combined probability becomes:
$$
\bar{p}_j = \frac{\sum_{i=1}^{k} w_i \cdot p_{i,j}}{\sum_{i=1}^{k} w_i}.
$$
The final class is still:
$$
\hat{y} = \arg \max_j \bar{p}_j.
$$
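
As a small numeric illustration of the weighted formula, the sketch below uses made-up probabilities and weights (both are hypothetical and not tuned on any data). It shows that weights can change the predicted class relative to a plain average:

```python
import numpy as np

# Hypothetical probabilities: rows are classifiers, columns are classes 0 and 1
probas = np.array([
    [0.90, 0.10],
    [0.40, 0.60],
    [0.45, 0.55],
])
weights = np.array([1.0, 3.0, 3.0])  # illustrative weights, not tuned

# Weighted mean per class: sum_i w_i * p_ij / sum_i w_i
weighted_proba = np.average(probas, axis=0, weights=weights)
unweighted_proba = probas.mean(axis=0)

print("unweighted:", unweighted_proba.round(3), "-> class", unweighted_proba.argmax())
print("weighted:  ", weighted_proba.round(3), "-> class", weighted_proba.argmax())
```

With equal weights the average favors class 0 (about 0.583), while the weights `[1, 3, 3]` shift it to class 1 (about 0.507), because the two mildly confident classifiers now count three times as much as the first.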


**Practical Insight:**  
• Weights can be based on cross-validation scores or domain expertise.  
• Well-chosen weights can lead to a noticeable performance improvement, *but poor weight selection can degrade results*.


Below is a short code snippet illustrating how to specify weights in a scikit-learn `VotingClassifier`:


```python
from sklearn.ensemble import VotingClassifier

voting_clf_weighted = VotingClassifier(
    estimators=[
        ('lr', log_clf),
        ('dt', tree_clf),
        ('svm', svm_clf)
    ],
    voting='soft',
    weights=[1.5, 1, 2]  # custom weights for each model
)
```
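
One simple heuristic for choosing weights is to use each model's mean cross-validation accuracy. The sketch below assumes the Iris dataset and uses the raw scores as weights; this is an illustrative choice rather than a recommended recipe, and other mappings from scores to weights are equally plausible.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
]

# Mean 5-fold accuracy of each base model, used directly as its weight
weights = [cross_val_score(est, X, y, cv=5).mean() for _, est in estimators]

weighted_clf = VotingClassifier(estimators=estimators, voting="soft", weights=weights)
weighted_clf.fit(X, y)

for (name, _), w in zip(estimators, weights):
    print(f"{name}: weight = {w:.3f}")
```

In practice the weights should be derived from training data only, so that the test set stays untouched for the final evaluation.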


### <a id='toc1_4_'></a>[Choosing the Right Voting Approach](#toc0_)

Selecting between **hard voting**, **soft voting**, or **weighted voting** often depends on factors such as data size, model performance, model diversity, and the **confidence calibration** of each base estimator. If your models produce trustworthy probability estimates, soft voting or weighted voting may be ideal. On the other hand, if calibration is challenging or your models are purely outputting class labels without probabilities, hard voting is the simpler and more direct choice.


💡 **Tip:** Start with a simple **soft voting** approach. Then, experiment with weighting strategies if you notice that some of your models consistently perform better than others, or if your domain knowledge suggests certain models should have more influence.
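
When it is unclear which mode suits a problem, the two can be compared empirically. The following sketch assumes the Iris dataset and 5-fold cross-validation; the helper `make_estimators` is a hypothetical name introduced here only to build fresh base models for each ensemble.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)


def make_estimators():
    """Return fresh, unfitted base models (hypothetical helper)."""
    return [
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ]


# Cross-validated accuracy for each voting mode
scores = {}
for voting in ("hard", "soft"):
    clf = VotingClassifier(estimators=make_estimators(), voting=voting)
    scores[voting] = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"{voting} voting: {scores[voting].mean():.3f} +/- {scores[voting].std():.3f}")
```

On a small, easy dataset the difference may be negligible; the comparison becomes more informative on data where the base models disagree more often.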

## <a id='toc2_'></a>[Practical Implementation and Examples](#toc0_)

Building and deploying a **voting ensemble** in a real-world scenario involves careful planning, implementation, verification, and optimization. In this section, we will walk through a typical workflow for creating a voting-based ensemble, highlight commonly used libraries, provide hands-on examples, and discuss techniques to tune hyperparameters for best results.


### <a id='toc2_1_'></a>[Implementation Steps and Workflow](#toc0_)

An organized workflow ensures that your voting ensemble is built on strong foundations:

1. **Data Preparation**  
   Gather clean, representative data. Handle missing values, outliers, and perform feature engineering as necessary.  
2. **Model Selection**  
   Choose diverse base estimators that can balance each other’s weaknesses. For classification, you might pick a *Decision Tree*, a *Logistic Regression*, and an *SVM*.  
3. **Voting Mechanism**  
   Choose among *hard voting*, *soft voting*, and *weighted voting*. This choice often depends on whether your base models can output *probability estimates* and how confident you are in their calibration.  
4. **Implementation & Training**  
   Use libraries like *scikit-learn* to implement a `VotingClassifier` or `VotingRegressor`. Train the voting ensemble on your dataset.  
5. **Evaluation & Validation**  
   Evaluate model performance using metrics like *accuracy*, *F1 score*, or *ROC AUC* for classification, and compare with standalone models. Apply cross-validation to ensure the ensemble’s robustness.  
6. **Hyperparameter Tuning**  
   Employ methods like *Grid Search* or *Randomized Search* to find optimal settings (e.g., weights for weighted voting, or hyperparameters of the base estimators).  
7. **Deployment & Monitoring**  
   Once satisfied with the performance, move the ensemble model into production and continually monitor for *data drift* or performance degradation.
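
Steps 1 to 5 of the workflow above can be sketched as follows. The dataset (scikit-learn's breast cancer data), the scaling step, the specific hyperparameters, and the metric choices are assumptions made for illustration; tuning and deployment (steps 6 and 7) are left out.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Step 1: data (already clean here; real projects need their own preparation)
X, y = load_breast_cancer(return_X_y=True)

# Step 2: diverse base models; scaling lives inside each pipeline so it is
# re-fitted on the training part of every cross-validation fold
base_models = {
    "lr": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "dt": DecisionTreeClassifier(max_depth=5, random_state=0),
}

# Steps 3-4: soft-voting ensemble over the base models
ensemble = VotingClassifier(estimators=list(base_models.items()), voting="soft")

# Step 5: cross-validated comparison of the ensemble against each base model
metrics = ("accuracy", "f1", "roc_auc")
results = {}
for name, model in {**base_models, "voting": ensemble}.items():
    cv = cross_validate(model, X, y, cv=5, scoring=list(metrics))
    results[name] = {m: cv[f"test_{m}"].mean() for m in metrics}
    print(name, {m: round(v, 3) for m, v in results[name].items()})
```

Keeping preprocessing inside pipelines is a design choice that avoids leaking statistics from validation folds into training.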


### <a id='toc2_2_'></a>[Common Libraries and Functions](#toc0_)

Python’s machine learning ecosystem provides a **rich set of tools** to facilitate voting ensembles. The most popular one for quick implementation is **scikit-learn**:

- **VotingClassifier** and **VotingRegressor**: Offered as part of `sklearn.ensemble`.  
- **Base Estimators**: For classification, `LogisticRegression` (from `sklearn.linear_model`), `DecisionTreeClassifier` (from `sklearn.tree`), or `SVC` (from `sklearn.svm`). For regression, you might use `LinearRegression`, `Ridge`, `SVR`, and so on.  
- **Metrics**: Accuracy (`accuracy_score`), precision (`precision_score`), recall (`recall_score`), F1 (`f1_score`), and regression metrics (e.g., RMSE, MAE) help gauge the ensemble’s performance.  
- **Model Selection Tools**: `GridSearchCV` and `RandomizedSearchCV` can optimize parameters of base estimators and the voting mechanism (like the `weights` parameter).
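
Since `VotingRegressor` is listed above, here is a minimal regression sketch. The diabetes dataset, the three base regressors, and their settings are assumptions chosen for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# VotingRegressor averages the numeric predictions of its base regressors
voting_reg = VotingRegressor(
    estimators=[
        ("lin", LinearRegression()),
        ("ridge", Ridge(alpha=1.0)),
        ("tree", DecisionTreeRegressor(max_depth=4, random_state=0)),
    ]
)
voting_reg.fit(X_train, y_train)

y_pred = voting_reg.predict(X_test)
print("MAE:", round(mean_absolute_error(y_test, y_pred), 2))
```

For regression there is no hard or soft mode: the ensemble prediction is the (optionally weighted) mean of the base predictions.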


💡 **Tip:** Before deciding on your final ensemble, test each estimator independently to verify its strengths and limitations. This helps in determining effective weighting and voting strategies.


### <a id='toc2_3_'></a>[Basic Voting Example in Python](#toc0_)

To get a hands-on feel for voting ensembles, let’s build a simple classification system using three different models on the *Iris* dataset:


```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
```

```python
# 1. Load and split data
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```

```python
# 2. Define individual models
log_clf = LogisticRegression(max_iter=1000)  # logistic regression
svc_clf = SVC(probability=True)             # SVM for soft voting
dt_clf  = DecisionTreeClassifier()
```

```python
# 3. Construct a VotingClassifier
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('svc', svc_clf), ('dt', dt_clf)],
    voting='soft'
)
```

```python
# 4. Train the ensemble
voting_clf.fit(X_train, y_train)
```

```python
# 5. Evaluate
y_pred = voting_clf.predict(X_test)
print("Voting Ensemble Accuracy:", accuracy_score(y_test, y_pred))
```

```
Voting Ensemble Accuracy: 1.0
```


```python
# Let's compare with individual models
for clf in (log_clf, svc_clf, dt_clf):
    clf.fit(X_train, y_train)
    y_pred_indiv = clf.predict(X_test)
    print(clf.__class__.__name__, "Accuracy:", accuracy_score(y_test, y_pred_indiv))
```

```
LogisticRegression Accuracy: 1.0
SVC Accuracy: 1.0
DecisionTreeClassifier Accuracy: 1.0
```


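In this snippet:

• Soft voting is selected with `voting='soft'`; `probability=True` is set on **SVC** so that it exposes `predict_proba`, which soft voting requires.  
• The **VotingClassifier** fits a clone of each base model and combines their predictions, which is why the individual models are fitted separately before being scored on their own.  
• On this split every model already reaches an accuracy of 1.0, so the comparison shows no gain from voting here; the benefit tends to appear on harder datasets where the base models make different mistakes.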
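
To see what the ensemble does internally, the fitted base models can be inspected through the `named_estimators_` attribute. The sketch below rebuilds the same kind of ensemble on Iris (the `random_state` values are added here for reproducibility and are an assumption) and checks that, without weights, the ensemble's probabilities equal the plain mean of the per-model probabilities.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svc", SVC(probability=True, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="soft",
)
voting_clf.fit(X_train, y_train)

# named_estimators_ maps each name to the fitted clone used inside the ensemble
per_model = {
    name: est.predict_proba(X_test)
    for name, est in voting_clf.named_estimators_.items()
}

# With no weights, the ensemble probability is the plain mean of the per-model ones
manual_mean = np.mean(list(per_model.values()), axis=0)
matches = np.allclose(manual_mean, voting_clf.predict_proba(X_test))
print("Manual average matches ensemble:", matches)
```

Looking at `per_model` for individual samples is also a quick way to find cases where the base models disagree.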


### <a id='toc2_4_'></a>[Hyperparameter Tuning for Voting](#toc0_)

**Hyperparameter tuning** is critical to maximize your ensemble’s potential. You can tune both the base estimators’ hyperparameters and the ensemble’s configuration (e.g., weights if you’re using weighted voting). Scikit-learn’s `GridSearchCV` or `RandomizedSearchCV` can handle both levels of optimization.


A typical procedure involves:

1. Defining a parameter grid for each base estimator (e.g., `max_depth` for decision trees, `C` for Logistic Regression or SVM).  
2. Including the ensemble-level parameters like `voting` (hard or soft) or `weights` for each model in the search space.  
3. Using cross-validation to systematically evaluate each parameter combination.


Below is a simplified example of how one might encapsulate the above with `GridSearchCV`:


```python
from sklearn.model_selection import GridSearchCV

param_grid = {
    'lr__C': [0.1, 1, 10],
    'svc__C': [0.1, 1, 10],
    'dt__max_depth': [None, 3, 5],
    'weights': [(1,1,1), (1,2,1), (2,1,2)]
}

grid_search = GridSearchCV(
    estimator=voting_clf,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy'
)
grid_search.fit(X_train, y_train)
print("Best params:", grid_search.best_params_)
print("Best score:", grid_search.best_score_)
```

```
Best params: {'dt__max_depth': 5, 'lr__C': 10, 'svc__C': 1, 'weights': (1, 2, 1)}
Best score: 0.9619047619047618
```


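❗️ **Important Note:** Be mindful of computational costs, because the search space grows multiplicatively with each added parameter. The grid above already has 3 × 3 × 3 × 3 = 81 combinations, which means 405 ensemble fits with 5-fold cross-validation, each training three base models.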
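
One way to keep the cost bounded is `RandomizedSearchCV`, which evaluates a fixed number of sampled settings instead of the full grid. The sketch below is an illustration under assumptions: the Iris dataset, log-uniform ranges for `C`, and `n_iter=10` are arbitrary choices made here.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svc", SVC(probability=True, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="soft",
)

# Distributions are sampled from; lists are drawn from uniformly
param_distributions = {
    "lr__C": loguniform(1e-2, 1e2),
    "svc__C": loguniform(1e-2, 1e2),
    "dt__max_depth": [None, 3, 5],
    "weights": [(1, 1, 1), (1, 2, 1), (2, 1, 2)],
}

search = RandomizedSearchCV(
    voting_clf,
    param_distributions=param_distributions,
    n_iter=10,  # only 10 sampled settings instead of a full grid
    cv=5,
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)
print("Best params:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```

The budget is now set by `n_iter` rather than by the size of the grid, at the price of not covering every combination.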


With these core concepts and examples, you should now feel comfortable implementing and experimenting with voting ensembles.

## <a id='toc3_'></a>[Summary](#toc0_)

Voting techniques provide a straightforward yet powerful way to enhance predictive performance by combining multiple models. Whether you use simple **hard voting** for quick majority decisions or more nuanced **soft voting** and **weighted voting** to incorporate confidence levels and model contributions, the ensemble can often outperform individual estimators. Here are some key takeaways from this lecture:

- **Versatility**: Voting can be applied to classification and regression tasks alike, thanks to libraries like scikit-learn’s `VotingClassifier` and `VotingRegressor`.  
- **Simple but Effective**: Even a handful of base estimators (e.g., *Logistic Regression*, *Decision Tree*, *SVM*) can improve on the individual models when their errors differ, although on easy datasets the gain may be small or absent.  
- **Importance of Calibration**: Models that output well-calibrated probabilities can significantly benefit from **soft voting**.  
- **Weighting for Fine-Tuning**: Weighted voting allows you to give certain models a stronger say, reflecting their relative performance or domain importance.  
- **Hyperparameter Tuning**: Optimize both individual models and ensemble-level parameters (like weights) to get the most out of your voting approach.


💡 **Tip:** While voting techniques are often overshadowed by more advanced ensemble methods like **Bagging** or **Boosting**, they provide a simple and effective starting point to increase predictive performance without an exhaustive amount of feature engineering or model tinkering.


Here are some recommended resources:
- [Ensemble Methods in scikit-learn: Official Documentation](https://scikit-learn.org/stable/modules/ensemble.html)  
- [Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/)  
- [Medium / Towards Data Science articles on Ensemble Methods](https://towardsdatascience.com/search?q=ensemble%20methods)  
- [Kaggle for real-world examples of Ensemble Methods in competitions](https://www.kaggle.com)


With voting ensembles under your belt, you are well-prepared to explore more sophisticated methods like **Bagging**, **Random Forests**, **Boosting**, and **Stacking**—all of which expand on the idea of leveraging multiple models to generate superior predictions.