<a href="https://colab.research.google.com/github/rida-manzoor/ML/blob/main/Ensemble_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Ensemble methods** in machine learning combine the predictions of multiple learning models to make more accurate and more stable decisions than any single model. They all follow the same principle: a group of models, taken together, tends to outperform its individual members.

In learning models, noise, variance, and bias are the major sources of error. Ensemble methods help reduce these error-causing factors (variance and bias in particular), which improves the accuracy and stability of machine learning (ML) algorithms.

## Basic Techniques

1. **Max Voting**

  In Max Voting, multiple models make predictions, and the final prediction is determined by the majority vote.

  **Example:** Suppose you have three models A, B, and C. If models A and B predict class 1, and model C predicts class 0, the Max Voting ensemble would choose class 1 as the final prediction.

2. **Averaging**

  Averaging involves aggregating predictions by taking the average of individual model predictions.

  **Example:** If models A, B, and C predict values of 0.8, 0.7, and 0.9, respectively, the averaged prediction would be
  
  $
  \frac{0.8 + 0.7 + 0.9}{3}
  = 0.8
  $

3. **Weighted Average**

  Similar to averaging, but each model's prediction is multiplied by a weight that reflects how much that model is trusted. The weights typically sum to 1.

  **Example:** If models A, B, and C have weights 0.3, 0.4, and 0.3, respectively, the weighted average would be

  $
  0.3 \cdot \text{pred}_A + 0.4 \cdot \text{pred}_B + 0.3 \cdot \text{pred}_C
  $

4. **Rank Average**

  In Rank Average, predictions are ranked, and the final prediction is determined by the average rank.
  
  **Example:** If models A, B, and C rank an item 2nd, 1st, and 3rd, respectively, the average rank would be
  
   $
   \frac{2 +1 +3}{3}
  = 2
  $
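The four basic techniques above can be sketched with NumPy. This is a minimal illustration, not a definitive implementation: the prediction arrays, the helper `max_vote`, and the extra items in the rank table are made-up values chosen to match the worked examples.

```python
import numpy as np

# Hypothetical class-label predictions from three models for four samples.
label_preds = np.array([
    [1, 0, 1, 1],  # model A
    [1, 1, 0, 1],  # model B
    [0, 0, 1, 1],  # model C
])


def max_vote(preds):
    """Return the most frequent label in each column (one column per sample)."""
    return np.array([np.bincount(col).argmax() for col in preds.T])


majority = max_vote(label_preds)

# Hypothetical real-valued predictions from the same models for two samples.
value_preds = np.array([
    [0.8, 0.2],  # model A
    [0.7, 0.4],  # model B
    [0.9, 0.3],  # model C
])

# Simple averaging: every model counts equally.
simple_avg = value_preds.mean(axis=0)

# Weighted averaging with the weights from the example (they sum to 1).
weights = np.array([0.3, 0.4, 0.3])
weighted_avg = weights @ value_preds

# Rank averaging: ranks given to three items by each model (1 = best).
ranks = np.array([
    [2, 1, 3],  # model A
    [1, 2, 3],  # model B
    [3, 1, 2],  # model C
])
avg_rank = ranks.mean(axis=0)

print("Max voting:      ", majority)
print("Averaging:       ", simple_avg)
print("Weighted average:", weighted_avg)
print("Average rank:    ", avg_rank)
```

The first column of each table reproduces the worked examples: two of three models vote for class 1, the plain average of 0.8, 0.7, and 0.9 is 0.8, and ranks 2, 1, and 3 average to 2.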

## Advanced Techniques

1. **Stacking**

  Stacking is a widely used ensemble machine learning technique designed to enhance model performance by combining the predictions of multiple base models to create a new model. This approach involves training several models to address similar problems and leveraging their collective output to construct a more effective model.

**Key Components of Stacking:**

1. **Input and Output:**
   - The algorithm takes the outputs of sub-models (base models) as input and endeavors to learn how to optimally combine these input predictions to generate an improved output prediction.

2. **Model Composition:**
   - Stacking, also known as stacked generalization, extends the Model Averaging Ensemble technique. It involves multiple sub-models, each contributing to the new model according to their performance weights. The new model is stacked on top of the others, which gives stacking its name.

   ![Stacking Architecture](https://editor.analyticsvidhya.com/uploads/39725Stacking.png)

**Architecture of a Stacking Model:**

The architecture of a stacking model comprises:

1. **Level-0 Models (Base-Models):**
   - These are models fitted on the training data, and their predictions are compiled.

2. **Level-1 Model (Meta-Model):**
   - The meta-model learns how to optimally combine the predictions of the base models to produce an enhanced final prediction.

**Training Process:**

- The meta-model is trained on predictions made by the base models on out-of-sample data.
- Out-of-sample data, not used to train the base models, is fed to the base models for prediction.
- The resulting predictions, along with the expected outputs, form the input and output pairs for training the meta-model.

**Preparing Training Dataset for Meta-Model:**

- Commonly, k-fold cross-validation of the base models is employed to create the training dataset for the meta-model.
- Out-of-fold predictions serve as the basis for the meta-model's training dataset.

**Data Input to Meta-Model:**

- The inputs to the base models (e.g., elements of the training data) may be included in the training data for the meta-model.
- This provides additional context to the meta-model on how to effectively combine predictions from the base models.

**Final Fitting:**

- Once the training dataset is prepared, the meta-model is trained independently on this dataset.
- The base models are then refit on the entire original training dataset so that they can produce predictions for new data.

**Note:** The output from base models used as input to the meta-model may vary, such as real values for regression or probability values/class labels for classification. This flexibility makes stacking suitable for various problem types.
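The training process described above can be sketched by hand with scikit-learn's `cross_val_predict`. This is a minimal sketch under stated assumptions: the two base models, the 5-fold setting, and the use of class probabilities as meta-features are illustrative choices, not something prescribed by the text.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Level-0 (base) models; these two are illustrative choices.
base_models = [
    RandomForestClassifier(n_estimators=50, random_state=42),
    KNeighborsClassifier(n_neighbors=5),
]

# Out-of-fold class probabilities: every training row is predicted by a
# copy of the model that did not see that row during fitting.
oof_features = np.hstack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")
    for m in base_models
])

# Level-1 (meta) model learns how to combine the base-model predictions.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(oof_features, y_train)

# Refit the base models on the full training set for prediction time.
for m in base_models:
    m.fit(X_train, y_train)

test_features = np.hstack([m.predict_proba(X_test) for m in base_models])
stacked_accuracy = meta_model.score(test_features, y_test)
print(f"Stacked accuracy: {stacked_accuracy:.2f}")
```

Each base model contributes three probability columns (one per Iris class), so the meta-model sees six input features per row.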


  




scikit-learn implements stacking as `StackingClassifier`. The example below stacks three base models on the Iris dataset, with logistic regression as the meta-model:

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a set of base models
base_models = [('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
               ('gb', GradientBoostingClassifier(n_estimators=100, random_state=42)),
               ('svc', SVC(probability=True))]

# Create a StackingClassifier with a meta-model (Logistic Regression)
stacking_model = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())

# Fit the stacking model on the training data
stacking_model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = stacking_model.predict(X_test)

# Evaluate the stacking model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```

```
Accuracy: 1.00
```


2. **Blending**

Blending is an ensemble technique that uses a machine-learning model to learn how best to combine the predictions of several ensemble member models. It is closely related to stacking, and the two terms are sometimes used interchangeably in articles and research papers. The usual distinction is that blending trains the meta-model on predictions for a single hold-out validation set, whereas stacking typically uses out-of-fold predictions from k-fold cross-validation.

The architecture of a blending ensemble, like that of stacking, consists of two or more base models, also known as level-0 models, and a meta-model, or level-1 model, which integrates the predictions of the base models. The meta-model is trained on the predictions made by the base models on out-of-sample data.
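A minimal blending sketch follows. It assumes the common convention that the out-of-sample data is a single hold-out split carved from the training set; the base models, split sizes, and the helper `base_predictions` are illustrative choices rather than part of any fixed recipe.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Three-way split: fitting set, hold-out (blending) set, and test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

# Level-0 models are trained on the fitting set only.
base_models = [
    RandomForestClassifier(n_estimators=50, random_state=42),
    KNeighborsClassifier(n_neighbors=5),
]
for m in base_models:
    m.fit(X_fit, y_fit)


def base_predictions(models, X_new):
    """Stack each model's class probabilities side by side as meta-features."""
    return np.hstack([m.predict_proba(X_new) for m in models])


# The level-1 model is trained on base-model predictions for the hold-out set,
# which the base models never saw during fitting.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(base_predictions(base_models, X_hold), y_hold)

blend_accuracy = meta_model.score(base_predictions(base_models, X_test), y_test)
print(f"Blending accuracy: {blend_accuracy:.2f}")
```

The trade-off against k-fold stacking is simplicity versus data efficiency: blending is easier to implement, but the meta-model only learns from the hold-out rows.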

3. **Bootstrap Sampling**

  Bootstrap sampling is a resampling technique that involves generating multiple datasets by randomly sampling with replacement from the original dataset. The term "bootstrap" is derived from the phrase "pulling oneself up by one's bootstraps," and in statistics, it refers to the idea of creating new samples from the observed data to estimate the properties of a population.

Here are the key steps involved in bootstrap sampling:

1. **Original Dataset:**
   - Begin with a dataset containing $n$ observations.

2. **Random Sampling with Replacement:**
   - Draw $n$ observations from the dataset with replacement. This means that each observation in the original dataset has an equal chance of being selected in each draw, and it's possible for the same observation to be selected multiple times.

3. **Create Bootstrap Sample:**
   - The set of $n$ observations obtained through random sampling with replacement constitutes a bootstrap sample. This sample is typically the same size as the original dataset.

4. **Repeat:**
   - Repeat the process (steps 2 and 3) multiple times to create multiple bootstrap samples. The number of iterations is determined by the desired number of bootstrap samples.

5. **Estimation and Inference:**
   - Use each bootstrap sample to estimate population parameters or assess the variability of a statistic. For example, compute the mean, variance, or confidence intervals based on the bootstrap samples.

The main purpose of bootstrap sampling is to approximate the distribution of a statistic or parameter by resampling from the observed data. This technique is particularly useful when analytical methods for estimating the distribution are challenging or unavailable.
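The steps above can be sketched with NumPy. This is an illustrative sketch: the synthetic data, the number of bootstrap samples, and the choice of the mean as the statistic are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: a hypothetical observed sample of n = 50 values.
data = rng.normal(loc=10.0, scale=2.0, size=50)
n = data.size
n_bootstrap = 2000

# Steps 2-4: each row of `indices` is one bootstrap sample,
# i.e. n draws with replacement from the original observations.
indices = rng.integers(0, n, size=(n_bootstrap, n))
boot_means = data[indices].mean(axis=1)

# Step 5: bootstrap standard error and a 95% percentile confidence
# interval for the mean.
std_error = boot_means.std(ddof=1)
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"Sample mean:     {data.mean():.3f}")
print(f"Bootstrap SE:    {std_error:.3f}")
print(f"95% CI for mean: [{ci_low:.3f}, {ci_high:.3f}]")
```

The same pattern works for other statistics, such as the median or a model's test score: replace the `.mean(axis=1)` call with the statistic of interest.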

Key points about bootstrap sampling:

- **Resampling with Replacement:** Each observation in the original dataset has an equal probability of being selected in each draw. This allows for the creation of diverse bootstrap samples.

- **Size of Bootstrap Sample:** The size of each bootstrap sample is typically the same as the size of the original dataset ($n$).

- **Applications:** Bootstrap sampling is widely used for estimating standard errors, confidence intervals, and making statistical inferences in situations where the underlying distribution is not well-known or assumptions are violated.

- **Bootstrap Confidence Intervals:** Bootstrap samples can be used to compute confidence intervals for parameters or statistics. This provides a non-parametric approach to statistical inference.

Bootstrap sampling is a versatile and powerful technique in statistics and machine learning, especially when dealing with small or complex datasets. It is commonly used in conjunction with methods like bootstrapped aggregation (bagging) and in the estimation of uncertainty in predictive modeling.

4. **Bootstrap Aggregation (Bagging)**
  
  Bagging is an ensemble machine learning technique that aims to improve the stability and accuracy of models by combining multiple base models trained on different bootstrap samples of the training data. It is particularly effective in reducing overfitting and variance in the predictions. The most common application of bagging is in constructing random forests, although it can be applied to other base learners as well.

  ![Bagging](https://www.simplilearn.com/ice9/free_resources_article_thumb/Bagging.PNG)

Here are the key components and principles of Bootstrap Aggregation:

1. **Bootstrap Sampling:**
   - Bagging involves creating multiple bootstrap samples from the original training dataset. A bootstrap sample is obtained by randomly sampling with replacement from the training data. Each bootstrap sample is of the same size as the original dataset, but it may contain duplicate instances and exclude some original instances.

2. **Base Model Training:**
   - A base model (learner) is trained on each bootstrap sample independently. The base models can be of any type, but decision trees are commonly used.

3. **Parallel Training:**
   - The training of base models is typically done in parallel, allowing for efficient use of computational resources.

4. **Aggregation:**
   - The predictions of individual base models are aggregated to form the final ensemble prediction. The method of aggregation depends on the task:
      - For classification: Voting or averaging of class probabilities.
      - For regression: Averaging of individual predictions.

5. **Reduction of Variance:**
   - Bagging helps reduce the variance of the model by introducing diversity among the base models. This is achieved by training each base model on a slightly different dataset due to the randomness introduced by bootstrap sampling.



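6. **Out-of-Bag (OOB) Evaluation:**
   - As each base model is trained on a bootstrap sample, there are instances not included in its training set. These out-of-bag instances can be used for model evaluation without the need for a separate validation set.

7. **Application to Various Models:**
   - While bagging is commonly associated with decision trees and Random Forests, it can be applied to various base models, including linear models, support vector machines, and more.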
   - While bagging is commonly associated with decision trees and Random Forests, it can be applied to various base models, including linear models, support vector machines, and more.

Bagging is effective when the base models are unstable or have high variance. By combining multiple models trained on different subsets of data, bagging improves generalization and robustness, making it a powerful technique in ensemble learning.
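The components above map onto scikit-learn's `BaggingClassifier`. The sketch below is one possible setup, not a recommended configuration: the Iris dataset, the decision-tree base learner, and the 100 estimators are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Each of the 100 trees is fitted on its own bootstrap sample of the
# training data. The base learner is passed positionally.
bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,   # sample training rows with replacement
    oob_score=True,   # evaluate on the rows each tree did not see
    random_state=42,
)
bagging.fit(X_train, y_train)

oob_accuracy = bagging.oob_score_
test_accuracy = bagging.score(X_test, y_test)
print(f"Out-of-bag accuracy: {oob_accuracy:.2f}")
print(f"Test accuracy:       {test_accuracy:.2f}")
```

The out-of-bag score is computed from training rows only, so it gives an estimate of generalization without touching the test set.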

5. **Boosting**

  is an ensemble machine learning technique that aims to improve the predictive performance of a model by combining the predictions of multiple weak learners. Unlike bagging, where models are trained independently and their predictions are averaged or voted upon, boosting trains models sequentially, with each new model focusing on correcting the errors made by the previous ones.

Here are the key concepts and principles of boosting:

1. **Weak Learners:**
   - Boosting typically uses weak learners as base models. A weak learner is a model that performs slightly better than random chance.

2. **Sequential Training:**
   - Models are trained sequentially, and each subsequent model focuses on the mistakes made by the ensemble of models trained so far.

3. **Weighted Training Instances:**
   - During training, instances that are misclassified by the previous models are assigned higher weights. This gives more emphasis to the difficult-to-classify instances in the subsequent models.

4. **Model Weighting:**
   - Each model is assigned a weight based on its accuracy. Models with higher accuracy are given more weight in the final prediction.

5. **Learning Rate (Shrinkage):**
   - Boosting algorithms often use a learning rate that scales the contribution of each new model. Smaller values make the ensemble learn more slowly and usually require more models, but they help prevent overshooting and overfitting.

6. **Common Boosting Algorithms:**
   - Some well-known boosting algorithms include:
     - **AdaBoost (Adaptive Boosting):** Focuses on correcting misclassifications by assigning higher weights to misclassified instances.
     - **Gradient Boosting:** Minimizes the errors of the previous models by fitting new models to the residuals. Common implementations include XGBoost, LightGBM, and CatBoost.
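     - **Stochastic Gradient Boosting:** Applies gradient boosting in a stochastic manner, fitting each new model on a random subsample of the training data. (It should not be confused with stochastic gradient descent, which is what the abbreviation SGD normally refers to.)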

7. **Final Prediction:**
   - The final prediction is a weighted combination of the predictions of all the weak learners. Models with higher accuracy contribute more to the final prediction.

8. **Reducing Bias and Variance:**
   - Boosting primarily reduces bias by sequentially correcting the mistakes of earlier models. Combining many learners can also reduce variance, although this effect is usually weaker than in bagging.

9. **Overfitting Behavior:**
   - With suitable regularization (a small learning rate, shallow base learners, early stopping), boosting often generalizes well to unseen data. Because it emphasizes hard-to-classify instances, however, it can overfit when the data contain label noise or outliers.

10. **Feature Importance:**
    - Boosting algorithms often provide insights into feature importance, indicating which features are more influential in making accurate predictions.
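The sequential training, instance reweighting, and model weighting described above can be sketched as a small from-scratch AdaBoost. This is a minimal sketch of discrete AdaBoost for binary labels only; the breast cancer dataset, decision stumps as weak learners, 50 rounds, and the helper `boosted_predict` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
y = np.where(y == 1, 1, -1)  # relabel classes to {-1, +1}
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

n_rounds = 50
sample_weights = np.full(len(y_train), 1.0 / len(y_train))
stumps, alphas = [], []

for _ in range(n_rounds):
    # Weak learner: a depth-1 tree fitted on the current instance weights.
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_train, y_train, sample_weight=sample_weights)
    pred = stump.predict(X_train)

    # Weighted error of this learner, clipped to keep the log finite.
    err = np.clip(sample_weights[pred != y_train].sum(), 1e-10, 1 - 1e-10)

    # Model weight: more accurate learners get a larger say.
    alpha = 0.5 * np.log((1 - err) / err)

    # Raise the weights of misclassified instances, lower the rest,
    # then renormalize so the weights sum to 1.
    sample_weights *= np.exp(-alpha * y_train * pred)
    sample_weights /= sample_weights.sum()

    stumps.append(stump)
    alphas.append(alpha)


def boosted_predict(X_new):
    """Final prediction: sign of the alpha-weighted sum of stump votes."""
    scores = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
    return np.where(scores >= 0, 1, -1)


boosted_accuracy = np.mean(boosted_predict(X_test) == y_test)
first_stump_accuracy = stumps[0].score(X_test, y_test)
print(f"First stump alone: {first_stump_accuracy:.2f}")
print(f"Boosted ensemble:  {boosted_accuracy:.2f}")
```

In practice a library implementation such as scikit-learn's `AdaBoostClassifier` would normally be used; the loop here only makes the individual steps visible.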

Boosting is widely used in practice and has been successful in various machine learning applications, including classification, regression, and ranking. It has become a fundamental technique and is implemented in popular machine learning libraries.
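As one example of such a library implementation, the sketch below uses scikit-learn's `GradientBoostingClassifier`. The hyperparameter values and the Iris dataset are illustrative assumptions, not tuned recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

gb = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,  # scales the contribution of each tree
    subsample=0.8,      # values below 1.0 give stochastic gradient boosting
    max_depth=2,        # shallow trees act as weak learners
    random_state=42,
)
gb.fit(X_train, y_train)

gb_accuracy = gb.score(X_test, y_test)
print(f"Test accuracy: {gb_accuracy:.2f}")

# Impurity-based feature importances, normalized to sum to 1.
for name, importance in zip(iris.feature_names, gb.feature_importances_):
    print(f"{name:20s} {importance:.3f}")
```

Lowering `learning_rate` generally calls for a larger `n_estimators`, so the two are usually tuned together.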