#### 1. Averaging Method

* Ensemble averaging is a machine learning technique in which the predictions of multiple models are combined, often yielding a more accurate prediction than the individual models on their own. It is one of the simplest ensemble methods; bagging and boosting, covered below, build on the same idea of combining models.
* The key idea is to train a set of models with low bias and high variance, then average their predictions. Because the models' errors partly cancel out, the averaged model keeps the low bias while reducing the variance, which eases the bias-variance tradeoff.
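
The variance reduction described above can be illustrated with a small simulation. This is a minimal sketch using only the standard library, not a real model: `TRUE_VALUE`, `NOISE_STD`, `N_MODELS` and `N_TRIALS` are made-up values, and each "model" is assumed to be unbiased with errors independent of the others. Real models trained on the same data are correlated, so the reduction is usually smaller in practice.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 1.0   # quantity every model tries to predict (assumed)
NOISE_STD = 0.5    # spread of each model's error (assumed)
N_MODELS = 10      # number of models in the ensemble
N_TRIALS = 5000    # number of repetitions of the experiment


def single_prediction():
    # A low-bias, high-variance model: right on average, noisy each time.
    return random.gauss(TRUE_VALUE, NOISE_STD)


# Predictions from one model, repeated N_TRIALS times.
single = [single_prediction() for _ in range(N_TRIALS)]

# Predictions from an average of N_MODELS independent models.
averaged = [
    statistics.fmean([single_prediction() for _ in range(N_MODELS)])
    for _ in range(N_TRIALS)
]

var_single = statistics.pvariance(single)
var_averaged = statistics.pvariance(averaged)

print(f"variance of one model:       {var_single:.4f}")
print(f"variance of averaged models: {var_averaged:.4f}")
```

Under the independence assumption, the variance of the average is roughly the single-model variance divided by `N_MODELS`, while the mean prediction stays close to `TRUE_VALUE`.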

In [15]:
# Import utility modules and ML models for prediction
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load and split the dataset
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

# Define base models
rf_model = RandomForestClassifier(n_estimators=10, random_state=42)
gb_model = GradientBoostingClassifier(n_estimators=10, random_state=42)

# Create an ensemble that averages the predicted class probabilities (soft voting)
averaging_model = VotingClassifier(estimators=[('rf', rf_model), ('gb', gb_model)], voting='soft')
averaging_model.fit(X_train, y_train)

# Predict and evaluate
y_pred = averaging_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))

Accuracy: 95.61%
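
To make the `voting='soft'` step concrete, the sketch below averages class probabilities by hand for a single sample. It does not call scikit-learn; `soft_vote` is a hypothetical helper, and `rf_proba` and `gb_proba` are made-up stand-ins for what each model's `predict_proba` might return.

```python
def soft_vote(probabilities_per_model):
    """Average class probabilities across models and pick the top class.

    probabilities_per_model: one list per model, each holding one
    probability per class for a single sample.
    """
    n_models = len(probabilities_per_model)
    n_classes = len(probabilities_per_model[0])
    averaged = [
        sum(model[c] for model in probabilities_per_model) / n_models
        for c in range(n_classes)
    ]
    predicted_class = max(range(n_classes), key=lambda c: averaged[c])
    return averaged, predicted_class


# Hypothetical probabilities for one sample (classes 0 and 1).
rf_proba = [0.40, 0.60]  # this model leans towards class 1
gb_proba = [0.70, 0.30]  # this model leans more strongly towards class 0

averaged, predicted = soft_vote([rf_proba, gb_proba])
print(averaged, predicted)
```

In this example the two models disagree, so a plain majority vote would be tied. Averaging the probabilities lets the more confident model decide.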


#### Bagging:
* Bagging is short for bootstrap aggregating.
* Each base model is trained on its own bag, so that together the models see a fair distribution of the whole dataset.
* A bag is a sample drawn from the dataset with replacement, usually the same size as the whole dataset, so some rows appear more than once and others are left out.
* The final output is formed by combining the outputs of all base models, for example by averaging (regression) or voting (classification).
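
The sampling step described above can be sketched with the standard library alone. This is an illustration, not the scikit-learn implementation: `make_bag` is a hypothetical helper, `dataset` is a stand-in list of row indices, and the "base model" is simply the mean of each bag.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

dataset = list(range(1000))  # stand-in for the row indices of a dataset


def make_bag(rows):
    # Sample with replacement until the bag is as large as the dataset.
    return random.choices(rows, k=len(rows))


bags = [make_bag(dataset) for _ in range(25)]

# Fraction of distinct rows in each bag; repeated rows crowd out the rest.
unique_fractions = [len(set(bag)) / len(dataset) for bag in bags]
print(f"mean unique fraction: {statistics.fmean(unique_fractions):.3f}")

# Toy "base model": predict the mean of the bag. Bagging then combines
# the per-bag outputs, here by averaging them.
bag_predictions = [statistics.fmean(bag) for bag in bags]
bagged_prediction = statistics.fmean(bag_predictions)
print(f"bagged prediction: {bagged_prediction:.1f}")
```

Each bag typically contains about 63% of the distinct rows (roughly 1 - 1/e), which is why different base models end up seeing different views of the same data.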

In [25]:
# importing utility modules
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# importing machine learning models for prediction
import xgboost as xgb

# importing bagging module
from sklearn.ensemble import BaggingRegressor

# loading cancer dataset from sklearn
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

# getting target data from the dataset
target = data.target

# getting train data from the dataset
train = data.data

# splitting the data into training and test datasets
X_train, X_test, y_train, y_test = train_test_split(train, target, test_size=0.20)

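# initializing the bagging model with XGBoost as the base model (default parameters);
# the estimator is passed positionally because the keyword was renamed from
# `base_estimator` to `estimator` in scikit-learn 1.2
model = BaggingRegressor(xgb.XGBRegressor())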

# training model
model.fit(X_train, y_train)

# predicting the output on the test dataset
pred = model.predict(X_test)

# printing the mean squared error between the true labels (0/1) and the predicted values
print(mean_squared_error(y_test, pred))


0.02899591298226622


#### Boosting: 
* Boosting is a sequential method: base models are trained one after another, so that the mistakes of a single weak model do not carry through to the final output.
* Instead of training the base models independently and combining them afterwards, each new model depends on the previous one.
* Each new model tries to correct the errors made by its predecessor.
* Each of these models is called a weak learner.
* The final model (the strong learner) is a weighted combination of all the weak learners.
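
The sequential idea above can be sketched in plain Python. This is a minimal illustration of one boosting flavour (fitting each weak learner to the residuals of the current ensemble, as gradient boosting does for squared error), not the scikit-learn implementation. `fit_stump`, `boost`, `n_rounds`, `learning_rate` and the toy data are all assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split on a 1-D feature with least squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Sequentially fit stumps (weak learners) to the current residuals."""
    base = sum(ys) / len(ys)  # start from a constant prediction
    stumps = []

    def predict(x):
        # Strong learner: the base value plus the scaled sum of all stumps.
        return base + learning_rate * sum(stump(x) for stump in stumps)

    history = []  # training MSE measured before each round
    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        history.append(sum(r * r for r in residuals) / len(xs))
        stumps.append(fit_stump(xs, residuals))
    return predict, history


# Toy 1-D regression data (made up for illustration).
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.0, 4.9, 7.0]

predict, history = boost(xs, ys)
final_mse = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"training MSE before boosting: {history[0]:.3f}")
print(f"training MSE after boosting:  {final_mse:.3f}")
```

Each round only has to explain what the previous rounds left unexplained, so the training error never increases from one round to the next. Other boosting variants, such as AdaBoost, reweight the training samples instead of fitting residuals.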

In [23]:
# importing utility modules and ML models for prediction
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Load the cancer dataset
cancer = load_breast_cancer()

# Get the features and target
X = cancer.data
y = cancer.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Gradient Boosting Regressor
model = GradientBoostingRegressor()
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

# Calculate the Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")

Mean Squared Error: 0.0322
