### Method 1: Model Averaging
Model averaging is an ensemble technique in which the predictions of multiple models are combined into a single prediction. In its weighted form, each model is assigned a weight that is typically determined by its performance on a validation dataset. Each model's predictions are multiplied by its weight, and the resulting weighted average is used as the final prediction.
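
Written out, accuracy-based weighting can be sketched as follows. The symbols are notation introduced here for illustration: $a_k$ is the validation accuracy of model $k$, $p_k(x)$ is its predicted probability of the positive class for input $x$, and $K$ is the number of models.

$$
w_k = \frac{a_k}{\sum_{j=1}^{K} a_j}, \qquad \hat{p}(x) = \sum_{k=1}^{K} w_k \, p_k(x)
$$

For example, with two models and made-up validation accuracies $a_1 = 0.80$ and $a_2 = 0.90$, the weights are $w_1 = 0.80 / 1.70 \approx 0.47$ and $w_2 = 0.90 / 1.70 \approx 0.53$. For binary classification, the final label is then obtained by thresholding $\hat{p}(x)$, commonly at 0.5.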

In [1]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=42)

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

# Train a logistic regression model
lr_model = LogisticRegression()
lr_model.fit(X_train, y_train)

# Train a random forest classifier
rf_model = RandomForestClassifier()
rf_model.fit(X_train, y_train)

# Make predictions on the validation set using each base model
preds1 = lr_model.predict(X_val)
preds2 = rf_model.predict(X_val)

# Calculate the accuracy of each base model on the validation set
acc1 = np.mean(preds1 == y_val)
acc2 = np.mean(preds2 == y_val)

# Determine the weight for each model based on its validation set accuracy
weights = [acc1 / (acc1 + acc2), acc2 / (acc1 + acc2)]

# Predict class-1 probabilities on the test set using each base model
probs1 = lr_model.predict_proba(X_test)[:, 1]
probs2 = rf_model.predict_proba(X_test)[:, 1]

# Combine the test-set probabilities using the weights determined above
ensemble_probs = (weights[0] * probs1) + (weights[1] * probs2)

# Threshold the weighted average at 0.5 to obtain class labels
ensemble_preds = (ensemble_probs >= 0.5).astype(int)

# Evaluate the performance of the ensemble model on the test set
ensemble_acc = np.mean(ensemble_preds == y_test)

print(ensemble_acc)


### Method 2: Voting
A voting classifier combines the predictions of several models into a single prediction. This can be done by 'hard voting' (each model votes for a class, and the class with the most votes is selected) or 'soft voting' (the predicted probabilities for each class are averaged across the models, and the class with the highest average probability is selected).

In [2]:
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Generate a synthetic classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_classes=2, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define your base models
model1 = RandomForestClassifier(n_estimators=100)
model2 = SVC(kernel='rbf', probability=True)

# Create an ensemble model using the VotingClassifier
ensemble = VotingClassifier(estimators=[('rf', model1), ('svm', model2)], voting='soft')

# Train the ensemble model on your dataset
ensemble.fit(X_train, y_train)

# Evaluate the ensemble model
ensemble.score(X_test, y_test)

0.94
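
To make the difference between hard and soft voting concrete, here is a minimal pure-Python sketch. The probabilities are made up for illustration, and `hard_vote` and `soft_vote` are hypothetical helper names, not part of scikit-learn. It shows a case where the two schemes disagree: two models lean weakly toward class 1, while one model is very confident in class 0.

```python
from collections import Counter

# Hypothetical class probabilities from three models for one sample,
# given as [P(class 0), P(class 1)]. The numbers are invented.
probas = [
    [0.45, 0.55],  # model A leans slightly toward class 1
    [0.40, 0.60],  # model B leans slightly toward class 1
    [0.90, 0.10],  # model C is confident in class 0
]

def hard_vote(probas):
    """Each model votes for its most probable class; the majority wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(probas):
    """Average the class probabilities across models, then take the argmax."""
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)

hard_result = hard_vote(probas)
soft_result = soft_vote(probas)
print(hard_result, soft_result)
```

Hard voting picks class 1 because two of the three models vote for it. Soft voting picks class 0 because the averaged probability of class 0 (about 0.58) is higher once model C's confidence is taken into account.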

### Method 3: Stacking
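The idea behind stacking is to train several base models on the same dataset and use their predictions as input features for a higher-level meta-model, which then makes the final prediction. Once the meta-model is trained, the full stack can be used to make predictions on new data. The predictions used to train the meta-model are usually out-of-fold predictions, so that the meta-model does not learn from overly optimistic inputs; the simple implementation below fits the meta-model on in-sample predictions instead.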

In [3]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import numpy as np

# Create a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_classes=2, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define two base models
base_model_1 = RandomForestClassifier(random_state=42)
base_model_2 = LogisticRegression(random_state=42)

# Define a meta model
meta_model = LogisticRegression(random_state=42)

class StackingModel:
    
    def __init__(self, base_models, meta_model):
        self.base_models = base_models
        self.meta_model = meta_model
        
    def fit(self, X, y):
        base_model_preds = np.zeros((X.shape[0], len(self.base_models)))
        for i, model in enumerate(self.base_models):
            model.fit(X, y)
            y_pred = model.predict(X)
            base_model_preds[:, i] = y_pred
        self.meta_model.fit(base_model_preds, y)
    
    def predict(self, X):
        base_model_preds = np.zeros((X.shape[0], len(self.base_models)))
        for i, model in enumerate(self.base_models):
            y_pred = model.predict(X)
            base_model_preds[:, i] = y_pred
        return self.meta_model.predict(base_model_preds)

# Create the stacking model
stacking_model = StackingModel([base_model_1, base_model_2], meta_model)

# Train the stacking model on the training data
stacking_model.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = stacking_model.predict(X_test)

# Evaluate the accuracy of the stacking model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")


Accuracy: 0.945
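
As an alternative sketch, scikit-learn's `StackingClassifier` can build the same kind of stack while training the meta-model on cross-validated (out-of-fold) predictions of the base models. The choice of `cv=5` and of `predict_proba` as the stacking input are assumptions made here for illustration, not settings taken from the code above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same synthetic dataset and split as in the manual example
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# The meta-model (final_estimator) is fit on out-of-fold predictions
# produced by 5-fold cross-validation of the base models (assumed setting).
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("lr", LogisticRegression(random_state=42)),
    ],
    final_estimator=LogisticRegression(random_state=42),
    cv=5,
    stack_method="predict_proba",
)

stack.fit(X_train, y_train)
stack_acc = stack.score(X_test, y_test)
print(f"Accuracy: {stack_acc}")
```

After the cross-validation step, the base models are refit on the full training set, so `stack.predict` behaves like the `predict` method of the manual `StackingModel` class at inference time.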
