In [None]:
'https://scikit-learn.org/stable/modules/ensemble.html'

In [None]:
# plots 
'https://scikit-learn.org/stable/auto_examples/ensemble/plot_voting_regressor.html#sphx-glr-auto-examples-ensemble-plot-voting-regressor-py'

### Voting Classifier

Here is an example of ensemble learning in Python using a technique called "voting".

In this example, we use the VotingClassifier from the sklearn.ensemble module to create an ensemble model. Three different classifiers, namely DecisionTreeClassifier, SVC (Support Vector Classifier), and LogisticRegression, are defined as individual models. These models are then combined into an ensemble using the VotingClassifier with the voting strategy set to 'hard', which means each classifier casts one vote per sample and the majority class is selected. The SVC is created with probability=True, but that setting only matters for 'soft' voting; hard voting uses the predicted labels alone.

The ensemble is trained using the fit() function on the training data. Predictions are made on the test set using the predict() method of the ensemble model. Finally, the accuracy of the ensemble predictions is evaluated using the accuracy_score() function from sklearn.metrics.

Note that this is just a basic example, and there are various other techniques and strategies for ensemble learning, such as weighted voting, stacking, and boosting, which can be explored depending on the problem and data characteristics.

In [None]:
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define individual classifiers
classifier_1 = DecisionTreeClassifier()
classifier_2 = SVC(probability=True)
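# max_iter is raised from the default of 100 so the solver has room to converge on unscaled features
classifier_3 = LogisticRegression(max_iter=1000)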

# Create an ensemble using voting
ensemble = VotingClassifier(estimators=[('dt', classifier_1), ('svm', classifier_2), ('lr', classifier_3)], voting='hard')

# Fit the ensemble model
ensemble.fit(X_train, y_train)

# Make predictions on the test set
y_pred = ensemble.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Ensemble Accuracy:", accuracy)
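
The note above mentions weighted voting. As a minimal sketch of that idea, the same three classifiers can be combined with voting='soft', which averages predicted class probabilities instead of counting votes. The weights [1, 2, 2] and the names soft_ensemble, proba, and soft_pred are illustrative assumptions, not tuned or recommended values.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Soft voting averages predict_proba outputs, so the SVC needs probability=True here
soft_ensemble = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("svm", SVC(probability=True, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
    weights=[1, 2, 2],  # illustrative weights (assumption), not tuned
)
soft_ensemble.fit(X_train, y_train)

# Weighted average of the per-model class probabilities, one row per test sample
proba = soft_ensemble.predict_proba(X_test)
soft_pred = soft_ensemble.predict(X_test)
print("Soft-voting accuracy:", accuracy_score(y_test, soft_pred))
```

In practice the weights would usually be chosen by cross-validation rather than fixed by hand.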

### Stacking Ensemble

Stacking is an ensemble learning technique that involves training multiple models and using another model, called a meta-model, to combine their predictions. Here is an example of a stacking ensemble using the scikit-learn library in Python.

In this example, we use the StackingClassifier from the sklearn.ensemble module to create a stacking ensemble model. Three different base classifiers, namely DecisionTreeClassifier, SVC (Support Vector Classifier), and LogisticRegression, are defined. These base classifiers are specified as estimators in the StackingClassifier.

A meta-classifier, DecisionTreeClassifier in this case, is also defined as the final estimator. It is trained to combine the predictions of the base classifiers.

The stacking ensemble is trained using the fit() function on the training data. Predictions are made on the test set using the predict() method of the ensemble model. Finally, the accuracy of the ensemble predictions is evaluated using the accuracy_score() function from sklearn.metrics.

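Note that StackingClassifier handles the split between base and meta training internally. The meta-classifier is trained on cross-validated (out-of-fold) predictions of the base classifiers, using 5 folds by default, so it never learns from predictions a base classifier made on its own training rows. This helps avoid overfitting. The base classifiers themselves are then refit on the entire training set for use at prediction time.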

This is a basic example to demonstrate the stacking ensemble technique, and you can further explore and customize the approach based on your specific problem and data.

In [None]:
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define individual base classifiers
classifier_1 = DecisionTreeClassifier()
classifier_2 = SVC(probability=True)
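# max_iter is raised from the default of 100 so the solver has room to converge on unscaled features
classifier_3 = LogisticRegression(max_iter=1000)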

# Define the meta-model
meta_classifier = DecisionTreeClassifier()

# Create the stacking ensemble
ensemble = StackingClassifier(estimators=[('dt', classifier_1), ('svm', classifier_2), ('lr', classifier_3)], 
                              final_estimator=meta_classifier)

# Fit the stacking ensemble model
ensemble.fit(X_train, y_train)

# Make predictions on the test set
y_pred = ensemble.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Ensemble Accuracy:", accuracy)
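
To make the meta-model training step more concrete, here is a rough sketch of stacking done by hand with cross_val_predict. It is a simplification under stated assumptions, not a reproduction of StackingClassifier's internals: the meta-model here is a LogisticRegression, and the names base_models, meta_train, meta_test, and manual_pred are made up for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

base_models = [
    DecisionTreeClassifier(random_state=42),
    SVC(probability=True, random_state=42),
    LogisticRegression(max_iter=1000),
]

# Out-of-fold probabilities: every training row is scored by a model
# that did not see that row during fitting
meta_train = np.hstack([
    cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")
    for model in base_models
])

# Refit each base model on the full training set to build test-time features
meta_test = np.hstack([
    model.fit(X_train, y_train).predict_proba(X_test) for model in base_models
])

# The meta-model learns how to combine the base models' probabilities
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(meta_train, y_train)
manual_pred = meta_model.predict(meta_test)
print("Manual stacking accuracy:", accuracy_score(y_test, manual_pred))
```

Each base model contributes three probability columns (one per iris class), so the meta-model sees nine features per sample.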

### Bagging Ensemble

Bagging, short for Bootstrap Aggregating, is an ensemble learning technique that involves training multiple models on different subsets of the training data and combining their predictions. Here is an example of a bagging ensemble using the scikit-learn library in Python.

In this example, we use the BaggingClassifier from the sklearn.ensemble module to create a bagging ensemble model. A base classifier, in this case, DecisionTreeClassifier, is defined as the base estimator for the bagging ensemble.

The n_estimators parameter is set to 10, so the ensemble consists of 10 individual classifiers. Each classifier is trained on a bootstrap sample: rows drawn at random with replacement from the training data (by default, as many rows as the training set contains), so some rows appear several times and others are left out.

The bagging ensemble is trained using the fit() function on the training data. Predictions are made on the test set using the predict() method of the ensemble model. Finally, the accuracy of the ensemble predictions is evaluated using the accuracy_score() function from sklearn.metrics.

Note that bagging can be used with various base classifiers and combined with random feature selection (random subspaces) to further diversify the ensemble. Bagging also applies to regression, for which scikit-learn provides BaggingRegressor; other model types need their own aggregation logic.

This is a basic example to demonstrate the bagging ensemble technique, and you can further explore and customize the approach based on your specific problem and data.

In [None]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the base classifier
base_classifier = DecisionTreeClassifier()

# Create the bagging ensemble
ensemble = BaggingClassifier(base_classifier, n_estimators=10, random_state=42)

# Fit the bagging ensemble model
ensemble.fit(X_train, y_train)

# Make predictions on the test set
y_pred = ensemble.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Ensemble Accuracy:", accuracy)
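
The bootstrap-and-aggregate idea described in this section can also be sketched by hand. The version below is a simplification: it combines the trees by majority vote, whereas BaggingClassifier averages predicted probabilities when the base estimator provides them. The names rng, votes, and manual_pred are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rng = np.random.default_rng(42)
n_models = 10
n_train = len(X_train)

votes = []
for _ in range(n_models):
    # Bootstrap sample: n_train row indices drawn with replacement
    idx = rng.integers(0, n_train, size=n_train)
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    votes.append(tree.predict(X_test))

votes = np.array(votes)  # shape: (n_models, n_test_samples)

# Majority vote across the models for each test sample
manual_pred = np.array(
    [np.bincount(column, minlength=3).argmax() for column in votes.T]
)
print("Manual bagging accuracy:", accuracy_score(y_test, manual_pred))
```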

### Bagging ensemble code with SVM as base estimator

This example builds a bagging ensemble with a Support Vector Machine (SVM) as the base estimator, using the scikit-learn library in Python.

In this example, we use the BaggingClassifier from the sklearn.ensemble module to create a bagging ensemble model. The base classifier is set as SVC (Support Vector Classifier) with the kernel='linear' parameter.

As in the previous example, n_estimators is set to 10, and each of the 10 classifiers is trained on a bootstrap sample of the training data.

The bagging ensemble is trained using the fit() function on the training data. Predictions are made on the test set using the predict() method of the ensemble model. Finally, the accuracy of the ensemble predictions is evaluated using the accuracy_score() function from sklearn.metrics.

Feel free to adjust the parameters, such as the kernel or the number of estimators, based on your specific problem and data requirements.

Remember to import the necessary libraries (sklearn.ensemble, sklearn.svm, sklearn.datasets, sklearn.model_selection, sklearn.metrics, etc.) before running the code.

In [None]:
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the base classifier
base_classifier = SVC(kernel='linear')

# Create the bagging ensemble
ensemble = BaggingClassifier(base_classifier, n_estimators=10, random_state=42)

# Fit the bagging ensemble model
ensemble.fit(X_train, y_train)

# Make predictions on the test set
y_pred = ensemble.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Ensemble Accuracy:", accuracy)
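
The Bagging Ensemble section mentions random feature selection as a way to diversify the ensemble. A possible sketch with the same linear SVC base estimator follows. It samples both rows and features (a combination sometimes called random patches) and reads the out-of-bag accuracy as a by-product. The values 25, 0.8, and 0.5 and the name subspace_ensemble are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

subspace_ensemble = BaggingClassifier(
    SVC(kernel="linear"),
    n_estimators=25,
    max_samples=0.8,   # each model sees a bootstrap sample of 80% of the rows
    max_features=0.5,  # each model sees a random half of the 4 features
    oob_score=True,    # score each row using only models that did not train on it
    random_state=42,
)
subspace_ensemble.fit(X_train, y_train)

# Show which feature columns the first three models were given
for features in subspace_ensemble.estimators_features_[:3]:
    print("Feature indices used:", features)

print("Out-of-bag accuracy:", subspace_ensemble.oob_score_)
print("Test accuracy:", subspace_ensemble.score(X_test, y_test))
```

The out-of-bag estimate needs no separate validation split, which can be convenient on small datasets such as iris.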


### Bagging ensemble code with Logistic Regression as base estimator

In this code, we first generate a synthetic classification dataset using the make_classification function from scikit-learn. Then, we split the dataset into training and testing sets.

Next, we create an instance of the Logistic Regression classifier as the base estimator. We then initialize the BaggingClassifier, passing the base estimator and the number of estimators (10 in this example). The random_state parameter ensures reproducibility of results.

We train the bagging ensemble by calling the fit method on the bagging object, using the training data.

After training, we make predictions on the test set using the predict method and calculate the accuracy of the bagging ensemble using the score method.

Feel free to modify the code as per your specific requirements or dataset.

In [None]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

# Generate a synthetic dataset for classification
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a base logistic regression estimator
base_estimator = LogisticRegression()

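# Create the bagging classifier. The base estimator is passed positionally because the
# keyword was renamed from base_estimator to estimator in scikit-learn 1.2.
bagging = BaggingClassifier(base_estimator, n_estimators=10, random_state=42)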

# Train the bagging ensemble
bagging.fit(X_train, y_train)

# Make predictions on the test set
y_pred = bagging.predict(X_test)

# Evaluate the performance of the bagging ensemble
accuracy = bagging.score(X_test, y_test)
print("Accuracy:", accuracy)

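### Bagging ensemble code with Random Forest as base estimator

This example builds a bagging ensemble with a Random Forest as the base estimator in Python. A random forest already bags decision trees internally, so this adds a second layer of bootstrap sampling on top of it.
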
In this code, we first generate a synthetic classification dataset using the make_classification function from scikit-learn. Then, we split the dataset into training and testing sets.

Next, we create an instance of the Random Forest classifier as the base estimator. We then initialize the BaggingClassifier, passing the base estimator, the number of estimators (10 in this example), and the random_state parameter for reproducibility.

We train the bagging ensemble by calling the fit method on the bagging object, using the training data.

After training, we make predictions on the test set using the predict method and calculate the accuracy of the bagging ensemble using the score method.

Feel free to modify the code as per your specific requirements or dataset.    

In [None]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier

# Generate a synthetic dataset for classification
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a base random forest estimator
base_estimator = RandomForestClassifier(n_estimators=100, random_state=42)

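# Create the bagging classifier. The base estimator is passed positionally because the
# keyword was renamed from base_estimator to estimator in scikit-learn 1.2.
bagging = BaggingClassifier(base_estimator, n_estimators=10, random_state=42)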

# Train the bagging ensemble
bagging.fit(X_train, y_train)

# Make predictions on the test set
y_pred = bagging.predict(X_test)

# Evaluate the performance of the bagging ensemble
accuracy = bagging.score(X_test, y_test)
print("Accuracy:", accuracy)

### Boosting Ensemble

Boosting is an ensemble learning technique that combines multiple weak learners (models that perform only slightly better than random guessing) to create a strong learner. Here is an example of a boosting ensemble using the scikit-learn library in Python.

In this example, we use the AdaBoostClassifier from the sklearn.ensemble module to create a boosting ensemble model. A base classifier, in this case, DecisionTreeClassifier with max_depth=1, is defined as the weak learner for the boosting ensemble.

The n_estimators parameter is set to 50, indicating that the ensemble will consist of 50 weak learners. Each weak learner is trained sequentially, with each subsequent learner focusing more on the samples that were misclassified by the previous learners.

The boosting ensemble is trained using the fit() function on the training data. Predictions are made on the test set using the predict() method of the ensemble model. Finally, the accuracy of the ensemble predictions is evaluated using the accuracy_score() function from sklearn.metrics.

Note that AdaBoost accepts other base classifiers, and that related boosting methods such as gradient boosting (e.g., GradientBoostingClassifier) or XGBoost may perform better on some problems.

This is a basic example to demonstrate the boosting ensemble technique, and you can further explore and customize the approach based on your specific problem and data.

In [None]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the base classifier
base_classifier = DecisionTreeClassifier(max_depth=1)

# Create the boosting ensemble
ensemble = AdaBoostClassifier(base_classifier, n_estimators=50, random_state=42)

# Fit the boosting ensemble model
ensemble.fit(X_train, y_train)

# Make predictions on the test set
y_pred = ensemble.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Ensemble Accuracy:", accuracy)
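
The sequential reweighting described in this section can be sketched by hand for the binary case. This is a simplified version of discrete AdaBoost written for illustration; it is not the code AdaBoostClassifier runs internally. It assumes labels encoded as -1 and +1, and the names weights, stumps, alphas, and manual_accuracy are made up for this sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
y = 2 * y - 1  # recode labels from {0, 1} to {-1, +1}
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

n_rounds = 20
weights = np.full(len(X_train), 1.0 / len(X_train))  # start with uniform weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Fit a depth-1 tree (a stump) on the currently weighted training data
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_train, y_train, sample_weight=weights)
    pred = stump.predict(X_train)

    # Weighted error rate (weights sum to 1), clipped to keep the log finite
    err = np.clip(np.sum(weights * (pred != y_train)), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)  # this stump's say in the final vote

    # Misclassified rows (y * pred == -1) gain weight, correct rows lose weight
    weights *= np.exp(-alpha * y_train * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted sum of stump predictions
scores = sum(a * s.predict(X_test) for a, s in zip(alphas, stumps))
manual_pred = np.where(scores >= 0, 1, -1)
manual_accuracy = np.mean(manual_pred == y_test)
print("Manual AdaBoost accuracy:", manual_accuracy)
```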


### Boosting ensemble code with SVM as base estimator

This example builds a boosting ensemble with an SVM (Support Vector Machine) as the base estimator in Python.

In this code, we first generate a synthetic classification dataset using the make_classification function from scikit-learn. Then, we split the dataset into training and testing sets.

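Next, we create an instance of the SVM classifier as the base estimator. We set probability=True to enable probability estimates. These are needed by the SAMME.R variant of AdaBoost, which was the default in older scikit-learn releases; the SAMME variant only needs class predictions. AdaBoost also requires the base estimator to accept sample weights in fit, which SVC does.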

We then initialize the AdaBoostClassifier, passing the base estimator, the number of estimators (10 in this example), and the random_state parameter for reproducibility.

We train the boosting ensemble by calling the fit method on the boosting object, using the training data.

After training, we make predictions on the test set using the predict method and calculate the accuracy of the boosting ensemble using the score method.

Feel free to modify the code as per your specific requirements or dataset.

In [None]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

# Generate a synthetic dataset for classification
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a base SVM estimator
base_estimator = SVC(probability=True, random_state=42)

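# Create the boosting classifier. The base estimator is passed positionally because the
# keyword was renamed from base_estimator to estimator in scikit-learn 1.2.
boosting = AdaBoostClassifier(base_estimator, n_estimators=10, random_state=42)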

# Train the boosting ensemble
boosting.fit(X_train, y_train)

# Make predictions on the test set
y_pred = boosting.predict(X_test)

# Evaluate the performance of the boosting ensemble
accuracy = boosting.score(X_test, y_test)
print("Accuracy:", accuracy)
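
The Boosting Ensemble section names gradient boosting as a related method. As a brief sketch on the same synthetic dataset, GradientBoostingClassifier can be fit and its test accuracy tracked as trees are added. The hyperparameter values are scikit-learn's usual defaults written out explicitly, and the names gb and staged_accuracy are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each new tree is fit to the errors left by the trees before it
gb = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
gb.fit(X_train, y_train)

# staged_predict yields test predictions after 1, 2, ..., n_estimators trees
staged_accuracy = [
    accuracy_score(y_test, stage_pred) for stage_pred in gb.staged_predict(X_test)
]
print("Accuracy after 1 tree:", staged_accuracy[0])
print("Accuracy after 100 trees:", staged_accuracy[-1])
```

Plotting staged_accuracy against the number of trees is one way to judge whether more estimators are still helping.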