# Theoretical Questions


1. Can we use Bagging for regression problems?

  -> Yes. Bagging applies to regression as well as classification. It trains several regressors on different bootstrap samples of the data and averages their predictions, which improves stability and often accuracy. For example, a Bagging Regressor can predict housing prices by averaging the outputs of several decision trees, each trained on a different bootstrap sample of the dataset.

2. What is the difference between multiple model training and single model training?

  -> Multiple model training fits several models independently and combines their predictions, while single model training optimizes one model and uses its output directly. For instance, in ensemble methods like Bagging, multiple decision trees are trained separately and their predictions are averaged, whereas a single decision tree is trained once on the entire dataset with no aggregation step.

3. Explain the concept of feature randomness in Random Forest.

  -> Feature randomness in Random Forest means that each tree considers only a random subset of the features when searching for the best split at each node, rather than all features. This reduces correlation among trees and increases the diversity of the ensemble. For example, if a dataset has 10 features, each split might be chosen from only 3 randomly selected features, so different trees end up relying on different features and the combined prediction is more robust.
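
In scikit-learn, this randomness is controlled by the `max_features` parameter, which caps how many candidate features are examined at each split. The following is a minimal sketch on a synthetic dataset (the dataset and the value `max_features=3` are illustrative assumptions, not a recommendation):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic dataset with 10 features (illustrative only)
X, y = make_classification(n_samples=200, n_features=10, n_informative=5, random_state=0)

# Each split may only choose among 3 randomly drawn candidate features
forest = RandomForestClassifier(n_estimators=25, max_features=3, random_state=0)
forest.fit(X, y)

first_tree = forest.estimators_[0]
n_features_seen = first_tree.n_features_in_       # the tree still receives all 10 columns
max_features_per_split = first_tree.max_features_  # but evaluates only 3 per split
print(n_features_seen, max_features_per_split)
```

Every tree still has access to all columns; the restriction applies per split, which is what decorrelates the trees.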

4. What is OOB (Out-of-Bag) Score?

  -> The OOB Score is an internal validation method used in Random Forests to estimate the model's performance. Each tree is trained on a bootstrap sample, which leaves out roughly 37% of the training points; each point is then evaluated using only the trees that did not see it during training. This gives a reasonable estimate of generalization performance without needing a separate validation set.

5. How can you measure the importance of features in a Random Forest model?

  -> Feature importance in a Random Forest model can be measured using two main methods: Mean Decrease Impurity (MDI) and Mean Decrease Accuracy (MDA, also called permutation importance). MDI sums the decrease in node impurity (e.g., Gini impurity) contributed by all splits on a feature, averaged over the trees, while MDA measures the drop in model accuracy when the feature's values are randomly permuted. Features with higher importance scores are more influential in making predictions.
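
As a rough illustration of the two measures, the sketch below reads MDI from `feature_importances_` and computes a permutation-based score with `sklearn.inspection.permutation_importance`. The dataset, split, and `n_repeats=5` are assumptions chosen to keep the example short:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)

# MDI: impurity-based importance, computed from the training process itself
mdi = model.feature_importances_

# MDA: mean drop in held-out accuracy when one feature column is shuffled
perm = permutation_importance(model, X_te, y_te, n_repeats=5, random_state=42)
mda = perm.importances_mean

top_mdi = data.feature_names[mdi.argmax()]
top_mda = data.feature_names[mda.argmax()]
print(f"Top feature by MDI: {top_mdi}; by permutation: {top_mda}")
```

The two rankings need not agree: MDI is computed on training data, while the permutation score here is measured on held-out data.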

6. Explain the working principle of a Bagging Classifier.

  -> A Bagging Classifier works by creating multiple bootstrap samples from the training dataset, training a separate classifier (e.g., decision tree) on each sample, and then aggregating their predictions through majority voting for classification tasks. This process reduces variance and improves overall model accuracy. For example, if three classifiers predict classes A, A, and B, the final prediction will be A.
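
The voting step from the example above can be sketched in a few lines. The helper `majority_vote` is a hypothetical name introduced here for illustration; it is not part of any library:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label among the base classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]

final_prediction = majority_vote(["A", "A", "B"])
print(final_prediction)
```

Note that scikit-learn's `BaggingClassifier` averages predicted class probabilities when the base estimator provides `predict_proba`, and falls back to voting otherwise, so hard voting as shown here is a simplification.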

7. How do you evaluate a Bagging Classifier’s performance?

  -> The performance of a Bagging Classifier can be evaluated on held-out data using metrics such as accuracy, precision, recall, and F1-score. Additionally, the Out-of-Bag (OOB) score can serve as a validation estimate of the model's generalization performance without setting aside a separate validation set.

8. How does a Bagging Regressor work?

  -> A Bagging Regressor operates similarly to a Bagging Classifier but focuses on regression tasks. It generates multiple bootstrap samples, trains a regressor (e.g., decision tree) on each sample, and averages the predictions from all regressors to produce the final output. This helps to reduce overfitting and improve prediction accuracy. For instance, house prices can be predicted by averaging the outputs of several regression trees trained on different bootstrap samples of the data.
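
The three steps (resample, fit, average) can be written out by hand. This is a minimal sketch on synthetic data, intended only to show the mechanics; in practice `BaggingRegressor` handles all of this, and the dataset and the choice of 20 trees are assumptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(20):
    # Bootstrap sample: draw len(X) row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))

# One row of predictions per tree, then average across trees
per_tree = np.stack([tree.predict(X[:5]) for tree in trees])
bagged_prediction = per_tree.mean(axis=0)
print(bagged_prediction)
```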

9. What is the main advantage of ensemble techniques?

  -> The main advantage of ensemble techniques is their ability to improve predictive performance by combining multiple models, which reduces the risk of overfitting and increases robustness. For example, Random Forests, an ensemble of decision trees, often outperform individual trees by leveraging the diversity of multiple models to achieve better accuracy.

10. What is the main challenge of ensemble methods?

  -> The main challenge of ensemble methods is their increased computational complexity and resource requirements, as they involve training multiple models. This can lead to longer training times and higher memory usage, especially with large datasets or complex models. For instance, training a Random Forest with hundreds of trees can be computationally intensive compared to a single decision tree.

11. Explain the key idea behind ensemble techniques.

  -> The key idea behind ensemble techniques is to combine the predictions of multiple models to improve overall performance. By aggregating diverse models, ensembles can reduce errors and enhance generalization. For example, in Bagging, multiple decision trees are trained on different bootstrap samples of the data, and their predictions are averaged (or voted on) to create a more accurate final prediction.

12. What is a Random Forest Classifier?

  -> A Random Forest Classifier is an ensemble learning method that constructs a multitude of decision trees during training and outputs the mode of their predictions for classification tasks. It uses bootstrap sampling and feature randomness to create diverse trees, which helps improve accuracy and reduce overfitting. For example, it can classify whether an email is spam or not based on various features like keywords and sender information.

13. What are the main types of ensemble techniques?

  -> The main types of ensemble techniques include:
    Bagging: Combines predictions from multiple models trained on different bootstrap samples (e.g., Random Forest).
    Boosting: Sequentially trains models, where each new model focuses on correcting errors made by previous ones (e.g., AdaBoost).
    Stacking: Trains multiple models and then uses another model to combine their predictions (e.g., using logistic regression to combine outputs from decision trees and SVMs).

14. What is ensemble learning in machine learning?

  -> Ensemble learning is a machine learning paradigm that combines multiple models to improve overall performance and robustness. By aggregating the predictions of various models, ensemble methods can reduce variance or bias and thereby improve accuracy. For example, combining decision trees and logistic regression can yield better results than using either model alone.

15. When should we avoid using ensemble methods?

  -> Ensemble methods are best avoided when the dataset is very small, when a single model is already accurate and stable, or when interpretability and low computational cost matter more than a small gain in accuracy. In such cases, the added complexity of an ensemble may not provide significant benefits. For instance, a complex ensemble fitted to a small dataset may overfit rather than generalize better.

16. How does Bagging help in reducing overfitting?

  -> Bagging helps reduce overfitting by averaging the predictions of multiple models trained on different subsets of the data. This averaging process smooths out the predictions and mitigates the impact of noise and outliers, leading to a more generalized model. For example, in a Bagging Regressor, the predictions from several decision trees are averaged, which reduces the variance associated with individual trees and improves overall accuracy.
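
The variance-reduction effect of averaging can be seen in a small simulation. This sketch makes a strong simplifying assumption: each "model" is an unbiased estimate with independent unit-variance noise. Real bagged trees are correlated, so the reduction in practice is smaller than shown here:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_models = 10_000, 25

# Idealized setting: every model's error is independent noise with variance 1
single_model = rng.normal(loc=0.0, scale=1.0, size=n_trials)
ensemble = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_models)).mean(axis=1)

var_single = single_model.var()
var_ensemble = ensemble.var()
print(f"single model: {var_single:.3f}, average of {n_models}: {var_ensemble:.3f}")
```

Under the independence assumption, the variance of the average is close to 1/25 of the single-model variance.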

17. Why is Random Forest better than a single Decision Tree?

  -> Random Forest usually generalizes better than a single Decision Tree because it reduces overfitting by aggregating the predictions of multiple decorrelated trees. While a single tree can be highly sensitive to noise and outliers, a Random Forest averages the results from many trees, leading to more stable and reliable predictions. For example, in predicting customer churn, a Random Forest can provide more accurate results by considering various decision paths from multiple trees.

18. What is the role of bootstrap sampling in Bagging?

  -> Bootstrap sampling in Bagging involves creating multiple training sets by sampling from the original data with replacement. Each bootstrap sample is used to train a separate model, and the predictions from these models are aggregated to form the final prediction. This process helps to reduce variance and improve model stability. For instance, if a dataset has 100 samples, each bootstrap sample also contains 100 draws, but only about 63 of them are distinct on average: some samples appear multiple times and others not at all.
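
The share of distinct points in a bootstrap sample can be checked numerically. The sketch below repeats the 100-sample draw many times and compares the average number of distinct indices with the closed-form expectation n(1 - (1 - 1/n)^n); the repeat count of 1000 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_repeats = 100, 1000

unique_counts = []
for _ in range(n_repeats):
    # One bootstrap sample: n_samples indices drawn with replacement
    idx = rng.integers(0, n_samples, size=n_samples)
    unique_counts.append(np.unique(idx).size)

mean_unique = float(np.mean(unique_counts))
expected = n_samples * (1 - (1 - 1 / n_samples) ** n_samples)
print(f"simulated: {mean_unique:.1f}, expected: {expected:.1f}")
```

The remaining points, roughly 37 out of 100, are the out-of-bag samples for that model.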

19. What are some real-world applications of ensemble techniques?

  -> Real-world applications of ensemble techniques include:
    Finance: Credit scoring models use ensembles to predict loan defaults.
    Healthcare: Disease diagnosis models combine predictions from various algorithms to improve accuracy.
    Marketing: Customer segmentation models use ensembles to identify target audiences based on purchasing behavior.

20. What is the difference between Bagging and Boosting?

  -> The difference between Bagging and Boosting lies in their approach to model training. Bagging trains multiple models independently on different bootstrap samples of the data and combines their predictions, mainly to reduce variance. Boosting trains models sequentially, where each new model focuses on correcting the errors of the previous ones, mainly to reduce bias. For example, Random Forest is a bagging-based method, while AdaBoost is a boosting method that improves weak learners by emphasizing misclassified instances in subsequent iterations.
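
A side-by-side sketch of the two approaches in scikit-learn follows. The dataset, split, and estimator counts are illustrative assumptions, and a single train/test split is not enough to conclude that one method is better than the other:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Bagging: 50 full-depth trees, each fit independently on its own bootstrap sample
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
bagging.fit(X_train, y_train)

# Boosting: 50 weak learners fit one after another, reweighting misclassified samples
# (AdaBoostClassifier's default base learner is a depth-1 decision tree)
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)
boosting.fit(X_train, y_train)

bagging_acc = bagging.score(X_test, y_test)
boosting_acc = boosting.score(X_test, y_test)
print(f"Bagging accuracy: {bagging_acc:.4f}")
print(f"AdaBoost accuracy: {boosting_acc:.4f}")
```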

# Practical Questions

In [None]:
#1. Train a Bagging Classifier using Decision Trees on a sample dataset and print model accuracy
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Bagging Classifier with Decision Trees

bagging_classifier = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50, random_state=42)

# Train the model
bagging_classifier.fit(X_train, y_train)

# Make predictions
y_pred = bagging_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Bagging Classifier Accuracy: {accuracy:.4f}")

In [None]:
#2. Train a Bagging Regressor using Decision Trees and evaluate using Mean Squared Error (MSE)
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_regression

# Create a sample regression dataset
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Bagging Regressor with Decision Trees
bagging_regressor = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=50, random_state=42)

# Train the model
bagging_regressor.fit(X_train, y_train)

# Make predictions
y_pred = bagging_regressor.predict(X_test)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Bagging Regressor MSE: {mse:.4f}")

In [None]:
#3. Train a Random Forest Classifier on the Breast Cancer dataset and print feature importance scores
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Load the breast cancer dataset
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
rf_classifier.fit(X_train, y_train)

# Get feature importance scores
feature_importances = rf_classifier.feature_importances_

# Print feature importance scores
for i, feature in enumerate(breast_cancer.feature_names):
    print(f"{feature}: {feature_importances[i]:.4f}")

In [None]:
#4. Train a Random Forest Regressor and compare its performance with a single Decision Tree
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Create a sample regression dataset
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a Decision Tree Regressor
dt_regressor = DecisionTreeRegressor(random_state=42)
dt_regressor.fit(X_train, y_train)
y_pred_dt = dt_regressor.predict(X_test)
mse_dt = mean_squared_error(y_test, y_pred_dt)

# Train a Random Forest Regressor
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
rf_regressor.fit(X_train, y_train)
y_pred_rf = rf_regressor.predict(X_test)
mse_rf = mean_squared_error(y_test, y_pred_rf)

# Compare MSE
print(f"Decision Tree Regressor MSE: {mse_dt:.4f}")
print(f"Random Forest Regressor MSE: {mse_rf:.4f}")

In [None]:
#5. Compute the Out-of-Bag (OOB) Score for a Random Forest Classifier
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Use a classification dataset, since the task asks for a classifier
# (the X_train/y_train left over from the previous cell hold regression data)
X_clf, y_clf = load_breast_cancer(return_X_y=True)

# Initialize the Random Forest Classifier with OOB scoring enabled
# (OOB scoring requires bootstrap=True, which is the default)
rf_classifier_oob = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)

# Train the model; each sample is scored only by the trees that did not see it
rf_classifier_oob.fit(X_clf, y_clf)

# Print the OOB score (accuracy on the out-of-bag samples)
print(f"OOB Score: {rf_classifier_oob.oob_score_:.4f}")

In [None]:
#6. Train a Bagging Classifier using SVM as a base estimator and print accuracy
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_breast_cancer

# Load the breast cancer dataset
breast_cancer = load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Bagging Classifier with SVM
bagging_svm_classifier = BaggingClassifier(estimator=SVC(probability=True), n_estimators=50, random_state=42)

# Train the model
bagging_svm_classifier.fit(X_train, y_train)

# Make predictions
y_pred_svm = bagging_svm_classifier.predict(X_test)

# Calculate accuracy
accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f"Bagging Classifier with SVM Accuracy: {accuracy_svm:.4f}")

In [None]:
#7. Train a Random Forest Classifier with different numbers of trees and compare accuracy
# Initialize lists to store accuracy results
n_estimators_list = [10, 50, 100, 200]
accuracies = []

for n in n_estimators_list:
    rf_classifier = RandomForestClassifier(n_estimators=n, random_state=42)
    rf_classifier.fit(X_train, y_train)
    y_pred = rf_classifier.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"Random Forest with {n} trees Accuracy: {accuracy:.4f}")

In [None]:
#8. Train a Bagging Classifier using Logistic Regression as a base estimator and print AUC score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

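# Initialize the Bagging Classifier with Logistic Regression
# (max_iter is raised because the features are unscaled and the default limit may not converge)
bagging_lr_classifier = BaggingClassifier(estimator=LogisticRegression(max_iter=10000), n_estimators=50, random_state=42)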

# Train the model
bagging_lr_classifier.fit(X_train, y_train)

# Get predicted probabilities for the positive class (AUC needs scores, not hard labels)
y_pred_lr = bagging_lr_classifier.predict_proba(X_test)[:, 1]

# Calculate AUC score
auc_score = roc_auc_score(y_test, y_pred_lr)
print(f"Bagging Classifier with Logistic Regression AUC Score: {auc_score:.4f}")

In [None]:
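#9. Train a Random Forest Regressor and analyze feature importance scores
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Use a regression dataset with named features (the diabetes dataset is swapped in here,
# since X_train/y_train currently hold classification labels); separate variable names
# keep the classification split intact for the cells that follow
diabetes = load_diabetes()
X_diab, y_diab = diabetes.data, diabetes.target

# Initialize the Random Forest Regressor
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)

# Train the model
rf_regressor.fit(X_diab, y_diab)

# Get feature importance scores
feature_importances_regressor = rf_regressor.feature_importances_

# Print feature importance scores, most important first
for idx in feature_importances_regressor.argsort()[::-1]:
    print(f"{diabetes.feature_names[idx]}: {feature_importances_regressor[idx]:.4f}")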

In [None]:
#10. Train an ensemble model using both Bagging and Random Forest and compare accuracy
# Initialize the Bagging Classifier with Decision Trees
bagging_classifier = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50, random_state=42)

# Train the Bagging Classifier
bagging_classifier.fit(X_train, y_train)

# Make predictions
y_pred_bagging = bagging_classifier.predict(X_test)

# Calculate accuracy
accuracy_bagging = accuracy_score(y_test, y_pred_bagging)

# Train a Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)
y_pred_rf = rf_classifier.predict(X_test)

# Calculate accuracy for Random Forest
accuracy_rf = accuracy_score(y_test, y_pred_rf)

# Compare accuracies
print(f"Bagging Classifier Accuracy: {accuracy_bagging:.4f}")
print(f"Random Forest Classifier Accuracy: {accuracy_rf:.4f}")

In [None]:
#11. Train a Random Forest Classifier and tune hyperparameters using GridSearchCV
from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

# Initialize the Random Forest Classifier
rf_classifier = RandomForestClassifier(random_state=42)

# Initialize GridSearchCV
grid_search = GridSearchCV(estimator=rf_classifier, param_grid=param_grid, cv=5, scoring='accuracy')

# Train the model
grid_search.fit(X_train, y_train)

# Print the best parameters and accuracy
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Accuracy: {grid_search.best_score_:.4f}")

In [None]:
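#12. Train a Bagging Regressor with different numbers of base estimators and compare performance
# Recreate a regression dataset: X_train/y_train currently hold classification data
X_reg, y_reg = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size=0.3, random_state=42)

# Initialize lists to store MSE results
n_estimators_list = [10, 50, 100, 200]
mse_results = []

for n in n_estimators_list:
    bagging_regressor = BaggingRegressor(DecisionTreeRegressor(), n_estimators=n, random_state=42)
    bagging_regressor.fit(X_train_reg, y_train_reg)
    y_pred = bagging_regressor.predict(X_test_reg)
    mse = mean_squared_error(y_test_reg, y_pred)
    mse_results.append(mse)
    print(f"Bagging Regressor with {n} estimators MSE: {mse:.4f}")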

In [None]:
#13. Train a Random Forest Classifier and analyze misclassified samples
# Train the Random Forest Classifier
rf_classifier.fit(X_train, y_train)

# Make predictions
y_pred_rf = rf_classifier.predict(X_test)

# Identify misclassified samples
misclassified_samples = X_test[y_pred_rf != y_test]
print(f"Number of Misclassified Samples: {len(misclassified_samples)}")



In [None]:
#14. Train a Bagging Classifier and compare its performance with a single Decision Tree Classifier
# Train a single Decision Tree Classifier
single_tree_classifier = DecisionTreeClassifier(random_state=42)
single_tree_classifier.fit(X_train, y_train)
y_pred_single_tree = single_tree_classifier.predict(X_test)
accuracy_single_tree = accuracy_score(y_test, y_pred_single_tree)

# Compare with Bagging Classifier
print(f"Single Decision Tree Classifier Accuracy: {accuracy_single_tree:.4f}")
print(f"Bagging Classifier Accuracy: {accuracy_bagging:.4f}")

In [None]:
#15. Train a Random Forest Classifier and visualize the confusion matrix
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Train the Random Forest Classifier
rf_classifier.fit(X_train, y_train)

# Make predictions
y_pred_rf = rf_classifier.predict(X_test)

# Compute confusion matrix
cm = confusion_matrix(y_test, y_pred_rf)

# Visualize confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Malignant', 'Benign'], yticklabels=['Malignant', 'Benign'])
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('Confusion Matrix for Random Forest Classifier')
plt.show()

In [None]:
#16. Train a Stacking Classifier using Decision Trees, SVM, and Logistic Regression, and compare accuracy
from sklearn.ensemble import StackingClassifier

# Define the base models
base_models = [
    ('dt', DecisionTreeClassifier()),
    ('svm', SVC(probability=True)),
    ('lr', LogisticRegression())
]

# Initialize the Stacking Classifier
stacking_classifier = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())

# Train the Stacking Classifier
stacking_classifier.fit(X_train, y_train)

# Make predictions
y_pred_stacking = stacking_classifier.predict(X_test)

# Calculate accuracy
accuracy_stacking = accuracy_score(y_test, y_pred_stacking)
print(f"Stacking Classifier Accuracy: {accuracy_stacking:.4f}")

In [None]:
#17. Train a Random Forest Classifier and plot the Precision-Recall curve
from sklearn.metrics import precision_recall_curve

# Train the Random Forest Classifier
rf_classifier.fit(X_train, y_train)

# Make predictions
y_scores = rf_classifier.predict_proba(X_test)[:, 1]

# Calculate precision and recall
precision, recall, _ = precision_recall_curve(y_test, y_scores)

# Plot Precision-Recall curve
plt.figure(figsize=(8, 6))
plt.plot(recall, precision, marker='.')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve for Random Forest Classifier')
plt.grid()
plt.show()

In [None]:
#18. Train a Random Forest Classifier and print the top 5 most important features
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import pandas as pd

# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target
feature_names = data.feature_names

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train the Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)

# Get feature importances
importances = rf_classifier.feature_importances_

# Create a DataFrame for better visualization
feature_importance_df = pd.DataFrame({
    'Feature': feature_names,
    'Importance': importances
}).sort_values('Importance', ascending=False)

# Print the top 5 most important features
top_5_features = feature_importance_df.head(5)
print("Top 5 Most Important Features:")
print(top_5_features)

In [None]:
#19. Train a Bagging Classifier and evaluate performance using Precision, Recall, and F1-score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, recall_score, f1_score

# Initialize the Bagging Classifier with Decision Trees
bagging_classifier = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)

# Train the model
bagging_classifier.fit(X_train, y_train)

# Make predictions
y_pred = bagging_classifier.predict(X_test)

# Calculate Precision, Recall, and F1-score
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Precision: {precision:.4f}, Recall: {recall:.4f}, F1-score: {f1:.4f}")

In [None]:
#20. Train a Random Forest Classifier and analyze the effect of max_depth on accuracy
max_depths = [None, 5, 10, 15, 20]
accuracies = []

for max_depth in max_depths:
    rf_classifier = RandomForestClassifier(n_estimators=100, max_depth=max_depth, random_state=42)
    rf_classifier.fit(X_train, y_train)
    accuracy = rf_classifier.score(X_test, y_test)
    accuracies.append(accuracy)
    print(f"Random Forest with max_depth={max_depth} Accuracy: {accuracy:.4f}")

In [None]:
#21. Train a Bagging Regressor using different base estimators (DecisionTree and KNeighbors) and compare performance
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Create a sample regression dataset
from sklearn.datasets import make_regression
X_reg, y_reg = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)

# Split the dataset into training and testing sets
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size=0.3, random_state=42)

# Initialize Bagging Regressors
bagging_dt = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=42)
bagging_knn = BaggingRegressor(KNeighborsRegressor(), n_estimators=50, random_state=42)

# Train and evaluate Decision Tree Regressor
bagging_dt.fit(X_train_reg, y_train_reg)
y_pred_dt = bagging_dt.predict(X_test_reg)
mse_dt = mean_squared_error(y_test_reg, y_pred_dt)

# Train and evaluate KNeighbors Regressor
bagging_knn.fit(X_train_reg, y_train_reg)
y_pred_knn = bagging_knn.predict(X_test_reg)
mse_knn = mean_squared_error(y_test_reg, y_pred_knn)

print(f"Bagging Regressor (Decision Tree) MSE: {mse_dt:.4f}")
print(f"Bagging Regressor (KNeighbors) MSE: {mse_knn:.4f}")

In [None]:
#22. Train a Random Forest Classifier and evaluate its performance using ROC-AUC Score
from sklearn.metrics import roc_auc_score

# Train the Random Forest Classifier
rf_classifier.fit(X_train, y_train)

# Get predicted probabilities
y_scores = rf_classifier.predict_proba(X_test)[:, 1]

# Calculate ROC-AUC Score
roc_auc = roc_auc_score(y_test, y_scores)
print(f"Random Forest ROC-AUC Score: {roc_auc:.4f}")

In [None]:
#23. Train a Bagging Classifier and evaluate its performance using cross-validation
from sklearn.model_selection import cross_val_score

# Initialize the Bagging Classifier
bagging_classifier = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)

# Perform cross-validation
cv_scores = cross_val_score(bagging_classifier, X, y, cv=5)

print(f"Cross-Validation Scores: {cv_scores}")
print(f"Mean Cross-Validation Score: {cv_scores.mean():.4f}")

In [None]:
#24. Train a Random Forest Classifier and plot the Precision-Recall curve
from sklearn.metrics import precision_recall_curve

# Train the Random Forest Classifier
rf_classifier.fit(X_train, y_train)

# Get predicted probabilities
y_scores = rf_classifier.predict_proba(X_test)[:, 1]

# Calculate precision and recall
precision, recall, _ = precision_recall_curve(y_test, y_scores)

# Plot Precision-Recall curve
plt.figure(figsize=(8, 6))
plt.plot(recall, precision, marker='.')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve for Random Forest Classifier')
plt.grid()
plt.show()

In [None]:
#25. Train a Stacking Classifier with Random Forest and Logistic Regression and compare accuracy
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression

# Define the base models
base_models = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('lr', LogisticRegression())
]

# Initialize the Stacking Classifier
stacking_classifier = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())

# Train the Stacking Classifier
stacking_classifier.fit(X_train, y_train)

# Make predictions
y_pred_stacking = stacking_classifier.predict(X_test)

# Calculate accuracy
accuracy_stacking = accuracy_score(y_test, y_pred_stacking)
print(f"Stacking Classifier Accuracy: {accuracy_stacking:.4f}")

In [None]:
#26. Train a Bagging Regressor with different levels of bootstrap samples and compare performance
# Initialize lists to store MSE results
bootstrap_samples = [0.5, 0.7, 0.9]
mse_bootstrap_results = []

for bootstrap in bootstrap_samples:
    bagging_regressor = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, bootstrap=True, max_samples=bootstrap, random_state=42)
    bagging_regressor.fit(X_train_reg, y_train_reg)
    y_pred = bagging_regressor.predict(X_test_reg)
    mse = mean_squared_error(y_test_reg, y_pred)
    mse_bootstrap_results.append(mse)
    print(f"Bagging Regressor with {bootstrap*100:.0f}% bootstrap samples MSE: {mse:.4f}")