Theoretical Questions

Question 1: Can we use Bagging for regression problems?
Answer 1:
Yes, Bagging works for regression by averaging predictions from multiple models.

Question 2: What is the difference between multiple model training and single model training?
Answer 2:
Single-model training fits one estimator. Multiple-model (ensemble) training fits several estimators and combines their predictions, which often improves performance.

Question 3: Explain the concept of feature randomness in Random Forest.
Answer 3:
Each tree considers a random subset of features at each split to reduce correlation.
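
As a rough illustration of the answer above, the sketch below fits a scikit-learn random forest with `max_features="sqrt"` and reads back how many candidate features a fitted tree examines at each split. The dataset, the number of trees, and the variable names are choices made for this example only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)  # 30 numeric features

# max_features="sqrt" asks every split to consider about sqrt(n_features) candidates
rf = RandomForestClassifier(n_estimators=20, max_features="sqrt", random_state=0)
rf.fit(X, y)

n_features = X.shape[1]
# Each fitted tree stores the resolved integer in max_features_
per_split = rf.estimators_[0].max_features_
print(f"{per_split} of {n_features} features are considered at each split")
```

Setting `max_features=None` would let every split see all features, which removes this source of randomness and leaves only the bootstrap sampling.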

Question 4: What is OOB (Out-of-Bag) Score?
Answer 4:
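An estimate of generalization performance in which each sample is scored only by the models whose bootstrap samples did not include it, so no separate validation set is needed.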
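
To see why out-of-bag data exists at all, the small NumPy simulation below draws one bootstrap sample and counts the rows that were never picked. The row count and the seed are arbitrary values chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 10_000

# A bootstrap sample draws n_rows indices with replacement
sample = rng.integers(0, n_rows, size=n_rows)

# Rows that were never drawn are "out-of-bag" for the model trained on this sample
oob_mask = np.ones(n_rows, dtype=bool)
oob_mask[sample] = False
oob_fraction = oob_mask.mean()

print(f"Out-of-bag fraction: {oob_fraction:.3f}")
```

For large datasets the fraction approaches 1/e (about 0.368), so each model leaves out roughly a third of the rows, and those rows can be used to score it.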

Question 5: How can you measure the importance of features in a Random Forest model?
Answer 5:
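By measuring how much each feature reduces impurity across the trees (impurity-based importance), or how much performance drops when that feature's values are shuffled (permutation importance).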
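
The two approaches can be compared side by side. The sketch below is one possible way to do it with scikit-learn, using `feature_importances_` for the impurity-based scores and `sklearn.inspection.permutation_importance` for the shuffle-based scores. The dataset, split, and `n_repeats` value are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0
)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Impurity-based importance: accumulated from the training data while fitting
impurity = model.feature_importances_

# Permutation importance: drop in held-out score when one column is shuffled
perm = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=0)

# Show the five features ranked highest by impurity, with both scores
for i in impurity.argsort()[::-1][:5]:
    print(f"{data.feature_names[i]}: impurity={impurity[i]:.3f}, "
          f"permutation={perm.importances_mean[i]:.3f}")
```

The two rankings need not agree, especially when features are strongly correlated.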

Question 6: Explain the working principle of a Bagging Classifier.
Answer 6:
Trains multiple copies of a base classifier on bootstrap samples of the training data and combines their predictions by majority vote.
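
The principle can be sketched by hand in a few lines. The code below is a simplified illustration, not how `BaggingClassifier` is implemented internally. It trains decision trees on bootstrap samples and takes a hard majority vote. The dataset and the number of models are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
n_models = 15
votes = []
for _ in range(n_models):
    # Bootstrap sample: draw training rows with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    votes.append(tree.predict(X_test))

votes = np.array(votes)  # shape: (n_models, n_test_samples)

# Majority vote: the most frequent label in each column
y_pred = np.array([np.bincount(col).argmax() for col in votes.T])
accuracy = (y_pred == y_test).mean()
print("Hand-rolled bagging accuracy:", accuracy)
```

scikit-learn's `BaggingClassifier` averages predicted class probabilities when the base estimator provides `predict_proba`, and falls back to voting otherwise, so its output can differ slightly from this hard-vote sketch.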

Question 7: How do you evaluate a Bagging Classifier’s performance?
Answer 7:
Use accuracy, precision, recall, F1-score, and OOB score if enabled.

Question 8: How does a Bagging Regressor work?
Answer 8:
Trains regressors on different samples and averages their predictions.
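
A quick way to check the averaging step is to reproduce it by hand from a fitted model. The sketch below assumes scikit-learn's `BaggingRegressor` and its fitted attributes `estimators_` and `estimators_features_`. The dataset and parameter values are illustrative.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=10, random_state=0).fit(X, y)

# Average the individual regressors by hand, giving each one
# the feature columns it was fitted on
manual = np.mean(
    [est.predict(X[:, feats])
     for est, feats in zip(bag.estimators_, bag.estimators_features_)],
    axis=0,
)
matches = np.allclose(manual, bag.predict(X))
print("Manual average matches bag.predict:", matches)
```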

Question 9: What is the main advantage of ensemble techniques?
Answer 9:
Higher accuracy and robustness compared to single models.

Question 10: What is the main challenge of ensemble methods?
Answer 10:
Complexity and reduced interpretability.

Question 11: Explain the key idea behind ensemble techniques.
Answer 11:
Combine multiple models to improve prediction accuracy and reduce overfitting.

Question 12: What is a Random Forest Classifier?
Answer 12:
An ensemble of decision trees using bagging and feature randomness for classification.

Question 13: What are the main types of ensemble techniques?
Answer 13:
Bagging, Boosting, and Stacking.

Question 14: What is ensemble learning in machine learning?
Answer 14:
A method that combines multiple models to make better predictions than a single model.

Question 15: When should we avoid using ensemble methods?
Answer 15:
When model interpretability is crucial or data size is too small.

Question 16: How does Bagging help in reducing overfitting?
Answer 16:
Averaging the predictions of models trained on different bootstrap samples reduces variance, so the noise that any single model fits tends to cancel out.
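
The variance argument can be illustrated with a toy simulation. The sketch below does not train any models: it treats each "model" as the true value plus independent noise, which is an idealized assumption, and compares the variance of one prediction with the variance of the average.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_models = 10_000, 25

# Each column plays the role of one model: true value 3.0 plus independent unit-variance noise
preds = 3.0 + rng.normal(0.0, 1.0, size=(n_trials, n_models))

single_var = preds[:, 0].var()           # variance of one model's prediction
averaged_var = preds.mean(axis=1).var()  # variance after averaging all models

print(f"single: {single_var:.3f}, averaged: {averaged_var:.3f}")
```

Real bagged models are trained on overlapping data, so their errors are correlated and the reduction is smaller than in this idealized case.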

Question 17: Why is Random Forest better than a single Decision Tree?
Answer 17:
It reduces overfitting and increases accuracy through ensemble learning.

Question 18: What is the role of bootstrap sampling in Bagging?
Answer 18:
It draws each training set by sampling the original data with replacement, so every model is trained on a slightly different dataset.

Question 19: What are some real-world applications of ensemble techniques?
Answer 19:
Fraud detection, spam filtering, medical diagnosis, and recommendation systems.

Question 20: What is the difference between Bagging and Boosting?
Answer 20:
Bagging builds models independently; Boosting builds them sequentially with focus on errors.
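
For a concrete comparison, the sketch below trains one model of each kind on the same split. `GradientBoostingClassifier` is used here as one representative boosting method. The dataset and hyperparameters are assumptions chosen for illustration, and the resulting scores say nothing general about which approach is better.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Bagging: full-depth trees fitted independently on bootstrap samples
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Boosting: shallow trees fitted one after another, each correcting the current ensemble's errors
boosting = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)

scores = {}
for name, clf in [("Bagging", bagging), ("Boosting", boosting)]:
    scores[name] = clf.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name} accuracy: {scores[name]:.3f}")
```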

Practical Questions

Question 21: Train a Bagging Classifier using Decision Trees on a sample dataset and print model accuracy

In [None]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

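X, y = load_iris(return_X_y=True)
# Fix random_state so the split and the reported accuracy are reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Bag 10 decision trees, each fitted on a bootstrap sample of the training set
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=42)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))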



Question 22: Train a Bagging Regressor using Decision Trees and evaluate using Mean Squared Error (MSE)

In [None]:
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
model = BaggingRegressor(DecisionTreeRegressor(), n_estimators=10)
model.fit(X_train, y_train)
print("MSE:", mean_squared_error(y_test, model.predict(X_test)))


Question 23: Train a Random Forest Classifier on the Breast Cancer dataset and print feature importance scores

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer

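# Load the dataset once and keep the feature names alongside the arrays
data = load_breast_cancer()
X, y = data.data, data.target
model = RandomForestClassifier()
model.fit(X, y)
# Impurity-based importances: one score per feature
importances = model.feature_importances_
for name, score in zip(data.feature_names, importances):
    print(f"{name}: {score:.4f}")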


Question 24: Train a Random Forest Regressor and compare its performance with a single Decision Tree

In [None]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Reuses the diabetes train/test split created in Question 22
dt = DecisionTreeRegressor().fit(X_train, y_train)
rf = RandomForestRegressor().fit(X_train, y_train)
print("DT MSE:", mean_squared_error(y_test, dt.predict(X_test)))
print("RF MSE:", mean_squared_error(y_test, rf.predict(X_test)))


Question 25: Compute the Out-of-Bag (OOB) Score for a Random Forest Classifier

In [None]:
model = RandomForestClassifier(oob_score=True)
model.fit(X, y)
print("OOB Score:", model.oob_score_)


Question 26: Train a Bagging Classifier using SVM as a base estimator and print accuracy

In [None]:
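from sklearn.svm import SVC

# Re-split the Breast Cancer data (X, y from Question 23): X_train still holds
# the diabetes regression split at this point
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Pass the estimator positionally: the base_estimator keyword was deprecated
# in scikit-learn 1.2 and later removed
model = BaggingClassifier(SVC(), n_estimators=10)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))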


Question 27: Train a Random Forest Classifier with different numbers of trees and compare accuracy

In [None]:
for n in [10, 50, 100]:
    rf = RandomForestClassifier(n_estimators=n)
    rf.fit(X_train, y_train)
    print(f"{n} trees accuracy:", accuracy_score(y_test, rf.predict(X_test)))


Question 28: Train a Bagging Classifier using Logistic Regression as a base estimator and print AUC score

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

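# AUC with probs[:, 1] needs a binary problem, so X_train/X_test must hold a two-class split
# max_iter is raised because the default may not converge on unscaled features
model = BaggingClassifier(LogisticRegression(max_iter=5000), n_estimators=10)
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
print("AUC Score:", roc_auc_score(y_test, probs))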


Question 29: Train a Random Forest Regressor and analyze feature importance scores

In [None]:
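# Fit on a regression dataset (diabetes); separate names leave the
# classification data in X, y untouched
X_reg, y_reg = load_diabetes(return_X_y=True)
rf = RandomForestRegressor()
rf.fit(X_reg, y_reg)
importances = rf.feature_importances_
print("Feature Importances:", importances)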


Question 30: Train an ensemble model using both Bagging and Random Forest and compare accuracy

In [None]:
bc = BaggingClassifier(DecisionTreeClassifier())
rf = RandomForestClassifier()
bc.fit(X_train, y_train)
rf.fit(X_train, y_train)
print("Bagging Accuracy:", accuracy_score(y_test, bc.predict(X_test)))
print("RF Accuracy:", accuracy_score(y_test, rf.predict(X_test)))


Question 31: Train a Random Forest Classifier and tune hyperparameters using GridSearchCV

In [None]:
from sklearn.model_selection import GridSearchCV

params = {'n_estimators': [10, 50, 100], 'max_depth': [None, 5, 10]}
grid = GridSearchCV(RandomForestClassifier(), params, cv=3)
grid.fit(X_train, y_train)
print("Best Params:", grid.best_params_)


Question 32: Train a Bagging Regressor with different numbers of base estimators and compare performance

In [None]:
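# Regression split with its own names, independent of the classification split
Xr, yr = load_diabetes(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, test_size=0.3, random_state=42)
for n in [5, 10, 20]:
    model = BaggingRegressor(n_estimators=n, random_state=42)
    model.fit(Xr_train, yr_train)
    print(f"{n} estimators MSE:", mean_squared_error(yr_test, model.predict(Xr_test)))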


Question 33: Train a Random Forest Classifier and analyze misclassified samples

In [None]:
model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
misclassified = X_test[y_test != y_pred]
print("Misclassified Samples:", misclassified)


Question 34: Train a Bagging Classifier and compare its performance with a single Decision Tree Classifier

In [None]:
dt = DecisionTreeClassifier().fit(X_train, y_train)
bag = BaggingClassifier(DecisionTreeClassifier()).fit(X_train, y_train)
print("DT Accuracy:", accuracy_score(y_test, dt.predict(X_test)))
print("Bagging Accuracy:", accuracy_score(y_test, bag.predict(X_test)))


Question 35: Train a Random Forest Classifier and visualize the confusion matrix

In [None]:
from sklearn.metrics import ConfusionMatrixDisplay

model = RandomForestClassifier().fit(X_train, y_train)
ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)


Question 36: Train a Stacking Classifier using Decision Trees, SVM, and Logistic Regression, and compare accuracy

In [None]:
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

estimators = [('dt', DecisionTreeClassifier()), ('svm', SVC(probability=True))]
stack = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("Stacking Accuracy:", accuracy_score(y_test, stack.predict(X_test)))


Question 37: Train a Random Forest Classifier and print the top 5 most important features

In [None]:
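import numpy as np

# Assumes X, y still hold the Breast Cancer data from Question 23
feature_names = load_breast_cancer().feature_names
model = RandomForestClassifier().fit(X, y)
importances = model.feature_importances_
top_indices = np.argsort(importances)[-5:][::-1]  # five largest scores, in descending order
for i in top_indices:
    print(f"{feature_names[i]}: {importances[i]:.4f}")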


Question 38: Train a Bagging Classifier and evaluate performance using Precision, Recall, and F1-score

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score

model = BaggingClassifier().fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Precision:", precision_score(y_test, y_pred, average='macro'))
print("Recall:", recall_score(y_test, y_pred, average='macro'))
print("F1 Score:", f1_score(y_test, y_pred, average='macro'))


Question 39: Train a Random Forest Classifier and analyze the effect of max_depth on accuracy

In [None]:
for depth in [3, 5, 10, None]:
    model = RandomForestClassifier(max_depth=depth)
    model.fit(X_train, y_train)
    print(f"Max depth {depth}: Accuracy = {accuracy_score(y_test, model.predict(X_test))}")


Question 40: Train a Bagging Regressor using different base estimators (DecisionTree and KNeighbors) and compare performance

In [None]:
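from sklearn.neighbors import KNeighborsRegressor

# Regression split with its own names, independent of the classification split
Xr, yr = load_diabetes(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, test_size=0.3, random_state=42)

# Estimators are passed positionally: the base_estimator keyword was deprecated
# in scikit-learn 1.2 and later removed
dt_model = BaggingRegressor(DecisionTreeRegressor()).fit(Xr_train, yr_train)
knn_model = BaggingRegressor(KNeighborsRegressor()).fit(Xr_train, yr_train)

print("DT Regressor MSE:", mean_squared_error(yr_test, dt_model.predict(Xr_test)))
print("KNN Regressor MSE:", mean_squared_error(yr_test, knn_model.predict(Xr_test)))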


Question 41: Train a Random Forest Classifier and evaluate its performance using ROC-AUC Score

In [None]:
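# Train the classifier here instead of relying on a model left over from an earlier cell
model = RandomForestClassifier().fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
print("ROC-AUC Score:", roc_auc_score(y_test, probs))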


Question 42: Train a Bagging Classifier and evaluate its performance using cross-validation

In [None]:
from sklearn.model_selection import cross_val_score

scores = cross_val_score(BaggingClassifier(), X, y, cv=5)
print("Cross-Val Accuracy:", scores.mean())


Question 43: Train a Random Forest Classifier and plot the Precision-Recall curve


In [None]:
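from sklearn.metrics import precision_recall_curve, PrecisionRecallDisplay

# Train the classifier here instead of relying on a model left over from an earlier cell
model = RandomForestClassifier().fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
precision, recall, _ = precision_recall_curve(y_test, probs)
PrecisionRecallDisplay(precision=precision, recall=recall).plot()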


Question 44: Train a Stacking Classifier with Random Forest and Logistic Regression and compare accuracy

In [None]:
estimators = [('rf', RandomForestClassifier()), ('lr', LogisticRegression())]
stack = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("Stacking Accuracy:", accuracy_score(y_test, stack.predict(X_test)))


Question 45: Train a Bagging Regressor with different levels of bootstrap samples and compare performance

In [None]:
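# Regression split with its own names, independent of the classification split
Xr, yr = load_diabetes(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, test_size=0.3, random_state=42)
# max_samples is the fraction of training rows drawn for each base estimator
for max_samples in [0.5, 0.7, 1.0]:
    model = BaggingRegressor(max_samples=max_samples, random_state=42)
    model.fit(Xr_train, yr_train)
    print(f"max_samples={max_samples}: MSE = {mean_squared_error(yr_test, model.predict(Xr_test))}")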
