**1. Can we use Bagging for regression problems?**

Yes. Bagging can be used for regression. It trains multiple regressors on bootstrapped samples and averages their predictions, reducing variance and improving stability.

**2. Difference between multiple model training and single model training**

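| Single Model | Multiple (Ensemble) Models |
|--------------|----------------------------|
| One model trained | Many models trained and their predictions combined |
| Variance depends entirely on that one model | Variance reduced by averaging or voting |
| More prone to overfitting (e.g. a deep decision tree) | Usually more robust |
| Often less accurate | Often more accurate, at higher computational cost |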

**3. Feature randomness in Random Forest**

Random Forest considers only a random subset of features at each split. This decorrelates the trees and keeps a few strong predictors from dominating every tree.
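As an illustration of the per-split feature sampling, here is a minimal standard-library sketch. The helper `candidate_features` and the square-root subset size are assumptions made for this example (the square root is a common choice for classification; in scikit-learn the subset size is controlled by the `max_features` parameter).

```python
import math
import random

def candidate_features(n_features, rng):
    """Pick the random feature subset considered at one split (hypothetical helper)."""
    # Assumption: subset size is floor(sqrt(n_features)), at least 1.
    k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)
# Three successive splits on a 30-feature dataset each draw their own subset.
subsets = [candidate_features(30, rng) for _ in range(3)]
for subset in subsets:
    print(subset)
```

With 30 features (as in the Breast Cancer dataset) each split looks at only 5 candidates, so a strong feature is simply unavailable at many splits.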

**4. What is OOB Score?**

Out-of-Bag (OOB) score evaluates each training row using only the trees whose bootstrap sample did not contain that row, giving a performance estimate without a separate test set.
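To see where the out-of-bag rows come from, the sketch below draws one bootstrap sample with the standard library and counts the rows that were never picked. The sample size `n = 1000` and the seed are arbitrary choices for the example. Each row is missed with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) for large n.

```python
import random

rng = random.Random(0)
n = 1000

# Bootstrap sample: draw n row indices with replacement.
in_bag = [rng.randrange(n) for _ in range(n)]

# Out-of-bag rows are the ones this sample never drew.
out_of_bag = set(range(n)) - set(in_bag)
oob_fraction = len(out_of_bag) / n

print(f"OOB fraction: {oob_fraction:.3f}")
```

So roughly a third of the training rows are available as held-out data for each tree.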

**5. How to measure feature importance in Random Forest?**

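By summing how much each feature reduces impurity (e.g. Gini for classification, MSE for regression) over all splits in all trees and normalizing the totals. scikit-learn exposes the result as `feature_importances_`.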
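The impurity reduction of a single split can be worked out by hand. The sketch below is a simplified illustration with hypothetical helpers `gini` and `impurity_decrease`. It does not reproduce the library's exact bookkeeping, which also weights each split by the share of samples reaching the node and normalizes across features.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

parent = [0, 0, 1, 1]
perfect = impurity_decrease(parent, [0, 0], [1, 1])   # classes fully separated
useless = impurity_decrease(parent, [0, 1], [0, 1])   # children as mixed as parent

print(perfect)  # 0.5
print(useless)  # 0.0
```

A feature that is used in many splits like the first one accumulates a high importance. A feature that only produces splits like the second contributes nothing.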

**6. Working principle of Bagging Classifier**

1. Create multiple bootstrap samples.
2. Train one model on each.
3. Combine predictions using majority voting.
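The three steps can be sketched from scratch with the standard library. Everything here is an assumption made for illustration: the one-dimensional toy data, the threshold "stump" used as base learner, and the helper names `fit_stump`, `bagging_fit` and `bagging_predict`. It is not how `BaggingClassifier` is implemented internally.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Toy base learner: best threshold t for the rule 'predict 1 if x > t'."""
    best_t, best_correct = None, -1
    for t in sorted(set(xs)):
        correct = sum((x > t) == bool(y) for x, y in zip(xs, ys))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

def bagging_fit(xs, ys, n_estimators, rng):
    """Steps 1 and 2: one bootstrap sample and one fitted stump per estimator."""
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def bagging_predict(stumps, x):
    """Step 3: majority vote over the individual stump predictions."""
    votes = Counter(int(x > t) for t in stumps)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

stumps = bagging_fit(xs, ys, n_estimators=25, rng=rng)
print(bagging_predict(stumps, 1), bagging_predict(stumps, 8))
```

Individual stumps pick slightly different thresholds because each sees a different bootstrap sample. The vote smooths those differences out.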

**7. How to evaluate Bagging Classifier?**

Using:

- Accuracy
- Precision, Recall, F1-score
- ROC-AUC
- Cross-validation
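For reference, the threshold-based metrics in this list can be computed directly from the confusion counts. The sketch below uses a hypothetical helper `binary_metrics` and made-up labels. ROC-AUC and cross-validation are left out because they need predicted scores and repeated fits respectively.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In practice the same numbers come from `accuracy_score`, `precision_score`, `recall_score` and `f1_score` in `sklearn.metrics`.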

**8. How does a Bagging Regressor work?**

It trains multiple regressors on bootstrapped datasets and averages their outputs.

**9. Main advantage of ensemble techniques**

Higher accuracy and reduced overfitting.

**10. Main challenge of ensemble methods**

High computational cost and reduced interpretability.

**11. Key idea behind ensemble techniques**

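Combining multiple models whose errors are at least partly independent gives a predictor that is stronger than its individual members. In boosting terms, many weak learners are combined into one strong learner.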

**12. What is Random Forest Classifier?**

An ensemble of decision trees built using bootstrap samples and feature randomness.

**13. Main types of ensemble techniques**

- Bagging
- Boosting
- Stacking

**14. What is ensemble learning?**

A technique that combines multiple models to improve performance.

**15. When should we avoid ensembles?**

When:

- Data is very small
- Real-time prediction is required
- Interpretability is important

**16. How does Bagging reduce overfitting?**

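Averaging many high-variance models trained on different bootstrap samples cancels out part of their individual errors, so the variance of the combined prediction decreases.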
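A small simulation makes the variance argument concrete. It rests on a strong simplifying assumption: each "model" is just the true value plus independent unit-variance noise, so averaging B of them divides the variance by B. Real bagged models are correlated, so the reduction is smaller. For pairwise correlation ρ the variance of the average is ρσ² + (1 − ρ)σ²/B.

```python
import random
import statistics

rng = random.Random(0)

def noisy_prediction():
    # Stand-in for one high-variance model: true value 10 plus unit-variance noise.
    return 10.0 + rng.gauss(0.0, 1.0)

def bagged_prediction(n_models):
    # Average the predictions of n_models independent noisy models.
    return statistics.fmean(noisy_prediction() for _ in range(n_models))

single = [noisy_prediction() for _ in range(2000)]
bagged = [bagged_prediction(25) for _ in range(2000)]

var_single = statistics.variance(single)
var_bagged = statistics.variance(bagged)

print(f"single model variance: {var_single:.3f}")
print(f"25-model average variance: {var_bagged:.3f}")
```

The second figure should land near 1/25 of the first, while the mean prediction stays close to 10 in both cases. Averaging lowers variance without changing bias.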

**17. Why is Random Forest better than a single tree?**

It reduces overfitting and improves accuracy through averaging.

**18. Role of bootstrap sampling**

Sampling the training set with replacement creates a different dataset for each model, which makes the trained models diverse.

**19. Real-world applications**

- Fraud detection
- Stock prediction
- Medical diagnosis
- Recommendation systems

**20. Difference between Bagging and Boosting**

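| Bagging | Boosting |
|---------|----------|
| Parallel training (models are independent) | Sequential training (each model depends on the previous ones) |
| Mainly reduces variance | Mainly reduces bias |
| Samples and models weighted equally | Misclassified samples get higher weight; models are weighted by their error |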
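The "weighted samples" row can be illustrated with one round of the discrete AdaBoost update, used here as one concrete boosting algorithm. Other boosting methods, such as gradient boosting, fit residuals instead of reweighting samples. The helper `adaboost_reweight` and the toy labels are assumptions for this example, and the sketch requires a weighted error strictly between 0 and 1.

```python
import math

def adaboost_reweight(weights, y_true, y_pred):
    """One discrete-AdaBoost round: return the model weight and new sample weights."""
    # Weighted error of the current weak learner (assumed to satisfy 0 < err < 1).
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1 - err) / err)

    # Increase the weight of misclassified samples, decrease the rest, then normalize.
    updated = [
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]

weights = [0.2] * 5
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]   # only the last sample is misclassified

alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
print(round(alpha, 3), [round(w, 3) for w in new_weights])
```

After this round the single misclassified sample carries half of the total weight, so the next learner is pushed to get it right. In bagging every sample keeps the same chance of being drawn in every round.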

In [18]:
# Tasks 21-30 below are taken from page 1 of the Ensemble_Learning PDF

from sklearn.datasets import load_breast_cancer, make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import BaggingClassifier, BaggingRegressor, RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
import pandas as pd

#21. Bagging Classifier
print("21. Bagging Classifier")
X,y = load_breast_cancer(return_X_y=True)
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3)
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)
model.fit(X_train,y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

#22. Bagging Regressor
print("22. Bagging Regressor")
X,y = make_regression(n_samples=1000,n_features=5)
X_train,X_test,y_train,y_test = train_test_split(X,y)
model = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50)
model.fit(X_train,y_train)
print("MSE:", mean_squared_error(y_test, model.predict(X_test)))

#23. Random Forest Classifier – Feature Importance
print("23. Random Forest Classifier – Feature Importance")
# Load classification dataset (Breast Cancer)
data = load_breast_cancer()
X_cls = data.data
y_cls = data.target
feature_names = data.feature_names  # reuse the dataset loaded above instead of loading it again

X_train_cls, X_test_cls, y_train_cls, y_test_cls = train_test_split(
    X_cls, y_cls, test_size=0.3, random_state=42
)

rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train_cls, y_train_cls)

y_pred_cls = rf_classifier.predict(X_test_cls)
print("Random Forest Classifier Accuracy:", accuracy_score(y_test_cls, y_pred_cls))

# Feature Importance for Classifier
clf_importance = pd.DataFrame({
    "Feature Name": feature_names,
    "Importance": rf_classifier.feature_importances_
}).sort_values(by="Importance", ascending=False)

print("\nQ23 – Random Forest Classifier Feature Importance")
print(clf_importance.head())

#24. RF vs Decision Tree
print("24. RF vs Decision Tree")
# Reuses the regression split (X_train, X_test, y_train, y_test) created in task 22
dt = DecisionTreeRegressor()
rf = RandomForestRegressor()
dt.fit(X_train,y_train)
rf.fit(X_train,y_train)
print("DT MSE:", mean_squared_error(y_test, dt.predict(X_test)))
print("RF MSE:", mean_squared_error(y_test, rf.predict(X_test)))

#25. OOB Score
print("25. OOB Score")
rf = RandomForestClassifier(n_estimators=100,
    oob_score=True,
    bootstrap=True,
    random_state=42)
rf.fit(X_train_cls, y_train_cls)
print("OOB Score:", rf.oob_score_)

#26. Bagging with SVM
print("26. Bagging with SVM")
model = BaggingClassifier(SVC(), n_estimators=10)
model.fit(X_train_cls, y_train_cls)
print(accuracy_score(y_test_cls, model.predict(X_test_cls)))

#27. RF with different trees
print("27. RF with different trees")
for n in [10, 50, 100]:
    rf = RandomForestClassifier(n_estimators=n)
    rf.fit(X_train_cls, y_train_cls)
    print(n, accuracy_score(y_test_cls, rf.predict(X_test_cls)))

#28. Bagging with Logistic Regression (AUC)
print("28. Bagging with Logistic Regression (AUC)")
log_reg = LogisticRegression(max_iter=5000)

bag_lr = BaggingClassifier(
    estimator=log_reg,
    n_estimators=20,
    random_state=42
)

bag_lr.fit(X_train_cls, y_train_cls)

y_pred_proba = bag_lr.predict_proba(X_test_cls)[:,1]

print("Q28 – AUC Score:", roc_auc_score(y_test_cls, y_pred_proba))

#29. RF Regressor Feature Importance
print("29. RF Regressor Feature Importance")
# Create regression dataset
X_reg, y_reg = make_regression(n_samples=1000, n_features=5, noise=10, random_state=42)

X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_reg, y_reg, test_size=0.3, random_state=42
)

rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
rf_regressor.fit(X_train_reg, y_train_reg)

y_pred_reg = rf_regressor.predict(X_test_reg)
print("\nRandom Forest Regressor MSE:", mean_squared_error(y_test_reg, y_pred_reg))

# Feature Importance for Regressor
reg_importance = pd.DataFrame({
    "Feature Index": [f"Feature {i}" for i in range(X_reg.shape[1])],
    "Importance": rf_regressor.feature_importances_
}).sort_values(by="Importance", ascending=False)

print("\nQ29 – Random Forest Regressor Feature Importance")
print(reg_importance)

#30. Bagging vs Random Forest
print("30. Bagging vs Random Forest")
# With no estimator given, BaggingClassifier bags decision trees by default
bag = BaggingClassifier()
rf = RandomForestClassifier()
bag.fit(X_train_cls,y_train_cls)
rf.fit(X_train_cls,y_train_cls)
print("Bagging:", accuracy_score(y_test_cls, bag.predict(X_test_cls)))
print("Random Forest:", accuracy_score(y_test_cls, rf.predict(X_test_cls)))

21. Bagging Classifier
Accuracy: 0.9298245614035088
22. Bagging Regressor
MSE: 1233.70220691645
23. Random Forest Classifier – Feature Importance
Random Forest Classifier Accuracy: 0.9707602339181286

Q23 – Random Forest Classifier Feature Importance
            Feature Name  Importance
7    mean concave points    0.141934
27  worst concave points    0.127136
23            worst area    0.118217
6         mean concavity    0.080557
20          worst radius    0.077975
24. RF vs Decision Tree
DT MSE: 3698.253646146087
RF MSE: 1273.105982683196
25. OOB Score
OOB Score: 0.9547738693467337
26. Bagging with SVM
0.9473684210526315
27. RF with different trees
10 0.9649122807017544
50 0.9707602339181286
100 0.9707602339181286
28. Bagging with Logistic Regression (AUC)
Q28 – AUC Score: 0.9976484420928865
29. RF Regressor Feature Importance

Random Forest Regressor MSE: 476.95242474941597

Q29 – Random Forest Regressor Feature Importance
  Feature Index  Importance
1     Feature 1    0.545886
0 