# 1. Can we use Bagging for regression problems?
Yes. Bagging can be applied to regression tasks by training multiple regressors on bootstrap samples and averaging their predictions.

# 2. What is the difference between multiple model training and single model training?
- Single model training: One model is trained on the entire dataset.
- Multiple model training: Multiple models are trained and combined to improve performance, as in ensemble methods.

# 3. Explain the concept of feature randomness in Random Forest.
In Random Forest, feature randomness means that at each split, only a random subset of features is considered, which increases diversity among trees and reduces correlation.
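
As a rough illustration of the point above, the sketch below fits two small forests that differ only in scikit-learn's `max_features` setting and inspects how many features each tree may consider per split. The dataset choice, the variable names `rf_sqrt` and `rf_all`, and the tree count are assumptions made for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# max_features sets how many features are drawn as split candidates at each node.
# "sqrt" draws a random subset; None lets every split see all features.
rf_sqrt = RandomForestClassifier(n_estimators=20, max_features="sqrt", random_state=0).fit(X, y)
rf_all = RandomForestClassifier(n_estimators=20, max_features=None, random_state=0).fit(X, y)

print("Features in the data:", X.shape[1])
print("Candidates per split (sqrt):", rf_sqrt.estimators_[0].max_features_)
print("Candidates per split (None):", rf_all.estimators_[0].max_features_)
```

With 30 input features, `"sqrt"` resolves to 5 candidates per split, while `None` keeps all 30, which makes the forest behave like plain bagging of trees.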

# 4. What is OOB (Out-of-Bag) Score?
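The OOB score is a performance estimate computed by predicting each training sample using only the trees whose bootstrap sample did not contain it. It acts as a built-in validation estimate, similar in spirit to cross-validation, without requiring a separate hold-out set.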

# 5. How can you measure the importance of features in a Random Forest model?
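By calculating how much each feature decreases impurity (Gini or entropy for classification, variance for regression), averaged across all trees, or by using permutation importance, which measures how much the model's score drops when a feature's values are shuffled.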
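
The two approaches can be compared side by side. The following is a minimal sketch, assuming the Breast Cancer dataset and scikit-learn's `permutation_importance` helper; the variable names (`mdi`, `perm`, `top`) and the choice of `n_repeats=5` are illustrative only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Impurity-based importance: computed from the training data, normalized to sum to 1.
mdi = rf.feature_importances_

# Permutation importance: mean drop in test accuracy when one feature is shuffled.
perm = permutation_importance(rf, X_te, y_te, n_repeats=5, random_state=42)

# Show the five features ranked highest by impurity decrease.
top = mdi.argsort()[::-1][:5]
for i in top:
    print(f"{data.feature_names[i]:<25} MDI={mdi[i]:.3f}  perm={perm.importances_mean[i]:.3f}")
```

Impurity-based scores come for free after fitting but can favor high-cardinality features, whereas permutation importance is measured on held-out data at the cost of extra predictions.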

# 6. Explain the working principle of a Bagging Classifier.
Bagging trains multiple base classifiers, each on its own bootstrap sample of the training data, and aggregates their predictions (by majority voting or by averaging predicted class probabilities) to make the final decision.
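
To make the mechanism concrete, here is a hand-rolled sketch of bagging with hard majority voting on the Iris data. It is a simplified illustration, not how `BaggingClassifier` is implemented internally (scikit-learn averages predicted probabilities when the base estimator provides them). The names `n_models`, `votes`, and `manual_acc` are assumptions for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(42)
n_models = 25
votes = np.empty((n_models, len(X_te)), dtype=int)

for m in range(n_models):
    # Bootstrap sample: draw len(X_tr) row indices with replacement.
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    tree = DecisionTreeClassifier(random_state=m).fit(X_tr[idx], y_tr[idx])
    votes[m] = tree.predict(X_te)

# Majority vote: for each test sample, pick the class predicted most often.
n_classes = len(np.unique(y))
y_pred = np.array([np.bincount(col, minlength=n_classes).argmax() for col in votes.T])

manual_acc = (y_pred == y_te).mean()
print("Manual bagging accuracy:", manual_acc)
```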

# 7. How do you evaluate a Bagging Classifier’s performance?
By using metrics like accuracy, precision, recall, F1-score, ROC-AUC, or cross-validation scores.
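
One way to collect several of these metrics at once is scikit-learn's `cross_validate` with a list of scorer names. The sketch below assumes the binary Breast Cancer dataset and 5-fold cross-validation; both are choices made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=20, random_state=42)

# Each scorer name yields a "test_<name>" array with one value per fold.
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]
cv_results = cross_validate(clf, X, y, cv=5, scoring=scoring)

for name in scoring:
    scores = cv_results[f"test_{name}"]
    print(f"{name:<10} {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the spread across folds alongside the mean gives a sense of how stable each metric is, which a single train/test split cannot show.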

# 8. How does a Bagging Regressor work?
It trains multiple regressors on bootstrap samples and averages their predictions to produce the final output.

# 9. What is the main advantage of ensemble techniques?
They improve prediction accuracy, reduce variance, and increase robustness compared to single models.

# 10. What is the main challenge of ensemble methods?
They can be computationally expensive and harder to interpret compared to single models.

# 11. Explain the key idea behind ensemble techniques.
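The key idea is to combine multiple base models (weak learners in boosting, typically high-variance models such as deep trees in bagging) into a stronger one. This works because diverse models make different errors, and those errors partly cancel out when the predictions are combined.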

# 12. What is a Random Forest Classifier?
A Random Forest Classifier is an ensemble of decision trees where each tree is trained on a bootstrap sample and uses random subsets of features for splitting.

# 13. What are the main types of ensemble techniques?
- Bagging
- Boosting
- Stacking

# 14. What is ensemble learning in machine learning?
Ensemble learning is the process of combining predictions from multiple models to improve performance.

# 15. When should we avoid using ensemble methods?
When interpretability is critical or when computational resources are very limited.

# 16. How does Bagging help in reducing overfitting?
By averaging predictions from multiple models trained on different bootstrap samples, Bagging reduces the variance of the final prediction, which in turn reduces overfitting.
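
The variance reduction can be made precise with a standard result. As a sketch, assume $B$ models $\hat{f}_1, \dots, \hat{f}_B$ that each have variance $\sigma^2$ at a point $x$ and pairwise correlation $\rho$ (these symbols are introduced here for illustration and do not appear elsewhere in this notebook). The variance of their average is then:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat{f}_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

As $B$ grows, the second term shrinks toward zero and only $\rho\,\sigma^2$ remains, so averaging helps most when the models are weakly correlated. This is also the motivation for the extra feature randomness in Random Forest.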

# 17. Why is Random Forest better than a single Decision Tree?
It reduces overfitting, improves generalization, and increases accuracy by combining multiple decision trees.

# 18. What is the role of bootstrap sampling in Bagging?
Bootstrap sampling creates multiple datasets from the original by random sampling with replacement, ensuring diversity among models.
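
A small NumPy sketch of a single bootstrap draw is shown below. It also measures how many rows are left out of the sample, which are the out-of-bag rows used for the OOB score. The dataset size `n` and the variable names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # size of a hypothetical training set

# Bootstrap sample: n row indices drawn uniformly with replacement.
idx = rng.integers(0, n, size=n)

# Rows that were drawn at least once are "in bag"; the rest are out-of-bag.
in_bag = np.unique(idx)
oob_fraction = 1 - len(in_bag) / n

print("Unique rows in the bootstrap sample:", len(in_bag))
print("Out-of-bag fraction:", round(oob_fraction, 3))
```

For large `n` the out-of-bag fraction approaches 1/e (about 0.368), so each model sees roughly 63% of the distinct training rows.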

# 19. What are some real-world applications of ensemble techniques?
- Fraud detection
- Medical diagnosis
- Credit scoring
- Customer churn prediction

# 20. What is the difference between Bagging and Boosting?
- Bagging: Models are trained independently on different samples.
- Boosting: Models are trained sequentially, with each model focusing on correcting errors of the previous one.
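
The contrast can be seen by fitting one ensemble of each kind. The sketch below uses `GradientBoostingClassifier` as one representative boosting method (AdaBoost would be another); this choice, the dataset, and the hyperparameters are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Bagging: 50 full-depth trees, each fitted independently on its own bootstrap sample.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
bagging.fit(X_tr, y_tr)

# Boosting: 50 shallow trees fitted one after another, each correcting the current ensemble.
boosting = GradientBoostingClassifier(n_estimators=50, max_depth=1, random_state=42)
boosting.fit(X_tr, y_tr)

bag_acc = bagging.score(X_te, y_te)
boost_acc = boosting.score(X_te, y_te)

# staged_predict yields predictions after each boosting stage, exposing the sequential build-up.
staged = [(pred == y_te).mean() for pred in boosting.staged_predict(X_te)]

print("Bagging accuracy:", bag_acc)
print("Boosting accuracy:", boost_acc)
print("Boosting accuracy after 1, 10, 50 stages:", staged[0], staged[9], staged[-1])
```

Because bagged models do not depend on each other, they can be trained in parallel; boosting stages cannot, since each stage needs the output of the ones before it.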


In [1]:
# 21. Train a Bagging Classifier using Decision Trees on a sample dataset and print model accuracy.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
bag_clf.fit(X_train, y_train)
print("Accuracy:", bag_clf.score(X_test, y_test))

Accuracy: 1.0


In [3]:
# 22. Train a Bagging Regressor using Decision Trees and evaluate using MSE.
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

housing = fetch_california_housing()
X_train_h, X_test_h, y_train_h, y_test_h = train_test_split(
    housing.data, housing.target, test_size=0.2, random_state=42
)

bag_reg = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=42)
bag_reg.fit(X_train_h, y_train_h)
y_pred_h = bag_reg.predict(X_test_h)
print("MSE:", mean_squared_error(y_test_h, y_pred_h))


MSE: 0.2572988359842641


In [4]:
# 23. Train a Random Forest Classifier on the Breast Cancer dataset and print feature importance scores.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

cancer = load_breast_cancer()
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(X_train_c, y_train_c)
print("Feature Importances:", rf_clf.feature_importances_)

Feature Importances: [0.04870337 0.01359088 0.05326975 0.04755501 0.00728533 0.01394433
 0.06800084 0.10620999 0.00377029 0.00388577 0.02013892 0.00472399
 0.01130301 0.02240696 0.00427091 0.00525322 0.00938583 0.00351326
 0.00401842 0.00532146 0.07798688 0.02174901 0.06711483 0.15389236
 0.01064421 0.02026604 0.0318016  0.14466327 0.01012018 0.00521012]


In [5]:
# 24. Train a Random Forest Regressor and compare its performance with a single Decision Tree.
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rf_reg = RandomForestRegressor(n_estimators=100, random_state=42)
dt_reg = DecisionTreeRegressor(random_state=42)

rf_reg.fit(X_train_h, y_train_h)
dt_reg.fit(X_train_h, y_train_h)

print("Random Forest MSE:", mean_squared_error(y_test_h, rf_reg.predict(X_test_h)))
print("Decision Tree MSE:", mean_squared_error(y_test_h, dt_reg.predict(X_test_h)))


Random Forest MSE: 0.2553684927247781
Decision Tree MSE: 0.495235205629094


In [6]:
# 25. Compute the Out-of-Bag (OOB) Score for a Random Forest Classifier.
rf_oob = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
rf_oob.fit(X_train_c, y_train_c)
print("OOB Score:", rf_oob.oob_score_)

OOB Score: 0.9560439560439561


In [7]:
# 26. Train a Bagging Classifier using SVM as a base estimator and print accuracy.
from sklearn.svm import SVC
bag_svm = BaggingClassifier(SVC(), n_estimators=10, random_state=42)
bag_svm.fit(X_train_c, y_train_c)
print("Accuracy:", bag_svm.score(X_test_c, y_test_c))

Accuracy: 0.9473684210526315


In [8]:
# 27. Train a Random Forest Classifier with different numbers of trees and compare accuracy.
for n in [10, 50, 100]:
    rf_var = RandomForestClassifier(n_estimators=n, random_state=42)
    rf_var.fit(X_train_c, y_train_c)
    print(f"n_estimators={n}, Accuracy={rf_var.score(X_test_c, y_test_c)}")

n_estimators=10, Accuracy=0.956140350877193
n_estimators=50, Accuracy=0.9649122807017544
n_estimators=100, Accuracy=0.9649122807017544


In [9]:
# 28. Train a Bagging Classifier using Logistic Regression as a base estimator and print AUC score.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

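# Note: the features are not scaled here, so the solver can hit max_iter and emit a
# ConvergenceWarning for some base estimators; standardizing the inputs would avoid this.
bag_lr = BaggingClassifier(LogisticRegression(max_iter=1000), n_estimators=10, random_state=42)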
bag_lr.fit(X_train_c, y_train_c)
y_prob_lr = bag_lr.predict_proba(X_test_c)[:, 1]
print("AUC Score:", roc_auc_score(y_test_c, y_prob_lr))

STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

AUC Score: 0.9980347199475925



In [10]:
# 29. Train a Random Forest Regressor and analyze feature importance scores.
rf_reg_imp = RandomForestRegressor(n_estimators=100, random_state=42)
rf_reg_imp.fit(X_train_h, y_train_h)
print("Feature Importances:", rf_reg_imp.feature_importances_)

Feature Importances: [0.52487148 0.05459322 0.04427185 0.02960631 0.03064978 0.13844281
 0.08893574 0.08862881]


In [11]:
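# 30. Train an ensemble model using both Bagging and Random Forest and compare accuracy.
# Note: the two ensembles use different numbers of trees (50 vs. 100), so this is not a strictly like-for-like comparison.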
bag_clf_comp = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
bag_clf_comp.fit(X_train_c, y_train_c)
rf_clf_comp = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf_comp.fit(X_train_c, y_train_c)
print("Bagging Accuracy:", bag_clf_comp.score(X_test_c, y_test_c))
print("Random Forest Accuracy:", rf_clf_comp.score(X_test_c, y_test_c))

Bagging Accuracy: 0.956140350877193
Random Forest Accuracy: 0.9649122807017544
