## Q1. How does bagging reduce overfitting in decision trees?

Bagging (Bootstrap Aggregating) reduces overfitting by:

Training multiple decision trees on different bootstrap samples (random samples of the training set drawn with replacement).

Averaging the trees' predictions (for regression) or taking a majority vote (for classification).

Combining many trees reduces variance, so the ensemble is less sensitive to noise or fluctuations in any single training sample.

The effect is strongest for high-variance models such as deep decision trees.
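The variance reduction can be checked with a small from-scratch simulation. This is a minimal sketch under stated assumptions, not a production implementation: the data (`y = x` plus Gaussian noise), the helper names, and the base learner are all made up for illustration. The base learner predicts the target of the closest training point, which is what a fully grown regression tree does on one-dimensional data.

```python
import random
import statistics

def fit_memorizer(points):
    """Stand-in for an unpruned 1-D regression tree (illustrative helper):
    predict the target of the training point closest to the query."""
    def predict(x):
        return min(points, key=lambda p: abs(p[0] - x))[1]
    return predict

def fit_bagged(points, n_models, rng):
    """Fit n_models learners on bootstrap samples and average their predictions."""
    models = []
    for _ in range(n_models):
        sample = rng.choices(points, k=len(points))  # draw with replacement
        models.append(fit_memorizer(sample))
    return lambda x: statistics.fmean([m(x) for m in models])

def make_data(rng, n=40):
    """Hypothetical dataset: noisy samples of y = x on [0, 1]."""
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, x + rng.gauss(0, 0.3)))
    return data

rng = random.Random(0)
single_preds, bagged_preds = [], []
for _ in range(300):  # 300 independent training sets
    data = make_data(rng)
    single_preds.append(fit_memorizer(data)(0.5))
    bagged_preds.append(fit_bagged(data, 25, rng)(0.5))

# Spread of the prediction at x = 0.5 across training sets
var_single = statistics.variance(single_preds)
var_bagged = statistics.variance(bagged_preds)
print(f"variance of a single learner:    {var_single:.4f}")
print(f"variance of the bagged ensemble: {var_bagged:.4f}")
```

Both models are evaluated at the same query point across many independently drawn training sets, so the printed numbers estimate how much each prediction moves when the training data changes.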



## Q2. What are the advantages and disadvantages of using different types of base learners in bagging?

Advantages:

You can tailor the base learner to the problem.

High-variance learners such as deep decision trees benefit most from bagging, because averaging smooths out their instability.

Simpler models train faster and are less prone to overfitting.

Disadvantages:

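Stable models (e.g., KNN, linear regression) change little from one bootstrap sample to the next, so they may not benefit much from bagging.

Complex base learners increase training time, since each one is fit once per ensemble member.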

Choosing an inappropriate base learner can hurt performance.
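One way to compare base learners empirically is to score each one alone and inside a bagging ensemble. The sketch below assumes scikit-learn is installed. The dataset, the two candidate learners, and the settings (`n_estimators=25`, 5-fold cross-validation) are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate base learners (illustrative selection)
base_learners = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

results = {}
for name, learner in base_learners.items():
    # Mean 5-fold accuracy of the learner on its own
    alone = cross_val_score(learner, X, y, cv=5).mean()
    # The base learner is passed positionally because its keyword name
    # differs between scikit-learn versions.
    bagged_model = BaggingClassifier(learner, n_estimators=25, random_state=0)
    bagged = cross_val_score(bagged_model, X, y, cv=5).mean()
    results[name] = (alone, bagged)
    print(f"{name}: alone={alone:.3f}, bagged={bagged:.3f}")
```

The gap between the two scores for each learner gives a rough indication of how much that learner gains from bagging on this particular dataset. Results on other data may differ.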

## Q3. How does the choice of base learner affect the bias-variance tradeoff in bagging?

High-variance learners (like unpruned decision trees): Bagging works best here, because averaging reduces variance while leaving the already low bias largely unchanged.

Low-variance, high-bias learners (like linear regression): Bagging doesn’t help much, since averaging does not reduce bias.

Ideal base learners for bagging: those with low bias, high variance.
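A standard result makes this concrete. It is not derived in these notes and rests on an idealization: the $B$ bagged models $\hat f_1, \dots, \hat f_B$ are assumed to be identically distributed, each with variance $\sigma^2$ at a point $x$ and pairwise correlation $\rho$. Under that assumption:

$$
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2,
\qquad
\mathbb{E}\!\left[\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right] = \mathbb{E}\big[\hat f_1(x)\big].
$$

The second identity says the average has the same expectation, and therefore the same bias, as a single model. The first says only the variance shrinks, and only the uncorrelated part of it. A high-bias learner gains little, while a low-bias, high-variance learner gains the most.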

## Q4. Can bagging be used for both classification and regression tasks? How does it differ in each case?

Yes, bagging is used for both:

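Classification: Combine the base models by majority vote (e.g., `RandomForestClassifier`, although scikit-learn's implementation averages the trees' predicted class probabilities rather than counting hard votes).

Regression: Average the base models' predictions (e.g., `RandomForestRegressor`).

In both cases bagging reduces variance and stabilizes predictions; only the output type differs (a class label vs. a numeric value).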
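The two aggregation rules can be sketched in a few lines of plain Python. The function names and sample predictions below are hypothetical and exist only for illustration.

```python
from collections import Counter
from statistics import fmean

def aggregate_classification(votes):
    """Majority vote: return the most common label among the base models.
    On a tie, Counter returns the label that was seen first."""
    return Counter(votes).most_common(1)[0][0]

def aggregate_regression(predictions):
    """Averaging: return the mean of the base models' numeric predictions."""
    return fmean(predictions)

# Hypothetical outputs of the base models for one input
class_votes = ["spam", "ham", "spam", "spam", "ham"]
value_predictions = [3.0, 2.5, 3.5, 3.0]

print(aggregate_classification(class_votes))    # spam
print(aggregate_regression(value_predictions))  # 3.0
```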

## Q5. What is the role of ensemble size in bagging? How many models should be included in the ensemble?

More models → more stable predictions, but increased computation.

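Beyond a certain size (often on the order of 100-200 trees, though this depends on the dataset), the performance gain becomes marginal.

Ideal ensemble size: choose it by cross-validation, or stop adding models once the validation (or out-of-bag) error plateaus.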
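The diminishing returns can be illustrated numerically with an idealized model, in which the variance of an average of `n_models` identically distributed predictors with per-model variance `sigma2` and pairwise correlation `rho` is `rho * sigma2 + (1 - rho) * sigma2 / n_models`. The values `sigma2 = 1.0` and `rho = 0.3` below are hypothetical. Real ensembles will not follow this formula exactly.

```python
def ensemble_variance(sigma2, rho, n_models):
    """Variance of the average of n_models identically distributed predictors
    with per-model variance sigma2 and pairwise correlation rho (idealized)."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_models

sigma2, rho = 1.0, 0.3  # hypothetical values chosen for illustration
for n_models in (1, 10, 50, 100, 200, 1000):
    variance = ensemble_variance(sigma2, rho, n_models)
    print(f"{n_models:>5} models -> variance {variance:.4f}")
```

Under these assumptions, going from 1 to 100 models removes most of the reducible variance, while doubling from 100 to 200 changes the result only in the third decimal place. The floor at `rho * sigma2` is why adding more models eventually stops helping.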

## Q6. Can you provide an example of a real-world application of bagging in machine learning?

Spam Detection (Classification):

Use bagging with decision trees or Random Forest.

Trained on email datasets (e.g., Spambase).

Compared with a single decision tree, this typically improves accuracy and can reduce false positives (legitimate emails flagged as spam).

Other Examples:

Medical diagnosis: Predict diseases using Random Forest.

Credit scoring: Evaluate loan risk using bagged models.

Customer churn prediction: Telecom companies use ensemble models to identify at-risk users.



The example below applies a Random Forest to a medical diagnosis task, using the breast cancer dataset bundled with scikit-learn.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Train-test split (30% of the samples held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Random Forest: bagged decision trees that also sample a random
# subset of features at each split
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluation: per-class precision, recall, and F1 on the held-out set
print(classification_report(y_test, y_pred))
```

Output:

```text
              precision    recall  f1-score   support

           0       0.98      0.94      0.96        63
           1       0.96      0.99      0.98       108

    accuracy                           0.97       171
   macro avg       0.97      0.96      0.97       171
weighted avg       0.97      0.97      0.97       171
```

