# Homework 2 - Task 2: Bagging vs Boosting (Stratified k-Fold CV)


## Citations
- Stratified k-fold cross-validation (scikit-learn docs): https://scikit-learn.org/stable/modules/cross_validation.html#stratified-k-fold  
- Random Forest (bagging-style ensemble) docs: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html  
- AdaBoost (boosting) docs: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html  


## 0. Imports & setup

In [1]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, StratifiedKFold, cross_validate
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline

from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)


## 1. Load data



In [2]:
!wget -q -O bank_marketing.zip https://archive.ics.uci.edu/static/public/222/bank+marketing.zip
!unzip -o bank_marketing.zip -d . > /dev/null
!unzip -o bank.zip > /dev/null

df = pd.read_csv("bank-full.csv", sep=";")
print("Shape:", df.shape)
df.head()


Shape: (45211, 17)


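| | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 58 | management | married | tertiary | no | 2143 | yes | no | unknown | 5 | may | 261 | 1 | -1 | 0 | unknown | no |
| 1 | 44 | technician | single | secondary | no | 29 | yes | no | unknown | 5 | may | 151 | 1 | -1 | 0 | unknown | no |
| 2 | 33 | entrepreneur | married | secondary | no | 2 | yes | yes | unknown | 5 | may | 76 | 1 | -1 | 0 | unknown | no |
| 3 | 47 | blue-collar | married | unknown | no | 1506 | yes | no | unknown | 5 | may | 92 | 1 | -1 | 0 | unknown | no |
| 4 | 33 | unknown | single | unknown | no | 1 | no | no | unknown | 5 | may | 198 | 1 | -1 | 0 | unknown | no |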


## 2. Preprocess

In [3]:
target_col = "y"
y = (df[target_col].astype(str).str.lower() == "yes").astype(int)
X = df.drop(columns=[target_col]).copy()

cat_cols = [c for c in X.columns if X[c].dtype == "object"]
num_cols = [c for c in X.columns if c not in cat_cols]

print("Categorical:", cat_cols)
print("Numeric:", num_cols)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=RANDOM_STATE
)

preprocess = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
        ("num", "passthrough", num_cols),
    ]
)


Categorical: ['job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'month', 'poutcome']
Numeric: ['age', 'balance', 'day', 'duration', 'campaign', 'pdays', 'previous']


## 3. Choose one Bagging and one Boosting algorithm

- **Bagging choice:** Random Forest (bootstrap aggregation over many trees)
- **Boosting choice:** AdaBoost (sequentially reweights to focus on harder examples)
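
To make the reweighting idea concrete, here is a minimal sketch of one round of the textbook discrete AdaBoost update on toy labels. The data and the helper name `adaboost_reweight` are illustrative assumptions, and scikit-learn's `AdaBoostClassifier` differs in details such as learning-rate scaling, so this is not a description of its internals.

```python
import numpy as np

def adaboost_reweight(y_true, y_pred, weights):
    """One round of the classic discrete AdaBoost weight update (labels in {-1, +1}).

    Assumes the weak learner's weighted error is strictly between 0 and 0.5.
    """
    miss = y_true != y_pred
    err = float(np.sum(weights[miss]))        # weighted error of the weak learner
    alpha = 0.5 * np.log((1.0 - err) / err)   # vote strength of this learner
    # Misclassified samples are scaled up, correctly classified ones scaled down.
    new_w = weights * np.exp(np.where(miss, alpha, -alpha))
    return alpha, new_w / new_w.sum()         # renormalise to sum to 1

# Toy example (assumption): 10 samples, the stump misses two of the three positives.
y_true = np.array([1, 1, 1, -1, -1, -1, -1, -1, -1, -1])
y_pred = np.array([1, -1, -1, -1, -1, -1, -1, -1, -1, -1])
weights = np.full(len(y_true), 1.0 / len(y_true))

alpha, new_weights = adaboost_reweight(y_true, y_pred, weights)
print("alpha:", round(float(alpha), 3))
print("new weights:", np.round(new_weights, 4))
```

After the update, the two misclassified samples together carry half of the total weight, so the next weak learner cannot do well without handling them.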


In [6]:
# Bagging: Random Forest
bagging_model = RandomForestClassifier(
    n_estimators=150,
    max_depth=None,
    min_samples_leaf=10,
    random_state=RANDOM_STATE,
    n_jobs=-1
)

# Boosting: AdaBoost with decision stumps (weak learners)
boosting_model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1, random_state=RANDOM_STATE),
    n_estimators=150,
    learning_rate=0.5,
    random_state=RANDOM_STATE
)

bagging_pipe = Pipeline([("prep", preprocess), ("model", bagging_model)])
boosting_pipe = Pipeline([("prep", preprocess), ("model", boosting_model)])

bagging_pipe, boosting_pipe


(Pipeline(steps=[('prep',
                  ColumnTransformer(transformers=[('cat',
                                                   OneHotEncoder(handle_unknown='ignore'),
                                                   ['job', 'marital',
                                                    'education', 'default',
                                                    'housing', 'loan', 'contact',
                                                    'month', 'poutcome']),
                                                  ('num', 'passthrough',
                                                   ['age', 'balance', 'day',
                                                    'duration', 'campaign',
                                                    'pdays', 'previous'])])),
                 ('model',
                  RandomForestClassifier(min_samples_leaf=10, n_estimators=150,
                                         n_jobs=-1, random_state=42))]),
 Pipeline(steps=[('prep',
            

## 4. Stratified k-fold cross-validation (k = 5, 10, 15)

In imbalanced binary classification, stratification preserves the class ratio in each fold, giving a more stable estimate of performance.  
Citation: scikit-learn docs on StratifiedKFold: https://scikit-learn.org/stable/modules/cross_validation.html#stratified-k-fold
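
The effect of stratification can be checked directly. The sketch below compares the positive-class rate per validation fold under plain `KFold` and `StratifiedKFold`; the synthetic labels (10% positives) and the helper `fold_positive_rates` are assumptions made for illustration, not part of the homework data.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Synthetic labels (assumption): 1,000 samples with a 10% positive class.
y_demo = np.array([1] * 100 + [0] * 900)
X_demo = np.zeros((len(y_demo), 1))  # feature values do not affect the split

def fold_positive_rates(splitter, X, y):
    """Return the positive-class rate of each validation fold."""
    return [float(y[val_idx].mean()) for _, val_idx in splitter.split(X, y)]

plain_rates = fold_positive_rates(
    KFold(n_splits=5, shuffle=True, random_state=42), X_demo, y_demo
)
strat_rates = fold_positive_rates(
    StratifiedKFold(n_splits=5, shuffle=True, random_state=42), X_demo, y_demo
)

print("KFold positive rate per fold:          ", np.round(plain_rates, 3))
print("StratifiedKFold positive rate per fold:", np.round(strat_rates, 3))
```

With stratification every fold keeps the overall 10% positive rate, while plain shuffled folds drift around it. That drift matters most for metrics computed on the minority class.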


### 4.1 Run CV for each k and compute 3 metrics

In [7]:
import time


SCORING = {
    "accuracy": "accuracy",
    "precision": "precision",
    "f1": "f1",
}

def evaluate_model_cv(pipe, X, y, k_values=(5, 10, 15)):
    """Run stratified k-fold CV for each k; return the mean and std of each metric per k."""
    rows = []
    for k in k_values:
        print(f"Running StratifiedKFold k={k} ...")
        t0 = time.time()

        cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=RANDOM_STATE)
        out = cross_validate(
            pipe, X, y,
            cv=cv,
            scoring=SCORING,
            n_jobs=-1,
            return_train_score=False
        )

        dt = time.time() - t0
        print(f"Done k={k} in {dt:.1f}s")

        rows.append({
            "k": k,
            "accuracy_mean": float(np.mean(out["test_accuracy"])),
            "accuracy_std": float(np.std(out["test_accuracy"])),
            "precision_mean": float(np.mean(out["test_precision"])),
            "precision_std": float(np.std(out["test_precision"])),
            "f1_mean": float(np.mean(out["test_f1"])),
            "f1_std": float(np.std(out["test_f1"])),
        })
    return pd.DataFrame(rows).sort_values("k")


bagging_cv = evaluate_model_cv(bagging_pipe, X_train, y_train)
boosting_cv = evaluate_model_cv(boosting_pipe, X_train, y_train)

bagging_cv, boosting_cv


Running StratifiedKFold k=5 ...
Done k=5 in 27.3s
Running StratifiedKFold k=10 ...
Done k=10 in 60.6s
Running StratifiedKFold k=15 ...
Done k=15 in 93.7s
Running StratifiedKFold k=5 ...
Done k=5 in 25.7s
Running StratifiedKFold k=10 ...
Done k=10 in 58.1s
Running StratifiedKFold k=15 ...
Done k=15 in 91.6s


(    k  accuracy_mean  accuracy_std  precision_mean  precision_std   f1_mean  \
 0   5       0.902732      0.001599        0.709989       0.024086  0.407340   
 1  10       0.902041      0.002905        0.696792       0.032916  0.408077   
 2  15       0.901737      0.004427        0.693155       0.044860  0.406272   
 
      f1_std  
 0  0.006968  
 1  0.019207  
 2  0.033918  ,
     k  accuracy_mean  accuracy_std  precision_mean  precision_std   f1_mean  \
 0   5       0.897949      0.002208        0.653152       0.022633  0.384807   
 1  10       0.898087      0.004349        0.657678       0.051882  0.384434   
 2  15       0.898363      0.005106        0.661008       0.059790  0.386292   
 
      f1_std  
 0  0.026696  
 1  0.028477  
 2  0.029682  )

### 4.2 Side-by-side comparison table

In [8]:
def format_cv(df, model_name):
    out = df.copy()
    out.insert(0, "model", model_name)
    return out

comparison = pd.concat([
    format_cv(bagging_cv, "Bagging: RandomForest"),
    format_cv(boosting_cv, "Boosting: AdaBoost"),
], ignore_index=True)

comparison


| | model | k | accuracy_mean | accuracy_std | precision_mean | precision_std | f1_mean | f1_std |
|---|---|---|---|---|---|---|---|---|
| 0 | Bagging: RandomForest | 5 | 0.902732 | 0.001599 | 0.709989 | 0.024086 | 0.40734 | 0.006968 |
| 1 | Bagging: RandomForest | 10 | 0.902041 | 0.002905 | 0.696792 | 0.032916 | 0.408077 | 0.019207 |
| 2 | Bagging: RandomForest | 15 | 0.901737 | 0.004427 | 0.693155 | 0.04486 | 0.406272 | 0.033918 |
| 3 | Boosting: AdaBoost | 5 | 0.897949 | 0.002208 | 0.653152 | 0.022633 | 0.384807 | 0.026696 |
| 4 | Boosting: AdaBoost | 10 | 0.898087 | 0.004349 | 0.657678 | 0.051882 | 0.384434 | 0.028477 |
| 5 | Boosting: AdaBoost | 15 | 0.898363 | 0.005106 | 0.661008 | 0.05979 | 0.386292 | 0.029682 |


## 5. Rank the models under each metric (per k)




In [9]:
def rank_by_metric(df, metric_mean_col):
    tmp = df[["model", "k", metric_mean_col]].copy()
    tmp["rank"] = tmp.groupby("k")[metric_mean_col].rank(ascending=False, method="min")
    return tmp.sort_values(["k", "rank"])

r_acc = rank_by_metric(comparison, "accuracy_mean")
r_prec = rank_by_metric(comparison, "precision_mean")
r_f1 = rank_by_metric(comparison, "f1_mean")

print("Ranking by Accuracy")
display(r_acc)

print("\nRanking by Precision")
display(r_prec)

print("\nRanking by F1")
display(r_f1)


Ranking by Accuracy


| | model | k | accuracy_mean | rank |
|---|---|---|---|---|
| 0 | Bagging: RandomForest | 5 | 0.902732 | 1.0 |
| 3 | Boosting: AdaBoost | 5 | 0.897949 | 2.0 |
| 1 | Bagging: RandomForest | 10 | 0.902041 | 1.0 |
| 4 | Boosting: AdaBoost | 10 | 0.898087 | 2.0 |
| 2 | Bagging: RandomForest | 15 | 0.901737 | 1.0 |
| 5 | Boosting: AdaBoost | 15 | 0.898363 | 2.0 |



Ranking by Precision


| | model | k | precision_mean | rank |
|---|---|---|---|---|
| 0 | Bagging: RandomForest | 5 | 0.709989 | 1.0 |
| 3 | Boosting: AdaBoost | 5 | 0.653152 | 2.0 |
| 1 | Bagging: RandomForest | 10 | 0.696792 | 1.0 |
| 4 | Boosting: AdaBoost | 10 | 0.657678 | 2.0 |
| 2 | Bagging: RandomForest | 15 | 0.693155 | 1.0 |
| 5 | Boosting: AdaBoost | 15 | 0.661008 | 2.0 |



Ranking by F1


| | model | k | f1_mean | rank |
|---|---|---|---|---|
| 0 | Bagging: RandomForest | 5 | 0.40734 | 1.0 |
| 3 | Boosting: AdaBoost | 5 | 0.384807 | 2.0 |
| 1 | Bagging: RandomForest | 10 | 0.408077 | 1.0 |
| 4 | Boosting: AdaBoost | 10 | 0.384434 | 2.0 |
| 2 | Bagging: RandomForest | 15 | 0.406272 | 1.0 |
| 5 | Boosting: AdaBoost | 15 | 0.386292 | 2.0 |


## 6. Explanation

### Model behavior and metric comparison

Random Forest (bagging) and AdaBoost (boosting) perform similarly on accuracy but differ more on the positive-class metrics across the three cross-validation settings (k = 5, 10, 15). Random Forest reaches a mean accuracy of about 0.902 at every k, against about 0.898 for AdaBoost. That gap of 0.3 to 0.5 percentage points is comparable to the fold-to-fold standard deviation (0.002 to 0.005). Stable accuracy is expected from a bagging-style ensemble, which reduces variance by averaging many trees trained on bootstrap samples. Because the classes are imbalanced, accuracy is dominated by the majority class and says little about how well either model finds positive cases.

On the positive (minority) class, Random Forest also leads: mean precision is 0.69 to 0.71 versus 0.65 to 0.66 for AdaBoost, and mean F1 is about 0.41 versus about 0.385. An F1 this far below precision implies low recall for both models (roughly 0.27 to 0.29 when back-calculated from the k = 5 means), so both miss most positive cases. AdaBoost trains its stumps sequentially and increases the weight of misclassified observations, but with depth-1 trees and `learning_rate=0.5` this did not translate into better minority-class performance than Random Forest in this setup. The precision gap does narrow as k increases (from about 0.057 at k = 5 to about 0.032 at k = 15), although that change is comparable to the standard deviation of the fold scores.

The ranking does not change with the metric in this experiment: Random Forest ranks first on accuracy, precision, and F1 at every k. What changes is the size of the gap, which is under 0.005 for accuracy but 0.02 to 0.06 for F1 and precision. Increasing k leaves the means almost unchanged and mainly increases the standard deviations, because each validation fold becomes smaller. The results still illustrate why accuracy alone is a poor guide on imbalanced data: both models score about 0.90 on accuracy while their F1 stays near 0.4, so metrics such as precision, recall, and F1 should be chosen to match the problem objective.
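
Recall was not scored in the CV runs above, but it can be back-calculated from precision and F1 by solving F1 = 2PR / (P + R) for R. The sketch below applies this to the k = 5 means from the comparison table. The helper `implied_recall` is a hypothetical name, and the result is only approximate, because the mean of per-fold F1 scores is not exactly the F1 of the mean precision and recall.

```python
def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given precision P and F1."""
    return f1 * precision / (2.0 * precision - f1)

# (precision_mean, f1_mean) at k = 5, copied from the comparison table.
cv_means_k5 = {
    "Bagging: RandomForest": (0.709989, 0.407340),
    "Boosting: AdaBoost": (0.653152, 0.384807),
}

recalls = {
    model: implied_recall(p, f1) for model, (p, f1) in cv_means_k5.items()
}

for model, r in recalls.items():
    print(f"{model}: implied recall is roughly {r:.3f}")
```

A more direct check would be to add `"recall": "recall"` to the `SCORING` dictionary and rerun the cross-validation.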