Task 9: AdaBoost or Gradient Boosting 

● Train an AdaBoostClassifier or GradientBoostingClassifier. 

● Use a suitable dataset. 

● Compare it with Random Forest and Decision Tree in terms of: 

○ Accuracy 
○ F1-score 
○ Training time (optional) 
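
The two required metrics can be illustrated on a toy example before applying them to real models. The sketch below computes accuracy and binary F1 by hand, assuming class 1 is the positive class (the default for scikit-learn's f1_score on binary labels). The function name accuracy_and_f1 and the toy labels are illustrative and not part of the task.

```python
def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and binary F1 (positive class = 1) from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Toy labels (made up for illustration): 3 TP, 1 FP, 1 FN, 3 TN
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(acc, f1)
```

Accuracy counts all correct predictions, while F1 ignores true negatives, so the two can diverge when the classes are imbalanced.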


In [1]:
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
from time import time

In [2]:
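# Load the Titanic example dataset via seaborn
titanic = sns.load_dataset('titanic')
# Drop 'deck' (mostly missing) and columns that duplicate other fields
# ('alive' repeats the target, 'class' repeats pclass, 'embark_town' repeats embarked)
titanic = titanic.drop(columns=['deck', 'embark_town', 'alive', 'who', 'adult_male', 'class'])
# Remove the remaining rows with missing values (mostly missing age)
titanic = titanic.dropna()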


In [3]:
# Encode categorical variables
titanic['sex'] = titanic['sex'].map({'male': 0, 'female': 1})
titanic['embarked'] = titanic['embarked'].map({'S': 0, 'C': 1, 'Q': 2})

In [4]:
# Features and target
X = titanic[['pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'embarked']]
y = titanic['survived']

In [5]:
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [6]:
# Models to compare, each with a fixed random_state for reproducibility
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, max_depth=4, random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=42)
}

results = []

# Fit each model, timing only the call to fit(), then score it on the held-out test set
for name, model in models.items():
    start_time = time()
    model.fit(X_train, y_train)
    end_time = time()
    
    y_pred = model.predict(X_test)
    
    acc = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    duration = end_time - start_time
    
    results.append({
        "Model": name,
        "Accuracy": round(acc, 4),
        "F1-Score": round(f1, 4),
        "Training Time (s)": round(duration, 4)
    })




In [8]:
results_df = pd.DataFrame(results)
print(results_df.sort_values(by="Accuracy", ascending=False))

               Model  Accuracy  F1-Score  Training Time (s)
2           AdaBoost    0.8112    0.7805             0.1382
1      Random Forest    0.7902    0.7222             0.1161
3  Gradient Boosting    0.7622    0.7018             0.1002
0      Decision Tree    0.7063    0.6182             0.0021
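
Training times in the millisecond range vary noticeably from run to run. One possible way to get steadier numbers is to repeat the fit several times and report the median, using time.perf_counter. This is a minimal sketch: median_fit_time and dummy_fit are hypothetical names, and dummy_fit is only a stand-in workload for a real model's fit call.

```python
from statistics import median
from time import perf_counter


def median_fit_time(fit_fn, repeats=5):
    """Call fit_fn `repeats` times and return the median wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = perf_counter()
        fit_fn()
        durations.append(perf_counter() - start)
    return median(durations)


def dummy_fit():
    # Stand-in workload; replace with a real training call
    return sum(i * i for i in range(10_000))


elapsed = median_fit_time(dummy_fit, repeats=5)
print(f"median time: {elapsed:.6f} s")
```

In this notebook, the stand-in could be replaced with something like `lambda: model.fit(X_train, y_train)` for each model in the comparison.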


Conclusion:
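Gradient Boosting often yields the highest accuracy and F1-score on tabular data, but in this run it ranked third (accuracy 0.7622, F1 0.7018). At 0.1002 s it also trained no slower than the other two ensembles.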

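AdaBoost had the best scores here (accuracy 0.8112, F1 0.7805). It improves on a single Decision Tree by training weak learners in sequence, with each round giving more weight to the samples that earlier rounds misclassified.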

Random Forest is robust and placed second (accuracy 0.7902, F1 0.7222), but it was not faster than boosting in this run (0.1161 s) and may underperform boosting on complex patterns.

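Decision Tree is the fastest to train (0.0021 s) but also the least accurate (accuracy 0.7063, F1 0.6182).

These figures come from a single 80/20 split, so gaps of a few points reflect only a handful of test predictions, and timings this short are noisy.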
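
The idea that AdaBoost focuses on errors can be made concrete with one round of the classic discrete AdaBoost weight update. This is an illustrative sketch of the textbook rule, not scikit-learn's exact implementation, which may differ in detail. The function name adaboost_reweight and the toy labels are assumptions made for the example.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One round of the textbook discrete AdaBoost sample-weight update (illustrative)."""
    total = sum(weights)
    # Weighted error rate of the current weak learner
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    if not 0.0 < err < 1.0:
        raise ValueError("sketch assumes a weighted error strictly between 0 and 1")
    # Learner weight: larger when the weak learner makes fewer weighted errors
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Up-weight misclassified samples, down-weight correctly classified ones
    updated = [w * math.exp(alpha if t != p else -alpha)
               for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]


# Toy round (made-up labels): five equally weighted samples, the second is misclassified
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]
weights = [0.2] * 5
alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
print(round(alpha, 4), [round(w, 3) for w in new_weights])
```

After this round the misclassified sample carries half of the total weight (0.5 instead of 0.2), so the next weak learner is pushed to classify it correctly.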