Task 9: AdaBoost or Gradient Boosting
● Train an AdaBoostClassifier or GradientBoostingClassifier.
● Use a suitable dataset.
● Compare it with Random Forest and Decision Tree in terms of:
○ Accuracy
○ F1-score
○ Training time (optional)
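For reference, the two required metrics can be computed by hand from prediction counts. The following is a minimal pure-Python sketch, not the scikit-learn implementation; the helper name `accuracy_and_f1` and the toy labels are illustrative assumptions, and the positive class is assumed to be `1` (as with `Survived`).

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 here when the denominator is zero.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels (illustrative only, not Titanic data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"Accuracy: {acc:.4f}, F1-Score: {f1:.4f}")
```

Accuracy counts every correct prediction, while F1 only looks at the positive class, so the two can rank models differently when the classes are imbalanced.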

In [1]:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
import time


In [3]:
               
data = pd.read_csv('titanic.csv')

data.drop(['Cabin', 'Ticket', 'Name', 'PassengerId'], axis=1, inplace=True)

# Impute missing values by assignment. Calling fillna(..., inplace=True) on a
# selected column triggers a pandas FutureWarning and, per that warning, will
# stop modifying `data` in pandas 3.0.
data['Age'] = data['Age'].fillna(data['Age'].median())
data['Embarked'] = data['Embarked'].fillna(data['Embarked'].mode()[0])

# Encode the two categorical columns as integers.
le = LabelEncoder()
data['Sex'] = le.fit_transform(data['Sex'])
data['Embarked'] = le.fit_transform(data['Embarked'])

X = data[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']]
y = data['Survived']

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [4]:

results = {}

# Decision Tree: time construction and fitting only, not prediction
start = time.perf_counter()
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
end = time.perf_counter()
y_pred_dt = dt.predict(X_test)
results['Decision Tree'] = {
    'accuracy': accuracy_score(y_test, y_pred_dt),
    'f1': f1_score(y_test, y_pred_dt),
    'time': end - start
}

# Random Forest: time construction and fitting only, not prediction
start = time.perf_counter()
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
end = time.perf_counter()
y_pred_rf = rf.predict(X_test)
results['Random Forest'] = {
    'accuracy': accuracy_score(y_test, y_pred_rf),
    'f1': f1_score(y_test, y_pred_rf),
    'time': end - start
}

# AdaBoost: time construction and fitting only, not prediction
start = time.perf_counter()
ab = AdaBoostClassifier(random_state=42)
ab.fit(X_train, y_train)
end = time.perf_counter()
y_pred_ab = ab.predict(X_test)
results['AdaBoost'] = {
    'accuracy': accuracy_score(y_test, y_pred_ab),
    'f1': f1_score(y_test, y_pred_ab),
    'time': end - start
}
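The task also allows GradientBoostingClassifier, which is imported above but not trained. One possible way to add it without repeating the timing code is a small helper. This is a sketch under assumptions: the helper name `evaluate` is hypothetical, and synthetic data from `make_classification` stands in for the Titanic split so the block runs on its own.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit `model`, time only the fit call, and score it on the test split."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    y_pred = model.predict(X_test)
    return {
        'accuracy': accuracy_score(y_test, y_pred),
        'f1': f1_score(y_test, y_pred),
        'time': elapsed,
    }


# Synthetic stand-in data (assumption): 7 features, binary target.
X_demo, y_demo = make_classification(n_samples=500, n_features=7, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

gb_result = evaluate(
    GradientBoostingClassifier(random_state=42), Xd_train, yd_train, Xd_test, yd_test
)
print(sorted(gb_result))
```

In this notebook the same helper could be called with the existing split, for example `results['Gradient Boosting'] = evaluate(GradientBoostingClassifier(random_state=42), X_train, y_train, X_test, y_test)`. No Gradient Boosting scores are reported here because that model was not run in the original cells.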


In [5]:
print("Model Comparison:")
for model in results:
    print(f"\n{model}:")
    print(f"Accuracy: {results[model]['accuracy']:.4f}")
    print(f"F1-Score: {results[model]['f1']:.4f}")
    print(f"Training Time: {results[model]['time']:.4f} seconds")


Model Comparison:

Decision Tree:
Accuracy: 0.7821
F1-Score: 0.7451
Training Time: 0.0163 seconds

Random Forest:
Accuracy: 0.8212
F1-Score: 0.7746
Training Time: 0.6281 seconds

AdaBoost:
Accuracy: 0.7989
F1-Score: 0.7429
Training Time: 0.2598 seconds
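
To make the comparison explicit, the printed scores can be ranked per metric. The sketch below copies the numbers from the output above into a dictionary; the names `reported` and `rank` are illustrative, not part of the original notebook.

```python
# Scores copied from the printed model comparison.
reported = {
    'Decision Tree': {'accuracy': 0.7821, 'f1': 0.7451, 'time': 0.0163},
    'Random Forest': {'accuracy': 0.8212, 'f1': 0.7746, 'time': 0.6281},
    'AdaBoost': {'accuracy': 0.7989, 'f1': 0.7429, 'time': 0.2598},
}


def rank(scores, metric, reverse=True):
    """Return model names ordered by `metric` (highest first by default)."""
    return sorted(scores, key=lambda name: scores[name][metric], reverse=reverse)


by_accuracy = rank(reported, 'accuracy')
by_f1 = rank(reported, 'f1')
by_time = rank(reported, 'time', reverse=False)  # fastest first

print(f"{'Model':<15}{'Accuracy':>10}{'F1-Score':>10}{'Time (s)':>10}")
for name in by_accuracy:
    row = reported[name]
    print(f"{name:<15}{row['accuracy']:>10.4f}{row['f1']:>10.4f}{row['time']:>10.4f}")
```

On this single 80/20 split, Random Forest scores highest on both accuracy and F1 but is the slowest to train. AdaBoost beats the Decision Tree on accuracy yet trails it slightly on F1, and the Decision Tree trains fastest. These gaps come from one split and one timing run, so they should be read as indicative rather than conclusive.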
