Scenario Question: Predicting Titanic Survival
Researchers are studying the Titanic disaster and want to build models that predict whether a passenger survived, based on the information recorded about them.
- Features used:
  - Passenger class (pclass)
  - Gender (sex)
  - Age (age)
  - Number of siblings/spouses aboard (sibsp)
  - Number of parents/children aboard (parch)
  - Ticket fare (fare)
- Label:
  - 1 = Survived
  - 0 = Died
The researchers train three different models:
- Logistic Regression
- K-Nearest Neighbors (KNN) with k=5
- Decision Tree with max depth = 4
They then evaluate each model using a classification report (precision, recall, F1-score, accuracy).
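The three model families expose their reasoning very differently, which bears on the interpretability and historian questions. A minimal sketch, assuming you pass in the same feature matrix and labels used in the notebook: logistic regression yields one signed weight per feature, while a depth-4 tree yields readable threshold rules via scikit-learn's `export_text`. The helper `explain_models` is hypothetical, and the demo runs on synthetic stand-in data (not the Titanic data) where survival is defined as "female or younger than 10".

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text


def explain_models(X, y, feature_names, max_depth=4):
    """Hypothetical helper: fit both models and return what each exposes."""
    # Logistic regression: one signed weight per feature on the log-odds scale.
    # A positive weight pushes predictions toward label 1 (survived).
    logreg = LogisticRegression(max_iter=1000).fit(X, y)
    weights = dict(zip(feature_names, logreg.coef_[0]))

    # Decision tree: explicit if/else threshold rules, readable top to bottom.
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=42).fit(X, y)
    rules = export_text(tree, feature_names=list(feature_names))
    return weights, rules


# Synthetic stand-in data (NOT the Titanic data): label 1 = female or age < 10.
rng = np.random.default_rng(0)
n = 400
sex = rng.integers(0, 2, n)        # 0 = male, 1 = female
age = rng.uniform(1, 70, n)
pclass = rng.integers(1, 4, n)     # 1, 2 or 3
X_demo = np.column_stack([pclass, sex, age])
y_demo = ((sex == 1) | (age < 10)).astype(int)

weights, rules = explain_models(X_demo, y_demo, ["pclass", "sex", "age"])
for name, w in weights.items():
    print(f"{name:>6}: {w:+.3f}")
print(rules)
```

The weights are only comparable in magnitude once features share a scale (one year of age is not "worth" the same as switching sex), whereas the tree rules can be read directly, e.g. a split such as `sex <= 0.50`.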

❓ Questions for Learners
- Which model performs best at predicting survival, and why?
- How does Logistic Regression differ from Decision Tree in terms of interpretability?
- Why should features be scaled before training Logistic Regression and KNN, when scaling is not strictly needed for Decision Trees?
- Looking at the classification report, what do precision and recall mean in the context of survival predictions?
  - Precision → Of the passengers predicted to survive, how many actually survived?
  - Recall → Of all passengers who actually survived, how many were correctly predicted to survive?
- If you were a historian, which model would you trust more to explain survival patterns, and why?
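The precision and recall definitions above can be checked by hand. A small worked example on ten hypothetical passengers (toy labels, not taken from the Titanic data) counts true positives, false positives and false negatives for the "survived" class, applies the formulas, and compares the results with scikit-learn's `precision_score`, `recall_score` and `f1_score`:

```python
import math

from sklearn.metrics import f1_score, precision_score, recall_score

# Toy labels for ten hypothetical passengers (1 = survived, 0 = died).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted survive, did survive
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted survive, actually died
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted died, actually survived

precision = tp / (tp + fp)                          # 2 / 3
recall = tp / (tp + fn)                             # 2 / 4
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean = 4 / 7

# The hand-computed values agree with scikit-learn's (positive label = 1).
assert math.isclose(precision, precision_score(y_true, y_pred))
assert math.isclose(recall, recall_score(y_true, y_pred))
assert math.isclose(f1, f1_score(y_true, y_pred))

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# → precision=0.667 recall=0.500 f1=0.571
```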

import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression      # logistic regression classifier
from sklearn.neighbors import KNeighborsClassifier       # k-nearest neighbors classifier
from sklearn.tree import DecisionTreeClassifier          # decision tree classifier
from sklearn.metrics import classification_report, f1_score, precision_score, recall_score, accuracy_score

df = sns.load_dataset('titanic')

df['age'] = df['age'].fillna(df['age'].median())      # impute missing ages with the median
df['sex'] = df['sex'].map({'male': 0, 'female': 1})   # encode sex as 0/1
X = df[['pclass', 'sex', 'age', 'sibsp', 'parch', 'fare']]
y = df['survived']

# SPLIT the data into 80% training and 20% test (fixed seed so the split is reproducible)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

f1_scores = {}
accuracy_scores = {}
recall_scores = {}
precision_scores = {}

# LOGISTIC REGRESSION
logreg = LogisticRegression()
logreg.fit(X_train, y_train)

y_pred = logreg.predict(X_test)

print("LOGISTIC REGRESSION REPORT\n")
print(classification_report(y_test, y_pred))
f1_scores['Logistic'] = f1_score(y_test, y_pred)
accuracy_scores['Logistic'] = accuracy_score(y_test, y_pred)
precision_scores['Logistic'] = precision_score(y_test, y_pred)
recall_scores['Logistic'] = recall_score(y_test, y_pred)

# K-NEAREST NEIGHBORS (k = 5)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)

print("KNN REPORT\n")
print(classification_report(y_test, y_pred))
f1_scores['KNN'] = f1_score(y_test, y_pred)
accuracy_scores['KNN'] = accuracy_score(y_test, y_pred)
precision_scores['KNN'] = precision_score(y_test, y_pred)
recall_scores['KNN'] = recall_score(y_test, y_pred)

# DECISION TREE (max depth = 4)

# random_state fixes how ties between equally good splits are broken, so the tree is reproducible
tree = DecisionTreeClassifier(
    max_depth=4, criterion='gini', random_state=42
)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)

print("DECISION TREE REPORT\n")
print(classification_report(y_test, y_pred))
f1_scores['Decision Tree'] = f1_score(y_test, y_pred)
accuracy_scores['Decision Tree'] = accuracy_score(y_test, y_pred)
precision_scores['Decision Tree'] = precision_score(y_test, y_pred)
recall_scores['Decision Tree'] = recall_score(y_test, y_pred)

# COMPARISON: which model scores highest on each metric (class 1 = survived)

best_model_f1_score = max(f1_scores, key=f1_scores.get)
best_model_accuracy_score = max(accuracy_scores, key=accuracy_scores.get)
best_model_precision_score = max(precision_scores, key=precision_scores.get)
best_model_recall_score = max(recall_scores, key=recall_scores.get)

print("Best Model based on F1 Score:", best_model_f1_score)
print("Best Model based on Accuracy Score:", best_model_accuracy_score)
print("Best Model based on Precision Score:", best_model_precision_score)
print("Best Model based on Recall Score:", best_model_recall_score)
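The cell above trains Logistic Regression and KNN on unscaled features, and it ranks the models on a single 179-passenger test split. KNN uses Euclidean distances, so a feature like fare, which spans hundreds of units, can dominate the 0/1 sex column. The sketch below is one possible fairer comparison, not the researchers' method: it puts `StandardScaler` inside a `make_pipeline` for Logistic Regression and KNN (so the scaler is fit on training folds only), leaves the tree unscaled, and reports mean and standard deviation of F1 over 5-fold stratified cross-validation. The helper `compare_models` is hypothetical, and the demo uses synthetic `make_classification` data with one feature inflated by 1000×, not the Titanic data; in the notebook you would call `compare_models(X, y)` instead.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, n_splits=5, seed=42):
    """Hypothetical helper: {model name: (mean F1, std F1)} from stratified k-fold CV."""
    models = {
        # The scaler lives inside the pipeline, so it is fit on each training
        # fold only and never sees the fold it is evaluated on.
        "Logistic": make_pipeline(StandardScaler(), LogisticRegression()),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        # A tree thresholds one feature at a time; rescaling a feature moves the
        # threshold value but not which rows fall on each side, so no scaler.
        "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=seed),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
        results[name] = (scores.mean(), scores.std())
    return results


# Synthetic stand-in data (NOT the Titanic data); column 5 is inflated to a
# much larger range, mimicking how fare dwarfs small-range features.
X_demo, y_demo = make_classification(n_samples=500, n_features=6, random_state=0)
X_demo[:, 5] *= 1000

results = compare_models(X_demo, y_demo)
for name, (mean, std) in results.items():
    print(f"{name:>13}: F1 = {mean:.3f} ± {std:.3f}")
```

Reporting the spread across folds makes it easier to tell whether a gap such as 0.76 vs 0.74 F1 between two models is meaningful or within the noise of a single split.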



LOGISTIC REGRESSION REPORT

              precision    recall  f1-score   support

           0       0.81      0.88      0.84       105
           1       0.80      0.72      0.76        74

    accuracy                           0.81       179
   macro avg       0.81      0.80      0.80       179
weighted avg       0.81      0.81      0.81       179

KNN REPORT

              precision    recall  f1-score   support

           0       0.71      0.83      0.77       105
           1       0.68      0.53      0.60        74

    accuracy                           0.70       179
   macro avg       0.70      0.68      0.68       179
weighted avg       0.70      0.70      0.70       179

DECISION TREE REPORT

              precision    recall  f1-score   support

           0       0.80      0.88      0.84       105
           1       0.80      0.69      0.74        74

    accuracy                           0.80       179
   macro avg       0.80      0.78      0.79       179
weighted avg