# Titanic Survival Prediction Using Multiple Classifiers

This notebook demonstrates how to build and evaluate two different classifiers 
on the Titanic dataset to predict passenger survival:

1. **SVM** 
2. **Multilayer Perceptron (MLP)**

We will compare their performances and record observations.

In [3]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder
import warnings
warnings.filterwarnings('ignore')


## 2. Loading the Dataset

In [4]:
df = pd.read_csv(r"C:\Users\sonam\Downloads\titanic.csv")
df.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


## 3. Preprocessing

In [5]:
# Check missing values
df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

In [6]:
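# Fill missing 'Age' with the median.
# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which newer pandas versions warn about and may not apply to df.
df['Age'] = df['Age'].fillna(df['Age'].median())

# Fill missing 'Embarked' with the most frequent value
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])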

# Drop columns that are not essential or have too many NaNs
df.drop(['PassengerId', 'Name', 'Ticket', 'Cabin'], axis=1, inplace=True)

# Convert categorical columns into numeric
le_sex = LabelEncoder()
df['Sex'] = le_sex.fit_transform(df['Sex'])  # female -> 0, male -> 1 (classes are sorted alphabetically)

le_embarked = LabelEncoder()
df['Embarked'] = le_embarked.fit_transform(df['Embarked'])  # C -> 0, Q -> 1, S -> 2 (alphabetical order)

df.head()


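| | Survived | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 1 | 22.0 | 1 | 0 | 7.25 | 2 |
| 1 | 1 | 1 | 0 | 38.0 | 1 | 0 | 71.2833 | 0 |
| 2 | 1 | 3 | 0 | 26.0 | 0 | 0 | 7.925 | 2 |
| 3 | 1 | 1 | 0 | 35.0 | 1 | 0 | 53.1 | 2 |
| 4 | 0 | 3 | 1 | 35.0 | 0 | 0 | 8.05 | 2 |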
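`LabelEncoder` maps the three ports to 0, 1, 2, which implies an ordering that the ports do not have. One common alternative is one-hot encoding. The sketch below uses a small toy frame (the rows are illustrative, not the full dataset) and `pd.get_dummies`; whether it helps on this dataset is not tested here.

```python
import pandas as pd

# Toy frame with the same 'Embarked' codes as the Titanic data (illustrative rows only).
toy = pd.DataFrame({
    "Embarked": ["S", "C", "S", "Q"],
    "Fare": [7.25, 71.2833, 7.925, 8.05],
})

# One indicator column per port; dtype=int yields 0/1 instead of booleans.
encoded = pd.get_dummies(toy, columns=["Embarked"], prefix="Embarked", dtype=int)

print(encoded.columns.tolist())
```

Each port becomes its own 0/1 column, so distance-based models such as the SVM no longer treat S as "further" from C than Q is.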


## Separate features and target

In [7]:
X = df.drop('Survived', axis=1)
y = df['Survived']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2, 
    random_state=42
)

X_train.shape, X_test.shape

((712, 7), (179, 7))
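Note that the `Age` median in the preprocessing step was computed on the full dataset before this split, so the test rows contributed to the imputed value. A stricter variant fits the imputer on the training rows only. The following is a minimal sketch on a toy frame (the values and the names `toy`, `toy_train`, `toy_test` are made up for illustration), using scikit-learn's `SimpleImputer`:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Toy data with missing ages (values are illustrative, not from the notebook).
toy = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan, 54.0, 2.0, 27.0, 14.0, 4.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 21.07, 11.13, 30.07, 16.70],
})
toy_train, toy_test = train_test_split(toy, test_size=0.2, random_state=42)

# Learn the medians from the training rows only, then reuse them on the test rows.
imputer = SimpleImputer(strategy="median")
train_filled = pd.DataFrame(
    imputer.fit_transform(toy_train), columns=toy.columns, index=toy_train.index
)
test_filled = pd.DataFrame(
    imputer.transform(toy_test), columns=toy.columns, index=toy_test.index
)

print(train_filled.isnull().sum().sum(), test_filled.isnull().sum().sum())
```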

## 4. Training Models and Evaluating Performance

1. **SVM (Support Vector Classifier)**
2. **Multilayer Perceptron**



In [8]:
svm_model = SVC()
svm_model.fit(X_train, y_train)

y_pred_svm = svm_model.predict(X_test)
accuracy_svm = accuracy_score(y_test, y_pred_svm)

print("SVM Accuracy: {:.4f}".format(accuracy_svm))
print("Classification Report:\n", classification_report(y_test, y_pred_svm))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_svm))


SVM Accuracy: 0.6592
Classification Report:
               precision    recall  f1-score   support

           0       0.64      0.94      0.76       105
           1       0.76      0.26      0.38        74

    accuracy                           0.66       179
   macro avg       0.70      0.60      0.57       179
weighted avg       0.69      0.66      0.61       179

Confusion Matrix:
 [[99  6]
 [55 19]]
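The SVM above was trained on unscaled features, and the default RBF kernel is sensitive to feature scale: a column with a large range such as `Fare` can dominate the distance computation. A plausible remedy is to standardize the features inside a pipeline. The sketch below runs on synthetic data from `make_classification` as a stand-in for the Titanic features (the `*_demo` and `Xd_*` names are assumptions for illustration), so its score says nothing about the accuracy this would reach on the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with 7 feature columns, like X above.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=7, n_informative=4, random_state=42
)
# Stretch one column so its scale dwarfs the others, as Fare does relative to Sex or Pclass.
X_demo[:, 0] *= 100.0

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The scaler is fitted on the training fold only, then applied to the test fold.
scaled_svm = make_pipeline(StandardScaler(), SVC())
scaled_svm.fit(Xd_train, yd_train)

demo_score = scaled_svm.score(Xd_test, yd_test)
print("Scaled SVM accuracy on synthetic data: {:.4f}".format(demo_score))
```

On the notebook's data, the same pipeline could replace `SVC()` directly, for example `make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)`.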


In [9]:
mlp_model = MLPClassifier(max_iter=500, random_state=42)
mlp_model.fit(X_train, y_train)

y_pred_mlp = mlp_model.predict(X_test)
accuracy_mlp = accuracy_score(y_test, y_pred_mlp)

print("MLP Accuracy: {:.4f}".format(accuracy_mlp))
print("Classification Report:\n", classification_report(y_test, y_pred_mlp))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_mlp))


MLP Accuracy: 0.7654
Classification Report:
               precision    recall  f1-score   support

           0       0.80      0.80      0.80       105
           1       0.72      0.72      0.72        74

    accuracy                           0.77       179
   macro avg       0.76      0.76      0.76       179
weighted avg       0.77      0.77      0.77       179

Confusion Matrix:
 [[84 21]
 [21 53]]
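The figures in the classification report follow directly from the confusion matrix. As a check, the short sketch below recomputes accuracy and the class 1 (survived) precision and recall from the MLP matrix printed above:

```python
import numpy as np

# Confusion matrix printed above for the MLP: rows are true labels, columns are predictions.
cm = np.array([[84, 21],
               [21, 53]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # (53 + 84) / 179
precision_1 = tp / (tp + fp)      # 53 / (53 + 21)
recall_1 = tp / (tp + fn)         # 53 / (53 + 21)

print(f"accuracy={accuracy:.4f} precision_1={precision_1:.2f} recall_1={recall_1:.2f}")
# prints: accuracy=0.7654 precision_1=0.72 recall_1=0.72
```

Applying the same arithmetic to the SVM matrix gives a class 1 recall of 19 / (55 + 19) ≈ 0.26, which is why its accuracy is low even though it identifies non-survivors well.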


## 5. Comparing Results

In [10]:
# Compare the two accuracy scores
model_results = {
    'SVM': accuracy_svm,
    'MLP': accuracy_mlp,
}

results_df = pd.DataFrame.from_dict(model_results, orient='index', columns=['Accuracy'])
results_df.sort_values(by='Accuracy', ascending=False, inplace=True)
results_df



| | Accuracy |
|---|---|
| MLP | 0.765363 |
| SVM | 0.659218 |
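These accuracies come from a single 179-row test split, so the gap between the models may shift with a different `random_state`. Cross-validation gives a steadier comparison. The sketch below shows the mechanics with `cross_val_score` on synthetic data (the `X_demo`, `y_demo`, and `cv_scores` names are assumptions for illustration); the printed numbers are not Titanic results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the Titanic features and target.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=7, n_informative=4, random_state=42
)

# Stratified folds keep the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
models = {
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=500, random_state=42),
}

cv_scores = {
    name: cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
    for name, model in models.items()
}
for name, scores in cv_scores.items():
    print(f"{name}: mean={scores.mean():.4f} std={scores.std():.4f}")
```

Reporting the mean together with the standard deviation makes it easier to judge whether a difference between two models is larger than the fold-to-fold noise.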


## 6. Final Observations and Conclusions

- **Multilayer Perceptron (MLP)** (**0.765363**) outperformed the SVM by roughly 10 percentage points, with balanced precision and recall (0.72 each) on the survivor class. Further hyperparameter tuning (e.g., adjusting hidden layers, learning rate, or regularization) could improve its performance.  
- **SVM** recorded the lower accuracy (**0.659218**) and recalled only 26% of survivors, indicating that its default configuration, trained here on unscaled features, is not well-suited to this dataset. SVMs often require feature scaling and careful tuning of parameters (such as `C`, kernel choice, and `gamma`) to reach competitive results.
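
The tuning mentioned above can be sketched with `GridSearchCV`. The grid values below are arbitrary starting points rather than recommendations, and the data is synthetic (`X_demo`, `y_demo` are stand-ins for `X_train`, `y_train`), so the selected parameters will differ on the Titanic features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the training data.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=7, n_informative=4, random_state=42
)

# Scaling lives inside the pipeline so it is refitted on each training fold.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Illustrative grid over the parameters named above; "svc__" targets the SVC step.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
    "svc__kernel": ["rbf", "linear"],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)

print(search.best_params_)
print("Best cross-validated accuracy: {:.4f}".format(search.best_score_))
```

After fitting, `search.best_estimator_` is a pipeline refitted on all the data passed to `fit`, and it can be evaluated on the held-out test set in the same way as the models above.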

