# Titanic Survival Prediction
### 📌 Project Type: Classification
**Goal**: Predict whether a passenger survived the Titanic disaster based on features like age, sex, class, etc.

---

## 📚 Importing Libraries
We use pandas for data manipulation, NumPy for numerical operations, seaborn and matplotlib for visualization, and scikit-learn for machine learning.


In [2]:
#importing libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

---

## 🗂️ Loading the Dataset
Let's load the training dataset and understand its structure.

In [4]:
# Load the data and inspect its structure
df = pd.read_csv('train.csv')
print(df.head())
print("\n")
print(df.describe())
print("\n")
df.info()  # info() prints its summary directly and returns None

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  




---

## 🧹 Data Cleaning
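We drop the columns this baseline does not use (PassengerId, Name, Ticket, and Cabin) and fill the remaining missing values.

- **PassengerId** is a row identifier and carries no information about survival.
- **Cabin** has too many missing values.
- **Name** and **Ticket** are free-form text and are not used by this simple model.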
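
The point about **Cabin** can be checked by measuring the share of missing values in each column. The sketch below uses a small hand-made frame (`toy`, with invented values) in place of `train.csv`; on the real data the same idea would be `df.isnull().mean()`, run before the columns are dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv: the values are invented for illustration.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", "S", np.nan],
})

# isnull() marks missing cells; the mean of those booleans is the
# fraction of missing values per column.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

A column whose share is close to 1 is a candidate for dropping, while a column with only a few gaps is usually easier to fill.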


In [6]:
# Drop columns that are not used for modeling
df.drop(['PassengerId', 'Name', 'Ticket', 'Cabin'], axis=1, inplace=True)

> **Interpretation**:
- `Sex` is stored as a string; we convert it to integers for modeling.
- `Embarked` is also stored as a string, so we convert it too.

In [8]:
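# Encoding categorical features
df['Sex'] = df['Sex'].map({'male':0, 'female':1})
# Fill missing ports with 'S' (Southampton), the most common value
df['Embarked'] = df['Embarked'].fillna('S')
# Embarked only takes the values S, C and Q, so the 'B' key (code 2) is never used
df['Embarked'] = df['Embarked'].map({'S':0, 'C':1, 'B':2, 'Q':3})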
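
Mapping `Embarked` to integers implies an ordering between ports, and a linear model will treat that ordering as meaningful. One common alternative is one-hot encoding. The sketch below applies `pd.get_dummies` to a hypothetical toy frame; it is a variation to consider, not the encoding this notebook uses.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real passengers.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})

# One indicator column per port; drop_first removes the first category (C)
# so the remaining columns are not perfectly collinear.
encoded = pd.get_dummies(toy, columns=["Embarked"], drop_first=True, dtype=int)

# Sex has only two values, so a single 0/1 column is enough.
encoded["Sex"] = encoded["Sex"].map({"male": 0, "female": 1})
print(encoded)
```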

In [9]:
#filling missing values
df['Age'] = df['Age'].fillna(df['Age'].median())

# Double check
print(df.isnull().sum())  # should show 0 everywhere

Survived    0
Pclass      0
Sex         0
Age         0
SibSp       0
Parch       0
Fare        0
Embarked    0
dtype: int64
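
The `Age` median above is computed over all rows before the data is split, so rows that later land in the test set contribute to the fill value. The effect is probably small here, but a stricter variant learns the median from the training rows only. A minimal sketch with scikit-learn's `SimpleImputer`, using hypothetical toy values rather than the real dataset:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical toy data: the values are invented for illustration.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan, 54.0, 2.0, 27.0],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05, 51.86, 21.08, 11.13],
})
X_train, X_test = train_test_split(toy, test_size=0.25, random_state=42)

imputer = SimpleImputer(strategy="median")
# fit_transform learns the per-column medians from the training rows only.
X_train_filled = imputer.fit_transform(X_train)
# transform reuses those training medians for the test rows.
X_test_filled = imputer.transform(X_test)

print(imputer.statistics_)
```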


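---

## ✂️ Train/Test Split
We separate the target `Survived` from the features and hold out 20% of the rows for testing.

In [12]: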
X = df.drop('Survived', axis=1)
y = df['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
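
The split above is purely random, so the share of survivors can differ slightly between the training and test sets. Passing `stratify=y` to `train_test_split` keeps the class ratio the same in both. The sketch below shows this on hypothetical labels (`X_toy` and `y_toy` are invented); it is a variation, not what the cell above does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 60 non-survivors (0) and 40 survivors (1).
y_toy = np.array([0] * 60 + [1] * 40)
X_toy = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# With stratification both parts keep the 40% survivor share.
print(y_tr.mean(), y_te.mean())
```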

---

## 🤖 Model Training
We fit a logistic regression classifier on the training set.

In [13]:
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
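
`max_iter=200` raises the solver's iteration limit above scikit-learn's default of 100, which can help when features such as `Age` and `Fare` sit on very different scales. Another option is to standardize the features before fitting. The sketch below chains `StandardScaler` and `LogisticRegression` with `make_pipeline` on synthetic data; the features and the labeling rule are invented for illustration and are not the Titanic data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical features on very different scales, loosely mimicking Age and Fare.
age = rng.uniform(1, 80, size=200)
fare = rng.uniform(5, 500, size=200)
X_toy = np.column_stack([age, fare])
y_toy = (fare > 100).astype(int)  # invented labeling rule, for illustration only

# The scaler is fit inside the pipeline, so it only sees the data passed to fit().
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
clf.fit(X_toy, y_toy)
print(clf.score(X_toy, y_toy))
```

Because the scaler is part of the pipeline, calling `clf.predict` on new rows applies the same scaling automatically.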

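---

## 📊 Evaluation
We predict on the held-out test set and report accuracy, the confusion matrix, and per-class precision and recall.

In [14]: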
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Accuracy: 0.7988826815642458

Confusion Matrix:
 [[89 16]
 [20 54]]

Classification Report:
               precision    recall  f1-score   support

           0       0.82      0.85      0.83       105
           1       0.77      0.73      0.75        74

    accuracy                           0.80       179
   macro avg       0.79      0.79      0.79       179
weighted avg       0.80      0.80      0.80       179
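
The figures in the report can be recomputed by hand from the confusion matrix. In scikit-learn's layout the rows are the actual classes and the columns are the predicted classes, so the four counts below are copied directly from the output above. The variable names are just labels chosen for this sketch.

```python
# Counts copied from the confusion matrix (rows = actual, columns = predicted).
tn, fp = 89, 16   # actual 0: predicted 0, predicted 1
fn, tp = 20, 54   # actual 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision_1 = tp / (tp + fp)                 # of those predicted to survive, how many did
recall_1 = tp / (tp + fn)                    # of actual survivors, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(round(accuracy, 4), round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))
```

The rounded values agree with the accuracy line and with the row for class 1 in the classification report.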

