# 🚢 Titanic Survival Prediction  

This project is based on the Kaggle **Titanic: Machine Learning from Disaster** challenge.  
The goal is to build a model that predicts whether a passenger survived the Titanic shipwreck, using features such as **Age, Sex, Passenger Class, and Family Size**.  

### 🔎 Key Highlights
- Data preprocessing and handling missing values (`Age`, `Embarked`, `Cabin`).  
- Encoding categorical variables (`Sex`, `Embarked`, `Pclass`).  
- Exploratory Data Analysis (EDA) to find survival patterns.  
- Training a **Logistic Regression classifier**.  
- Evaluating the model using accuracy, confusion matrix, and classification report.  

### 🎯 Objective
To develop a baseline machine learning pipeline that predicts passenger survival accurately and generalizes to passengers it was not trained on.
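
As a rough illustration of what such a baseline could look like end to end, the sketch below wires imputation, one-hot encoding, min-max scaling, and logistic regression into a single scikit-learn `Pipeline` and reports the three metrics listed above. It runs on synthetic stand-in data, not the Kaggle file; the `toy` frame, the column subset, and the 75/25 split are all assumptions made for the example, so the printed numbers say nothing about real Titanic performance.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (NOT the Kaggle file): random passengers whose
# survival depends loosely on sex and class, only so the sketch can run.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Sex": rng.choice(["male", "female"], size=n),
    "Pclass": rng.choice([1, 2, 3], size=n),
    "Age": rng.uniform(1, 70, size=n),
    "Fare": rng.uniform(5, 100, size=n),
})
toy.loc[::10, "Age"] = np.nan  # inject some missing ages
score = (toy["Sex"] == "female") * 2.0 - 0.5 * toy["Pclass"] + rng.normal(0, 1, n)
toy["Survived"] = (score > score.median()).astype(int)

X = toy.drop(columns="Survived")
y = toy["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing lives inside the pipeline, so it is fitted on the
# training split only and then applied unchanged to the test split.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(drop="first"), ["Sex", "Pclass"]),
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", MinMaxScaler()),
    ]), ["Age", "Fare"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"accuracy: {accuracy:.3f}")
print(cm)
print(classification_report(y_test, y_pred))
```

Keeping the preprocessing steps inside the pipeline is one design choice among several; its main benefit is that statistics such as the mean age or the fare range are never computed from held-out rows.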


## Loading the Data

In [1]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 


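# Load the Kaggle training set
df = pd.read_csv('titanic/train.csv')

# PassengerId is only a row identifier, so drop it
df = df.drop("PassengerId", axis=1)

# Preview the first three rows
df.head(3)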

|   | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
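
Since missing values in `Age`, `Embarked`, and `Cabin` drive much of the preprocessing, it can help to measure how much of each column is missing before deciding whether to impute or drop it. The sketch below does this on a small made-up frame; the `toy` values and the 0.75 cutoff are assumptions for illustration, not figures taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, 54.0],
    "Embarked": ["S", "C", None, "S", "S"],
    "Cabin": [None, "C85", None, None, None],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# Fraction of missing values per column, largest first
null_fraction = toy.isnull().mean().sort_values(ascending=False)
print(null_fraction)

# Columns whose missing fraction exceeds the chosen cutoff
threshold = 0.75  # illustrative value
mostly_empty = null_fraction[null_fraction > threshold].index.tolist()
print(mostly_empty)
```

On the real data, the same two lines can be run against `df` directly to see which columns are candidates for dropping and which only need a few values filled in.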


## Data Preprocessing and Analysis

In [None]:
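# Data preprocessing stage

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Handle missing values before encoding, so that a missing Embarked value
# is imputed instead of silently becoming an all-zero dummy row
for col in df.columns:
    perc_of_nulls = df[col].isnull().mean()
    if perc_of_nulls > 0.75:
        # Drop columns with more than 75% missing values
        df.drop(col, axis=1, inplace=True)
    elif df[col].dtype == 'object':
        # Text column: fill with the most frequent value
        df[col] = df[col].fillna(df[col].mode()[0])
    else:
        # Numeric column: fill with the mean
        df[col] = df[col].fillna(df[col].mean())

# Select categorical features
cat_features = ["Sex", "Embarked", "Pclass"]

# One-hot encoding. Pclass is stored as integers, so it must be listed in
# `columns=`; otherwise get_dummies leaves it unencoded and the drop below
# removes it from df entirely
nominal_dummies = pd.get_dummies(df[cat_features], columns=cat_features, drop_first=True)
df = pd.concat([df, nominal_dummies], axis=1)
df.drop(cat_features, axis=1, inplace=True)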

# Remove duplicate values
df = df.drop_duplicates()

# Scale Fare
scaler = MinMaxScaler()
df["Fare"] = scaler.fit_transform(df[["Fare"]])
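
One detail worth checking in the encoding step: by default `pd.get_dummies` converts only object, string, or category columns, so an integer-coded categorical such as `Pclass` is passed through unchanged unless it is named in `columns=`. A minimal sketch on a made-up frame (the `toy` values are assumptions, not rows from the dataset):

```python
import pandas as pd

# Hypothetical toy frame with one string column and one integer-coded category
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Pclass": [3, 1, 3, 2],
})

# Default behaviour: only the string column is encoded, Pclass stays numeric
default_dummies = pd.get_dummies(toy, drop_first=True)
print(default_dummies.columns.tolist())

# Naming the columns explicitly encodes the integer column as well
explicit_dummies = pd.get_dummies(toy, columns=["Sex", "Pclass"], drop_first=True)
print(explicit_dummies.columns.tolist())
```

Comparing the two column lists makes it easy to confirm that every feature intended as categorical actually ended up as dummy columns.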




