🚢 Titanic Survival Prediction — AI Project

Predict which passengers survived the Titanic shipwreck using machine learning — a classic binary classification challenge from Kaggle.

Academic project (2025–2026) supervised by Prof. N. ABOUTABIT

📖 Table of Contents

Overview
Dataset
Project Structure
Features & Preprocessing
Models & Validation
Results
Installation
Usage
Reports
Contributing
License
Acknowledgements

🧠 Overview

This project tackles the Kaggle Titanic competition, one of the most popular introductory machine learning challenges. The goal is to build a predictive model that answers:

"What sorts of people were more likely to survive the Titanic disaster?"

The repository implements a complete ML pipeline:

Exploratory Data Analysis (EDA)
Missing value handling & preprocessing
Feature engineering
Model training + evaluation
Prediction generation and submission export

📊 Dataset

The dataset is provided by Kaggle and is split into a training set of 891 passengers (with survival labels) and a test set of 418 passengers (without labels).

File	Description
`data/train.csv`	Training data with survival labels
`data/test.csv`	Test data for prediction submission
`data/gender_submission.csv`	Sample submission file

Key Features

Feature	Description
`Survived`	Target variable (0 = No, 1 = Yes)
`Pclass`	Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
`Name`	Passenger name
`Sex`	Sex (male / female)
`Age`	Age in years
`SibSp`	Number of siblings/spouses aboard
`Parch`	Number of parents/children aboard
`Ticket`	Ticket number
`Fare`	Passenger fare
`Cabin`	Cabin number
`Embarked`	Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)

📁 Project Structure

This reflects the current repository layout.

Titanic_Machine_Learning/
├── data/
│   ├── train.csv
│   ├── test.csv
│   └── gender_submission.csv
├── src/
│   ├── data/              # loading / preprocessing modules
│   ├── features/          # feature engineering
│   ├── models/            # training / evaluation / optimization
│   ├── utils/             # config + helpers
│   └── visualization/     # EDA plots
├── main.py                # runs the full pipeline end-to-end
├── REPORT.md              # detailed report (markdown)
├── DELIVERABLES.md        # checklist of deliverables
├── Rapport_Project_AI.pdf # full technical report (PDF)
└── requirements.txt

⚙️ Features & Preprocessing

Key engineering steps used in the pipeline include:

Missing value imputation
- Age: imputation strategy based on passenger groups (e.g., by class/sex)
- Embarked: mode imputation
- Fare: median imputation (mainly affects test set)
Feature engineering
- FamilySize = SibSp + Parch + 1
- IsAlone flag
- Title extracted from Name (Mr, Mrs, Miss, Master, Rare, ...)
- HasCabin (Cabin present vs missing)
Encoding
- One-hot encoding for categorical features (Sex, Embarked, Title, etc.)
Scaling
- StandardScaler applied before model training (see main.py)
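
The steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the code in src/: the function name `preprocess`, the `COMMON_TITLES` set, the use of the median for the group-based Age fill, and the toy rows are all assumptions made for the example.

```python
import pandas as pd

# Assumption: any title outside this set is grouped under "Rare".
COMMON_TITLES = {"Mr", "Mrs", "Miss", "Master"}


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative imputation, feature engineering, and encoding."""
    out = df.copy()

    # Age: fill gaps with the median age of passengers sharing Pclass and Sex,
    # then fall back to the overall median if a whole group has no ages.
    group_median = out.groupby(["Pclass", "Sex"])["Age"].transform("median")
    out["Age"] = out["Age"].fillna(group_median).fillna(out["Age"].median())

    # Embarked: most frequent port. Fare: median fare.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])
    out["Fare"] = out["Fare"].fillna(out["Fare"].median())

    # FamilySize counts the passenger plus relatives aboard.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)

    # Title is the token between the comma and the first period in Name.
    title = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    out["Title"] = title.where(title.isin(COMMON_TITLES), "Rare")

    # HasCabin is 1 when a cabin value is recorded, 0 when it is missing.
    out["HasCabin"] = out["Cabin"].notna().astype(int)

    # One-hot encode the categorical columns.
    return pd.get_dummies(out, columns=["Sex", "Embarked", "Title"])


# Toy rows in the Kaggle column layout, for illustration only.
sample = pd.DataFrame({
    "Pclass": [3, 1, 3, 3],
    "Name": ["Smith, Mr. John", "Jones, Mrs. Mary", "Brown, Miss. Anna", "Doe, Dr. James"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, None],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 0],
    "Fare": [7.25, 71.28, 7.92, None],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})
features = preprocess(sample)
```

Scaling is left out here on purpose: fitting a StandardScaler inside a scikit-learn pipeline, on the training split only, keeps validation statistics from leaking into training.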

🤖 Models & Validation

The project trains and evaluates classification models (baseline + optimized) and uses:

Train/Validation split (stratified)
Cross-validation
Metrics: Accuracy, Precision, Recall, F1-score, ROC-AUC

For the exact implementation details and metrics, see REPORT.md.
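
As a rough illustration of this validation setup, the sketch below uses scikit-learn on synthetic data. The logistic regression model, the 80/20 split, the 5 folds, and the random seeds are assumptions; this README does not name the models or settings actually used, so REPORT.md remains the reference.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered feature matrix (891 rows, like train.csv).
X, y = make_classification(n_samples=891, n_features=10, random_state=42)

# Stratified hold-out split keeps the class ratio similar in both parts.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline so it is refit on each training fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified k-fold cross-validation on the training part.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

# Final fit and evaluation on the held-out validation part.
model.fit(X_train, y_train)
pred = model.predict(X_val)
proba = model.predict_proba(X_val)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_val, pred),
    "precision": precision_score(y_val, pred),
    "recall": recall_score(y_val, pred),
    "f1": f1_score(y_val, pred),
    "roc_auc": roc_auc_score(y_val, proba),
}
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(metrics)
```

The numbers this prints describe the synthetic data only and say nothing about the project's real scores.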

📈 Results

Summary of local validation results (the Kaggle public score and rank have not been recorded yet):

Item	Value
Best local validation accuracy	~80–82% (varies by model/settings)
ROC-AUC	~0.84–0.86
Kaggle public score	Not yet recorded
Kaggle rank	Not yet recorded

🛠️ Installation

Prerequisites: Python 3.9+

# 1) Clone the repository
git clone https://github.com/msabr/Titanic_Machine_Learning.git
cd Titanic_Machine_Learning

# 2) (Recommended) create a virtual environment
python -m venv venv
source venv/bin/activate       # On Windows: venv\Scripts\activate

# 3) Install dependencies
pip install -r requirements.txt

Note: requirements.txt currently lists the core libraries: pandas, numpy, scikit-learn, matplotlib, and seaborn.

🚀 Usage

Run the full pipeline (recommended)

python main.py

This will run the pipeline end-to-end (EDA → preprocessing → training → evaluation → predictions).
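
The last stage, submission export, might look like the sketch below. Kaggle expects a CSV with `PassengerId` and `Survived` columns, as in `data/gender_submission.csv`; the helper name `build_submission` and the sample IDs and predictions are hypothetical, and main.py may do this differently.

```python
import pandas as pd


def build_submission(passenger_ids, predictions) -> pd.DataFrame:
    """Pair each test-set PassengerId with its predicted Survived label (0 or 1)."""
    if len(passenger_ids) != len(predictions):
        raise ValueError("passenger_ids and predictions must have the same length")
    return pd.DataFrame({
        "PassengerId": list(passenger_ids),
        "Survived": [int(p) for p in predictions],
    })


# Hypothetical IDs and predictions, for illustration only.
submission = build_submission([892, 893, 894], [0, 1, 0])

# Without a path, to_csv returns the CSV text; pass a file path to write to disk.
csv_text = submission.to_csv(index=False)
```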

Data location

Make sure the Kaggle CSV files exist in:

data/train.csv
data/test.csv
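
A quick way to confirm the files are in place before running the pipeline is sketched below. The helper `missing_data_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names the pipeline expects inside the data/ directory.
REQUIRED_FILES = ("train.csv", "test.csv")


def missing_data_files(data_dir="data"):
    """Return the required CSV names that are absent from data_dir."""
    base = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


missing = missing_data_files("data")
if missing:
    print("Missing Kaggle files:", ", ".join(missing))
```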

📑 Reports

REPORT.md — detailed methodology, preprocessing, evaluation, and recommendations
Rapport_Project_AI.pdf — full technical report (PDF)
DELIVERABLES.md — deliverables summary / checklist

🤝 Contributing

Contributions are welcome:

1. Fork the repository
2. Create a branch (git checkout -b feature/improvement)
3. Commit changes (git commit -m "Add improvement")
4. Push (git push origin feature/improvement)
5. Open a Pull Request

Team: Mohamed SABR, Abdejlil SALMI, Soufaine ZEKAOUI, Anass LAMHADAR
