Predict which passengers survived the Titanic shipwreck using machine learning — a classic binary classification challenge from Kaggle.
Academic project (2025–2026) supervised by Prof. N. ABOUTABIT
- Overview
- Dataset
- Project Structure
- Features & Preprocessing
- Models & Validation
- Results
- Installation
- Usage
- Reports
- Contributing
- License
- Acknowledgements
This project tackles the Kaggle Titanic competition, one of the most popular introductory machine learning challenges. The goal is to build a predictive model that answers:
"What sorts of people were more likely to survive the Titanic disaster?"
The repository implements a complete ML pipeline:
- Exploratory Data Analysis (EDA)
- Missing value handling & preprocessing
- Feature engineering
- Model training + evaluation
- Prediction generation and submission export
The dataset is provided by Kaggle and contains information about 891 passengers (train) and 418 passengers (test).
| File | Description |
|---|---|
| `data/train.csv` | Training data with survival labels |
| `data/test.csv` | Test data for prediction submission |
| `data/gender_submission.csv` | Sample submission file |
| Feature | Description |
|---|---|
| Survived | Target variable (0 = No, 1 = Yes) |
| Pclass | Passenger class (1st, 2nd, 3rd) |
| Name | Passenger name |
| Sex | Gender |
| Age | Age in years |
| SibSp | # of siblings/spouses aboard |
| Parch | # of parents/children aboard |
| Ticket | Ticket number |
| Fare | Passenger fare |
| Cabin | Cabin number |
| Embarked | Port of embarkation (C, Q, S) |
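A minimal loading sketch, assuming the CSVs sit in `data/` as listed above. The function name `load_titanic` and the column check are hypothetical illustrations, not the project's `src/data/` loader. `PassengerId` is not in the feature table, but it is present in the Kaggle files and is needed later for the submission.

```python
from pathlib import Path

import pandas as pd

# Columns from the feature table above, plus PassengerId (present in the Kaggle files).
# test.csv has the same columns minus the target.
FEATURE_COLUMNS = [
    "PassengerId", "Pclass", "Name", "Sex", "Age", "SibSp",
    "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]
TARGET = "Survived"


def load_titanic(data_dir="data"):
    """Hypothetical helper: load train/test CSVs and check the expected columns exist."""
    data_dir = Path(data_dir)
    train = pd.read_csv(data_dir / "train.csv")
    test = pd.read_csv(data_dir / "test.csv")

    # Report any expected column that is absent, separately for each file.
    missing_train = set(FEATURE_COLUMNS + [TARGET]) - set(train.columns)
    missing_test = set(FEATURE_COLUMNS) - set(test.columns)
    if missing_train or missing_test:
        raise ValueError(
            f"Missing columns - train: {sorted(missing_train)}, test: {sorted(missing_test)}"
        )
    return train, test
```

With the Kaggle files in place, `train, test = load_titanic()` should yield 891 training rows and 418 test rows.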
This reflects the current repository layout.
```text
Titanic_Machine_Learning/
├── data/
│ ├── train.csv
│ ├── test.csv
│ └── gender_submission.csv
├── src/
│ ├── data/ # loading / preprocessing modules
│ ├── features/ # feature engineering
│ ├── models/ # training / evaluation / optimization
│ ├── utils/ # config + helpers
│ └── visualization/ # EDA plots
├── main.py # runs the full pipeline end-to-end
├── REPORT.md # detailed report (markdown)
├── DELIVERABLES.md # checklist of deliverables
├── Rapport_Project_AI.pdf # full technical report (PDF)
└── requirements.txt
```
Key engineering steps used in the pipeline include:

- Missing value imputation
  - `Age`: imputation strategy based on passenger groups (e.g., by class/sex)
  - `Embarked`: mode imputation
  - `Fare`: median imputation (mainly affects the test set)
- Feature engineering
  - `FamilySize = SibSp + Parch + 1`
  - `IsAlone` flag
  - `Title` extracted from `Name` (Mr, Mrs, Miss, Master, Rare, ...)
  - `HasCabin` (cabin present vs. missing)
- Encoding
  - One-hot encoding for categorical features (`Sex`, `Embarked`, `Title`, etc.)
- Scaling
  - `StandardScaler` applied before model training (see `main.py`)
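The imputation and feature-engineering steps above can be sketched in pandas as follows. This is a minimal illustration, not the code in `src/`: using the median per (`Pclass`, `Sex`) group for `Age`, the title aliases (`Mlle`, `Ms`, `Mme`), and fitting all statistics on the training set only are assumptions, and `extract_title` / `preprocess` are hypothetical names.

```python
import pandas as pd

# Assumed title grouping: anything outside the common titles becomes "Rare".
COMMON_TITLES = {"Mr", "Mrs", "Miss", "Master"}
TITLE_ALIASES = {"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"}  # hypothetical mapping


def extract_title(name: str) -> str:
    """Take the word between the comma and the first period, e.g. 'Braund, Mr. Owen' -> 'Mr'."""
    title = name.split(",", 1)[1].split(".", 1)[0].strip()
    title = TITLE_ALIASES.get(title, title)
    return title if title in COMMON_TITLES else "Rare"


def preprocess(train: pd.DataFrame, test: pd.DataFrame):
    """Impute missing values and add engineered features; statistics come from train only."""
    train, test = train.copy(), test.copy()

    # Statistics learned on the training set and reused for the test set.
    age_medians = train.groupby(["Pclass", "Sex"])["Age"].median()
    global_age = train["Age"].median()
    embarked_mode = train["Embarked"].mode().iloc[0]
    fare_median = train["Fare"].median()

    for df in (train, test):
        # Age: look up the (Pclass, Sex) group median; fall back to the global median.
        keys = pd.MultiIndex.from_frame(df[["Pclass", "Sex"]])
        group_age = age_medians.reindex(keys).to_numpy()
        df["Age"] = df["Age"].fillna(pd.Series(group_age, index=df.index)).fillna(global_age)
        df["Embarked"] = df["Embarked"].fillna(embarked_mode)
        df["Fare"] = df["Fare"].fillna(fare_median)

        # Engineered features listed above.
        df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
        df["IsAlone"] = (df["FamilySize"] == 1).astype(int)
        df["Title"] = df["Name"].map(extract_title)
        df["HasCabin"] = df["Cabin"].notna().astype(int)

    return train, test
```

Reusing training-set statistics for the test set keeps the test data from influencing the imputed values, which mirrors how the model will be applied to unseen passengers.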
The project trains and evaluates classification models (baseline + optimized) and uses:
- Train/Validation split (stratified)
- Cross-validation
- Metrics: Accuracy, Precision, Recall, F1-score, ROC-AUC
For exact implementation details and metric values, see `REPORT.md`.
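One way to wire the encoding, scaling, stratified split, cross-validation, and metrics together is a scikit-learn `Pipeline`. This is a sketch under assumptions: the feature lists, the 80/20 split, 5 folds, and `LogisticRegression` as the classifier are placeholders, not necessarily the baseline or optimized models used in the project, and `build_pipeline` / `evaluate` are hypothetical names. It expects a frame that already contains the engineered columns (`FamilySize`, `IsAlone`, `Title`, `HasCabin`).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed feature split; adjust to the columns the project actually keeps.
CATEGORICAL = ["Sex", "Embarked", "Title"]
NUMERIC = ["Pclass", "Age", "Fare", "FamilySize", "IsAlone", "HasCabin"]


def build_pipeline(model=None) -> Pipeline:
    """One-hot encode categoricals, standardize numerics, then fit a classifier."""
    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", StandardScaler(), NUMERIC),
    ])
    clf = model if model is not None else LogisticRegression(max_iter=1000)
    return Pipeline([("preprocess", preprocess), ("model", clf)])


def evaluate(df: pd.DataFrame, target: str = "Survived", seed: int = 42) -> dict:
    """Stratified hold-out metrics plus 5-fold stratified CV accuracy on the training part."""
    X, y = df[CATEGORICAL + NUMERIC], df[target]
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    pipe = build_pipeline()

    # Cross-validation on the training part only; each fold refits encoder and scaler.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    cv_acc = cross_val_score(pipe, X_tr, y_tr, cv=cv, scoring="accuracy")

    # Final fit on the training part, scored on the held-out validation split.
    pipe.fit(X_tr, y_tr)
    pred = pipe.predict(X_val)
    proba = pipe.predict_proba(X_val)[:, 1]
    return {
        "cv_accuracy_mean": cv_acc.mean(),
        "accuracy": accuracy_score(y_val, pred),
        "precision": precision_score(y_val, pred, zero_division=0),
        "recall": recall_score(y_val, pred, zero_division=0),
        "f1": f1_score(y_val, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_val, proba),
    }
```

Keeping the encoder and scaler inside the pipeline means they are refit within each CV fold, so validation folds never leak into the preprocessing statistics. A different classifier can be passed to `build_pipeline` to compare baseline and tuned models under the same preprocessing.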
A short summary (update with your Kaggle score/rank):
| Item | Value |
|---|---|
| Best local validation accuracy | ~80–82% (varies by model/settings) |
| ROC-AUC | ~0.84–0.86 |
| Kaggle public score | Add your score |
| Kaggle rank | Add your rank / Top X% |
Prerequisites: Python 3.9+
```bash
# 1) Clone the repository
git clone https://github.com/msabr/Titanic_Machine_Learning.git
cd Titanic_Machine_Learning

# 2) (Recommended) create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3) Install dependencies
pip install -r requirements.txt
```

Note: `requirements.txt` currently lists the core libraries (pandas, numpy, scikit-learn, matplotlib, seaborn).
```bash
python main.py
```

This runs the pipeline end-to-end (EDA → preprocessing → training → evaluation → predictions). Make sure the Kaggle CSV files exist at:

- `data/train.csv`
- `data/test.csv`
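To turn predictions into a Kaggle upload, the output should match the layout of `data/gender_submission.csv`: a `PassengerId` column followed by a 0/1 `Survived` column. The sketch below is hedged: `write_submission` and the `submission.csv` file name are hypothetical and may not match what `main.py` actually writes.

```python
import pandas as pd


def write_submission(test: pd.DataFrame, predictions, path="submission.csv") -> pd.DataFrame:
    """Hypothetical helper: save predictions in the gender_submission.csv layout."""
    # Kaggle expects exactly one prediction per test passenger.
    if len(predictions) != len(test):
        raise ValueError("one prediction per test passenger is required")
    submission = pd.DataFrame({
        "PassengerId": test["PassengerId"].to_numpy(),
        # Cast to int so float or boolean predictions are written as 0/1.
        "Survived": pd.Series(predictions).astype(int).to_numpy(),
    })
    submission.to_csv(path, index=False)
    return submission
```

Here `predictions` can be any sequence of 0/1 labels produced by a fitted model for the rows of `test.csv`, in the same order.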
- `REPORT.md`: detailed methodology, preprocessing, evaluation, and recommendations
- `Rapport_Project_AI.pdf`: full technical report (PDF)
- `DELIVERABLES.md`: deliverables summary / checklist
Contributions are welcome:

- Fork the repository
- Create a branch (`git checkout -b feature/improvement`)
- Commit changes (`git commit -m "Add improvement"`)
- Push (`git push origin feature/improvement`)
- Open a Pull Request
- Team: Mohamed SABR, Abdejlil SALMI, Soufaine ZEKAOUI, Anass LAMHADAR
