Titanic Survival Predictor - Python Version

This is a Python translation of the original R-based Titanic survival prediction project. It includes machine learning model training, web applications, and deployment configurations.

Project Structure

python_version/
├── data_cleaning.py          # Data preprocessing (translation of cleaning.R)
├── model_training.py         # ML model training (translation of titanic_training.R)
├── streamlit_app.py          # Streamlit web app (translation of app.R)
├── fastapi_app.py            # REST API (translation of plumber.R)
├── run_pipeline.py           # Pipeline runner script
├── deploy_google_run.py      # Google Cloud Run deployment script
├── requirements.txt          # Python dependencies
├── Dockerfile.streamlit      # Docker for Streamlit app
├── Dockerfile.fastapi        # Docker for FastAPI app
├── docker-compose.yml        # Multi-container deployment
└── README.md                 # This file

Quick Start

Option 1: Automated Pipeline

Run the complete pipeline with one command:

cd python_version
python run_pipeline.py --start-streamlit

Options:

  • --skip-training: Skip model training (use existing model)
  • --start-streamlit: Start Streamlit app after training
  • --start-fastapi: Start FastAPI app after training
  • --docker: Use Docker containers
  • --deploy: Deploy FastAPI to Google Cloud Run

Option 2: Manual Steps

  1. Install dependencies:

     pip install -r requirements.txt

  2. Run data cleaning:

     python data_cleaning.py

  3. Train the model:

     python model_training.py

  4. Start the web applications:

     Streamlit (interactive UI):

     streamlit run streamlit_app.py

     Access at: http://localhost:8501

     FastAPI (REST API):

     uvicorn fastapi_app:app --host 0.0.0.0 --port 8000

     Access at: http://localhost:8000
     API docs at: http://localhost:8000/docs

  5. Build and deploy the Docker API to Google Cloud Run:

     python deploy_google_run.py

Option 3: Local Docker Deployment

docker-compose up --build

This starts both applications:

  • Streamlit app at http://localhost:8501
  • FastAPI app at http://localhost:8000
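For reference, a minimal docker-compose.yml for this setup could look roughly like the sketch below; the service names are illustrative, and the repository's actual file is authoritative:

version: "3.8"
services:
  streamlit:
    build:
      context: .
      dockerfile: Dockerfile.streamlit
    ports:
      - "8501:8501"
  fastapi:
    build:
      context: .
      dockerfile: Dockerfile.fastapi
    ports:
      - "8000:8000"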

Features

Data Processing (data_cleaning.py)

  • Converts R data cleaning logic to pandas
  • Feature engineering: title extraction from names
  • Categorical encoding and missing value handling
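The exact logic lives in data_cleaning.py; a minimal sketch of the title-extraction idea, assuming names follow the standard Kaggle Titanic format ("Last, Title. First"):

import pandas as pd

# Sample rows in the standard Kaggle Titanic name format.
df = pd.DataFrame({"name": [
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
]})

# Extract the honorific between the comma and the first period.
df["title"] = df["name"].str.extract(r",\s*([^.]+)\.")

# Collapse rare variants into common buckets (illustrative mapping).
df["title"] = df["title"].replace({"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"})
print(df[["name", "title"]])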

Machine Learning (model_training.py)

  • Random Forest classifier (equivalent to R's ranger)
  • Preprocessing pipeline with KNN imputation and scaling
  • Feature importance analysis and visualization
  • Cross-validation evaluation
  • Test set predictions and submission file generation
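model_training.py holds the real pipeline; a sketch of the setup described above (KNN imputation, scaling, and a 100-tree random forest) in scikit-learn, with column names assumed from the prediction payload shown later:

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column names are assumptions based on the API payload below.
numeric = ["age", "sibsp", "parch", "fare"]
categorical = ["pclass", "sex", "embarked", "title"]

preprocess = ColumnTransformer([
    # Fill missing numeric values from nearest neighbours, then scale.
    ("num", Pipeline([("impute", KNNImputer()), ("scale", StandardScaler())]), numeric),
    # One-hot encode categoricals, ignoring unseen levels at predict time.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
])

# Evaluation mirrors the cross-validation step, e.g.:
# scores = cross_val_score(model, X_train, y_train, cv=5)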

Web Applications

Streamlit App (streamlit_app.py)

  • Interactive web interface (translation of Shiny app)
  • Single passenger prediction mode
  • CSV batch upload functionality
  • Interactive visualizations with Plotly
  • Bootstrap-style theming
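A Streamlit app is a Python script re-run on each interaction; a stripped-down sketch of the single-passenger mode (the model path and the reduced feature set are assumptions; the real app uses the full pipeline saved by model_training.py):

import joblib
import pandas as pd
import streamlit as st

st.title("Titanic Survival Predictor")

# Input widgets for a single passenger (subset of features, for illustration).
sex = st.selectbox("Sex", ["female", "male"])
age = st.number_input("Age", min_value=0.0, max_value=100.0, value=25.0)
fare = st.number_input("Fare", min_value=0.0, value=50.0)

if st.button("Predict"):
    # Assumed model path; the trained pipeline expects a full feature row.
    model = joblib.load("model.joblib")
    row = pd.DataFrame([{"sex": sex, "age": age, "fare": fare}])
    st.write("Predicted survival:", int(model.predict(row)[0]))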

FastAPI App (fastapi_app.py)

  • REST API endpoints (translation of plumber.R)
  • Single prediction: POST /predict
  • Batch prediction: POST /predict/batch
  • Custom JSON input: POST /predict/custom
  • Automatic API documentation
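FastAPI derives validation and the /docs page from type-annotated request models; a sketch of how the single-prediction endpoint could be declared (the response shape here is a placeholder, not the actual fastapi_app.py schema):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Titanic Survival Predictor API")

class Passenger(BaseModel):
    # Fields mirror the JSON payload in the usage example below.
    pclass: str
    sex: str
    age: float
    sibsp: int
    parch: int
    fare: float
    embarked: str
    title: str

@app.post("/predict")
def predict(passenger: Passenger):
    # The real endpoint runs the trained model on the passenger features;
    # this stub only returns placeholder values.
    return {"survived": 1, "probability": 0.9}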

Local Deployment

  • Docker: Separate containers for each app
  • Docker Compose: Multi-container orchestration
  • Health checks: Built-in application monitoring
  • Security: Non-root user execution

API Usage Examples

Single Prediction

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "pclass": "1st",
    "sex": "female", 
    "age": 25,
    "sibsp": 0,
    "parch": 0,
    "fare": 50.0,
    "embarked": "S",
    "title": "Miss"
  }'

Batch Prediction

curl -X POST "http://localhost:8000/predict/batch" \
  -F "file=@passengers.csv"

Model Performance

The Python version achieves similar performance to the original R implementation:

  • Random Forest classifier with 100 trees
  • Cross-validation accuracy: ~82%
  • Feature importance analysis available
  • Handles missing values with KNN imputation

Differences from R Version

Aspect            R Version            Python Version
---------------   ------------------   ----------------------
Data Processing   tidyverse, janitor   pandas
ML Framework      tidymodels, ranger   scikit-learn
Web UI            Shiny                Streamlit
API               plumber              FastAPI
Deployment        vetiver, Docker      Docker, docker-compose
Visualization     ggplot2              matplotlib, plotly

Requirements

  • Python 3.12+
  • See requirements.txt for full dependency list
  • Docker (optional, for containerized deployment)

Troubleshooting

  1. Module import errors: Install requirements with pip install -r requirements.txt
  2. Model not found: Run python model_training.py first
  3. Port conflicts: Change the ports in docker-compose.yml or pass different ports via command-line arguments
  4. Data not found: Ensure train.csv and test.csv are in the data/ directory

Next Steps

  1. Model Improvements: Hyperparameter tuning, feature engineering
  2. Monitoring: Add logging, metrics, and alerting
  3. Testing: Unit tests and integration tests
  4. CI/CD: Automated testing and deployment pipelines
