This is a Python translation of the original R-based Titanic survival prediction project. It includes machine learning model training, web applications, and deployment configurations.
Project Structure

python_version/
├── data_cleaning.py          # Data preprocessing (translation of cleaning.R)
├── model_training.py         # ML model training (translation of titanic_training.R)
├── streamlit_app.py          # Streamlit web app (translation of app.R)
├── fastapi_app.py            # REST API (translation of plumber.R)
├── run_pipeline.py           # Pipeline runner script
├── requirements.txt          # Python dependencies
├── Dockerfile.streamlit      # Docker for Streamlit app
├── Dockerfile.fastapi        # Docker for FastAPI app
├── docker-compose.yml        # Multi-container deployment
└── README.md                 # This file
Quick Start

From the `python_version` directory, run the complete pipeline with one command:

`python run_pipeline.py --start-streamlit`

Options:
- `--skip-training`: Skip model training (use the existing model)
- `--start-streamlit`: Start the Streamlit app after training
- `--start-fastapi`: Start the FastAPI app after training
- `--docker`: Use Docker containers
- `--deploy`: Deploy the FastAPI app to Google Cloud Run
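A rough sketch of how a runner like `run_pipeline.py` could map these flags onto the manual steps. It is hypothetical: `build_parser`, `plan_steps`, and the step ordering are assumptions rather than the project's code, and the function returns the commands instead of executing them so it stays free of side effects.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags listed in this README (argparse turns dashes into underscores)."""
    parser = argparse.ArgumentParser(description="Titanic pipeline runner (sketch)")
    parser.add_argument("--skip-training", action="store_true", help="Skip model training (use the existing model)")
    parser.add_argument("--start-streamlit", action="store_true", help="Start the Streamlit app after training")
    parser.add_argument("--start-fastapi", action="store_true", help="Start the FastAPI app after training")
    parser.add_argument("--docker", action="store_true", help="Use Docker containers")
    parser.add_argument("--deploy", action="store_true", help="Deploy the FastAPI app to Google Cloud Run")
    return parser


def plan_steps(args: argparse.Namespace) -> list[list[str]]:
    """Return the commands a runner would execute, in a hypothetical order."""
    steps: list[list[str]] = []
    if not args.skip_training:
        # Assumption: skipping training also skips data cleaning.
        steps.append([sys.executable, "data_cleaning.py"])
        steps.append([sys.executable, "model_training.py"])
    if args.deploy:
        steps.append([sys.executable, "deploy_google_run.py"])
    if args.docker:
        # Docker Compose starts both apps, so the per-app flags are not needed here.
        steps.append(["docker-compose", "up", "--build"])
    else:
        if args.start_streamlit:
            steps.append(["streamlit", "run", "streamlit_app.py"])
        if args.start_fastapi:
            steps.append(["uvicorn", "fastapi_app:app", "--host", "0.0.0.0", "--port", "8000"])
    return steps
```

For example, `plan_steps(build_parser().parse_args(["--start-streamlit"]))` yields the cleaning, training, and Streamlit commands. A real runner would hand each command to `subprocess.run`, or to `subprocess.Popen` for the long-running servers.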
Manual Setup
- Install dependencies: `pip install -r requirements.txt`
- Run data cleaning: `python data_cleaning.py`
- Train the model: `python model_training.py`
- Start the web applications:
  - Streamlit (interactive UI): `streamlit run streamlit_app.py`, then open http://localhost:8501
  - FastAPI (REST API): `uvicorn fastapi_app:app --host 0.0.0.0 --port 8000`, then open http://localhost:8000 (interactive API docs at http://localhost:8000/docs)
- Build the API Docker image and deploy it to Google Cloud Run: `python deploy_google_run.py`

Docker Compose

Build and start both applications in containers with `docker-compose up --build`. This exposes:
- Streamlit: http://localhost:8501
- FastAPI: http://localhost:8000
Data Cleaning (data_cleaning.py)
- Converts the R data-cleaning logic to pandas
- Feature engineering: title extraction from names
- Categorical encoding and missing value handling
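A dependency-free sketch of the title-extraction, missing-value, and encoding steps. The regular expression targets the Kaggle-style `Surname, Title. Given names` format; the rare-title grouping and the `fill_mode` and `one_hot` helpers are illustrative assumptions and may differ from what `data_cleaning.py` does. In pandas, assuming a `Name` column, the same pattern could be applied with `df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)`.

```python
import re
from collections import Counter
from typing import Optional

# Captures the honorific between the comma and the first period,
# e.g. "Braund, Mr. Owen Harris" -> "Mr".
TITLE_PATTERN = re.compile(r",\s*([^.]+)\.")

# Assumed normalisation: French forms folded into English ones,
# everything outside the four common titles grouped as "Rare".
TITLE_ALIASES = {"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"}
COMMON_TITLES = {"Mr", "Mrs", "Miss", "Master"}


def extract_title(name: str) -> str:
    """Return a normalised title for a passenger name, or "Rare"."""
    match = TITLE_PATTERN.search(name)
    if match is None:
        return "Rare"
    title = match.group(1).strip()
    title = TITLE_ALIASES.get(title, title)
    return title if title in COMMON_TITLES else "Rare"


def fill_mode(values: list[Optional[str]]) -> list[str]:
    """Replace missing entries (None) with the most frequent observed value.

    Assumes at least one value is present.
    """
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def one_hot(values: list[str], categories: list[str]) -> list[list[int]]:
    """Encode each value as a 0/1 indicator vector over `categories`."""
    return [[int(v == c) for c in categories] for v in values]
```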
Model Training (model_training.py)
- Random Forest classifier (scikit-learn counterpart of R's ranger)
- Preprocessing pipeline with KNN imputation and scaling
- Feature importance analysis and visualization
- Cross-validation evaluation
- Test set predictions and submission file generation
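A minimal sketch of a preprocessing-plus-model pipeline of the kind described above, built from scikit-learn's `Pipeline`, `StandardScaler`, `KNNImputer`, and `RandomForestClassifier`. It runs on synthetic data from `make_classification` with randomly injected missing values, so its accuracy says nothing about the Titanic results. The 100-tree count comes from this README; the 5 neighbours, 5 folds, and random seeds are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(n_trees: int = 100, random_state: int = 42) -> Pipeline:
    """Scale, impute, then classify; all steps are refit inside each CV fold."""
    return Pipeline(steps=[
        # StandardScaler disregards NaNs when fitting and keeps them in transform,
        # so scaling first lets KNNImputer compare neighbours on similar scales.
        ("scale", StandardScaler()),
        ("impute", KNNImputer(n_neighbors=5)),
        ("forest", RandomForestClassifier(n_estimators=n_trees, random_state=random_state)),
    ])


# Synthetic stand-in for the encoded passenger features (NOT the real data).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan  # knock out roughly 10% of the values

pipeline = build_pipeline()
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print(f"5-fold CV accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")

# Refit on all rows, then inspect the impurity-based feature importances.
pipeline.fit(X, y)
importances = pipeline.named_steps["forest"].feature_importances_
for idx in np.argsort(importances)[::-1]:
    print(f"feature_{idx}: {importances[idx]:.3f}")
```

Wrapping imputation and scaling in the pipeline, rather than applying them to the whole dataset up front, keeps validation-fold statistics out of the fitted preprocessing during cross-validation.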
Streamlit App (streamlit_app.py)
- Interactive web interface (translation of Shiny app)
- Single passenger prediction mode
- CSV batch upload functionality
- Interactive visualizations with Plotly
- Bootstrap-style theming
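One piece of the batch-upload mode is checking that an uploaded CSV has the columns the model expects. The sketch below assumes the batch schema matches the fields of the `/predict` JSON example further down; `REQUIRED_COLUMNS` and `read_passenger_csv` are illustrative names, not the app's actual code. It uses the standard-library `csv` module to stay dependency-free, whereas a Streamlit app would more commonly pass the uploaded file to `pandas.read_csv`.

```python
import csv
import io

# Assumed schema: the same fields as the single-prediction JSON body.
REQUIRED_COLUMNS = ["pclass", "sex", "age", "sibsp", "parch", "fare", "embarked", "title"]


def read_passenger_csv(text: str) -> list[dict[str, str]]:
    """Parse uploaded CSV text and check that every required column is present."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"CSV is missing required columns: {', '.join(missing)}")
    # Keep only the model inputs, dropping extra columns such as passenger names.
    return [{col: row[col] for col in REQUIRED_COLUMNS} for row in reader]
```

In Streamlit, the text could come from `st.file_uploader("Upload CSV", type="csv")`, whose return value exposes the raw bytes through `getvalue()`; decode them with `.decode("utf-8")` before calling the helper.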
FastAPI App (fastapi_app.py)
- REST API endpoints (translation of plumber.R)
- Single prediction: POST /predict
- Batch prediction: POST /predict/batch
- Custom JSON input: POST /predict/custom
- Automatic API documentation
Deployment Features
- Docker: Separate containers for each app
- Docker Compose: Multi-container orchestration
- Health checks: Built-in application monitoring
- Security: Non-root user execution
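This README does not say how the built-in health checks are implemented. As a hedged illustration, a probe like the one below could back a Docker `HEALTHCHECK` instruction: it reports healthy only when a URL answers with a 2xx status. `is_healthy` is a hypothetical helper, and the URL to probe (for example the FastAPI docs page at http://localhost:8000/docs) is an assumption.

```python
import http.client
import urllib.request


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if `url` responds with an HTTP 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # urllib.error.URLError, HTTPError (raised for 4xx/5xx), refused
        # connections, and timeouts are all OSError subclasses.
        return False
```

Wrapped in a small script that exits with status 1 when `is_healthy` returns False, it could serve as the command of a `HEALTHCHECK` line. Probing with the standard library avoids depending on `curl`, which slim Python base images may not include.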
API Usage Examples

Single prediction:

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "pclass": "1st",
    "sex": "female", 
    "age": 25,
    "sibsp": 0,
    "parch": 0,
    "fare": 50.0,
    "embarked": "S",
    "title": "Miss"
  }'

Batch prediction from a CSV file:

curl -X POST "http://localhost:8000/predict/batch" \
  -F "file=@passengers.csv"

Performance

The Python version achieves performance comparable to the original R implementation:
- Random Forest classifier with 100 trees
- Cross-validation accuracy: ~82%
- Feature importance analysis available
- Handles missing values with KNN imputation
R vs. Python Comparison

| Aspect | R Version | Python Version |
|---|---|---|
| Data Processing | tidyverse, janitor | pandas | 
| ML Framework | tidymodels, ranger | scikit-learn | 
| Web UI | Shiny | Streamlit | 
| API | plumber | FastAPI | 
| Deployment | vetiver, Docker | Docker, docker-compose | 
| Visualization | ggplot2 | matplotlib, plotly | 
Requirements
- Python 3.12+
- See `requirements.txt` for the full dependency list
- Docker (optional, for containerized deployment)
Troubleshooting
- Module import errors: Install the requirements with `pip install -r requirements.txt`
- Model not found: Run `python model_training.py` first
- Port conflicts: Change the ports in `docker-compose.yml` or via command-line arguments
- Data not found: Ensure `train.csv` and `test.csv` are in the `data/` directory
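Several of these problems can be caught before launching anything. A hedged sketch of a pre-flight check: the data paths come from the troubleshooting list above and the ports from the default Streamlit and FastAPI URLs, while `missing_files`, `port_in_use`, and `preflight` are hypothetical helpers. The model file is left out because this README does not name its path.

```python
import socket
from pathlib import Path

DATA_FILES = ["data/train.csv", "data/test.csv"]
APP_PORTS = {"Streamlit": 8501, "FastAPI": 8000}


def missing_files(base: Path, relative_paths: list[str]) -> list[str]:
    """Return the paths (relative to `base`) that do not exist as regular files."""
    return [p for p in relative_paths if not (base / p).is_file()]


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on `host:port`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def preflight(base: Path) -> list[str]:
    """Collect human-readable problems; an empty list means ready to run."""
    problems = [f"missing {p}" for p in missing_files(base, DATA_FILES)]
    for app, port in APP_PORTS.items():
        if port_in_use(port):
            problems.append(f"port {port} ({app}) is already in use")
    return problems
```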
Future Improvements
- Model Improvements: Hyperparameter tuning, feature engineering
- Monitoring: Add logging, metrics, and alerting
- Testing: Unit tests and integration tests
- CI/CD: Automated testing and deployment pipelines