End-to-end ML pipeline that predicts California house prices. Results:
- RMSE: 0.4392 (original scale)
- MAE: 0.2793
- R²: 0.8528
- Baseline RMSE: 1.1583 (the model's RMSE is 62.1% lower; target exceeded)
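The metrics above follow the standard definitions, and the improvement figure is the relative RMSE reduction from the baseline. A minimal, dependency-free sketch for checking them; the function names are illustrative and not this project's API (scikit-learn's mean_squared_error, mean_absolute_error and r2_score provide equivalents).

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def improvement_pct(baseline_rmse, model_rmse):
    """Relative RMSE reduction versus the baseline, in percent."""
    return 100 * (baseline_rmse - model_rmse) / baseline_rmse


# Check the reported figure: (1.1583 - 0.4392) / 1.1583
print(f"{improvement_pct(1.1583, 0.4392):.1f}%")  # 62.1%
```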
- Install: pip install -r requirements.txt
- Train & evaluate: python src/test_models.py (writes results/final_project_report.txt and saves models in models/)
- Production demo: python src/ml_pipeline.py (runs a sample single prediction)
src/
data_loader.py # Load/save dataset
eda_analysis.py # EDA and plots
data_preprocessor.py # Feature engineering, scaling, outliers, target log
model_trainer.py # Model training, CV, tuning, report
ml_pipeline.py # Prediction pipeline (load preprocessor/model, predict)
test_preprocessing.py # Preprocessing sanity check
test_models.py # Full training + final report
results/
final_project_report.txt, plots
models/
preprocessor.pkl, best_model.pkl, others
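The preprocessing steps listed for data_preprocessor.py and the load-then-predict flow in ml_pipeline.py rest on one idea: learn every statistic once on training data, persist it, and reuse it unchanged at inference. The dependency-free sketch below illustrates that under stated assumptions: SketchPreprocessor, its percentile-style clipping (lower_q/upper_q), and log1p/expm1 for the target log are hypothetical choices, not the project's actual code, and pickle.dumps/loads stands in for writing and reading models/preprocessor.pkl.

```python
import math
import pickle


class SketchPreprocessor:
    """Hypothetical stand-in for the project's preprocessor: learns clipping
    bounds and scaling statistics from training rows only, then reuses them."""

    def __init__(self, lower_q=0.01, upper_q=0.99):
        self.lower_q = lower_q  # assumed quantile-based outlier clipping
        self.upper_q = upper_q
        self.bounds = []  # per-feature (low, high) clip bounds
        self.stats = []   # per-feature (mean, std) computed after clipping

    @staticmethod
    def _quantile(sorted_vals, q):
        # Simple index-based quantile on an already-sorted list
        return sorted_vals[round(q * (len(sorted_vals) - 1))]

    def fit(self, rows):
        self.bounds, self.stats = [], []
        for col in zip(*rows):  # iterate over feature columns
            ordered = sorted(col)
            lo = self._quantile(ordered, self.lower_q)
            hi = self._quantile(ordered, self.upper_q)
            clipped = [min(max(v, lo), hi) for v in col]
            mean = sum(clipped) / len(clipped)
            std = math.sqrt(sum((v - mean) ** 2 for v in clipped) / len(clipped))
            self.bounds.append((lo, hi))
            self.stats.append((mean, std or 1.0))  # avoid dividing by zero
        return self

    def transform(self, rows):
        # Apply the *training* bounds and statistics to any rows
        return [
            [(min(max(v, lo), hi) - mean) / std
             for v, (lo, hi), (mean, std) in zip(row, self.bounds, self.stats)]
            for row in rows
        ]

    @staticmethod
    def transform_target(prices):
        return [math.log1p(p) for p in prices]  # train on log(1 + price)

    @staticmethod
    def inverse_target(log_preds):
        return [math.expm1(p) for p in log_preds]  # back to the original scale


# Toy data (not the California dataset): fit once, persist, reload for inference
train_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [50.0, 45.0]]
prep = SketchPreprocessor(lower_q=0.0, upper_q=0.75).fit(train_rows)
blob = pickle.dumps(prep)      # stand-in for saving models/preprocessor.pkl
restored = pickle.loads(blob)  # what the inference script would load
print(restored.transform([[60.0, 25.0]]) == prep.transform([[60.0, 25.0]]))  # True
```

Because predictions made on the log scale are passed through inverse_target before scoring, metrics such as RMSE can be reported on the original scale. In a scikit-learn setup the same pattern is commonly expressed with a fitted Pipeline saved via joblib or pickle, with sklearn.compose.TransformedTargetRegressor handling the log/inverse target transform.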
- Best model: XGBoost (lowest cross-validated RMSE)
- Consistent preprocessing: the saved preprocessor ensures identical transforms at training and inference time
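Selecting the best model by cross-validated RMSE amounts to scoring each candidate on every fold, averaging, and keeping the lowest. A minimal, dependency-free sketch: MeanModel and LineModel are hypothetical toy stand-ins for the real candidates (such as XGBoost), and the 5-fold split with a fixed shuffle seed is an assumed setup, not necessarily what model_trainer.py does.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def kfold_indices(n, k, seed=0):
    """Shuffle row indices once, then deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class MeanModel:
    """Toy candidate: always predicts the training mean."""
    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean for _ in X]


class LineModel:
    """Toy candidate: ordinary least squares on the first feature only."""
    def fit(self, X, y):
        xs = [row[0] for row in X]
        mx, my = sum(xs) / len(xs), sum(y) / len(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        self.slope = sum((x - mx) * (t - my) for x, t in zip(xs, y)) / sxx if sxx else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, X):
        return [self.intercept + self.slope * row[0] for row in X]


def cv_rmse(make_model, X, y, k=5):
    """Average held-out RMSE over k folds; a fresh model is fit per fold."""
    scores = []
    for fold in kfold_indices(len(X), k):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        scores.append(rmse([y[i] for i in fold], model.predict([X[i] for i in fold])))
    return sum(scores) / len(scores)


def select_best(candidates, X, y, k=5):
    """Return the candidate name with the lowest mean CV RMSE, plus all scores."""
    scores = {name: cv_rmse(factory, X, y, k) for name, factory in candidates.items()}
    return min(scores, key=scores.get), scores


# Toy linear data: the line model should win
X = [[float(i)] for i in range(20)]
y = [2.0 * i + 1.0 for i in range(20)]
best, scores = select_best({"mean": MeanModel, "line": LineModel}, X, y)
print(best)  # line
```

Fitting a fresh model inside each fold keeps held-out rows out of training, which is what makes the averaged score a fair basis for ranking candidates.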
- Python 3.8+
- See requirements.txt for exact package versions