
gabayae/data-portfolio


Yaé Gaba — Data Science Portfolio


13 end-to-end data-science projects on real public data, 5 long-form bilingual case studies (EN/FR), and one Streamlit twin of the site.

Domains: forecasting · GLMs / actuarial pricing · survival analysis · hierarchical reconciliation · stochastic optimization / RL · experimental design.

Live site: gabayae.github.io/data-portfolio · Version française

Each project follows the same pipeline: business question → data & EDA → modeling → validation → deployment → business outcome.


Projects

| # | Project | Domain | Real data source | Headline result |
|---|---|---|---|---|
| 01 | Health Supply-Chain Demand Forecasting | Health logistics | USAID PEPFAR SCMS | SARIMA WAPE 0.94 |
| 02 | Hourly Load Forecasting — PJM (case study) | Energy | PJM Hourly Consumption | GBM MAPE 6.2% (UC PI 99%) |
| 03 | Insurance Claims Frequency & Severity | Insurance | freMTPL2 freq + sev | Tweedie Gini 0.310 |
| 04 | Hierarchical Retail Demand Forecasting | Retail | M5 sample | MinT-OLS reconciliation |
| 05 | Stochastic Optimization for Resource Allocation | Health / Ops | Kenya KMPDC + SHA | Q-learning +122% vs manual |
| 06 | River Flow Forecasting — Lake Kariba | Hydropower | Kariba Reservoir | GBM RMSE 7 cm |
| 07 | Solar Energy Forecasting — Nairobi | Energy | NASA POWER API | GBM MAPE 9.4% |
| 08 | Customer Survival Analysis — MTN Nigeria | Telecom | MTN Nigeria Churn | Cox PH + Weibull AFT |
| 09 | Flight Demand & Price — Southern Africa | Aviation | SA Flight Prices | SARIMA MAPE 2.0% |
| 10 | Property Valuation — Lagos | Real estate | Lagos Housing | GBM R² 0.57 |
| 11 | Geospatial Farm-Output Forecasting | Agriculture | African Farm Households | GBM R² 0.66 |
| 12 | Churn Classification — MTN Nigeria | Telecom | MTN Nigeria Churn | XGBoost AUC 0.71 |
| 13 | A/B Test Framework | Marketing | Marketing Campaign A/B | ANOVA F=21.95, p < 1e-9 |
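The headline column mixes several forecast metrics (WAPE, MAPE, prediction-interval coverage). As a reference point, here is a minimal NumPy sketch of their textbook definitions. The function names are hypothetical, and the project notebooks may compute these differently (for example in how zero actuals or missing periods are handled).

```python
import numpy as np


def wape(y_true, y_pred):
    """Weighted absolute percentage error: total absolute error over total absolute actuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; undefined if any actual is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


def pi_coverage(y_true, lower, upper):
    """Share of actuals that fall inside the prediction interval [lower, upper]."""
    y_true = np.asarray(y_true, dtype=float)
    inside = (y_true >= np.asarray(lower, dtype=float)) & (y_true <= np.asarray(upper, dtype=float))
    return inside.mean()


# Toy example: three periods, each forecast off by 10% of the actual
y = np.array([100.0, 200.0, 300.0])
yhat = np.array([110.0, 180.0, 330.0])
print(wape(y, yhat))                         # total abs error 60 over total actuals 600
print(mape(y, yhat))                         # every period is 10% off
print(pi_coverage(y, yhat - 25, yhat + 25))  # the third actual falls outside its interval
```

WAPE weights errors by volume, so it stays defined when individual periods are zero, whereas MAPE does not. Coverage is judged against the nominal level: a 99% interval should contain roughly 99% of held-out actuals.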

Case studies

Five projects have a long-form narrative deep-dive covering business context, methodology, results, trade-offs, and a deployment sketch. Each is fully bilingual (EN/FR) and includes a reading-time estimate, previous/next navigation, breadcrumb schema markup, and a per-case-study Open Graph (OG) card.

| Case study | Family | Honest finding |
|---|---|---|
| PJM hourly load · FR | Time-series | The UC state-space model underperforms SARIMA on point error (19.3% vs 14.5% MAPE), but its 99% prediction-interval coverage is what procurement actually uses. |
| Lake Kariba river flow · FR | Time-series + exog | 7 cm RMSE on a 7 m operational band; adding turbine discharge as an exogenous regressor gives a 2× point-error improvement. |
| Nairobi solar irradiance · FR | Time-series | Monthly climatology (12.3% MAPE) beats SARIMA (13.8%); at this latitude the seasonal envelope carries most of the predictability. |
| freMTPL2 pricing · FR | GLMs / actuarial | Tweedie wins on segmentation (Gini 0.310); Poisson + Gamma wins on top-decile lift (2.66×). The choice is actuarial, not technical. |
| Kenya mobile clinics · FR | Stochastic optimization | Q-learning is +122% over the manual baseline, but the constraint formulation matters more than the algorithm: a capped LP with explicit equity constraints, at +39%, is more defensible. |
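The freMTPL2 finding compares models on two different criteria. A common actuarial reading of "Gini" is the Gini index of the ordered Lorenz curve (rank policies from lowest to highest predicted risk, then plot cumulative exposure against cumulative claims), and "top-decile lift" is the observed claim level in the riskiest 10% of predictions relative to the portfolio average. The sketch below assumes those definitions; the function names are hypothetical, and the notebook may weight or normalise differently (for example ranking by policy count rather than exposure, or weighting the lift by exposure).

```python
import numpy as np


def lorenz_gini(y_true, y_pred, exposure=None):
    """Gini index of the ordered Lorenz curve; y_pred is used only for ranking."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = np.ones_like(y_true) if exposure is None else np.asarray(exposure, dtype=float)
    order = np.argsort(y_pred, kind="stable")  # safest policies first
    # Cumulative exposure share (x) and cumulative claim share (y), starting from the origin
    x = np.concatenate([[0.0], np.cumsum(w[order]) / w.sum()])
    y = np.concatenate([[0.0], np.cumsum(y_true[order]) / y_true.sum()])
    # Trapezoidal area under the Lorenz curve; 0.5 means no ranking power
    area = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0)
    return 1.0 - 2.0 * area


def top_decile_lift(y_true, y_pred):
    """Mean outcome in the top 10% of predictions (at least one row) over the overall mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n_top = max(1, len(y_true) // 10)
    top = np.argsort(y_pred, kind="stable")[-n_top:]
    return y_true[top].mean() / y_true.mean()


claims = np.array([0.0, 0.0, 0.0, 10.0])
print(lorenz_gini(claims, [1, 2, 3, 4]))  # riskiest policy ranked last: positive Gini
print(lorenz_gini(claims, [4, 3, 2, 1]))  # ranking reversed: negative Gini
```

The two metrics can disagree because Gini rewards the ordering across the whole book, while top-decile lift only looks at the tail the pricing team acts on first.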

Browse all five at /case-studies/ (or /fr/case-studies/).


Tech stack

  • Languages: Python, SQL
  • Time-series: SARIMA, UnobservedComponents (state-space, Kalman filter), Prophet, ETS / Holt-Winters, ARIMA
  • ML: scikit-learn (GradientBoostingRegressor, RandomForest, LogisticRegression), XGBoost, LightGBM
  • GLM / generalized: statsmodels (Poisson, Gamma, Tweedie, Cox PH, Weibull AFT, OLS)
  • Survival: lifelines (KaplanMeierFitter, CoxPHFitter, WeibullAFTFitter)
  • Hierarchical reconciliation: MinT-OLS
  • Stochastic optimization: Markov Decision Processes, Q-learning, linear programming (scipy.optimize.linprog)
  • Experimentation: ANOVA, Welch t-tests with Bonferroni correction, Bayesian A/B (posterior simulation)
  • Data engineering: pandas, NumPy, Kaggle CLI, NASA POWER API
  • Visualization: matplotlib, seaborn
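MinT-OLS is the special case of MinT reconciliation with an identity error covariance, which reduces to an orthogonal projection of the base forecasts onto the coherent subspace: ỹ = S(SᵀS)⁻¹Sᵀŷ, where S is the summing matrix mapping bottom-level series to every level of the hierarchy. A minimal NumPy sketch on a hypothetical two-series hierarchy (Total = A + B) follows; the M5 project presumably uses a much larger S and may rely on a library implementation instead.

```python
import numpy as np


def ols_reconcile(S, y_hat):
    """OLS (identity-covariance MinT) reconciliation: y_tilde = S (S'S)^{-1} S' y_hat."""
    S = np.asarray(S, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    # Solve (S'S) b = S' y_hat for the reconciled bottom-level forecasts b
    b = np.linalg.solve(S.T @ S, S.T @ y_hat)
    # Aggregate the bottom level back up so every level is coherent
    return S @ b


# Rows: [Total, A, B]; columns: bottom-level series [A, B]
S = np.array([[1, 1],
              [1, 0],
              [0, 1]])
y_hat = np.array([100.0, 60.0, 50.0])  # incoherent base forecasts: 60 + 50 != 100
y_tilde = ols_reconcile(S, y_hat)
print(y_tilde)  # approximately [103.33, 56.67, 46.67]; Total now equals A + B
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse; full MinT would replace the implicit identity with an estimated error covariance W.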

Domain coverage

  • Energy (2): PJM hourly load, Nairobi solar
  • Health logistics & operations (2): USAID supply chain, Kenya facility scheduling
  • Telecom (2): MTN customer survival, MTN churn classification
  • Time-series & forecasting infrastructure (2): hierarchical reconciliation, river flow
  • Insurance (1), Aviation (1), Real estate (1), Agriculture (1), Experimentation (1)

How to reproduce

git clone https://github.com/gabayae/data-portfolio.git
cd data-portfolio
make install            # install deps from pyproject.toml
make data               # download all datasets (Kaggle CLI required for some)
make notebooks          # execute every notebook end-to-end

For a single project:

cd <NN-project-name>
pip install -r requirements.txt
python download_data.py
jupyter nbconvert --to notebook --execute notebook.ipynb

Each notebook is self-contained; open it on GitHub to see the rendered plots and tables without running anything.

Streamlit twin

portfolio_app.py is an interactive twin of index.html: same content and color palette, plus project filtering and clickable cards. Run it locally:

pip install -r requirements-app.txt
streamlit run portfolio_app.py

To deploy on Streamlit Community Cloud:

  1. share.streamlit.io → New app
  2. Repository: gabayae/data-portfolio · Branch: main · Main file path: portfolio_app.py
  3. Advanced settings → Python requirements file: requirements-app.txt
  4. Click Deploy. Hosting is free, the app is public, and deployment takes about a minute.

portfolio_config.py is the single source of truth for project metadata: edit it there and both surfaces (index.html and the Streamlit app) update.
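To illustrate the single-source-of-truth idea, here is a hypothetical sketch of what such a config module and the app's filtering logic could look like. The `Project` dataclass, its field names, and the `domains` / `filter_projects` helpers are illustrative assumptions, not the actual contents of portfolio_config.py or portfolio_app.py; the three records are adapted from the Projects table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    number: str    # hypothetical field: two-digit project id, e.g. "01"
    title: str
    domain: str
    headline: str


# Hypothetical excerpt: three records adapted from the Projects table
PROJECTS = [
    Project("02", "Hourly Load Forecasting (PJM)", "Energy", "GBM MAPE 6.2%"),
    Project("07", "Solar Energy Forecasting (Nairobi)", "Energy", "GBM MAPE 9.4%"),
    Project("12", "Churn Classification (MTN Nigeria)", "Telecom", "XGBoost AUC 0.71"),
]


def domains(projects):
    """Sorted unique domains, e.g. to populate a filter widget."""
    return sorted({p.domain for p in projects})


def filter_projects(projects, selected):
    """Keep projects whose domain is selected; an empty selection keeps everything."""
    if not selected:
        return list(projects)
    return [p for p in projects if p.domain in selected]


for p in filter_projects(PROJECTS, ["Energy"]):
    print(p.number, p.title, p.headline)
```

In a Streamlit app the selection could come from a widget such as `st.multiselect("Domain", domains(PROJECTS))`, and a static-site build step could loop over the same `PROJECTS` list; how the real portfolio_app.py wires this up is not shown here.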

Profile photo

Drop a square headshot at profile.jpg in the repo root and the hero avatar on index.html automatically swaps the "YG" monogram for the photo. No code change is needed: the <img> carries onerror="this.remove()", so when no photo is present the image element is removed and the monogram shows instead. Recommended size is at least 240×240 px; JPG, PNG, and WebP all work.

Contact

License

MIT — see LICENSE.
