guydev42/predictive-maintenance

Overview • Key results • Architecture • Quickstart • Dataset • Methodology

Overview

A predictive maintenance system that forecasts industrial equipment failures from sensor data using gradient boosting, survival analysis for remaining useful life estimation, and SHAP-based explanations.

Unplanned equipment downtime costs industrial operations thousands of dollars per hour. This project builds a failure prediction pipeline that ingests sensor readings (temperature, vibration, pressure, RPM, power consumption) from 50 machines and predicts whether a failure will occur within the next 7 days. Four classifiers are trained, a Weibull survival model estimates remaining useful life, and a cost-based threshold optimizer balances the cost of a missed failure ($15,000 of unplanned downtime) against the cost of a false alarm ($1,500 of preventive maintenance).

Problem   →  Predicting equipment failures before they happen using sensor data
Solution  →  XGBoost with survival analysis, SHAP explanations, and cost-based threshold tuning
Impact    →  AUC 0.94, catches 91% of failures with optimized maintenance scheduling

Key results

Metric	Value
AUC-ROC	0.94
Recall (failures caught)	91%
Precision	78%
PR-AUC	0.76
Best model	XGBoost

Architecture

┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│  Sensor data     │───▶│  Feature          │───▶│  Rolling         │
│  generation      │    │  extraction       │    │  aggregations    │
└──────────────────┘    └──────────────────┘    └────────┬─────────┘
                                                         │
                          ┌──────────────────────────────┘
                          ▼
              ┌──────────────────────┐    ┌──────────────────────┐
              │  Model training      │───▶│  Survival analysis   │
              │  (4 classifiers)     │    │  (Weibull AFT / RUL) │
              └──────────────────────┘    └──────────┬───────────┘
                                                     │
                          ┌──────────────────────────┘
                          ▼
              ┌──────────────────────┐    ┌──────────────────────┐
              │  SHAP explanations   │───▶│  Maintenance         │
              │  + threshold tuning  │    │  dashboard           │
              └──────────────────────┘    └──────────────────────┘

Project structure

project_21_predictive_maintenance/
├── data/
│   ├── sensor_readings.csv              # Sensor dataset
│   └── generate_data.py                 # Synthetic data generator
├── src/
│   ├── __init__.py
│   ├── data_loader.py                   # Data generation and loading
│   └── model.py                         # Training, evaluation, SHAP, RUL
├── notebooks/
│   ├── 01_eda.ipynb                     # Exploratory data analysis
│   ├── 02_feature_engineering.ipynb     # Rolling features, interactions
│   ├── 03_modeling.ipynb                # Model training and CV
│   └── 04_evaluation.ipynb              # ROC, SHAP, cost analysis
├── app.py                               # Streamlit dashboard
├── requirements.txt
└── README.md

Quickstart

# Clone and navigate
git clone https://github.com/guydev42/calgary-data-portfolio.git
cd calgary-data-portfolio/project_21_predictive_maintenance

# Install dependencies
pip install -r requirements.txt

# Generate sensor data
python data/generate_data.py

# Launch dashboard
streamlit run app.py

Dataset

Property	Details
Source	Synthetic industrial sensor data
Readings	15,000
Machines	50
Failure rate	~8% (1,200 pre-failure readings)
Features	11 (temperature, vibration, pressure, rpm, power, rolling stats)
Target	failure_within_7days (binary)

Tech stack

Methodology

Sensor feature engineering

Rolling 24h mean temperature and standard deviation of vibration
Temperature-pressure ratio as an interaction feature
Machine-level attributes: age, operating hours, maintenance history
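
The rolling and interaction features listed above could be computed along the following lines. This is a minimal sketch, not the repository's code: the column names (machine_id, timestamp, temperature, vibration, pressure), the output feature names, and the function add_sensor_features are hypothetical, and it assumes one reading per machine per hour so that a 24-row window spans 24 hours.

```python
import pandas as pd


def add_sensor_features(df: pd.DataFrame, window: int = 24) -> pd.DataFrame:
    """Add per-machine rolling statistics and a temperature/pressure ratio.

    Assumes one reading per machine per hour, so `window=24` rows span 24h.
    """
    out = df.sort_values(["machine_id", "timestamp"]).copy()
    grouped = out.groupby("machine_id")

    # Rolling statistics are computed inside each machine's own history,
    # so one machine's readings never leak into another's window.
    out["temp_rolling_mean_24h"] = grouped["temperature"].transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    out["vibration_rolling_std_24h"] = grouped["vibration"].transform(
        lambda s: s.rolling(window, min_periods=2).std()
    )

    # Interaction feature: temperature relative to pressure.
    out["temp_pressure_ratio"] = out["temperature"] / out["pressure"]
    return out


# Tiny illustrative frame (made-up values, two machines, three hourly readings each).
readings = pd.DataFrame({
    "machine_id": [1, 1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00"] * 2
    ),
    "temperature": [70.0, 72.0, 74.0, 60.0, 60.0, 66.0],
    "vibration": [0.2, 0.4, 0.6, 0.1, 0.1, 0.1],
    "pressure": [35.0, 36.0, 37.0, 30.0, 30.0, 33.0],
})
features = add_sensor_features(readings)
print(features[["machine_id", "temp_rolling_mean_24h",
                "vibration_rolling_std_24h", "temp_pressure_ratio"]])
```

The first reading of each machine has no rolling standard deviation (a single point has none), so downstream code would need to impute or drop those rows.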

Model training

Four classifiers: Logistic Regression, Random Forest, XGBoost, Gradient Boosting
5-fold StratifiedKFold cross-validation
Class imbalance handled via class_weight (scikit-learn estimators that accept it) and scale_pos_weight (XGBoost)
Metrics: AUC-ROC, precision, recall, F1, PR-AUC
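
The training setup above can be sketched as follows. This is an illustration under assumptions, not the project's training script: the data is a synthetic stand-in produced by scikit-learn's make_classification (11 features, roughly 8% positives), only two of the four classifiers are shown, and the hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the engineered features: 11 columns, ~8% positives.
X, y = make_classification(
    n_samples=2000, n_features=11, n_informative=6,
    weights=[0.92, 0.08], random_state=42,
)

# class_weight="balanced" reweights classes inversely to their frequency.
models = {
    "logistic_regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "random_forest": RandomForestClassifier(
        n_estimators=50, class_weight="balanced", random_state=42
    ),
}

# For XGBoost the analogous setting is scale_pos_weight, commonly set to
# the ratio of negative to positive examples.
scale_pos_weight = float((y == 0).sum() / (y == 1).sum())

# Stratified folds keep the failure rate roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring = {
    "roc_auc": "roc_auc",
    "pr_auc": "average_precision",
    "precision": "precision",
    "recall": "recall",
    "f1": "f1",
}

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {m: float(scores[f"test_{m}"].mean()) for m in scoring}
    print(name, {m: round(v, 3) for m, v in results[name].items()})

print("scale_pos_weight for XGBoost:", round(scale_pos_weight, 2))
```

With the dataset figures reported in this README (1,200 pre-failure readings out of 15,000), the negative-to-positive ratio would be 13,800 / 1,200 = 11.5.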

Survival analysis

Weibull Accelerated Failure Time (AFT) model from lifelines
Covariates: machine age, mean temperature, mean vibration
Outputs remaining useful life (RUL) estimates per machine
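
To make the RUL output concrete, the sketch below works through the Weibull AFT form by hand rather than calling lifelines. It assumes the usual parameterization S(t) = exp(-(t / lambda)^rho) with lambda = exp(intercept + coefficients · covariates). Every number in it (coefficients, covariate values, shape parameter, elapsed time) is hypothetical and chosen only for illustration; a fitted model would supply the real ones.

```python
import math


def weibull_survival(t: float, lam: float, rho: float) -> float:
    """Probability of surviving past time t under a Weibull model."""
    return math.exp(-((t / lam) ** rho))


def aft_scale(intercept: float, coefs: dict, covariates: dict) -> float:
    """AFT link: covariates stretch or shrink the time scale lambda."""
    return math.exp(intercept + sum(coefs[k] * covariates[k] for k in coefs))


def median_rul(elapsed: float, lam: float, rho: float) -> float:
    """Median remaining life given the machine has survived `elapsed`.

    Solves S(elapsed + r) / S(elapsed) = 0.5 for r, which gives
    (elapsed + r)^rho = elapsed^rho + lam^rho * ln(2).
    """
    total = (elapsed ** rho + (lam ** rho) * math.log(2)) ** (1 / rho)
    return total - elapsed


# Hypothetical fitted parameters (NOT taken from the project).
intercept = 6.0
coefs = {"machine_age_years": -0.05, "mean_temperature": -0.01, "mean_vibration": -0.8}
rho = 1.5  # shape > 1 means the hazard rises with time (wear-out)

# Hypothetical machine.
machine = {"machine_age_years": 6.0, "mean_temperature": 75.0, "mean_vibration": 0.5}
lam = aft_scale(intercept, coefs, machine)

elapsed_days = 30.0
rul_days = median_rul(elapsed_days, lam, rho)
print(f"scale lambda = {lam:.1f} days, median RUL after {elapsed_days:.0f} days = {rul_days:.1f} days")
```

Negative coefficients shorten the time scale, so in this toy setup an older, hotter, or more strongly vibrating machine gets a smaller lambda and therefore a shorter estimated remaining life.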

SHAP explainability

TreeExplainer for gradient boosting models
Global feature importance via mean absolute SHAP values
Waterfall plots for individual sensor reading explanations

Cost-optimized threshold

Business cost model: FN cost ($15,000 unplanned downtime) vs FP cost ($1,500 preventive maintenance)
Sweep thresholds from 0.05 to 0.95 to minimize total cost
The cost-minimizing threshold achieves 91% recall
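
The threshold sweep can be illustrated with a small pure-Python sketch. The two costs come from the cost model above; the 0.05 step size, the function names, and the toy labels and probabilities are assumptions made for the example.

```python
COST_FN = 15_000  # missed failure: unplanned downtime
COST_FP = 1_500   # false alarm: unnecessary preventive maintenance


def total_cost(y_true, y_prob, threshold):
    """Total business cost of flagging every reading with p >= threshold."""
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return fn * COST_FN + fp * COST_FP


def best_threshold(y_true, y_prob):
    """Sweep 0.05, 0.10, ..., 0.95 and return the cheapest threshold."""
    thresholds = [round(0.05 * k, 2) for k in range(1, 20)]
    return min(thresholds, key=lambda t: total_cost(y_true, y_prob, t))


# Toy labels and predicted failure probabilities (made up for illustration).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.02, 0.11, 0.16, 0.21, 0.31, 0.42, 0.37, 0.56, 0.71, 0.91]

threshold = best_threshold(y_true, y_prob)
print(threshold, total_cost(y_true, y_prob, threshold))  # 0.35 3000
```

Because a missed failure costs ten times as much as a false alarm, the sweep tends to settle well below 0.5. If the predicted probabilities were well calibrated, intervening would pay off whenever p × 15,000 > (1 − p) × 1,500, that is, for p above 1/11 (about 0.09).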

Acknowledgements

Built as part of the Calgary Data Portfolio.

Ola K.
