guydev42/predictive-maintenance

Overview

A predictive maintenance system that forecasts industrial equipment failures from sensor data. It combines gradient-boosted classifiers, survival analysis for remaining useful life (RUL) estimation, and SHAP-based explanations.

Unplanned equipment downtime costs industrial operations thousands of dollars per hour. This project builds a failure prediction pipeline that ingests sensor readings (temperature, vibration, pressure, RPM, power consumption) from 50 machines and predicts whether a failure will occur within the next 7 days. Four classifiers are trained, a Weibull survival model estimates remaining useful life, and a cost-based threshold optimizer balances the cost of a missed failure ($15,000 of unplanned downtime) against the cost of a false alarm ($1,500 of preventive maintenance).

Problem   →  Predicting equipment failures before they happen using sensor data
Solution  →  XGBoost with survival analysis, SHAP explanations, and cost-based threshold tuning
Impact    →  AUC-ROC 0.94; catches 91% of failures at 78% precision with a cost-tuned threshold

Key results

| Metric | Value |
| --- | --- |
| AUC-ROC | 0.94 |
| Recall (failures caught) | 91% |
| Precision | 78% |
| PR-AUC | 0.76 |
| Best model | XGBoost |

Architecture

┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│  Sensor data     │───▶│  Feature          │───▶│  Rolling         │
│  generation      │    │  extraction       │    │  aggregations    │
└──────────────────┘    └──────────────────┘    └────────┬─────────┘
                                                         │
                          ┌──────────────────────────────┘
                          ▼
              ┌──────────────────────┐    ┌──────────────────────┐
              │  Model training      │───▶│  Survival analysis   │
              │  (4 classifiers)     │    │  (Weibull AFT / RUL) │
              └──────────────────────┘    └──────────┬───────────┘
                                                     │
                          ┌──────────────────────────┘
                          ▼
              ┌──────────────────────┐    ┌──────────────────────┐
              │  SHAP explanations   │───▶│  Maintenance         │
              │  + threshold tuning  │    │  dashboard           │
              └──────────────────────┘    └──────────────────────┘

Project structure

project_21_predictive_maintenance/
├── data/
│   ├── sensor_readings.csv              # Sensor dataset
│   └── generate_data.py                 # Synthetic data generator
├── src/
│   ├── __init__.py
│   ├── data_loader.py                   # Data generation and loading
│   └── model.py                         # Training, evaluation, SHAP, RUL
├── notebooks/
│   ├── 01_eda.ipynb                     # Exploratory data analysis
│   ├── 02_feature_engineering.ipynb     # Rolling features, interactions
│   ├── 03_modeling.ipynb                # Model training and CV
│   └── 04_evaluation.ipynb              # ROC, SHAP, cost analysis
├── app.py                               # Streamlit dashboard
├── requirements.txt
└── README.md

Quickstart

# Clone and navigate
git clone https://github.com/guydev42/calgary-data-portfolio.git
cd calgary-data-portfolio/project_21_predictive_maintenance

# Install dependencies
pip install -r requirements.txt

# Generate sensor data
python data/generate_data.py

# Launch dashboard
streamlit run app.py

Dataset

| Property | Details |
| --- | --- |
| Source | Synthetic industrial sensor data |
| Readings | 15,000 |
| Machines | 50 |
| Failure rate | ~8% (1,200 pre-failure readings) |
| Features | 11 (temperature, vibration, pressure, rpm, power, rolling stats) |
| Target | failure_within_7days (binary) |

Tech stack


Methodology

Sensor feature engineering
  • Rolling 24h mean temperature and standard deviation of vibration
  • Temperature-pressure ratio as an interaction feature
  • Machine-level attributes: age, operating hours, maintenance history
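
The rolling and interaction features above could be computed roughly as follows. This is a minimal sketch, not the project's actual code: the column names `machine_id` and `timestamp`, the output column names, and the helper `add_rolling_features` are assumptions made for illustration, and `timestamp` is assumed to already be a datetime column.

```python
import pandas as pd


def add_rolling_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add 24h rolling stats and a temperature-pressure ratio per machine.

    Column names are illustrative assumptions, not the project's schema.
    """
    # Sort so that the grouped rolling output lines up with the row order.
    df = df.sort_values(["machine_id", "timestamp"]).reset_index(drop=True)

    # Time-based windows need a datetime index; group so that windows
    # never mix readings from different machines.
    grouped = df.set_index("timestamp").groupby("machine_id")

    # Each window covers the 24 hours up to and including the current reading.
    df["temp_mean_24h"] = grouped["temperature"].rolling("24h").mean().to_numpy()
    df["vib_std_24h"] = grouped["vibration"].rolling("24h").std().to_numpy()

    # Interaction feature: temperature divided by pressure.
    df["temp_pressure_ratio"] = df["temperature"] / df["pressure"]
    return df
```

The first reading of each machine has only one value in its window, so its rolling standard deviation is NaN. Whether such rows are dropped or filled is a modeling choice the sketch leaves open.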
Model training
  • Four classifiers: Logistic Regression, Random Forest, XGBoost, Gradient Boosting
  • 5-fold StratifiedKFold cross-validation
  • Class imbalance handled via class_weight (scikit-learn) and scale_pos_weight (XGBoost)
  • Metrics: AUC-ROC, precision, recall, F1, PR-AUC
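
The cross-validation and imbalance handling described above can be sketched with scikit-learn alone. Only the logistic regression baseline is shown. The helper names `positive_class_weight` and `cross_validated_auc`, the shuffling, and the seed are assumptions, not the project's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def positive_class_weight(y) -> float:
    """Ratio of negative to positive samples.

    This is the usual heuristic value for XGBoost's scale_pos_weight.
    """
    y = np.asarray(y)
    n_pos = int(y.sum())
    n_neg = int(len(y) - n_pos)
    return n_neg / n_pos


def cross_validated_auc(X, y, seed: int = 42) -> np.ndarray:
    """AUC-ROC per fold from 5-fold stratified cross-validation."""
    # Stratification keeps the failure rate roughly equal across folds.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    # class_weight="balanced" reweights classes inversely to their frequency.
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    return cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
```

With 1,200 pre-failure readings out of 15,000, `positive_class_weight` would return 13,800 / 1,200 = 11.5, which is the kind of value one would pass to `XGBClassifier(scale_pos_weight=...)`.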
Survival analysis
  • Weibull Accelerated Failure Time (AFT) model from lifelines
  • Covariates: machine age, mean temperature, mean vibration
  • Outputs remaining useful life (RUL) estimates per machine
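
To make the RUL step concrete, the sketch below shows how a fitted Weibull AFT model can be turned into a remaining-life estimate. It assumes the standard parameterization, with survival function S(t | x) = exp(-(t / λ(x))^ρ) and scale λ(x) = exp(β₀ + β·x). How this project derives RUL from the fitted model is not stated here, so the conditional median used below is one common choice rather than a confirmed description, and the function names are hypothetical.

```python
import math


def weibull_scale(intercept: float, coefs: dict, covariates: dict) -> float:
    """Scale parameter lambda(x) = exp(intercept + sum(coef * covariate))."""
    linear = intercept + sum(coefs[name] * covariates[name] for name in coefs)
    return math.exp(linear)


def median_rul(scale: float, shape: float, age: float) -> float:
    """Median remaining life for a machine that has survived to `age`.

    Solves S(age + r) / S(age) = 0.5 for r under a Weibull survival
    function S(t) = exp(-(t / scale) ** shape).
    """
    cumulative_hazard_now = (age / scale) ** shape
    median_total_life = scale * (cumulative_hazard_now + math.log(2)) ** (1.0 / shape)
    return median_total_life - age
```

With shape ρ = 1 the model reduces to an exponential distribution and the median RUL is `scale * ln(2)` regardless of age. With ρ > 1 (wear-out behaviour) the median RUL shrinks as the machine ages.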
SHAP explainability
  • TreeExplainer for gradient boosting models
  • Global feature importance via mean absolute SHAP values
  • Waterfall plots for individual sensor reading explanations
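
Global importance from mean absolute SHAP values is a simple aggregation once the per-row SHAP values exist. The sketch below assumes a matrix with one row per reading and one column per feature, such as the output of `shap.TreeExplainer(model).shap_values(X)`. The helper name `mean_abs_shap` and the numbers in the usage example are made up for illustration.

```python
def mean_abs_shap(shap_values, feature_names):
    """Rank features by the mean absolute SHAP value across all rows.

    `shap_values` is a sequence of rows, each with one value per feature.
    Returns (feature, importance) pairs, most important first.
    """
    n_rows = len(shap_values)
    importance = {}
    for j, name in enumerate(feature_names):
        # Absolute values stop positive and negative contributions cancelling.
        importance[name] = sum(abs(row[j]) for row in shap_values) / n_rows
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Illustrative values only, not results from this project.
example_values = [[0.2, -0.1], [-0.4, 0.1]]
ranking = mean_abs_shap(example_values, ["temperature", "vibration"])
```

Here `temperature` ranks first because its contributions are larger in magnitude, even though they point in opposite directions for the two rows.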
Cost-optimized threshold
  • Business cost model: FN cost ($15,000 unplanned downtime) vs FP cost ($1,500 preventive maintenance)
  • Sweep thresholds from 0.05 to 0.95 to minimize total cost
  • Achieves 91% recall at the cost-minimizing threshold
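
The threshold sweep can be sketched in a few lines of plain Python. The two costs come from the description above. The 0.05 step size, the tie-breaking rule (lowest threshold wins), and the function names are assumptions.

```python
FN_COST = 15_000  # missed failure: unplanned downtime
FP_COST = 1_500   # false alarm: unnecessary preventive maintenance


def total_cost(y_true, y_prob, threshold):
    """Total cost of missed failures and false alarms at a given threshold."""
    cost = 0
    for truth, prob in zip(y_true, y_prob):
        predicted_failure = prob >= threshold
        if truth == 1 and not predicted_failure:
            cost += FN_COST
        elif truth == 0 and predicted_failure:
            cost += FP_COST
    return cost


def best_threshold(y_true, y_prob):
    """Sweep thresholds 0.05, 0.10, ..., 0.95 and return the cheapest one."""
    thresholds = [round(0.05 + i * 0.05, 2) for i in range(19)]
    # min() keeps the first minimum, so ties go to the lowest threshold.
    return min(thresholds, key=lambda t: total_cost(y_true, y_prob, t))
```

Because a missed failure costs ten times as much as a false alarm, the cheapest threshold tends to sit well below 0.5. For perfectly calibrated probabilities under this two-cost model, the break-even point is FP / (FP + FN) = 1,500 / 16,500 ≈ 0.09. Sweeping on held-out predictions accounts for imperfect calibration.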

Acknowledgements

Built as part of the Calgary Data Portfolio.

