
# EV Battery Remaining Useful Life (RUL) — Project Report

**Abstract.** We predict the *Remaining Useful Life (RUL)* of EV cells, measured as the number of **diagnostic steps remaining** until capacity reaches the 80% End-of-Life (EOL) threshold, using diagnostic capacity tests from the Onori dataset. The pipeline is simple and interpretable: we compute labels per cell, engineer a few physics-aligned features, and evaluate multiple models with cross-validation grouped by cell. The headline out-of-fold (OOF) result comes from a tuned Elastic Net with polynomial features, evaluated on cells with ≥5 diagnostics.



## 1. Problem Definition & Success Criteria
- **Task:** supervised **regression**; target is RUL (diagnostic steps to 80% EOL).
- **Why:** enable scheduled maintenance before performance drops.
- **Success:** the model **generalizes to unseen cells**, measured with CV grouped by `cell_id` and reported as OOF RMSE/MAE/R².



## 2. Data Provenance & Description
- **Source:** Onori EV aging diagnostic capacity tests (`Diag_*/Capacity_test`).
- **Shape:** 101 diagnostics across 10 cells (derived table; raw files are not tracked in Git).
- **EOL:** 0.8 × initial capacity per cell; per-diagnostic capacity taken as max *Discharge Capacity (Ah)* from the Excel workbook.
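
The capacity and EOL definitions above can be sketched as follows. This is a minimal illustration, not the project's actual extraction code: the toy frame stands in for rows read from a `Capacity_test` workbook, the column names `discharge_capacity_ah`, `initial_ah`, and `eol_threshold_ah` are hypothetical, and "initial capacity" is assumed to mean the capacity at each cell's first diagnostic.

```python
import pandas as pd

# Hypothetical stand-in for rows read from Capacity_test workbooks.
# Column names are assumptions, not the dataset's actual schema.
raw = pd.DataFrame({
    "cell_id": ["A"] * 6,
    "diag": [1, 1, 1, 2, 2, 2],
    "discharge_capacity_ah": [0.0, 2.4, 4.8, 0.0, 2.3, 4.5],
})

# Per-diagnostic capacity: the maximum discharge capacity seen in each test.
cap = (
    raw.groupby(["cell_id", "diag"], as_index=False)["discharge_capacity_ah"]
    .max()
    .rename(columns={"discharge_capacity_ah": "capacity_ah"})
    .sort_values(["cell_id", "diag"])
)

# Assumption: initial capacity is the capacity at the cell's first diagnostic.
cap["initial_ah"] = cap.groupby("cell_id")["capacity_ah"].transform("first")
cap["eol_threshold_ah"] = 0.8 * cap["initial_ah"]

print(cap)
```

Taking the maximum works here because discharge capacity accumulates over the course of a test, so its peak is the total delivered capacity.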


In [None]:

import pandas as pd

df = pd.read_csv("results/rpt_features_labeled_enriched.csv").sort_values(["cell_id","diag"])
rows, cells_cnt = len(df), df["cell_id"].nunique()
print("rows:", rows, "| cells:", cells_cnt)
df.head()



## 3. Cleaning & EDA (highlights)
- **Label:** For each cell, find the first diagnostic where capacity ≤ 0.8 × initial capacity (call it `diag_EOL`); then `RUL = diag_EOL − diag`.
- **Features:** `diag`, `capacity_ah`, `fade_frac = 1 − capacity/initial`, short-window slope `cap_slope_k3`; parsed `c_rate`/`temp_c` when available.
- **Takeaways:** RUL is right-skewed; capacity declines roughly monotonically with diagnostic index; slopes vary by cell group.
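
The label and feature definitions above can be sketched as below. The function name `add_labels_and_features` is hypothetical, and the slope definition is an assumption: the report does not spell out how `cap_slope_k3` is computed, so this sketch uses the endpoint difference over a trailing window of `k` diagnostics.

```python
import pandas as pd

def add_labels_and_features(df, eol_frac=0.8, k=3):
    """Add fade_frac, rul, and a short-window slope to a per-diagnostic table."""
    out = df.sort_values(["cell_id", "diag"]).copy()
    by_cell = out.groupby("cell_id")

    # Assumption: initial capacity is the capacity at the first diagnostic.
    initial = by_cell["capacity_ah"].transform("first")
    out["fade_frac"] = 1.0 - out["capacity_ah"] / initial

    # diag_EOL: first diagnostic at or below the threshold (NaN if never reached).
    at_eol = out["capacity_ah"] <= eol_frac * initial
    diag_eol = out["diag"].where(at_eol).groupby(out["cell_id"]).transform("min")
    out["rul"] = diag_eol - out["diag"]

    # Assumed slope: endpoint difference over the trailing k diagnostics.
    lag = k - 1
    d_cap = out["capacity_ah"] - by_cell["capacity_ah"].shift(lag)
    d_diag = out["diag"] - by_cell["diag"].shift(lag)
    out[f"cap_slope_k{k}"] = d_cap / d_diag
    return out

# Toy data: cell A crosses 80% at diag 4; cell B never does.
toy = pd.DataFrame({
    "cell_id": ["A"] * 5 + ["B"] * 3,
    "diag": [1, 2, 3, 4, 5, 1, 2, 3],
    "capacity_ah": [5.0, 4.6, 4.2, 3.9, 3.7, 5.0, 4.9, 4.8],
})
labeled = add_labels_and_features(toy)
print(labeled)
```

Under this formula, diagnostics taken after EOL get a negative RUL and cells that never reach the threshold get a missing label. The report does not state how either case is handled, so the sketch leaves both as they fall out.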

**Figures (generated earlier in the pipeline and tracked in the repository):**


In [None]:

from IPython.display import Image, display

display(Image(filename="results/figs/rul_hist.png", width=800))
display(Image(filename="results/figs/cap_vs_diag_by_cell.png", width=800))
display(Image(filename="results/figs/fade_vs_diag.png", width=800))



## 4. Models & Cross-Validation
Compared: baseline mean, **diag-only LinearRegression**, **Elastic Net**, **RandomForest**, **Histogram Gradient Boosting**, **SVR (RBF)**.  
**Evaluation:** **GroupKFold by `cell_id`** (emulates unseen cells); report **OOF** metrics RMSE/MAE/R².  
**Tuning:** Elastic Net + polynomial features; filtered to cells with ≥5 diagnostics for the headline OOF.
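
The evaluation protocol described above can be sketched as follows. The data is synthetic and the hyperparameters (`degree=2`, `alpha=0.01`, `l1_ratio=0.5`, `n_splits=3`) are placeholders chosen for illustration, not the tuned values behind the reported results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GroupKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic cells (an assumption for illustration): six with 8 diagnostics, one with 3.
rng = np.random.default_rng(0)
rows = []
for cell, n_diag in enumerate([8, 8, 8, 8, 8, 8, 3]):
    fade_rate = rng.uniform(0.02, 0.04)
    for d in range(n_diag):
        cap = 5.0 * (1 - fade_rate * d) + rng.normal(0, 0.01)
        rows.append({"cell_id": cell, "diag": d, "capacity_ah": cap,
                     "rul": n_diag - 1 - d})
toy = pd.DataFrame(rows)

# Keep only cells with at least 5 diagnostics.
toy = toy[toy.groupby("cell_id")["diag"].transform("count") >= 5]

X = toy[["diag", "capacity_ah"]].to_numpy()
y = toy["rul"].to_numpy()
groups = toy["cell_id"].to_numpy()

model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=10_000),
)

# Each row is predicted by a model that never saw any row from the same cell.
oof = cross_val_predict(model, X, y, groups=groups, cv=GroupKFold(n_splits=3))

rmse = float(np.sqrt(mean_squared_error(y, oof)))
mae = float(mean_absolute_error(y, oof))
r2 = float(r2_score(y, oof))
print(f"OOF RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}")
```

Placing the scaler and polynomial expansion inside the pipeline means they are refit on each training fold, so no statistics from held-out cells leak into the model.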


In [None]:

import pandas as pd
from IPython.display import display, Image

lb = pd.read_csv("results/leaderboard.csv").sort_values("rmse_mean")
pc = pd.read_csv("results/per_cell_oof_metrics_tuned.csv")

display(lb.round(3))

print("\nTuned Elastic Net (poly, filtered ≥5 diags): per-cell OOF")
display(pc.round(3))

display(Image(filename="results/figs/parity_plot_oof_tuned.png", width=800))



## 5. Results, Discussion & Conclusions
- **Simple CV leaderboard:** diag-only LinearRegression is already strong and Elastic Net is close behind; trees and SVR do not beat the linear models on this small, mostly monotonic dataset.
- **Tuned (filtered, OOF):** predictions track true RUL closely for cells with adequate history (see the per-cell table and parity plot in Section 4).
- **Interpretability:** standardized coefficients (reported in slides) align with physics — `diag` (−), `capacity_ah` (+), `fade_frac` (+), short-window slope (+).
- **Limits:** only 10 cells; mostly fixed temperature; some cells have few diagnostics.
- **Next:** add HPPC ΔR (resistance change), try quantile regression for conservative RUL, expand cells/temps, consider time-to-EOL.
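
For the interpretability point above, one way to read standardized coefficients from an Elastic Net is sketched below. The data and target are synthetic and the `alpha`/`l1_ratio` values are placeholders, so the signs and magnitudes printed here need not match those reported in the slides.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features (an assumption for illustration only).
rng = np.random.default_rng(1)
n = 60
toy = pd.DataFrame({
    "diag": rng.integers(0, 10, n).astype(float),
    "capacity_ah": rng.uniform(3.8, 5.0, n),
})
toy["fade_frac"] = 1.0 - toy["capacity_ah"] / 5.0

# Synthetic target: decreases with diag, increases with capacity.
y = 9.0 - toy["diag"] + 2.0 * (toy["capacity_ah"] - 4.4) + rng.normal(0, 0.2, n)

features = ["diag", "capacity_ah", "fade_frac"]
pipe = make_pipeline(
    StandardScaler(),
    ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=10_000),
)
pipe.fit(toy[features], y)

# Features were scaled to unit variance, so coefficient magnitudes are comparable.
coefs = pd.Series(pipe.named_steps["elasticnet"].coef_, index=features)
coefs = coefs.sort_values(key=np.abs, ascending=False)
print(coefs.round(3))
```

Because `capacity_ah` and `fade_frac` are strongly correlated by construction, the penalty can split or shift weight between them, which is worth keeping in mind when reading the sign of either coefficient in isolation.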
