# C-MAPSS Turbofan RUL Analysis with PRISM

This notebook analyzes NASA's Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset, a run-to-failure turbofan simulation used for Remaining Useful Life (RUL) prediction.

## Datasets

| Dataset | Operating Conditions | Fault Modes | Train Engines | Test Engines |
|---------|---------------------|-------------|---------------|---------------|
| FD001 | 1 (Sea Level) | 1 (HPC Degradation) | 100 | 100 |
| FD002 | 6 (Various) | 1 (HPC Degradation) | 260 | 259 |
| FD003 | 1 (Sea Level) | 2 (HPC + Fan) | 100 | 100 |
| FD004 | 6 (Various) | 2 (HPC + Fan) | 249 | 248 |

## Sensors (21)

| Sensor | Description | Unit |
|--------|-------------|------|
| T2 | Fan inlet temperature | °R |
| T24 | LPC outlet temperature | °R |
| T30 | HPC outlet temperature | °R |
| T50 | LPT outlet temperature | °R |
| P2 | Fan inlet pressure | psia |
| P15 | Bypass duct pressure | psia |
| P30 | HPC outlet pressure | psia |
| Nf | Physical fan speed | rpm |
| Nc | Physical core speed | rpm |
| epr | Engine pressure ratio | - |
| Ps30 | HPC outlet static pressure | psia |
| phi | Ratio of fuel flow to Ps30 | pps/psi |
| NRf | Corrected fan speed | rpm |
| NRc | Corrected core speed | rpm |
| BPR | Bypass ratio | - |
| farB | Burner fuel-air ratio | - |
| htBleed | Bleed enthalpy | - |
| Nf_dmd | Demanded fan speed | rpm |
| PCNfR_dmd | Demanded corrected fan speed | rpm |
| W31 | HPT coolant bleed | lbm/s |
| W32 | LPT coolant bleed | lbm/s |

## PRISM Approach

Use Laplace field geometry to detect regime changes that precede failure:
- **Divergence spikes** = energy injection (degradation onset)
- **Gradient magnitude** = acceleration in the rate of change
- **Normalized field** = anomaly detection relative to baseline
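
As a rough illustration of the three quantities above, here is a minimal NumPy sketch on a single 1-D trajectory. It is not PRISM's implementation: the function name `field_sketch`, the `baseline_len` parameter, the use of finite differences, and the z-score normalization against an early-life window are all assumptions made for this example.

```python
import numpy as np

def field_sketch(x, baseline_len=30):
    """Illustrative 1-D field quantities (hypothetical helper, not PRISM's API)."""
    x = np.asarray(x, dtype=float)
    gradient = np.gradient(x)           # first derivative via finite differences
    divergence = np.gradient(gradient)  # second derivative, i.e. the 1-D Laplacian
    baseline = x[:baseline_len]         # assumed healthy early-life window
    normalized = (x - baseline.mean()) / (baseline.std() + 1e-12)
    return np.abs(gradient), divergence, normalized

# Toy trajectory: small baseline wiggle, then a linear drift starting at cycle 60.
t = np.arange(100)
drift = np.concatenate([np.zeros(60), 0.5 * np.arange(1, 41)])
x = 0.01 * np.sin(t) + drift

grad_mag, div, z = field_sketch(x)
onset = int(np.argmax(np.abs(div)))  # cycle with the largest divergence magnitude
print(f"Largest |divergence| at cycle {onset}")
```

In this toy signal the largest divergence magnitude sits at the kink where the drift begins, which is the behavior the approach relies on for detecting degradation onset.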

In [None]:
import polars as pl
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (14, 5)

DOMAIN = 'cmapss'
DATA_DIR = Path(f'/Users/jasonrudder/prism-mac/data/{DOMAIN}')

## 1. Data Overview

In [None]:
# Load raw observations
obs = pl.read_parquet(DATA_DIR / 'raw' / 'observations.parquet')
meta = pl.read_parquet(DATA_DIR / 'raw' / 'engine_metadata.parquet')
signals = pl.read_parquet(DATA_DIR / 'raw' / 'signals.parquet')

print(f"Total observations: {len(obs):,}")
print(f"Total signals: {obs['signal_id'].n_unique():,}")
print(f"Total engines: {len(meta):,}")
print()
print("Dataset summary:")
print(meta.group_by('dataset').agg([
    pl.len().alias('engines'),
    pl.col('total_cycles').mean().alias('avg_cycles'),
    pl.col('total_cycles').max().alias('max_cycles'),
]).sort('dataset'))

In [None]:
# Sample engine trajectory (FD001, unit 1)
sample_unit = obs.filter(pl.col('signal_id').str.contains('FD001_0001_'))

# Pivot to wide format
sample_wide = sample_unit.pivot(
    on='signal_id',
    index='obs_date',
    values='value'
).sort('obs_date')

print(f"Sample engine FD001_0001: {len(sample_wide)} cycles")
print(f"Sensors available: {len(sample_wide.columns) - 1}")

In [None]:
# Plot sample engine sensor trajectories
fig, axes = plt.subplots(2, 3, figsize=(15, 8))

sensors = ['FD001_0001_T30', 'FD001_0001_P30', 'FD001_0001_Nf', 
           'FD001_0001_phi', 'FD001_0001_NRc', 'FD001_0001_RUL']

for ax, sensor in zip(axes.flat, sensors):
    if sensor in sample_wide.columns:
        y = sample_wide[sensor].to_numpy()
        ax.plot(y, linewidth=0.8)
        ax.set_title(sensor.split('_')[-1])
        ax.set_xlabel('Cycle')

plt.suptitle('FD001 Unit 1: Sensor Trajectories to Failure', fontsize=14)
plt.tight_layout()
plt.show()

## 2. RUL Distribution Analysis

In [None]:
# Engine lifetimes by dataset
fig, axes = plt.subplots(1, 4, figsize=(16, 4))

for ax, dataset in zip(axes, ['FD001', 'FD002', 'FD003', 'FD004']):
    cycles = meta.filter(pl.col('dataset') == dataset)['total_cycles'].to_numpy()
    ax.hist(cycles, bins=30, alpha=0.7, color='steelblue', edgecolor='white')
    ax.axvline(np.mean(cycles), color='red', linestyle='--', label=f'Mean: {np.mean(cycles):.0f}')
    ax.set_title(f'{dataset}')
    ax.set_xlabel('Cycles to Failure')
    ax.legend()

axes[0].set_ylabel('Count')
plt.suptitle('Engine Lifetime Distribution by Dataset', fontsize=14)
plt.tight_layout()
plt.show()
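
The lifetimes plotted above relate to per-cycle RUL labels in a simple way if RUL is assumed to count down linearly to zero at the final cycle, which is the usual convention for C-MAPSS training trajectories. Whether the `_RUL` signals in this dataset follow exactly that convention is an assumption; `linear_rul` is a hypothetical helper.

```python
import numpy as np

def linear_rul(total_cycles):
    """RUL for cycles 1..total_cycles, reaching 0 at the last cycle (assumed convention)."""
    cycles = np.arange(1, total_cycles + 1)
    return total_cycles - cycles

# Example with an arbitrary lifetime of 200 cycles.
rul = linear_rul(200)
print(rul[:3], rul[-3:])
```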

## 3. Run PRISM Pipeline

Execute in terminal:
```bash
# Compute behavioral vectors with normalization
python -m prism.entry_points.signal_vector --signal --domain cmapss --testing

# Compute Laplace field (raw)
python -m prism.entry_points.laplace --level signal --domain cmapss --value-col metric_value

# Compute Laplace field (normalized)
python -m prism.entry_points.laplace --level signal --domain cmapss --value-col metric_value_norm
```
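
The same three steps could also be launched from Python. The sketch below only reuses the module names and flags shown in the commands above; `build_commands` and `run_pipeline` are hypothetical helpers, and running them assumes the `prism` package is importable in the current environment.

```python
import subprocess
import sys

def build_commands(domain="cmapss"):
    """Return the three pipeline invocations as argument lists (hypothetical helper)."""
    py = [sys.executable, "-m"]
    laplace = py + ["prism.entry_points.laplace", "--level", "signal", "--domain", domain]
    return [
        # Behavioral vectors with normalization
        py + ["prism.entry_points.signal_vector", "--signal", "--domain", domain, "--testing"],
        # Laplace field on raw values
        laplace + ["--value-col", "metric_value"],
        # Laplace field on normalized values
        laplace + ["--value-col", "metric_value_norm"],
    ]

def run_pipeline(domain="cmapss"):
    """Run each step in order; check=True stops at the first failing step."""
    for cmd in build_commands(domain):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(DOMAIN)` would execute the steps sequentially and raise `subprocess.CalledProcessError` if any of them exits with a non-zero status.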

In [None]:
# Check if PRISM outputs exist
from prism.db.parquet_store import get_parquet_path

vector_path = get_parquet_path('vector', 'signal', domain=DOMAIN)
field_path = get_parquet_path('vector', 'signal_field', domain=DOMAIN)
field_norm_path = get_parquet_path('vector', 'signal_field_norm', domain=DOMAIN)

print(f"Vector exists: {vector_path.exists()}")
print(f"Field (raw) exists: {field_path.exists()}")
print(f"Field (norm) exists: {field_norm_path.exists()}")

## 4. Laplace Field Analysis (After PRISM Run)

In [None]:
# Load PRISM outputs (run this after pipeline completes)
try:
    vectors = pl.read_parquet(vector_path)
    field_raw = pl.read_parquet(field_path)
    field_norm = pl.read_parquet(field_norm_path)
    
    print(f"Vector rows: {len(vectors):,}")
    print(f"Field (raw) rows: {len(field_raw):,}")
    print(f"Field (norm) rows: {len(field_norm):,}")
except FileNotFoundError:
    print("PRISM outputs not found. Run the pipeline first.")

In [None]:
# Analyze divergence patterns vs RUL
try:
    # Get RUL signals
    rul_field = field_norm.filter(pl.col('signal_id').str.contains('_RUL'))
    
    # Sample: correlate divergence with RUL
    print("RUL field statistics:")
    print(f"  Divergence mean: {rul_field['divergence'].mean():.4f}")
    print(f"  Divergence std: {rul_field['divergence'].std():.4f}")
except NameError:  # field_norm is undefined when the load cell above failed
    print("Run PRISM pipeline first.")
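
One way to quantify "divergence vs RUL" for a single engine is a plain Pearson correlation between divergence magnitude and RUL. The sketch below uses synthetic arrays; aligning real PRISM field rows with RUL values per cycle is left out, and `divergence_rul_correlation` is a hypothetical helper.

```python
import numpy as np

def divergence_rul_correlation(divergence, rul):
    """Pearson correlation between |divergence| and RUL for one engine."""
    d = np.abs(np.asarray(divergence, dtype=float))
    r = np.asarray(rul, dtype=float)
    if d.shape != r.shape:
        raise ValueError("divergence and rul must be aligned cycle by cycle")
    return float(np.corrcoef(d, r)[0, 1])

# Synthetic engine: divergence grows as RUL approaches zero.
rul = np.arange(99, -1, -1)
divergence = 1.0 / (rul + 1.0)
r = divergence_rul_correlation(divergence, rul)
print(f"Pearson r = {r:.3f}")
```

A negative value is what the hypothesis predicts, since divergence should grow as RUL shrinks. A rank correlation may be a better fit if the relationship is monotonic but strongly nonlinear.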

## 5. Early Warning Detection

Key hypothesis: divergence spikes in the normalized Laplace field should precede failure.

In [None]:
# Placeholder for post-analysis
# After running PRISM, analyze:
# 1. Divergence vs RUL correlation
# 2. Detection lag (how many cycles before failure does divergence spike)
# 3. False positive rate at different thresholds

print("Analysis pending PRISM pipeline completion.")
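
The detection-lag and false-positive items above could be prototyped along the following lines. This is a sketch under stated assumptions: alarms are raised when the z-scored divergence (relative to an early baseline window) exceeds a threshold, the last sample of each trajectory is the failure cycle, and an alarm counts as a false positive when it fires more than `horizon` cycles before failure. All function names and parameters are hypothetical.

```python
import numpy as np

def detection_lag(divergence, threshold, baseline_len=30):
    """Cycles between the first alarm and the final (failure) cycle, or None if no alarm."""
    d = np.asarray(divergence, dtype=float)
    base = d[:baseline_len]
    z = (d - base.mean()) / (base.std() + 1e-12)  # z-score against the baseline window
    hits = np.flatnonzero(np.abs(z) > threshold)
    if hits.size == 0:
        return None
    return len(d) - 1 - int(hits[0])

def false_positive_rate(lags, horizon):
    """Share of alarmed engines whose first alarm fired more than `horizon` cycles early."""
    fired = [lag for lag in lags if lag is not None]
    if not fired:
        return 0.0
    return sum(lag > horizon for lag in fired) / len(fired)

# Synthetic engine: quiet baseline, one divergence spike at cycle 80 of 100.
d = np.concatenate([np.tile([0.1, -0.1], 15), np.zeros(70)])
d[80] = 5.0
lag = detection_lag(d, threshold=3.0)
print(f"First alarm {lag} cycles before failure")
```

Sweeping `threshold` over a grid and recomputing both quantities per engine would give the trade-off curve needed to pick an early-warning threshold.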

## Summary

### Data Loaded
- **5.8M observations** across 4 datasets
- **31,152 signals** (21 sensors + RUL per engine)
- **1,416 engines** total (709 train + 707 test)
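
The engine and signal totals above follow directly from the per-dataset counts in the Datasets table, as this short check shows:

```python
# Per-dataset engine counts, copied from the Datasets table.
train = {"FD001": 100, "FD002": 260, "FD003": 100, "FD004": 249}
test = {"FD001": 100, "FD002": 259, "FD003": 100, "FD004": 248}

n_train = sum(train.values())   # 709
n_test = sum(test.values())     # 707
n_engines = n_train + n_test    # 1416

signals_per_engine = 21 + 1     # 21 sensors plus one RUL signal
n_signals = n_engines * signals_per_engine  # 31152

print(n_train, n_test, n_engines, n_signals)
```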

### Next Steps
1. Run `signal_vector` to compute behavioral metrics
2. Run `laplace` on both raw and normalized values
3. Analyze divergence patterns vs RUL
4. Build an early-warning threshold