# ✈️ AeroDemo: Workflow & Instructions

This document outlines the end-to-end workflow for simulating and analyzing aircraft sensor and maintenance data using **Delta Live Tables**, **Auto Loader**, and **MLflow** within Databricks. The goal is to demonstrate predictive maintenance capabilities by identifying aircraft at risk of **AOG (Aircraft on Ground)** events.

---

## 📚 Notebook Workflow

| Notebook Name                             | Purpose |
|------------------------------------------|---------|
| `01_Table_Creation.ipynb`                | 🏗️ Creates all required Delta tables in Unity Catalog (e.g., `raw_sensor_data`, `maintenance_events`, etc.) |
| `02_Synthetic_Data_Generation.ipynb`     | 🧪 Generates synthetic CSV files for sensor and maintenance event data with timestamped filenames |
| `03_DLT_Pipeline_Full.ipynb`             | 🔁 Unified DLT pipeline that ingests raw data, cleanses it, enriches it with recent maintenance info, and computes sensor features |
| `04_Model_Training_And_Registration.ipynb`| 🤖 Trains a Random Forest model to detect anomalies, logs it with MLflow, and registers it in Unity Catalog |
| `05_Model_Inference.ipynb`               | 📈 Loads the registered model to predict anomalies on new data |
| `06_Anomaly_Alert_Writeback.ipynb`       | 🚨 Filters high-risk predictions and writes them to `anomaly_alerts` Delta table |

---

## ▶️ Execution Order & Triggers

### ✅ One-Time Setup
- **`01_Table_Creation.ipynb`**: Run once to initialize all Delta tables.

### 🔄 Generate New Data
- **`02_Synthetic_Data_Generation.ipynb`**: Use to produce fresh input files. The DLT pipeline's Auto Loader sources detect and ingest the new files automatically the next time the pipeline runs.
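Below is a minimal sketch of the kind of file the generator might produce. It assumes one CSV per batch, named with a UTC timestamp so every batch is a new file for Auto Loader. The `sensor_data_<timestamp>.csv` naming pattern, the `FLEET` IDs, the `anomaly_rate` parameter and all value ranges are hypothetical. The ranges were picked to stay inside the `cleaned_sensor_data` validation rules. Maintenance-event files would be generated the same way.

```python
import csv
import random
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

# Column order follows the raw_sensor_data schema under "Key Datasets".
SENSOR_COLUMNS = [
    "timestamp", "aircraft_id", "model", "engine_temp", "fuel_efficiency",
    "vibration", "altitude", "airspeed", "anomaly_score", "oil_pressure",
    "engine_rpm", "battery_voltage",
]

# Hypothetical fleet: these IDs are illustrative, not taken from the demo.
FLEET = {"AC001": "A320", "AC002": "B737", "AC003": "A330"}


def generate_sensor_file(output_dir: str, n_rows: int = 100,
                         anomaly_rate: float = 0.05,
                         seed: Optional[int] = None) -> Path:
    """Write one batch of synthetic sensor readings to a timestamped CSV file."""
    rng = random.Random(seed)
    now = datetime.now(timezone.utc)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # A unique, timestamped name makes each batch a new file for Auto Loader.
    path = out_dir / f"sensor_data_{now:%Y%m%d_%H%M%S_%f}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=SENSOR_COLUMNS)
        writer.writeheader()
        for i in range(n_rows):
            aircraft_id = rng.choice(sorted(FLEET))
            anomaly = rng.random() < anomaly_rate
            # Assumed ranges: normal readings stay well inside the validation
            # rules; simulated anomalies drift toward (but not past) the limits.
            writer.writerow({
                "timestamp": (now - timedelta(seconds=n_rows - i)).isoformat(),
                "aircraft_id": aircraft_id,
                "model": FLEET[aircraft_id],
                "engine_temp": round(rng.uniform(400, 600) + (80 if anomaly else 0), 2),
                "fuel_efficiency": round(rng.uniform(75, 95) - (15 if anomaly else 0), 2),
                "vibration": round(rng.uniform(1, 10) + (10 if anomaly else 0), 2),
                "altitude": round(rng.uniform(0, 40000), 1),
                "airspeed": round(rng.uniform(150, 500), 1),
                "anomaly_score": 1.0 if anomaly else 0.0,  # simulated anomaly flag
                "oil_pressure": round(rng.uniform(30, 80) - (15 if anomaly else 0), 2),
                "engine_rpm": rng.randint(2000, 9000),
                "battery_voltage": round(rng.uniform(24.0, 28.5), 2),
            })
    return path
```

Point `output_dir` at the volume directory that the pipeline's Auto Loader source reads (the path is not specified in this document). Each call then adds one new batch for the next pipeline update.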

### ⚙️ DLT Pipeline Execution
- **`03_DLT_Pipeline_Full.ipynb`**: Run this to ingest and process new raw sensor and maintenance data. It:
  - Ingests from volume paths using Auto Loader
  - Applies validation filters
  - Joins maintenance data
  - Computes rolling sensor features for ML

### 🎯 ML Pipeline
- **`04_Model_Training_And_Registration.ipynb`**: Trains, evaluates, and registers a classification model to predict anomalies.

### 🔍 Inference & Alerts
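- **`05_Model_Inference.ipynb`**: Loads new sensor features and applies the registered model to detect anomalies. Predictions are written to the `anomaly_predictions` table.
- **`06_Anomaly_Alert_Writeback.ipynb`**: Filters high-risk predictions and writes them to the `anomaly_alerts` Delta table.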

---

## 📊 Key Datasets

### ✈️ `aircraft_model_reference`

This table serves as the authoritative reference for aircraft metadata used in Digital Twin tracking.

| Column         | Type    | Description                                   |
|----------------|---------|-----------------------------------------------|
| `model`        | STRING  | Aircraft model type (e.g., A320, B737, A330)  |
| `manufacturer` | STRING  | Name of the aircraft manufacturer             |
| `engine_type`  | STRING  | Type of engine used (e.g., Turbofan)          |
| `capacity`     | INT     | Maximum seating capacity                      |
| `range_km`     | INT     | Maximum range in kilometers                   |

This reference data enables context-aware interpretation of sensor patterns and maintenance events across aircraft types, forming the foundation for Digital Twin modeling at the aircraft level.
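As an illustration of this context-aware lookup, the sketch below left-joins reference metadata onto sensor readings by `model`. It is a plain-Python stand-in for what would be a Spark or SQL join on `model` in the notebooks. The function name `attach_reference` is hypothetical, and no real aircraft specifications are included.

```python
# Reference columns copied onto each reading (from aircraft_model_reference).
REFERENCE_FIELDS = ("manufacturer", "engine_type", "capacity", "range_km")


def attach_reference(readings: list[dict], reference_rows: list[dict]) -> list[dict]:
    """Left-join aircraft_model_reference metadata onto readings by `model`."""
    by_model = {ref["model"]: ref for ref in reference_rows}
    out = []
    for r in readings:
        ref = by_model.get(r["model"], {})
        enriched = dict(r)
        for field in REFERENCE_FIELDS:
            # None when the reading's model is missing from the reference table.
            enriched[field] = ref.get(field)
        out.append(enriched)
    return out
```

A left join keeps readings from unknown models instead of silently dropping them. This makes gaps in the reference table visible downstream.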

### 🔧 `raw_sensor_data`
Telemetry from aircraft sensors, with the following columns:

| Column            | Type     | Description                          |
|-------------------|----------|--------------------------------------|
| `timestamp`       | TIMESTAMP| Time of data capture                 |
| `aircraft_id`     | STRING   | Unique aircraft ID                   |
| `model`           | STRING   | Aircraft type (e.g., A320, B737)     |
| `engine_temp`     | DOUBLE   | Engine temperature (°C)              |
| `fuel_efficiency` | DOUBLE   | Fuel efficiency (%)                  |
| `vibration`       | DOUBLE   | Structural vibration score           |
| `altitude`        | DOUBLE   | Altitude in feet                     |
| `airspeed`        | DOUBLE   | Speed in knots                       |
| `anomaly_score`   | DOUBLE   | Flag for simulated anomalies         |
| `oil_pressure`    | DOUBLE   | PSI reading from oil system          |
| `engine_rpm`      | INT      | Revolutions per minute               |
| `battery_voltage` | DOUBLE   | Electrical system health (volts)     |

---

### 🛠 `maintenance_events`
Maintenance logs for routine and unscheduled repairs.

| Column         | Type   | Description                             |
|----------------|--------|-----------------------------------------|
| `aircraft_id`  | STRING | ID of the aircraft                      |
| `event_date`   | DATE   | Date of maintenance                     |
| `event_type`   | STRING | Type (e.g., Routine Check, Engine Repair) |

---

### 🧼 `cleaned_sensor_data`
DLT-filtered sensor data. Only rows that satisfy all of these validation rules are kept:
- `engine_temp` < 700 °C
- `fuel_efficiency` > 50 %
- `vibration` < 25
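The three rules can be expressed as a single predicate. This plain-Python version is only for illustration. In the DLT notebook the same conditions would more likely be written as expectations (for example `@dlt.expect_or_drop`) or as a Spark `filter`. The strict bounds follow the list above. Treating missing or non-numeric values as invalid is an assumption.

```python
# Validation thresholds for cleaned_sensor_data (units as in raw_sensor_data).
MAX_ENGINE_TEMP = 700.0     # °C, exclusive upper bound
MIN_FUEL_EFFICIENCY = 50.0  # %, exclusive lower bound
MAX_VIBRATION = 25.0        # vibration score, exclusive upper bound


def is_valid_reading(row: dict) -> bool:
    """Return True if a sensor reading passes every cleaned_sensor_data rule."""
    try:
        return (
            float(row["engine_temp"]) < MAX_ENGINE_TEMP
            and float(row["fuel_efficiency"]) > MIN_FUEL_EFFICIENCY
            and float(row["vibration"]) < MAX_VIBRATION
        )
    except (KeyError, TypeError, ValueError):
        # Assumption: missing or non-numeric readings are dropped, not kept.
        return False


def clean(rows: list[dict]) -> list[dict]:
    """Keep only readings that pass all validation rules."""
    return [r for r in rows if is_valid_reading(r)]
```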

---

### 🧩 `enriched_sensor_data`
Sensor data joined with the latest known maintenance event for each aircraft.
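"Latest known maintenance event" can be read in two ways. It can mean the single most recent event overall, or the most recent event on or before each reading's date. The sketch below assumes the second reading, because it avoids leaking future maintenance into ML features. It implements that as-of lookup with `bisect`. The output fields `last_event_date`, `last_event_type` and `days_since_maintenance` are hypothetical names, not columns confirmed by the pipeline. In Spark, the same idea is usually a join on `aircraft_id` with `event_date <= to_date(timestamp)`, followed by a window that keeps the newest match.

```python
import bisect
from collections import defaultdict
from datetime import date


def build_event_index(events: list[dict]) -> dict[str, list[tuple[date, str]]]:
    """Group maintenance events by aircraft_id, sorted by event_date."""
    index: dict[str, list[tuple[date, str]]] = defaultdict(list)
    for e in events:
        index[e["aircraft_id"]].append((e["event_date"], e["event_type"]))
    for history in index.values():
        history.sort()
    return dict(index)


def enrich_with_maintenance(readings: list[dict], events: list[dict]) -> list[dict]:
    """Attach the latest maintenance event on or before each reading's date."""
    index = build_event_index(events)
    event_dates = {ac: [d for d, _ in hist] for ac, hist in index.items()}
    enriched = []
    for r in readings:
        reading_date = r["timestamp"].date()  # timestamp is a datetime (TIMESTAMP column)
        history = index.get(r["aircraft_id"], [])
        # bisect_right returns how many events are dated on or before reading_date.
        pos = bisect.bisect_right(event_dates.get(r["aircraft_id"], []), reading_date)
        row = dict(r)
        if pos:
            event_date, event_type = history[pos - 1]
            row["last_event_date"] = event_date
            row["last_event_type"] = event_type
            row["days_since_maintenance"] = (reading_date - event_date).days
        else:
            # No earlier maintenance recorded for this aircraft.
            row["last_event_date"] = None
            row["last_event_type"] = None
            row["days_since_maintenance"] = None
        enriched.append(row)
    return enriched
```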

---

### 🔢 `sensor_features`
Feature-engineered table for ML — includes time-lagged metrics and smoothed aggregates.
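The exact features are not listed here, so the following only illustrates "time-lagged metrics and smoothed aggregates." For each aircraft, ordered by `timestamp`, it adds the previous `engine_temp` (a lag) and a trailing mean over the last `window` readings. The window size and the `engine_temp_lag1` / `engine_temp_rolling_mean` names are hypothetical. In PySpark this would typically use `Window.partitionBy("aircraft_id").orderBy("timestamp")` with `lag` and `avg`.

```python
from collections import defaultdict, deque


def add_rolling_features(readings: list[dict], window: int = 3) -> list[dict]:
    """Add a lag-1 value and a trailing rolling mean of engine_temp per aircraft."""
    if window < 1:
        raise ValueError("window must be >= 1")
    by_aircraft: dict[str, list[dict]] = defaultdict(list)
    for r in readings:
        by_aircraft[r["aircraft_id"]].append(r)
    out = []
    for rows in by_aircraft.values():
        rows.sort(key=lambda r: r["timestamp"])  # time order within one aircraft
        recent = deque(maxlen=window)  # last `window` temps, including the current one
        prev = None
        for r in rows:
            temp = float(r["engine_temp"])
            recent.append(temp)
            feat = dict(r)
            feat["engine_temp_lag1"] = prev  # None for the first reading
            feat["engine_temp_rolling_mean"] = sum(recent) / len(recent)
            out.append(feat)
            prev = temp
    return out
```

Early readings are averaged over fewer than `window` values rather than dropped. This mirrors a Spark window with `rowsBetween(-(window - 1), 0)`.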

---

### 📈 `anomaly_predictions`
Model-predicted anomalies for each aircraft and timestamp.

---

### 🚨 `anomaly_alerts`
Filtered subset of predictions where risk is high (used for downstream alerts/UI).
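The cut-off for "high risk" is not specified in this document. The sketch below assumes each prediction carries a probability-like `anomaly_probability` field. It keeps rows at or above a threshold of 0.8, highest risk first. The field name, the threshold and `select_alerts` are all hypothetical and should be adapted to the registered model's actual output.

```python
# Hypothetical cut-off; tune it against the model's precision/recall on held-out data.
ALERT_THRESHOLD = 0.8


def select_alerts(predictions: list[dict], threshold: float = ALERT_THRESHOLD) -> list[dict]:
    """Return high-risk predictions, sorted from highest to lowest probability."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    alerts = [
        p for p in predictions
        if p.get("anomaly_probability") is not None
        and p["anomaly_probability"] >= threshold
    ]
    return sorted(alerts, key=lambda p: p["anomaly_probability"], reverse=True)
```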

---

🛫 **Next Steps**: You can enhance this demo by:
- Adding streaming maintenance logs
- Using a real-world dataset
- Building a Plotly Dash app or Power BI dashboard for visualization

## ✅ Pipeline Reset Checklist

Before re-running the pipeline:

1. [ ] Execute `02_Synthetic_Data_Generation` to generate fresh sensor and maintenance data.
2. [ ] Run the SQL cell to truncate existing tables:
   - `raw_sensor_data`
   - `maintenance_events`
   - `cleaned_sensor_data`
   - `enriched_sensor_data`
   - `prediction_results`
3. [ ] Click **"Run"** on the DLT Pipeline to refresh all stages.
4. [ ] Verify record counts with `SELECT COUNT(*) FROM <table_name>` and schemas with `DESCRIBE TABLE <table_name>`.