# 🚀 02_01_Sensor_Data_Generation.ipynb

This notebook generates **synthetic daily aircraft sensor data** for the Databricks AeroDemo pipeline.  
It is the core generator for aircraft-level operational metrics across the whole fleet.

---

### 📋 What this notebook does:

✅ **Sets up aircraft fleet**
- Builds a fleet of aircraft IDs across 5 models:
  - Airbus A320
  - Boeing B737
  - Airbus A330
  - Boeing B777
  - Embraer E190
- Assigns `NUM_AIRCRAFT_PER_MODEL` aircraft (default 5) to each model, so every model is equally represented
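
The fleet layout above can be sketched as a single mapping from each model to the first number in its ID block. This is a minimal illustration rather than the notebook's own code: `MODEL_ID_START` and `build_fleet` are hypothetical names, and the starting numbers (101, 201, ...) simply mirror the ID ranges used in the aircraft setup cell.

```python
# Hypothetical mapping: first ID number for each model's block of aircraft.
MODEL_ID_START = {"A320": 101, "B737": 201, "A330": 301, "B777": 401, "E190": 501}


def build_fleet(num_per_model=5):
    """Return (aircraft_id, model) pairs, num_per_model aircraft for each model."""
    return [
        (f"{model}_{start + offset:03d}", model)
        for model, start in MODEL_ID_START.items()
        for offset in range(num_per_model)
    ]


fleet = build_fleet()
print(len(fleet))   # 25 aircraft with the default of 5 per model
print(fleet[0])     # ('A320_101', 'A320')
```

Keeping IDs and models in one structure avoids having to keep two parallel lists in the same order.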

✅ **Generates daily sensor records**
- For each aircraft:
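  - Simulates **engine temperature** drift over time (baseline plus a linear daily increase and Gaussian noise)
  - Degrades **fuel efficiency** and raises **vibration** gradually to mimic wear and tear
  - Adds Gaussian noise to **altitude** and **airspeed** around fixed cruise values
  - Draws **oil pressure**, **engine RPM**, and **battery voltage** uniformly at random each day
  - Injects one **anomaly event** per aircraft, followed by **post-repair recovery** from the next day onward
  - Logs two maintenance events per aircraft (routine check, engine repair)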
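
The drift, anomaly, and recovery behaviour listed above can be illustrated for a single metric. The sketch below is a simplified stand-in, assuming a fixed baseline and drift rate instead of the per-aircraft random values the notebook draws. `simulate_engine_temp` and its default arguments are hypothetical, and it uses the standard-library `random.Random` in place of NumPy so it runs without extra dependencies.

```python
import random


def simulate_engine_temp(num_days, anomaly_day, base=575.0, drift=0.08, seed=0):
    """Toy engine-temperature series: baseline + linear drift + Gaussian noise.

    All default values are illustrative assumptions, not the notebook's settings.
    """
    rng = random.Random(seed)  # local generator so the sketch is reproducible
    repair_day = anomaly_day + 1
    readings = []
    for day in range(num_days):
        temp = base + drift * day + rng.gauss(0, 2)
        if day == anomaly_day:
            temp *= 1.3  # one-day spike marks the anomaly
        if day >= repair_day:
            # After the repair, pull the reading down but never below baseline
            temp = max(base, temp - 0.15 * base)
        readings.append(round(temp, 2))
    return readings


temps = simulate_engine_temp(num_days=366, anomaly_day=270)
print(temps[269], temps[270], temps[271])  # before, during, and after the anomaly
```

With these numbers the post-repair correction is larger than the accumulated drift, so readings after the repair sit at the baseline clamp.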

✅ **Creates two structured datasets**
- Daily sensor records → CSV for Auto Loader (`/Volumes/arao/aerodemo/tmp/raw`)
- Maintenance events → CSV for Auto Loader (`/Volumes/arao/aerodemo/tmp/maintenance`)

✅ **Saves outputs with timestamped filenames**
- Ensures multiple runs don’t overwrite previous synthetic data dumps
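
As a rough sketch of the naming scheme, the helper below builds a path of the form `<directory>/<prefix>_<YYYYMMDD_HHMMSS>.csv`. The function name `timestamped_csv_path` and its `now` parameter are hypothetical, added only so the example can be shown with a fixed time.

```python
from datetime import datetime


def timestamped_csv_path(directory, prefix, now=None):
    """Build '<directory>/<prefix>_<YYYYMMDD_HHMMSS>.csv' (hypothetical helper)."""
    now = now or datetime.now()
    return f"{directory}/{prefix}_{now.strftime('%Y%m%d_%H%M%S')}.csv"


# Fixed time used here only to make the example output predictable
example = timestamped_csv_path(
    "/Volumes/arao/aerodemo/tmp/raw", "raw_sensor_data", datetime(2024, 1, 1, 9, 30, 0)
)
print(example)  # /Volumes/arao/aerodemo/tmp/raw/raw_sensor_data_20240101_093000.csv
```

Because the timestamp has one-second resolution, two runs that start within the same second would still write to the same filename.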

---

### 🛠 Key points:
- **Temporal coverage:** One record per aircraft per day between `START_DATE` and `END_DATE` (default: January 1 to December 31, 2024)
- **Data diversity:** Includes anomalies, repairs, and gradual drift effects
- **Downstream integration:** Feeds DLT tables such as `raw_sensor_data` and `maintenance_events`
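
For a quick sanity check on output size, the expected row counts follow directly from the default configuration. The arithmetic below assumes the defaults shown in this notebook (five models, five aircraft each, all of 2024); `NUM_MODELS` is a hypothetical name introduced for the example.

```python
from datetime import datetime

START_DATE = datetime(2024, 1, 1)
END_DATE = datetime(2024, 12, 31)
NUM_MODELS = 5              # A320, B737, A330, B777, E190
NUM_AIRCRAFT_PER_MODEL = 5

num_days = (END_DATE - START_DATE).days + 1      # inclusive of END_DATE; 2024 is a leap year
num_aircraft = NUM_MODELS * NUM_AIRCRAFT_PER_MODEL
sensor_rows = num_days * num_aircraft            # one reading per aircraft per day
maintenance_rows = 2 * num_aircraft              # routine check + engine repair each

print(num_days, sensor_rows, maintenance_rows)   # 366 9150 50
```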

---

### 🔗 Where this fits:
This notebook is part of the **02_ series**:
- `02_01_Sensor_Data_Generation.ipynb` → aircraft-level daily metrics (this file)
- `02_02_Engine_Data_Generation.ipynb` → component-level engine sensor metrics
- Future: landing gear, avionics, cabin, and airframe data generators

These synthetic datasets power the AeroDemo DLT pipeline, which builds the digital twin system, performs risk analysis, and supports operational dashboards.

In [0]:

from datetime import datetime, timedelta
import random
import numpy as np

# --- Configuration ---
NUM_AIRCRAFT_PER_MODEL = 5
START_DATE = datetime(2024, 1, 1)
END_DATE = datetime(2024, 12, 31)


In [0]:

# --- Aircraft Setup ---
aircraft_ids = (
    [f"A320_{i:03d}" for i in range(101, 101 + NUM_AIRCRAFT_PER_MODEL)] +
    [f"B737_{i:03d}" for i in range(201, 201 + NUM_AIRCRAFT_PER_MODEL)] +
    [f"A330_{i:03d}" for i in range(301, 301 + NUM_AIRCRAFT_PER_MODEL)] +
    [f"B777_{i:03d}" for i in range(401, 401 + NUM_AIRCRAFT_PER_MODEL)] +
    [f"E190_{i:03d}" for i in range(501, 501 + NUM_AIRCRAFT_PER_MODEL)]
)

models = (
    ["A320"] * NUM_AIRCRAFT_PER_MODEL +
    ["B737"] * NUM_AIRCRAFT_PER_MODEL +
    ["A330"] * NUM_AIRCRAFT_PER_MODEL +
    ["B777"] * NUM_AIRCRAFT_PER_MODEL +
    ["E190"] * NUM_AIRCRAFT_PER_MODEL
)

# One entry per calendar day, inclusive of END_DATE
date_range = [START_DATE + timedelta(days=i) for i in range((END_DATE - START_DATE).days + 1)]
# Accumulators for the two output datasets
raw_data, maintenance_events = [], []


In [0]:

from datetime import time

# Generate data for each aircraft
for aircraft_id, model in zip(aircraft_ids, models):
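    # Per-aircraft baselines and daily drift rates (wear and tear)
    base_temp = random.uniform(550, 600)
    base_fuel_eff = random.uniform(80, 90)
    base_vib = random.uniform(3.0, 6.0)
    drift_temp = random.uniform(0.05, 0.1)
    drift_fuel_eff = random.uniform(-0.1, -0.05)
    drift_vib = random.uniform(0.01, 0.03)
    # Day indices for the routine check and the anomaly.
    # Note: sched_idx can reach 180, so date_range must span at least 181 days.
    sched_idx = random.randint(150, 180)
    anomaly_idx = random.randint(250, 300)
    # Guard rails: push the anomaly past the routine check, then pull it back inside the date range
    if anomaly_idx <= sched_idx:
        anomaly_idx = sched_idx + 50
    if anomaly_idx >= len(date_range):
        anomaly_idx = len(date_range) - 60
    # The engine repair is logged the day after the anomaly
    repair_idx = min(anomaly_idx + 1, len(date_range) - 1)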
    maintenance_events.append({
        "aircraft_id": aircraft_id,
        "event_date": date_range[sched_idx].date(),
        "event_type": "Routine Check"
    })
    maintenance_events.append({
        "aircraft_id": aircraft_id,
        "event_date": date_range[repair_idx].date(),
        "event_type": "Engine Repair"
    })
    for day_idx, current_date in enumerate(date_range):
        # One reading per day, stamped at a random time of day
        hour = random.randint(0, 23)
        minute = random.randint(0, 59)
        second = random.randint(0, 59)
        timestamp = datetime.combine(current_date, time(hour, minute, second))
        # Drifting metrics: baseline + linear drift + Gaussian noise
        engine_temp = base_temp + drift_temp * day_idx + np.random.normal(0, 2)
        fuel_eff = base_fuel_eff + drift_fuel_eff * day_idx + np.random.normal(0, 1)
        vibration = base_vib + drift_vib * day_idx + np.random.normal(0, 0.1)
        # Stationary metrics: Gaussian noise around fixed values, or plain uniform draws
        altitude = 30000 + np.random.normal(0, 500)
        airspeed = 450 + np.random.normal(0, 20)
        oil_pressure = round(random.uniform(30, 90), 2)
        engine_rpm = int(random.uniform(1500, 5000))
        battery_voltage = round(random.uniform(22.0, 28.0), 2)
        anomaly_score = 0.0
        # Anomaly day: spike temperature and vibration, drop fuel efficiency
        if day_idx == anomaly_idx:
            engine_temp *= 1.3
            fuel_eff *= 0.7
            vibration = max(vibration * 3, vibration + 5)
            anomaly_score = 1.0
        # From the repair day onward: shift values back toward baseline.
        # Temperature and fuel efficiency are clamped at baseline, vibration at zero.
        if day_idx >= repair_idx:
            engine_temp = max(base_temp, engine_temp - 0.15 * base_temp)
            fuel_eff = min(base_fuel_eff, fuel_eff + 0.15 * base_fuel_eff)
            vibration = max(0.0, vibration - 0.5 * base_vib)
        raw_data.append({
            "timestamp": timestamp.strftime("%Y-%m-%d %H:%M:%S"),
            "aircraft_id": aircraft_id,
            "model": model,
            "engine_temp": round(engine_temp, 2),
            "fuel_efficiency": round(fuel_eff, 2),
            "vibration": round(vibration, 3),
            "altitude": round(altitude, 2),
            "airspeed": round(airspeed, 2),
            "anomaly_score": anomaly_score,
            "oil_pressure": oil_pressure,
            "engine_rpm": engine_rpm,
            "battery_voltage": battery_voltage
        })


In [0]:

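import os

import pandas as pd

raw_data_path = "/Volumes/arao/aerodemo/tmp/raw"
maint_data_path = "/Volumes/arao/aerodemo/tmp/maintenance"
# to_csv does not create missing directories, so make sure both targets exist
os.makedirs(raw_data_path, exist_ok=True)
os.makedirs(maint_data_path, exist_ok=True)

# Timestamped filenames keep earlier runs from being overwritten
timestamp_str = datetime.now().strftime("%Y%m%d_%H%M%S")
raw_file_path = f"{raw_data_path}/raw_sensor_data_{timestamp_str}.csv"
maint_file_path = f"{maint_data_path}/maintenance_events_{timestamp_str}.csv"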

pd.DataFrame(raw_data).to_csv(raw_file_path, index=False)
pd.DataFrame(maintenance_events).to_csv(maint_file_path, index=False)

print("✅ Files written:")
print(f"- {raw_file_path}")
print(f"- {maint_file_path}")
