# 🚀 02_02_Engine_Data_Generation.ipynb

This notebook generates **synthetic engine sensor data** for the Databricks AeroDemo pipeline.  
It is part of the synthetic data generator suite that feeds component-level DLT tables.

---

### 📋 What this notebook does:

✅ **Sets up aircraft & engine IDs**  
- Works on a set of predefined aircraft (A320, A330, B737)  
- For each aircraft, generates `num_records_per_aircraft` records, each with its own unique engine ID
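
As a minimal sketch of the ID scheme, the snippet below builds engine IDs with the same `ENG_{aircraft_id}_{i:03d}` pattern the notebook's code cell uses. The helper name `build_engine_ids` is hypothetical and not part of the notebook; it only isolates the naming step so that uniqueness is easy to check.

```python
# Same aircraft IDs as the notebook's code cell.
aircraft_ids = ["A320_101", "A330_201", "B737_301"]


def build_engine_ids(aircraft_ids, num_records_per_aircraft):
    """Return one engine ID per (aircraft, record index) pair.

    Hypothetical helper: the notebook builds these IDs inline in its loop.
    """
    return [
        f"ENG_{aircraft_id}_{i:03d}"
        for aircraft_id in aircraft_ids
        for i in range(num_records_per_aircraft)
    ]


engine_ids = build_engine_ids(aircraft_ids, num_records_per_aircraft=100)
print(engine_ids[0])                             # ENG_A320_101_000
print(len(engine_ids))                           # 300
print(len(set(engine_ids)) == len(engine_ids))   # True
```

With the default of 100 records per aircraft, this yields 300 distinct IDs. Indices above 999 simply widen past three digits, so the IDs stay unique.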

✅ **Generates synthetic engine measurements**  
- For each engine record:
  - Randomized event timestamps within the past 7 days
  - Realistic thrust levels (50,000–120,000)
  - Fuel consumption rates (2.0–5.0)
  - Temperature readings (300–800)
  - Vibration levels (0.1–2.0)
  - Oil pressure (30–80)
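
The ranges above can also be kept in one lookup table and sampled in bulk. The following is a sketch under stated assumptions, not the notebook's implementation: the notebook draws values row by row with `np.random.uniform`, whereas this version uses NumPy's `default_rng` and draws whole columns at once. The names `MEASUREMENT_RANGES` and `draw_measurements` are illustrative.

```python
import numpy as np

# (low, high, decimals) per column; ranges and rounding mirror the notebook.
MEASUREMENT_RANGES = {
    "thrust_level": (50_000, 120_000, 2),
    "fuel_consumption_rate": (2.0, 5.0, 3),
    "temperature_reading": (300, 800, 2),
    "vibration_level": (0.1, 2.0, 3),
    "oil_pressure": (30, 80, 2),
}


def draw_measurements(n, seed=None):
    """Draw n uniform samples per measurement column (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return {
        name: np.round(rng.uniform(low, high, size=n), decimals)
        for name, (low, high, decimals) in MEASUREMENT_RANGES.items()
    }


measurements = draw_measurements(n=5, seed=42)

# Sanity check: every sampled value stays inside its documented range.
for name, (low, high, _) in MEASUREMENT_RANGES.items():
    values = measurements[name]
    assert values.min() >= low and values.max() <= high
```

Keeping the ranges in a single dictionary makes it easier to adjust them later without touching the sampling logic.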

✅ **Creates a Pandas DataFrame**  
- All generated records are assembled into a single structured dataset

✅ **Exports as CSV file**  
- Saves to: **/Volumes/arao/aerodemo/tmp/engine/engines_sample.csv**
- The file is then ready for ingestion by Databricks Auto Loader in the DLT pipeline
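
Before pointing a pipeline at the file, it can help to confirm that the CSV round-trips cleanly. The sketch below writes a single hypothetical row to a temporary directory (not the `/Volumes/...` path) and reads it back with pandas; the column names come from the notebook, while the sample values are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical one-row sample using the notebook's column names.
sample = pd.DataFrame({
    "engine_id": ["ENG_A320_101_000"],
    "aircraft_id": ["A320_101"],
    "event_timestamp": ["2024-01-01 12:30:00"],
    "thrust_level": [85000.25],
    "fuel_consumption_rate": [3.141],
    "temperature_reading": [512.5],
    "vibration_level": [0.75],
    "oil_pressure": [55.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "engines_sample.csv")
    sample.to_csv(csv_path, index=False)  # same call the notebook uses
    restored = pd.read_csv(csv_path, parse_dates=["event_timestamp"])

# Column order survives, and the timestamp string parses as a datetime.
columns_match = list(restored.columns) == list(sample.columns)
timestamp_parsed = pd.api.types.is_datetime64_any_dtype(restored["event_timestamp"])
print(columns_match, timestamp_parsed)
```

This only checks the file format locally; whether the columns match the DLT table definition still has to be verified against the pipeline itself.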

---

### 🛠 Key points:
- **File naming:** The output is named `engines_sample.csv` to distinguish it from other synthetic data files
- **Compatibility:** Designed to match the DLT schema defined in the pipeline for engine components
- **Reusability:** You can adjust `num_records_per_aircraft` to control the data volume for testing
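
For repeatable test runs, note that the code cell draws from two separate generators: Python's `random` module (for the time offsets) and NumPy's global RNG (for the sensor values). A minimal sketch of seeding both is shown below; `seed_generators` and `sample_draw` are hypothetical helpers, not part of the notebook.

```python
import random

import numpy as np


def seed_generators(seed):
    """Seed both RNGs the notebook relies on (stdlib random and NumPy's global RNG)."""
    random.seed(seed)
    np.random.seed(seed)


def sample_draw():
    """Take one draw from each generator, the way the notebook's loop does."""
    days_ago = random.randint(0, 6)
    thrust = round(np.random.uniform(50000, 120000), 2)
    return days_ago, thrust


seed_generators(42)
first = sample_draw()
seed_generators(42)
second = sample_draw()
print(first == second)  # True
```

Even with both generators seeded, the timestamps are computed relative to `datetime.now()`, so their absolute values will still differ between runs.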

---

### 🔗 Where this fits:
This notebook is part of the **02_ series**:
- `02_01_Sensor_Data_Generation.ipynb` → aircraft-level daily sensor metrics
- `02_02_Engine_Data_Generation.ipynb` → engine component-level sensor data (this file)
- Future notebooks → landing gear, avionics, cabin, airframe components

At the end, all synthetic data flows into the full Databricks DLT pipeline to power the aircraft digital twin system and associated dashboards.

In [0]:

import os
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import random

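def generate_engines_data(num_records_per_aircraft=100):
    """Build a DataFrame of synthetic engine readings.

    Generates `num_records_per_aircraft` rows for each predefined aircraft,
    each with its own engine ID, a timestamp from the past 7 days, and
    uniformly sampled sensor values.
    """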
    aircraft_ids = ["A320_101", "A330_201", "B737_301"]
    data = {
        'engine_id': [],
        'aircraft_id': [],
        'event_timestamp': [],
        'thrust_level': [],
        'fuel_consumption_rate': [],
        'temperature_reading': [],
        'vibration_level': [],
        'oil_pressure': []
    }

    for aircraft_id in aircraft_ids:
        for i in range(num_records_per_aircraft):
            # Pick a random moment within the past 7 days
            random_days_ago = random.randint(0, 6)
            random_time = datetime.now() - timedelta(
                days=random_days_ago,
                hours=random.randint(0, 23),
                minutes=random.randint(0, 59),
            )
            data['engine_id'].append(f'ENG_{aircraft_id}_{i:03d}')
            data['aircraft_id'].append(aircraft_id)
            data['event_timestamp'].append(random_time.strftime("%Y-%m-%d %H:%M:%S"))
            data['thrust_level'].append(round(np.random.uniform(50000, 120000), 2))
            data['fuel_consumption_rate'].append(round(np.random.uniform(2.0, 5.0), 3))
            data['temperature_reading'].append(round(np.random.uniform(300, 800), 2))
            data['vibration_level'].append(round(np.random.uniform(0.1, 2.0), 3))
            data['oil_pressure'].append(round(np.random.uniform(30, 80), 2))

    df = pd.DataFrame(data)
    return df

# Generate DataFrame
df = generate_engines_data()

# Save to Auto Loader-compatible path
output_path = "/Volumes/arao/aerodemo/tmp/engine"
os.makedirs(output_path, exist_ok=True)
output_file = f"{output_path}/engines_sample.csv"
if os.path.exists(output_file):
    os.remove(output_file)

df.to_csv(output_file, index=False)

print(f"✅ Engine data generated: {len(df)} rows saved to {output_file}")
