# 🚀 02_03_CabinPressurization_Data_Generation.ipynb

This notebook generates **synthetic cabin pressurization system data** for the Databricks AeroDemo pipeline.
It creates component-level environmental metrics focused on the cabin systems of the aircraft fleet.

---

### 📋 What this notebook does:

✅ **Sets up aircraft fleet**
- Uses three aircraft IDs (`A320_101`, `A330_201`, `B737_301`), a subset of the fleet types used by the earlier sensor generators (`A320`, `B737`, `A330`, `B777`, `E190`)

✅ **Generates cabin pressurization records**
- For each aircraft (100 records by default, each with a random timestamp from the past 7 days):
  - Simulates **cabin pressure** and **seal integrity**
  - Adds fluctuations in **airflow rate**, **temperature control**, and **humidity levels**
  - Draws every metric uniformly from a fixed range; the generator cell in this notebook does not inject separate anomaly periods

✅ **Creates a structured dataset**
- Component records → CSV in the Auto Loader source path (`/Volumes/arao/aerodemo/tmp/cabin`)

✅ **Saves output with timestamped filenames**
- Ensures multiple runs create separate files for easy ingestion
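
Before a file lands in the Auto Loader path, it can help to confirm that the frame has the expected shape. The sketch below is one possible check, not part of the pipeline: `validate_cabin_frame`, `EXPECTED_RANGES`, and `ID_COLUMNS` are hypothetical names, and the column names and ranges are copied from the generator cell in this notebook.

```python
import pandas as pd

# Column names and value ranges copied from the generator cell in this notebook.
EXPECTED_RANGES = {
    "cabin_pressure": (10, 12),
    "seal_integrity": (85, 100),
    "airflow_rate": (300, 500),
    "temperature_control": (18, 25),
    "humidity_level": (30, 50),
}
ID_COLUMNS = ["cabin_id", "aircraft_id", "event_timestamp"]


def validate_cabin_frame(df):
    """Return a list of problems found in df; an empty list means no problems."""
    expected = ID_COLUMNS + list(EXPECTED_RANGES)
    missing = [col for col in expected if col not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    if df["cabin_id"].duplicated().any():
        problems.append("duplicate cabin_id values")

    # errors="coerce" turns unparseable timestamps into NaT so they can be counted.
    parsed = pd.to_datetime(
        df["event_timestamp"], format="%Y-%m-%d %H:%M:%S", errors="coerce"
    )
    if parsed.isna().any():
        problems.append("unparseable event_timestamp values")

    for col, (low, high) in EXPECTED_RANGES.items():
        # between() is inclusive on both ends; NaN counts as out of range.
        out_of_range = ~df[col].between(low, high)
        if out_of_range.any():
            problems.append(
                f"{col}: {int(out_of_range.sum())} value(s) outside [{low}, {high}]"
            )
    return problems


# Two hand-written rows: the second has a cabin_pressure below the generator's range.
sample = pd.DataFrame({
    "cabin_id": ["CABIN_A320_101_000", "CABIN_A320_101_001"],
    "aircraft_id": ["A320_101", "A320_101"],
    "event_timestamp": ["2024-01-01 08:30:00", "2024-01-01 09:15:00"],
    "cabin_pressure": [11.2, 9.4],
    "seal_integrity": [97.5, 92.0],
    "airflow_rate": [410.0, 350.0],
    "temperature_control": [21.0, 22.5],
    "humidity_level": [40.0, 45.0],
})
problems = validate_cabin_frame(sample)
print(problems)  # ['cabin_pressure: 1 value(s) outside [10, 12]']
```

Returning a list of messages rather than raising keeps the check usable both as a quick notebook assertion and as a logging step.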

---

### 🛠 Key points:
- **Focus area:** Cabin pressurization environment and system integrity
- **Data diversity:** Includes normal, warning, and critical conditions
- **Downstream integration:** Feeds DLT tables like `twin_cabin_pressurization` and health status computations
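
The generator cell samples each metric from one fixed range. If distinct warning or critical periods are needed, one option is to degrade a time window of readings after generation. The sketch below assumes a hypothetical helper, `inject_anomaly_period`, that is not part of this notebook; the degraded ranges and the `is_anomaly` flag column are illustrative assumptions, not thresholds or fields taken from the AeroDemo pipeline.

```python
import numpy as np
import pandas as pd


def inject_anomaly_period(df, aircraft_id, start, hours, seed=None):
    """Return a copy of df with one aircraft's readings degraded inside a time window.

    Assumed degraded ranges (illustrative only): cabin_pressure 8.0-9.5,
    seal_integrity 60-80.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()

    ts = pd.to_datetime(out["event_timestamp"])
    window_start = pd.Timestamp(start)
    window_end = window_start + pd.Timedelta(hours=hours)

    # Rows for the chosen aircraft whose timestamp falls in [window_start, window_end).
    mask = (out["aircraft_id"] == aircraft_id) & (ts >= window_start) & (ts < window_end)
    n = int(mask.sum())
    if n:
        out.loc[mask, "cabin_pressure"] = np.round(rng.uniform(8.0, 9.5, size=n), 2)
        out.loc[mask, "seal_integrity"] = np.round(rng.uniform(60.0, 80.0, size=n), 2)

    # Flag the affected rows so they are easy to find later.
    out["is_anomaly"] = mask
    return out


readings = pd.DataFrame({
    "aircraft_id": ["A320_101", "A320_101", "A330_201"],
    "event_timestamp": ["2024-01-01 08:00:00", "2024-01-02 08:00:00", "2024-01-01 08:30:00"],
    "cabin_pressure": [11.0, 11.5, 10.8],
    "seal_integrity": [95.0, 98.0, 91.0],
})
degraded = inject_anomaly_period(
    readings, "A320_101", "2024-01-01 06:00:00", hours=6, seed=0
)
print(degraded["is_anomaly"].tolist())  # [True, False, False]
```

Only the first row falls inside the six-hour window for `A320_101`, so it is the only one rewritten; the input frame is left unchanged because the function works on a copy.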

---

### 🔗 Where this fits:
This notebook is part of the **02_ series** synthetic data generators:
- `02_01_Sensor_Data_Generation.ipynb`
- `02_02_Engine_Data_Generation.ipynb`
- `02_03_CabinPressurization_Data_Generation.ipynb` (this file)
- Future: additional components like landing gear, avionics, airframe



In [0]:
import os
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import random

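def generate_cabin_data(num_records_per_aircraft=100):
    """Build a DataFrame of synthetic cabin pressurization readings.

    Each aircraft gets `num_records_per_aircraft` rows, each with a random
    timestamp from the past 7 days and metrics drawn uniformly from fixed ranges.
    """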
    aircraft_ids = ["A320_101", "A330_201", "B737_301"]
    data = {
        'cabin_id': [],
        'aircraft_id': [],
        'event_timestamp': [],
        'cabin_pressure': [],
        'seal_integrity': [],
        'airflow_rate': [],
        'temperature_control': [],
        'humidity_level': []
    }

    for aircraft_id in aircraft_ids:
        for i in range(num_records_per_aircraft):
            # Random timestamp within the past 7 days
            random_time = datetime.now() - timedelta(
                days=random.randint(0, 6),
                hours=random.randint(0, 23),
                minutes=random.randint(0, 59),
            )
            # The aircraft ID plus record index makes the ID unique; zero-padding keeps a fixed width
            data['cabin_id'].append(f'CABIN_{aircraft_id}_{i:03d}')
            data['aircraft_id'].append(aircraft_id)
            data['event_timestamp'].append(random_time.strftime("%Y-%m-%d %H:%M:%S"))
            data['cabin_pressure'].append(round(np.random.uniform(10, 12), 2))
            data['seal_integrity'].append(round(np.random.uniform(85, 100), 2))
            data['airflow_rate'].append(round(np.random.uniform(300, 500), 2))
            data['temperature_control'].append(round(np.random.uniform(18, 25), 2))
            data['humidity_level'].append(round(np.random.uniform(30, 50), 2))

    df = pd.DataFrame(data)
    return df

# Generate DataFrame
df = generate_cabin_data()

# Save to the Auto Loader source path
output_path = "/Volumes/arao/aerodemo/tmp/cabin"
os.makedirs(output_path, exist_ok=True)

# Timestamped filename so each run adds a new file instead of overwriting the previous one
run_stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_file = f"{output_path}/cabin_sample_{run_stamp}.csv"

df.to_csv(output_file, index=False)

print(f"✅ Cabin pressurization data generated: {len(df)} rows saved to {output_file}")