# Notebook 3B: Auto Loader for Maintenance Events

✅ Purpose of 03B Notebook
1. Auto Loader setup for `maintenance_events.csv` (in `/Volumes/arao/aerodemo/tmp/maintenance`)
1. Explicit schema to prevent inference errors
1. `.trigger(once=True)` for batch-style ingestion: process the files currently available, then stop
1. Optional helper script to simulate new CSV file drops

🔁 `03B_Autoloader_Maintenance_Events.ipynb`
Use this notebook to ingest **maintenance events** into the `maintenance_events` Delta table.

- 📁 Watches files in: `/Volumes/arao/aerodemo/tmp/maintenance/`
- 📄 Looks for: `maintenance_events_*.csv`
- 🗂 Schema: `aircraft_id` (string), `event_date` (date), `event_type` (string)
- ✅ Run this when:
  - You've created or simulated new maintenance log files
  - You're testing analytics or pipelines that depend on `maintenance_events`
  - You want to simulate incremental updates to aircraft service history
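
To preview which file names the `maintenance_events_*.csv` pattern would pick up, the check below applies Python's `fnmatch` to a hypothetical directory listing. It is only a sketch: the file names are made up for illustration, and Auto Loader applies its own glob matching, which may differ from `fnmatch` in edge cases.

```python
from fnmatch import fnmatch

# Pattern taken from the bullet list above
PATTERN = "maintenance_events_*.csv"

def matching_files(names, pattern=PATTERN):
    """Return the file names that match the glob pattern, preserving order."""
    return [name for name in names if fnmatch(name, pattern)]

# Hypothetical directory listing, for illustration only
candidates = [
    "maintenance_events_20240101_120000.csv",
    "maintenance_events.csv",
    "maintenance_events_20240101_120000.json",
    "sensor_data_20240101.csv",
]

print(matching_files(candidates))
```

Note that an unsuffixed `maintenance_events.csv` does not match this pattern, although it would match the broader `*.csv` filter passed to `pathGlobFilter` in the code cell.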

In [0]:

from pyspark.sql.types import StructType, StructField, StringType, DateType

# Set the path to your managed volume directory
volume_path = "/Volumes/arao/aerodemo/tmp/maintenance"

# Define the schema explicitly to avoid inference and ensure DLT compatibility
maintenance_schema = StructType([
    StructField("aircraft_id", StringType(), True),
    StructField("event_date", DateType(), True),
    StructField("event_type", StringType(), True)
])

# Read CSV files using Auto Loader
events_df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("pathGlobFilter", "*.csv")
    .option("cloudFiles.schemaLocation", f"{volume_path}/schema/maintenance_events")
    .schema(maintenance_schema)
    .load(volume_path))

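# Write to the Delta table in append mode.
# trigger(once=True) processes the data available at start, then stops the stream.
# Newer Spark and Databricks runtimes recommend trigger(availableNow=True) for the same purpose.
# toTable() is the standard PySpark (3.1+) method for starting a streaming write to a table.
(events_df.writeStream
    .format("delta")
    .option("checkpointLocation", f"{volume_path}/checkpoints/maintenance_events")
    .option("mergeSchema", "true")
    .outputMode("append")
    .trigger(once=True)
    .toTable("arao.aerodemo.maintenance_events"))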
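
Because the stream records its progress under `checkpointLocation`, rerunning the cell is expected to ingest only files that were not processed before. The toy model below illustrates that idea in plain Python. It is not how Auto Loader is implemented, and `FileTracker` is a hypothetical name used only for this sketch.

```python
class FileTracker:
    """Toy stand-in for a checkpoint: remembers which files were already ingested."""

    def __init__(self):
        self.processed = set()

    def run_once(self, available_files):
        """Return the files not seen before (sorted) and mark them as processed."""
        new_files = sorted(set(available_files) - self.processed)
        self.processed.update(new_files)
        return new_files

tracker = FileTracker()

# First run: both files are new
first = tracker.run_once(["maintenance_events_a.csv", "maintenance_events_b.csv"])

# Second run: one extra file has arrived, so only that file is picked up
second = tracker.run_once([
    "maintenance_events_a.csv",
    "maintenance_events_b.csv",
    "maintenance_events_c.csv",
])

print(first)
print(second)
```

In the same spirit, deleting the checkpoint directory would make the next run treat every file as new, so keep the checkpoint in place when testing incremental loads.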


## Utility Script to Simulate New Data Drop

In [0]:

import pandas as pd
from datetime import datetime
import os

# Generate synthetic maintenance data
data = [{
    "aircraft_id": "A320_101",
    "event_date": datetime.today().date(),
    "event_type": "Routine Check" if i % 2 == 0 else "Engine Repair"
} for i in range(10)]

df = pd.DataFrame(data)

# Write to a uniquely named file (timestamped to the second)
volume_path = "/Volumes/arao/aerodemo/tmp/maintenance"
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_path = f"{volume_path}/maintenance_events_{timestamp}.csv"

os.makedirs(volume_path, exist_ok=True)
df.to_csv(output_path, index=False)

print(f"✅ New file written to: {output_path}")
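
Before dropping a file into the volume, it can help to confirm that it matches the explicit schema used by the stream (`aircraft_id`, `event_date`, `event_type`). The helper below is a minimal sketch using only the standard library. `validate_maintenance_csv` is a hypothetical name, and the date check assumes the ISO `YYYY-MM-DD` format that pandas writes for `date` values.

```python
import csv
from datetime import date

# Column names taken from the explicit schema in this notebook
EXPECTED_COLUMNS = ["aircraft_id", "event_date", "event_type"]

def validate_maintenance_csv(path):
    """Return a list of problems found in the file; an empty list means it looks valid."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames != EXPECTED_COLUMNS:
            return [f"unexpected header: {reader.fieldnames}"]
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            try:
                date.fromisoformat(row["event_date"])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: bad event_date {row['event_date']!r}")
    return problems
```

Calling `validate_maintenance_csv(output_path)` right after the file is written should return an empty list for the synthetic data generated above.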
