# Aviation Analytics - Data Processing Pipeline

This notebook ingests, cleans, and processes the raw data for the Aviation Analytics project.

## Modules:
1. **Turbulence Data**: Processing PIREPs (Pilot Reports).
2. **Airport Efficiency**: Processing BTS On-Time Performance Data (Chunked Download).

In [None]:
import os
import sys
from pathlib import Path
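import warnings
# Suppress all warnings to keep the notebook output readable.
# Note: this also hides deprecation and dtype warnings from the processing steps.
warnings.filterwarnings('ignore')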

# Add src to path
sys.path.append(os.path.abspath("aviation-analytics/src"))

from data_preprocessing import process_turbulence_data, process_aei_chunks

# Define Paths
RAW_DIR = Path("aviation-analytics/data/raw")
PROCESSED_DIR = Path("aviation-analytics/data/processed")
PIREPS_DIR = RAW_DIR / "pireps"

# Create the processed-data directory if it does not already exist
PROCESSED_DIR.mkdir(parents=True, exist_ok=True)

## 1. Turbulence Data Processing (PIREPs)
This step normalizes the text labels in the PIREPs and standardizes the columns.
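
To make the idea concrete, here is a minimal sketch of what label normalization and column standardization could look like. It is not the implementation inside `process_turbulence_data`: the function name `standardize_turbulence`, the `INTENSITY_MAP` dictionary, and the canonical label names are assumptions made for illustration. Only the `turbulence_intensity` column name comes from this notebook.

```python
import pandas as pd

# Hypothetical mapping from raw intensity abbreviations to canonical labels.
# The real mapping used by the project may differ.
INTENSITY_MAP = {
    "NEG": "none",
    "LGT": "light",
    "MOD": "moderate",
    "SEV": "severe",
    "EXTRM": "extreme",
}


def standardize_turbulence(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with snake_case columns and canonical intensity labels."""
    out = df.copy()

    # Standardize column names: trim, lower-case, spaces to underscores.
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )

    # Normalize the text labels, then map them to canonical values.
    # Labels missing from the mapping become NaN and are dropped.
    raw = out["turbulence_intensity"].astype(str).str.strip().str.upper()
    out["turbulence_intensity"] = raw.map(INTENSITY_MAP)
    return out.dropna(subset=["turbulence_intensity"])
```

Dropping unmapped labels is one possible policy. Keeping them under an explicit "unknown" category would be an equally reasonable choice if the row counts matter downstream.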

In [None]:
print("Processing Turbulence Data...")
turbulence_df = process_turbulence_data(PIREPS_DIR)

if not turbulence_df.empty:
    print(f"Processed {len(turbulence_df)} rows.")
    print(turbulence_df['turbulence_intensity'].value_counts())
    
    output_path = PROCESSED_DIR / "turbulence_cleaned.csv.gz"
    turbulence_df.to_csv(output_path, compression='gzip', index=False)
    print(f"Saved to {output_path}")
else:
    print("No valid turbulence data found.")

## 2. Airport Efficiency Index (AEI) Processing

The BTS data is downloaded and processed in monthly chunks, which keeps memory usage bounded and avoids manually downloading large files.
We process data for **2023 and 2024**.
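
The month-by-month pattern can be sketched as follows. This is an illustration of the chunking idea only, not the code behind `process_aei_chunks`: the function `aggregate_monthly_chunks`, the injected `fetch_month` callable, and the use of mean departure delay as a stand-in metric are all assumptions, as are the `Origin` and `DepDelay` column names. The actual AEI formula is not shown in this notebook.

```python
from itertools import product
from typing import Callable, Iterable

import pandas as pd


def aggregate_monthly_chunks(
    years: Iterable[int],
    months: Iterable[int],
    fetch_month: Callable[[int, int], pd.DataFrame],
) -> pd.DataFrame:
    """Reduce each monthly chunk to per-airport totals, then combine them."""
    partials = []
    for year, month in product(list(years), list(months)):
        try:
            chunk = fetch_month(year, month)  # hypothetical downloader
        except Exception as exc:
            # A failed month is skipped rather than aborting the whole run.
            print(f"Skipping {year}-{month:02d}: {exc}")
            continue

        # Keep only small per-airport aggregates so the raw month can be freed.
        partial = chunk.groupby("Origin").agg(
            flights=("DepDelay", "count"),  # flights with a recorded delay
            total_dep_delay=("DepDelay", "sum"),
        )
        partials.append(partial)

    if not partials:
        return pd.DataFrame()  # mirrors the `.empty` check used in this notebook

    # Sum the monthly partials per airport and derive the stand-in metric.
    totals = pd.concat(partials).groupby(level=0).sum()
    totals["mean_dep_delay"] = totals["total_dep_delay"] / totals["flights"]
    return totals.reset_index()
```

Because only the per-airport sums and counts are retained, peak memory is driven by the largest single month rather than by the full two-year dataset.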

In [None]:
# Define range
YEARS = [2023, 2024]
MONTHS = range(1, 13)  # January through December (1-12)

print(f"Starting AEI Processing for years: {YEARS}...")
aei_df = process_aei_chunks(YEARS, MONTHS)

if not aei_df.empty:
    print(f"Processed AEI for {len(aei_df)} airports.")
    print(aei_df.head())
    
    output_path = PROCESSED_DIR / "airport_efficiency.csv.gz"
    aei_df.to_csv(output_path, compression='gzip', index=False)
    print(f"Saved AEI data to {output_path}")
else:
    print("Failed to process AEI data. Check internet connection or BTS availability.")