# 📘 Machine Learning for Turbulence Risk Prediction in U.S. Airspace Using Open Flight and Weather Data - Project Summary

# ✈️ 1. Problem Statement
# Predict high-risk turbulence events from open aviation and weather datasets to improve flight safety.
# Traditional forecasting struggles with clear-air turbulence (CAT), especially without onboard sensors.

# 🌍 2. Data Sources
# - PIREPs (Pilot Reports) from Iowa Environmental Mesonet
# - ERA5 Reanalysis Data (28 pressure levels, multiple weather variables)
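
# Joining the two sources means locating each PIREP on the ERA5 grid. The sketch below is one possible way to do that, not the project's actual matching code.
# It assumes a 0.25° latitude/longitude grid and the International Standard Atmosphere pressure formula, which holds below roughly 11 km.
# `LEVELS_HPA` is a hypothetical subset of pressure levels (the 28 levels the project used are not listed here), and all function names are illustrative.

```python
# Hypothetical subset of ERA5 pressure levels in hPa (assumption, not the project's list).
LEVELS_HPA = [150, 175, 200, 225, 250, 300, 350, 400, 450, 500]


def flight_level_to_hpa(flight_level):
    """Standard-atmosphere pressure (hPa) for a flight level given in hundreds of feet."""
    altitude_m = flight_level * 100 * 0.3048
    return 1013.25 * (1 - 2.25577e-5 * altitude_m) ** 5.25588


def snap_to_grid(value, step=0.25):
    """Round a latitude or longitude to the nearest grid point."""
    return round(value / step) * step


def match_pirep(lat, lon, flight_level):
    """Return the (grid latitude, grid longitude, pressure level) closest to a report."""
    pressure = flight_level_to_hpa(flight_level)
    level = min(LEVELS_HPA, key=lambda p: abs(p - pressure))
    return snap_to_grid(lat), snap_to_grid(lon), level
```

# For example, `match_pirep(41.62, -93.71, 340)` snaps a report at FL340 to the grid point (41.5, -93.75) on the 250 hPa level.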

# ⚙️ 3. Pipeline Overview
# - Data Cleaning & Matching (PIREPs + ERA5)
# - Feature Engineering: Wind speed, shear, vorticity, cloud content, etc.
# - Class Balancing:
#     - Isolation Forest-based downsampling of the NEG (no-turbulence) class
#     - SMOTE oversampling of the SEV–EXTRM (severe-to-extreme) samples
# - PCA + KMeans for pattern discovery
# - Supervised Classification:
#     - XGBoost (Main)
#     - Random Forest, LightGBM, CatBoost, TabNet, KNN, Naive Bayes (Comparisons)
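
# The wind-based features named in the pipeline can be derived from the eastward (u) and northward (v) wind components. The sketch below shows textbook formulas only; the project's exact feature definitions are not given in this summary.
# All function and argument names are assumptions chosen for illustration.

```python
import math


def wind_speed(u, v):
    """Horizontal wind speed (m/s) from eastward (u) and northward (v) components."""
    return math.hypot(u, v)


def vertical_shear(u_low, v_low, u_high, v_high, dz_m):
    """Magnitude of the vector wind difference between two levels, per metre of height (1/s)."""
    return math.hypot(u_high - u_low, v_high - v_low) / dz_m


def relative_vorticity(v_east, v_west, u_north, u_south, dx_m, dy_m):
    """Centered-difference estimate of dv/dx - du/dy (1/s).

    dx_m and dy_m are the distances in metres from the central grid point
    to each of its east/west and north/south neighbours.
    """
    return (v_east - v_west) / (2 * dx_m) - (u_north - u_south) / (2 * dy_m)
```

# Positive vorticity from `relative_vorticity` indicates counter-clockwise rotation, and larger `vertical_shear` values are commonly associated with turbulence.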

# 📊 4. Evaluation
# - Stratified 10-Fold Cross Validation
# - Ablation study across different preprocessing variants
# - Case study on unseen 2025 data (e.g., February 16)
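
# Stratified cross-validation keeps the class proportions roughly equal in every fold, which matters when severe events are rare. The sketch below is a minimal pure-Python version of the idea.
# The project may have used a library implementation such as scikit-learn's `StratifiedKFold`; this summary does not say. The function names here are illustrative.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=10, seed=0):
    """Assign every sample index to one of n_splits folds, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets a similar share.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


def train_test_splits(labels, n_splits=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, n_splits, seed)
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

# With 80 NEG and 20 SEV labels and 10 folds, each test fold holds 8 NEG and 2 SEV samples.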

# ✅ 5. Results
# - XGBoost achieved an F1-score of 0.88 and an accuracy of 91.97%
# - PCA + KMeans uncovered a high-risk cluster with 82.35% SEV–EXTRM overlap
# - Visual insights: Confusion matrices, 3D PCA plots, U.S. turbulence risk maps
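
# For reference, the two reported metrics can be computed from true and predicted labels as sketched below.
# This summary does not state how the F1-score was averaged across classes, so the sketch computes F1 for a single chosen class. The function names are illustrative.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

# Accuracy alone can look strong on imbalanced data, which is why the per-class F1-score is worth reporting alongside it.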

# 🧠 6. Skills Demonstrated
# - Open data integration (APIs, CSVs)
# - Data cleaning and feature extraction
# - Handling class imbalance (anomaly scoring, SMOTE)
# - Dimensionality reduction and unsupervised learning
# - Supervised classification (tree-based and neural models)
# - Cross-validation, performance analysis, and visualization
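
# The SMOTE idea used for class imbalance is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours.
# The sketch below is a simplified pure-Python illustration of that idea, not the project's code. A library implementation such as imbalanced-learn's `SMOTE` is the more likely choice in practice, though this summary does not name one. It needs at least two minority samples.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the base itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

# Each synthetic point lies on the line segment between two real minority samples, so it stays inside the region the minority class already occupies.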

# 🔒 7. Disclaimer
# The full dataset and code are not publicly available, to protect research integrity.
# This repo showcases techniques, visualizations, and results from the project.
# For inquiries or collaboration, please contact me directly.

# 📎 Author: Godha Naravara
# 📁 Repo: aviation-turbulence-risk-ML
