# Data Sources and Acquisition

This notebook demonstrates how raw aviation and weather data were collected and processed to support a turbulence risk prediction system. All work was carried out independently as part of a Master's project.

---

## 1. Pilot Reports (PIREPs)

**Source**: Iowa Environmental Mesonet (IEM)  
[https://mesonet.agron.iastate.edu/request/gis/pireps.php](https://mesonet.agron.iastate.edu/request/gis/pireps.php)

PIREPs provide real-time turbulence observations submitted by pilots during flight. These were used to derive ground truth turbulence labels and contextual metadata.

### -- Spatial Coverage Visualization
```python
from IPython.display import Image
Image(filename='../assets/1. PIREP_Reports_2024_Map.png')
```
![PIREPs Map](/assets/1.%20PIREP_Reports_2024_Map.png)

This map shows the full extent of the roughly 1.1 million PIREPs, which cover the contiguous U.S., Alaska, Hawaii, and nearby regions.

### -- Sample Raw Data (10 rows)
```python
import pandas as pd
raw_df = pd.read_csv("../sample_data/pirep_sample.csv")
raw_df
```

|        VALID | URGENT   | AIRCRAFT   | REPORT                                                                   |   ICING | TURB                 | ARTCC   | PROD_ID                        |     LAT |       LON | geometry                                      |
|-------------:|:---------|:-----------|:-------------------------------------------------------------------------|--------:|:---------------------|:--------|:-------------------------------|--------:|----------:|:----------------------------------------------|
| 202412211455 | F        | A20N       | TUL UA /OV TUL360040/TM 1455/FL350/TP A20N/TB LGT CHOP/RM ZKC/FDC/CW     |     nan | LGT CHOP             | ZKC     | 202412211500-KMSC-UBUS01-PIREP | 36.8651 |  -95.8881 | POINT (-95.88811 36.86511066672067)           |
| 202405292133 | F        | CRJ9       | CMI UA /OV CMI340015/TM 2133/FL340/TP CRJ9/TB NEG                        |     nan | NEG                  | ZAU     | nan                            | 40.2673 |  -88.3872 | POINT (-88.38718094861466 40.267342183991076) |
| 202408011119 | F        | CRJ2       | RFD UA /OV RFD220010/TM 1119/FL100/TP CRJ2/WX -RA/TB LGT CHOP            |     nan | LGT CHOP             | ZAU     | 202408011100-KMSC-UBUS01-PIREP | 42.0638 |  -89.2322 | POINT (-89.23220639168281 42.06382122453652)  |
| 202404010017 | F        | CRJ9       | CMH UA /OV ROD/TM 0017/FL270/TP CRJ9/TB INTMT -CONS LGT CHOP             |     nan | INTMT -CONS LGT CHOP | ZID     | nan                            | 40.29   |  -84.04   | POINT (-84.04 40.29)                          |
| 202402191740 | F        | A320       | SZL UA /OV SZL/TM 1740/FL370/TP A320/TB LGT CONS/RM /ZKCFDC              |     nan | LGT CONS             | ZKC     | nan                            | 38.7303 |  -93.548  | POINT (-93.54799 38.73029)                    |
| 202401091731 | F        | SB20       | AKN UA /OV AKN221068/TM 1731/FL300/TP SB20/TB LGT-MOD/RM (ZAN            |     nan | LGT-MOD              | ZAN     | nan                            | 57.8146 | -158.07   | POINT (-158.07008037346228 57.81459319333878) |
| 202412161426 | F        | T6         | SSF UA /OV SSF/TM 1426/FL011/TP T6/SK OVC012-TOPUNKN                     |     nan | nan                  | ZHU     | 202412161400-KMSC-UBUS01-PIREP | 29.337  |  -98.4711 | POINT (-98.47105 29.33698)                    |
| 202408310408 | F        | A320       | BNA UA /OV BNA270030/TM 0408/FL200/TP A320/TB SMOOTH/RM ZME              |     nan | SMOOTH               | ZME     | 202408310400-KMSC-UBUS01-PIREP | 36.1189 |  -87.3082 | POINT (-87.30818860885368 36.11889)           |
| 202402081414 | F        | E75S       | FYV UA /OV RZC350015/TM 1414/FL120/TP E75S/TB LGT CHOP 110-120/RM ZKCFDC |     nan | LGT CHOP 110-120     | ZME     | nan                            | 36.4962 |  -94.1738 | POINT (-94.17383576325777 36.496221880629996) |
| 202411062313 | F        | GALX       | DMN UA /OV DMN/TM 2313/FL380/TP GALX/TB MOD TURB FL360/RM ZAB FDCS       |     nan | MOD TURB FL360       | ZAB     | 202411062300-KMSC-UBUS01-PIREP | 32.2623 | -107.721  | POINT (-107.72064 32.26231)                   |

### -- Derived Sample (with new columns)
```python
derived_df = pd.read_csv("../sample_data/pirep_derived_sample.csv")
derived_df.head(10)
```

| VALID               | URGENT   | AIRCRAFT   | REPORT                                                                                                                              | TURB                       | ARTCC   |     LAT |      LON | geometry                                      |   Altitude (ft in msl) |   Altitude (hpa) | turbulence_category   |   Approximated Altitude in hPa |
|:--------------------|:---------|:-----------|:------------------------------------------------------------------------------------------------------------------------------------|:---------------------------|:--------|--------:|---------:|:----------------------------------------------|-----------------------:|-----------------:|:----------------------|-------------------------------:|
| 2024-01-14 04:48:00 | F        | A321       | HRO UA /OV HRO135010/TM 0448/FL300/TP A321/TB MOD/RM ZME                                                                            | MOD                        | ZME     | 36.1437 | -93.0086 | POINT (-93.00855993940871 36.143659323851146) |                  30000 |              301 | MOD                   |                            300 |
| 2024-08-22 12:04:00 | F        | A319       | FOD UA /OV FOD/TM 1204/FL270/TP A319/TB OCNL LGT CHOP                                                                               | OCNL LGT CHOP              | ZMP     | 42.5497 | -94.2032 | POINT (-94.203203 42.549741)                  |                  27000 |              344 | LGT                   |                            350 |
| 2024-03-05 15:28:00 | F        | B752       | INL UA /OV INL/TM 1528/FL340/TP B752/TB NEG                                                                                         | NEG                        | ZMP     | 48.5595 | -93.3956 | POINT (-93.3955519 48.55946773)               |                  34000 |              250 | NEG                   |                            250 |
| 2024-03-08 13:44:00 | F        | A320       | GRD UA /OV IRQ/TM 1344/FL360/TP A320/TB INTMT LGT CHOP/RM ZTLFD-23                                                                  | INTMT LGT CHOP             | ZTL     | 33.71   | -82.16   | POINT (-82.16 33.71)                          |                  36000 |              227 | LGT                   |                            225 |
| 2024-02-28 19:11:00 | F        | E135       | PKB UA /OV JPU070025/TM 1911/FL050/TP E135/TB CONS LGT OCNL MOD CHOP /IC NEG                                                        | CONS LGT OCNL MOD CHOP     | ZOB     | 39.5825 | -80.863  | POINT (-80.86297521013846 39.58251993624374)  |                   5000 |              843 | MOD                   |                            850 |
| 2024-05-03 02:30:00 | F        | B77W       | ABR UA /OV ABR/TM 0230/FL330/TP B77W/TB OCNL LGT CAT                                                                                | OCNL LGT CAT               | ZMP     | 45.45   | -98.42   | POINT (-98.42 45.45)                          |                  33000 |              262 | LGT                   |                            250 |
| 2024-12-11 02:33:00 | F        | B737       | IND UA /OV HNN/TM 0233/FL260/TP B737/TB NEG                                                                                         | NEG                        | ZID     | 38.75   | -82.03   | POINT (-82.03 38.75)                          |                  26000 |              360 | NEG                   |                            350 |
| 2024-07-17 20:03:00 | F        | CRJ9       | CLT UA /OV CLT/TM 2003/FL190/TP CRJ9/TB NEG CHOP/IC NEG/RM ZTLFD-29                                                                 | NEG CHOP                   | ZTL     | 35.2225 | -80.9543 | POINT (-80.95431 35.22255)                    |                  19000 |              485 | NEG                   |                            500 |
| 2024-12-11 16:09:00 | F        | BCS3       | ILM UA /OV ILM/TM 1609/FL350/TP BCS3/TB LIGHT CHOP                                                                                  | LIGHT CHOP                 | ZDC     | 34.27   | -77.9    | POINT (-77.9 34.27)                           |                  35000 |              238 | LGT                   |                            250 |
| 2024-03-08 21:59:00 | F        | A320       | ORD UA /OV 35 SE ORD/TM 2159/FL080/TP A320/SK IMC/TB MOSTLY SMOOTH OCCL LT CHOP/IC NEG/RM LT TO MOD PRECIP ON ARRIVAL  OCCL LT CHOP | MOSTLY SMOOTH OCCL LT CHOP | ZAU     | 41.575  | -87.3769 | POINT (-87.37694851553685 41.57502763347903)  |                   8000 |              752 | NEG                   |                            750 |

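The derived sample adds a parsed timestamp, the reported altitude in feet and in hPa, a turbulence severity label (`turbulence_category`), and an approximated pressure level (`Approximated Altitude in hPa`) used to match each report to ERA5 data.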
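
The derivations behind these columns can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's preprocessing code: the function names, the keyword rules, and the `LEVELS_HPA` list are illustrative. The pressure conversion uses the International Standard Atmosphere formula for the troposphere, which agrees with the sample rows in the table (FL300 comes out at about 301 hPa).

```python
import math
from datetime import datetime

# Hypothetical list of pressure levels (hPa); the project's actual levels are not listed here.
LEVELS_HPA = [200, 225, 250, 300, 350, 400, 450, 500, 550, 600,
              650, 700, 750, 800, 850, 900, 950, 1000]

# Keyword rules, checked from most to least severe (an assumption for illustration).
SEVERITY_KEYWORDS = [
    ("SEV", ("SEV",)),
    ("MOD", ("MOD",)),
    ("LGT", ("LGT", "LIGHT")),
    ("NEG", ("NEG", "SMOOTH")),
]

def parse_valid(valid):
    """Parse a raw VALID value such as 202412211455 (YYYYMMDDHHMM)."""
    return datetime.strptime(str(valid), "%Y%m%d%H%M")

def feet_to_hpa(altitude_ft):
    """ISA pressure at a given altitude; only valid in the troposphere (below ~36,000 ft)."""
    altitude_m = altitude_ft * 0.3048
    return 1013.25 * math.pow(1 - 2.25577e-5 * altitude_m, 5.25588)

def nearest_level(pressure_hpa, levels=LEVELS_HPA):
    """Snap a pressure to the closest level in the list."""
    return min(levels, key=lambda level: abs(level - pressure_hpa))

def turbulence_category(turb_text):
    """Return the most severe label whose keyword appears in the TURB text, else None."""
    if not isinstance(turb_text, str):  # missing values arrive as NaN
        return None
    text = turb_text.upper()
    for label, keywords in SEVERITY_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return label
    return None

print(parse_valid(202412211455))
print(round(feet_to_hpa(30000)), nearest_level(feet_to_hpa(33000)))
print(turbulence_category("CONS LGT OCNL MOD CHOP"))
```

Taking the most severe keyword means a report such as `CONS LGT OCNL MOD CHOP` is labelled `MOD`, which matches the sample table; other rules (for example, preferring the predominant intensity) would be equally defensible.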

---
## 2. ERA5 Reanalysis Data

**Source**: Copernicus Climate Data Store (CDS)  
[ERA5 hourly data on pressure levels](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-pressure-levels?tab=overview)

- Weather variables used include:
  - Temperature, Relative Humidity, U/V Wind Components, Vertical Velocity
  - Cloud Liquid/Ice Water Content
  - Vorticity, Geopotential, Potential Vorticity, Divergence, etc.

- 28 pressure levels spanning tropospheric layers
- Downloaded using the `cdsapi` Python client


ⓘ Over **1.5 TB** of ERA5 data were processed, covering all 12 months of 2024.

---

## 3. ERA5 Retrieval Script (cdsapi)
```python
import cdsapi

# The client reads the CDS API URL and key from ~/.cdsapirc (or environment variables)
client = cdsapi.Client()
client.retrieve(
    "reanalysis-era5-pressure-levels",
    {
        "product_type": "reanalysis",
        "variable": ["temperature", "u_component_of_wind", "v_component_of_wind", ...],
        "pressure_level": [400, 450, 500, 550, 700],
        "year": ["2024"],
        "month": ["01"],
        "day": ["01", ..., "31"],
        "time": ["00:00", ..., "23:00"],
        "format": "grib",
        "area": [72, -180, 15, -60],  # bounding box: North, West, South, East
    },
    "january_data.grib"
)
```
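
The `day` and `time` lists are elided in the request above. One way to generate them, and to reuse the same request for every month, is sketched below. `build_request` is a hypothetical helper, not part of `cdsapi`; its keys mirror the request shown above, and the variable and level lists passed in are placeholders.

```python
import calendar

def build_request(year, month, variables, pressure_levels):
    """Build a request dict for one calendar month (hypothetical helper)."""
    days_in_month = calendar.monthrange(year, month)[1]
    return {
        "product_type": "reanalysis",
        "variable": list(variables),
        "pressure_level": list(pressure_levels),
        "year": [str(year)],
        "month": [f"{month:02d}"],
        # Zero-padded day and hour strings, e.g. "01" and "00:00"
        "day": [f"{day:02d}" for day in range(1, days_in_month + 1)],
        "time": [f"{hour:02d}:00" for hour in range(24)],
        "format": "grib",
        "area": [72, -180, 15, -60],  # North, West, South, East
    }

request = build_request(2024, 2, ["temperature"], [400, 450, 500, 550, 700])
print(len(request["day"]), request["time"][0], request["time"][-1])
# The dict could then be passed as the second argument to client.retrieve(...).
```

Using `calendar.monthrange` avoids requesting days that do not exist in shorter months.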

---

## 4. Weather Data Extraction (GRIB to CSV)
Once the GRIB files were downloaded, weather variables were extracted for each PIREP row using `xarray` and `cfgrib`. Each PIREP was matched by:
- UTC timestamp (nearest hour)
- Latitude and Longitude (nearest grid point)
- Altitude mapped to nearest pressure level

```python
import pandas as pd
import xarray as xr

grib_file = 'january_data.grib'
grib_data = xr.open_dataset(grib_file, engine='cfgrib')

# Define weather variables
weather_columns = ["temperature", "u_component_of_wind", "v_component_of_wind", "relative_humidity", ...]

# Automation function for data extraction: returns the ERA5 values matched to one PIREP row
def extract_weather_data(row, grib_data, valid_pressure_levels):
    lat, lon, time, pressure = row['LAT'], row['LON'], row['VALID'], row['Altitude (hpa)']
    values = ...  # selection using xarray.sel (elided); maps each weather column to its value
    return pd.Series({var: values[var] for var in weather_columns})
```
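
To make the three matching rules concrete, the sketch below computes the lookup key for a single PIREP using only the standard library. It assumes the ERA5 pressure-level fields sit on a regular 0.25° latitude/longitude grid with hourly time steps; `match_key` and `GRID_STEP` are illustrative names. With `xarray`, the same effect is usually obtained by calling `.sel(..., method="nearest")` on the dataset.

```python
from datetime import datetime, timedelta

GRID_STEP = 0.25  # assumed ERA5 grid spacing in degrees

def nearest_hour(timestamp):
    """Round a datetime to the nearest whole hour (30 minutes rounds up)."""
    shifted = timestamp + timedelta(minutes=30)
    return shifted.replace(minute=0, second=0, microsecond=0)

def nearest_grid_point(value, step=GRID_STEP):
    """Snap a latitude or longitude to the closest grid coordinate."""
    return round(value / step) * step

def nearest_pressure_level(pressure_hpa, valid_pressure_levels):
    """Pick the available pressure level closest to the report's pressure."""
    return min(valid_pressure_levels, key=lambda level: abs(level - pressure_hpa))

def match_key(valid, lat, lon, pressure_hpa, valid_pressure_levels):
    """Return (hour, pressure level, grid latitude, grid longitude) for one PIREP."""
    return (
        nearest_hour(valid),
        nearest_pressure_level(pressure_hpa, valid_pressure_levels),
        nearest_grid_point(lat),
        nearest_grid_point(lon),
    )

key = match_key(datetime(2024, 12, 21, 14, 55), 36.8651, -95.8881,
                485, [400, 450, 500, 550, 700])
print(key)
```

Each element of the key corresponds to one bullet above: the hour, the pressure level, and the grid point closest to the report.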

---

## 5. Automation: Month-wise Extraction
GRIB files were processed day by day across multiple months:
- Automated matching for all pressure levels
- Saved enriched files like `january_with_weather_data.csv`, `february_with_weather_data.csv`, etc.

```python
# Example: Processing one GRIB file
def process_grib_file(grib_file, valid_pressure_levels, month_df, output_csv):
    grib_data = xr.open_dataset(grib_file, engine='cfgrib')

    month_df[weather_columns] = month_df.apply(
        lambda row: extract_weather_data(row, grib_data, valid_pressure_levels), axis=1
    )

    month_df.to_csv(output_csv, index=False)
    print(f"Updated data saved to {output_csv}")

# Example usage:
grib_file = 'january_data.grib'  # file written by the retrieval step
valid_pressure_levels = [400, 450, 500, 550, 700]
process_grib_file(grib_file, valid_pressure_levels, january_df, 'january_with_weather_data.csv')
```
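
The month-by-month runs can be driven by a small loop. The sketch below only builds the file names. It assumes each month's GRIB file follows the `january_data.grib` pattern from the retrieval step, which may not match the actual file layout, and `monthly_jobs` is a hypothetical helper.

```python
import calendar

def monthly_jobs(months=range(1, 13)):
    """Yield (grib_file, output_csv) pairs, one per month (assumed naming scheme)."""
    for month in months:
        # month_name is locale-dependent; English names are assumed here
        name = calendar.month_name[month].lower()
        yield f"{name}_data.grib", f"{name}_with_weather_data.csv"

jobs = list(monthly_jobs())
for grib_file, output_csv in jobs[:2]:
    print(grib_file, "->", output_csv)
    # Each pair would be handed to process_grib_file together with that month's PIREP rows.
```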

---

## Notes
- Only sample rows are shown in this notebook
- Full code and data are not included

📎 For questions or collaboration, please contact me
