# Weather GRIB to CSV Conversion Workflow

This notebook converts ERA5 weather data files in GRIB format into CSV files for further analysis and integration with other datasets. The workflow automates the extraction and merging of weather variables for Delhi and Mumbai (2023 and 2024), producing one CSV file per city and year that is ready for downstream processing.

---

## Workflow Steps

1. **Import Required Libraries**  
   Load Python libraries for handling GRIB files, data manipulation, and file operations.

2. **Specify Input and Output Paths**  
   Define the directories containing the GRIB files and where the resulting CSV files will be saved.

3. **List GRIB Files**  
   Provide a list of GRIB files to process, covering both cities and years of interest.

4. **Convert GRIB to CSV**  
   For each GRIB file:
   - Open the file and extract all available datasets.
   - Convert each dataset to a pandas DataFrame.
   - Outer-merge all DataFrames on their shared columns (`time`, `latitude`, `longitude`), so rows missing from one dataset are kept.
   - Save the merged DataFrame as a CSV file.

5. **Error Handling**  
   Gracefully handle missing or corrupted files and datasets, reporting any issues encountered during processing.
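
The merge in step 4 can be illustrated on toy data. The sketch below uses two small, made-up DataFrames standing in for the per-dataset tables; the variable columns `t2m` and `tp` and all values are illustrative assumptions, not taken from the actual files. `functools.reduce` folds `pd.merge(..., how="outer")` over the list, so a timestamp present in only one table survives with `NaN` in the other table's columns.

```python
from functools import reduce

import pandas as pd

KEYS = ["time", "latitude", "longitude"]

# Hypothetical per-dataset tables (illustrative values only)
df_a = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00"]),
    "latitude": [19.0, 19.0],
    "longitude": [72.75, 72.75],
    "t2m": [298.1, 297.6],
})
df_b = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 01:00", "2024-01-01 02:00"]),
    "latitude": [19.0, 19.0],
    "longitude": [72.75, 72.75],
    "tp": [0.0, 0.0002],
})

# reduce applies the merge pairwise: merge(merge(df_a, df_b), df_c), ...
merged = reduce(
    lambda left, right: pd.merge(left, right, on=KEYS, how="outer"), [df_a, df_b]
)
print(merged)
```

With an inner merge only the 01:00 row would remain; the outer merge keeps all three timestamps.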

---

This pipeline converts raw ERA5 GRIB data into one CSV file per city and year, ready for analysis and integration with other datasets.

```python
# Import necessary libraries
import warnings
from functools import reduce
from pathlib import Path

import cfgrib
import pandas as pd

# Silence FutureWarning messages emitted by the libraries below
warnings.filterwarnings("ignore", category=FutureWarning)

# GRIB files downloaded from https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=download
# (see the report and README.md for the request details)
grib_files = [
    "weather_mumbai_2024.grib",
    "weather_mumbai_2023.grib",
    "weather_delhi_2024.grib",
    "weather_delhi_2023.grib",
]

input_dir = Path("../power-data")   # directory containing the GRIB files
output_dir = Path("../power-data")  # directory where the CSV files are saved
output_dir.mkdir(parents=True, exist_ok=True)

# Columns shared by every dataset, used as merge keys
merge_keys = ["time", "latitude", "longitude"]

# Convert each GRIB file to one CSV
for grib_file in grib_files:
    file_path = input_dir / grib_file

    # cfgrib.open_datasets returns a list of xarray Datasets, one per group of
    # GRIB messages that can share the same coordinates
    try:
        datasets = cfgrib.open_datasets(file_path)
    except Exception as e:
        print(f"Failed to open {grib_file}: {e}")
        continue

    # Flatten each dataset into a table; reset_index turns the dimensions into columns
    dfs = []
    for i, ds in enumerate(datasets, start=1):
        try:
            dfs.append(ds.to_dataframe().reset_index())
        except Exception as e:
            print(f"Skipped dataset {i} in {grib_file}: {e}")

    if not dfs:
        print(f"No valid data in {grib_file}")
        continue

    # Outer-merge all tables so no row is lost when a variable is missing for some timestamps
    df_merged = reduce(
        lambda left, right: pd.merge(left, right, on=merge_keys, how="outer"), dfs
    )

    # e.g. weather_mumbai_2024.grib -> weather_mumbai_2024_external.csv
    output_csv = output_dir / f"{Path(grib_file).stem}_external.csv"
    df_merged.to_csv(output_csv, index=False)
    print(f"Saved: {output_csv.name}")
```

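One caveat with merging on only three keys: `to_dataframe()` also turns non-index coordinates into columns, so each cfgrib table typically carries auxiliary ERA5 coordinates such as `number`, `step`, `surface` or `valid_time` (the exact set depends on the downloaded variables). The merge then produces suffixed duplicates (`number_x`, `number_y`, ...), and once four or more tables are chained the suffixes can collide; pandas 1.x reports this as a FutureWarning, while pandas 2.x raises a `MergeError`. A minimal sketch of one workaround, assuming those shared columns hold identical values in every table, is to drop them from each right-hand table before merging. The helper name `merge_on_keys` and the data are hypothetical; if a column such as `step` differs between datasets, dropping it would discard information.

```python
from functools import reduce

import pandas as pd

KEYS = ["time", "latitude", "longitude"]


def merge_on_keys(dfs, keys=KEYS):
    """Outer-merge DataFrames on `keys`, keeping only the first copy of any
    non-key column found in more than one frame (assumed redundant)."""
    def merge_pair(left, right):
        # Non-key columns the accumulated frame already has
        shared = [c for c in right.columns if c in left.columns and c not in keys]
        return pd.merge(left, right.drop(columns=shared), on=keys, how="outer")
    return reduce(merge_pair, dfs)


# Four hypothetical tables, each carrying the same auxiliary `number` column
base = {
    "time": pd.to_datetime(["2024-01-01"]),
    "latitude": [28.5],
    "longitude": [77.25],
    "number": [0],
}
dfs = [pd.DataFrame({**base, var: [1.0]}) for var in ["t2m", "d2m", "u10", "v10"]]

merged = merge_on_keys(dfs)
print(list(merged.columns))
# ['time', 'latitude', 'longitude', 'number', 't2m', 'd2m', 'u10', 'v10']
```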

```text
Ignoring index file '../power-data/weather_mumbai_2024.grib.5b7b6.idx' incompatible with GRIB file
Ignoring index file '../power-data/weather_mumbai_2023.grib.5b7b6.idx' incompatible with GRIB file
Saved: weather_mumbai_2024_external.csv
Ignoring index file '../power-data/weather_delhi_2024.grib.5b7b6.idx' incompatible with GRIB file
Saved: weather_mumbai_2023_external.csv
Ignoring index file '../power-data/weather_delhi_2023.grib.5b7b6.idx' incompatible with GRIB file
Saved: weather_delhi_2024_external.csv
Saved: weather_delhi_2023_external.csv
```

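The "Ignoring index file ... incompatible with GRIB file" messages come from cfgrib, which caches an index next to each GRIB file (here named `<file>.grib.5b7b6.idx`). The message indicates the cached index no longer matches the GRIB file, for example after a re-download, and cfgrib carries on by re-scanning the file, as the subsequent `Saved:` lines suggest. If the messages are unwanted, one option is to delete the stale index files before reopening. A minimal sketch with `pathlib`, assuming the `<name>.grib.<hash>.idx` naming seen in the log; the helper `remove_stale_indexes` is hypothetical.

```python
import tempfile
from pathlib import Path


def remove_stale_indexes(grib_path: Path) -> list[Path]:
    """Delete index files named '<grib file name>.<hash>.idx' next to grib_path.

    Assumes the naming pattern seen in the cfgrib log; cfgrib is expected to
    build a fresh index the next time the GRIB file is opened.
    """
    removed = []
    for idx in grib_path.parent.glob(f"{grib_path.name}.*.idx"):
        idx.unlink()
        removed.append(idx)
    return removed


# Demo in a throwaway directory with empty dummy files
with tempfile.TemporaryDirectory() as tmp:
    grib = Path(tmp) / "weather_delhi_2024.grib"
    grib.touch()
    (Path(tmp) / "weather_delhi_2024.grib.5b7b6.idx").touch()
    (Path(tmp) / "weather_delhi_2023.grib.5b7b6.idx").touch()  # other file's index, kept
    removed = remove_stale_indexes(grib)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print([p.name for p in removed])  # ['weather_delhi_2024.grib.5b7b6.idx']
print(remaining)  # ['weather_delhi_2023.grib.5b7b6.idx', 'weather_delhi_2024.grib']
```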

```python
# reference:
# https://chatgpt.com/share/68b10a67-b718-800e-9a88-f1f9bcfe0ebe
```