### 🛰️ EDGAR GHG Emission Data Processing and Clipping Script

This repository contains a Python script for extracting, processing, and spatially clipping greenhouse gas (GHG) emission data from the EDGAR (Emission Database for Global Atmospheric Research) dataset at the subsector and annual level.

### 📘 Overview

The script reads NetCDF emission files from the EDGAR Global Warming Potential (GWP) dataset, extracts annual emissions for each subsector, and clips the data to a specified geographical boundary (here, the Ranchi Municipal Corporation area). The outputs are saved as GeoTIFF rasters, one per year and subsector.

### 🌍 About EDGAR GWP Dataset

EDGAR (Emission Database for Global Atmospheric Research) provides global gridded data on anthropogenic greenhouse gas emissions and is compiled by the European Commission's Joint Research Centre (JRC).

- Dataset Used: EDGARv8 (2025 release)
- Variable: emissions
- Spatial Resolution: ~0.1° × 0.1°
- Temporal Coverage: 1970–2024
- GWP Standard: 100-year Global Warming Potential (GWP100) based on IPCC AR5
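
To give a feel for how coarse a 0.1° grid is relative to a municipal boundary, the sketch below estimates the ground size of one cell. It assumes a spherical Earth with a mean radius of 6371 km; the function name `cell_size_km` is hypothetical and not part of the script.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def cell_size_km(lat_deg, res_deg=0.1):
    """Approximate (north-south, east-west) extent in km of one grid cell centred at lat_deg."""
    ns = math.radians(res_deg) * EARTH_RADIUS_KM
    ew = ns * math.cos(math.radians(lat_deg))  # cells narrow towards the poles
    return ns, ew

ns, ew = cell_size_km(0.0)
print(round(ns, 1), round(ew, 1))  # 11.1 11.1
```

A cell of roughly 11 km on a side means that a city-sized boundary may cover only a handful of cells.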

The Global Warming Potential (GWP) metric expresses the climate impact of non-CO₂ gases (such as CH₄ and N₂O) as equivalent CO₂ emissions over a 100-year time horizon. For instance, CH₄ has a GWP100 of 28 under AR5, so 1 tonne of CH₄ counts as 28 tonnes of CO₂-equivalent.
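
The conversion can be illustrated with a short calculation. This is a minimal sketch, assuming the AR5 GWP100 factors without climate-carbon feedback (CH₄ = 28, N₂O = 265); the function name `to_co2e` and the sample masses are hypothetical and do not come from the EDGAR files.

```python
# AR5 100-year GWP factors (without climate-carbon feedback); CO2 is 1 by definition.
GWP100_AR5 = {"CO2": 1, "CH4": 28, "N2O": 265}

def to_co2e(tonnes_by_gas):
    """Convert a mapping of gas -> tonnes emitted into tonnes of CO2-equivalent."""
    return sum(GWP100_AR5[gas] * tonnes for gas, tonnes in tonnes_by_gas.items())

# Hypothetical inventory: 100 t CO2, 2 t CH4, 1 t N2O
sample = {"CO2": 100.0, "CH4": 2.0, "N2O": 1.0}
total = to_co2e(sample)
print(total)  # 100 + 56 + 265 = 421.0
```

The EDGAR GWP files already contain CO₂-equivalent values, so the script itself does not need to apply these factors.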

### 🧭 Features of the Script

- Iterates through multiple years (1970–2024) and EDGAR subsectors.
- Locates emission files with Python’s glob module.
- Reads NetCDF data (.nc) using xarray.
- Clips emission rasters to the target shapefile boundary using rioxarray.
- Writes output GeoTIFFs for each subsector-year combination.
- Skips missing files and reports any file that fails to open or process instead of stopping the run.
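
Because missing files are skipped without a message, it can help to list the gaps before a long run. The sketch below rebuilds the file naming pattern used by the script and reports which subsector-year files are absent. The helper names `expected_path` and `missing_inputs` and the example directory are hypothetical.

```python
import os

def expected_path(base_dir, year, code, release="EDGAR_2025"):
    """Build the path the script expects for one subsector-year file."""
    name = f"{release}_GHG_GWP_100_AR5_GHG_{year}_{code}_emi.nc"
    return os.path.join(base_dir, f"{code}_emi_nc", name)

def missing_inputs(base_dir, years, codes):
    """Return (year, code) pairs whose input file does not exist."""
    return [(y, c) for y in years for c in codes
            if not os.path.isfile(expected_path(base_dir, y, c))]

# Hypothetical usage with a directory that does not exist: every file is reported
gaps = missing_inputs("/nonexistent/Input_GWP_Data", range(1970, 1972), ["AGS", "TRO"])
print(len(gaps))  # 4
```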

In [None]:
import xarray as xr
import glob
import os
import rioxarray  # registers the .rio accessor on xarray objects
import geopandas as gpd

# Define years and subsectors with their abbreviations
years = range(1970, 2025)

subsectors = {
    'AGS': 'Agricultural soils',
    'TNR_Aviation_CDS': 'Aviation climbing&descent',
    'TNR_Aviation_CRS': 'Aviation cruise',
    'TNR_Aviation_SPS': 'Aviation supersonic',
    'CHE': 'Chemical processes',
    'TNR_Aviation_LTO': 'Aviation landing&takeoff',
    'IND': 'Combustion for manufacturing',
    'RCO': 'Energy for buildings',
    'ENF': 'Enteric fermentation',
    'PRO_FFF': 'Fuel exploitation',
    'IDE': 'Indirect emissions from NOX and NH3',
    'N2O': 'Indirect N2O emissions from agriculture',
    'IRO': 'Iron and steel production',
    'MNM': 'Manure management',
    'NEU': 'Non energy use of fuels',
    'NFE': 'Non-ferrous metals production',
    'NMM': 'Non-metallic minerals production',
    'REF_TRF': 'Oil refineries and transformation industry',
    'ENE': 'Power industry',
    'TNR_Other': 'Railways pipelines off road transport',
    'TRO': 'Road transportation',
    'TNR_Ship': 'Shipping',
    'SWD_INC': 'Solid waste incineration',
    'SWD_LDF': 'Solid waste landfills',
    'PRU_SOL': 'Solvents and products use',
    'WWT': 'Waste water handling',
    'AWB': 'Agricultural waste burning'
}

base_dir = '/Users/Documents/Input_GWP_Data'
shapefile_path = 'Shapefile Location'
output_dir = '/Users/Desktop/Output_GWP'
os.makedirs(output_dir, exist_ok=True)  # create the output folder if it does not exist

# Load the shapefile
mask = gpd.read_file(shapefile_path)

for year in years:
    for subsector_code, subsector_name in subsectors.items():
        subsector_annual_sum = None  # Initialize sum as None to detect if any data was added

        file_pattern = f'{base_dir}/{subsector_code}_emi_nc/EDGAR_2025_GHG_GWP_100_AR5_GHG_{year}_{subsector_code}_emi.nc'
        files = glob.glob(file_pattern)
        if not files:
            continue

        for file in files:
            try:
                with xr.open_dataset(file) as ds:
                    # Load into memory so the values remain available after the file is closed
                    data = ds['emissions'].load()
                    if subsector_annual_sum is None:
                        subsector_annual_sum = data.copy()
                    else:
                        subsector_annual_sum += data
            except Exception as e:
                print(f"Error processing file {file}: {e}")

        # Check if data was added and process if so
        if subsector_annual_sum is not None:
            try:
                subsector_annual_sum.rio.write_crs("epsg:4326", inplace=True)
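                # By default only cells whose centre lies inside the boundary are kept;
                # pass all_touched=True to keep every cell the boundary touches.
                subsector_annual_sum = subsector_annual_sum.rio.clip(mask.geometry, mask.crs, drop=True)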
                output_file = f'{output_dir}/SubSector_{subsector_code}_{year}.tif'
                subsector_annual_sum.rio.to_raster(output_file)
                print(f"Saved {output_file}")
            except Exception as e:
                print(f"Error processing subsector {subsector_code} for year {year}: {e}")

print("Data processing complete.")

### ⚡️ Sector-wise Aggregation of EDGAR GWP Data

The second code block aggregates the data to the sector level, grouping multiple EDGAR subsectors into four major emission sectors: Energy, Agriculture, Transport, and Waste.

### 🧩 Purpose

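The script computes total greenhouse gas emissions (in CO₂-equivalent) for each major sector and year (1970–2023), using data from the EDGAR GWP100 (AR5) dataset. It then clips the aggregated sectoral rasters to the defined shapefile boundary and exports them as GeoTIFF files. Note that this block reads files with the EDGAR_2024 prefix, whereas the subsector script reads EDGAR_2025 files covering 1970–2024; adjust the prefix and year range to match the release you downloaded.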
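
Before aggregating, it is worth confirming that every subsector code is assigned to exactly one sector, since a duplicate would be double counted and an omission would be silently dropped. The sketch below checks the mapping used in this notebook against the subsector codes from the first script; the helper name `invert_mapping` is hypothetical.

```python
# Subsector codes handled by the subsector-level script
all_codes = {
    'AGS', 'TNR_Aviation_CDS', 'TNR_Aviation_CRS', 'TNR_Aviation_SPS', 'CHE',
    'TNR_Aviation_LTO', 'IND', 'RCO', 'ENF', 'PRO_FFF', 'IDE', 'N2O', 'IRO',
    'MNM', 'NEU', 'NFE', 'NMM', 'REF_TRF', 'ENE', 'TNR_Other', 'TRO',
    'TNR_Ship', 'SWD_INC', 'SWD_LDF', 'PRU_SOL', 'WWT', 'AWB',
}

sector_mapping = {
    'Energy': ['CHE', 'IND', 'RCO', 'PRO_FFF', 'IRO', 'NEU', 'NFE', 'NMM',
               'REF_TRF', 'ENE', 'PRU_SOL'],
    'Agriculture': ['AGS', 'AWB', 'MNM', 'ENF', 'N2O', 'IDE'],
    'Transport': ['TNR_Aviation_CDS', 'TNR_Aviation_CRS', 'TNR_Aviation_LTO',
                  'TNR_Aviation_SPS', 'TNR_Other', 'TRO', 'TNR_Ship'],
    'Waste': ['SWD_INC', 'SWD_LDF', 'WWT'],
}

def invert_mapping(mapping):
    """Return subsector -> sector, raising if a code appears in two sectors."""
    lookup = {}
    for sector, codes in mapping.items():
        for code in codes:
            if code in lookup:
                raise ValueError(f"{code} is assigned to both {lookup[code]} and {sector}")
            lookup[code] = sector
    return lookup

lookup = invert_mapping(sector_mapping)
unassigned = all_codes - set(lookup)  # codes that no sector would pick up
print(len(lookup), sorted(unassigned))  # 27 []
```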

In [None]:
import xarray as xr
import glob
import os
import rioxarray  # registers the .rio accessor on xarray objects
import geopandas as gpd

# Define years and sector mapping
years = range(1970, 2024)

sector_mapping = {
    'Energy': ['CHE', 'IND', 'RCO', 'PRO_FFF', 'IRO', 'NEU', 'NFE', 'NMM', 'REF_TRF', 'ENE', 'PRU_SOL'],
    'Agriculture': ['AGS', 'AWB', 'MNM', 'ENF', 'N2O', 'IDE'],
    'Transport': ['TNR_Aviation_CDS', 'TNR_Aviation_CRS', 'TNR_Aviation_LTO', 'TNR_Aviation_SPS',
                  'TNR_Other', 'TRO', 'TNR_Ship'],
    'Waste': ['SWD_INC', 'SWD_LDF', 'WWT']
}

base_dir = '/Users/Documents/Input_GWP_Data'
shapefile_path = 'Shapefile Location'
output_dir = '/Users/Desktop/Output_GWP'
os.makedirs(output_dir, exist_ok=True)  # create the output folder if it does not exist

# Load the shapefile
mask = gpd.read_file(shapefile_path)

for year in years:
    sector_annual_sums = {sector: None for sector in sector_mapping}  # Initialize sector sums
    
    for sector, subsectors in sector_mapping.items():
        for subsector_code in subsectors:
            file_pattern = f'{base_dir}/{subsector_code}_emi_nc/EDGAR_2024_GHG_GWP_100_AR5_GHG_{year}_{subsector_code}_emi.nc'
            files = glob.glob(file_pattern)
            if not files:
                continue
            
            for file in files:
                try:
                    with xr.open_dataset(file) as ds:
                        # Load into memory so the values remain available after the file is closed
                        data = ds['emissions'].load()
                        if sector_annual_sums[sector] is None:
                            sector_annual_sums[sector] = data.copy()
                        else:
                            sector_annual_sums[sector] += data
                except Exception as e:
                    print(f"Error processing file {file}: {e}")
    
    # Save the results per sector
    for sector, aggregated_data in sector_annual_sums.items():
        if aggregated_data is not None:
            try:
                aggregated_data.rio.write_crs("epsg:4326", inplace=True)
                aggregated_data = aggregated_data.rio.clip(mask.geometry, mask.crs, drop=True)
                output_file = f'{output_dir}/{sector}_GWP_{year}.tif'
                aggregated_data.rio.to_raster(output_file)
                print(f"Saved {output_file}")
            except Exception as e:
                print(f"Error processing sector {sector} for year {year}: {e}")

print("Data processing complete.")
