## eGRID script to get localized carbon intensity of the grid

The eGRID database provides detailed information on the carbon intensity of electricity generation in the United States. Since the 2018 edition, EPA has released it annually, usually in January, with the latest data being from 2 years prior. For example, as of July 2024, the latest eGRID data is from 2022 and was released in January 2024. This script should be re-run each year to incorporate the latest data.
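
The release cadence described above suggests a rule of thumb for which data year to expect. A minimal sketch, assuming the two-year lag holds; `expected_latest_egrid_year` is a hypothetical helper and not part of this notebook, so the EPA download page remains the thing to check:

```python
from datetime import date

def expected_latest_egrid_year(today: date) -> int:
  """Hypothetical helper: assumes eGRID data lags the current year by two years."""
  return today.year - 2

print(expected_latest_egrid_year(date(2024, 7, 1)))  # 2022
```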

>  Interactive tool: https://www.epa.gov/egrid/power-profiler

In [1]:
# change these values when a new dataset is released

latest_released_year = 2022

# "Historical Zip Codes" dataset URL
# linked from https://www.epa.gov/energy/power-profiler, "Historical Zip Codes (XLSX)"
historical_zips_url = 'https://www.epa.gov/system/files/documents/2023-05/Power%20Profiler%20Historical%20Zip%20Codes.xlsx'

# eGRID dataset URLs by year
# linked from https://www.epa.gov/egrid/download-data and https://www.epa.gov/egrid/historical-egrid-data
egrid_urls = {
  2018: 'https://www.epa.gov/sites/default/files/2020-03/egrid2018_data_v2.xlsx',
  2019: 'https://www.epa.gov/sites/default/files/2021-02/egrid2019_data.xlsx',
  2020: 'https://www.epa.gov/system/files/documents/2022-09/eGRID2020_Data_v2.xlsx',
  2021: 'https://www.epa.gov/system/files/documents/2023-01/eGRID2021_data.xlsx',
  2022: 'https://www.epa.gov/system/files/documents/2024-01/egrid2022_data.xlsx',
}

In [2]:
# imports
import pandas as pd
import json

# since 2018, the dataset is released annually (before that it was inconsistent)
# so we can include all years from 2018 to the latest released year, inclusive
years = range(2018, latest_released_year + 1)

# dictionary to store the output
output = {}
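
Because `years` and `egrid_urls` are maintained by hand, it may be worth checking that they agree before downloading anything. The sketch below uses a hypothetical helper, `check_year_coverage`, with toy stand-ins for the two variables; it is one possible sanity check, not something the notebook itself does:

```python
def check_year_coverage(years, urls):
  """Hypothetical helper: returns (years with no URL, URL years outside the range)."""
  missing = [y for y in years if y not in urls]
  extra = [y for y in urls if y not in years]
  return missing, extra

# toy stand-ins for `years` and `egrid_urls`
example_years = range(2018, 2023)
example_urls = {2018: 'a', 2019: 'b', 2020: 'c', 2021: 'd'}
print(check_year_coverage(example_years, example_urls))  # ([2022], [])
```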

eGRID reports data either by state or by the 27 eGRID subregions (referred to below as eGRID regions).

> The 27 eGRID subregions in the US are defined by EPA using data from the Energy Information Administration (EIA) and the North American Electric Reliability Corporation (NERC). The subregions are defined to limit the amount of imports and exports across regions in order to best represent the electricity used in each of the subregions. More information can be found in section 3.4.2 of the eGRID Technical Support Document.

Although it might be easier to use state-level data, the eGRID regions are more accurate. We will need to use ZIP codes to get the eGRID region for each location. EPA has a ZIP code to eGRID region mapping spreadsheet. Let's include this in our output file.

The ZIP code to eGRID region mapping could change, so we should make sure to update this URL when new eGRID data is released.

In [3]:
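# the 'Combined' sheet is expected to have one row per ZIP code and one column per year giving its eGRID region
# read every column as a string so ZIP codes keep any leading zeros (e.g. "02115")
historical_zips_df = pd.read_excel(historical_zips_url, sheet_name='Combined', dtype=str)
for y in years:
  output[y] = {
    # region -> list of ZIP codes assigned to that region in year y
    'regions_zips': historical_zips_df.groupby(y)['ZIP (character)'].apply(list).to_dict()
  }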
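
To make the shape of `regions_zips` concrete, the following sketch reproduces the same grouping in plain Python on made-up rows. The ZIP codes and region codes are placeholders, not real eGRID assignments. Each year's entry maps a region code to the list of ZIP codes assigned to it:

```python
from collections import defaultdict

# made-up (ZIP, region) pairs for a single year; not real eGRID assignments
toy_rows = [('00001', 'AAAA'), ('00002', 'BBBB'), ('00003', 'BBBB')]

regions_zips = defaultdict(list)
for zip_code, region in toy_rows:
  regions_zips[region].append(zip_code)

print(dict(regions_zips))  # {'AAAA': ['00001'], 'BBBB': ['00002', '00003']}
```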

Now let's include the carbon intensity for each region, also by year.
The field we will be using for carbon intensity is `SRC2ERTA`, which eGRID describes as "eGRID subregion annual CO2 equivalent total output emission rate". In the standard (non-metric) eGRID workbook used here, it is reported in lb CO2e per MWh.

In [4]:
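for year, url in egrid_urls.items():
  # subregion data is on the sheet named 'SRL' + 2-digit year, e.g. 'SRL22';
  # skip the first row so that the row of field codes (SUBRGN, SRC2ERTA, ...) becomes the header
  egrid_df = pd.read_excel(url, sheet_name='SRL' + str(year)[-2:], skiprows=[0])
  # region -> emission rate
  output[year]['regions_src2erta'] = egrid_df.set_index('SUBRGN')['SRC2ERTA'].to_dict()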
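
The sheet name passed to `pd.read_excel` is built from the last two digits of the year. A small sketch of just that string logic, wrapped in a hypothetical helper (`subregion_sheet_name` is not part of the notebook), which assumes EPA keeps the same sheet naming in future releases:

```python
def subregion_sheet_name(year: int) -> str:
  """Hypothetical helper: 'SRL' followed by the last two digits of the year."""
  return 'SRL' + str(year)[-2:]

for year in (2018, 2022):
  print(subregion_sheet_name(year))
# SRL18
# SRL22
```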

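Dump `output` to a Python module called `egrid_carbon_by_year.py`, which assigns the JSON-serialized data to a variable named `egrid_carbon_by_year`. Note that `json.dumps` converts the integer year keys to strings.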

In [5]:
with open('../src/emcommon/metrics/footprint/egrid_carbon_by_year.py', 'w') as f:
  f.write("egrid_carbon_by_year = " + json.dumps(output))
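
One consequence of serializing with `json.dumps` is that integer dictionary keys come back as strings, which is why lookups against the written data use `str(year)`. The sketch below demonstrates this on a toy dictionary (the value `123.4` is made up) and shows one way the JSON payload could be recovered from the generated text:

```python
import json

toy_output = {2022: {'regions_src2erta': {'AAAA': 123.4}}}  # made-up value
module_text = "egrid_carbon_by_year = " + json.dumps(toy_output)

# the JSON payload is everything after the first '='
restored = json.loads(module_text.split('=', 1)[1])
print(list(restored))  # ['2022']
print(restored['2022']['regions_src2erta']['AAAA'])  # 123.4
```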

In [6]:
def get_egrid_carbon_intensity(year: int, zip: str) -> float:
  """
  Returns the estimated carbon intensity of the electricity grid in the given zip code for the given year,
  or None if the year or zip code is not in the data.
  (units in lb CO2e per MWh)
  :param year: The year to get the data for, e.g. 2022
  :param zip: The 5-digit zip code to get the data for; e.g. "45221" (Cincinnati), "02115" (Boston)
  """
  year = str(year)  # json.dumps wrote the year keys as strings
  # read the module written above, which has the form "egrid_carbon_by_year = <JSON>"
  with open('../src/emcommon/metrics/footprint/egrid_carbon_by_year.py', 'r') as f:
    data = json.loads(f.read().split('=', 1)[1])
  try:
    # find which region the zip code is in
    region = [k for k in data[year]['regions_zips'] if zip in data[year]['regions_zips'][k]][0]
    return data[year]['regions_src2erta'][region]
  except (KeyError, IndexError):
    # unknown year (KeyError) or zip code not in any region (IndexError)
    return None

print(get_egrid_carbon_intensity(2022, "45221"))

In [None]:
from ....emcommon import logger as Logger
Logger.log_debug(get_egrid_carbon_intensity(2022, "45221"))
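
The lookup in `get_egrid_carbon_intensity` scans every region's ZIP list on each call. If many lookups are needed, one option is to invert the mapping once into a ZIP-to-region dictionary. This is a sketch of that idea on toy data; `build_zip_index` is a hypothetical helper and the region codes are placeholders:

```python
def build_zip_index(regions_zips: dict) -> dict:
  """Hypothetical helper: invert {region: [zips]} into {zip: region}."""
  return {z: region for region, zips in regions_zips.items() for z in zips}

toy_regions_zips = {'AAAA': ['00001'], 'BBBB': ['00002', '00003']}
zip_index = build_zip_index(toy_regions_zips)
print(zip_index.get('00003'))  # BBBB
print(zip_index.get('99999'))  # None
```

Using `dict.get` returns `None` for an unknown ZIP code, so the missing-ZIP case needs no exception handling.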