## eGRID script to get localized carbon intensity of the grid

The eGRID database provides detailed information on the carbon intensity of electricity generation in the United States. Since 2018, the EPA has released it annually, typically in January, with the latest data being from two years prior. For example, as of July 2024, the latest eGRID data covers 2022 and was released in January 2024. This script should be re-run each year to incorporate the latest data.

>  Interactive tool: https://www.epa.gov/egrid/power-profiler

In [6]:
# change these values when a new dataset is released

latest_released_year = 2022

# "Historical Zip Codes" dataset URL
# linked from https://www.epa.gov/energy/power-profiler, "Historical Zip Codes (XLSX)"
historical_zips_url = 'https://www.epa.gov/system/files/documents/2023-05/Power%20Profiler%20Historical%20Zip%20Codes.xlsx'

# eGRID dataset URLs by year
# linked from https://www.epa.gov/egrid/download-data and https://www.epa.gov/egrid/historical-egrid-data
egrid_urls = {
  2018: 'https://www.epa.gov/sites/default/files/2020-03/egrid2018_data_v2.xlsx',
  2019: 'https://www.epa.gov/sites/default/files/2021-02/egrid2019_data.xlsx',
  2020: 'https://www.epa.gov/system/files/documents/2022-09/eGRID2020_Data_v2.xlsx',
  2021: 'https://www.epa.gov/system/files/documents/2023-01/eGRID2021_data.xlsx',
  2022: 'https://www.epa.gov/system/files/documents/2024-01/egrid2022_data.xlsx',
}

In [7]:
# imports
import pandas as pd
import json

# since 2018, the dataset is released annually (before that it was inconsistent)
# so we can include all years from 2018 to the latest released year, inclusive
years = range(2018, latest_released_year + 1)

# dictionary to store the output
output = {}
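
Because the year range is derived from `latest_released_year` while the download URLs are listed separately, it is easy to bump one and forget the other. A minimal sketch of a sanity check is below; the helper name `missing_years` and the placeholder URL strings are illustrative assumptions, not part of the original script.

```python
def missing_years(latest_released_year, urls_by_year, first_year=2018):
    """Return the years in [first_year, latest_released_year] that have no URL entry."""
    expected = range(first_year, latest_released_year + 1)
    return [y for y in expected if y not in urls_by_year]

# Placeholder values standing in for the real egrid_urls dictionary
sample_urls = {2018: 'placeholder', 2019: 'placeholder', 2020: 'placeholder'}

print(missing_years(2020, sample_urls))  # every year is covered
print(missing_years(2021, sample_urls))  # 2021 has no URL yet
```

Running a check like this before downloading would surface a missing entry early, rather than partway through the loop.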

eGRID provides data by state or by 27 eGRID subregions.

> The 27 eGRID subregions in the US are defined by EPA using data from the Energy Information Administration (EIA) and the North American Electric Reliability Corporation (NERC). The subregions are defined to limit the amount of imports and exports across regions in order to best represent the electricity used in each of the subregions. More information can be found in section 3.4.2 of the eGRID Technical Support Document.

Although it might be easier to use state-level data, the eGRID regions are more accurate. We will need to use ZIP codes to get the eGRID region for each location. EPA has a ZIP code to eGRID region mapping spreadsheet. Let's include this in our output file.

The ZIP code to eGRID region mapping could change, so we should make sure to update `historical_zips_url` when new eGRID data is released.

In [8]:
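# read the "Combined" sheet; dtype=str keeps ZIP codes as strings so leading zeros are preserved
historical_zips_df = pd.read_excel(historical_zips_url, 'Combined', dtype=str)
for y in years:
  # group ZIP codes by the value in that year's column, giving {region: [zip, ...]}
  output[y] = {
    'regions_zips': historical_zips_df.groupby(y)['ZIP (character)'].apply(list).to_dict()
  }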
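
The `regions_zips` structure maps each region to a list of ZIP codes, so finding the region for a given ZIP requires inverting it. The sketch below shows one way a consumer might do that. The sample data is hypothetical (illustrative region codes and ZIPs, not values read from the spreadsheet), and `invert_regions_zips` is an assumed helper name.

```python
# Hypothetical sample in the same shape as output[year]['regions_zips']
sample_regions_zips = {
    'CAMX': ['94103', '94704'],
    'NYCW': ['10001'],
}

def invert_regions_zips(regions_zips):
    """Build a {zip: region} dict from a {region: [zip, ...]} dict.

    If a ZIP were listed under more than one region, the last one seen wins.
    """
    zip_to_region = {}
    for region, zips in regions_zips.items():
        for zip_code in zips:
            zip_to_region[zip_code] = region
    return zip_to_region

zip_to_region = invert_regions_zips(sample_regions_zips)
print(zip_to_region.get('94704'))  # region for a known ZIP
print(zip_to_region.get('00000'))  # None for a ZIP that is not in the mapping
```

Building the inverted dictionary once makes each lookup a constant-time operation, at the cost of holding a second copy of the mapping in memory.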

Now let's include the carbon intensity for each region, also by year.
The field we will be using for carbon intensity is "Annual CO2 equivalent total output emission rate (lb/MWh)". For the national average, the column is called `USC2ERTA`, and for the per-region values it is called `SRC2ERTA`. We'll convert these values from lb to kg, giving kg CO2 equivalent per MWh.

In [None]:
# kilograms per pound (the source rates are in lb/MWh)
KG_PER_LB = 0.45359237

for year, url in egrid_urls.items():
  # national average in the "USxx" sheet; e.g. "US22"
  # it only has one row
  national_df = pd.read_excel(url, 'US' + str(year)[-2:], skiprows=[0])
  output[year]['national_kg_per_mwh'] = national_df['USC2ERTA'].iloc[0] * KG_PER_LB

  # per-region averages in the "SRLxx" sheet; e.g. "SRL22"
  regions_df = pd.read_excel(url, 'SRL' + str(year)[-2:], skiprows=[0])
  output[year]['regions_kg_per_mwh'] = regions_df[['SUBRGN', 'SRC2ERTA']] \
    .set_index('SUBRGN')['SRC2ERTA'] \
    .apply(lambda lbs: lbs * KG_PER_LB) \
    .to_dict()
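
As a worked example of the unit conversion: one pound is exactly 0.45359237 kg, so a rate in lb/MWh is multiplied by that factor to get kg/MWh. The sketch below uses a made-up rate of 1000 lb/MWh purely for illustration; the function name is an assumption, not something from the script.

```python
KG_PER_LB = 0.45359237  # kilograms in one pound

def lb_per_mwh_to_kg_per_mwh(rate_lb_per_mwh):
    """Convert an emission rate from lb/MWh to kg/MWh."""
    return rate_lb_per_mwh * KG_PER_LB

# Illustrative input only, not a value from the eGRID data
example_rate_kg = lb_per_mwh_to_kg_per_mwh(1000.0)
print(round(example_rate_kg, 2))  # about 453.59 kg/MWh
```

The converted number is always smaller than the original, since a kilogram is heavier than a pound. This gives a quick check that the factor has been applied in the right direction.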

Write `output` to a Python module called `egrid_carbon_by_year.py`.

In [None]:
with open('../src/emcommon/metrics/footprint/egrid_carbon_by_year.py', 'w') as f:
  f.write("egrid_data = " + json.dumps(output))
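
One detail for consumers of the generated module: `json.dumps` converts the integer year keys to strings, so the written dictionary is indexed by `'2022'` rather than `2022`. The sketch below demonstrates this with a small hypothetical `sample_output`, whose numbers are made up. It evaluates the generated source in memory with `exec` rather than writing a file.

```python
import json

# Hypothetical data in the same shape as `output`; the values are illustrative
sample_output = {
    2022: {
        'national_kg_per_mwh': 400.0,
        'regions_kg_per_mwh': {'CAMX': 250.0},
    }
}

# Same construction as the file-writing cell, kept in memory
source = "egrid_data = " + json.dumps(sample_output)

# Evaluate the generated source as Python, as an import of the module would
namespace = {}
exec(source, namespace)
egrid_data = namespace['egrid_data']

print(list(egrid_data.keys()))  # the year key is now a string
print(egrid_data['2022']['regions_kg_per_mwh']['CAMX'])
```

Also note that a missing value (NaN) would be serialized by `json.dumps` as the bare token `NaN`, which is not a defined name in Python. The generated module would then fail on import unless such values are handled beforehand.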