# Chapter 1 - EIA API - Python Client

In this section, we will see how to query the EIA API with Python. We will use the functions in `eia_api.py` to send GET requests to the API.

We will continue with the same example we used before - the hourly demand of electricity for balancing authority subregion PGAE. As before, we will use the API dashboard to extract the GET request:

<figure>
<img src="./images/query-detail.png" width="100%" align="center"/>
<figcaption> Figure 1 - The GET request details for balancing authority subregion PGAE</figcaption>
</figure>

The `eia_api.py` file provides a set of functions to query data from the EIA API V2. This includes the following functions:

- `eia_get` - sends a GET request for data
- `eia_metadata` - sends a GET request for metadata
- `eia_backfill` - sends a sequence of GET requests to pull a large series (more than 5000 observations)

In [1]:
import eia_api as api

In addition, we will import the following libraries:

In [2]:
import os
import datetime
import plotly.express as px

## Pulling Metadata

Setting the API key and the API path to pull the route metadata:

In [3]:
api_key = os.getenv('EIA_API_KEY')

api_meta_path = "electricity/rto/region-sub-ba-data/"
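Note that `os.getenv` returns `None` when the variable is not set, and the problem would then surface only later as a failed request. A minimal sketch of an early check, assuming a hypothetical helper named `require_env` (it is not part of `eia_api.py`):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is missing.

    Hypothetical helper for illustration; it is not part of eia_api.py.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Throwaway variable so the snippet runs anywhere; in the notebook
# you would call require_env("EIA_API_KEY") instead.
os.environ["EIA_DEMO_KEY"] = "dummy-key"
demo_key = require_env("EIA_DEMO_KEY")
```

Failing at this point makes a missing key easier to diagnose than an error returned by the API.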

Sending GET request for route metadata:

In [4]:
meta = api.eia_metadata(
    api_key = api_key,
    api_path = api_meta_path  
)

In [6]:
meta.meta

{'id': 'region-sub-ba-data',
 'name': 'Hourly Demand by Subregion',
 'description': 'Hourly demand by balancing authority subregion.  \n    Source: Form EIA-930\n    Product: Hourly Electric Grid Monitor',
 'frequency': [{'id': 'hourly',
   'alias': 'hourly (UTC)',
   'description': 'One data point for each hour in UTC time.',
   'query': 'H',
   'format': 'YYYY-MM-DD"T"HH24'},
  {'id': 'local-hourly',
   'alias': 'hourly (Local Time Zone)',
   'description': 'One data point for each hour in local time.',
   'query': 'LH',
   'format': 'YYYY-MM-DD"T"HH24TZH'}],
 'facets': [{'id': 'subba', 'description': 'Subregion'},
  {'id': 'parent', 'description': 'Balancing Authority'}],
 'data': {'value': {'aggregation-method': 'SUM',
   'alias': 'Demand',
   'units': 'megawatthours'}},
 'startPeriod': '2019-01-01T00',
 'endPeriod': '2024-12-25T08',
 'defaultDateFormat': 'YYYY-MM-DD"T"HH24',
 'defaultFrequency': 'hourly'}
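
The metadata tells us which frequencies and facets the route accepts and which time range is available. The sketch below works on an abridged copy of the dictionary printed above, assuming `meta.meta` is a plain Python dictionary as the output suggests, and extracts the values we need to set up a data request:

```python
import datetime

# Abridged copy of the metadata dictionary printed above
meta_dict = {
    "frequency": [
        {"id": "hourly", "query": "H"},
        {"id": "local-hourly", "query": "LH"},
    ],
    "facets": [
        {"id": "subba", "description": "Subregion"},
        {"id": "parent", "description": "Balancing Authority"},
    ],
    "startPeriod": "2019-01-01T00",
    "endPeriod": "2024-12-25T08",
}

# Valid values for the frequency argument and valid facet names
frequency_ids = [f["id"] for f in meta_dict["frequency"]]
facet_ids = [f["id"] for f in meta_dict["facets"]]

# The 'YYYY-MM-DD"T"HH24' format corresponds to this strptime pattern
period_format = "%Y-%m-%dT%H"
series_start = datetime.datetime.strptime(meta_dict["startPeriod"], period_format)
series_end = datetime.datetime.strptime(meta_dict["endPeriod"], period_format)

print(frequency_ids)
print(facet_ids)
print(series_start, series_end)
```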

## Sending A Simple GET Request

Setting the GET request parameters:

In [7]:
api_key = os.getenv('EIA_API_KEY')

api_path = "electricity/rto/region-sub-ba-data/data/"

frequency = "hourly"

facets = {
    "parent": "CISO",
    "subba": "PGAE"
}

In [8]:
df1 = api.eia_get(
    api_key = api_key,
    api_path = api_path,
    frequency = frequency,
    facets = facets
)

In [9]:
df1.url

'https://api.eia.gov/v2/electricity/rto/region-sub-ba-data/data/?data[]=value&facets[parent][]=CISO&facets[subba][]=PGAE&frequency=hourly&api_key='
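
The URL shows how the arguments map to query parameters: each facet becomes a `facets[<name>][]=<value>` pair. As a rough illustration, the hypothetical helper `build_query_url` below reproduces the string above; this is a sketch and `eia_get` may assemble its URL differently:

```python
def build_query_url(api_path, facets, frequency, data="value",
                    base="https://api.eia.gov/v2/"):
    """Assemble an EIA API V2 style query URL without the API key value.

    Hypothetical helper for illustration; eia_get may build its URL differently.
    """
    parts = [f"data[]={data}"]
    # Each facet becomes facets[<name>][]=<value>
    parts += [f"facets[{name}][]={value}" for name, value in facets.items()]
    parts.append(f"frequency={frequency}")
    parts.append("api_key=")  # key value deliberately left out
    return base + api_path + "?" + "&".join(parts)


url = build_query_url(
    api_path="electricity/rto/region-sub-ba-data/data/",
    facets={"parent": "CISO", "subba": "PGAE"},
    frequency="hourly",
)
print(url)
```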

In [10]:
df1.parameters

{'api_path': 'electricity/rto/region-sub-ba-data/data/',
 'data': 'value',
 'facets': {'parent': 'CISO', 'subba': 'PGAE'},
 'start': None,
 'end': None,
 'length': None,
 'offset': None,
 'frequency': 'hourly'}

In [11]:
df1.data

| | period | subba | subba-name | parent | parent-name | value | value-units |
|---|---|---|---|---|---|---|---|
| 4999 | 2024-05-28 20:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 9504 | megawatthours |
| 4998 | 2024-05-28 21:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 9363 | megawatthours |
| 4997 | 2024-05-28 22:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 9613 | megawatthours |
| 4996 | 2024-05-28 23:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 9963 | megawatthours |
| 4995 | 2024-05-29 00:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 10517 | megawatthours |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4 | 2024-12-25 04:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 11532 | megawatthours |
| 3 | 2024-12-25 05:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 11045 | megawatthours |
| 2 | 2024-12-25 06:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 10849 | megawatthours |
| 1 | 2024-12-25 07:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 10651 | megawatthours |


In [12]:
df1.data.dtypes

period         datetime64[ns]
subba                  object
subba-name             object
parent                 object
parent-name            object
value                   int64
value-units            object
dtype: object

## API Limitation

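The API returns at most 5000 observations per call. Although the metadata reports that the series starts in January 2019, the request above returned only the most recent observations, starting on May 28th, 2024. Let's plot the series to see this: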

In [22]:

px.line(df1.data, x="period", y="value")


The `start` and `end` arguments let us set a time range for the GET request. For example, let's pull data between January 1st, 2024 and February 24th, 2024:

In [14]:
start = datetime.datetime(2024, 1, 1, 1)
end = datetime.datetime(2024, 2, 24, 23)

df2 = api.eia_get(
    api_key = api_key,
    api_path = api_path,
    frequency = frequency,
    facets = facets,
    start = start,
    end = end
)

In [16]:
px.line(df2.data, x="period", y="value")
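
Before sending a request with a time range, it can help to estimate whether the range fits within the 5000 observations limit. The following sketch counts the hourly timestamps in the range above, assuming both endpoints are included and no hours are missing from the series (the helper name `hourly_observations` is hypothetical):

```python
import datetime

API_LIMIT = 5000  # maximum observations per call, as stated in this chapter


def hourly_observations(start, end):
    """Number of hourly timestamps from start to end, counting both endpoints."""
    return int((end - start) / datetime.timedelta(hours=1)) + 1


start = datetime.datetime(2024, 1, 1, 1)
end = datetime.datetime(2024, 2, 24, 23)

n_obs = hourly_observations(start, end)
fits_in_one_call = n_obs <= API_LIMIT
print(n_obs, fits_in_one_call)
```

If the estimate exceeds the limit, a single call will not return the full range.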

## Handling A Large Data Request

When the number of observations in a series exceeds the API limit of 5000 observations per call, use the `eia_backfill` function. The function splits the request into multiple smaller requests, where the `offset` argument defines the size of each request. It is recommended not to use an offset larger than 2500 observations. For example, let's pull data since July 1st, 2019:

In [18]:
start = datetime.datetime(2019, 7, 1, 8)
end = datetime.datetime(2024, 2, 24, 23)
offset = 2250

df3 = api.eia_backfill(
    start = start,
    end = end,
    offset = offset,
    api_path = api_path,
    api_key = api_key,
    facets = facets
)
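
To build intuition for what a backfill does, the sketch below splits a time range into consecutive windows of at most `offset` hourly observations. The helper `split_range` is hypothetical and only illustrates the idea; `eia_backfill` may chunk the range differently:

```python
import datetime


def split_range(start, end, offset):
    """Split [start, end] into consecutive windows of at most `offset` hours.

    Hypothetical illustration; eia_backfill may chunk the range differently.
    """
    step = datetime.timedelta(hours=1)
    windows = []
    window_start = start
    while window_start <= end:
        # Each window holds at most `offset` hourly timestamps
        window_end = min(window_start + (offset - 1) * step, end)
        windows.append((window_start, window_end))
        window_start = window_end + step
    return windows


windows = split_range(
    start=datetime.datetime(2019, 7, 1, 8),
    end=datetime.datetime(2024, 2, 24, 23),
    offset=2250,
)
print(len(windows), windows[0], windows[-1])
```

Each window could then be sent as a separate request with its own `start` and `end`, and the results appended into a single table.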

In [19]:
df3.data



| | period | subba | subba-name | parent | parent-name | value | value-units |
|---|---|---|---|---|---|---|---|
| 2251 | 2019-07-01 00:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 11643 | megawatthours |
| 2250 | 2019-07-01 01:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 12483 | megawatthours |
| 2249 | 2019-07-01 02:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 13148 | megawatthours |
| 2248 | 2019-07-01 03:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 13392 | megawatthours |
| 2247 | 2019-07-01 04:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 13337 | megawatthours |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4 | 2024-02-23 20:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 10113 | megawatthours |
| 3 | 2024-02-23 21:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 9365 | megawatthours |
| 2 | 2024-02-23 22:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 8969 | megawatthours |
| 1 | 2024-02-23 23:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 8656 | megawatthours |


In [20]:
p = px.line(df3.data, x="period", y="value")
p.show()