## Notebook function:
*Connect to the API of the U.S. Energy Information Administration (EIA) and download hourly electric grid data.*
https://www.eia.gov

API URL:
https://api.eia.gov/v2/electricity/rto/region-data/data/?frequency=hourly&data[0]=value&start=2015-07-01T00&sort[0][column]=period&sort[0][direction]=desc&offset=0&length=5000

Method:  GET

Series description:
Hourly demand, day-ahead demand forecast, net generation, and interchange by balancing authority. Source: Form EIA-930 Product: Hourly Electric Grid Monitor

API Documentation:  https://www.eia.gov/opendata/documentation.php

*API URL for inventory of operable generators (monthly):*
https://api.eia.gov/v2/electricity/operating-generator-capacity/data/?frequency=monthly&data[0]=county&data[1]=latitude&data[2]=longitude&data[3]=nameplate-capacity-mw&data[4]=net-summer-capacity-mw&data[5]=net-winter-capacity-mw&data[6]=operating-year-month&data[7]=planned-derate-summer-cap-mw&data[8]=planned-derate-year-month&data[9]=planned-retirement-year-month&data[10]=planned-uprate-summer-cap-mw&data[11]=planned-uprate-year-month&start=2018-01&end=2022-12&sort[0][column]=period&sort[0][direction]=desc&offset=0&length=5000

*API URL for electric power operations for individual power plants (monthly):*
https://api.eia.gov/v2/electricity/facility-fuel/data/?frequency=monthly&data[0]=average-heat-content&data[1]=consumption-for-eg&data[2]=consumption-for-eg-btu&data[3]=generation&data[4]=gross-generation&data[5]=total-consumption&data[6]=total-consumption-btu&start=2018-01&end=2022-12&sort[0][column]=period&sort[0][direction]=desc&offset=0&length=5000
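The long query strings above may be easier to read and adjust when assembled from a parameter list. The following is a minimal sketch using only the standard library; `build_eia_url` is a hypothetical helper name (it is not part of the EIA API or of this notebook), and the route and parameter names are copied from the hourly URL above.

```python
from urllib.parse import urlencode

BASE = "https://api.eia.gov/v2/"

def build_eia_url(route, params):
    """Join an API route and (name, value) pairs into a request URL.

    build_eia_url is a hypothetical helper for illustration only.
    """
    # safe="[]" keeps the bracketed names (data[0], sort[0][column]) literal,
    # matching how the URLs are written in this notebook
    return f"{BASE}{route}/data/?{urlencode(params, safe='[]')}"

hourly_params = [
    ("frequency", "hourly"),
    ("data[0]", "value"),
    ("start", "2015-07-01T00"),
    ("sort[0][column]", "period"),
    ("sort[0][direction]", "desc"),
    ("offset", 0),
    ("length", 5000),
]
url = build_eia_url("electricity/rto/region-data", hourly_params)
print(url)
```

Changing the date range or the page size then means editing one tuple rather than a position inside a long string.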

In [2]:
import requests
import json
import pandas as pd
import time
from datetime import datetime, timedelta

In [3]:
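#retrieves data from the EIA API; returns a JSON string with response, request, and API metadata
#on failure response_json is never assigned, so the final return raises UnboundLocalError,
#which the download loop catches in order to pause and retry
def getEIAdata(api_keystring, start, end, offset, length):
    url = (f"https://api.eia.gov/v2/electricity/rto/region-data/data/?{api_keystring}frequency=hourly&data[0]"
       f"=value&start={start}&end={end}&sort[0][column]=period&sort[0][direction]"
       f"=desc&offset={offset}&length={length}")
    try:
        response = requests.get(url)
    except requests.RequestException:
        #network-level failure: no response object exists
        print(f'no response from {url}')
    try:
        response_json = json.dumps(response.json(), indent=4)
    except ValueError:
        #body was not valid JSON; print the response status instead
        print(response)
    return response_json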
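
getEIAdata reports a failed request but leaves the waiting and retrying to its caller. One way to package that wait-and-retry behaviour is sketched below. `fetch_with_retry`, its parameters, and the stand-in `flaky_fetch` are hypothetical names introduced for illustration; the sleep function is injectable so the example runs without pausing or touching the network.

```python
import time

def fetch_with_retry(fetch, max_attempts=3, wait_seconds=60, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached.

    fetch is any zero-argument callable that raises on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            print(f"attempt {attempt} failed ({err!r}); waiting {wait_seconds}s")
            sleep(wait_seconds)

# stand-in for a request that fails twice before succeeding
calls = {"count": 0}

def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated outage")
    return '{"response": {"total": 0, "data": []}}'

waits = []
result = fetch_with_retry(flaky_fetch, wait_seconds=5, sleep=waits.append)
print(result, waits)
```

In this notebook the callable could be something like `lambda: getEIAdata(api_keystring, start_datetime, end_datetime, offset, row_limit)`, with `wait_seconds` set to whatever pause the API tolerates.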

In [4]:
#returns subset of dictionary containing data items
def extractData(response_dict):
    data = response_dict['response']['data']
    return data

In [5]:
#extracts total number of items that matched API request
def extractTotalRows(response_dict):
    total_rows = response_dict['response']['total']
    return total_rows
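
Both helpers assume the payload is nested under a `response` key. The sketch below runs them against a small hand-written dictionary. The field names follow the columns listed at the end of this notebook, but the two rows and all of their values are invented for illustration and are not real EIA data.

```python
import json

def extractData(response_dict):
    return response_dict['response']['data']

def extractTotalRows(response_dict):
    return response_dict['response']['total']

# hand-written stand-in for an API response; every value is invented
sample_json = json.dumps({
    "response": {
        "total": 2,
        "data": [
            {"period": "2020-07-03T00", "respondent": "AAA", "type": "D", "value": 100},
            {"period": "2020-07-02T23", "respondent": "AAA", "type": "D", "value": 95},
        ],
    }
})

d = json.loads(sample_json)
rows = extractData(d)
print(extractTotalRows(d), len(rows), rows[0]["period"])
```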

In [6]:
#appends a JSON string to a file (several appended responses do not form one valid JSON document)
def saveJSON(json_obj, data_file_path):
    with open(data_file_path, 'a') as fout:
        fout.write(json_obj)

In [7]:
#appends data to csv file
def saveCSV(data_dict, csv_path, header):
    df = pd.DataFrame.from_dict(data_dict)
    with open(csv_path, 'a') as fout:
        df.to_csv(fout, header=header, index=False, lineterminator='\n')

In [9]:
#flow
import os
api_key = os.environ['EIA_API_KEY'] #assumed environment variable name; keeps the key out of the notebook
api_keystring = f"api_key={api_key}&"
row_limit = 5000
offset = 0
start_date = '2020-01-01'
end_date = '2020-07-03'
start_datetime = f"{start_date}T00" #API takes start and end hour in '2023-04-02T00' format
end_datetime = f"{end_date}T00"
api_chill_time = 69

json_path = 'eiadata.json'
csv_path = 'eia.csv'
with open(json_path, 'w') as overwrite:
    pass
with open(csv_path, 'w') as overwrite:
    pass

response_json = getEIAdata(api_keystring, start_datetime, end_datetime, offset, row_limit)
#create dictionary from json object
d = json.loads(response_json)
data = extractData(d)
saveJSON(response_json, json_path)
saveCSV(data, csv_path, header=True)
returned_rows = len(data)
total_rows = extractTotalRows(d)
print(total_rows)
call_count = 1
#each pass fetches one full chunk; any partial chunk is requested after the loop
while (call_count + 1) * row_limit <= total_rows:
    offset = call_count * row_limit
    try:
        response_json = getEIAdata(api_keystring, start_datetime, end_datetime, offset, row_limit)
    except UnboundLocalError:
        #the request failed; wait, then retry the same offset
        resume_time = datetime.now() + timedelta(minutes=api_chill_time)
        print(f"API response error. Lurk until {resume_time}")
        time.sleep(60*api_chill_time)
        continue
    d = json.loads(response_json)
    data = extractData(d)
    #only save first and last chunk to json for examination
    #saveJSON(response_json, json_path)
    saveCSV(data, csv_path, header=False)
    call_count += 1
    print(call_count*row_limit)
    #time.sleep(10)
    if call_count % 20 == 0:
        time.sleep(69)
    #if call_count % 100 == 0:
        #time.sleep(5400)
#rows left over after the full chunks (zero or negative when nothing remains)
length = total_rows - call_count*row_limit
if length > 0:
    response_json = getEIAdata(api_keystring, start_datetime, end_datetime, call_count*row_limit, length)
    d = json.loads(response_json)
    data = extractData(d)
    saveJSON(response_json, json_path)
    saveCSV(data, csv_path, header=False)


1274292
10000
15000
...
725000
73
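
The offset arithmetic for paging through the results can be checked without calling the API. The sketch below lists the (offset, length) pair for every request needed to cover a given row count exactly once. `page_plan` is a hypothetical helper, not part of the notebook, and it assumes the API returns rows `offset` through `offset + length - 1`.

```python
def page_plan(total_rows, row_limit):
    """Return the (offset, length) pairs that cover total_rows exactly once."""
    plan = []
    offset = 0
    while offset < total_rows:
        # the last page is shorter when total_rows is not a multiple of row_limit
        length = min(row_limit, total_rows - offset)
        plan.append((offset, length))
        offset += length
    return plan

plan = page_plan(1274292, 5000)
print(len(plan), plan[0], plan[-1])
# prints: 255 (0, 5000) (1270000, 4292)
```

For the 1,274,292 rows reported above, this gives 254 full requests of 5,000 rows followed by one request for the remaining 4,292.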

In [14]:
#TODO: build this dataframe from the saved csv instead of the last API response
df = pd.DataFrame.from_dict(d['response']['data'])
print(df)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   period           5000 non-null   object
 1   respondent       5000 non-null   object
 2   respondent-name  5000 non-null   object
 3   type             5000 non-null   object
 4   type-name        5000 non-null   object
 5   value            5000 non-null   int64 
 6   value-units      5000 non-null   object
dtypes: int64(1), object(6)
memory usage: 273.6+ KB
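
The comment in the cell above notes that the dataframe should eventually come from the saved CSV rather than from the last API response. A minimal sketch of that step follows. It reads from an in-memory string so that it runs on its own, the two rows are invented, and the `%Y-%m-%dT%H` format is an assumption based on the period format noted earlier ('2023-04-02T00').

```python
import io
import pandas as pd

# stand-in for the contents of eia.csv; rows are invented for illustration
csv_text = (
    "period,respondent,respondent-name,type,type-name,value,value-units\n"
    "2020-07-03T00,AAA,Example Authority,D,Demand,100,megawatthours\n"
    "2020-07-02T23,AAA,Example Authority,D,Demand,95,megawatthours\n"
)

# in the notebook this would be pd.read_csv(csv_path)
df = pd.read_csv(io.StringIO(csv_text))
# parse the hourly period strings into timestamps
df["period"] = pd.to_datetime(df["period"], format="%Y-%m-%dT%H")
# the API returns rows newest first; sort into chronological order
df = df.sort_values("period").reset_index(drop=True)
print(df.dtypes)
```

Parsing `period` up front lets the column serve as a time index for resampling or plotting, which the `object` dtype shown above does not allow.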
