## Units of Measure (Pay attention to this)
### Two Types of Values Stored:  
* Reported Value - What the monitor actually measured (in various units)  
* Standard Value - Converted to the official standard unit for that parameter  

Standard Units by Pollutant Type:
Criteria Pollutants:
* Ozone (44201): Parts per million (ppm)  
* CO (42101): Parts per million (ppm)  
* SO₂ (42401): Parts per billion (ppb)  
* NO₂ (42602): Parts per billion (ppb)  

Particulate Matter:
* PM2.5 (88101, 88502): Micrograms/cubic meter at Local Conditions (LC)  
* PM10 (81102, 85101): Micrograms/cubic meter (81102 at standard conditions, 85101 at LC)  

Metals and Elements:
Varies by size fraction:
* TSP (Total Suspended Particulates): Usually micrograms/m³ at 25°C  
* PM2.5: Usually micrograms/m³ at LC  
* PM10: Often nanograms/m³  

Important Abbreviations:
* LC = Local Conditions (temperature/pressure at site)  
* STP = Standard Temperature & Pressure (EPA convention: 25°C, 1 atm)  
* ppbC = Parts per billion Carbon (used for VOCs)
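
Because the gaseous pollutants above mix ppm and ppb, it can help to put values on one scale before comparing them. The following is a minimal sketch, not part of the original notebook: `to_ppb` is a hypothetical helper, and the unit labels are assumed to match the strings found in the `units_of_measure` column.

```python
# Hypothetical helper: normalize gas concentrations to ppb.
# The unit labels below are an assumption about the `units_of_measure` strings.
PPB_PER_UNIT = {
    "Parts per million": 1000.0,  # 1 ppm = 1000 ppb
    "Parts per billion": 1.0,
}

def to_ppb(value, unit):
    """Convert a concentration to ppb; raise on units this sketch does not cover."""
    try:
        return value * PPB_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"Unsupported unit: {unit!r}") from None

print(to_ppb(0.5, "Parts per million"))  # 500.0
```

Particulate and metal units (µg/m³, ng/m³, LC vs. STP) are deliberately left out, since converting between LC and STP needs site temperature and pressure.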

In [39]:
### IMPORTS ###
import requests
import pandas as pd
import json
import time
from datetime import datetime, date
from typing import Dict, List, Optional

In [40]:
# GLOBAL ENVIRONMENT VARS:
MY_EMAIL = "huex@rose-hulman.edu"
MY_KEY = "orangecat14"

In [41]:
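# Signing up for the API
def signup_for_api_key(email):
    """Request an API key for `email` and return the parsed JSON response."""
    url = "https://aqs.epa.gov/data/api/signup"
    params = {'email': email}

    response = requests.get(url, params=params)
    response.raise_for_status()  # surface HTTP errors before parsing the body

    return response.json()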

In [42]:
#api_key_data = signup_for_api_key(MY_EMAIL)

In [43]:
#api_key_data

In [44]:
# API test call
url = "https://aqs.epa.gov/data/api/dailyData/byState"

params = {
    'email': MY_EMAIL,
    'key': MY_KEY,
    'param': '44201',      # Ozone
    'bdate': '20230101',
    'edate': '20230131',
    'state': '01'          # Alabama - must be zero-padded!
}

response = requests.get(url, params=params)  # Use GET, not POST
data = response.json()

print(f"Status: {response.status_code}")
print(f"Header: {data.get('Header', [{}])[0].get('status', 'Unknown')}")

Status: 200
Header: Success
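
The status check above can be wrapped in a small function so later cells do not repeat the nested `.get` calls. This is a sketch only: `split_response` is a hypothetical name, and the payload shape (a `Header` list with one dict, plus a `Data` list) is inferred from the call above rather than from the API documentation.

```python
def split_response(payload):
    """Return (status, rows) from a parsed response shaped like the one above."""
    header = (payload.get("Header") or [{}])[0]  # tolerate a missing or empty Header
    status = header.get("status", "Unknown")
    rows = payload.get("Data", [])               # tolerate a missing Data key
    return status, rows

# Stand-in payload so the snippet runs without a network call
example = {"Header": [{"status": "Success"}], "Data": [{"parameter_code": "44201"}]}
status, rows = split_response(example)
print(status, len(rows))  # Success 1
```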


In [45]:
print(response.url)

https://aqs.epa.gov/data/api/dailyData/byState?email=huex%40rose-hulman.edu&key=orangecat14&param=44201&bdate=20230101&edate=20230131&state=01
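
Note the `%40` in the printed URL: the `@` in the email address is percent-encoded. The same query string can be reproduced with the standard library, which is handy for checking a request before sending it. The email and key below are placeholders, not real credentials.

```python
from urllib.parse import urlencode

base_url = "https://aqs.epa.gov/data/api/dailyData/byState"
query = {
    "email": "user@example.com",  # placeholder
    "key": "YOUR_KEY",            # placeholder
    "param": "44201",
    "bdate": "20230101",
    "edate": "20230131",
    "state": "01",
}

# urlencode percent-encodes reserved characters such as '@'
full_url = f"{base_url}?{urlencode(query)}"
print(full_url)
```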


In [46]:
# print(data.get("Data"))

df = pd.DataFrame(data["Data"])
print(f"Got {len(df)} rows")
print(f"Columns: {list(df.columns)}")

df

Got 248 rows
Columns: ['state_code', 'county_code', 'site_number', 'parameter_code', 'poc', 'latitude', 'longitude', 'datum', 'parameter', 'sample_duration_code', 'sample_duration', 'pollutant_standard', 'date_local', 'units_of_measure', 'event_type', 'observation_count', 'observation_percent', 'validity_indicator', 'arithmetic_mean', 'first_max_value', 'first_max_hour', 'aqi', 'method_code', 'method', 'local_site_name', 'site_address', 'state', 'county', 'city', 'cbsa_code', 'cbsa', 'date_of_last_change']


Unnamed: 0,state_code,county_code,site_number,parameter_code,poc,latitude,longitude,datum,parameter,sample_duration_code,...,method_code,method,local_site_name,site_address,state,county,city,cbsa_code,cbsa,date_of_last_change
0,01,073,0023,44201,1,33.553056,-86.815000,WGS84,Ozone,1,...,087,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,North Birmingham,"NO. B'HAM,SOU R.R., 3009 28TH ST. NO.",Alabama,Jefferson,Birmingham,13820,"Birmingham-Hoover, AL",2024-05-24
1,01,073,0023,44201,1,33.553056,-86.815000,WGS84,Ozone,W,...,087,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,North Birmingham,"NO. B'HAM,SOU R.R., 3009 28TH ST. NO.",Alabama,Jefferson,Birmingham,13820,"Birmingham-Hoover, AL",2024-05-24
2,01,073,0023,44201,1,33.553056,-86.815000,WGS84,Ozone,1,...,087,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,North Birmingham,"NO. B'HAM,SOU R.R., 3009 28TH ST. NO.",Alabama,Jefferson,Birmingham,13820,"Birmingham-Hoover, AL",2024-05-24
3,01,073,0023,44201,1,33.553056,-86.815000,WGS84,Ozone,W,...,087,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,North Birmingham,"NO. B'HAM,SOU R.R., 3009 28TH ST. NO.",Alabama,Jefferson,Birmingham,13820,"Birmingham-Hoover, AL",2024-05-24
4,01,073,0023,44201,1,33.553056,-86.815000,WGS84,Ozone,1,...,087,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,North Birmingham,"NO. B'HAM,SOU R.R., 3009 28TH ST. NO.",Alabama,Jefferson,Birmingham,13820,"Birmingham-Hoover, AL",2024-05-24
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
243,01,049,9991,44201,1,34.289001,-85.970065,NAD83,Ozone,W,...,047,INSTRUMENTAL - ULTRA VIOLET,Sand Mountain,Sand Mountain Alabama Agricultural Experiment ...,Alabama,DeKalb,Crossville,22840,"Fort Payne, AL",2024-05-24
244,01,049,9991,44201,1,34.289001,-85.970065,NAD83,Ozone,1,...,047,INSTRUMENTAL - ULTRA VIOLET,Sand Mountain,Sand Mountain Alabama Agricultural Experiment ...,Alabama,DeKalb,Crossville,22840,"Fort Payne, AL",2024-05-24
245,01,049,9991,44201,1,34.289001,-85.970065,NAD83,Ozone,W,...,047,INSTRUMENTAL - ULTRA VIOLET,Sand Mountain,Sand Mountain Alabama Agricultural Experiment ...,Alabama,DeKalb,Crossville,22840,"Fort Payne, AL",2024-05-24
246,01,049,9991,44201,1,34.289001,-85.970065,NAD83,Ozone,W,...,047,INSTRUMENTAL - ULTRA VIOLET,Sand Mountain,Sand Mountain Alabama Agricultural Experiment ...,Alabama,DeKalb,Crossville,22840,"Fort Payne, AL",2024-05-24


In [47]:
# `date_of_last_change` is when the record was last modified, not the day of collection (that is `date_local`)

In [48]:
def fmt_yyyymmdd(d):
    # Accepts str, date/datetime, or pandas Timestamp and returns 'YYYYMMDD'
    return pd.to_datetime(d).strftime('%Y%m%d')

In [49]:
fmt_yyyymmdd(date(2020, 5, 17))

'20200517'
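
The download function later in this notebook queries one calendar year at a time, so a multi-year pull needs one `bdate`/`edate` pair per year. A possible sketch, using only the standard library instead of pandas; `year_ranges` is a hypothetical helper, not something defined elsewhere in the notebook.

```python
from datetime import date

def year_ranges(first_year, last_year):
    """Yield (bdate, edate) strings in YYYYMMDD form, one pair per calendar year."""
    for year in range(first_year, last_year + 1):
        yield (
            date(year, 1, 1).strftime("%Y%m%d"),
            date(year, 12, 31).strftime("%Y%m%d"),
        )

print(list(year_ranges(2020, 2021)))
# [('20200101', '20201231'), ('20210101', '20211231')]
```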

In [50]:
def fmt_scode(sc):
    return f"{sc:02d}"

In [51]:
fmt_scode(1)

'01'
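
`fmt_scode` expects an integer, but codes read back from the API arrive as strings such as `'01'`. The sketch below shows one way to accept both, and also builds a list of FIPS state codes that skips the five numbers in 01-56 not assigned to a state or DC (03, 07, 14, 43, 52). `normalize_state_code`, `UNUSED_FIPS`, and `STATE_CODES` are hypothetical names, and the list is based on the FIPS numbering rather than on anything the API returned here.

```python
UNUSED_FIPS = {3, 7, 14, 43, 52}  # numbers in 01-56 not assigned to a state or DC

def normalize_state_code(sc):
    """Accept 1, '1', or '01' and return the zero-padded two-digit code."""
    return f"{int(sc):02d}"

# 50 states plus DC
STATE_CODES = [normalize_state_code(n) for n in range(1, 57) if n not in UNUSED_FIPS]

print(len(STATE_CODES), STATE_CODES[0], STATE_CODES[-1])  # 51 01 56
```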

In [16]:
ALL_PARAMS = {
    "Carbon Monoxide": {
        "parameter_code": 42101,
        "abbreviation": "CO",
        "description": "Carbon monoxide concentration (ppm)"
    },
    "Nitrogen Dioxide": {
        "parameter_code": 42602,
        "abbreviation": "NO2",
        "description": "Nitrogen dioxide concentration (ppb)"
    },
    "Ozone": {
        "parameter_code": 44201,
        "abbreviation": "O3",
        "description": "Ozone concentration (ppm)"
    },
    "Sulfur Dioxide": {
        "parameter_code": 42401,
        "abbreviation": "SO2",
        "description": "Sulfur dioxide concentration (ppb)"
    },
    "PM10 (Coarse Particulate Matter)": {
        "parameter_code": 81102,
        "abbreviation": "PM10",
        "description": "Particulate matter with diameter ≤ 10 μm (µg/m³)"
    },
    "PM2.5 (Fine Particulate Matter)": {
        "parameter_code": 88101,
        "abbreviation": "PM2.5",
        "description": "Particulate matter with diameter ≤ 2.5 μm (µg/m³)"
    },
    "Lead": {
        "parameter_code": 14129,
        "abbreviation": "Pb (TSP) LC",
        "description": "Lead measured as total suspended particulates (µg/m³)"
    }
}
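
A dictionary shaped like `ALL_PARAMS` is easy to flatten into the lookups the download loop needs. In this sketch a two-entry excerpt stands in for the full dictionary so the snippet runs on its own; `params_excerpt` and `code_to_abbr` are hypothetical names.

```python
# Two-entry excerpt with the same shape as ALL_PARAMS
params_excerpt = {
    "Ozone": {"parameter_code": 44201, "abbreviation": "O3"},
    "Carbon Monoxide": {"parameter_code": 42101, "abbreviation": "CO"},
}

# List of codes to request, in insertion order
param_codes = [info["parameter_code"] for info in params_excerpt.values()]

# Reverse lookup: parameter code -> short label, useful for naming output columns
code_to_abbr = {
    info["parameter_code"]: info["abbreviation"] for info in params_excerpt.values()
}

print(param_codes)          # [44201, 42101]
print(code_to_abbr[44201])  # O3
```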

In [37]:
def get_daily_data_per_year(start_year, params):
    """Fetch daily data for every parameter in `params` and every state for one calendar year."""
    url = "https://aqs.epa.gov/data/api/dailyData/byState"
    all_data = []

    # The query covers the full calendar year, formatted as YYYYMMDD strings
    start_date = fmt_yyyymmdd(date(start_year, 1, 1))
    end_date   = fmt_yyyymmdd(date(start_year, 12, 31))

    # Get the parameter codes from the dictionary passed in (not the global)
    param_codes = [info["parameter_code"] for info in params.values()]

    for param_code in param_codes:
        # FIPS state codes run 01 through 56 (00 is not a state; 56 is Wyoming)
        for state in range(1, 57):
            time.sleep(5)  # Wait 5 seconds between requests per API limitations
            state_code = fmt_scode(state)

            # Named `query` so it does not shadow the `params` argument
            query = {
                'email': MY_EMAIL,
                'key': MY_KEY,
                'param': param_code,
                'bdate': start_date,
                'edate': end_date,
                'state': state_code
            }

            # Use GET, not POST; the timeout keeps a stalled request from hanging forever
            response = requests.get(url, params=query, timeout=120)
            data = response.json()
            # "Data" may be absent on an error response, so default to an empty list
            all_data.extend(data.get("Data", []))
    return pd.DataFrame(all_data)


In [38]:
test_df = get_daily_data_per_year(2020, ALL_PARAMS)

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
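
A dropped connection like the one above aborts the whole loop and discards everything fetched so far. One common mitigation is to retry the failing call with an increasing delay. The sketch below is not the notebook's method: `call_with_retry` is a hypothetical helper, and it catches `OSError` on the understanding that both the built-in `ConnectionError` and the `requests` exception classes derive from it.

```python
import time

def call_with_retry(fetch, retries=3, base_delay=5.0, sleep=time.sleep):
    """Call fetch() until it succeeds or retries run out; re-raise the last error."""
    for attempt in range(retries):
        try:
            return fetch()
        except OSError:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)  # 5 s, 10 s, 20 s, ...

# Demonstration with a stand-in that fails twice before succeeding
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("Remote end closed connection without response")
    return "ok"

print(call_with_retry(flaky, sleep=lambda seconds: None))  # ok
```

Inside the download loop, the request could be wrapped as `call_with_retry(lambda: requests.get(url, params=query))`, where `query` stands for the request's parameter dictionary. Appending each state's rows to disk as they arrive would further limit how much work a failure can lose.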

In [35]:
test_df

Unnamed: 0,state_code,county_code,site_number,parameter_code,poc,latitude,longitude,datum,parameter,sample_duration_code,...,method_code,method,local_site_name,site_address,state,county,city,cbsa_code,cbsa,date_of_last_change
0,01,073,0023,42101,2,33.553056,-86.815000,WGS84,Carbon monoxide,1,...,093,INSTRUMENTAL - GAS FILTER CORRELATION CO ANALYZER,North Birmingham,"NO. B'HAM,SOU R.R., 3009 28TH ST. NO.",Alabama,Jefferson,Birmingham,13820,"Birmingham-Hoover, AL",2021-10-30
1,01,073,0023,42101,2,33.553056,-86.815000,WGS84,Carbon monoxide,Z,...,093,INSTRUMENTAL - GAS FILTER CORRELATION CO ANALYZER,North Birmingham,"NO. B'HAM,SOU R.R., 3009 28TH ST. NO.",Alabama,Jefferson,Birmingham,13820,"Birmingham-Hoover, AL",2023-02-05
2,01,073,0023,42101,2,33.553056,-86.815000,WGS84,Carbon monoxide,1,...,093,INSTRUMENTAL - GAS FILTER CORRELATION CO ANALYZER,North Birmingham,"NO. B'HAM,SOU R.R., 3009 28TH ST. NO.",Alabama,Jefferson,Birmingham,13820,"Birmingham-Hoover, AL",2021-10-30
3,01,073,0023,42101,2,33.553056,-86.815000,WGS84,Carbon monoxide,Z,...,093,INSTRUMENTAL - GAS FILTER CORRELATION CO ANALYZER,North Birmingham,"NO. B'HAM,SOU R.R., 3009 28TH ST. NO.",Alabama,Jefferson,Birmingham,13820,"Birmingham-Hoover, AL",2023-02-05
4,01,073,0023,42101,2,33.553056,-86.815000,WGS84,Carbon monoxide,1,...,093,INSTRUMENTAL - GAS FILTER CORRELATION CO ANALYZER,North Birmingham,"NO. B'HAM,SOU R.R., 3009 28TH ST. NO.",Alabama,Jefferson,Birmingham,13820,"Birmingham-Hoover, AL",2021-10-30
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57042,01,109,0003,14129,2,31.790479,-85.978974,NAD83,Lead (TSP) LC,7,...,813,HI-VOL - ICP-MS FRM; Heated ultrasonic 1.03M ...,TROY LEAD,HENDERSON ROAD,Alabama,Pike,Troy,45980,"Troy, AL",2024-03-20
57043,01,109,0003,14129,2,31.790479,-85.978974,NAD83,Lead (TSP) LC,7,...,813,HI-VOL - ICP-MS FRM; Heated ultrasonic 1.03M ...,TROY LEAD,HENDERSON ROAD,Alabama,Pike,Troy,45980,"Troy, AL",2024-03-20
57044,01,109,0003,14129,2,31.790479,-85.978974,NAD83,Lead (TSP) LC,7,...,813,HI-VOL - ICP-MS FRM; Heated ultrasonic 1.03M ...,TROY LEAD,HENDERSON ROAD,Alabama,Pike,Troy,45980,"Troy, AL",2024-03-20
57045,01,109,0003,14129,2,31.790479,-85.978974,NAD83,Lead (TSP) LC,7,...,813,HI-VOL - ICP-MS FRM; Heated ultrasonic 1.03M ...,TROY LEAD,HENDERSON ROAD,Alabama,Pike,Troy,45980,"Troy, AL",2024-03-20
