## Units of Measure (Pay attention to this)
### Two Types of Values Stored:  
* Reported Value - What the monitor actually measured (in various units)  
* Standard Value - Converted to the official standard unit for that parameter  

### Standard Units by Pollutant Type
Criteria Pollutants:
* Ozone (44201): Parts per million (ppm)  
* CO (42101): Parts per million (ppm)  
* SO₂ (42401): Parts per billion (ppb)  
* NO₂ (42602): Parts per billion (ppb)  

Particulate Matter:
* PM2.5 (88101, 88502): Micrograms/cubic meter at Local Conditions (LC)  
* PM10 (81102, 85101): Micrograms/cubic meter  

Metals and Elements:
Varies by size fraction:
* TSP (Total Suspended Particulates): Usually micrograms/m³ at 25°C  
* PM2.5: Usually micrograms/m³ at LC  
* PM10: Often nanograms/m³  

Important Abbreviations:
* LC = Local Conditions (temperature/pressure at site)  
* STP = Standard Temperature & Pressure (25°C, 1 atm)  
* ppbC = Parts per billion Carbon (used for VOCs)  
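
Because the standard units differ between pollutants (ppm for ozone and CO, ppb for SO₂ and NO₂, and sometimes ng/m³ rather than µg/m³ for metals), values need to be put on a common scale before they are compared. The following is a minimal sketch of those prefix conversions. The function names are illustrative and are not part of the AQS API, and converting between LC and STP concentrations is not covered because it needs the site's temperature and pressure.

```python
def ppb_to_ppm(value_ppb):
    """Convert parts per billion to parts per million (1 ppm = 1000 ppb)."""
    return value_ppb / 1000.0


def ppm_to_ppb(value_ppm):
    """Convert parts per million to parts per billion."""
    return value_ppm * 1000.0


def ng_to_ug_per_m3(value_ng):
    """Convert nanograms/m³ to micrograms/m³ (1 µg = 1000 ng)."""
    return value_ng / 1000.0


# Example: an NO2 reading of 53 ppb expressed in ppm
print(ppb_to_ppm(53.0))  # 0.053
```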

In [17]:
### IMPORTS ###
import requests
import pandas as pd
import json
import time
from datetime import datetime, date
from typing import Dict, List, Optional

In [18]:
# GLOBAL ENVIRONMENT VARS:
MY_EMAIL = "huex@rose-hulman.edu"
MY_KEY = "greenhawk32"

In [19]:
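# Signing up for the API
def signup_for_api_key(email):
    """Request an API key for `email` and return the parsed JSON response."""
    url = "https://aqs.epa.gov/data/api/signup"
    params = {'email': email}

    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    data = response.json()

    return data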



In [20]:
# api_key_data = signup_for_api_key("huex@rose-hulman.edu")

In [21]:
# api_key_data

In [39]:
# Use the API to return daily summary data by state
# ALL_STATES: contiguous US states plus the District of Columbia
# (Hawaii, Alaska, Puerto Rico, and Mexico are excluded)
ALL_STATES = ['Alabama', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 
              'District Of Columbia', 'Florida', 'Georgia', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 
              'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 
              'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 
              'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma',
              'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas',
              'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming']

ALL_PARAMS = {
    "Carbon Monoxide": {
        "parameter_code": 42101,
        "abbreviation": "CO",
        "description": "Carbon monoxide concentration (ppm)"
    },
    "Nitrogen Dioxide": {
        "parameter_code": 42602,
        "abbreviation": "NO2",
        "description": "Nitrogen dioxide concentration (ppb)"
    },
    "Ozone": {
        "parameter_code": 44201,
        "abbreviation": "O3",
        "description": "Ozone concentration (ppm)"
    },
    "Sulfur Dioxide": {
        "parameter_code": 42401,
        "abbreviation": "SO2",
        "description": "Sulfur dioxide concentration (ppb)"
    },
    "PM10 (Coarse Particulate Matter)": {
        "parameter_code": 81102,
        "abbreviation": "PM10",
        "description": "Particulate matter with diameter ≤ 10 μm (µg/m³)"
    },
    "PM2.5 (Fine Particulate Matter)": {
        "parameter_code": 88101,
        "abbreviation": "PM2.5",
        "description": "Particulate matter with diameter ≤ 2.5 μm (µg/m³)"
    },
    "Lead": {
        "parameter_code": 14129,
        "abbreviation": "Pb (TSP) LC",
        "description": "Lead measured as total suspended particulates (µg/m³)"
    }
}

def fmt_yyyymmdd(d):
    # Accepts str, date/datetime, or pandas Timestamp and returns 'YYYYMMDD'
    return pd.to_datetime(d).strftime('%Y%m%d')

def get_daily_data(email, key, start_date, end_date):
    url = "https://aqs.epa.gov/data/api/dailyData/byState"
    rows = []

    # Convert both dates to 'YYYYMMDD' strings, the format the API expects
    start_date = fmt_yyyymmdd(start_date)
    end_date = fmt_yyyymmdd(end_date)

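    # Only state FIPS code 01 (Alabama) is queried for now
    for state in range(1, 2):
        # The API expects a two-digit, zero-padded FIPS code (e.g. '01', not '1')
        state = f"{state:02d}"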
        for param in ALL_PARAMS.values():
            time.sleep(5)
            params = {
                'email': email, 
                'key': key, 
                'param': str(param["parameter_code"]), 
                'bdate': start_date, 
                'edate': end_date, 
                'state': state
            }

            response = requests.get(url, params=params)
            data = response.json()

            if "Data" in data and data["Data"]:
                rows.extend(data["Data"])

    return pd.DataFrame(rows)
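
`get_daily_data` identifies states by numeric code, while `ALL_STATES` holds state names, so something has to translate between the two. The sketch below builds that lookup from a states listing. It assumes (this is not confirmed in this notebook) that the listing is shaped like `{"Data": [{"code": ..., "value_represented": ...}]}`; the helper name `build_state_code_map` and the sample payload are hypothetical stand-ins, not real API output.

```python
def build_state_code_map(payload, wanted_names):
    """Map state names to code strings using a states-listing payload.

    Assumes each entry in payload["Data"] has "code" and "value_represented"
    keys (an assumption about the response shape, not verified here).
    Names are matched case-insensitively; names not found are skipped.
    """
    by_name = {
        entry["value_represented"].lower(): entry["code"]
        for entry in payload.get("Data", [])
    }
    return {
        name: by_name[name.lower()]
        for name in wanted_names
        if name.lower() in by_name
    }


# Hand-written stand-in for a real response
sample_payload = {"Data": [
    {"code": "01", "value_represented": "Alabama"},
    {"code": "02", "value_represented": "Alaska"},
    {"code": "04", "value_represented": "Arizona"},
]}

state_codes = build_state_code_map(sample_payload, ["Alabama", "Arizona"])
print(state_codes)  # {'Alabama': '01', 'Arizona': '04'}
```

With a mapping like this, the state loop could iterate over `state_codes.values()` instead of a hard-coded numeric range.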

In [42]:
test_df = get_daily_data(MY_EMAIL, MY_KEY, date(2023,1,1), date(2023,1,31))

In [43]:
test_df
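
Since the units differ by pollutant, it is worth checking which units actually appear for each parameter before combining rows. The sketch below assumes the returned frame has `parameter_code` and `units_of_measure` columns; those column names are an assumption about the daily data response, and `toy_df` is made-up data used only to show the shape of the result.

```python
import pandas as pd


def units_by_parameter(df):
    """Return {parameter_code: sorted list of distinct units} for a DataFrame."""
    return {
        code: sorted(group["units_of_measure"].unique())
        for code, group in df.groupby("parameter_code")
    }


# Made-up rows standing in for API output
toy_df = pd.DataFrame({
    "parameter_code": ["44201", "44201", "42401"],
    "units_of_measure": ["Parts per million", "Parts per million", "Parts per billion"],
    "arithmetic_mean": [0.031, 0.028, 1.2],
})

unit_summary = units_by_parameter(toy_df)
print(unit_summary)
```

A parameter that maps to more than one unit string is a sign that rows need converting before they are averaged or plotted together.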