# Introduction

This notebook presents an exploratory data analysis (EDA) of air quality data, with a focus on the deterioration observed in recent years. As climate change intensifies and forest fires become more frequent, their effect on air quality has become an increasingly personal concern for me.

Through data analysis and visualizations, my goal is to demonstrate this decline in air quality, specifically in New York City, and to explore the potential contributing factors behind this trend.

Some key questions I have:
- How has air quality in NYC changed over time?
- What correlations exist between air quality and factors like temperature, seasonal variation, and forest fires?
- Is there a significant decline in air quality during the summer months? 

# Data Collection and Sources

For this project, I'll be using the [EPA's Air Quality System (AQS) API](https://aqs.epa.gov/aqsweb/documents/data_api.html#daily) to obtain historical daily air quality data. 

In [1]:
# Get API key
import requests 

def signup_for_aqs_api_key(email):
    url = 'https://aqs.epa.gov/data/api/signup'
    
    params = {
        'email': email
    }
    
    # The AQS signup service is documented as a GET request with the email in the query string
    response = requests.get(url, params=params)
    
    if response.status_code == 200:
        print('Signup request sent successfully. Check your email for the API key.')
        print('Response:', response.text)
    else:
        print(f'Failed to sign up. Status code: {response.status_code}')
        print('Error message:', response.text)
        
# signup_for_aqs_api_key('iherman10@gmail.com')

In [4]:
# Load API key from .env file 
import os
from dotenv import load_dotenv 

load_dotenv() 
_API_KEY = os.getenv('AQS_API_KEY')

if _API_KEY:
    print('API key loaded successfully')
else:
    print('API key not found. Check your .env file')

API key loaded successfully


In [20]:
# Function to fetch API lists
import json
import pandas as pd

def fetch_api_list(endpoint, params, overwrite=False):
    """Print the code/description table for an AQS list endpoint, caching the JSON locally."""
    data_dir = 'lists'
    os.makedirs(data_dir, exist_ok=True)

    path = os.path.join(data_dir, f'{endpoint}.json')

    # Show full-width tables instead of truncating columns
    pd.set_option('display.max_columns', None)
    pd.set_option('display.width', None)

    if os.path.exists(path) and not overwrite:
        # Reuse the cached copy instead of calling the API again
        with open(path, 'r') as infile:
            data = json.load(infile)
    else:
        url = f'https://aqs.epa.gov/data/api/list/{endpoint}'

        response = requests.get(url, params=params)

        if response.status_code != 200:
            print(f"Failed to retrieve list '{endpoint}'. Status code: {response.status_code}")
            return

        data = response.json()['Data']
        with open(path, 'w') as outfile:
            json.dump(data, outfile)

    df = pd.DataFrame(data)
    print(df[['code', 'value_represented']])

In [19]:
# Fetch Parameter Classes (groups of parameters, like criteria or all)

params = {
    'email': 'iherman10@gmail.com', 
    'key': _API_KEY
}
fetch_api_list('classes', params)

                       code                                  value_represented
0               AIRNOW MAPS  The parameters represented on AirNow maps (881...
1                       ALL                    Select all Parameters Available
2            AQI POLLUTANTS                Pollutants that have an AQI Defined
3                 CORE_HAPS                         Urban Air Toxic Pollutants
4                  CRITERIA                                Criteria Pollutants
5                  CSN DART  List of CSN speciation parameters to populate ...
6                  FORECAST     Parameters routinely extracted by AirNow (STI)
7                      HAPS                           Hazardous Air Pollutants
8            IMPROVE CARBON                          IMPROVE Carbon Parameters
9        IMPROVE_SPECIATION  PM2.5 Speciated Parameters Measured at IMPROVE...
10                      MET                          Meteorological Parameters
11          NATTS CORE HAPS  The core list of toxics

In [21]:
# Fetch Parameters in a class (obtain the list of classes from the List - Parameter Classes service)
params = {
    'email': 'iherman10@gmail.com', 
    'key': _API_KEY, 
    'pc': 'PM2.5 MASS/QA'
    # PM2.5 mass and quality assurance parameters.
    # Likely the most relevant class for general PM2.5 mass concentration.
}
fetch_api_list('parametersByClass', params)

    code            value_represented
0  68101         Sample Flow Rate- CV
1  68102                Sample Volume
2  68103      Ambient Min Temperature
3  68104      Ambient Max Temperature
4  68105  Average Ambient Temperature
5  68106     Sample Min Baro Pressure
6  68107     Sample Max Baro Pressure
7  68108     Average Ambient Pressure
8  68109          Elapsed Sample Time
9  88101     PM2.5 - Local Conditions
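
The table above is only printed, so the code of interest still has to be copied by hand. As a rough sketch, the same rows can be searched by keyword to pull out a code programmatically. The helper name `find_codes` is hypothetical, the rows are a subset copied from the output above, and treating the codes as strings is an assumption about how the API returns them.

```python
# A subset of the rows printed above. ASSUMPTION: the API returns codes as strings.
rows = [
    {'code': '68103', 'value_represented': 'Ambient Min Temperature'},
    {'code': '68104', 'value_represented': 'Ambient Max Temperature'},
    {'code': '68105', 'value_represented': 'Average Ambient Temperature'},
    {'code': '88101', 'value_represented': 'PM2.5 - Local Conditions'},
]


def find_codes(rows, keyword):
    """Return {description: code} for rows whose description contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return {
        row['value_represented']: row['code']
        for row in rows
        if keyword in row['value_represented'].lower()
    }


pm25 = find_codes(rows, 'pm2.5')
print(pm25)
```

The same helper would work on the cached JSON in the `lists` directory, since that file holds the same list of records.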


In [None]:
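# Get data

def download_aqi_data(api_key, start_date, end_date, file_name, overwrite=False):
    data_dir = 'data/air_quality'
    os.makedirs(data_dir, exist_ok=True)

    file_path = os.path.join(data_dir, file_name)

    if os.path.exists(file_path) and not overwrite:
        print(f"File '{file_path}' already exists. Skipping download")
        return

    url = 'https://aqs.epa.gov/data/api/dailyData/byCounty'

    # param: the AQS parameter code for data selection, using 5-digit AQS codes.
    # They may be obtained using the list parameters service.
    # Up to 5 parameters may be requested, separated by commas.
    params = {
        'email': 'iherman10@gmail.com',
        'key': api_key,
        'param': '88101',     # ASSUMPTION: PM2.5 - Local Conditions, from the list above
        'bdate': start_date,  # begin date, YYYYMMDD
        'edate': end_date,    # end date, YYYYMMDD
        'state': '36',        # ASSUMPTION: New York State FIPS code
        'county': '061',      # ASSUMPTION: New York County (Manhattan); NYC spans five counties
    }

    # ASSUMPTION: the remaining steps mirror fetch_api_list (request, check status, save JSON)
    response = requests.get(url, params=params)

    if response.status_code == 200:
        with open(file_path, 'w') as outfile:
            json.dump(response.json()['Data'], outfile)
        print(f"Saved data to '{file_path}'")
    else:
        print(f'Failed to download data. Status code: {response.status_code}')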
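
The comment above notes that up to 5 parameter codes can be sent in one request, separated by commas. Assuming, as the AQS documentation appears to require, that `bdate` and `edate` must fall in the same calendar year, a multi-year download has to be split into one request per year. The sketch below builds those query fragments without calling the API. The helper names `join_param_codes`, `split_by_year`, and `build_queries` are hypothetical, and the same-year rule should be checked against the current AQS docs.

```python
from datetime import date

MAX_PARAMS = 5  # limit quoted in the comment above


def join_param_codes(codes):
    """Join 1 to 5 parameter codes into the comma-separated form used by 'param'."""
    codes = [str(code) for code in codes]
    if not 1 <= len(codes) <= MAX_PARAMS:
        raise ValueError(f'Expected 1 to {MAX_PARAMS} parameter codes, got {len(codes)}')
    return ','.join(codes)


def split_by_year(start, end):
    """Split [start, end] into (bdate, edate) strings that each stay within one calendar year."""
    if start > end:
        raise ValueError('start must not be after end')
    chunks = []
    current = start
    while current <= end:
        # End this chunk at Dec 31 of the current year or at the overall end date
        chunk_end = min(date(current.year, 12, 31), end)
        chunks.append((current.strftime('%Y%m%d'), chunk_end.strftime('%Y%m%d')))
        current = date(current.year + 1, 1, 1)
    return chunks


def build_queries(codes, start, end):
    """Return one partial params dict per calendar year in the range."""
    param = join_param_codes(codes)
    return [
        {'param': param, 'bdate': bdate, 'edate': edate}
        for bdate, edate in split_by_year(start, end)
    ]


queries = build_queries(['88101'], date(2021, 6, 1), date(2023, 3, 15))
for query in queries:
    print(query)
```

Each dict would still need `email`, `key`, and the location fields merged in before being passed to `requests.get`.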


# Data Cleaning and Preprocessing

# Exploratory Data Analysis (EDA)

# Seasonal Trends and Anomalies

# Impact of Forest Fires and Temperature

# Conclusions and Key Insights

# Future Work