# Data Processing

This notebook processes the data separately from the main analysis and visualization. It holds the code the user does not need to interact with directly: library imports, helper functions, and the steps that turn the raw data into dataframes we can interpret.

## Table of Contents
* Import Libraries
* Functions
* Processing The Data

## Notebooks
* [Overview Notebook](airpollution.ipynb)
* [Processing Notebook](dataprocessing.ipynb)
* [Analysis Notebook](data-analysis.ipynb)

## Import Libraries

Here are the libraries we are going to use to graph and analyze our data.

In [1]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from glob import glob
from matplotlib.pyplot import figure 

## Functions

Here are the functions we have created to load the datasets, compute their rolling averages, and graph them to display trends and comparisons.

In [2]:
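def concatinate(y):
    """Read every CSV file in the folder `y` (a path ending in '/') into one dataframe."""
    data = glob(y + '*.csv')
    # Sort so the files are concatenated in filename order.
    data.sort()
    # ignore_index gives the combined dataframe one continuous row index.
    return pd.concat([pd.read_csv(f) for f in data], ignore_index=True)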

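def createDataframe(x):
    """Return 365-row rolling means of ozone ('X') and PM2.5 ('Y') AQI with the date ('T')."""
    # window=365 averages the current row and the 364 rows before it, which is
    # about one year if the data has one row per day. Both rolling columns are
    # also added to the input dataframe `x`.
    x['ozone_rolling_mean_365'] = x['Ozone_AQI_Value'].rolling(window=365).mean()
    x['particle_rolling_mean_365'] = x['PM2.5_AQI_Value'].rolling(window=365).mean()
    data = pd.DataFrame({'X' : x['ozone_rolling_mean_365'],
                         'Y' : x['particle_rolling_mean_365'],
                         'T' : x['Date']})
    return data

def aqi_plot(x):
    """Plot the ozone ('X') and PM2.5 ('Y') rolling means of one area against the date ('T')."""
    ax = x.plot.line('T',['X', 'Y'], figsize = (15, 10), title = 'Air Quality Trend')
    ax.set_xlabel('Time (1985-2020)')
    ax.set_ylabel('Air Quality Index (AQI)')
    ax.legend(['Ozone (O3)', 'Particulate Matter (PM2.5)'])
    ax.grid(True)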
    
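def plot_all(x):
    """Plot the ozone and PM2.5 trends of every area in the list `x`, one line per area."""
    # Legend labels, assumed to follow the order of the `areas` list built below.
    labels = ['NY-NJ-PA', 'IL-IN-WI', 'TX', 'GA', 'MA-NH', 'MI', 'CA', 'FL',
              'PA-NJ-DE-MD', 'AZ', 'WA', 'DC-VA-MD-WV', 'MN-WI', 'CO', 'MO-IL']

    # Give each area its own column names before merging. Repeated merges on the
    # bare 'X' and 'Y' columns only produce 'X_x', 'X_y' and 'X', then collide.
    data = x[0].rename(columns = {'X' : 'X0', 'Y' : 'Y0'})
    for f in range(1, len(x)):
        renamed = x[f].rename(columns = {'X' : f'X{f}', 'Y' : f'Y{f}'})
        data = pd.merge(data, renamed, how = 'inner', on = 'T')

    ozone_cols = [f'X{f}' for f in range(len(x))]
    particle_cols = [f'Y{f}' for f in range(len(x))]

    ax = data.plot.line('T', ozone_cols, figsize = (40, 10), title = 'Ozone Quality Trend', colormap = 'Paired')
    ax.set_xlabel('Time (1985-2020)')
    ax.set_ylabel('Average AQI')
    ax.legend(labels[:len(x)])
    ax.grid(True)

    bx = data.plot.line('T', particle_cols, figsize = (40, 10), title = 'Particulate Matter Quality Trend', colormap = 'Paired')
    bx.set_xlabel('Time (1985-2020)')
    bx.set_ylabel('Average AQI')
    bx.legend(labels[:len(x)])
    bx.grid(True)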
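
The rolling mean in `createDataframe` can be easier to follow on a tiny example. The sketch below uses made-up AQI values and a window of 3 rows in place of 365; the column name `ozone_rolling_mean_3` and the data are assumptions for illustration only. It shows that the first `window - 1` rows have no rolling mean, which is why the first rows of each area produce no line in the plots.

```python
import pandas as pd

# Synthetic daily AQI readings (made-up values, for illustration only).
toy = pd.DataFrame({
    'Date': pd.date_range('2020-01-01', periods=6, freq='D'),
    'Ozone_AQI_Value': [30, 40, 50, 60, 70, 80],
})

# A window of 3 rows stands in for the 365-row window used in createDataframe.
toy['ozone_rolling_mean_3'] = toy['Ozone_AQI_Value'].rolling(window=3).mean()

# The first window - 1 rows are NaN because the window is not yet full.
print(toy)
```

With a 365-row window the same rule leaves the first 364 rows of each area empty.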


### Testing Merging Dataframes For Cleaner Graphs

In [3]:
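def createOzoneDataframe(x):
    """Return the 365-row rolling mean of the ozone AQI ('X') with the date ('T')."""
    # drop() returns a new dataframe, so keep the result; `x` is left unchanged.
    df = x.drop(['PM2.5_AQI_Value'], axis = 1)
    df['ozone_rolling_mean_365'] = df['Ozone_AQI_Value'].rolling(window=365).mean()
    data = pd.DataFrame({'X' : df['ozone_rolling_mean_365'],
                         'T' : df['Date']})
    return data

def createParticleDataframe(x):
    """Return the 365-row rolling mean of the PM2.5 AQI ('Y') with the date ('T')."""
    df = x.drop(['Ozone_AQI_Value'], axis = 1)
    df['particle_rolling_mean_365'] = df['PM2.5_AQI_Value'].rolling(window=365).mean()
    data = pd.DataFrame({'Y' : df['particle_rolling_mean_365'],
                         'T' : df['Date']})
    return data

def merge_plots(x, y):
    """Join two dataframes on the date column 'T', keeping only dates present in both."""
    data = pd.merge(x, y, how = 'inner', on = 'T')
    return data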
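
As a small illustration of what the inner merge on `'T'` does, the sketch below merges two made-up frames shaped like the outputs of `createOzoneDataframe` and `createParticleDataframe`. The values and the names `ozone`, `particle`, `merged`, and `clash` are hypothetical. It also shows the suffixes pandas adds when both frames share a non-key column name.

```python
import pandas as pd

# Made-up frames in the same shape as the ozone and particle dataframes.
ozone = pd.DataFrame({'X': [41.0, 42.0, 43.0],
                      'T': ['2020-01-01', '2020-01-02', '2020-01-03']})
particle = pd.DataFrame({'Y': [55.0, 56.0, 57.0],
                         'T': ['2020-01-02', '2020-01-03', '2020-01-04']})

# how='inner' keeps only the dates that appear in both frames.
merged = pd.merge(ozone, particle, how='inner', on='T')
print(merged)

# When both frames have a column named 'X', pandas renames them 'X_x' and 'X_y'.
clash = pd.merge(ozone, ozone, how='inner', on='T')
print(list(clash.columns))
```

Only the two dates shared by both frames survive the merge, so an area with missing dates shortens the merged result.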

## Processing The Data

Here we read the files for each metropolitan statistical area and use the functions above to build dataframes that we can plot.

In [4]:
# Folder that holds one subfolder of CSV files per metropolitan statistical area.
areas_path = '/Users/smaslam/Desktop/airpollution/airpollution/Areas/'

ny_nj_pa_files = concatinate(areas_path + 'NewYork-Newark-JerseyCity/')
ny_nj_pa = createDataframe(ny_nj_pa_files)
il_in_wi_files = concatinate(areas_path + 'Chicago-Naperville-Elgin/')
il_in_wi = createDataframe(il_in_wi_files)
tx_files = concatinate(areas_path + 'Dallas-FortWorth-Arlington/')
tx = createDataframe(tx_files)
ga_files = concatinate(areas_path + 'Atlanta-SandySprings-Roswell/')
ga = createDataframe(ga_files)
ma_nh_files = concatinate(areas_path + 'Boston-Cambridge-Newton/')
ma_nh = createDataframe(ma_nh_files)
mi_files = concatinate(areas_path + 'Detroit-Warren-Dearborn/')
mi = createDataframe(mi_files)
ca_files = concatinate(areas_path + 'LosAngeles-LongBeach-Anaheim/')
ca = createDataframe(ca_files)
fl_files = concatinate(areas_path + 'Miami-FortLauderdale-WestPalmBeach/')
fl = createDataframe(fl_files)
pa_nj_de_md_files = concatinate(areas_path + 'Philadelphia-Camden-Wilmington/')
pa_nj_de_md = createDataframe(pa_nj_de_md_files)
az_files = concatinate(areas_path + 'Phoenix-Mesa-Scottsdale/')
az = createDataframe(az_files)
wa_files = concatinate(areas_path + 'Seattle-Tacoma-Bellevue/')
wa = createDataframe(wa_files)
dc_va_md_wv_files = concatinate(areas_path + 'Washington-Arlington-Alexandria/')
dc_va_md_wv = createDataframe(dc_va_md_wv_files)
mn_wi_files = concatinate(areas_path + 'Minneapolis-St.Paul-Bloomington/')
mn_wi = createDataframe(mn_wi_files)
co_files = concatinate(areas_path + 'Denver-Aurora-Lakewood/')
co = createDataframe(co_files)
mo_il_files = concatinate(areas_path + 'St.Louis/')
mo_il = createDataframe(mo_il_files)

areas = [ny_nj_pa, il_in_wi, tx, ga, ma_nh, mi, ca, fl, pa_nj_de_md, az, wa, dc_va_md_wv, mn_wi, co, mo_il]
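
The loading step above repeats the same two calls for every area. One possible alternative, sketched below, keeps the folder names in a dictionary and loads them in a loop. The helpers `load_area` and `load_areas`, the labels, and the `AreaA`/`AreaB` folders are hypothetical names for this sketch; the demo writes throwaway CSV files to a temporary directory so that it runs without the real data.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_area(folder):
    """Concatenate every CSV file in `folder`, in sorted filename order."""
    files = sorted(Path(folder).glob('*.csv'))
    return pd.concat([pd.read_csv(f) for f in files], ignore_index=True)


def load_areas(base_dir, folders):
    """Return {label: dataframe} for a {label: folder name} mapping."""
    return {label: load_area(Path(base_dir) / name)
            for label, name in folders.items()}


# Demo with throwaway files (made-up values) so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as base_dir:
    for name, years in {'AreaA': (1985, 1986), 'AreaB': (1985,)}.items():
        area_dir = Path(base_dir) / name
        area_dir.mkdir()
        for year in years:
            pd.DataFrame({'Date': [f'{year}-01-01'],
                          'Ozone_AQI_Value': [40],
                          'PM2.5_AQI_Value': [55]}
                         ).to_csv(area_dir / f'{year}.csv', index=False)

    raw = load_areas(base_dir, {'A': 'AreaA', 'B': 'AreaB'})

# Number of rows loaded per area.
print({label: len(df) for label, df in raw.items()})
```

Keeping the labels next to the folder names also gives the legend labels and the data a single shared order, at the cost of replacing the individual variables with dictionary lookups.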

### Testing Merge Graphs

In [5]:
test = ny_nj_pa_files
test2 = createOzoneDataframe(test)
test3 = createParticleDataframe(test)
test4 = merge_plots(test2, test3)