<h1><center>South Coast Air Quality Management District Sensors</center></h1>

In [1]:
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
from matplotlib.ticker import FuncFormatter
import pandas as pd
import numpy as np
import seaborn as sns
from pandas.plotting import scatter_matrix
%matplotlib inline
from branca.element import Figure
from pylab import rcParams

import folium
from folium.features import DivIcon
import geopandas as gpd
pd.set_option("display.max_columns", None)

import requests

In this notebook, we gather information about the South Coast Air Quality Management District sensors, in particular their exact locations relative to the Los Angeles Neighborhood Councils (NCs). The sensor data come from [EPA's Air Quality System (AQS)](https://aqs.epa.gov/aqsweb/documents/data_api.html#sample).

<h2>Getting Data Through Monitoring Station API</h2>

In [2]:
USER_ID = "susankol@alumni.usc.edu"
USER_KEY = "silvercat24" 

In [3]:
# Find the State Code
url_state = "https://aqs.epa.gov/data/api/list/states?email={}&key={}".format(USER_ID, USER_KEY)

results = requests.get(url_state).json()
#results
for i in range(len(results["Data"])): 
    if "California" in results["Data"][i]['value_represented']:
        print(results["Data"][i])

{'code': '06', 'value_represented': 'California'}
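The list services used in this section all return rows with the same `code` / `value_represented` shape, so the name lookup can be factored into one small function. The sketch below is only an illustration: `find_codes` is a hypothetical helper, and `sample_rows` is a hand-copied subset of the county output plus one made-up row with a missing label, so it runs without calling the API.

```python
def find_codes(rows, name):
    """Return (code, label) pairs whose label contains `name`, ignoring case."""
    needle = name.lower()
    return [
        (row["code"], row["value_represented"])
        for row in rows
        # Some rows carry no label (None), so skip those before matching.
        if row.get("value_represented") and needle in row["value_represented"].lower()
    ]

# Hand-copied sample in the same shape as results["Data"]; the last row is hypothetical.
sample_rows = [
    {"code": "001", "value_represented": "Alameda"},
    {"code": "003", "value_represented": "Alpine"},
    {"code": "029", "value_represented": "Kern"},
    {"code": "999", "value_represented": None},
]

print(find_codes(sample_rows, "kern"))
```

With a live response, the same call would be `find_codes(results["Data"], "California")`.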


In [4]:
# Find the County Code
url_county = "https://aqs.epa.gov/data/api/list/countiesByState?email={}&key={}&state=06".format(USER_ID, USER_KEY)

results = requests.get(url_county).json()
results
#for i in range(len(results["Data"])): 
#    if "Simi Valley" in results["Data"][i]['value_represented']:
#        print(results["Data"][i])
#   if "Los Angeles" in results["Data"][i]['value_represented']:
#        print(results["Data"][i])

{'Header': [{'status': 'Success',
   'request_time': '2022-10-12T19:14:12-04:00',
   'url': 'https://aqs.epa.gov/data/api/list/countiesByState?email=susankol@alumni.usc.edu&key=silvercat24&state=06',
   'rows': 58}],
 'Data': [{'code': '001', 'value_represented': 'Alameda'},
  {'code': '003', 'value_represented': 'Alpine'},
  {'code': '005', 'value_represented': 'Amador'},
  {'code': '007', 'value_represented': 'Butte'},
  {'code': '009', 'value_represented': 'Calaveras'},
  {'code': '011', 'value_represented': 'Colusa'},
  {'code': '013', 'value_represented': 'Contra Costa'},
  {'code': '015', 'value_represented': 'Del Norte'},
  {'code': '017', 'value_represented': 'El Dorado'},
  {'code': '019', 'value_represented': 'Fresno'},
  {'code': '021', 'value_represented': 'Glenn'},
  {'code': '023', 'value_represented': 'Humboldt'},
  {'code': '025', 'value_represented': 'Imperial'},
  {'code': '027', 'value_represented': 'Inyo'},
  {'code': '029', 'value_represented': 'Kern'},
  {'code': 

In [5]:
# Find Parameter Classes (groups of parameters, like criteria or all)
url_class = "https://aqs.epa.gov/data/api/list/classes?email={}&key={}".format(USER_ID, USER_KEY)

results = requests.get(url_class).json()
results["Data"]

[{'code': 'AIRNOW MAPS',
  'value_represented': 'The parameters represented on AirNow maps (88101, 88502, and 44201)'},
 {'code': 'ALL', 'value_represented': 'Select all Parameters Available'},
 {'code': 'AQI POLLUTANTS',
  'value_represented': 'Pollutants that have an AQI Defined'},
 {'code': 'CORE_HAPS', 'value_represented': 'Urban Air Toxic Pollutants'},
 {'code': 'CRITERIA', 'value_represented': 'Criteria Pollutants'},
 {'code': 'CSN DART',
  'value_represented': 'List of CSN speciation parameters to populate the STI DART tool'},
 {'code': 'FORECAST',
  'value_represented': 'Parameters routinely extracted by AirNow (STI)'},
 {'code': 'HAPS', 'value_represented': 'Hazardous Air Pollutants'},
 {'code': 'IMPROVE CARBON', 'value_represented': 'IMPROVE Carbon Parameters'},
 {'code': 'IMPROVE_SPECIATION',
  'value_represented': 'PM2.5 Speciated Parameters Measured at IMPROVE sites'},
 {'code': 'MET', 'value_represented': 'Meteorological Parameters'},
 {'code': 'NATTS CORE HAPS',
  'value

In [6]:
# Parameters in a class (obtain the list of classes from the List - Parameter Classes service)
url_param = "https://aqs.epa.gov/data/api/list/parametersByClass?email={}&key={}&pc=AQI%20POLLUTANTS".format(USER_ID, USER_KEY)

results = requests.get(url_param).json()
results["Data"]

[{'code': '42101', 'value_represented': 'Carbon monoxide'},
 {'code': '42401', 'value_represented': 'Sulfur dioxide'},
 {'code': '42602', 'value_represented': 'Nitrogen dioxide (NO2)'},
 {'code': '44201', 'value_represented': 'Ozone'},
 {'code': '81102', 'value_represented': 'PM10 Total 0-10um STP'},
 {'code': '88101', 'value_represented': 'PM2.5 - Local Conditions'},
 {'code': '88502',
  'value_represented': 'Acceptable PM2.5 AQI & Speciation Mass'}]

In [7]:
# Find Sites by LA County
url_site = "https://aqs.epa.gov/data/api/list/sitesByCounty?email={}&key={}&state=06&county=037".format(USER_ID, USER_KEY)

results = requests.get(url_site).json()
results["Data"]

for i in range(len(results["Data"])): 
    if results["Data"][i]['value_represented'] is not None:
        print(results["Data"][i])

{'code': '0002', 'value_represented': 'Azusa'}
{'code': '0016', 'value_represented': 'Glendora'}
{'code': '0018', 'value_represented': 'El Monte'}
{'code': '0030', 'value_represented': 'SB25 trailer at Hollenbeck School'}
{'code': '0031', 'value_represented': 'Wilmington-N. Mahar Ave'}
{'code': '0113', 'value_represented': 'West Los Angeles'}
{'code': '0201', 'value_represented': 'Carson'}
{'code': '0202', 'value_represented': 'Commerce-Ayers Ave'}
{'code': '0203', 'value_represented': 'City of Industry-Volkswagon'}
{'code': '0204', 'value_represented': 'City of Industry-Whitco'}
{'code': '0205', 'value_represented': 'Commerce-AT&SF RR'}
{'code': '0206', 'value_represented': 'SITE IS LOCATED ONE HALF MILE EAST OF THE I-57/I-60 INTERCHANGE'}
{'code': '1002', 'value_represented': 'Burbank'}
{'code': '1101', 'value_represented': 'UNKNOWN COORDINATE LOCATION'}
{'code': '1102', 'value_represented': 'ON BUILDING'}
{'code': '1103', 'value_represented': 'Los Angeles-North Main Street'}
{'code'

In [8]:
# Find Sites in Ventura County
url_site = "https://aqs.epa.gov/data/api/list/sitesByCounty?email={}&key={}&state=06&county=111".format(USER_ID, USER_KEY)

results = requests.get(url_site).json()
results["Data"]

for i in range(len(results["Data"])): 
    if results["Data"][i]['value_represented'] is not None:
        print(results["Data"][i])

{'code': '0007', 'value_represented': 'Thousand Oaks'}
{'code': '0008', 'value_represented': 'SITE IN SIMI VALLEY LANDFILL. NEAREST TRAFFIC HWY 118, SO, 1/2 MI.'}
{'code': '0009', 'value_represented': 'Piru - Pacific'}
{'code': '1004', 'value_represented': 'Ojai - East Ojai Ave'}
{'code': '2002', 'value_represented': 'Simi Valley-Cochran Street'}
{'code': '2003', 'value_represented': 'Emma Wood State Beach, Ventura'}
{'code': '3001', 'value_represented': 'El Rio-Rio Mesa School #2'}


Next, we pick the sensors located in or near the LA Neighborhood Councils (NCs) area and use the API to fetch their Ozone, PM2.5, and PM10 data, where available.
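Every per-site request in this notebook uses the same `dailyData/bySite` endpoint with only the site code changing. As a rough sketch, the URL could be assembled by one function instead of a repeated format string. `build_daily_url` and its defaults are hypothetical names chosen for illustration; the parameter codes, dates, state, and county are the ones used throughout this notebook. Note that `urlencode` percent-encodes the `@` and the commas, which assumes the service decodes query strings in the standard way.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

AQS_DAILY_BY_SITE = "https://aqs.epa.gov/data/api/dailyData/bySite"

def build_daily_url(email, key, site,
                    params=("44201", "81102", "88101"),  # Ozone, PM10, PM2.5
                    bdate="20200101", edate="20201231",
                    state="06", county="037"):
    """Build a dailyData/bySite URL (hypothetical helper, not part of the AQS API)."""
    query = {
        "email": email,
        "key": key,
        "param": ",".join(params),
        "bdate": bdate,
        "edate": edate,
        "state": state,
        "county": county,
        # Site codes are four characters in the listings above, so pad ints like 1103 or 2.
        "site": str(site).zfill(4),
    }
    return f"{AQS_DAILY_BY_SITE}?{urlencode(query)}"

# Placeholder credentials; substitute a registered email and key.
url = build_daily_url("user@example.com", "YOUR_KEY", 1103)
print(parse_qs(urlsplit(url).query)["site"])
```

Passing the result to `requests.get(url).json()` would then replace the hand-formatted `url_data` strings.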

<h3>Compton Air Quality Data</h3>

In [9]:
site = "1302"
url_data = "https://aqs.epa.gov/data/api/dailyData/bySite?email={}&key={}&param=44201,81102,88101&bdate={}&edate={}&state=06&county=037&site={}".format(USER_ID, USER_KEY, 20200101, 20201231, site)

results = requests.get(url_data).json()

In [10]:
temp_df = pd.json_normalize(results["Data"])
temp_df = temp_df.drop_duplicates()
temp_df['date_local'] = pd.to_datetime(temp_df['date_local'])
temp_df.sort_values(by=["date_local", "parameter"], inplace = True)
temp_df = temp_df.reset_index(drop=True)

Comp_df = temp_df.drop(['state_code', 'county_code', 'poc', 'datum', 'event_type', 'validity_indicator', 'method_code', 'local_site_name', 'state', 'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city' ], axis=1)
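The same normalize, sort, and drop steps are applied to every site's response, so they could live in one function. The following is a minimal sketch under stated assumptions: `clean_daily` and `DROP_COLS` are hypothetical names, the toy records are hand-written with AQI values borrowed from the LA Main Street output, and the input is assumed to be non-empty. Because `drop_duplicates()` returns a new frame rather than modifying in place, the sketch assigns its result.

```python
import pandas as pd

# Columns this notebook discards for every site.
DROP_COLS = ['state_code', 'county_code', 'poc', 'datum', 'event_type',
             'validity_indicator', 'method_code', 'local_site_name', 'state',
             'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city']

def clean_daily(records):
    """Normalize a non-empty results["Data"] list into a tidy, sorted frame."""
    df = pd.json_normalize(records)
    df = df.drop_duplicates()                      # keep the returned frame
    df["date_local"] = pd.to_datetime(df["date_local"])
    df = df.sort_values(["date_local", "parameter"]).reset_index(drop=True)
    # errors="ignore" tolerates toy inputs that lack some of the columns.
    return df.drop(columns=DROP_COLS, errors="ignore")

# Toy records (not real API output); the last one is an exact duplicate.
toy = [
    {"date_local": "2020-01-02", "parameter": "Ozone", "aqi": 21.0, "poc": 1, "state": "California"},
    {"date_local": "2020-01-01", "parameter": "PM2.5 - Local Conditions", "aqi": 62.0, "poc": 1, "state": "California"},
    {"date_local": "2020-01-01", "parameter": "Ozone", "aqi": 29.0, "poc": 1, "state": "California"},
    {"date_local": "2020-01-01", "parameter": "Ozone", "aqi": 29.0, "poc": 1, "state": "California"},
]
cleaned = clean_daily(toy)
print(cleaned)
```

With a live response, `Comp_df = clean_daily(results["Data"])` would stand in for the cell above.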

In [11]:
len(Comp_df)

3633

In [12]:
Comp_df['parameter'].unique()

array(['Ozone', 'PM2.5 - Local Conditions'], dtype=object)

<h3>LA Main Street Air Quality Data</h3>

In [13]:
# param = 44201,81102,88101,88502
site = 1103
url_data = "https://aqs.epa.gov/data/api/dailyData/bySite?email={}&key={}&param=44201,81102,88101&bdate={}&edate={}&state=06&county=037&site={}".format(USER_ID, USER_KEY, 20200101, 20201231, site)

results = requests.get(url_data).json()
#results["Data"]

In [14]:
LA_df = pd.json_normalize(results["Data"])
LA_df = LA_df.drop_duplicates()
LA_df['date_local'] = pd.to_datetime(LA_df['date_local'])
LA_df.sort_values(by=["date_local", "parameter"], inplace = True)
LA_df = LA_df.reset_index(drop=True)
LA_df.head(1)

Unnamed: 0,state_code,county_code,site_number,parameter_code,poc,latitude,longitude,datum,parameter,sample_duration_code,sample_duration,pollutant_standard,date_local,units_of_measure,event_type,observation_count,observation_percent,validity_indicator,arithmetic_mean,first_max_value,first_max_hour,aqi,method_code,method,local_site_name,site_address,state,county,city,cbsa_code,cbsa,date_of_last_change
0,6,37,1103,44201,1,34.06659,-118.22688,WGS84,Ozone,1,1 HOUR,Ozone 1-hour 1979,2020-01-01,Parts per million,No Events,24,100.0,Y,0.015083,0.038,14,,87,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,Los Angeles-North Main Street,"1630 N MAIN ST, LOS ANGELES",California,Los Angeles,Los Angeles,31080,"Los Angeles-Long Beach-Anaheim, CA",2021-10-31


In [15]:
LA_df.columns

Index(['state_code', 'county_code', 'site_number', 'parameter_code', 'poc',
       'latitude', 'longitude', 'datum', 'parameter', 'sample_duration_code',
       'sample_duration', 'pollutant_standard', 'date_local',
       'units_of_measure', 'event_type', 'observation_count',
       'observation_percent', 'validity_indicator', 'arithmetic_mean',
       'first_max_value', 'first_max_hour', 'aqi', 'method_code', 'method',
       'local_site_name', 'site_address', 'state', 'county', 'city',
       'cbsa_code', 'cbsa', 'date_of_last_change'],
      dtype='object')

In [16]:
LA_df = LA_df.drop(['state_code', 'county_code', 'poc', 'datum', 'event_type', 'validity_indicator', 'method_code', 'local_site_name', 'state', 'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city' ], axis=1)

In [17]:
LA_df.head(12)

Unnamed: 0,site_number,parameter_code,latitude,longitude,parameter,sample_duration_code,sample_duration,pollutant_standard,date_local,units_of_measure,observation_count,observation_percent,arithmetic_mean,first_max_value,first_max_hour,aqi,method,site_address
0,1103,44201,34.06659,-118.22688,Ozone,1,1 HOUR,Ozone 1-hour 1979,2020-01-01,Parts per million,24,100.0,0.015083,0.038,14,,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"1630 N MAIN ST, LOS ANGELES"
1,1103,44201,34.06659,-118.22688,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-Hour 1997,2020-01-01,Parts per million,24,100.0,0.01425,0.031,10,29.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"1630 N MAIN ST, LOS ANGELES"
2,1103,44201,34.06659,-118.22688,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-Hour 2008,2020-01-01,Parts per million,24,100.0,0.01425,0.031,10,29.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"1630 N MAIN ST, LOS ANGELES"
3,1103,44201,34.06659,-118.22688,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-hour 2015,2020-01-01,Parts per million,17,100.0,0.016471,0.031,10,29.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"1630 N MAIN ST, LOS ANGELES"
4,1103,88101,34.06659,-118.22688,PM2.5 - Local Conditions,7,24 HOUR,PM25 24-hour 2006,2020-01-01,Micrograms/cubic meter (LC),1,100.0,17.4,17.4,0,62.0,R & P Model 2025 PM-2.5 Sequential Air Sampler...,"1630 N MAIN ST, LOS ANGELES"
5,1103,88101,34.06659,-118.22688,PM2.5 - Local Conditions,7,24 HOUR,PM25 Annual 2006,2020-01-01,Micrograms/cubic meter (LC),1,100.0,17.4,17.4,0,62.0,R & P Model 2025 PM-2.5 Sequential Air Sampler...,"1630 N MAIN ST, LOS ANGELES"
6,1103,88101,34.06659,-118.22688,PM2.5 - Local Conditions,7,24 HOUR,PM25 24-hour 2012,2020-01-01,Micrograms/cubic meter (LC),1,100.0,17.4,17.4,0,62.0,R & P Model 2025 PM-2.5 Sequential Air Sampler...,"1630 N MAIN ST, LOS ANGELES"
7,1103,88101,34.06659,-118.22688,PM2.5 - Local Conditions,7,24 HOUR,PM25 Annual 2012,2020-01-01,Micrograms/cubic meter (LC),1,100.0,17.4,17.4,0,62.0,R & P Model 2025 PM-2.5 Sequential Air Sampler...,"1630 N MAIN ST, LOS ANGELES"
8,1103,88101,34.06659,-118.22688,PM2.5 - Local Conditions,7,24 HOUR,PM25 24-hour 1997,2020-01-01,Micrograms/cubic meter (LC),1,100.0,17.4,17.4,0,62.0,R & P Model 2025 PM-2.5 Sequential Air Sampler...,"1630 N MAIN ST, LOS ANGELES"
9,1103,88101,34.06659,-118.22688,PM2.5 - Local Conditions,7,24 HOUR,PM25 Annual 1997,2020-01-01,Micrograms/cubic meter (LC),1,100.0,17.4,17.4,0,62.0,R & P Model 2025 PM-2.5 Sequential Air Sampler...,"1630 N MAIN ST, LOS ANGELES"


In [18]:
LA_df.tail(1)

Unnamed: 0,site_number,parameter_code,latitude,longitude,parameter,sample_duration_code,sample_duration,pollutant_standard,date_local,units_of_measure,observation_count,observation_percent,arithmetic_mean,first_max_value,first_max_hour,aqi,method,site_address
4034,1103,88101,34.06659,-118.22688,PM2.5 - Local Conditions,7,24 HOUR,PM25 Annual 1997,2020-12-31,Micrograms/cubic meter (LC),1,100.0,9.0,9.0,0,38.0,R & P Model 2025 PM-2.5 Sequential Air Sampler...,"1630 N MAIN ST, LOS ANGELES"


In [19]:
len(LA_df)

4035

In [20]:
LA_df['parameter'].unique()

array(['Ozone', 'PM2.5 - Local Conditions', 'PM10 Total 0-10um STP'],
      dtype=object)

In [21]:
sum(LA_df['parameter'] == "PM10 Total 0-10um STP")

45

In [22]:
# Create a dataframe which contains PM10 only
LA10_df = LA_df.loc[(LA_df.parameter == "PM10 Total 0-10um STP")]
#LA10_df 

In [23]:
# Count the days that have more than one PM10 record
sum(LA10_df['date_local'].value_counts() > 1)

0

In [24]:
# Create a PM10 df which consists only of date_local, parameter, and aqi
LA10_final = LA10_df[['date_local', 'parameter','aqi']]

In [25]:
LA10_final.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 45 entries, 34 to 4002
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   date_local  45 non-null     datetime64[ns]
 1   parameter   45 non-null     object        
 2   aqi         45 non-null     float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 1.4+ KB


In [26]:
# Create a dataframe which contains Ozone only
LAO3_df = LA_df.loc[(LA_df.parameter == "Ozone") & (LA_df.pollutant_standard == 'Ozone 8-hour 2015')]
#LAO3_df

In [27]:
# Count the days that have more than one Ozone record
sum(LAO3_df['date_local'].value_counts() > 1)

0

In [28]:
# Find the dates on which Ozone data is missing
LAO3_in_df = LAO3_df.set_index('date_local')
print(pd.date_range(
  start="2020-01-01", end="2020-12-31").difference(LAO3_in_df.index))

DatetimeIndex(['2020-03-12', '2020-03-13', '2020-05-10', '2020-05-11',
               '2020-05-16', '2020-05-17', '2020-05-18', '2020-05-20',
               '2020-05-21', '2020-05-22', '2020-05-23', '2020-05-24',
               '2020-05-25', '2020-05-26'],
              dtype='datetime64[ns]', freq=None)


2020 was a leap year with 366 days, but we have only 352 rows of Ozone data, so Ozone data is missing for 14 days. From the output above, 2 of the missing days are in March and 12 are in May.
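The arithmetic can be checked without the API by starting from the 14 missing dates printed above. This is a small sketch for verification only; the variable names are illustrative.

```python
from collections import Counter
import pandas as pd

full_year = pd.date_range(start="2020-01-01", end="2020-12-31")

# The missing dates, copied from the DatetimeIndex output above.
missing = pd.to_datetime([
    "2020-03-12", "2020-03-13", "2020-05-10", "2020-05-11",
    "2020-05-16", "2020-05-17", "2020-05-18", "2020-05-20",
    "2020-05-21", "2020-05-22", "2020-05-23", "2020-05-24",
    "2020-05-25", "2020-05-26",
])

observed = full_year.difference(missing)        # days that do have Ozone data
missing_by_month = Counter(d.month for d in missing)

print(len(full_year), len(observed), len(missing))
print(dict(missing_by_month))
```

The counts come out as 366 days in the year, 352 observed, and 14 missing, split into 2 in month 3 and 12 in month 5.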

In [29]:
# Create an Ozone df which consists only of date_local, parameter, and aqi
LAO3_final = LAO3_df[['date_local', 'parameter', 'aqi']]

In [30]:
LAO3_final.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 352 entries, 3 to 4028
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   date_local  352 non-null    datetime64[ns]
 1   parameter   352 non-null    object        
 2   aqi         352 non-null    float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 11.0+ KB


In [31]:
# Create a dataframe which contains PM25 only
LA25_df = LA_df.loc[(LA_df.parameter == "PM2.5 - Local Conditions") & (LA_df.pollutant_standard == 'PM25 24-hour 2012')]
#LA25_df

In [32]:
#LA25_df['date_local'].value_counts()

In [33]:
sum(LA25_df['date_local'].value_counts() > 1)

65

In [34]:
# Show the dates that have two or more PM2.5 records
#LA25_df[LA25_df['date_local'].map(LA25_df['date_local'].value_counts()) > 1].head(5)

In [35]:
#LA25_final = LA25_df[["site_number", "latitude", "longitude", "parameter", "date_local", "aqi", "site_address"]]
LA25_final = LA25_df[["date_local", "parameter", "aqi"]]
LA25_final = LA25_final.sort_values(['date_local', 'aqi'], ascending=[True, True])
LA25_final = LA25_final.drop_duplicates(subset='date_local', keep = "last")
#LA25_final

In [36]:
sum(LA25_final['date_local'].value_counts() > 1)

0
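The PM2.5 de-duplication above sorts by `aqi` within each date and keeps the last row, which retains the highest AQI reported for that day. The toy example below (hand-made values, not API output) shows that behavior next to a `groupby(...).max()` formulation that yields the same numbers. One caveat worth keeping in mind: `sort_values` places NaN last by default, so if a day had a missing `aqi`, `keep="last"` would pick the NaN row, whereas `max()` skips it.

```python
import pandas as pd

# Toy frame: two PM2.5 records on 2020-01-01, one on 2020-01-02.
toy = pd.DataFrame({
    "date_local": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02"]),
    "parameter": ["PM2.5 - Local Conditions"] * 3,
    "aqi": [62.0, 58.0, 52.0],
})

# Approach used in this notebook: sort ascending, keep the last (highest) row per date.
by_sort = (toy.sort_values(["date_local", "aqi"])
              .drop_duplicates(subset="date_local", keep="last"))

# Alternative sketch: take the per-date maximum directly.
by_group = toy.groupby("date_local", as_index=False)["aqi"].max()

print(by_sort["aqi"].tolist(), by_group["aqi"].tolist())
```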

In [37]:
LA25_final.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 363 entries, 6 to 4031
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   date_local  363 non-null    datetime64[ns]
 1   parameter   363 non-null    object        
 2   aqi         363 non-null    float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 11.3+ KB


In [38]:
# Find the dates on which PM2.5 data is missing
LA25_final_in_df = LA25_final.set_index('date_local')
print(pd.date_range(
  start="2020-01-01", end="2020-12-31").difference(LA25_final_in_df.index))

DatetimeIndex(['2020-01-09', '2020-02-07', '2020-09-25'], dtype='datetime64[ns]', freq=None)


<h4>Combine LA Main Street Ozone and PMs Air Quality Data</h4>

In [39]:
temp = pd.concat([LA25_final,LA10_final, LAO3_final])
temp = temp.sort_values(['date_local', 'parameter'], ascending=[True, True])
temp = temp.pivot(index = "date_local", columns = "parameter")
temp.columns=temp.columns.droplevel(0)
temp.reset_index(inplace=True)
temp.columns = ['date_local', 'Ozone', 'PM10 Total 0-10um STP', 'PM2.5 - Local Conditions']
temp['max_aqi'] = temp[["Ozone", 'PM10 Total 0-10um STP', 'PM2.5 - Local Conditions']].max(axis=1)
temp['latitude'], temp['longitude'], temp['site_address'] = [LA_df.latitude[0],LA_df.longitude[0], LA_df.site_address[0]]

In [40]:
LA_O3PM = temp.copy()
LA_O3PM.head()

Unnamed: 0,date_local,Ozone,PM10 Total 0-10um STP,PM2.5 - Local Conditions,max_aqi,latitude,longitude,site_address
0,2020-01-01,29.0,,62.0,62.0,34.06659,-118.22688,"1630 N MAIN ST, LOS ANGELES"
1,2020-01-02,21.0,,62.0,62.0,34.06659,-118.22688,"1630 N MAIN ST, LOS ANGELES"
2,2020-01-03,14.0,,52.0,52.0,34.06659,-118.22688,"1630 N MAIN ST, LOS ANGELES"
3,2020-01-04,20.0,34.0,69.0,69.0,34.06659,-118.22688,"1630 N MAIN ST, LOS ANGELES"
4,2020-01-05,23.0,,58.0,58.0,34.06659,-118.22688,"1630 N MAIN ST, LOS ANGELES"
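The reshaping above renames the pivoted columns by position, which relies on the pollutants sorting in a known order. As a hedged alternative sketch, passing `values="aqi"` to `pivot` yields single-level columns already named after each pollutant, so no manual rename is needed. The toy rows reuse the AQI values shown for 2020-01-04 and 2020-01-05.

```python
import pandas as pd

# Long-format toy data: one row per (date, pollutant).
long_df = pd.DataFrame({
    "date_local": pd.to_datetime(["2020-01-04"] * 3 + ["2020-01-05"] * 2),
    "parameter": ["Ozone", "PM10 Total 0-10um STP", "PM2.5 - Local Conditions",
                  "Ozone", "PM2.5 - Local Conditions"],
    "aqi": [20.0, 34.0, 69.0, 23.0, 58.0],
})

# values="aqi" avoids the extra column level that droplevel(0) removes above.
wide = long_df.pivot(index="date_local", columns="parameter", values="aqi").reset_index()
wide.columns.name = None

# Daily maximum across whichever pollutant columns are present; NaN is skipped.
pollutants = [c for c in wide.columns if c != "date_local"]
wide["max_aqi"] = wide[pollutants].max(axis=1)

print(wide)
```

Days without a PM10 reading keep NaN in that column, and `max_aqi` is taken over the remaining pollutants.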


<h3>Long Beach (Hudson) Air Quality Data</h3>

In [41]:
site = "4006"
url_data = "https://aqs.epa.gov/data/api/dailyData/bySite?email={}&key={}&param=44201,81102,88101&bdate={}&edate={}&state=06&county=037&site={}".format(USER_ID, USER_KEY, 20200101, 20201231, site)

results = requests.get(url_data).json()

In [42]:
temp_df = pd.json_normalize(results["Data"])
temp_df = temp_df.drop_duplicates()
temp_df['date_local'] = pd.to_datetime(temp_df['date_local'])
temp_df.sort_values(by=["date_local", "parameter"], inplace = True)
temp_df = temp_df.reset_index(drop=True)

LB_Hd_df = temp_df.drop(['state_code', 'county_code', 'poc', 'datum', 'event_type', 'validity_indicator', 'method_code', 'local_site_name', 'state', 'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city' ], axis=1)

In [43]:
LB_Hd_df['date_local']

0    2020-01-04
1    2020-01-10
2    2020-01-16
3    2020-01-22
4    2020-01-28
5    2020-02-03
6    2020-02-09
7    2020-02-15
8    2020-02-21
9    2020-02-27
10   2020-03-04
11   2020-03-10
12   2020-03-16
13   2020-03-22
Name: date_local, dtype: datetime64[ns]

In [44]:
len(LB_Hd_df)

14

In [45]:
LB_Hd_df['parameter'].unique()

array(['PM10 Total 0-10um STP'], dtype=object)

Long Beach (Hudson) __has only 14 rows of PM10 data in 2020 (from early January to late March)__. Because this sample is so small, we will not use the data from the Long Beach (Hudson) monitoring station.
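The 14 dates listed above are spaced exactly six days apart, which a quick check can reproduce. The sketch below regenerates that sequence and expresses it as a share of the year; the variable names are illustrative and nothing here calls the API.

```python
import pandas as pd

# Rebuild the observed sampling dates: every 6th day from 2020-01-04 to 2020-03-22.
observed = pd.date_range("2020-01-04", "2020-03-22", freq="6D")
full_year = pd.date_range("2020-01-01", "2020-12-31")

# Fraction of days in 2020 with a PM10 reading at this site.
coverage = len(observed) / len(full_year)

print(len(observed), observed[0].date(), observed[-1].date(), round(coverage, 3))
```

The check gives 14 sampling days out of 366, under 4% of the year.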

<h3>Long Beach - Route 710 Air Quality Data</h3>

In [46]:
site = "4008"
url_data = "https://aqs.epa.gov/data/api/dailyData/bySite?email={}&key={}&param=44201,81102,88101&bdate={}&edate={}&state=06&county=037&site={}".format(USER_ID, USER_KEY, 20200101, 20201231, site)

results = requests.get(url_data).json()

In [47]:
temp_df = pd.json_normalize(results["Data"])
temp_df = temp_df.drop_duplicates()
temp_df['date_local'] = pd.to_datetime(temp_df['date_local'])
temp_df.sort_values(by=["date_local", "parameter"], inplace = True)
temp_df = temp_df.reset_index(drop=True)

LB_710_df = temp_df.drop(['state_code', 'county_code', 'poc', 'datum', 'event_type', 'validity_indicator', 'method_code', 'local_site_name', 'state', 'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city' ], axis=1)

In [48]:
len(LB_710_df)

4522

In [49]:
LB_710_df['parameter'].unique()

array(['PM2.5 - Local Conditions'], dtype=object)

<h3>Long Beach (South) Air Quality Data</h3>

In [50]:
site = "4004"
url_data = "https://aqs.epa.gov/data/api/dailyData/bySite?email={}&key={}&param=44201,81102,88101&bdate={}&edate={}&state=06&county=037&site={}".format(USER_ID, USER_KEY, 20200101, 20201231, site)

results = requests.get(url_data).json()

In [51]:
temp_df = pd.json_normalize(results["Data"])
temp_df = temp_df.drop_duplicates()
temp_df['date_local'] = pd.to_datetime(temp_df['date_local'])
temp_df.sort_values(by=["date_local", "parameter"], inplace = True)
temp_df = temp_df.reset_index(drop=True)

LB_S_df = temp_df.drop(['state_code', 'county_code', 'poc', 'datum', 'event_type', 'validity_indicator', 'method_code', 'local_site_name', 'state', 'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city' ], axis=1)

In [52]:
LB_S_df.head(1)

Unnamed: 0,site_number,parameter_code,latitude,longitude,parameter,sample_duration_code,sample_duration,pollutant_standard,date_local,units_of_measure,observation_count,observation_percent,arithmetic_mean,first_max_value,first_max_hour,aqi,method,site_address
0,4004,88101,33.79236,-118.17533,PM2.5 - Local Conditions,7,24 HOUR,PM25 24-hour 2006,2020-01-01,Micrograms/cubic meter (LC),1,100.0,26.1,26.1,0,80.0,R & P Model 2025 PM-2.5 Sequential Air Sampler...,"1305 E. Pacific Coast Hwy., Long Beach"


In [53]:
len(LB_S_df)

4663

In [54]:
LB_S_df['parameter'].unique()

array(['PM2.5 - Local Conditions', 'PM10 Total 0-10um STP'], dtype=object)

<h3>Long Beach (Signal Hill) Air Quality Data</h3>

In [55]:
site = "4009"
url_data = "https://aqs.epa.gov/data/api/dailyData/bySite?email={}&key={}&param=44201,81102,88101&bdate={}&edate={}&state=06&county=037&site={}".format(USER_ID, USER_KEY, 20200101, 20201231, site)

results = requests.get(url_data).json()

In [56]:
temp_df = pd.json_normalize(results["Data"])
temp_df = temp_df.drop_duplicates()
temp_df['date_local'] = pd.to_datetime(temp_df['date_local'])
temp_df.sort_values(by=["date_local", "parameter"], inplace = True)
temp_df = temp_df.reset_index(drop=True)

LB_Sg_df = temp_df.drop(['state_code', 'county_code', 'poc', 'datum', 'event_type', 'validity_indicator', 'method_code', 'local_site_name', 'state', 'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city' ], axis=1)

In [57]:
LB_Sg_df.head(1)

Unnamed: 0,site_number,parameter_code,latitude,longitude,parameter,sample_duration_code,sample_duration,pollutant_standard,date_local,units_of_measure,observation_count,observation_percent,arithmetic_mean,first_max_value,first_max_hour,aqi,method,site_address
0,4009,44201,33.793713,-118.171019,Ozone,1,1 HOUR,Ozone 1-hour 1979,2020-01-01,Parts per million,24,100.0,0.019708,0.041,14,,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,1710 E. 20th Street


In [58]:
len(LB_Sg_df)

1400

In [59]:
LB_Sg_df['parameter'].unique()

array(['Ozone'], dtype=object)

Long Beach (South) and Long Beach (Signal Hill) are located close to each other. Long Beach (South) has PM2.5 and PM10 data, while Long Beach (Signal Hill) has Ozone data. We will combine the two datasets, call the result __Long Beach (Signal Hill + South)__, and set its location to the __location of Long Beach (Signal Hill)__.

<h4>Long Beach (Signal Hill + South) Air Quality Data</h4>

In [60]:
LB_Sg_S_df = pd.concat([LB_Sg_df, LB_S_df])
len(LB_Sg_S_df)

6063
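The combined length matches the two inputs (1400 + 4663 = 6063). One way the shared location could then be applied is sketched below on tiny stand-in frames; the coordinates and addresses are copied from the outputs above, while the frames and variable names are illustrative only. `ignore_index=True` is an added choice here: without it, `pd.concat` keeps each frame's original index, so labels such as 0 would appear twice.

```python
import pandas as pd

# Stand-in for LB_Sg_df (Signal Hill, Ozone).
signal_hill = pd.DataFrame({
    "site_number": ["4009"],
    "parameter": ["Ozone"],
    "latitude": [33.793713],
    "longitude": [-118.171019],
    "site_address": ["1710 E. 20th Street"],
})

# Stand-in for LB_S_df (South, PM2.5 and PM10).
south = pd.DataFrame({
    "site_number": ["4004", "4004"],
    "parameter": ["PM2.5 - Local Conditions", "PM10 Total 0-10um STP"],
    "latitude": [33.79236, 33.79236],
    "longitude": [-118.17533, -118.17533],
    "site_address": ["1305 E. Pacific Coast Hwy., Long Beach"] * 2,
})

combined = pd.concat([signal_hill, south], ignore_index=True)

# Overwrite the location columns with the Signal Hill values for every row.
for col in ("latitude", "longitude", "site_address"):
    combined[col] = signal_hill[col].iloc[0]

print(combined[["site_number", "parameter", "latitude", "longitude"]])
```

The `site_number` column is left untouched, so each row still records which station produced it.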

In [61]:
LB_Sg_S_df.head()

Unnamed: 0,site_number,parameter_code,latitude,longitude,parameter,sample_duration_code,sample_duration,pollutant_standard,date_local,units_of_measure,observation_count,observation_percent,arithmetic_mean,first_max_value,first_max_hour,aqi,method,site_address
0,4009,44201,33.793713,-118.171019,Ozone,1,1 HOUR,Ozone 1-hour 1979,2020-01-01,Parts per million,24,100.0,0.019708,0.041,14,,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,1710 E. 20th Street
1,4009,44201,33.793713,-118.171019,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-Hour 1997,2020-01-01,Parts per million,24,100.0,0.021125,0.036,10,33.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,1710 E. 20th Street
2,4009,44201,33.793713,-118.171019,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-Hour 2008,2020-01-01,Parts per million,24,100.0,0.021125,0.036,10,33.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,1710 E. 20th Street
3,4009,44201,33.793713,-118.171019,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-hour 2015,2020-01-01,Parts per million,17,100.0,0.026,0.036,10,33.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,1710 E. 20th Street
4,4009,44201,33.793713,-118.171019,Ozone,1,1 HOUR,Ozone 1-hour 1979,2020-01-02,Parts per million,22,92.0,0.015045,0.033,15,,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,1710 E. 20th Street


In [62]:
LB_Sg_S_df.tail()

Unnamed: 0,site_number,parameter_code,latitude,longitude,parameter,sample_duration_code,sample_duration,pollutant_standard,date_local,units_of_measure,observation_count,observation_percent,arithmetic_mean,first_max_value,first_max_hour,aqi,method,site_address
4658,4004,88101,33.79236,-118.17533,PM2.5 - Local Conditions,X,24-HR BLK AVG,PM25 Annual 2006,2020-12-31,Micrograms/cubic meter (LC),1,100.0,31.3,31.3,0,91.0,Met One BAM-1020 Mass Monitor w/VSCC - Beta At...,"1305 E. Pacific Coast Hwy., Long Beach"
4659,4004,88101,33.79236,-118.17533,PM2.5 - Local Conditions,X,24-HR BLK AVG,PM25 24-hour 2012,2020-12-31,Micrograms/cubic meter (LC),1,100.0,31.3,31.3,0,91.0,Met One BAM-1020 Mass Monitor w/VSCC - Beta At...,"1305 E. Pacific Coast Hwy., Long Beach"
4660,4004,88101,33.79236,-118.17533,PM2.5 - Local Conditions,X,24-HR BLK AVG,PM25 Annual 2012,2020-12-31,Micrograms/cubic meter (LC),1,100.0,31.3,31.3,0,91.0,Met One BAM-1020 Mass Monitor w/VSCC - Beta At...,"1305 E. Pacific Coast Hwy., Long Beach"
4661,4004,88101,33.79236,-118.17533,PM2.5 - Local Conditions,X,24-HR BLK AVG,PM25 24-hour 1997,2020-12-31,Micrograms/cubic meter (LC),1,100.0,31.3,31.3,0,91.0,Met One BAM-1020 Mass Monitor w/VSCC - Beta At...,"1305 E. Pacific Coast Hwy., Long Beach"
4662,4004,88101,33.79236,-118.17533,PM2.5 - Local Conditions,X,24-HR BLK AVG,PM25 Annual 1997,2020-12-31,Micrograms/cubic meter (LC),1,100.0,31.3,31.3,0,91.0,Met One BAM-1020 Mass Monitor w/VSCC - Beta At...,"1305 E. Pacific Coast Hwy., Long Beach"


<h3>Pico Rivera Air Quality Data</h3>

In [63]:
site = "1602"
url_data = "https://aqs.epa.gov/data/api/dailyData/bySite?email={}&key={}&param=44201,81102,88101&bdate={}&edate={}&state=06&county=037&site={}".format(USER_ID, USER_KEY, 20200101, 20201231, site)

results = requests.get(url_data).json()

In [64]:
temp_df = pd.json_normalize(results["Data"])
temp_df = temp_df.drop_duplicates()
temp_df['date_local'] = pd.to_datetime(temp_df['date_local'])
temp_df.sort_values(by=["date_local", "parameter"], inplace = True)
temp_df = temp_df.reset_index(drop=True)

Pic_df = temp_df.drop(['state_code', 'county_code', 'poc', 'datum', 'event_type', 'validity_indicator', 'method_code', 'local_site_name', 'state', 'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city' ], axis=1)

In [65]:
len(Pic_df)

2544

In [66]:
Pic_df['parameter'].unique()

array(['Ozone', 'PM2.5 - Local Conditions'], dtype=object)

<h3>North Hollywood (NOHO) Air Quality Data</h3>

In [67]:
site = "4010"
url_data = "https://aqs.epa.gov/data/api/dailyData/bySite?email={}&key={}&param=44201,81102,88101&bdate={}&edate={}&state=06&county=037&site={}".format(USER_ID, USER_KEY, 20200101, 20201231, site)

results = requests.get(url_data).json()

In [68]:
Noho_df = pd.json_normalize(results["Data"])
Noho_df = Noho_df.drop_duplicates()
Noho_df['date_local'] = pd.to_datetime(Noho_df['date_local'])
Noho_df.sort_values(by=["date_local", "parameter"], inplace = True)
Noho_df = Noho_df.reset_index(drop=True)

In [69]:
Noho_df = Noho_df.drop(['state_code', 'county_code', 'poc', 'datum', 'event_type', 'validity_indicator', 'method_code', 'local_site_name', 'state', 'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city' ], axis=1)

In [70]:
Noho_df.tail()

Unnamed: 0,site_number,parameter_code,latitude,longitude,parameter,sample_duration_code,sample_duration,pollutant_standard,date_local,units_of_measure,observation_count,observation_percent,arithmetic_mean,first_max_value,first_max_hour,aqi,method,site_address
1447,4010,44201,34.181977,-118.363036,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-hour 2015,2020-12-30,Parts per million,17,100.0,0.013412,0.03,10,28.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,10659 W. Delano Street
1448,4010,44201,34.181977,-118.363036,Ozone,1,1 HOUR,Ozone 1-hour 1979,2020-12-31,Parts per million,24,100.0,0.021333,0.042,13,,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,10659 W. Delano Street
1449,4010,44201,34.181977,-118.363036,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-Hour 1997,2020-12-31,Parts per million,19,79.0,0.024263,0.038,10,35.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,10659 W. Delano Street
1450,4010,44201,34.181977,-118.363036,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-Hour 2008,2020-12-31,Parts per million,19,79.0,0.024263,0.038,10,35.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,10659 W. Delano Street
1451,4010,44201,34.181977,-118.363036,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-hour 2015,2020-12-31,Parts per million,12,71.0,0.033083,0.038,10,35.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,10659 W. Delano Street


In [71]:
len(Noho_df)

1452

In [72]:
Noho_df['parameter'].unique()

array(['Ozone'], dtype=object)

<h3>Pasadena Air Quality Data</h3>

In [73]:
site = "2005"
url_data = "https://aqs.epa.gov/data/api/dailyData/bySite?email={}&key={}&param=44201,81102,88101&bdate={}&edate={}&state=06&county=037&site={}".format(USER_ID, USER_KEY, 20200101, 20201231, site)

results = requests.get(url_data).json()

In [74]:
temp_df = pd.json_normalize(results["Data"])
temp_df = temp_df.drop_duplicates()
temp_df['date_local'] = pd.to_datetime(temp_df['date_local'])
temp_df.sort_values(by=["date_local", "parameter"], inplace = True)
temp_df = temp_df.reset_index(drop=True)

Pas_df = temp_df.drop(['state_code', 'county_code', 'poc', 'datum', 'event_type', 'validity_indicator', 'method_code', 'local_site_name', 'state', 'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city' ], axis=1)

In [75]:
len(Pas_df)

2209

In [76]:
Pas_df['parameter'].unique()

array(['Ozone', 'PM2.5 - Local Conditions'], dtype=object)

<h3>Reseda Air Quality Data</h3>

In [77]:
site = "1201"
url_data = "https://aqs.epa.gov/data/api/dailyData/bySite?email={}&key={}&param=44201,81102,88101&bdate={}&edate={}&state=06&county=037&site={}".format(USER_ID, USER_KEY, 20200101, 20201231, site)

results = requests.get(url_data).json()

In [78]:
Res_df = pd.json_normalize(results["Data"])
Res_df = Res_df.drop_duplicates()
Res_df['date_local'] = pd.to_datetime(Res_df['date_local'])
Res_df.sort_values(by=["date_local", "parameter"], inplace = True)
Res_df = Res_df.reset_index(drop=True)

Res_df = Res_df.drop(['state_code', 'county_code', 'poc', 'datum', 'event_type', 'validity_indicator', 'method_code', 'local_site_name', 'state', 'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city' ], axis=1)

In [79]:
Res_df.head()

Unnamed: 0,site_number,parameter_code,latitude,longitude,parameter,sample_duration_code,sample_duration,pollutant_standard,date_local,units_of_measure,observation_count,observation_percent,arithmetic_mean,first_max_value,first_max_hour,aqi,method,site_address
0,1201,44201,34.19925,-118.53276,Ozone,1,1 HOUR,Ozone 1-hour 1979,2020-01-01,Parts per million,24,100.0,0.022375,0.047,14,,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"18330 GAULT ST., RESEDA"
1,1201,44201,34.19925,-118.53276,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-Hour 1997,2020-01-01,Parts per million,24,100.0,0.0265,0.041,11,38.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"18330 GAULT ST., RESEDA"
2,1201,44201,34.19925,-118.53276,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-Hour 2008,2020-01-01,Parts per million,24,100.0,0.0265,0.041,11,38.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"18330 GAULT ST., RESEDA"
3,1201,44201,34.19925,-118.53276,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-hour 2015,2020-01-01,Parts per million,17,100.0,0.032706,0.041,11,38.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"18330 GAULT ST., RESEDA"
4,1201,88101,34.19925,-118.53276,PM2.5 - Local Conditions,7,24 HOUR,PM25 24-hour 2006,2020-01-01,Micrograms/cubic meter (LC),1,100.0,15.4,15.4,0,58.0,R & P Model 2025 PM-2.5 Sequential Air Sampler...,"18330 GAULT ST., RESEDA"


In [80]:
Res_df.tail(1)

Unnamed: 0,site_number,parameter_code,latitude,longitude,parameter,sample_duration_code,sample_duration,pollutant_standard,date_local,units_of_measure,observation_count,observation_percent,arithmetic_mean,first_max_value,first_max_hour,aqi,method,site_address
2125,1201,44201,34.19925,-118.53276,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-hour 2015,2020-12-31,Parts per million,12,71.0,0.038,0.04,9,37.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"18330 GAULT ST., RESEDA"


In [81]:
len(Res_df)

2126

In [82]:
Res_df['parameter'].unique()

array(['Ozone', 'PM2.5 - Local Conditions'], dtype=object)

<h3>Santa Clarita Air Quality Data</h3>

In [83]:
site = "6012"
url_data = "https://aqs.epa.gov/data/api/dailyData/bySite?email={}&key={}&param=44201,81102,88101&bdate={}&edate={}&state=06&county=037&site={}".format(USER_ID, USER_KEY, 20200101, 20201231, site)

results = requests.get(url_data).json()

In [84]:
temp_df = pd.json_normalize(results["Data"])
temp_df = temp_df.drop_duplicates()
temp_df['date_local'] = pd.to_datetime(temp_df['date_local'])
temp_df.sort_values(by=["date_local", "parameter"], inplace = True)
temp_df = temp_df.reset_index(drop=True)

San_Cl_df = temp_df.drop(['state_code', 'county_code', 'poc', 'datum', 'event_type', 'validity_indicator', 'method_code', 'local_site_name', 'state', 'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city' ], axis=1)

In [85]:
len(San_Cl_df)

1483

In [86]:
San_Cl_df['parameter'].unique()

array(['Ozone', 'PM10 Total 0-10um STP'], dtype=object)

<h3>Simi Valley Air Quality Data</h3>

In [87]:
site = "2002"
url_data = "https://aqs.epa.gov/data/api/dailyData/bySite?email={}&key={}&param=44201,81102,88101&bdate={}&edate={}&state=06&county=111&site={}".format(USER_ID, USER_KEY, 20200101, 20201231, site)

results = requests.get(url_data).json()

In [88]:
temp_df = pd.json_normalize(results["Data"])
temp_df = temp_df.drop_duplicates()
temp_df['date_local'] = pd.to_datetime(temp_df['date_local'])
temp_df.sort_values(by=["date_local", "parameter"], inplace = True)
temp_df = temp_df.reset_index(drop=True)

Sim_V_df = temp_df.drop(['state_code', 'county_code', 'poc', 'datum', 'event_type', 'validity_indicator', 'method_code', 'local_site_name', 'state', 'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city' ], axis=1)

In [89]:
len(Sim_V_df)

7230

In [90]:
Sim_V_df['parameter'].unique()

array(['Ozone', 'PM10 Total 0-10um STP', 'PM2.5 - Local Conditions'],
      dtype=object)

<h3>West LA Air Quality Data</h3>

In [91]:
site = "0113"
url_data = "https://aqs.epa.gov/data/api/dailyData/bySite?email={}&key={}&param=44201,81102,88101&bdate={}&edate={}&state=06&county=037&site={}".format(USER_ID, USER_KEY, 20200101, 20201231, site)

results = requests.get(url_data).json()

In [92]:
West_LA_df = pd.json_normalize(results["Data"])
West_LA_df = West_LA_df.drop_duplicates()
West_LA_df['date_local'] = pd.to_datetime(West_LA_df['date_local'])
West_LA_df.sort_values(by=["date_local", "parameter"], inplace = True)
West_LA_df = West_LA_df.reset_index(drop=True)

In [93]:
West_LA_df = West_LA_df.drop(['state_code', 'county_code', 'poc', 'datum', 'event_type', 'validity_indicator', 'method_code', 'local_site_name', 'state', 'county', 'cbsa_code', 'cbsa', 'date_of_last_change', 'city' ], axis=1)

In [94]:
West_LA_df.head(10)

Unnamed: 0,site_number,parameter_code,latitude,longitude,parameter,sample_duration_code,sample_duration,pollutant_standard,date_local,units_of_measure,observation_count,observation_percent,arithmetic_mean,first_max_value,first_max_hour,aqi,method,site_address
0,113,44201,34.05111,-118.45636,Ozone,1,1 HOUR,Ozone 1-hour 1979,2020-01-01,Parts per million,24,100.0,0.027167,0.043,15,,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"VA HOSPITAL, WEST LOS ANGELES"
1,113,44201,34.05111,-118.45636,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-Hour 1997,2020-01-01,Parts per million,24,100.0,0.0245,0.037,9,34.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"VA HOSPITAL, WEST LOS ANGELES"
2,113,44201,34.05111,-118.45636,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-Hour 2008,2020-01-01,Parts per million,24,100.0,0.0245,0.037,9,34.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"VA HOSPITAL, WEST LOS ANGELES"
3,113,44201,34.05111,-118.45636,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-hour 2015,2020-01-01,Parts per million,17,100.0,0.023941,0.037,9,34.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"VA HOSPITAL, WEST LOS ANGELES"
4,113,44201,34.05111,-118.45636,Ozone,1,1 HOUR,Ozone 1-hour 1979,2020-01-02,Parts per million,24,100.0,0.017333,0.036,14,,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"VA HOSPITAL, WEST LOS ANGELES"
5,113,44201,34.05111,-118.45636,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-Hour 1997,2020-01-02,Parts per million,24,100.0,0.017417,0.03,9,28.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"VA HOSPITAL, WEST LOS ANGELES"
6,113,44201,34.05111,-118.45636,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-Hour 2008,2020-01-02,Parts per million,24,100.0,0.017417,0.03,9,28.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"VA HOSPITAL, WEST LOS ANGELES"
7,113,44201,34.05111,-118.45636,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-hour 2015,2020-01-02,Parts per million,17,100.0,0.016706,0.03,9,28.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"VA HOSPITAL, WEST LOS ANGELES"
8,113,44201,34.05111,-118.45636,Ozone,1,1 HOUR,Ozone 1-hour 1979,2020-01-03,Parts per million,24,100.0,0.014708,0.029,10,,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"VA HOSPITAL, WEST LOS ANGELES"
9,113,44201,34.05111,-118.45636,Ozone,W,8-HR RUN AVG BEGIN HOUR,Ozone 8-Hour 1997,2020-01-03,Parts per million,24,100.0,0.013208,0.025,10,23.0,INSTRUMENTAL - ULTRA VIOLET ABSORPTION,"VA HOSPITAL, WEST LOS ANGELES"


In [95]:
len(West_LA_df)

1456

In [96]:
West_LA_df['parameter'].unique()

array(['Ozone'], dtype=object)

In [97]:
# Keep only the rows reported under the 'Ozone 8-hour 2015' standard
West_LAO3_df = West_LA_df.loc[(West_LA_df.pollutant_standard == 'Ozone 8-hour 2015')]
#West_LAO3_df

In [98]:
# Count the days that have more than one Ozone record
sum(West_LAO3_df['date_local'].value_counts() > 1)

0

In [99]:
# Find the dates for which Ozone data is missing
West_LAO3_in_df = West_LAO3_df.set_index('date_local')
print(pd.date_range(
  start="2020-01-01", end="2020-12-31").difference(West_LAO3_in_df.index))

DatetimeIndex(['2020-10-25', '2020-10-26'], dtype='datetime64[ns]', freq=None)
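
The gap check above can be taken one step further by reindexing onto the full calendar, so that each missing day shows up as an explicit row of NaN. This is a sketch on made-up AQI values (one date is deliberately left out), not on the West LA data itself.

```python
import pandas as pd

# Made-up daily AQI values with 2020-01-03 deliberately left out
daily = pd.DataFrame({
    'date_local': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-04']),
    'aqi': [34.0, 28.0, 29.0],
})

full_range = pd.date_range(start='2020-01-01', end='2020-01-04')
indexed = daily.set_index('date_local')

# Dates that are in the calendar but absent from the data
missing = full_range.difference(indexed.index)

# Reindex so that each missing day becomes a row of NaN
filled = indexed.reindex(full_range)
print(missing)
print(filled)
```

Whether the NaN rows are then left as gaps or interpolated is a separate modeling decision.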


In [100]:
# Take an explicit copy so that adding a column does not trigger SettingWithCopyWarning
temp = West_LAO3_df[["date_local", "aqi", "latitude", "longitude", "site_address"]].copy()
temp['max_aqi'] = temp['aqi']
temp = temp.rename(columns={'aqi': 'Ozone'})
temp = temp[['date_local', 'Ozone', 'max_aqi', 'latitude', 'longitude', 'site_address']]


In [101]:
West_LA_O3 = temp.copy()
West_LA_O3.head()

Unnamed: 0,date_local,Ozone,max_aqi,latitude,longitude,site_address
3,2020-01-01,34.0,34.0,34.05111,-118.45636,"VA HOSPITAL, WEST LOS ANGELES"
7,2020-01-02,28.0,28.0,34.05111,-118.45636,"VA HOSPITAL, WEST LOS ANGELES"
11,2020-01-03,23.0,23.0,34.05111,-118.45636,"VA HOSPITAL, WEST LOS ANGELES"
15,2020-01-04,29.0,29.0,34.05111,-118.45636,"VA HOSPITAL, WEST LOS ANGELES"
19,2020-01-05,36.0,36.0,34.05111,-118.45636,"VA HOSPITAL, WEST LOS ANGELES"


<h2>Displaying the Map of the Monitoring Stations in the LA NCs Area</h2>

The NC geojson file was taken from:
* [Neighborhood Councils (Certified) | City of Los Angeles Hub](https://geohub.lacity.org/datasets/9c8639737e3a457a8c0f6a93f9c36974/explore?location=34.020333%2C-118.412044%2C10.97)

<h3>Reading and Creating the Map of Los Angeles Neighborhood Councils</h3>

In [102]:
#f_name = 'Neighborhood_Councils_(Certified)/Neighborhood_Councils_(Certified).shp'
#nc = gpd.read_file(f_name)
#nc.head()

In [103]:
#nc['NC_ID'].unique()

In [104]:
nc_geo = r'Neighborhood_Councils_(Certified).geojson' # geojson file

longitude = -118.2518
latitude = 34.0488

In [105]:
fig = Figure(width=1000, height=700)
nc_map = folium.Map(location=[latitude, longitude], zoom_start=11, min_zoom=8,max_zoom=12)
fig.add_child(nc_map)
folium.TileLayer('CartoDB positron',name="Light Map").add_to(nc_map)
#nc_map

<folium.raster_layers.TileLayer at 0x1799e6f03d0>

In [106]:
folium.Choropleth(geo_data = nc_geo,
        name='Choropleth',
        key_on='features.properties.Name',
        #line_color='green', 
        #fill_color='YlGn',
        fill_opacity=0.1, line_opacity=0.3
        ).add_to(nc_map) 
#folium.LayerControl().add_to(nc_map)

<folium.features.Choropleth at 0x1799c763400>

In [107]:
nc_map.save('nc.html')
#nc_map

In [108]:
#color_scale = np.array(['#7AF17A','#FFFF7A','#FFBC7A','#FF7A7A','#CA7AA2', '#A27A8E'])
#sns.palplot(sns.color_palette(color_scale))

<h3>Creating a Map of Monitoring Stations</h3>

In [109]:
# Create a Monitoring Stations Dataframe
columns = ["site_number", "site_address", "latitude", "longitude"]
df_names = [Comp_df, LA_df, LB_710_df, LB_S_df, LB_Sg_df, Noho_df, Pas_df, Res_df, Sim_V_df, San_Cl_df, West_LA_df]
all_stations = []

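for df in df_names:
    # Site metadata is identical on every row of a station's dataframe, so read it from the first row
    first = df.iloc[0]
    fields = [first["site_number"], first["site_address"], first["latitude"], first["longitude"]]
    all_stations.append(fields)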

stations_loc_df = pd.DataFrame(all_stations, columns = columns)

In [110]:
stations_loc_df.to_csv('EPA_stations.csv',  index=False)
stations_loc_df

Unnamed: 0,site_number,site_address,latitude,longitude
0,1302,700 North Bullis Road,33.901389,-118.205
1,1103,"1630 N MAIN ST, LOS ANGELES",34.06659,-118.22688
2,4008,5895 Long Beach Blvd.,33.859662,-118.200707
3,4004,"1305 E. Pacific Coast Hwy., Long Beach",33.79236,-118.17533
4,4009,1710 E. 20th Street,33.793713,-118.171019
5,4010,10659 W. Delano Street,34.181977,-118.363036
6,2005,"752 S. WILSON AVE., PASADENA",34.1326,-118.1272
7,1201,"18330 GAULT ST., RESEDA",34.19925,-118.53276
8,2002,"5400 COCHRAN STREET, SIMI VALLEY, CA 93063",34.276316,-118.683685
9,6012,"22224 PLACERITA CANYON RD, SANTA CLARITA",34.38344,-118.5284


In [111]:
# Set the radius of the circles surrounding the monitoring stations, in meters
miles = 2
meters_c = miles * 1609.34
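
The 2-mile radius is converted because `folium.Circle` takes its radius in meters. The same radius can also be used numerically, to check whether a location falls inside a station's circle. The sketch below uses the haversine great-circle distance. The function name `haversine_m` and the mean Earth radius of 6,371 km are assumptions introduced for illustration, and the two coordinates are the map center and the 1630 N Main St station from this notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation
METERS_PER_MILE = 1609.34

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

meters_c = 2 * METERS_PER_MILE

station = (34.06659, -118.22688)  # 1630 N Main St, Los Angeles
center = (34.0488, -118.2518)     # map center used for nc_map
dist = haversine_m(*station, *center)

# True when the map center lies inside the station's 2-mile circle
print(dist <= meters_c)
```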

In [112]:
# Plot every station: a tooltip marker, a 2-mile circle, and the row index as a label
for row in stations_loc_df.itertuples():
    location = [row.latitude, row.longitude]
    tooltip_text = f"{row.site_address}, ID: {row.site_number}"

    # CircleMarker radius is in pixels; Circle radius is in meters
    folium.CircleMarker(location, radius=15, color="green", weight=1,
                        fill=True, fill_color="green", fill_opacity=0.1,
                        tooltip=tooltip_text).add_to(nc_map)
    folium.Circle(location, radius=meters_c, color="blue", weight=1.5).add_to(nc_map)
    folium.Marker(location,
                  icon=folium.DivIcon(html=f"""<div style="font-family: Arial;
                  color: black; font-weight: bold;">{row.Index}</div>""")
                  ).add_to(nc_map)
nc_map.save('south_coast_stations.html')
nc_map

Only four of the South Coast Air Quality Management District sensors are located within the LA NCs area. Four sensors are too few to establish a relationship between air quality and the 311 requests.
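
A count like this can also be checked in code rather than read off the map. The sketch below uses a plain ray-casting point-in-polygon test as a stand-in for a spatial join, which is how it would more likely be done with geopandas against the NC GeoJSON file. The rectangular boundary is hypothetical and is NOT a real Neighborhood Council polygon. The two station coordinates come from the stations table above.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the polygon given as a list of (lon, lat) vertices?"""
    inside = False
    n = len(ring)
    for k in range(n):
        x1, y1 = ring[k]
        x2, y2 = ring[(k + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangular boundary, NOT a real Neighborhood Council polygon
toy_boundary = [(-118.30, 34.00), (-118.20, 34.00), (-118.20, 34.10), (-118.30, 34.10)]

# Two station coordinates from the stations table, keyed by site number
stations = {
    "1103": (34.06659, -118.22688),  # 1630 N Main St, Los Angeles
    "6012": (34.38344, -118.5284),   # Santa Clarita
}
inside_ids = [sid for sid, (lat, lon) in stations.items()
              if point_in_polygon(lon, lat, toy_boundary)]
print(inside_ids)
```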