Data downloaded from https://cd.epic.epd.gov.hk/EPICDI/air/station/
on January 12th 2019.

The data is located in the subfolder `data-files` and split into 4 CSV files, each containing
one year of data from one of two stations (Tung Chung and Hong Kong Central).

In [67]:
# Imports
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import os
import math

In [63]:
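subfolder = "data-files"
# Only pick up the CSV files in the data folder
file_names = [f for f in os.listdir(subfolder) if f.endswith(".csv")]
# Load the CSVs; the column header is on line 11 of each file (header=10, zero-based)
frames = []
for file in file_names:
    complete_path = os.path.join(subfolder, file)
    df = pd.read_csv(complete_path, header=10)
    df['file'] = file  # remember which file each row came from
    frames.append(df)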

In [87]:
# Let's have a look at the data
df.head()

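|   | DATE | HOUR | STATION | CO | FSP | NO2 | NOX | O3 | RSP | SO2 | file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1/1/2018 | 1 | CENTRAL | 82 | 54 | 57 | 75 | 58 | 73 | 5 | hk-air_hourly2018.csv |
| 1 | 1/1/2018 | 2 | CENTRAL | 77 | 51 | 49 | 67 | 63 | 71 | 5 | hk-air_hourly2018.csv |
| 2 | 1/1/2018 | 3 | CENTRAL | 76 | 37 | 37 | 49 | 68 | 58 | 4 | hk-air_hourly2018.csv |
| 3 | 1/1/2018 | 4 | CENTRAL | 77 | 40 | 34 | 50 | 69 | 58 | 4 | hk-air_hourly2018.csv |
| 4 | 1/1/2018 | 5 | CENTRAL | 82 | 44 | 33 | 48 | 67 | 61 | 5 | hk-air_hourly2018.csv |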


In [65]:
# Concatenate the yearly dataframes into one, renumbering the index
dfa = pd.concat(frames, ignore_index=True)

# Pollutants to air quality indices
The next challenge is to transform the hourly pollutant data into one of two indices:

* AQHI (Air Quality Health Index, by the Environmental Protection Department of Hong Kong)
* AQI (Air Quality Index, by the Environmental Protection Agency of the USA)

## US AQI calculation
An explanation of how to calculate the US AQI is documented on Wikipedia:
https://en.wikipedia.org/wiki/Air_quality_index#Computing_the_AQI
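The core of that computation is a piecewise-linear interpolation: if a concentration C falls in the breakpoint bracket [C_lo, C_hi], which maps to the index range [I_lo, I_hi], the sub-index is I = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo, and the overall AQI is the maximum sub-index across pollutants. The sketch below illustrates this for 24-hour PM2.5 only, assuming the PM2.5 breakpoints that were in effect before the EPA's 2024 revision. `PM25_BREAKPOINTS` and `pm25_sub_index` are our own hypothetical names; this is not the `python-aqi` implementation.

```python
import math
from decimal import Decimal, ROUND_DOWN

# US EPA 24-h PM2.5 breakpoints (µg/m³) in effect before the 2024 revision
# (assumption): (C_lo, C_hi, I_lo, I_hi)
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def pm25_sub_index(conc):
    """Piecewise-linear AQI sub-index for a 24-h average PM2.5 concentration."""
    # Truncate to one decimal place, as the EPA procedure does for PM2.5
    c = float(Decimal(str(conc)).quantize(Decimal("0.1"), rounding=ROUND_DOWN))
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            index = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
            return math.floor(index + 0.5)  # round half up to an integer
    raise ValueError(f"PM2.5 concentration {conc} is outside the breakpoint table")


print(pm25_sub_index(51))  # 139
```

For an FSP value of 51 (the second row of the preview above) the sketch gives 139, consistent with the `Decimal('139')` that `aqi.to_aqi` returns further down. Strictly, the input should be a 24-hour average rather than a single hourly reading.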

Since a well-documented Python package for calculating the US AQI exists, we use it for now. The library is called `python-aqi`; its source code and documentation can be found [on GitHub](https://github.com/hrbonz/python-aqi).

Install it from the command line with `pip install python-aqi`, and import it with `import aqi`.

In [40]:
import aqi

### Terminology
Carbon Monoxide = CO  
Fine Suspended Particulates = FSP = PM25  
Nitrogen Dioxide = NO2  
Nitrogen Oxides = NOX  
Ozone = O3  
Respirable Suspended Particulates = RSP = PM10  
Sulphur Dioxide = SO2  


### AQI components
To compute the AQI we need, in addition to the hourly measurements:
* 8h average of the Ozone concentration
* 24h average of the PM25 concentration
* 24h average of the PM10 concentration
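A minimal pandas sketch of these averages, assuming the column names shown in the data preview above, that `DATE` is day/month/year and `HOUR` (1 to 24) marks the end of the hour, and that each station's rows form a gap-free hourly series (the windows count rows, not clock time). The helper name `add_rolling_means` and the 75 % completeness threshold passed as `min_periods` are our own assumptions.

```python
import pandas as pd


def add_rolling_means(df):
    """Return a copy of df with 8-h O3 and 24-h FSP/RSP rolling means per station."""
    out = df.copy()
    # Assumption: DATE is day/month/year and HOUR 1..24 is the end of the hour
    out["time"] = pd.to_datetime(out["DATE"], format="%d/%m/%Y") + pd.to_timedelta(
        out["HOUR"], unit="h"
    )
    out = out.sort_values(["STATION", "time"]).reset_index(drop=True)
    # (column, window in hours, minimum valid hours); 75 % completeness is assumed
    for col, window, min_valid in [("O3", 8, 6), ("FSP", 24, 18), ("RSP", 24, 18)]:
        values = pd.to_numeric(out[col], errors="coerce")  # non-numeric entries -> NaN
        out[f"{col}_{window}H"] = values.groupby(out["STATION"]).transform(
            lambda s: s.rolling(window, min_periods=min_valid).mean()
        )
    return out
```

Applied to the combined frame, `add_rolling_means(dfa)` adds the columns `O3_8H`, `FSP_24H` and `RSP_24H`.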

When reporting current air quality, a plain average of the past 24 hours reacts too slowly to changing conditions,
so the US EPA uses a weighted averaging scheme called [NowCast](https://en.wikipedia.org/wiki/NowCast_(air_quality_index%29).
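As we read the description linked above, the NowCast for particulate matter (the ozone variant differs) takes the last 12 hourly concentrations c_1 (most recent) to c_12, computes a weight factor w = c_min / c_max over those hours, raises w to 0.5 if it is smaller, and reports sum(w^(i-1) * c_i) / sum(w^(i-1)) over the valid hours, provided at least two of the three most recent hours are valid. The sketch below follows that reading; `nowcast_pm` is our own hypothetical helper, and its result would then go through the usual AQI breakpoints.

```python
def nowcast_pm(hourly):
    """NowCast for PM from up to 12 hourly values, most recent first (None = missing)."""
    recent = list(hourly)[:12]
    # Require at least 2 of the 3 most recent hours to be valid
    if sum(c is not None for c in recent[:3]) < 2:
        return None
    valid = [c for c in recent if c is not None]
    c_min, c_max = min(valid), max(valid)
    # Weight factor: min/max ratio, floored at 0.5 for particulate matter
    weight = max(c_min / c_max, 0.5) if c_max > 0 else 1.0
    numerator = denominator = 0.0
    for i, c in enumerate(recent):  # i = 0 for the most recent hour
        if c is None:
            continue  # missing hours are skipped but keep their position in the weights
        numerator += weight ** i * c
        denominator += weight ** i
    return numerator / denominator


print(nowcast_pm([20] * 12))  # 20.0
```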

TODO: NowCast computation

For now we use the observed values and calculate the true 24h averages, which are of course not the same as the momentary AQI values reported in real time.


## Hong Kong AQHI
AQHI (Air Quality Health Index) is a measure designed to signal the expected health
effects of the current pollutant concentrations.

A study report by a CUHK team is available [online](http://www.aqhi.gov.hk/pdf/related_websites/APIreview_report.pdf), and the HK Environmental Protection Department runs a [general portal in English](http://www.aqhi.gov.hk/en.html).

The Hong Kong AQHI is based on its [Canadian counterpart](https://en.wikipedia.org/wiki/Air_Quality_Health_Index_(Canada)), but the model parameters have been recalibrated for local conditions.

The model for computing the AQHI is shown in full [in the EPD's FAQ](http://www.aqhi.gov.hk/en/what-is-aqhi/faqs.html).
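Written out, the model in the EPD FAQ computes a percentage added health risk (%AR) for each pollutant from its 3-hour moving average concentration $x_i$ and regression coefficient $\beta_i$, and sums them, counting only the larger of the two particulate terms:

$$
\begin{aligned}
\%AR_i &= 100\left(e^{\beta_i x_i} - 1\right), \qquad i \in \{\mathrm{NO_2}, \mathrm{SO_2}, \mathrm{O_3}, \mathrm{PM_{10}}, \mathrm{PM_{2.5}}\} \\
\%AR_{\text{total}} &= \%AR_{\mathrm{NO_2}} + \%AR_{\mathrm{SO_2}} + \%AR_{\mathrm{O_3}} + \max\left(\%AR_{\mathrm{PM_{10}}}, \%AR_{\mathrm{PM_{2.5}}}\right)
\end{aligned}
$$

The total %AR is then mapped onto the AQHI's 1 to 10+ scale using the banding published by the EPD.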


In [85]:
# Functions for computing the AQHI's total added health risk (%AR)
def added_risk(beta, concentration):
    '''generic added risk function: percentage added health risk (%AR)
    of a pollutant concentration'''
    ar = math.expm1(beta * concentration) * 100
    return ar

def aqhi_func(conc, risk_func=added_risk):
    '''conc lists the 3h moving averages of NO2, SO2, O3, PM10 and PM25
    (in that order); returns the total %AR, still to be banded into the AQHI'''
    # constants (regression coefficients), in the same order as conc
    pollutants = ["NO2", "SO2", "O3", "PM10", "PM25"]
    betas = [0.0004462559, 0.0001393235, 0.0005116328, 0.0002821751, 0.0002180567]
    ars = list(map(risk_func, betas, conc))
    # NO2 + SO2 + O3, plus the larger of the PM10 and PM25 risks
    ar_total = sum(ars[0:3]) + max(ars[3:5])
    return ar_total


In [86]:
conc_test = [115, 15, 4, 65, 40]
print(round(aqhi_func(conc_test, added_risk), 2))

7.53
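To apply the same calculation to the whole dataset, the 3-hour moving averages can be taken per station with pandas. A minimal sketch, assuming the column names of `dfa` (RSP as PM10, FSP as PM2.5), that `DATE` is day/month/year with `HOUR` 1 to 24 marking the end of the hour, and that each station's series is gap-free; `add_total_ar` is a hypothetical helper name, and the coefficients are those of the `aqhi_func` cell.

```python
import numpy as np
import pandas as pd

# Regression coefficients keyed by the data's column names (RSP = PM10, FSP = PM2.5)
BETAS = {"NO2": 0.0004462559, "SO2": 0.0001393235, "O3": 0.0005116328,
         "RSP": 0.0002821751, "FSP": 0.0002180567}


def add_total_ar(df):
    """Return a sorted copy of df with a TOTAL_AR column built from 3-h moving averages."""
    out = df.copy()
    # Assumption: DATE is day/month/year and HOUR 1..24 is the end of the hour
    out["time"] = pd.to_datetime(out["DATE"], format="%d/%m/%Y") + pd.to_timedelta(
        out["HOUR"], unit="h"
    )
    out = out.sort_values(["STATION", "time"]).reset_index(drop=True)
    ar = {}
    for col, beta in BETAS.items():
        conc = pd.to_numeric(out[col], errors="coerce")  # non-numeric entries -> NaN
        ma3 = conc.groupby(out["STATION"]).transform(lambda s: s.rolling(3).mean())
        ar[col] = np.expm1(beta * ma3) * 100
    # np.fmax takes the larger PM risk, ignoring a missing one
    out["TOTAL_AR"] = ar["NO2"] + ar["SO2"] + ar["O3"] + np.fmax(ar["RSP"], ar["FSP"])
    return out
```

The first two hours of each station get `NaN`, since a 3-hour average is not yet available there.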


In [90]:
# Test the aqi library's to_aqi function: PM2.5 from the data (µg/m³),
# plus example values for PM10 (µg/m³) and 8h ozone (ppm)
aqi.to_aqi([
    (aqi.POLLUTANT_PM25, str(df['FSP'].iloc[1])),
    (aqi.POLLUTANT_PM10, '24'),
    (aqi.POLLUTANT_O3_8H, '0.087')
])

Decimal('139')
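`python-aqi` takes ozone in ppm, whereas the EPD data appear to be mass concentrations. Assuming the EPD ozone values are in µg/m³ and taking 25 °C and 1 atm as reference conditions (both assumptions), they can be converted with ppm = µg/m³ × 24.45 / (1000 × M), where 24.45 L/mol is the molar volume of an ideal gas under those conditions and M = 48.00 g/mol for O3. `o3_ugm3_to_ppm` is a hypothetical helper.

```python
MOLAR_VOLUME_L = 24.45  # L/mol, ideal gas at 25 °C and 1 atm (assumed reference conditions)
O3_MOLAR_MASS = 48.00   # g/mol


def o3_ugm3_to_ppm(conc_ugm3):
    """Convert an ozone concentration from µg/m³ to ppm."""
    ppb = conc_ugm3 * MOLAR_VOLUME_L / O3_MOLAR_MASS
    return ppb / 1000


print(round(o3_ugm3_to_ppm(100), 4))  # 0.0509
```

The converted value could then be passed to `aqi.to_aqi` as a string, e.g. `str(round(o3_ugm3_to_ppm(value), 3))` for `aqi.POLLUTANT_O3_8H`.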

### Data for comparison

For comparison and validation, we need to download the published hourly AQHI data and the corresponding US AQI data.

# References

1. [Development and Application of a Next Generation Air Sensor Network for the Hong Kong Marathon 2015 Air Quality Monitoring](https://pdfs.semanticscholar.org/76b5/d17b3d917cf5b9a16a6afd806d1c8f3ba7bc.pdf)
2. [How is the hourly AQHI calculated?](http://www.aqhi.gov.hk/en/what-is-aqhi/faqs.html#e_02)
3. [Computing the AQI](https://en.wikipedia.org/wiki/Air_quality_index#Computing_the_AQI)
4. [NowCast (air quality index)](https://en.wikipedia.org/wiki/NowCast_(air_quality_index%29)
5. [Inquire and Download Air Quality Monitoring Data](https://cd.epic.epd.gov.hk/EPICDI/air/station/)
6. [A library to convert between AQI value and pollutant concentration (µg/m³ or ppm)](https://github.com/hrbonz/python-aqi/blob/master/aqi/algos/epa.py)
7. [Past AQHI Record for Download](http://www.aqhi.gov.hk/en/aqhi/statistics-of-aqhi/past-aqhi-records.html)