This notebook shows the steps behind the development of the package and complements the tests.

Source of data -> https://www.londonair.org.uk/LondonAir/Default.aspx

Most of this package's functionality is parsing the raw data, which consists of nested dicts, and returning data structures that are more convenient to work with.

The hourly API returns data in the following nested format (see `hourly.json`):
```
LocalAuthority (Borough)
    Site
        Species (e.g. NO2, PM10)
```

Not all boroughs have sites, and each site may report a different combination of species.
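A minimal sketch of walking this hierarchy, using a hand-made sample rather than a live response. The helpers `as_list` and `iter_species` are hypothetical and not part of `london_air_quality`. The `@LocalAuthorityName` key and the idea that `Site` or `Species` may be missing or a single dict instead of a list are assumptions about the feed.

```python
def as_list(value):
    """Normalise a missing value, a single dict, or a list into a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def iter_species(raw):
    """Yield (borough, site_code, species_code) for every species in the response."""
    authorities = as_list(raw["HourlyAirQualityIndex"].get("LocalAuthority"))
    for authority in authorities:
        borough = authority.get("@LocalAuthorityName")  # assumed key name
        for site in as_list(authority.get("Site")):  # boroughs without sites yield nothing
            for species in as_list(site.get("Species")):
                yield borough, site["@SiteCode"], species["@SpeciesCode"]


# Hand-made sample, not captured from the API.
sample = {
    "HourlyAirQualityIndex": {
        "LocalAuthority": [
            {
                "@LocalAuthorityName": "Barking and Dagenham",
                "Site": [
                    {"@SiteCode": "BG1",
                     "Species": [{"@SpeciesCode": "NO2"}, {"@SpeciesCode": "SO2"}]},
                    {"@SiteCode": "BG2", "Species": {"@SpeciesCode": "NO2"}},
                ],
            },
            {"@LocalAuthorityName": "Borough without sites"},
        ]
    }
}

rows = list(iter_species(sample))
```

Normalising every level to a list keeps the three loops identical regardless of how many sites or species a borough reports.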

In [21]:
import sys
import os
import requests
import json
sys.path.insert(0, os.path.dirname(os.path.abspath('.'))) ## get parent dir
import london_air_quality as laq

# Optionals
import pandas as pd # not a dependency
import matplotlib.pyplot as plt
%matplotlib inline

## Capture API data for tests

In [22]:
url = laq.LAQ_HOURLY_URL

In [35]:
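response = requests.get(url)
response.raise_for_status()  # fail early on an HTTP error
response_json = response.json()
# response_json = str(response_json).strip("'<>() ").replace('\'', '\"') # strip some bad characters

# Save the parsed response so the tests can run against a fixed sample
with open('hourly.json', 'w') as f:
    json.dump(response_json, f)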

In [33]:
with open('hourly.json') as json_file:
    data = json.load(json_file)

In [2]:
# AUTHORITIES = ALL_AUTHORITIES # Check functionality with all 

In [37]:
try:
    hourly_data_raw = laq.request_data(laq.LAQ_HOURLY_URL)
except laq.LondonAirQualityException as exc:
    print(exc)

## Sites parsing
Parse the site data for a single local authority.

In [38]:
example_site = hourly_data_raw['HourlyAirQualityIndex']['LocalAuthority'][0]['Site']
example_site

[{'@BulletinDate': '2019-09-26 11:00:00',
  '@SiteCode': 'BG1',
  '@SiteName': 'Barking and Dagenham - Rush Green',
  '@SiteType': 'Suburban',
  '@Latitude': '51.563752',
  '@Longitude': '0.177891',
  '@LatitudeWGS84': '6721627.34498',
  '@LongitudeWGS84': '19802.7355367',
  '@OwnerID': '1',
  'Species': [{'@SpeciesCode': 'NO2',
    '@SpeciesDescription': 'Nitrogen Dioxide',
    '@AirQualityIndex': '0',
    '@AirQualityBand': 'No data',
    '@IndexSource': 'Measurement'},
   {'@SpeciesCode': 'SO2',
    '@SpeciesDescription': 'Sulphur Dioxide',
    '@AirQualityIndex': '0',
    '@AirQualityBand': 'No data',
    '@IndexSource': 'Measurement'}]},
 {'@BulletinDate': '2019-09-26 11:00:00',
  '@SiteCode': 'BG2',
  '@SiteName': 'Barking and Dagenham - Scrattons Farm',
  '@SiteType': 'Suburban',
  '@Latitude': '51.529389',
  '@Longitude': '0.132857',
  '@LatitudeWGS84': '6715476.18683',
  '@LongitudeWGS84': '14789.5735883',
  '@OwnerID': '1',
  'Species': [{'@SpeciesCode': 'NO2',
    '@SpeciesD

In [39]:
laq.parse_site(example_site)

[{'updated': '2019-09-26 11:00:00',
  'latitude': '51.563752',
  'longitude': '0.177891',
  'site_code': 'BG1',
  'site_name': 'Rush Green',
  'site_type': 'Suburban',
  'pollutants': ['no_species_data'],
  'pollutants_status': 'no_species_data',
  'number_of_pollutants': 0},
 {'updated': '2019-09-26 11:00:00',
  'latitude': '51.529389',
  'longitude': '0.132857',
  'site_code': 'BG2',
  'site_name': 'Scrattons Farm',
  'site_type': 'Suburban',
  'pollutants': [{'description': 'Nitrogen Dioxide',
    'code': 'NO2',
    'quality': 'Low',
    'index': '1',
    'summary': 'NO2 is Low'},
   {'description': 'PM10 Particulate',
    'code': 'PM10',
    'quality': 'Low',
    'index': '1',
    'summary': 'PM10 is Low'}],
  'pollutants_status': 'Low',
  'number_of_pollutants': 2}]
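The mapping from a raw site dict to the flatter record above could look roughly like the following sketch. It is not the implementation of `laq.parse_site`: the function `summarise_site` is hypothetical, and two rules in it are assumptions inferred from the printed output, namely that the short site name is the part after the first `" - "` and that the overall status is the band of the pollutant with the highest index.

```python
def summarise_site(raw_site, pollutants):
    """Build a flat site record from a raw site dict and its already-parsed pollutants."""
    # Assumption: names look like "<borough> - <site>"; keep the part after " - ".
    _, sep, short_name = raw_site["@SiteName"].partition(" - ")
    site_name = short_name if sep else raw_site["@SiteName"]

    count = len(pollutants)
    if count:
        # Assumption: the site status is the band of the highest-index pollutant.
        worst = max(pollutants, key=lambda p: int(p["index"]))
        status = worst["quality"]
        listed = pollutants
    else:
        status = "no_species_data"
        listed = ["no_species_data"]

    return {
        "updated": raw_site["@BulletinDate"],
        "latitude": raw_site["@Latitude"],
        "longitude": raw_site["@Longitude"],
        "site_code": raw_site["@SiteCode"],
        "site_name": site_name,
        "site_type": raw_site["@SiteType"],
        "pollutants": listed,
        "pollutants_status": status,
        "number_of_pollutants": count,
    }


raw_bg1 = {
    "@BulletinDate": "2019-09-26 11:00:00",
    "@SiteCode": "BG1",
    "@SiteName": "Barking and Dagenham - Rush Green",
    "@SiteType": "Suburban",
    "@Latitude": "51.563752",
    "@Longitude": "0.177891",
}
record = summarise_site(raw_bg1, [])  # no usable species for this site
```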

## Species parsing
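Parse the species data for a single site. In this example both species report 'No data', so the parsed result is a pair of empty lists.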

In [40]:
example_species = example_site[0]['Species']
example_species

[{'@SpeciesCode': 'NO2',
  '@SpeciesDescription': 'Nitrogen Dioxide',
  '@AirQualityIndex': '0',
  '@AirQualityBand': 'No data',
  '@IndexSource': 'Measurement'},
 {'@SpeciesCode': 'SO2',
  '@SpeciesDescription': 'Sulphur Dioxide',
  '@AirQualityIndex': '0',
  '@AirQualityBand': 'No data',
  '@IndexSource': 'Measurement'}]

In [41]:
laq.parse_species(example_species)

([], [])
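A rough sketch of the kind of filtering and reshaping involved, assuming that species whose band is 'No data' are dropped. The function `sketch_parse_species` is hypothetical and returns a single list, whereas `laq.parse_species` returns a pair of lists; the output keys mirror the `pollutants` entries printed earlier.

```python
def sketch_parse_species(species_list):
    """Drop species reporting 'No data' and reshape the remaining ones."""
    parsed = []
    for species in species_list:
        band = species["@AirQualityBand"]
        if band == "No data":  # assumption: such entries carry no usable reading
            continue
        code = species["@SpeciesCode"]
        parsed.append({
            "description": species["@SpeciesDescription"],
            "code": code,
            "quality": band,
            "index": species["@AirQualityIndex"],
            "summary": f"{code} is {band}",
        })
    return parsed


no_data = [
    {"@SpeciesCode": "NO2", "@SpeciesDescription": "Nitrogen Dioxide",
     "@AirQualityIndex": "0", "@AirQualityBand": "No data"},
    {"@SpeciesCode": "SO2", "@SpeciesDescription": "Sulphur Dioxide",
     "@AirQualityIndex": "0", "@AirQualityBand": "No data"},
]
measured = [
    {"@SpeciesCode": "NO2", "@SpeciesDescription": "Nitrogen Dioxide",
     "@AirQualityIndex": "1", "@AirQualityBand": "Low"},
]

empty_result = sketch_parse_species(no_data)
low_result = sketch_parse_species(measured)
```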

## Package Usage

In [42]:
hourly_data = laq.parse_hourly_response(hourly_data_raw)

In [43]:
hourly_data['Barking and Dagenham']

[{'updated': '2019-09-26 11:00:00',
  'latitude': '51.563752',
  'longitude': '0.177891',
  'site_code': 'BG1',
  'site_name': 'Rush Green',
  'site_type': 'Suburban',
  'pollutants': ['no_species_data'],
  'pollutants_status': 'no_species_data',
  'number_of_pollutants': 0},
 {'updated': '2019-09-26 11:00:00',
  'latitude': '51.529389',
  'longitude': '0.132857',
  'site_code': 'BG2',
  'site_name': 'Scrattons Farm',
  'site_type': 'Suburban',
  'pollutants': [{'description': 'Nitrogen Dioxide',
    'code': 'NO2',
    'quality': 'Low',
    'index': '1',
    'summary': 'NO2 is Low'},
   {'description': 'PM10 Particulate',
    'code': 'PM10',
    'quality': 'Low',
    'index': '1',
    'summary': 'PM10 is Low'}],
  'pollutants_status': 'Low',
  'number_of_pollutants': 2}]

## Dataframe
We can also flatten the data into a list of dicts that can be turned into a pandas DataFrame. The package itself does not build the DataFrame, to avoid a pandas dependency.

In [11]:
df = pd.DataFrame(laq.get_hourly_data_flat(hourly_data))

In [12]:
df.head()

| | borough | code | description | index | latitude | longitude | quality | site_code | site_name | summary | updated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | NO2 | Nitrogen Dioxide | 1 | 51.563752 | 0.177891 | Low | BG1 | Rush Green | NO2 is Low | 2019-09-26 09:00:00 |
| 1 | Barking and Dagenham | SO2 | Sulphur Dioxide | 1 | 51.563752 | 0.177891 | Low | BG1 | Rush Green | SO2 is Low | 2019-09-26 09:00:00 |
| 2 | Barking and Dagenham | NO2 | Nitrogen Dioxide | 1 | 51.529389 | 0.132857 | Low | BG2 | Scrattons Farm | NO2 is Low | 2019-09-26 09:00:00 |
| 3 | Barking and Dagenham | PM10 | PM10 Particulate | 1 | 51.529389 | 0.132857 | Low | BG2 | Scrattons Farm | PM10 is Low | 2019-09-26 09:00:00 |
| 4 | Bexley | NO2 | Nitrogen Dioxide | 1 | 51.4946486813055 | 0.137279111232178 | Low | BQ7 | Belvedere West | NO2 is Low | 2019-09-26 09:00:00 |
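One way such a flat list of rows could be produced from the parsed, per-borough structure is sketched below. The function `flatten_hourly` is hypothetical and is not `laq.get_hourly_data_flat`; skipping the `'no_species_data'` placeholder is an assumption about how sites without readings should be treated.

```python
def flatten_hourly(hourly):
    """Return one dict per (borough, site, pollutant) combination."""
    rows = []
    for borough, sites in hourly.items():
        for site in sites:
            for pollutant in site["pollutants"]:
                if not isinstance(pollutant, dict):
                    continue  # skip the 'no_species_data' placeholder string
                row = {
                    "borough": borough,
                    "site_code": site["site_code"],
                    "site_name": site["site_name"],
                    "latitude": site["latitude"],
                    "longitude": site["longitude"],
                    "updated": site["updated"],
                }
                row.update(pollutant)  # adds code, description, index, quality, summary
                rows.append(row)
    return rows


parsed = {
    "Barking and Dagenham": [
        {"updated": "2019-09-26 11:00:00", "latitude": "51.563752",
         "longitude": "0.177891", "site_code": "BG1", "site_name": "Rush Green",
         "pollutants": ["no_species_data"]},
        {"updated": "2019-09-26 11:00:00", "latitude": "51.529389",
         "longitude": "0.132857", "site_code": "BG2", "site_name": "Scrattons Farm",
         "pollutants": [
             {"description": "Nitrogen Dioxide", "code": "NO2",
              "quality": "Low", "index": "1", "summary": "NO2 is Low"},
             {"description": "PM10 Particulate", "code": "PM10",
              "quality": "Low", "index": "1", "summary": "PM10 is Low"},
         ]},
    ]
}

flat_rows = flatten_hourly(parsed)
```

Because every row is a plain dict with the same keys, the list can be passed directly to `pd.DataFrame` when pandas is available.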


In [13]:
boroughs_with_data = df['borough'].unique()
print(boroughs_with_data)
len(boroughs_with_data)

['Barking and Dagenham' 'Bexley' 'Brent' 'Camden' 'City of London'
 'Croydon' 'Ealing' 'Enfield' 'Greenwich' 'Hackney' 'Haringey' 'Harrow'
 'Havering' 'Hillingdon' 'Islington' 'Kensington and Chelsea' 'Kingston'
 'Lambeth' 'Lewisham' 'Merton' 'Redbridge' 'Richmond' 'Southwark' 'Sutton'
 'Tower Hamlets' 'Wandsworth' 'Westminster']


27

In [17]:
df['code'].value_counts()

NO2     80
PM10    62
PM25    18
O3      17
SO2      5
Name: code, dtype: int64

In [18]:
df['quality'].value_counts()

Low    182
Name: quality, dtype: int64

In [19]:
df['index'].value_counts()

1    160
2     22
Name: index, dtype: int64
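Since pandas is optional here, the same kind of summary can be sketched with the standard library alone. The rows below are a hand-copied subset of the first five rows of the table shown earlier, reduced to the relevant keys, so the counts differ from the full results above.

```python
from collections import Counter

# Subset of the flat rows, copied by hand for illustration.
rows = [
    {"borough": "Barking and Dagenham", "code": "NO2", "quality": "Low", "index": "1"},
    {"borough": "Barking and Dagenham", "code": "SO2", "quality": "Low", "index": "1"},
    {"borough": "Barking and Dagenham", "code": "NO2", "quality": "Low", "index": "1"},
    {"borough": "Barking and Dagenham", "code": "PM10", "quality": "Low", "index": "1"},
    {"borough": "Bexley", "code": "NO2", "quality": "Low", "index": "1"},
]

boroughs_with_data = sorted({row["borough"] for row in rows})  # like df['borough'].unique()
code_counts = Counter(row["code"] for row in rows)              # like df['code'].value_counts()
quality_counts = Counter(row["quality"] for row in rows)

for code, count in code_counts.most_common():
    print(code, count)
```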