# 3.2 Downloading the PM2.5 sample data set (Step-1)

This Jupyter notebook retrieves particulate matter (PM2.5) sensor data from the openSenseMap platform (https://docs.opensensemap.org/) and saves it as a CSV file. The dataset is used in Step 2 and Step 3 of our exercise to mimic sensor data streams that are sent to a sensor data broker, which then forwards the messages to applications subscribed to a specific topic.

You must complete the following 4 tasks in order to successfully download the data to a CSV file:

1. Find the lat/lon value from Google Maps for the region of Münster, Germany 
2. Draw/Edit the Area of Interest (AOI) over Münster, Germany
3. Limit the spatial extent of your API request for data to your AOI coordinates (BBOX)
4. Fetch PM2.5 readings for 7 days, starting from **15 Jan, 2024**

In [1]:
## Import Libraries

import requests # for sending HTTP requests to the API endpoints
import geojson # to read geojson files
from geojson import dump
import pandas as pd # to handle tabular data
from ipyleaflet import Map, basemaps, WidgetControl, Marker, basemap_to_tiles, DrawControl, GeoJSON # widget to enable map interactions
from ipywidgets import IntSlider, ColorPicker, jslink # widget to enable map interactions
import geopandas as gpd # to read vector files with spatial information (e.g. GeoJSON)

In [2]:
## Defining the Service Endpoints that we'll use

sensebox_url = "https://api.opensensemap.org/boxes?"
sensebox_data_url = "https://api.opensensemap.org/statistics/descriptive?"

phenomenon = "PM2.5"


### Load Map Widget

The map widget lets us perform interactive GIS/map operations programmatically within the Jupyter environment.
Run the following cell to visualise and test the map widget. The map is currently centered on [0, 0].

In [3]:
center = (0, 0)

m = Map(center=center, zoom=15)
marker = Marker(location=center, draggable=True)
m.add_layer(marker);
display(m)

Map(center=[0, 0], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_text'…

**By the way:**
Do you know the name given to this point of zero degrees latitude and zero degrees longitude, i.e., where the prime meridian and the equator intersect?

### TASK 1: Find the lat/lon value from Google Maps for the region of Münster, Germany

Head over to https://maps.google.com/ and search for or navigate to "Münster, Germany". Then find the lat/lon values in the **URL** shown in your browser.

*Hint: It should start with "@xx.xxxxxxx,xx.xxxxxxx"*

Copy the two numbers and use them to center the map.
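
If you would rather not copy the numbers by hand, the "@lat,lon" pattern from the hint can be pulled out of the URL with a regular expression. This is only a sketch: the function name `latlon_from_maps_url` and the example URL are illustrative assumptions, and the exact shape of Google Maps URLs may differ in your browser.

```python
import re

def latlon_from_maps_url(url):
    """Return (lat, lon) parsed from the '@lat,lon' part of a map URL."""
    match = re.search(r"@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)", url)
    if match is None:
        raise ValueError("no '@lat,lon' segment found in URL")
    return float(match.group(1)), float(match.group(2))

# Illustrative URL (assumed shape); paste the one from your own browser instead.
example_url = "https://www.google.com/maps/@51.952187,7.623206,12z"
lat, lng = latlon_from_maps_url(example_url)
print(lat, lng)
```

The resulting `lat` and `lng` can then be used as the map center in the next cell.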

In [4]:
lat =  
lng = 

center = (lat, lng)

m = Map(center=center, zoom=11)
marker = Marker(location=center, draggable=True)
display(m)

Map(center=[51.952187, 7.623206], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', …

### TASK 2: Draw/Edit the Area of Interest (AOI) over Münster, Germany

Our objective is to capture PM2.5 sensors installed in Münster.
Make sure you cover as much of Münster as you can, but do not make the AOI so large that it slows down the API request!

In [5]:
## Function to save AOI as GeoJSON

feature_collection = {
    'type': 'FeatureCollection',
    'features': []
}

def handle_draw(self, action, geo_json):    
    feature_collection['features'].append(geo_json)

In [7]:
## Configure draw properties on map

draw_control = DrawControl()
draw_control.rectangle = {
    "shapeOptions": {
        "fillColor": "#fca45d",
        "color": "#fca45d",
        "fillOpacity": 0.5
    }
}

center = (lat, lng)

m = Map(center=center, zoom=11)
marker = Marker(location=center, draggable=True)

m.add_control(draw_control)

draw_control.on_draw(handle_draw)
display(m)


Map(center=[51.952187, 7.623206], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', …

In [8]:
## Save and display your GeoJSON

munster_aoi = feature_collection

center = (lat, lng)

m = Map(center=center, zoom=11)
marker = Marker(location=center, draggable=True)

geo_json = GeoJSON(
    data=munster_aoi,
    style={
        'opacity': 1, 'dashArray': '9', 'fillOpacity': 0.4, 'weight': 1, 'fillColor': '#fca45d'
    }
)

m.add_layer(geo_json)
display(m)

Map(center=[51.952187, 7.623206], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', …

In [9]:
## Save the GeoJSON

with open('../data/aoi_opensensemap.geojson', 'w') as f:
    dump(munster_aoi, f)

In [10]:
## Get the Bounding Box Coordinates of AOI

gdf = gpd.read_file('../data/aoi_opensensemap.geojson')
munster_bbox = gdf.total_bounds  # array of (minx, miny, maxx, maxy), i.e. (west, south, east, north)
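
To make clear what `total_bounds` returns, here is a minimal pure-Python sketch that derives the same (min lon, min lat, max lon, max lat) tuple from a GeoJSON FeatureCollection. The helper name `bbox_from_feature_collection` and the sample rectangle are assumptions for illustration; the rectangle reuses the example coordinates from the sample URL shown later in this notebook, not your own AOI.

```python
def bbox_from_feature_collection(fc):
    """Return (min_lon, min_lat, max_lon, max_lat) over all Polygon features."""
    lons, lats = [], []
    for feature in fc["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue  # this sketch only handles polygons such as drawn rectangles
        for ring in geometry["coordinates"]:
            for lon, lat, *_ in ring:  # GeoJSON positions are [lon, lat, (height)]
                lons.append(lon)
                lats.append(lat)
    if not lons:
        raise ValueError("no polygon coordinates found")
    return min(lons), min(lats), max(lons), max(lats)

# Hypothetical AOI: one rectangle, as the draw control would produce it.
sample_aoi = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [7.547844, 51.907801], [7.547844, 51.999208],
                [7.689313, 51.999208], [7.689313, 51.907801],
                [7.547844, 51.907801],
            ]],
        },
    }],
}

bbox = bbox_from_feature_collection(sample_aoi)
bbox_string = ",".join(map(str, bbox))
print(bbox_string)
```

The comma-separated string has the west,south,east,north order used in the sample `bbox` parameter.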

### Get Senseboxes Installed in Münster, Germany

In [11]:
## Convert bbox coordinates to string representation
geometry = map(str, munster_bbox)

In [12]:
## DEFINE PARAMETERS ##

from_date = "2024-01-01T00:00:00.000Z"
bbox_geometry = ','.join(geometry)

### TASK 3: Limit the spatial extent of your API request for data to your AOI coordinates (BBOX)

Add the bbox parameter after the "+" sign.

In [13]:
sensebox_url_params = "date="+from_date+"&phenomenon="+phenomenon+"&bbox="+

In [15]:
## Define URL endpoint

pull_senseboxes_url = sensebox_url+sensebox_url_params


The final URL should look something like this:

*https://api.opensensemap.org/boxes?date=2024-01-01T00:00:00.000Z&phenomenon=PM2.5&bbox=7.547844,51.907801,7.689313,51.999208*
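
As an alternative to concatenating strings by hand, the query string can be assembled from a dictionary with the standard library. This is a sketch, not the notebook's own method; the bbox value is copied from the sample URL above and stands in for your own AOI.

```python
from urllib.parse import urlencode

base_url = "https://api.opensensemap.org/boxes?"
params = {
    "date": "2024-01-01T00:00:00.000Z",
    "phenomenon": "PM2.5",
    # Example bbox copied from the sample URL; yours comes from your AOI.
    "bbox": "7.547844,51.907801,7.689313,51.999208",
}

# safe=",:" keeps commas and colons readable instead of percent-encoding them.
url = base_url + urlencode(params, safe=",:")
print(url)
```

Building the URL this way makes a missing "&" or "=" less likely, and it escapes any special characters in the values.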

In [17]:
response = requests.get(pull_senseboxes_url)
json_output = response.json()

## We'll use only 10 senseboxes to limit the amount of data
json_output = json_output[:10]
json_output

# Retrieving the sensebox data may take a while (30 seconds to a few minutes, depending on the network and server workload).

[{'_id': '591f578c51d34600116a8ea5',
  'createdAt': '2022-03-30T11:25:43.288Z',
  'updatedAt': '2024-08-20T18:15:03.963Z',
  'name': 'Wetterstation Erpho',
  'exposure': 'outdoor',
  'model': 'homeWifiFeinstaub',
  'sensors': [{'title': 'Temperatur',
    'unit': '°C',
    'sensorType': 'HDC1008',
    'icon': 'osem-thermometer',
    '_id': '591f578c51d34600116a8ea6',
    'lastMeasurement': {'createdAt': '2024-08-20T18:15:03.954Z',
     'value': '25.87'}},
   {'title': 'rel. Luftfeuchte',
    'unit': '%',
    'sensorType': 'HDC1008',
    'icon': 'osem-humidity',
    '_id': '591f578c51d34600116a8ea7',
    'lastMeasurement': {'createdAt': '2024-08-20T18:15:03.954Z',
     'value': '62.08'}},
   {'title': 'Luftdruck',
    'unit': 'hPa',
    'sensorType': 'BMP280',
    'icon': 'osem-barometer',
    '_id': '591f578c51d34600116a8ea8',
    'lastMeasurement': {'createdAt': '2024-08-20T18:15:03.954Z',
     'value': '1002.20'}},
   {'title': 'Beleuchtungsstärke',
    'unit': 'lx',
    'sensorType':

In [18]:
senseboxes_list = []
sensebox_coords = {}

## Extract the IDs of the senseboxes

for sensebox in json_output:
    senseboxes_list.append(sensebox['_id'])
    coordinates = sensebox['loc'][0]['geometry']['coordinates']
    sensebox_coords[sensebox['_id']] = coordinates[:2]
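
The loop above assumes every box carries a `loc` entry. A slightly more defensive variant is sketched below. The function name `box_locations` and both sample dictionaries are hypothetical and only mirror the fields the notebook accesses; the `[:2]` slice is there because a GeoJSON position may carry a third value (height).

```python
def box_locations(boxes):
    """Map box ID -> (lon, lat), skipping boxes that have no location entry."""
    locations = {}
    for box in boxes:
        loc = box.get("loc") or []
        if not loc:
            continue  # no location information for this box
        lon, lat = loc[0]["geometry"]["coordinates"][:2]  # drop optional height
        locations[box["_id"]] = (lon, lat)
    return locations

# Hypothetical, trimmed-down boxes for illustration only.
sample_boxes = [
    {"_id": "591f578c51d34600116a8ea5",
     "loc": [{"geometry": {"type": "Point",
                           "coordinates": [7.645218, 51.96422, 0]}}]},
    {"_id": "hypothetical-box-without-loc"},
]

print(box_locations(sample_boxes))
```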


In [19]:
## Check the sensebox id numbers
senseboxes_list

['591f578c51d34600116a8ea5',
 '59ad958fd67eb50011b85f6d',
 '5a0c347b9fd3c2001111b701',
 '5abd221b850005001b1aff35',
 '5acfae2a223bd8001977b61e',
 '5b3e7f6f5dc1ec001be11cf1',
 '5d6e465a953683001a2b62c5',
 '5d91f4bb5f3de0001ab6bb78',
 '5e98843845f937001cf26c6d',
 '5f4542b8badf01001bd5cf24']

In [20]:
## Check locations of senseboxes
sensebox_coords = pd.DataFrame(sensebox_coords).T.reset_index()
sensebox_coords.rename(columns={'index': 'sensorId', 0: 'lon', 1: 'lat'}, inplace=True)
sensebox_coords


| | sensorId | lon | lat |
|---|---|---|---|
| 0 | 591f578c51d34600116a8ea5 | 7.645218 | 51.96422 |
| 1 | 59ad958fd67eb50011b85f6d | 7.635283 | 51.903004 |
| 2 | 5a0c347b9fd3c2001111b701 | 7.620606 | 51.921065 |
| 3 | 5abd221b850005001b1aff35 | 7.64153 | 51.973023 |
| 4 | 5acfae2a223bd8001977b61e | 7.646677 | 51.988501 |
| 5 | 5b3e7f6f5dc1ec001be11cf1 | 7.569078 | 51.994149 |
| 6 | 5d6e465a953683001a2b62c5 | 13.369128 | 52.520151 |
| 7 | 5d91f4bb5f3de0001ab6bb78 | 7.631939 | 51.954339 |
| 8 | 5e98843845f937001cf26c6d | 7.543988 | 51.993367 |
| 9 | 5f4542b8badf01001bd5cf24 | 7.723412 | 51.910651 |
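
One row in the table above (lon 13.369128, lat 52.520151) lies far east of the other boxes, so it can be worth checking the returned locations against your bounding box. The sketch below does that for two rows from the table. The bbox is the example from the sample URL and is an assumption; substitute the bounds of your own AOI.

```python
def inside_bbox(lon, lat, bbox):
    """True if the point lies within (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

# Example bbox taken from the sample URL (assumption; use your own AOI bounds).
example_bbox = (7.547844, 51.907801, 7.689313, 51.999208)

# Two rows copied from the table of sensebox locations.
boxes = {
    "591f578c51d34600116a8ea5": (7.645218, 51.96422),
    "5d6e465a953683001a2b62c5": (13.369128, 52.520151),
}

outside = [box_id for box_id, (lon, lat) in boxes.items()
           if not inside_bbox(lon, lat, example_bbox)]
print(outside)
```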


### Get PM2.5 Sensor Readings for the Senseboxes

In [21]:
## Define parameters for the new endpoint

senseboxes_list = ','.join(senseboxes_list)
operation = "arithmeticMean" ## Compute the mean of all values within each "window" time frame
window = "90000000" ## window length in ms (90000000 ms = 25 hours)
output_format = "tidy" ## Clean CSV Format
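
The `window` value is a duration in milliseconds. As a quick check of the arithmetic, a small helper (the name `window_ms` is hypothetical) can convert hours to the string the URL expects:

```python
MS_PER_HOUR = 60 * 60 * 1000  # 3,600,000 ms in one hour

def window_ms(hours):
    """Return a window length in milliseconds as a string for the URL."""
    return str(int(hours * MS_PER_HOUR))

print(window_ms(25))  # value used in this notebook
print(window_ms(24))  # a plain 24-hour day
```

With a 25-hour window, each aggregation window starts one hour later in the day than the previous one, which is why the `time_start` values in the results move from 11:00 to 12:00, 13:00 and so on.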

### TASK 4: Fetch PM2.5 readings for 7 days, starting from 15 Jan, 2024

The timestamps must be in **RFC 3339** notation, for example: *2024-01-15T00:00:00.000Z*

Note: from_date and to_date are inclusive.

The time range defined here lies in the past, so the data is no longer real-time. However, if we could pull data that was generated a few seconds ago, the functionality would be the same, and it would still be a "near real-time" app.
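
Timestamps in this notation can also be generated instead of typed by hand. The sketch below assumes a UTC start date and adds seven days to it; the helper name `to_rfc3339` is hypothetical. The resulting pair matches the dates that appear in the URL printed further down in this notebook.

```python
from datetime import datetime, timedelta, timezone

def to_rfc3339(dt):
    """Format an aware datetime as RFC 3339 in UTC with millisecond precision."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{dt.microsecond // 1000:03d}Z"

start = datetime(2024, 1, 15, tzinfo=timezone.utc)
end = start + timedelta(days=7)

example_from_date = to_rfc3339(start)
example_to_date = to_rfc3339(end)
print(example_from_date, example_to_date)
```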

In [22]:
from_date = 
to_date = 

In [23]:
sensebox_data_url_params = "boxId="+senseboxes_list+"&from-date="+from_date+"&to-date="+to_date+"&phenomenon="+phenomenon+"&operation="+operation+"&window="+window+"&format="+output_format+"&columns=boxId,boxName,phenomenon,sensorType,unit"

In [24]:
## Define URL endpoint

pull_readings_url = sensebox_data_url+sensebox_data_url_params
pull_readings_url

'https://api.opensensemap.org/statistics/descriptive?boxId=591f578c51d34600116a8ea5,59ad958fd67eb50011b85f6d,5a0c347b9fd3c2001111b701,5abd221b850005001b1aff35,5acfae2a223bd8001977b61e,5b3e7f6f5dc1ec001be11cf1,5d6e465a953683001a2b62c5,5d91f4bb5f3de0001ab6bb78,5e98843845f937001cf26c6d,5f4542b8badf01001bd5cf24&from-date=2024-01-15T00:00:00.000Z&to-date=2024-01-22T00:00:00.000Z&phenomenon=PM2.5&operation=arithmeticMean&window=90000000&format=tidy&columns=boxId,boxName,phenomenon,sensorType,unit'

The final URL should look something like this:

https://api.opensensemap.org/statistics/descriptive?boxId=5750220bed08f9680c6b4154,591f578c51d34600116a8ea5,599180be7e280a001044b837,59c67b5ed67eb50011666dbb,5a0c15289fd3c200110f3d33&from-date=2022-01-15T00:00:00.000Z&to-date=2022-01-21T00:00:00.000Z&phenomenon=PM2.5&operation=arithmeticMean&window=86400000&format=tidy&columns=boxId,boxName,exposure,height,lat,lon,phenomenon,sensorType,unit

In [30]:
df = pd.read_csv(pull_readings_url)
df.head()
# Again, retrieving the sensebox data may take a while (30 seconds to a few minutes, depending on the network and server workload).

| | sensorId | boxId | boxName | phenomenon | sensorType | unit | time_start | arithmeticMean_90000000 |
|---|---|---|---|---|---|---|---|---|
| 0 | 59458624a4ad590011186665 | 591f578c51d34600116a8ea5 | Wetterstation Erpho | PM2.5 | SDS 011 | µg/m³ | 2024-01-14T11:00:00.000Z | 11.232186 |
| 1 | 59458624a4ad590011186665 | 591f578c51d34600116a8ea5 | Wetterstation Erpho | PM2.5 | SDS 011 | µg/m³ | 2024-01-15T12:00:00.000Z | 4.7794 |
| 2 | 59458624a4ad590011186665 | 591f578c51d34600116a8ea5 | Wetterstation Erpho | PM2.5 | SDS 011 | µg/m³ | 2024-01-16T13:00:00.000Z | 6.492572 |
| 3 | 59458624a4ad590011186665 | 591f578c51d34600116a8ea5 | Wetterstation Erpho | PM2.5 | SDS 011 | µg/m³ | 2024-01-17T14:00:00.000Z | 9.893489 |
| 4 | 59458624a4ad590011186665 | 591f578c51d34600116a8ea5 | Wetterstation Erpho | PM2.5 | SDS 011 | µg/m³ | 2024-01-18T15:00:00.000Z | 10.077617 |


In [33]:
## Check how many values were received for each day

df['time_start'].value_counts()

time_start
2024-01-15T12:00:00.000Z    10
2024-01-16T13:00:00.000Z    10
2024-01-17T14:00:00.000Z    10
2024-01-18T15:00:00.000Z    10
2024-01-14T11:00:00.000Z     9
2024-01-19T16:00:00.000Z     9
2024-01-20T17:00:00.000Z     9
2024-01-21T18:00:00.000Z     9
2024-01-22T19:00:00.000Z     9
Name: count, dtype: int64

In [34]:
## Check how many senseboxes returned values

df['boxId'].value_counts()

boxId
591f578c51d34600116a8ea5    9
59ad958fd67eb50011b85f6d    9
5a0c347b9fd3c2001111b701    9
5abd221b850005001b1aff35    9
5acfae2a223bd8001977b61e    9
5b3e7f6f5dc1ec001be11cf1    9
5d6e465a953683001a2b62c5    9
5d91f4bb5f3de0001ab6bb78    9
5f4542b8badf01001bd5cf24    9
5e98843845f937001cf26c6d    4
Name: count, dtype: int64
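
The counts above show one sensebox with fewer readings than the rest. A small standard-library sketch for flagging such boxes follows. It uses two of the box IDs and their counts from the output above, and it treats the highest count as the expected number of windows, which is an assumption.

```python
from collections import Counter

# One entry per reading; IDs and counts are taken from the output above.
box_ids = (["591f578c51d34600116a8ea5"] * 9
           + ["5e98843845f937001cf26c6d"] * 4)

counts = Counter(box_ids)
expected = max(counts.values())  # assumption: the fullest box has no gaps

# Boxes that returned fewer aggregation windows than the fullest box.
incomplete = {box_id: n for box_id, n in counts.items() if n < expected}
print(incomplete)
```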

In [35]:
## Rename columns appropriately

df.rename(columns={'time_start': 'day', 'arithmeticMean_90000000': 'value'}, inplace=True)

In [36]:
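## Attach the sensebox locations to the readings; the left join also keeps senseboxes that returned no values
## Note: 'sensorId' in sensebox_coords actually holds the box ID, hence the sensorId_x / sensorId_y columns after the merge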

stream_data = sensebox_coords.merge(df, left_on='sensorId', right_on='boxId', how='left')
stream_data.head()

| | sensorId_x | lon | lat | sensorId_y | boxId | boxName | phenomenon | sensorType | unit | day | value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 591f578c51d34600116a8ea5 | 7.645218 | 51.96422 | 59458624a4ad590011186665 | 591f578c51d34600116a8ea5 | Wetterstation Erpho | PM2.5 | SDS 011 | µg/m³ | 2024-01-14T11:00:00.000Z | 11.232186 |
| 1 | 591f578c51d34600116a8ea5 | 7.645218 | 51.96422 | 59458624a4ad590011186665 | 591f578c51d34600116a8ea5 | Wetterstation Erpho | PM2.5 | SDS 011 | µg/m³ | 2024-01-15T12:00:00.000Z | 4.7794 |
| 2 | 591f578c51d34600116a8ea5 | 7.645218 | 51.96422 | 59458624a4ad590011186665 | 591f578c51d34600116a8ea5 | Wetterstation Erpho | PM2.5 | SDS 011 | µg/m³ | 2024-01-16T13:00:00.000Z | 6.492572 |
| 3 | 591f578c51d34600116a8ea5 | 7.645218 | 51.96422 | 59458624a4ad590011186665 | 591f578c51d34600116a8ea5 | Wetterstation Erpho | PM2.5 | SDS 011 | µg/m³ | 2024-01-17T14:00:00.000Z | 9.893489 |
| 4 | 591f578c51d34600116a8ea5 | 7.645218 | 51.96422 | 59458624a4ad590011186665 | 591f578c51d34600116a8ea5 | Wetterstation Erpho | PM2.5 | SDS 011 | µg/m³ | 2024-01-18T15:00:00.000Z | 10.077617 |


Save the file as CSV

In [37]:
stream_data.to_csv('../data/sample_multilocation.csv')

At this point you should have a file named **sample_multilocation.csv** saved in the **data/** folder. This marks the end of STEP-1 of 3 in our workflow.

#### END STEP - 1