# OpenSenseMap API: Data Downloading

In this notebook you will learn how to pull data from https://opensensemap.org/ using its API (https://docs.opensensemap.org/). Using the ipyleaflet extension, you will also be able to perform interactive GIS actions and customize the kind of data you want to download.

You'll need to perform the following 4 tasks to be able to successfully download the data into a CSV file:

1. Find the lat/lon value from Google Maps for the region of Münster, Germany 
2. Draw/Edit the Area of Interest (AOI) over Münster, Germany
3. Limit the spatial extent of API to your BBOX coordinates 
4. Fetch PM2.5 readings for 7 days, starting from **15 Jan 2022**

In [1]:
import requests, geojson
from geojson import dump
import pandas as pd
from ipyleaflet import Map, basemaps, WidgetControl, Marker, basemap_to_tiles, DrawControl, GeoJSON
from ipywidgets import IntSlider, ColorPicker, jslink
import geopandas as gpd

In [2]:
## ENDPOINTS DEFINITION ##

phenomenon = "PM2.5"
sensebox_url = "https://api.opensensemap.org/boxes?"
sensebox_data_url = "https://api.opensensemap.org/statistics/descriptive?"

### Load Map Widget

Run the following cell to visualise and test the map widget. The map is currently centered at [0, 0].

In [3]:
center = (0, 0)

m = Map(center=center, zoom=15)
marker = Marker(location=center, draggable=True)
m.add_layer(marker);
display(m)

Map(center=[0, 0], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_text'…

**Question:**
Do you know the name given to this point of zero degrees latitude and zero degrees longitude, i.e., where the prime meridian and the equator intersect?

### TASK 1: Find the lat/lon value from Google Maps for the region of Münster, Germany

Head over to https://maps.google.com/ and search for or navigate to "Münster, Germany". Then find the lat/lon values in the **URL** shown in your browser.

*Hint: It should start with "@xx.xxxxxxx,xx.xxxxxxx"*

Copy the two numbers and use them to center the map.
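
If you would rather not copy the numbers by hand, the `@lat,lng` segment can be pulled out of the URL with a regular expression. This is a minimal sketch: the helper name `latlng_from_maps_url` and the example URL are made up for illustration, and the pattern only assumes the `@xx.xxxxxxx,xx.xxxxxxx` shape given in the hint.

```python
import re

def latlng_from_maps_url(url):
    """Return (lat, lng) parsed from the '@lat,lng' segment of a maps URL."""
    match = re.search(r"@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)", url)
    if match is None:
        raise ValueError("no '@lat,lng' segment found in URL")
    return float(match.group(1)), float(match.group(2))

# Made-up URL for illustration only; paste your own browser URL instead.
example_url = "https://www.google.com/maps/@12.3456789,98.7654321,12z"
example_lat, example_lng = latlng_from_maps_url(example_url)
print(example_lat, example_lng)
```

The first number is the latitude and the second the longitude, which is the order `Map(center=...)` expects.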

In [4]:
lat = 
lng = 

center = (lat, lng)

m = Map(center=center, zoom=11)
marker = Marker(location=center, draggable=True)
display(m)

Map(center=[51.9500023, 7.4840147], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title'…

### TASK 2: Draw/Edit the Area of Interest (AOI) over Münster, Germany

Our objective is to capture PM2.5 sensors installed in Münster.
Cover as much of Münster as you can, but keep the area small enough that it does not slow down the API!

In [5]:
## Function to save AOI as GeoJSON

feature_collection = {
    'type': 'FeatureCollection',
    'features': []
}

def handle_draw(self, action, geo_json):    
    feature_collection['features'].append(geo_json)
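
Note that `handle_draw` appends the GeoJSON for every draw event it receives. As far as the ipyleaflet documentation describes it, the `action` argument is one of `'created'`, `'edited'` or `'deleted'`, so a deleted shape would be appended too. The sketch below shows one possible variant that keeps only shapes that still exist. The names `tracked_collection` and `handle_draw_tracking` and the sample rectangle are hypothetical, and `'edited'` events are ignored for brevity.

```python
tracked_collection = {"type": "FeatureCollection", "features": []}

def handle_draw_tracking(target, action, geo_json):
    """Add created shapes and drop deleted ones (assumed action names)."""
    if action == "created":
        tracked_collection["features"].append(geo_json)
    elif action == "deleted":
        # Remove any stored feature with the same geometry as the deleted one.
        tracked_collection["features"] = [
            feature for feature in tracked_collection["features"]
            if feature["geometry"] != geo_json["geometry"]
        ]

# Hypothetical rectangle, shaped like the GeoJSON a draw event passes in.
sample_rectangle = {
    "type": "Feature",
    "properties": {},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[7.5, 51.9], [7.5, 52.0], [7.7, 52.0],
                         [7.7, 51.9], [7.5, 51.9]]],
    },
}

# Simulate the callback without a map widget.
handle_draw_tracking(None, "created", sample_rectangle)
count_after_create = len(tracked_collection["features"])
handle_draw_tracking(None, "deleted", sample_rectangle)
count_after_delete = len(tracked_collection["features"])
print(count_after_create, count_after_delete)
```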

In [6]:
## Configure draw properties on map

draw_control = DrawControl()
draw_control.rectangle = {
    "shapeOptions": {
        "fillColor": "#fca45d",
        "color": "#fca45d",
        "fillOpacity": 0.5
    }
}

center = (lat, lng)

m = Map(center=center, zoom=11)
marker = Marker(location=center, draggable=True)

m.add_control(draw_control)

draw_control.on_draw(handle_draw)
display(m)


Map(center=[51.9500023, 7.4840147], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title'…

In [7]:
## Save and display your GeoJSON

munster_aoi = feature_collection

center = (lat, lng)

m = Map(center=center, zoom=11)
marker = Marker(location=center, draggable=True)

geo_json = GeoJSON(
    data=munster_aoi,
    style={
        'opacity': 1, 'dashArray': '9', 'fillOpacity': 0.4, 'weight': 1, 'fillColor': '#fca45d'
    }
)

m.add_layer(geo_json)
display(m)

Map(center=[51.9500023, 7.4840147], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title'…

In [8]:
## Save the GeoJSON

with open('../data/aoi_opensensemap.geojson', 'w') as f:
    dump(munster_aoi, f)

In [9]:
## Get the Bounding Box Coordinates of AOI

gdf = gpd.read_file('../data/aoi_opensensemap.geojson')
munster_bbox = gdf.total_bounds  ## array of [minx, miny, maxx, maxy]
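
To see what `total_bounds` computes, the same bounding box can be derived directly from the GeoJSON without geopandas. This is a rough sketch that assumes every feature is a `Polygon`, as a drawn rectangle would be. The helper `bbox_from_feature_collection` and the sample coordinates are illustrative, not part of the notebook.

```python
def bbox_from_feature_collection(fc):
    """Return (min_lon, min_lat, max_lon, max_lat) over all polygon vertices."""
    lons, lats = [], []
    for feature in fc["features"]:
        # A Polygon's coordinates are a list of rings, each a list of [lon, lat].
        for ring in feature["geometry"]["coordinates"]:
            for lon, lat in ring:
                lons.append(lon)
                lats.append(lat)
    return min(lons), min(lats), max(lons), max(lats)

# Hypothetical AOI with a single rectangle.
sample_aoi = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[7.5, 51.9], [7.5, 52.0], [7.7, 52.0],
                             [7.7, 51.9], [7.5, 51.9]]],
        },
    }],
}

sample_bbox = bbox_from_feature_collection(sample_aoi)
sample_bbox_string = ",".join(str(value) for value in sample_bbox)
print(sample_bbox_string)
```

GeoJSON stores longitude before latitude, so the resulting string is ordered west, south, east, north. The example URLs in this notebook use the same order.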

### Get Senseboxes Installed in Münster, Germany

In [10]:
## Convert bbox coordinates to string representation
geometry = map(str, munster_bbox)

In [11]:
## DEFINE PARAMETERS ##

from_date = "2020-01-01T00:00:00.000Z"
bbox_geometry = ','.join(geometry)

### TASK 3: Limit the spatial extent of API to your BBOX coordinates

Add the bbox parameter after the "+" sign.

In [12]:
sensebox_url_params = "date="+from_date+"&phenomenon="+phenomenon+"&bbox="+

In [13]:
## Define URL endpoint

pull_senseboxes_url = sensebox_url+sensebox_url_params
pull_senseboxes_url

'https://api.opensensemap.org/boxes?date=2020-01-01T00:00:00.000Z&phenomenon=PM2.5&bbox=7.523121,51.875076,7.7573,52.018978'

The final URL should look something like this:

*https://api.opensensemap.org/boxes?date=2020-01-01T00:00:00.000Z&phenomenon=PM2.5&bbox=7.547844,51.907801,7.689313,51.999208*
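
As an alternative to joining strings with `+`, the query string can be assembled from a dictionary with the standard library. The sketch below reuses the example values from the URL above. The variable names are illustrative, and `safe=",:"` is passed so that the commas and colons stay readable instead of being percent-encoded.

```python
from urllib.parse import urlencode

base_url = "https://api.opensensemap.org/boxes?"

# Values copied from the example URL above, not from your own AOI.
example_params = {
    "date": "2020-01-01T00:00:00.000Z",
    "phenomenon": "PM2.5",
    "bbox": "7.547844,51.907801,7.689313,51.999208",
}

# urlencode joins key=value pairs with "&" and escapes unsafe characters.
example_boxes_url = base_url + urlencode(example_params, safe=",:")
print(example_boxes_url)
```

Keeping the parameters in a dictionary makes it harder to forget an `&` or `=` when a parameter is added later.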

In [15]:
response = requests.get(pull_senseboxes_url)
json_output = response.json()

## We'll use only 7 senseboxes to limit the amount of data
json_output = json_output[:7]

In [16]:
senseboxes_list = []
sensebox_coords = {}

## Extract the IDs of the senseboxes

for sensebox in json_output:
    senseboxes_list.append(sensebox['_id'])
    sensebox_coords[sensebox['_id']] = sensebox['loc'][0]['geometry']['coordinates']
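
The loop above assumes that every sensebox has at least one `loc` entry and that its coordinates are a plain `[lon, lat]` pair. GeoJSON positions may carry an optional third element (altitude), so a more defensive variant could look like the sketch below. The sample payload is invented and only mirrors the keys used above; whether the API ever omits `loc` is an assumption.

```python
def extract_ids_and_coords(boxes):
    """Collect box IDs plus [lon, lat] for boxes that report a location."""
    ids, coords = [], {}
    for box in boxes:
        ids.append(box["_id"])
        locations = box.get("loc") or []
        if not locations:
            continue  # keep the ID, but record no coordinates
        # Take only the first two values in case an altitude is present.
        lon, lat = locations[0]["geometry"]["coordinates"][:2]
        coords[box["_id"]] = [lon, lat]
    return ids, coords

# Invented payload for illustration; real responses contain more fields.
sample_boxes = [
    {"_id": "box-a", "loc": [{"geometry": {"coordinates": [7.65, 51.95, 60.0]}}]},
    {"_id": "box-b"},
]

sample_ids, sample_coords = extract_ids_and_coords(sample_boxes)
print(sample_ids, sample_coords)
```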

In [17]:
## Check the sensebox id numbers
senseboxes_list

['5750220bed08f9680c6b4154',
 '591f578c51d34600116a8ea5',
 '599180be7e280a001044b837',
 '59ad958fd67eb50011b85f6d',
 '59c67b5ed67eb50011666dbb',
 '5a0c15289fd3c200110f3d33',
 '5a0c2cc89fd3c200111118f0']

In [18]:
## Check locations of senseboxes
sensebox_coords = pd.DataFrame(sensebox_coords).T.reset_index()
sensebox_coords.rename(columns={'index': 'sensorId', 0: 'lon', 1: 'lat'}, inplace=True)
sensebox_coords

Unnamed: 0,sensorId,lon,lat
0,5750220bed08f9680c6b4154,7.651169,51.956168
1,591f578c51d34600116a8ea5,7.645218,51.96422
2,599180be7e280a001044b837,7.684194,51.929339
3,59ad958fd67eb50011b85f6d,7.635283,51.903004
4,59c67b5ed67eb50011666dbb,7.62677,51.946322
5,5a0c15289fd3c200110f3d33,7.641463,51.953351
6,5a0c2cc89fd3c200111118f0,7.641426,51.960435


### Get PM2.5 Sensor Readings for the 7 Senseboxes

In [19]:
## Define parameters for the new endpoint

senseboxes_list = ','.join(senseboxes_list)
operation = "arithmeticMean" ## Perform a mean for all values in the duration of "window" timeframe
window = "90000000" ## window length in ms; 90000000 ms = 25 hours
output_format = "tidy" ## Clean CSV Format
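
The `window` value is easier to check when it is derived from a duration instead of typed as a raw number. This small sketch uses a hypothetical helper, `window_in_ms`, to convert a duration into the millisecond string the query expects.

```python
from datetime import timedelta

def window_in_ms(**duration):
    """Convert a duration such as hours=25 into a millisecond string."""
    # Dividing two timedeltas with // yields an exact integer count.
    return str(timedelta(**duration) // timedelta(milliseconds=1))

window_25h = window_in_ms(hours=25)  # the value used in this notebook
window_24h = window_in_ms(days=1)    # one calendar day
print(window_25h, window_24h)
```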

### TASK 4: Fetch PM2.5 readings for 7 days, starting from 15 Jan 2022

Timestamps must be in **RFC 3339** notation, for example: *2015-01-22T00:00:00.000Z*

Note: from_date and to_date are inclusive.
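
Timestamps in this notation can also be generated with `datetime`, which avoids typos in the string. The sketch below uses the example date from above, not the task date. The helper names are illustrative, and it assumes that an inclusive 7-day range ends 6 days after its start.

```python
from datetime import datetime, timedelta, timezone

def to_rfc3339(dt):
    """Format an aware datetime as UTC with millisecond precision and a Z suffix."""
    utc = dt.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")

def inclusive_range(start, days):
    """Return (from, to) strings covering `days` days, both ends included."""
    end = start + timedelta(days=days - 1)
    return to_rfc3339(start), to_rfc3339(end)

example_start = datetime(2015, 1, 22, tzinfo=timezone.utc)
example_from, example_to = inclusive_range(example_start, days=7)
print(example_from, example_to)
```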

In [20]:
from_date = ""
to_date = ""

In [21]:
sensebox_data_url_params = "boxId="+senseboxes_list+"&from-date="+from_date+"&to-date="+to_date+"&phenomenon="+phenomenon+"&operation="+operation+"&window="+window+"&format="+output_format+"&columns=boxId,boxName,phenomenon,sensorType,unit"

In [22]:
## Define URL endpoint

pull_readings_url = sensebox_data_url+sensebox_data_url_params
pull_readings_url

'https://api.opensensemap.org/statistics/descriptive?boxId=5750220bed08f9680c6b4154,591f578c51d34600116a8ea5,599180be7e280a001044b837,59ad958fd67eb50011b85f6d,59c67b5ed67eb50011666dbb,5a0c15289fd3c200110f3d33,5a0c2cc89fd3c200111118f0&from-date=2022-01-15T00:00:00.000Z&to-date=2022-01-21T00:00:00.000Z&phenomenon=PM2.5&operation=arithmeticMean&window=90000000&format=tidy&columns=boxId,boxName,phenomenon,sensorType,unit'

The final URL should look something like this:

https://api.opensensemap.org/statistics/descriptive?boxId=5750220bed08f9680c6b4154,591f578c51d34600116a8ea5,599180be7e280a001044b837,59c67b5ed67eb50011666dbb,5a0c15289fd3c200110f3d33&from-date=2022-01-15T00:00:00.000Z&to-date=2022-01-21T00:00:00.000Z&phenomenon=PM2.5&operation=arithmeticMean&window=86400000&format=tidy&columns=boxId,boxName,exposure,height,lat,lon,phenomenon,sensorType,unit

In [23]:
df = pd.read_csv(pull_readings_url)
df.head()

Unnamed: 0,sensorId,boxId,boxName,phenomenon,sensorType,unit,time_start,arithmeticMean_90000000
0,59458624a4ad590011186665,591f578c51d34600116a8ea5,Wetterstation Erpho,PM2.5,SDS 011,µg/m³,2022-01-14T06:00:00.000Z,131.078049
1,59458624a4ad590011186665,591f578c51d34600116a8ea5,Wetterstation Erpho,PM2.5,SDS 011,µg/m³,2022-01-15T07:00:00.000Z,184.012058
2,59458624a4ad590011186665,591f578c51d34600116a8ea5,Wetterstation Erpho,PM2.5,SDS 011,µg/m³,2022-01-16T08:00:00.000Z,147.403699
3,59458624a4ad590011186665,591f578c51d34600116a8ea5,Wetterstation Erpho,PM2.5,SDS 011,µg/m³,2022-01-17T09:00:00.000Z,211.829843
4,59458624a4ad590011186665,591f578c51d34600116a8ea5,Wetterstation Erpho,PM2.5,SDS 011,µg/m³,2022-01-18T10:00:00.000Z,34.967519


In [24]:
## Check how many values were received for each window start time

df['time_start'].value_counts()

2022-01-14T06:00:00.000Z    3
2022-01-15T07:00:00.000Z    3
2022-01-16T08:00:00.000Z    3
2022-01-17T09:00:00.000Z    3
2022-01-18T10:00:00.000Z    3
2022-01-19T11:00:00.000Z    3
2022-01-20T12:00:00.000Z    3
2022-01-21T13:00:00.000Z    3
Name: time_start, dtype: int64
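
The start times above drift by one hour per row because the averaging window is 25 hours long, not a calendar day. A quick check with the standard library confirms the spacing. The list simply repeats the `time_start` values from the output above.

```python
from datetime import datetime, timedelta

# Window start times copied from the value_counts() output above.
time_starts = [
    "2022-01-14T06:00:00.000Z",
    "2022-01-15T07:00:00.000Z",
    "2022-01-16T08:00:00.000Z",
    "2022-01-17T09:00:00.000Z",
    "2022-01-18T10:00:00.000Z",
    "2022-01-19T11:00:00.000Z",
    "2022-01-20T12:00:00.000Z",
    "2022-01-21T13:00:00.000Z",
]

parsed = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%S.%fZ") for t in time_starts]

# Collect the distinct gaps between consecutive window starts.
gaps = {later - earlier for earlier, later in zip(parsed, parsed[1:])}
print(gaps)
```

With a `window` of 86400000 ms, as in the example URL, the windows would be exactly 24 hours long instead.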

In [25]:
## Check how many senseboxes returned values

df['boxId'].value_counts()

591f578c51d34600116a8ea5    8
59ad958fd67eb50011b85f6d    8
5750220bed08f9680c6b4154    8
Name: boxId, dtype: int64

In [26]:
## Rename columns appropriately

df.rename(columns={'time_start': 'day', 'arithmeticMean_90000000': 'value'}, inplace=True)

In [27]:
## Attach sensebox locations to the readings; the left join also keeps senseboxes that returned no values

stream_data = sensebox_coords.merge(df, left_on='sensorId', right_on='boxId', how='left')
stream_data.head()

Unnamed: 0,sensorId_x,lon,lat,sensorId_y,boxId,boxName,phenomenon,sensorType,unit,day,value
0,5750220bed08f9680c6b4154,7.651169,51.956168,5a0d58d69fd3c2001129024f,5750220bed08f9680c6b4154,BalkonBox Mindener Str.,PM2.5,SDS 011,µg/m³,2022-01-14T06:00:00.000Z,26.9782
1,5750220bed08f9680c6b4154,7.651169,51.956168,5a0d58d69fd3c2001129024f,5750220bed08f9680c6b4154,BalkonBox Mindener Str.,PM2.5,SDS 011,µg/m³,2022-01-15T07:00:00.000Z,24.967154
2,5750220bed08f9680c6b4154,7.651169,51.956168,5a0d58d69fd3c2001129024f,5750220bed08f9680c6b4154,BalkonBox Mindener Str.,PM2.5,SDS 011,µg/m³,2022-01-16T08:00:00.000Z,23.8172
3,5750220bed08f9680c6b4154,7.651169,51.956168,5a0d58d69fd3c2001129024f,5750220bed08f9680c6b4154,BalkonBox Mindener Str.,PM2.5,SDS 011,µg/m³,2022-01-17T09:00:00.000Z,7.263709
4,5750220bed08f9680c6b4154,7.651169,51.956168,5a0d58d69fd3c2001129024f,5750220bed08f9680c6b4154,BalkonBox Mindener Str.,PM2.5,SDS 011,µg/m³,2022-01-18T10:00:00.000Z,15.919987


Save the result as a CSV file:

In [28]:
stream_data.to_csv('../data/sample_multilocation.csv')

### Stream Data Using Kafka

Option 1:

Open a new **Terminal** and enter the following command:

*python src/sendStream.py data/sample_multilocation.csv*

Option 2:

Run the Python command from Jupyter in the following cell:

In [29]:
!python ../src/sendStream.py ../data/sample_multilocation.csv

Message produced: b'{"26.9782": [51.95616769306822, 7.6511693559587, "2022-01-14T06:00:00.000Z", "5750220bed08f9680c6b4154"]}'
Message produced: b'{"24.967153792623524": [51.95616769306822, 7.6511693559587, "2022-01-15T07:00:00.000Z", "5750220bed08f9680c6b4154"]}'
Message produced: b'{"23.817200000000003": [51.95616769306822, 7.6511693559587, "2022-01-16T08:00:00.000Z", "5750220bed08f9680c6b4154"]}'
Message produced: b'{"7.263709139426283": [51.95616769306822, 7.6511693559587, "2022-01-17T09:00:00.000Z", "5750220bed08f9680c6b4154"]}'
Message produced: b'{"15.919986675549634": [51.95616769306822, 7.6511693559587, "2022-01-18T10:00:00.000Z", "5750220bed08f9680c6b4154"]}'
Message produced: b'{"12.40247160988644": [51.95616769306822, 7.6511693559587, "2022-01-19T11:00:00.000Z", "5750220bed08f9680c6b4154"]}'
Message produced: b'{"3.4782666666666664": [51.95616769306822, 7.6511693559587, "2022-01-20T12:00:00.000Z", "5750220bed08f9680c6b4154"]}'
Message produced: b'{"4.190406395736176": [51.9
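
Each produced message is a JSON object with a single key. Judging only from the output above, the key appears to be the PM2.5 value and the list appears to hold latitude, longitude, window start and box ID. `sendStream.py` is not shown here, so that field order is an inference. A consumer could unpack such a message roughly as follows, where `decode_message` is a hypothetical helper.

```python
import json

# First message from the output above, copied verbatim.
raw_message = (
    b'{"26.9782": [51.95616769306822, 7.6511693559587, '
    b'"2022-01-14T06:00:00.000Z", "5750220bed08f9680c6b4154"]}'
)

def decode_message(raw):
    """Unpack one message into named fields (field order is assumed)."""
    payload = json.loads(raw)
    # The object holds exactly one key/value pair.
    (value_key, fields), = payload.items()
    lat, lon, timestamp, box_id = fields
    return {
        "value": float(value_key),
        "lat": lat,
        "lon": lon,
        "timestamp": timestamp,
        "boxId": box_id,
    }

decoded = decode_message(raw_message)
print(decoded)
```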

#### END