# Python IUDX SDK - pyIUDX
In this notebook we will use the pyIUDX SDK to interact with IUDX, the India Urban Data Exchange (https://www.iudx.org.in/). IUDX provides easy access to smart city resources such as air quality monitors, smart transportation systems, emergency reporting sensors (flooding), streetlighting metrics and crowd-sourced data sources. Along with access, IUDX also provides semantic information about the properties associated with these resources to foster a semantic and holistic understanding of the data. This allows richer apps to be developed and supports intelligent, well-integrated analytics.
Visit https://pudx.catalogue.iudx.org.in to explore the wealth of resources available for your consumption.


We encourage you to save a copy to your Google Drive and try these examples out.


In the first part, we will query and obtain resources as per our requirements. This information is obtained by querying the catalogue server. The catalogue server provides all metadata associated with a sensor, including the sensor type, sensor location, sensor attributes, and the sensed quantities and their units.

In the second part, we will obtain sensed data associated with a set of sensors and perform some simple analytics with it. This section showcases the power of IUDX and illustrates how dynamic apps and analytics can be developed.

## Install pyIUDX SDK module

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import matplotlib.dates
import folium

In [2]:
# install the latest version of pyIUDX module from github
!pip install git+https://github.com/iudx/pyIUDX@v0.1.0

## Accessing Catalogue server
All sensor properties are stored in the catalogue server. Querying the catalogue entry for a given sensor provides us with metadata and information about the sensor, particularly:
* The sensor id (the unique id with which we query the sensor information and data)
* Tags associated with the sensor
* Information on the type of data provided by the sensor and their units

### Import *cat* class from *pyIUDX.cat*
The *cat* class provides the APIs to fetch data from the catalogue server.

In [None]:
from pyIUDX.cat import cat

In [None]:
# Specify the catalogue server details.
# initialize a catalogue class
cat = cat.Catalogue("https://pudx.catalogue.iudx.org.in/catalogue/v1")

### Search for catalogue items (sensors)
The *getManyResourceItems* member of the catalogue class can be used to fetch a filtered set of catalogue items. The example below shows how to obtain items (sensors) whose tags attribute contains any of the values "aqi", "aqm" or "climo". The metadata is returned as a list of dictionaries, one for each sensor.

In [None]:
attributes = {"tags": ["aqi", "aqm", "climo"]}
allAQMItems = cat.getManyResourceItems(attributes)

In [None]:
print(allAQMItems[0:2])

In [None]:
allAQMItemsCount = cat.getItemCount(attributes)

In [None]:
print(allAQMItemsCount)

### Filter the catalogue response
An unfiltered call to *getManyResourceItems* will return all information associated with each sensor. Using the filters option, we can filter the information returned.

In the example below, by specifying the filters = ["id"], only the "id" of the sensor is returned.

In [None]:
filters = ["id"]
allAQMItemsByID = cat.getManyResourceItems(attributes, filters)

In [None]:
print(allAQMItemsByID[0:3])
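
As a rough local illustration of the two ideas above (matching on tags, then keeping only selected fields), here is a minimal sketch in plain Python. The helper names `match_any_tag` and `keep_fields` and the sample entries are hypothetical, the entries are flattened for brevity, and the catalogue server's actual matching logic may differ.

```python
def match_any_tag(items, wanted_tags):
    """Keep the entries whose "tags" list shares at least one tag with wanted_tags."""
    wanted = set(wanted_tags)
    return [entry for entry in items if wanted & set(entry.get("tags", []))]


def keep_fields(items, filters):
    """Reduce every entry to the keys named in filters."""
    return [{key: entry[key] for key in filters if key in entry} for entry in items]


# Hypothetical, simplified catalogue entries (not real IUDX data).
sample_items = [
    {"id": "sensor-a", "tags": ["aqm", "environment"]},
    {"id": "sensor-b", "tags": ["streetlight"]},
    {"id": "sensor-c", "tags": ["climo"]},
]

matched = match_any_tag(sample_items, ["aqi", "aqm", "climo"])
ids_only = keep_fields(matched, ["id"])
print(ids_only)
```

In the notebook itself both steps happen on the catalogue server, so only the reduced entries travel over the network.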

### Filter items (sensors) by geo-location
The function *getManyResourceItems* can also be used to filter items (sensors) based on their location. In the example below, we specify the area of interest as a circle whose center is given by latitude ("lat") and longitude ("lon") and whose radius is given in meters.
In this case, we request sensors within a 3 km radius.
We also specify that we are interested only in Air Quality Monitoring stations present in that region.
We only require the IDs for now, which we can obtain by passing the filters option.

In [None]:
geo1 = {"circle": {"lat": 18.539107, "lon": 73.853987, "radius": 3000}}
attributes = {"tags": ["aqi", "aqm", "climo"]}
filters = ["id"]


The call below returns all sensors that have any of the tags "aqi", "aqm" or "climo" and lie within the specified geographical area. Further, as specified by the filters argument, only the "id" information is returned for each of the sensors.


In [None]:
allAQMItemsByID = cat.getManyResourceItems(attributes, filters, geo=geo1)

In [None]:
print(allAQMItemsByID)
print("Number of items = ", len(allAQMItemsByID))
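
The circle search runs on the catalogue server, but what "within a 3 km circle" means can be sketched locally. The example below assumes the haversine great-circle distance as the distance measure, which may not be what the server uses; `haversine_m`, `within_circle` and the two test points are hypothetical and only serve the illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_circle(point, circle):
    """True if point lies inside a circle given as {"lat", "lon", "radius"} (radius in meters)."""
    distance = haversine_m(point["lat"], point["lon"], circle["lat"], circle["lon"])
    return distance <= circle["radius"]


circle = {"lat": 18.539107, "lon": 73.853987, "radius": 3000}
near_point = {"lat": 18.5204, "lon": 73.8567}    # roughly 2 km from the center
far_point = {"lat": 18.75, "lon": 73.853987}     # well over 20 km from the center

print(within_circle(near_point, circle), within_circle(far_point, circle))
```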

## Fetch data from sensors
In this section we will fetch sensor data from the resource server and show off a little bit of the magic of IUDX.

### Import *item* class from *pyIUDX.rs*
The *item* class provides the APIs to access relevant sensor data.
This is a high-level abstraction module which encapsulates multiple functionalities of IUDX, such as fetching meta information from data models and live data from resource servers.

In [None]:
# Import the item class from pyIUDX.rs
from pyIUDX.rs import item


### Plot sensor locations

We can pass the previously obtained list of filtered Air Quality Monitoring stations to the Items class. This will load a list of resourceItem objects and provide neat access to their data.

In [None]:
geo1 = {"circle": {"lat": 18.539107, "lon": 73.853987, "radius": 3000}}
attributes = {"tags": ["aqm"]}
filters = ["id"]
allAQMItemsByID = cat.getManyResourceItems(attributes, filters, geo=geo1)

In [None]:


# Centre the map on Pune
m = folium.Map(location=[18.5204, 73.8567], zoom_start=12)
aqms = item.Items("https://pudx.catalogue.iudx.org.in/catalogue/v1", "https://pudx.resourceserver.iudx.org.in/resource-server/pscdcl/v1", allAQMItemsByID)
print(aqms[0].geoProperties)
for sensor in aqms:
  # Coordinates are stored as [longitude, latitude]; folium expects [latitude, longitude]
  lon, lat = sensor.location.coordinates[0], sensor.location.coordinates[1]
  print("Sensor location = ", sensor.location.coordinates)
  folium.Marker([lat, lon], popup=sensor.id).add_to(m)
m

### Fetch Quantitative Properties 
We will iterate across the list of sensors and obtain *PM10_MAX* values and its meta-information. Further, we will also obtain data for a specified duration.
You can find the data model for an AQM sensor here:
https://github.com/iudx/iudx-ld/blob/master/data_models/environment/airQuality/env_aqm_climoPune_0.json

Let's use one of the items from the list of sensors we just created.

In [None]:
print(aqms[0].id)

We can find all the quantitativeProperties (measured properties) of an AQM item through the object's quantitativeProperties attribute.
Since our previous query selected only AQM sensors, we can assume that the quantitative properties of sensor aqms[0] are the same as those of the rest.

In [None]:
print(aqms[0].quantitativeProperties)

A quantitativeProperty also has further meta information related to that property, such as a detailed description, units, etc.
We can get a list of such attributes for a quantitativeProperty, and access them directly.
For example, for CO2_MAX:


In [None]:
print(aqms[0].CO2_MAX.attributes)
print("Name of the property is " + aqms[0].CO2_MAX.name)
print("Units of the property are " + aqms[0].CO2_MAX.symbol)
print("The property tells us the " + aqms[0].CO2_MAX.describes)

We need to call the object's latest() method to get the latest data.
Calling aqms.latest() will update the latest values of all these properties and for **all the previously filtered sensors**.

In [None]:
aqms.latest()

We can now call aqms[0]."quantitativeProperty".value to obtain a numpy array with the first column as datetime and second column as that property's value.
A quantitativeProperty always has a .value attribute.


In [None]:
aqms[0].CO2_MAX.value

Calling latest() once is enough. We can access other quantitativeProperties as well.

In [None]:
aqms[0].SO2_MAX.value
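
The two-column layout described above can be tried out on a small hand-made array. The timestamps and readings below are made up, and the array is only a stand-in for a real `.value` attribute; the `astype("float")` conversion mirrors what this notebook does later with PM10_MAX.

```python
from datetime import datetime

import numpy as np

# Hypothetical stand-in for a quantitativeProperty's .value array:
# column 0 holds timestamps, column 1 holds the readings.
value = np.array(
    [
        [datetime(2019, 10, 28, 10, 0), 41.0],
        [datetime(2019, 10, 28, 10, 15), 47.5],
        [datetime(2019, 10, 28, 10, 30), 52.0],
    ],
    dtype=object,
)

timestamps = value[:, 0]
readings = value[:, 1].astype("float")

# Locate the highest reading and the time at which it was recorded.
peak_index = int(np.argmax(readings))
peak_time = timestamps[peak_index]
peak_reading = float(readings[peak_index])
print(peak_time, peak_reading)
```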

Suppose we need to find the trend of a particular property over a period of time. We can use the Items object's during() method to obtain a time series of that quantitativeProperty. We need to specify the start and end times as ISO 8601 timestamps (the example uses the +05:30 IST offset rather than UTC). Let's get the data between 26th October 2019 and 2nd November 2019.

In [None]:
aqms.during("2019-10-26T00:00:00.000+05:30", "2019-11-02T00:00:00.000+05:30")
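
Timestamp strings in the format used above can be built with the standard library instead of being typed by hand. This is a minimal sketch; the helper name `to_query_time` is hypothetical, and it assumes the server accepts exactly the millisecond-precision form shown in the call above.

```python
from datetime import datetime, timedelta, timezone

# Indian Standard Time is UTC+05:30.
IST = timezone(timedelta(hours=5, minutes=30))


def to_query_time(dt):
    """Format a timezone-aware datetime with millisecond precision and UTC offset."""
    return dt.isoformat(timespec="milliseconds")


start = to_query_time(datetime(2019, 10, 26, tzinfo=IST))
end = to_query_time(datetime(2019, 11, 2, tzinfo=IST))
print(start, end)
```

The resulting strings could then be passed as the two arguments of during().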



Now we can repeat what we did earlier and find the value of the quantitativeProperty **during** that period of time for **sensor 1** (aqms[1]).

In [None]:
print(aqms[1].PM10_MAX.value[0:10,:])

We can utilize all of the other meta information that's part of the object and plot it right away!

In [None]:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(15,10))

fig.suptitle(aqms[1].PM10_MAX.name + "\n" + aqms[1].PM10_MAX.describes, fontsize=20)
plt.plot(aqms[1].PM10_MAX.value[:,0], aqms[1].PM10_MAX.value[:,1])
plt.xlabel(aqms[1].timeProperty, fontsize=18)
plt.ylabel(aqms[1].PM10_MAX.name + " (" + aqms[1].PM10_MAX.symbol + ")", fontsize=16)

### IUDX Magic!
We can repeat the above for all the selected sensors!
The axis labels, the units of measurement and more are populated dynamically from the metadata!

In [None]:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(15,10))
fig.suptitle(aqms[0].PM10_MAX.name + "\n" + aqms[0].PM10_MAX.describes, fontsize=20)
plt.xlabel(aqms[0].timeProperty, fontsize=18)
plt.ylabel(aqms[0].PM10_MAX.name + " (" + aqms[0].PM10_MAX.symbol + ")", fontsize=16)

for sensor in aqms:
  plt.plot(sensor.PM10_MAX.value[:,0], sensor.PM10_MAX.value[:,1], label=sensor.id.split("/")[-1])
plt.legend()
plt.show()

Notice how the pollution peaks significantly around Deepavali (28th-30th October)!

### Geo-spatial analytics with IUDX
Let's do something a bit more advanced.
Let's plot the geo-spatial distribution of PM10 across the entire city of Pune.

Install a few dependencies and import a few libraries.

In [None]:
!pip install geojsoncontour


In [None]:
from folium import plugins
import geojsoncontour
import scipy.ndimage
import scipy as sp
from scipy.interpolate import griddata



Let's extend our analysis to a large area, about 60 km in diameter, which covers most of Pune city.
Again, we use the catalogue to filter out sensors within a wide region and pass it to the Items class.

In [None]:
from pyIUDX.rs import item
geo1 = {"circle": {"lat": 18.539107, "lon": 73.853987, "radius": 30000}}
attributes = {"tags": ["aqm"]}
filters = ["id"]
allAQMItemsByID = cat.getManyResourceItems(attributes, filters, geo=geo1)
print(allAQMItemsByID)
aqms = item.Items("https://pudx.catalogue.iudx.org.in/catalogue/v1", allAQMItemsByID)



Let's now get the latest data from these sensors.
This will take about 30 seconds on the cloud.

In [None]:
aqms.latest()

Make numpy arrays of all the items' locations and values.

In [None]:
# Get the latest PM10_MAX value for every AQM sensor that returned data
zs = []
x_orig = []
y_orig = []
for aqm in aqms:
  val = aqm.PM10_MAX.value[:, 1].astype("float")
  if val.size > 0:
    zs.append(val[0])
    x_orig.append(aqm.location.coordinates[0])
    y_orig.append(aqm.location.coordinates[1])

x_orig = np.array(x_orig)
y_orig = np.array(y_orig)
zs = np.array(zs)

Initialize the map and show all sensor locations

In [None]:
# Initialize the map
geomap = folium.Map([y_orig.mean(), x_orig.mean()], zoom_start=13, tiles="cartodbpositron")

for sensor in aqms:
  sensor_id = sensor.id  
  folium.Marker([sensor.location.coordinates[1], sensor.location.coordinates[0] ], popup=sensor_id).add_to(geomap)  

geomap  

We are now ready to perform the analysis.
This is a rather simple example where we perform cubic geo-spatial interpolation of PM10_MAX concentrations across Pune. We will see where the major pollutant hotspots are located.

In [None]:
# Make lat and lon linspace
y_arr = np.linspace(np.min(y_orig), np.max(y_orig), 100)
x_arr = np.linspace(np.min(x_orig), np.max(x_orig), 100)
# Make mesh grid
x_mesh, y_mesh = np.meshgrid(x_arr, y_arr)


# Perform cubic interpolation
z_mesh = griddata((x_orig, y_orig), zs, (x_mesh, y_mesh), method='cubic')
# Number of levels of colors
levels = 20
# Make contours of the grid values obtained in z_mesh
contourf = plt.contourf(x_mesh, y_mesh, z_mesh, levels, alpha=0.5, cmap="bwr", linestyles='None', vmin=0, vmax=100)

# Convert matplotlib contourf to geojson
geojson = geojsoncontour.contourf_to_geojson(
    contourf=contourf,
    min_angle_deg=3.0,
    ndigits=5,
    stroke_width=1,
    fill_opacity=0.5)

# Plot the contour plot on folium
folium.GeoJson(
    geojson,
    style_function=lambda x: {
        'color':     x['properties']['stroke'],
        'weight':    x['properties']['stroke-width'],
        'fillColor': x['properties']['fill'],
        'opacity':   0.6,
    }).add_to(geomap)

# Show map
geomap
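
One property of `griddata` is worth keeping in mind when reading the contour map: for the "linear" and "cubic" methods, points outside the convex hull of the sensor locations are filled with NaN by default, so the contours only cover the area enclosed by the sensors. The sketch below shows this on a toy grid and one possible fallback to the "nearest" method. It uses "linear" to keep the toy numbers predictable, and the points and values are made up.

```python
import numpy as np
from scipy.interpolate import griddata

# Four hypothetical "sensors" on the corners of a unit square, with z = x + y.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = points[:, 0] + points[:, 1]

# One query inside the square and one far outside it.
queries = np.array([[0.25, 0.5], [2.0, 2.0]])

linear = griddata(points, values, queries, method="linear")
nearest = griddata(points, values, queries, method="nearest")

# Fall back to the nearest sensor's value wherever interpolation gave NaN.
filled = np.where(np.isnan(linear), nearest, linear)
print(linear, filled)
```

Whether such a fallback is appropriate depends on the analysis; leaving the NaN regions blank, as the notebook does, avoids suggesting coverage where there are no sensors.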


## Downloading larger datasets
Because of the large size of the data available, we have restricted PUDX "during" queries to only work when the time span is less than one day. If data for a longer period is required, you will need the download API.
We will however need the resourceServerGroup id instead of the id for this. To find this, you can go to pudx.catalogue.iudx.org.in and search for aqm with tags. Once the item is shown in the list view, you can click "details" and obtain the group id.
The resourceServerGroup id for AQM is "urn:iudx-catalogue-pune:pudx-resource-server/aqm-bosch-climo"

The call below gives us a Google Drive link which we can use to download files based on weeks of the year.

In [None]:
from pyIUDX.rs import rs

rs = rs.ResourceServer("https://pudx.resourceserver.iudx.org.in/resource-server/pscdcl/v1")

groupId = "urn:iudx-catalogue-pune:pudx-resource-server/aqm-bosch-climo"

data = rs.downloadData(groupId)
data

On opening that download_URL, you will find different files corresponding to different weeks of the year for AQM. You can then use the Python PyDrive module to download a file.

In [None]:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from pydrive.files import GoogleDriveFile
from google.colab import auth
from oauth2client.client import GoogleCredentials

Authenticate with Google Drive. This will ask you to follow a link and allow access. You need to have a Gmail account.

In [None]:
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
files = GoogleDriveFile(auth=gauth)

In [None]:
folder_id = data["download_URL"].split("=")[-1]
file_list = drive.ListFile({'q': "'%s' in parents and trashed=false" % folder_id}).GetList()

for f in file_list:
  print(f["title"])
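
Splitting on "=" works as long as the folder id is the last parameter in the link. A slightly more defensive variant parses the query string instead. This is a sketch under the assumption that the link carries the folder id in an `id` query parameter; the helper name `drive_folder_id` and the example URL are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def drive_folder_id(download_url):
    """Return the value of the "id" query parameter of a download link."""
    query = parse_qs(urlparse(download_url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError("no 'id' query parameter in " + download_url)
    return ids[0]


# Hypothetical link with an extra trailing parameter.
example_url = "https://drive.google.com/open?id=EXAMPLE_FOLDER_ID&usp=sharing"
folder = drive_folder_id(example_url)
print(folder)
```

With this example link, `example_url.split("=")[-1]` would return "sharing" rather than the folder id.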

Let's get data for one such week.

In [None]:
fl = file_list[0]
fl.GetContentFile(fl["title"])

In [None]:
!ls

In [None]:
import json

with open(fl["title"], "r") as f:
  df_json = json.load(f)

# The first data point in the downloaded data
df_json[0:1]

Doing your analysis this way might be simpler than using the Items class as shown previously, but it hides some of the meta information you may need, for example the location. This can be overcome by querying the catalogue with the "NAME" field from the data packet, as shown below.
Refer to - getOneresourceItem in https://pyiudx.readthedocs.io/en/latest/pyIUDX.cat.html

In [None]:
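attributes = {"NAME": [df_json[0]["NAME"]]}
# Use a name other than "item" so the pyIUDX.rs item module imported earlier is not shadowed
resource_item = cat.getManyResourceItems(attributes)[0]
resource_item

Obtaining the location for this item is now simple:

In [None]:
resource_item["location"]["value"]["geometry"]["coordinates"]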

Bear in mind that the location is GeoJSON, so the coordinates are ordered [longitude, latitude]. This is because coordinates are usually written as [x, y], where the x axis is the longitude and the y axis is the latitude. However, most mapping utilities expect [y, x], i.e. [latitude, longitude], so you will have to interchange the two values in this case.
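
The swap can be wrapped in a small helper so that it is not forgotten when placing markers. A minimal sketch; the helper name `geojson_to_latlon` is hypothetical and the sample point reuses the circle center from earlier in this notebook.

```python
def geojson_to_latlon(coordinates):
    """Convert a GeoJSON position [longitude, latitude] to [latitude, longitude]."""
    lon, lat = coordinates[0], coordinates[1]
    return [lat, lon]


point = [73.853987, 18.539107]  # GeoJSON order: [longitude, latitude]
marker_location = geojson_to_latlon(point)
print(marker_location)
```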

### Grouping in pandas
Grouping the downloaded data by the "NAME" field gives you a per-sensor dataset. You will need pandas for this.

In [None]:
import pandas as pd

df_aqm = pd.DataFrame(df_json)
df_aqm.head(3)

Now we can group by "NAME" field -

In [None]:
df_grouped = df_aqm.groupby("NAME")
print("Getting data frame for " + df_json[0]["NAME"])
some_aqm_sensor = df_grouped.get_group(df_json[0]["NAME"])
some_aqm_sensor.head(3)
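
Once the data is grouped, per-sensor statistics are a one-liner. The sketch below uses made-up records and assumes the downloaded packets carry a numeric PM10_MAX column next to "NAME"; check the actual column names in `df_aqm` before adapting it.

```python
import pandas as pd

# Hypothetical records standing in for the downloaded data.
records = [
    {"NAME": "station-1", "PM10_MAX": 40.0},
    {"NAME": "station-2", "PM10_MAX": 90.0},
    {"NAME": "station-1", "PM10_MAX": 60.0},
]
df = pd.DataFrame(records)

# Mean PM10_MAX per sensor, indexed by the sensor's NAME.
mean_pm10 = df.groupby("NAME")["PM10_MAX"].mean()
print(mean_pm10)
```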


# Conclusion

To recap, we've shown you a simple flow for discovering the sensors available in a region using the Catalogue module. We've taken this information and passed it to the Items module, which provides an abstraction over the selected items and makes data access simple. We've then gone on to plot the different quantitative properties of air quality monitors across the city and used IUDX's magic to dynamically give us the units, locations, and information about a pollutant, PM10. Lastly, we showed how IUDX makes complex analytics simple by helping discover and handle data.

We encourage you to download this notebook and tinker around. Feedback is welcome. Visit https://github.com/iudx/pyIUDX for more advanced usage and to report errors or request enhancements.