# NPFE Data with CSV on the Web

This notebook exemplifies how data about [new particle formation events](http://purl.obolibrary.org/obo/ENVO_01001372) can be described using [CSV on the Web](https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/).

We interpret particle size distribution data (primary data) as measured by an observation system of the [SMEAR](https://www.atm.helsinki.fi/SMEAR/) research infrastructure in order to detect the occurrence of new particle formation events on selected days in [Hyytiälä](http://sws.geonames.org/656888/), Finland. Detected events are then described, whereby we generate secondary (derivative) data about events. These data are stored to disk in CSV format.

In addition, we use CSV on the Web to describe the secondary CSV data using common, shared terminology that is unambiguously identified and described on the web. This makes the secondary CSV data more interoperable, reusable and understandable to machines. For instance, we can use the description to transform the CSV data into RDF and then leverage SPARQL to query this data.

Before we start, we need to install and load the required Python modules and define a few functions used in the workflow. Please execute the following (and all other) code blocks using `ALT+ENTER` or the `Run` button in the menu.

In [None]:
!pip install csvwlib 

In [1]:
import requests, io, os, pandas as pd, numpy as np
from urllib.parse import urlencode
from pytz import timezone
from datetime import datetime, timedelta
from csvwlib import CSVWConverter
from matplotlib import pyplot as plt
from rdflib.plugins.sparql.results.csvresults import CSVResultSerializer

def fetch(date):
    time_from = timezone('Europe/Helsinki').localize(datetime.strptime(date, '%Y-%m-%d'))
    time_to = time_from + timedelta(days=1)

    query = {
        'table': 'HYY_DMPS', 'quality': 'ANY', 'averaging': 'NONE', 'type': 'NONE',
        'from': str(time_from), 'to': str(time_to), 'variables': 'd316e1,d355e1,d398e1,'\
        'd447e1,d501e1,d562e1,d631e1,d708e1,d794e1,d891e1,d100e2,d112e2,d126e2,d141e2,d158e2,'\
        'd178e2,d200e2,d224e2,d251e2,d282e2,d316e2,d355e2,d398e2,d447e2,d501e2,d562e2,d631e2,'\
        'd708e2,d794e2,d891e2,d100e3,d112e3,d126e3,d141e3,d158e3,d178e3,d200e3'
    }
    
    url = 'https://avaa.tdata.fi/smear-services/smeardata.jsp?' + urlencode(query)
    response = requests.post(url)

    return pd.read_csv(io.StringIO(response.text))


def plot(data):
    d = data.copy(deep=True)
    d = d.iloc[:, 6:].values
    m = len(d)
    n = len(d[0])
    x = range(0, m)
    y = range(0, n)
    x, y = np.meshgrid(x, y)
    # np.float was removed in NumPy 1.24; the builtin float is equivalent here
    z = np.transpose(np.array([row[1:] for row in d]).astype(float))
    plt.figure(figsize=(10, 5), dpi=100)
    plt.pcolormesh(x, y, z)
    plt.plot((0, x.max()), (y.max()/2, y.max()/2), "r-")
    plt.colorbar()
    plt.xlim(right=m-1)
    x_ticks = np.arange(x.min(), x.max(), 6)
    x_labels = range(x_ticks.size)
    plt.xticks(x_ticks, x_labels)
    plt.xlabel('Hours')
    y_ticks = np.arange(y.min(), y.max(), 6)
    y_labels = ['3.16', '6.31', '12.6', '25.1', '50.1', '100']
    plt.yticks(y_ticks, y_labels)
    plt.ylabel('Diameter [nm]')
    plt.ylim(top=n-1)
    plt.show()
    

def query(q):
    serializer = CSVResultSerializer(g.query(q))
    output = io.BytesIO()
    serializer.serialize(output)
    return pd.read_csv(io.StringIO(output.getvalue().decode('utf-8')), encoding='utf-8')
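
The variable names requested in `fetch` appear to encode the particle diameter of each size bin. The following is a minimal sketch of that decoding, assuming the convention `dXYZeN` means X.YZ × 10^(N-1) nm. This convention is inferred from the y-axis labels hard-coded in `plot` (every sixth variable lines up with `3.16`, `6.31`, `12.6`, ...) and is not confirmed by the SMEAR documentation here. The helper name `diameter_nm` is hypothetical.

```python
def diameter_nm(variable):
    """Decode a name such as 'd316e1' into a diameter in nm.

    Assumed convention (inferred, not documented in this notebook):
    'dXYZeN' stands for X.YZ * 10**(N - 1) nanometres.
    """
    mantissa, exponent = variable[1:].split('e')
    # Multiply before dividing so the result is a single, correctly rounded float
    return int(mantissa) * 10 ** (int(exponent) - 1) / 100


# Every sixth variable reproduces a tick label used in plot()
ticks = [diameter_nm(v) for v in ('d316e1', 'd631e1', 'd126e2', 'd251e2', 'd501e2', 'd100e3')]
print(ticks)  # → [3.16, 6.31, 12.6, 25.1, 50.1, 100.0]
```

If the assumption holds, such a helper could replace the hard-coded `y_labels` list in `plot`.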

Let's also initialize two data structures we need to collect data about new particle formation events.

In [15]:
labels = ['date', 'start', 'end', 'class']
data = []

## Data Interpretation

For a number of days, we fetch and plot particle size distribution data (primary data) as measured by the observation system. We provide several example days on which an event occurred. Please process some of the provided days. Your task is to record the start and end times as well as the classification of the event by looking at the visualization. We follow the classification scheme by [Dal Maso et al.](http://www.borenv.net/BER/pdfs/ber10/ber10-323.pdf), which distinguishes three classes, namely Ia, Ib and II:

* Class I: Days when the growth and formation rate can be determined with good confidence.
  * Class Ia: Very clear and strong particle formation events.
  * Class Ib: Other Class I events.
* Class II: Days where the derivation of these parameters is not possible or the accuracy of the results is questionable.

You can of course also select days other than the ones suggested here. For example, you could choose a day that follows one provided here and see how the visualized observational data differ. To guide you, we also provide example days on which no event occurred.
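
Because the start time, end time and class are entered by hand, it can help to check a record before collecting it. The following is a minimal sketch of such a check, not part of the original workflow. The helper name `make_record` and the rule that the end must be later than the start are assumptions made for illustration.

```python
from datetime import datetime

# The three classes of the scheme described above
VALID_CLASSES = {'Ia', 'Ib', 'II'}


def make_record(day, start, end, classification):
    """Return a (date, start, end, class) tuple after basic sanity checks."""
    datetime.strptime(day, '%Y-%m-%d')  # raises ValueError on a malformed date
    t_start = datetime.strptime(start, '%H:%M')
    t_end = datetime.strptime(end, '%H:%M')
    if t_end <= t_start:
        # Assumption: an event starts and ends on the same day
        raise ValueError('end must be later than start')
    if classification not in VALID_CLASSES:
        raise ValueError('classification must be one of Ia, Ib or II')
    return (day, start, end, classification)


record = make_record('2007-05-05', '12:00', '13:30', 'Ia')
```

The returned tuple has the same shape as the one appended to `data` further down, so `data.append(make_record(day, start, end, classification))` would be a drop-in variant.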

In [None]:
# Days to process
#
# Event days
# 2007-04-15, 2007-05-05, 2007-05-18, 2007-10-19, 2008-02-19, 2009-03-19, 2009-03-22 
# 2011-03-15, 2011-04-19, 2011-10-01, 2012-05-01, 2012-05-29, 2013-02-20, 2013-04-04
#
# Non-event days
# 2007-04-20, 2008-02-20, 2009-04-03, 2011-04-21, 2012-05-05, 2013-02-21

day = '2007-05-05'

plot(fetch(day))

In [None]:
start = '12:00'
end = '13:30'
# One of Ia, Ib or II
classification = 'Ia'

In [None]:
data.append((day, start, end, classification))

In [None]:
df = pd.DataFrame.from_records(data, columns=labels)
df.to_csv('data.csv', index=False)
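
To be converted to RDF, the CSV file needs a CSV on the Web metadata description that maps each column to a shared term. The following is a sketch of what such a description might look like, not the metadata actually used with the published `data.csv`. The property IRIs are taken from the SPARQL query at the end of this notebook. The datatypes, the `HH:mm` time format and the helper name `save_metadata` are assumptions.

```python
import json

OBO = 'http://purl.obolibrary.org/obo/'

# Hypothetical CSVW metadata for data.csv (illustrative, not the original file)
metadata = {
    '@context': 'http://www.w3.org/ns/csvw',
    'url': 'data.csv',
    'tableSchema': {
        'columns': [
            {'name': 'date', 'titles': 'date', 'datatype': 'date',
             'propertyUrl': OBO + 'STATO_0000093'},
            {'name': 'start', 'titles': 'start',
             'datatype': {'base': 'time', 'format': 'HH:mm'},
             'propertyUrl': OBO + 'RO_0002537'},
            {'name': 'end', 'titles': 'end',
             'datatype': {'base': 'time', 'format': 'HH:mm'},
             'propertyUrl': OBO + 'RO_0002538'},
            {'name': 'class', 'titles': 'class',
             'propertyUrl': OBO + 'OBI_0000999',
             'valueUrl': 'http://avaa.tdata.fi/class/{class}'},
            # Virtual column: types every row as a new particle formation event
            {'virtual': True, 'propertyUrl': 'rdf:type',
             'valueUrl': OBO + 'ENVO_01001372'},
        ]
    },
}

metadata_json = json.dumps(metadata, indent=2)


def save_metadata(path='data.csv-metadata.json'):
    """Write the description next to the CSV file."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(metadata_json)
```

Calling `save_metadata()` stores the description under `data.csv-metadata.json`, one of the default locations at which CSVW processors look for metadata belonging to `data.csv`.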

In [14]:
# FIXME
url = 'https://raw.githubusercontent.com/markusstocker/carbon-workshop/master/data.csv'

In [3]:
g = CSVWConverter.to_rdf(url, mode='minimal')

print(g.serialize(format='ttl').decode('utf-8'))

@prefix : <https://raw.githubusercontent.com/markusstocker/carbon-workshop/master/data.csv#> .
@prefix as: <https://www.w3.org/ns/activitystreams#> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix ctag: <http://commontag.org/ns#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix dc11: <http://purl.org/dc/elements/1.1/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dqv: <http://www.w3.org/ns/dqv#> .
@prefix duv: <https://www.w3.org/TR/vocab-duv#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix gr: <http://purl.org/goodrelations/v1#> .
@prefix grddl: <http://www.w3.org/2003/g/data-view#> .
@prefix ical: <http://www.w3.org/2002/12/cal/icaltzd#> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .
@prefix ma: <http://www.w3.org/ns/ma-ont#> .
@prefix ns1: <http://purl.obolibrary.org/obo/> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix og: <http://ogp.me/ns#> .
@prefix org: <http

In [13]:
display(query("""
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX class: <http://avaa.tdata.fi/class/>
SELECT ?date ?start ?end
WHERE {
  [] a obo:ENVO_01001372 ;
    obo:OBI_0000999 ?class ;
    obo:STATO_0000093 ?date ;
    obo:RO_0002537 ?start ;
    obo:RO_0002538 ?end .
  FILTER (?class = class:Ia)
  FILTER (?date > "2007-05-01"^^xsd:date)
}
"""))

|   | date | start | end |
|---|---|---|---|
| 0 | 2007-05-05 | 12:00:00 | 13:30:00 |
