# Introduction

This notebook supports the analysis of new particle formation events stored in the database. Events can be mapped, evaluated statistically, or described. First, configure the analysis by selecting a day, a place, or both.

In [None]:
# Select the day and place
# Day format: yyyy-mm-dd
# Valid places: Hyytiälä, Värriö
# Examples: 
#  day = '2013-04-04', place = 'Hyytiälä'
#  day = '2013-04-08', place = 'Hyytiälä'
day = ''
place = 'Hyytiälä'
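
A malformed day or a misspelled place is easier to catch here than inside a later query. The following is a minimal sketch of such a check. `validate_selection` and `VALID_PLACES` are hypothetical names, not part of pynpf, and the sketch assumes that an empty string means "not selected".

```python
from datetime import datetime

# Hypothetical helper, not part of pynpf.
# Assumption: an empty string means that no day (or no place) is selected.
VALID_PLACES = ('Hyytiälä', 'Värriö')

def validate_selection(day, place):
    """Return (day, place) unchanged if both are acceptable, else raise ValueError."""
    if day:
        # strptime raises ValueError if the text is not a valid calendar date
        datetime.strptime(day, '%Y-%m-%d')
    if place and place not in VALID_PLACES:
        raise ValueError('Unknown place: {!r}'.format(place))
    return day, place

validate_selection('2013-04-04', 'Hyytiälä')
```

Calling `validate_selection(day, place)` right after the assignments above would stop the notebook early on an invalid selection.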

## Initialization

In [None]:
import pandas as pd
from io import StringIO
from IPython.display import display, HTML
from SPARQLWrapper import SPARQLWrapper, CSV
from pynpf.processing.visualization import imap
from pynpf.processing.statistics import duration
from pynpf.processing.description import describe
from pynpf.factory import events, record

pd.set_option('display.max_colwidth', 200)

prefixes = """
PREFIX time: <http://www.w3.org/2006/time#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX lode: <http://linkedevents.org/ontology/>
PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX prov: <http://www.w3.org/ns/prov#>
"""

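def query(sparql):
    """Run a SPARQL query against the local endpoint and display the result as a table."""
    sw = SPARQLWrapper("http://localhost:3030/pynpf/sparql")
    # Prepend the shared prefix declarations so that queries can use short names
    sw.setQuery('{}{}'.format(prefixes, sparql))
    sw.setReturnFormat(CSV)
    # The CSV response arrives as bytes: decode it, then load it into a DataFrame
    display(pd.read_csv(StringIO(sw.query().convert().decode())))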
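
The decoding step in `query` can be tried without a running endpoint. The sketch below uses the standard `csv` module in place of pandas and a made-up payload shaped like a SPARQL CSV result (a header row of variable names, then one row per solution). `parse_csv_result` is a hypothetical name and the payload is illustrative only, not data from the database.

```python
import csv
from io import StringIO

# Illustrative payload only: made-up bytes shaped like a SPARQL CSV result
payload = b"duration,unit\r\n2.5,http://www.w3.org/2006/time#unitHour\r\n"

def parse_csv_result(raw):
    """Decode a CSV result body and return its rows as a list of dicts."""
    text = raw.decode('utf-8')
    return list(csv.DictReader(StringIO(text)))

rows = parse_csv_result(payload)
print(rows[0]['duration'], rows[0]['unit'])
```

All values come back as strings here, whereas `pd.read_csv` would infer numeric column types.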

## Execution

In [None]:
# Visualize event places on an interactive map, possibly on a specific day
imap(events())

In [None]:
# Compute the average duration of events, possibly on a specific day and/or place
d = duration(events(), fun='avg', prov={'agent': 'https://orcid.org/0000-0001-5492-3212'})

print(d.value())
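
For orientation, an average duration in hours can be computed from the beginning and end times of events roughly as follows. This is a sketch with the standard library only, not the pynpf implementation of `duration`. `average_duration_hours` is a hypothetical name and the sample intervals are made up.

```python
from datetime import datetime

def average_duration_hours(intervals):
    """Mean length, in hours, of (beginning, end) datetime pairs."""
    if not intervals:
        raise ValueError('no events to average')
    total_seconds = sum((end - beginning).total_seconds()
                        for beginning, end in intervals)
    return total_seconds / len(intervals) / 3600.0

# Made-up intervals, for illustration only
sample = [
    (datetime(2013, 4, 4, 9, 0), datetime(2013, 4, 4, 11, 0)),   # 2 hours
    (datetime(2013, 4, 8, 10, 0), datetime(2013, 4, 8, 13, 0)),  # 3 hours
]
print(average_duration_hours(sample))  # → 2.5
```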

Record the computed average duration, for instance when it is to be published as a result in a paper.

This records the computed average duration as an [average value](http://purl.obolibrary.org/obo/OBI_0000679) with a [scalar value specification](http://purl.obolibrary.org/obo/OBI_0001931), that is, a numeric [duration](https://www.w3.org/TR/owl-time/#time:Duration) with unit type [hour](https://www.w3.org/TR/owl-time/#time:unitHour). The average value [is about](http://purl.obolibrary.org/obo/IAO_0000136) the [dataset](http://purl.obolibrary.org/obo/IAO_0000100) of events for which it was computed. The record also captures the provenance of the average value: it [was derived from](https://www.w3.org/TR/prov-o/#wasDerivedFrom) the dataset of events, and the involved agent and the [averaging data transformation](http://purl.obolibrary.org/obo/OBI_0200170) activity are included.

As a result, the computed average duration is an identified resource and can be referred to in published literature.

In [None]:
record(d)
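
To make the shape of such a record concrete, the sketch below renders it as Turtle text. It mirrors the triple patterns used by the queries in this notebook. The triples actually written by `record` may differ. `render_average_record` is a hypothetical name, the `example.org` IRIs are placeholders, and the short names assume the prefixes declared in the initialization cell.

```python
# Hypothetical sketch: renders the described record as Turtle text.
# It follows the patterns queried in this notebook, not pynpf internals.
TEMPLATE = """\
<{average}> a obo:OBI_0000679 ;
    obo:OBI_0001938 [
        a time:Duration ;
        time:numericDuration {hours} ;
        time:unitType time:unitHour
    ] ;
    obo:IAO_0000136 <{dataset}> ;
    prov:wasDerivedFrom <{dataset}> ;
    prov:wasGeneratedBy obo:OBI_0200170 ;
    prov:wasAttributedTo <{agent}> .
"""

def render_average_record(average, dataset, agent, hours):
    """Fill the template with the given IRIs and the duration in hours."""
    return TEMPLATE.format(average=average, dataset=dataset,
                           agent=agent, hours=hours)

# Placeholder IRIs, for illustration only
turtle = render_average_record(
    average='http://example.org/average/1',
    dataset='http://example.org/dataset/1',
    agent='http://example.org/agent/1',
    hours=2.5,
)
print(turtle)
```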

The following query retrieves the computed average durations and the dataset that each one is about.

In [None]:
query("""
    select ?duration ?unit ?dataset where {
      [] rdf:type obo:OBI_0000679 ;         # average value
         obo:OBI_0001938 [                  # has scalar value specification
           rdf:type time:Duration ;         # a value specification
           time:numericDuration ?duration ;
           time:unitType ?unit
         ] ;
         obo:IAO_0000136 ?dataset .
    }
""")

Given the computed average durations, we can inspect the related datasets using the following query. Replace the dataset identifier in the filter with one from the results above.

In [None]:
query("""
    select ?beginning ?end ?place where {
      ?dataset rdf:type obo:IAO_0000100 . # data set
      ?dataset obo:BFO_0000051 [          # has part
           rdf:type lode:Event ;
           lode:atTime [ 
             time:hasBeginning [ time:inXSDDateTime ?beginning ] ;
             time:hasEnd [ time:inXSDDateTime ?end ] 
           ] ;
           lode:atPlace [ gn:name ?place ]
         ] . 
      filter (?dataset = <http://avaa.tdata.fi/web/smart/smear/a72e02d77771b28de3c5f9704c0c46b0>)
    }
""")
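
Instead of editing the identifier inside the query text by hand, the query can be built from a variable. The following is a minimal sketch. `dataset_events_query` is a hypothetical helper, and the character check is a simple guard under the assumption that the identifier is a plain IRI, not a full IRI validation.

```python
from string import Template

# Same query as above, with a $iri placeholder in the filter
EVENTS_QUERY = Template("""
    select ?beginning ?end ?place where {
      ?dataset rdf:type obo:IAO_0000100 . # data set
      ?dataset obo:BFO_0000051 [          # has part
           rdf:type lode:Event ;
           lode:atTime [
             time:hasBeginning [ time:inXSDDateTime ?beginning ] ;
             time:hasEnd [ time:inXSDDateTime ?end ]
           ] ;
           lode:atPlace [ gn:name ?place ]
         ] .
      filter (?dataset = <$iri>)
    }
""")

def dataset_events_query(dataset_iri):
    """Return the event-listing query restricted to one dataset IRI."""
    # Reject characters that may not appear inside <...> in SPARQL
    if not dataset_iri or any(c in dataset_iri for c in '<>" {}|\\^`'):
        raise ValueError('not a usable IRI: {!r}'.format(dataset_iri))
    return EVENTS_QUERY.substitute(iri=dataset_iri)

sparql = dataset_events_query(
    'http://avaa.tdata.fi/web/smart/smear/a72e02d77771b28de3c5f9704c0c46b0')
print(sparql)
```

Passing the returned string to `query` runs it in the same way as the cell above.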

We can also obtain a provenance description of the computed average duration: the dataset of events it was derived from, the agent involved, and the activity that generated it.

In [None]:
query("""
    select ?average ?dataset where {
      ?average prov:wasDerivedFrom ?dataset .
      ?average prov:wasGeneratedBy obo:OBI_0200170 .                          # averaging data transformation
      ?average prov:wasAttributedTo <https://orcid.org/0000-0001-5492-3212> .
    }
""")

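The following query displays the duration and unit of one computed average that was derived from a dataset by the averaging data transformation activity. Replace the average identifier in the filter with one from the results above.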

In [None]:
query("""
    select ?duration ?unit where {
      ?average rdf:type obo:OBI_0000679 . # average value
      ?average obo:OBI_0001938 [          # has scalar value specification: duration
        rdf:type time:Duration ;          # a value specification
        time:numericDuration ?duration ;
        time:unitType ?unit
      ] .
      filter (?average = <http://avaa.tdata.fi/web/smart/smear/03220a5c986d82170241fb757404bec2>)
    }
""")

In [None]:
# Describe an event in plain English text
describe(events(place=place), format='text')

In [None]:
# Describe an event in machine-readable format (RDF)
describe(events(day, place), format='rdf')

In [None]:
# Describe an event as a visual RDF graph
describe(events(day, place), format='graph')