# Query Notebook

for the MAMI Path Transparency Measurement Summer School

### Configuration and Environment Setup

Set up our environment to point to the correct instance of the PTO, then set our API token and the observation set ID for the normalized data we uploaded, as well as for the analyzed data combining observations from all students:

In [None]:
baseurl = "https://summer.pto.mami-project.eu"
token = None
my_obset_id = "1"

Now import some things we'll need to interact with the PTO:

In [None]:
# PTO client
from ptoclient import *

# Data handling and plotting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# pretty printing
from pprint import pprint

And some utility functions we'll use to work with the dataframes we retrieve from the PTO:

In [None]:
def pivot_condition_time(df, prefix, aspect, states):
    """
    Given a dataframe of counts of conditions (states of one or more aspects)
    over a time series, pivot to a time-indexed table with a 'total' column
    and one column for each state.
    """

    # Keep only the rows whose condition belongs to the requested aspect
    aspect_df = df[df['condition'].map(lambda x: x.startswith(".".join((prefix, aspect))))]

    # Total count per time step across all states of the aspect
    pivot_df = aspect_df.groupby('time')[['count']].sum()
    pivot_df.columns = ['total']

    # Add one column per requested state, aligned on time
    for c in states:
        cseries = aspect_df[aspect_df['condition'] == ".".join((prefix, aspect, c))]
        cseries = cseries.set_index('time').loc[:, ['count']]
        cseries.columns = [c]
        pivot_df = pd.concat([pivot_df, cseries], axis=1)

    # Times at which a state was not observed get a count of zero
    return pivot_df.fillna(0)
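
To get a feel for the shape of the table this kind of pivot produces, here is a minimal sketch of the same reshaping on made-up data, written with pandas' `pivot_table`. The condition names and counts are invented for illustration, and this is an alternative formulation, not the function above:

```python
import pandas as pd

# Made-up counts for illustration only; real data comes from a PTO query.
toy = pd.DataFrame({
    "time": ["2016-11-01", "2016-11-01", "2016-11-02", "2016-11-02"],
    "condition": [
        "ecn.connectivity.works",
        "ecn.connectivity.broken",
        "ecn.connectivity.works",
        "ecn.negotiation.succeeded",
    ],
    "count": [90, 10, 80, 70],
})

prefix, aspect, states = "ecn", "connectivity", ["works", "broken"]

# Keep only rows for the aspect of interest.
selected = toy[toy["condition"].str.startswith(f"{prefix}.{aspect}.")].copy()
# Reduce each condition to its final component (the state).
selected["state"] = selected["condition"].str.rsplit(".", n=1).str[-1]

# One row per time, one column per state; missing combinations become 0.
wide = selected.pivot_table(index="time", columns="state", values="count",
                            aggfunc="sum", fill_value=0)

# Total over every state of the aspect, then keep the requested states.
total = wide.sum(axis=1)
wide = wide.reindex(columns=states, fill_value=0)
wide.insert(0, "total", total)
print(wide)
```

Each row is one time step, `total` is the sum over all states of the aspect, and a state that was not observed at a given time shows up as zero.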

### Create a client to access the PTO

The client object encapsulates a base URL and an API token, and can be used to access queries and observations on the given instance of the PTO.

In [None]:
c = PTOClient(baseurl, token)

### Access my observation set

Let's check metadata for the observation set we uploaded in the earlier part of the course:

In [None]:
s = c.retrieve_set(setid=my_obset_id)
pprint(s.metadata())

Now run a query to look at the ratio of ECN negotiation success in our observation set:

In [None]:
q_mine = c.submit_query(PTOQuerySpec().time("2016-11-01", "2017-01-01")
                                      .condition("ecn.negotiation.*")
                                      .group_by_condition())

Retrieve the query metadata, and re-run this cell until the state shows that the query is complete:

In [None]:
pprint(q_mine.metadata(reload=True))
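
Instead of re-running the cell by hand, the wait can be automated with a small polling loop. The sketch below is generic and does not depend on the PTO client: `wait_until` and `fake_metadata` are hypothetical helpers, and the `"state"` key and `"complete"` value are assumptions that should be checked against the metadata actually printed above:

```python
import time

def wait_until(fetch, is_done, interval=5.0, timeout=600.0):
    """Call fetch() repeatedly until is_done(result) is true or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        time.sleep(interval)

# Stand-in for a metadata call: reports "pending" twice, then "complete".
# The key and values are assumptions made for this example.
_states = iter(["pending", "pending", "complete"])

def fake_metadata():
    return {"state": next(_states)}

final = wait_until(fake_metadata, lambda m: m["state"] == "complete", interval=0.01)
print(final)
```

With a real query, `fetch` could be `lambda: q_mine.metadata(reload=True)`, with `is_done` adapted to whatever field the metadata uses to report completion.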

Have a look at the query results, then calculate the ratio of targets where ECN negotiation succeeded:

In [None]:
r = q_mine.results()
r

In [None]:
# Index by condition name, then divide the successes by the total count
r.index = r.group
r.loc['ecn.negotiation.succeeded', 'count'] / r['count'].sum()
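
The same calculation on a small made-up table shows what the ratio means. This sketch assumes only the `group` and `count` columns used above; the counts and the second condition name are invented for illustration:

```python
import pandas as pd

# Made-up grouped result with the same two columns used above.
toy = pd.DataFrame({
    "group": ["ecn.negotiation.succeeded", "ecn.negotiation.failed"],
    "count": [750, 250],
})

# Index by condition name so a single count can be looked up by label.
by_group = toy.set_index("group")

# Fraction of all counted targets for which negotiation succeeded.
ratio = by_group.loc["ecn.negotiation.succeeded", "count"] / by_group["count"].sum()
print(ratio)
```

Here 750 of 1000 counted targets negotiated ECN, so the ratio is 0.75.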

### Examine combined results

Let's run another query, this time on all multipoint conditions. These are the result of an analysis of each target that combines the observations from all students:

In [None]:
q_multi = c.submit_query(PTOQuerySpec().time("2016-11-01", "2017-01-01")
                                       .condition("ecn.multipoint.*")
                                       .group_by_condition())

In [None]:
pprint(q_multi.metadata(reload=True))

In [None]:
r = q_multi.results()
r

Next, query for the observations with the path-dependent connectivity condition, this time without grouping:

In [None]:
q_pathdep_obs = c.submit_query(PTOQuerySpec().time("2016-11-01", "2017-01-01")
                                             .condition("ecn.multipoint.connectivity.path_dependent"))

In [None]:
pprint(q_pathdep_obs.metadata(reload=True))