New Directions Strike Back (New Directions 2)

This document summarises the discussion held between (in alphabetical order)

Mirja Kühlewind,
Roman Müntener,
Stephan Neuhaus, and
Brian Trammell

on 2017-01-12, 13:00--14:00 via Skype.

Changes to Database Schema

Each observation gets an "observation set" number. The analyser sets this number to group related observations. What constitutes a relation between observations is entirely up to the analyser.

Observation set numbers are used for observatory provenance. If an analyser uses observation sets x and y to produce observation set z, this fact is reflected in z's metadata, which records that z came from x and y. (The easiest way would be a table that records, for each observation set, the name(s) of the HDFS file(s) that were used and/or the observation set numbers of its immediate antecedents.)
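The provenance table mentioned above could be sketched as follows. This is a minimal illustration using an in-memory SQLite database; the table names, column names, analyser names, and HDFS paths are all assumptions made for the example and were not part of the discussion.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE observation_set (
    id       INTEGER PRIMARY KEY,
    analyser TEXT NOT NULL
);
-- One row per source of an observation set: either an HDFS file
-- or an immediate antecedent observation set, never both in one row.
CREATE TABLE observation_set_source (
    set_id        INTEGER NOT NULL REFERENCES observation_set(id),
    hdfs_file     TEXT,
    antecedent_id INTEGER REFERENCES observation_set(id),
    CHECK ((hdfs_file IS NULL) != (antecedent_id IS NULL))
);
"""


def antecedents(conn, set_id):
    """Return the immediate antecedent set numbers of an observation set."""
    rows = conn.execute(
        "SELECT antecedent_id FROM observation_set_source "
        "WHERE set_id = ? AND antecedent_id IS NOT NULL "
        "ORDER BY antecedent_id",
        (set_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO observation_set VALUES (?, ?)",
    [(1, "analyser-a"), (2, "analyser-a"), (3, "analyser-b")],
)
conn.executemany(
    "INSERT INTO observation_set_source VALUES (?, ?, ?)",
    [
        (1, "/example/raw-1", None),  # set 1 was built from a raw file
        (2, "/example/raw-2", None),  # set 2 was built from a raw file
        (3, None, 1),                 # set 3 was built from sets 1 and 2
        (3, None, 2),
    ],
)

print(antecedents(conn, 3))
```

A set that draws on both files and earlier sets would simply have rows of both kinds.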

Observation sets also get two timestamps: one for when the set was created, and one (possibly null) for when the observation set became invalid.

If it turns out that an analyser has done the wrong thing, the observation sets that it created can be marked invalid. The temptation is to invalidate the observation set a millisecond after it came into existence, but this temptation must be resisted: the invalidation timestamp must be the time at which the problem was detected. The point of the invalidation timestamp is to enable analysis of observations as if it were being performed at some point in the past, and backdating an observation set's invalidity would, well, invalidate that approach.
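One way to read the two timestamps is as a validity interval that an "as of" analysis can test against. The sketch below assumes that interpretation; the function name `valid_at` and the example dates are hypothetical.

```python
from datetime import datetime
from typing import Optional


def valid_at(created: datetime,
             invalidated: Optional[datetime],
             as_of: datetime) -> bool:
    """True if an observation set counted as valid at the moment `as_of`.

    Assumption: a set is valid from its creation time up to, but not
    including, its invalidation time. A null invalidation time means
    the set has never been invalidated.
    """
    return created <= as_of and (invalidated is None or as_of < invalidated)


created = datetime(2017, 1, 10)
detected = datetime(2017, 1, 20)  # when the analyser error was noticed

# An analysis replayed as of 15 January still sees the set, exactly as
# the original analysis on that day would have.
print(valid_at(created, detected, datetime(2017, 1, 15)))  # True
# An analysis as of 25 January no longer sees it.
print(valid_at(created, detected, datetime(2017, 1, 25)))  # False
```

Had the invalidation been backdated to just after creation, the replay for 15 January would no longer match what was actually visible on that day.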

Invalidation of an observation set cascades to all observation sets that depend on it, directly or indirectly.
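The cascade can be sketched as a breadth-first walk over the provenance graph. This is only an illustration: the dictionaries stand in for database tables, the function name is hypothetical, and giving dependent sets the same detection time as the set that triggered the cascade is an assumption.

```python
from collections import deque
from datetime import datetime


def invalidate(set_id, antecedents, invalidated_at, now):
    """Mark `set_id` and every set derived from it as invalid at `now`.

    antecedents:    dict mapping a set number to its immediate antecedents
    invalidated_at: dict mapping a set number to its invalidation time
                    (a missing key or None means the set is still valid)
    """
    # Invert the provenance relation: antecedent -> sets built from it.
    dependents = {}
    for child, parents in antecedents.items():
        for parent in parents:
            dependents.setdefault(parent, []).append(child)

    queue = deque([set_id])
    while queue:
        current = queue.popleft()
        if invalidated_at.get(current) is not None:
            continue  # already invalid: keep its original detection time
        invalidated_at[current] = now
        queue.extend(dependents.get(current, []))
    return invalidated_at


# Set 3 was built from sets 1 and 2, set 4 from set 3, set 5 from set 2.
provenance = {3: [1, 2], 4: [3], 5: [2]}
now = datetime(2017, 1, 12, 13, 0)
state = invalidate(1, provenance, {}, now)
print(sorted(state))  # [1, 3, 4]
```

Sets 2 and 5 stay valid because neither depends on set 1.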

Path elements need extra space for AS number and country code. The AS number, having a real existence in networks, is the more interesting piece of information.

The denormalised database design as it is now will be kept, unless and until it can be demonstrated that there is a compelling reason not to denormalise.

Condition Taxonomy

A condition becomes a foreign key into a table.

Conditions are stored as trees.
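A condition table with a self-referencing parent column is one common way to store such a tree. The sketch below assumes that layout; the table names, column names, and the helper `full_name` are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each condition row points at its parent, and each
# observation refers to a condition by foreign key.
conn.executescript("""
CREATE TABLE condition (
    id        INTEGER PRIMARY KEY,
    label     TEXT NOT NULL,
    parent_id INTEGER REFERENCES condition(id)
);
CREATE TABLE observation (
    id           INTEGER PRIMARY KEY,
    set_id       INTEGER NOT NULL,
    condition_id INTEGER NOT NULL REFERENCES condition(id)
);
""")
conn.executemany(
    "INSERT INTO condition VALUES (?, ?, ?)",
    [
        (1, "ecn", None),
        (2, "connectivity", 1),
        (3, "works", 2),
        (4, "broken", 2),
    ],
)


def full_name(conn, condition_id):
    """Walk from a condition up to the root and join the labels with dots."""
    labels = []
    while condition_id is not None:
        label, condition_id = conn.execute(
            "SELECT label, parent_id FROM condition WHERE id = ?",
            (condition_id,),
        ).fetchone()
        labels.append(label)
    return ".".join(reversed(labels))


print(full_name(conn, 3))  # ecn.connectivity.works
```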

Only sibling conditions are related, that is, leaves that share the same parent. Cousins and nephews/nieces are not related and will not be rendered in a chart together. Specifically, ecn.connectivity.* does not include ecn.connectivity.super.works.
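The sibling rule can be illustrated with a small wildcard expansion over dotted condition names. This is a sketch under the assumption that conditions are identified by their full dotted name; the function name `expand` is hypothetical.

```python
def expand(pattern, conditions):
    """Expand a pattern such as 'ecn.connectivity.*' to its sibling leaves.

    Only conditions that sit directly below the named parent and have no
    children of their own are returned.
    """
    if not pattern.endswith(".*"):
        raise ValueError("pattern must end in '.*'")
    parent = pattern[:-2]

    def is_leaf(name):
        # A condition is a leaf if no other condition lies below it.
        return not any(other.startswith(name + ".") for other in conditions)

    return sorted(
        name for name in conditions
        if name.rpartition(".")[0] == parent and is_leaf(name)
    )


known = {
    "ecn.connectivity.works",
    "ecn.connectivity.broken",
    "ecn.connectivity.super.works",
}
print(expand("ecn.connectivity.*", known))
# ['ecn.connectivity.broken', 'ecn.connectivity.works']
```

The condition ecn.connectivity.super.works is left out because its parent is ecn.connectivity.super, not ecn.connectivity.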

There will be catalogue queries of the form "What conditions were generated by analysers on observation set x for paths y during time period z".

When queries are generated by users, there is a considerable chance that these users will generate nonsensical queries if they are not constrained somehow. Therefore, queries will always implicitly group by observation set. For example, a query like "What is the average of ecn.connectivity.works and ecn.connectivity.broken?" will only ever compute these averages per observation set and not over all observation sets that would otherwise apply.
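The implicit grouping might look like the sketch below. It assumes that "average" means the share of a set's matching observations that carry each condition, which is one possible reading and not something the discussion fixed; the function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def condition_shares(observations, conditions):
    """Compute, per observation set, the share of each requested condition.

    observations: iterable of (observation_set_number, condition_name)
    conditions:   the condition names the user asked about
    Results are always keyed by observation set; no figure is ever
    computed across sets.
    """
    counts = defaultdict(Counter)
    for set_id, condition in observations:
        if condition in conditions:
            counts[set_id][condition] += 1

    shares = {}
    for set_id, counter in counts.items():
        total = sum(counter.values())
        shares[set_id] = {name: counter[name] / total for name in conditions}
    return shares


works = "ecn.connectivity.works"
broken = "ecn.connectivity.broken"
sample = [
    (1, works), (1, works), (1, works), (1, broken),
    (2, works), (2, broken),
]
for set_id, result in sorted(condition_shares(sample, [works, broken]).items()):
    print(set_id, result)
```

Pooling the two sets would mix observations that the analysers chose to keep apart, so the result stays keyed by observation set.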
