Query Language

Gabe Fierro edited this page Oct 7, 2016 · 11 revisions

Query Language

This query language is a subset of the Giles query language which is itself a superset of the sMAP query language. The motivation for the subset is removing the "editing" capabilities from the query language; the BOSSWAVE archiver simply views data and metadata.

Metadata Queries

Metadata queries take the form

query : SELECT <target> WHERE <where-clause>
target : <taglist>
       | <distinct>
taglist : tag
        | tag , taglist
distinct: DISTINCT tag

Returns

// PID 2.0.8.2
type QueryMetadataResult struct {
	Nonce uint32
	Data  []KeyValueMetadata
}
type KeyValueMetadata struct {
	UUID     string
	Path     string
	Metadata map[string]interface{}
}

tags are simply the names of the keys, e.g. "UnitofTime", "UnitofMeasure", "UUID", "Location", etc. Note that here, we no longer maintain the distinction between "Metadata" and "Properties" like was seen in sMAP.

Removing this distinction makes it easier to reason about how your key-value pairs will be represented by the archiver. For example, if a stream inherits metadata from a URI a/b/c/!meta/keyname, that metadata will be made available under the name keyname rather than Metadata/keyname.

Examples

Retrieving the list of all UUIDs:

select distinct uuid;

Retrieving the rooms and UUIDs of temperature sensors in Soda Hall:

select Room, uuid where Building="Soda Hall" and SensorType="Temperature"

Data Queries

The query language supports the three types of data queries available in BtrDB:

  • raw data access
  • statistical summaries of data
  • changed ranges queries

Raw Data Access

Raw data queries will return the exact contents of the reported timeseries streams that match the provided WHERE clause. The returned data takes the form of a list of messages containing a UUID and a list of <time, value> pairs, where time is a uint64 in the requested units of time and value is a float64.

query : SELECT DATA <target> <datalimit?> <streamlimit?> <time-conversion?> WHERE <where-clause>
target : IN ( start-time, end-time )
       | BEFORE start-time
       | AFTER start-time
datalimit : LIMIT number
streamlimit: STREAMLIMIT number
time-conversion : AS timeunit

Returns

// PID 2.0.8.4
type QueryTimeseriesResult struct {
	Nonce uint32
	Data  []Timeseries
	Stats []Statistics
}
type Timeseries struct {
	UUID       string
	Path       string
	Generation uint64
	Times      []uint64
	Values     []float64
}

Statistical Summaries

Statistical queries return the list of summaries for each of the streams matching the provided WHERE clause. The returned data takes the form of a list of messages containing a UUID and a list of <time, count, min, mean, max> pairs; time is a uint64 timestamp in the requested units of time respecting the start of the window, and the rest are all float64 values containing the number of points in the window as well as the minimum, mean and maximum values.

pointwidth is the length of the time intervals to summarize. These intervals have width 1 << pointwidth nanoseconds. The pointwidth parameter should be an integer in the range [0, 62]

width is the width of a window given as a time expression number``units, where number is any integer and units is any entry from the "Time references" table below, e.g. "5min", "3day", "100ms"

query : SELECT <stats-query> DATA <target> <datalimit?> <streamlimit?> <time-conversion?> WHERE <where-clause>
stats-query : STATISTICAL (<pointwidth>)
            | WINDOW (<width>)
pointwidth : number
width: duration
target : IN ( start-time, end-time )
       | BEFORE start-time
       | AFTER start-time
datalimit : LIMIT number
streamlimit: STREAMLIMIT number
time-conversion : AS timeunit

Returns

// PID 2.0.8.4
type QueryTimeseriesResult struct {
	Nonce uint32
	Data  []Timeseries
	Stats []Statistics
}
type Statistics struct {
	UUID       string
	Generation uint64
	Times      []uint64
	Count      []uint64
	Min        []float64
	Mean       []float64
	Max        []float64
}

Changed Range Query

query : SELECT CHANGED(<gen1>, <gen2>, <resolution>) DATA WHERE <where-clause>
gen1 : number
gen2 : number
resolution : number

Returns

// PID 2.0.8.8
type QueryChangedResult struct {
	Nonce   uint32
	Changed []ChangedRange
}
type ChangedRange struct {
	UUID       string
	Generation uint64
	StartTime  int64
	EndTime    int64
}

gen{1,2} are the generations of data as reported by BtrDB; these generations are returned with all data queries. To find the ranges of data that changed between generation X and generation Y (where X < Y), the arguments to "CHANGED" are CHANGED(X,Y, ). resolution is the minimum size of a returned range in 2<<resolution nanoseconds.

Parameters

{start,end}-time

Data can be retrieved for some time region using a range query (in) or relative to some point in time (before, after). These reference times must be a UNIX-style timestamp, the now keyword, or a quoted time string.

Time references use the following abbreviations:

Unit Abbreviation Unix support Conversion to Seconds
nanoseconds ns yes 1 second = 1e9 nanoseconds
microseconds us yes 1 second = 1e6 microseconds
milliseconds ms yes 1 second = 1000 milliseconds
seconds s yes 1 second = 1 second
minutes m no 1 minute = 60 seconds
hours h no 1 hour = 60 minutes
days d no 1 day = 24 hours

Time reference options:

  • Unix-style timestamp: Unix/POSIX/Epoch time is defined as the number of seconds since 00:00:00 1 January 1970, UTC (Coordinated Universal Time), not counting leap seconds. In Python, the current Unix time (in seconds) can be found with

    import time
    # Python actually returns the milliseconds as a decimal,
    # so we use int to coerce to seconds only
    print int(time.time())

    Giles includes support for Unix-style timestamps in units other than seconds. By suffixing timestamps with one of the unit abbreviations specified above (that have Unix support), we can introduce a finer resolution to our data queries. The following timestamps are all equivalent.

    • 1429655468s
    • 1429655468000ms
    • 1429655468000000us
    • 1429655468000000000ns

    Specifying a timestamp without units will default to seconds.

  • The now keyword: uses the current local time as perceived by the server. The now time can be adjusted using relative time references, described below.

  • Quoted time strings: Giles supports timestrings enclosed in double quotes that adhere to one of the following formats:

    • 1/2/2006

    • 1/2/2006 03:04:05 PM MST

    • 1/2/2006 15:04:05 MST

    • 1-2-2006 rather than 1/2/2006 is also supported

      These time strings follow the canonical Go reference time, which is defined to be

      Mon Jan 2 15:04:05 -0700 MST 2006
      
  • Relative time references: the above time references are absolute, meaning that they define a specific point in time. Using relative time references, these absolute times can be altered. The most common form of this is specifying offsets of now.

    Relative time references in Giles take the form of number``unit where number is a positive or negative integer and unit is one of the abbreviations defined in the table above (not limited to those marked with Unix support). Relative time references can be chained.

    For example, to specify 10 minutes before now, we could use now -10m. To specify 15 minutes and 30 seconds after midnight March 13th 2010, we could use "3/13/2010" +15m +30s

limit,streamlimit

limit controls the number of points returned per stream, and streamlimit controls the number of streams returned. For the before and after queries, limit will always be 1, so it only makes sense to use streamlimit in those cases. The exact syntax looks like

limit number streamlimit number

where number is some positive integer. Both the limit and streamlimit components are optional and can be specified independently, together or not at all.

time-conversion

The as component allows a query to specify what units of time it would like the data returned as. The default is milliseconds, but the user can specify others (ns, us, ms, s) as per the Unix-compatible notation in the Time Reference table below.

For a sample source, here's the same data point with 4 different units of time. Obviously the resolution is only as good as the underlying source. The archiver does not add additional time resolution, so if our source published in milliseconds, querying for data as micro- or nanoseconds would not return more detailed information. The sample source here reported in nanoseconds.

Where Clause

The where-clause describes how to filter the result set. There are several operators you can use: Tag values should be quoted strings, and tag names should not be quoted. Statements can be grouped using parenthesis.

Operator Description Usage Example
= Compare tag values. tagname = "tagval" Metadata/Location/Building = "Soda Hall"
like String matching. Use Perl-style regex tagname like "pattern" Metadata/Instrument/Manufacturer like "Dent.*"
has Filters streams that have the provided tag has tagname has Metadata/System
and Logical AND of two queries (on either side) where-clause and where-clause has Metadata/System and Properties/UnitofTime = "s"
or Logical OR of two queries
not Inverts a where clause not where-clause not Properties/UnitofMeasure = "volts"
matches Performs regex search over all documents (warning: this can be slow) matches "regex" matches ".*RTU.*"
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.