Timeseries Aggregated Data

GabrielK-Eqnr edited this page Jul 18, 2024 · 35 revisions

The Omnia Timeseries API can be used to retrieve aggregated timeseries data points.

Notes to consider before using aggregate requests:

  • use with care and common sense: aim to build requests that do not have a long response time (especially in service applications). To protect the API, we will disable access if we see improper usage.
  • limit your potential result count to 10,000 buckets per request (e.g. it is bad practice to request 1 year's worth of data at a 1-minute interval).
  • it is not advisable to implement permanent services which rely heavily on aggregate requests targeting data older than 3 years. Data older than 3 years is moved to the data lake, and requests targeting this data are slow and heavy on the API.
  • try to build your requests so that they hit our cached aggregates functionality
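As a rough pre-flight check for the bucket limit mentioned above, a client can estimate how many buckets a request would produce before sending it. The sketch below is illustrative only: the helper names (`parse_interval`, `estimate_bucket_count`) are hypothetical and not part of the API, and the parser only handles the unit suffixes shown on this page (s, m, h, d).

```python
from datetime import datetime, timedelta, timezone
import math
import re

# Unit suffixes seen in the examples on this page (1s, 5m, 10h, 20d).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

MAX_BUCKETS = 10_000  # limit recommended in the notes above


def parse_interval(interval: str) -> timedelta:
    """Parse a processingInterval such as '30m' into a timedelta."""
    match = re.fullmatch(r"([1-9]\d*)([smhd])", interval)
    if match is None:
        raise ValueError(f"unsupported interval: {interval!r}")
    return timedelta(seconds=int(match.group(1)) * _UNIT_SECONDS[match.group(2)])


def estimate_bucket_count(start: datetime, end: datetime, interval: str) -> int:
    """Rough estimate; alignment to default boundaries can add one bucket."""
    return math.ceil((end - start) / parse_interval(interval))


# One year at a 1-minute interval is far above the recommended limit.
year_start = datetime(2023, 1, 1, tzinfo=timezone.utc)
year_end = datetime(2024, 1, 1, tzinfo=timezone.utc)
too_many = estimate_bucket_count(year_start, year_end, "1m") > MAX_BUCKETS
```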

To read aggregates the following key parameters have to be specified:

  • id: The unique id of the timeseries you want to query for aggregated data.
  • startTime: Beginning of the period to calculate aggregated data.
  • endTime: End of the period to calculate aggregated data.
  • aggregateFunction: Type of aggregation function to be executed. Available values [avg, min, max, stddev, count] - scroll down for details.
  • processingInterval (optional): Timespan over which each aggregated value is calculated with the specified aggregate function (i.e. the interval between returned aggregated values). For example, a 10-minute average over the period 12:00 to 12:30 results in three intervals of processingInterval length, starting at 12:00, 12:10 and 12:20 respectively. All aggregates return the timestamp of the start of the interval. If not given, the aggregation is performed across the entire time period (startTime to endTime). Valid values are 1s, 5m, 10h, 20d etc. The intervals start at default time boundaries regardless of the specified startTime, e.g. daily aggregates start at midnight and hourly aggregates start at the start of an hour.
  • fill (optional): Determines how to handle processing intervals with no data. Available values [null, none, forward, linear] - scroll down for details.
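The interval alignment described for processingInterval can be illustrated with a small sketch. It assumes that the "default time boundaries" are multiples of the interval counted from the Unix epoch in UTC and that endTime is exclusive; both are assumptions made for illustration, and `bucket_starts` is a hypothetical helper, not an API call.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_starts(start: datetime, end: datetime, step: timedelta) -> list:
    """Start timestamps of the intervals covering [start, end).

    Assumption: boundaries are multiples of `step` since the Unix epoch (UTC),
    so the first bucket may begin before `start`.
    """
    current = start - ((start - EPOCH) % step)  # round down to a boundary
    starts = []
    while current < end:
        starts.append(current)
        current += step
    return starts


# The 12:00 to 12:30 example with a 10-minute interval gives three buckets.
noon = datetime(2024, 5, 4, 12, 0, tzinfo=timezone.utc)
starts = bucket_starts(noon, noon + timedelta(minutes=30), timedelta(minutes=10))
```

Under the same assumptions, a startTime of 12:05 would still produce a first bucket stamped 12:00.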

Aggregate functions

The API supports multiple simultaneous aggregate functions in one API request. The response body will then contain each aggregated value with the start of the processing interval as timestamp.

  • Average (AVG): The average aggregate adds up the values for each processing interval, and divides the sum by the number of values.
  • Median: The median aggregate uses T-Digest to obtain the estimated median of the given data set.
  • Minimum (MIN): The minimum aggregate retrieves the minimum value within the processing interval, and returns that value with the timestamp at the start of the processing interval.
  • Maximum (MAX): The maximum aggregate retrieves the maximum value within the processing interval, and returns that value with the timestamp at the start of the processing interval.
  • Standard Deviation (STDDEV): Defined by the formula below. X is each raw value in the processing interval, Avg(X) is the average of the raw values, and n is the number of raw values in the processing interval.

[Image: standard deviation formula]

  • Count: The count aggregate retrieves a count of all the raw values within a processing interval.

Example request using two aggregate functions:

{{api}}/:timeseriesId/data/aggregates?aggregateFunction=avg&aggregateFunction=count&startTime=2024-05-04T04:00:00Z&limit=10000&useAggregateCache=true&processingInterval=30m&endTime=2024-05-04T10:00:00Z
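A request URL like the one above, with aggregateFunction repeated once per function, can be assembled with Python's standard library. This is a minimal sketch: `build_aggregates_url` is a hypothetical helper, and the base URL is a placeholder standing in for {{api}}, not a real endpoint.

```python
from urllib.parse import quote, urlencode

# Placeholder for {{api}}; substitute the real base URL of the API.
BASE_URL = "https://example.invalid/api"


def build_aggregates_url(base_url: str, timeseries_id: str, params: dict) -> str:
    """Build the aggregates URL; list values become repeated query parameters."""
    # doseq=True expands lists; safe=":" keeps timestamps readable.
    query = urlencode(params, doseq=True, safe=":")
    return f"{base_url}/{quote(timeseries_id, safe='')}/data/aggregates?{query}"


params = {
    "aggregateFunction": ["avg", "count"],  # several functions in one request
    "startTime": "2024-05-04T04:00:00Z",
    "endTime": "2024-05-04T10:00:00Z",
    "processingInterval": "30m",
    "limit": 10000,
    "useAggregateCache": "true",
}
url = build_aggregates_url(BASE_URL, "my-timeseries-id", params)
```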

Fill
By default, a processing interval with no data will report null as its value in the response. The fill parameter can be used to change the returned value for processing intervals with no data.

  • linear: Returns the results of linear interpolation for processing intervals with no data.
  • none: Returns no timestamp and no value for processing intervals with no data.
  • null: Returns null for processing intervals with no data, but returns a timestamp (default behavior).
  • forward: Returns the value from the previous processing interval for intervals with no data.
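The four fill modes can be compared by applying them locally to a list of (timestamp, value) buckets. The sketch below is a client-side illustration, not the API's implementation: `apply_fill` is a hypothetical function, buckets are assumed to be evenly spaced, and the handling of gaps at the start or end of the range (left as null here) is an assumption.

```python
from typing import List, Optional, Tuple

Bucket = Tuple[str, Optional[float]]


def apply_fill(buckets: List[Bucket], mode: str = "null") -> List[Bucket]:
    """Mimic the fill modes on evenly spaced buckets (illustrative only)."""
    if mode == "null":
        return list(buckets)  # empty intervals keep their timestamp, value None
    if mode == "none":
        return [(ts, v) for ts, v in buckets if v is not None]
    if mode == "forward":
        filled, last = [], None
        for ts, v in buckets:
            if v is not None:
                last = v
            filled.append((ts, last))  # leading gaps stay None (assumption)
        return filled
    if mode == "linear":
        known = [i for i, (_, v) in enumerate(buckets) if v is not None]
        filled = list(buckets)
        for left, right in zip(known, known[1:]):
            v0, v1 = buckets[left][1], buckets[right][1]
            for i in range(left + 1, right):
                frac = (i - left) / (right - left)
                filled[i] = (buckets[i][0], v0 + (v1 - v0) * frac)
        return filled  # gaps before the first or after the last value stay None
    raise ValueError(f"unknown fill mode: {mode!r}")


buckets = [("12:00", 1.0), ("12:10", None), ("12:20", 3.0)]
```

With these sample buckets, the empty 12:10 interval is dropped by none, repeated as 1.0 by forward, and interpolated to 2.0 by linear.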

Status
When requesting aggregated data you can apply a status filter. Filtering on e.g. status = 192 means that only values with that status are included in the aggregation. Filtering on status affects performance, so expect longer response times from the API when using this filter.

Cached Aggregates

Aggregate requests are demanding on the API and can be perceived as slow depending on the request. Cached Aggregates has been implemented to achieve faster response times and better resource conservation when calling the Get aggregated data and Get multiple data endpoints.

The cache will activate when:

  • processingInterval is set to one of the following: 5m, 15m, 1h, 2h, 4h or 8h (e.g. processingInterval=1h)
  • status filter is set to 192 (status=192)
  • fill is set to none (fill=none)
  • startTime and endTime are rounded to the queried interval (5m: 05:00:00, 05:05:00 ... ; 2h: 00:00:00, 02:00:00 ... ; 8h: 00:00:00, 08:00:00 ...)
  • the aggregate function is not median (median is not supported by the cache)
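The criteria above can be checked on the client before sending a request. The following is a sketch under stated assumptions: `meets_cache_criteria` is a hypothetical helper, timestamps are timezone-aware UTC datetimes, "rounded to the interval" is interpreted as aligned to multiples of the interval since the Unix epoch, and only the literal interval strings listed above are matched (whether an equivalent such as 60m counts as 1h is not stated on this page).

```python
from datetime import datetime, timezone

# Cache-supported processing intervals, in seconds.
CACHE_INTERVALS = {"5m": 300, "15m": 900, "1h": 3600,
                   "2h": 7200, "4h": 14400, "8h": 28800}


def is_aligned(ts: datetime, seconds: int) -> bool:
    """True if a timezone-aware timestamp falls exactly on an interval boundary."""
    return ts.microsecond == 0 and int(ts.timestamp()) % seconds == 0


def meets_cache_criteria(start, end, interval, status, fill, functions) -> bool:
    """Client-side check of the listed cache criteria (illustrative only)."""
    seconds = CACHE_INTERVALS.get(interval)
    if seconds is None:
        return False
    return (
        status == 192
        and fill == "none"
        and "median" not in functions
        and is_aligned(start, seconds)
        and is_aligned(end, seconds)
    )


start = datetime(2024, 5, 4, 4, tzinfo=timezone.utc)
end = datetime(2024, 5, 4, 10, tzinfo=timezone.utc)
eligible = meets_cache_criteria(start, end, "1h", 192, "none", ["avg"])
```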

To increase the chances of hitting the cache, the 'useAggregateCache' parameter can be set to true, which modifies the original request by:

  • adjusting processingInterval: round down the interval to a time unit which is supported by the cache
  • adjusting startTime: round down to match the interval
  • adjusting endTime: round up to match the interval
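The three adjustments listed above can be sketched as follows. This mirrors the description only and is not the API's actual code: `adjust_for_cache` is a hypothetical function, boundaries are assumed to be epoch-aligned in UTC, and the behaviour for intervals shorter than 5m is not described on this page, so the sketch raises an error in that case.

```python
from datetime import datetime, timezone
import math

CACHE_SECONDS = [300, 900, 3600, 7200, 14400, 28800]  # 5m, 15m, 1h, 2h, 4h, 8h


def adjust_for_cache(start: datetime, end: datetime, interval_seconds: int):
    """Return (start, end, interval_seconds) adjusted as described above."""
    supported = [s for s in CACHE_SECONDS if s <= interval_seconds]
    if not supported:
        # Assumption: nothing to round down to below the smallest cached interval.
        raise ValueError("interval is shorter than the smallest cached interval")
    step = max(supported)  # round the interval down to a supported one
    new_start = math.floor(start.timestamp() / step) * step  # round down
    new_end = math.ceil(end.timestamp() / step) * step       # round up
    return (
        datetime.fromtimestamp(new_start, tz=timezone.utc),
        datetime.fromtimestamp(new_end, tz=timezone.utc),
        step,
    )


# A 20-minute interval is rounded down to 15m; the window widens to fit it.
adjusted = adjust_for_cache(
    datetime(2024, 5, 4, 4, 7, 30, tzinfo=timezone.utc),
    datetime(2024, 5, 4, 9, 52, tzinfo=timezone.utc),
    20 * 60,
)
```

In this example the adjusted window is 04:00:00 to 10:00:00 UTC with a 15-minute interval.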

Cache mechanics:

A request which meets the cache criteria but is not yet cached triggers a job that prepares the cache, while the request itself returns the result from the federation source. Once the caching job is complete, all requests matching the initial one return cached values regardless of the aggregateFunction used, as caching is done for all functions.

A request can also trigger live cache aggregation for a tag when the requested time window is within the past 7 days. The API will then continuously update and maintain the cache for the tag with new incoming values.

To disable the cache for a request you must set the 'useAggregateCache' parameter to 'false'.

It is also possible to verify whether a response was retrieved from the cache by adding 'debug' as a query parameter and setting its value to true. A request with debug enabled adds a 'cached:true' field to each of the items which were fetched from the cache.
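Once a debug response has been parsed, the share of items served from the cache can be computed as in the sketch below. The response layout is not shown on this page, so the input is assumed to be a list of item dictionaries where cached items carry a "cached" key set to true; `cache_hit_ratio` is a hypothetical helper.

```python
from typing import Any, Dict, List


def cache_hit_ratio(items: List[Dict[str, Any]]) -> float:
    """Fraction of items flagged as cached in a debug-enabled response.

    Assumption: each item is a dict and cached items carry "cached": True.
    """
    if not items:
        return 0.0
    hits = sum(1 for item in items if item.get("cached") is True)
    return hits / len(items)


# Hypothetical items; the field names other than "cached" are placeholders.
sample_items = [
    {"time": "2024-05-04T04:00:00Z", "value": 1.5, "cached": True},
    {"time": "2024-05-04T05:00:00Z", "value": 1.7},
]
ratio = cache_hit_ratio(sample_items)
```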

Usage example:

Running this request will return the uncached result and the cache service will initiate populating the cache.

{{api}}/:timeseriesId/data/aggregates?aggregateFunction=avg&startTime=2024-05-04T04:00:00Z&limit=10000&useAggregateCache=true&processingInterval=1h&endTime=2024-05-04T10:00:00Z&debug=true&status=192&fill=none

Once the caching service has finished populating the cache, all subsequent requests with the same parameters, or with a smaller time window than the original request, will retrieve values from the cache.