
Derived fields and aggregation support #164

Open
jerstlouis opened this issue Apr 13, 2022 · 4 comments
Labels: 2022-05 Sprint · Cross-SWG Discussion · EDR-related (for coordination with EDR) · Extension (will be addressed by a future extension)


jerstlouis commented Apr 13, 2022

Suggesting that we plan for a separate part enabling basic analytics capabilities, including conformance classes for:

  • derived fields supporting arithmetic (e.g. NDVI computation), properties=
  • filtering (Retrieve values within a certain range #103) (e.g., only retaining cells with elevation values above a certain threshold), filter=
  • sorting (e.g., allowing multiple scenes to be flattened into a 2D image with the least cloudy cells retained), sortby=
  • standardized pre-defined aggregation functions, e.g. Max(), Min(), Avg(), StdDev(), Sum()... used within properties=, filter=, sortby= expressions. The dimensions over which data is aggregated could also leverage subset, bbox, datetime, but a mechanism would still be needed to distinguish whether a series should be returned for a particular dimension or aggregation should be performed over it.
  • operating over multiple collections (allowing the above capabilities to combine fields from those multiple collections), collections=

This would be informed by the work from DAPA and Testbed-17 GeoDataCube API, and ideally be consistent with the OGC API - Features Search extension as well as with OGC API - DGGS and OGC API - EDR.
We plan to explore this in the upcoming May 2022 Code Sprint.

Example proposed syntax:
properties=NDVI:Max((B5-B4)/(B5+B4))&subset=datetime("2020-07-01":"2020-07-31")
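As a quick client-side illustration, the proposed query can be assembled and URL-encoded with Python's standard library; the host and collection path here are hypothetical placeholders, not part of the proposal.

```python
# Sketch: encoding the proposed properties= / subset= query parameters.
# Host and path are hypothetical; parameter values come from the example above.
from urllib.parse import urlencode

params = {
    "properties": "NDVI:Max((B5-B4)/(B5+B4))",
    "subset": 'datetime("2020-07-01":"2020-07-31")',
}
url = "https://example.com/collections/sentinel2/coverage?" + urlencode(params)
print(url)
```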


jerstlouis commented Apr 28, 2022

We should consider use cases where we want to aggregate / return results differently for different dimensions, for example:

A) Return a 0D value including derived "minimum NDVI" and "maximum NDVI" values, first aggregated over the time dimension at each point in space, then averaged over the spatial dimensions.
B) Support aggregating to a time series at a coarser resolution, rather than to a single value over a dimension, e.g. computing the monthly minimum, maximum or average for each month of a year.

What could that look like syntactically? Possibly an additional parameter to the aggregation function to select the dimensions over which to aggregate? e.g. time, space, spacetime, [latitude, longitude, datetime].

A) Aggregate minimum of spatially local values over time, then aggregate average over space (a single cell is returned with a minimum and a maximum value)

properties=
   minNDVI:Avg(
      Min((B5-B4)/(B5+B4), time),
   space),
   maxNDVI:Avg(
      Max((B5-B4)/(B5+B4), time),
   space)
&subset=datetime("2020-01-01":"2021-12-31"),Lat(45.0:45.1),Lon(-75.1:-75.0)
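The semantics of example A can be sketched in plain Python over a tiny in-memory grid (the NDVI values below are illustrative, not real data): take the per-cell minimum across time, then average the resulting cells over space.

```python
# Sketch of example A's aggregation order: Min over time per cell, then Avg
# over space. ndvi[t][y][x] holds NDVI for 3 time steps on a 2x2 grid.
ndvi = [
    [[0.2, 0.4], [0.6, 0.8]],  # t = 0
    [[0.1, 0.5], [0.7, 0.3]],  # t = 1
    [[0.3, 0.2], [0.5, 0.9]],  # t = 2
]

# Min((B5-B4)/(B5+B4), time): per-cell minimum across the time dimension
min_over_time = [
    [min(ndvi[t][y][x] for t in range(3)) for x in range(2)]
    for y in range(2)
]

# Avg(..., space): a single 0D value averaged over the remaining spatial cells
cells = [v for row in min_over_time for v in row]
min_ndvi = sum(cells) / len(cells)
print(round(min_ndvi, 3))  # 0.275
```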

With an additional option to specify aggregating to a coarser resolution, as opposed to a single value? e.g., time:month, Lat:0.005

B) Aggregate minimum of spatially local values over time for each given month, then aggregate sum over space. The result would be a 1D time series with 12 cells (data records / features), each containing the sum of the monthly minimums and the sum of the monthly maximums over all subsetted space.

properties=
   minNDVI:Sum(
      Min((B5-B4)/(B5+B4), time:month),
   space),
   maxNDVI:Sum(
      Max((B5-B4)/(B5+B4), time:month),
   space)
&subset=datetime("2020-01-01":"2020-12-31"),Lat(45.0:45.1),Lon(-75.1:-75.0)

A special month resolution is proposed in the example here to accommodate the common use of uneven temporal units. A number corresponding to units (e.g., in seconds or meters or degrees) could also be used to qualify the dimension over which aggregation is performed.
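The time:month qualifier amounts to grouping timestamped values by calendar month before applying the aggregating function, which can be sketched as follows (the sample dates and values are illustrative; a real service would draw them from the subsetted coverage):

```python
# Sketch of Min(..., time:month): group values by calendar month, then take
# one minimum per month, yielding a 1D monthly series instead of a 0D value.
from collections import defaultdict
from datetime import date

samples = [  # (date, NDVI value at one spatial cell)
    (date(2020, 1, 5), 0.4), (date(2020, 1, 20), 0.2),
    (date(2020, 2, 3), 0.6), (date(2020, 2, 18), 0.5),
]

by_month = defaultdict(list)
for d, v in samples:
    by_month[(d.year, d.month)].append(v)

monthly_min = {m: min(vs) for m, vs in sorted(by_month.items())}
print(monthly_min)  # {(2020, 1): 0.2, (2020, 2): 0.5}
```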

DAPA had some similar ideas for its aggregate query parameter, but more so for the different aggregation processes (area:aggregate-space, area:aggregate-space-time, area:aggregate-time, grid:aggregate-time, position:aggregate-time).

To compare aggregating a gridded coverage with the Features search extension, cells are akin to the features in that their set of properties have given values. Aggregation is essentially creating a new collection of cells (equivalent to a new feature collection) with different dimensionality and/or resolution across some dimension(s).

Note that if aggregation is simply functions used in derived fields properties, then the resulting dimensionality may differ if returned properties use different kinds of aggregation -- that could mean fields that are not aggregated over some dimensions or resolution would get duplicated.

Another use case for sortby might be to more explicitly specify the subset slice sparse-data behavior discussed in #105, e.g., by including the time dimension as a sortable. That could be combined with other sortable keys, including derived fields using aggregation, e.g., Avg() over space (but not time) to sort scenes as a whole without mixing them up.

sortby=
      -Avg((B5-B4)/(B5+B4), space),
      +time
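The intended ordering can be sketched with Python's sorted(): descending on the spatially averaged NDVI (the leading "-"), ascending on time as a tiebreaker. The scene records below are hypothetical stand-ins, with the spatial average precomputed per scene.

```python
# Sketch of sortby=-Avg(...,space),+time over whole scenes.
scenes = [
    {"time": "2020-07-01", "avg_ndvi": 0.35},
    {"time": "2020-07-11", "avg_ndvi": 0.62},
    {"time": "2020-07-21", "avg_ndvi": 0.62},
]

# "-" sort key -> negate the aggregate for descending order;
# "+time" -> ISO 8601 strings compare chronologically as plain text.
ordered = sorted(scenes, key=lambda s: (-s["avg_ndvi"], s["time"]))
print([s["time"] for s in ordered])  # ['2020-07-11', '2020-07-21', '2020-07-01']
```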


ghobona commented May 10, 2022

It would be great to have a list or tree like the one at https://github.com/cportele/ogcapi-building-blocks

This would help to visualise what the building blocks are.


jerstlouis commented May 10, 2022

@ghobona I tried to organize them in a bullet list at the top of this issue.

Most of these building blocks are query parameters:

Analytics Query parameters:

  • properties
    • For the simplest conformance class this is simply for "property selection" (proposed future part of Features) or "range subsetting" (current conformance class of Coverages)
    • For more advanced analytics, it can support complex expressions for Derived Fields, as suggested in DAPA (instead of only identifiers -- those expressions can be very similar to CQL2 expressions, except they can return any type of value, not only a boolean)
      • Then pre-defined aggregation functions can be defined:
        • Same aggregation over all dimensions, or
        • With an extra parameter to specify over which dimensions a particular aggregating function should aggregate
  • filter
    • Support a predicate, e.g. defined with CQL2 (can refer to any queryables: e.g., feature properties, coverage cell range values, scene metadata properties) -- as in Features - Part 3: Filtering
  • sortby
    • In Coverages, together with returning a lower dimensionality than the result set, it can also control which pixels to keep (e.g., least cloudy scene or cells on top)
  • collections
    • In the context of DGGS / Coverages, it would allow using fields (feature properties / coverage data record range values) from multiple collections, including mixed vector/raster collections (much like FROM <tables> in SQL). The fields can then be prefixed with the {collectionId}. to disambiguate them.

Aggregating functions

  • Min()
  • Max()
  • Sum()
  • Avg()
  • StdDev()
  • ... ?
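These pre-defined aggregating functions map naturally onto standard reducers; the dispatch table below is a hypothetical server-side sketch, not part of the proposal (note that StdDev() would also need to pin down population vs. sample standard deviation).

```python
# Sketch: hypothetical dispatch table from proposed aggregating-function
# names to Python's standard reducers.
from statistics import mean, pstdev

AGGREGATES = {
    "Min": min,
    "Max": max,
    "Sum": sum,
    "Avg": mean,
    "StdDev": pstdev,  # population std dev; sample std dev is another option
}

values = [1.0, 2.0, 3.0, 4.0]
print(AGGREGATES["Avg"](values))  # 2.5
```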

Spatiotemporal Subsetting Query parameters:

  • subset
  • datetime
  • bbox

@ghobona
Copy link
Contributor

ghobona commented May 16, 2022

Thanks @jerstlouis !

Cc: @doublebyte1
