
Summaries updates - to 'strongly recommended' and with a best practice #985

Merged
merged 9 commits into from Feb 23, 2021
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -24,6 +24,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- URIs (usually found in properties like `href`, `url`) are now validated using the `iri-reference` format in JSON Schema (allows international characters in URIs)
- Relaxed the regular expression for DOIs in the scientific extension ([#910](https://github.com/radiantearth/stac-spec/issues/910))
- The [Stats Object](collection-spec/collection-spec.md#stats-object) for Collection `summaries` changed `min` to `minimum` and `max` to `maximum` to align with JSON Schema.
- Made `summaries` *strongly recommended*, since they are very useful and everyone should strive to implement them.
- `proj:geometry` allows all GeoJSON geometries instead of just a polygon.
- `label:description` and `processing:lineage` allow CommonMark for rich-text representation ([#950](https://github.com/radiantearth/stac-spec/issues/950))

46 changes: 46 additions & 0 deletions best-practices.md
@@ -12,6 +12,7 @@
* [Working with Media Types](#working-with-media-types)
* [Static and Dynamic Catalogs](#static-and-dynamic-catalogs)
* [Catalog Layout](#catalog-layout)
* [Using Summaries in Collections](#using-summaries-in-collections)
* [Use of Links](#use-of-links)
* [Using Relation Types](#using-relation-types)
* [Versioning for Catalogs](#versioning-for-catalogs)
@@ -297,6 +298,51 @@ is strongly recommended that Catalogs don't contain differently versioned Items
consistent (Sub-)Catalogs containing either all or no data. Collections that are referenced from Items should always use the same
STAC version. Otherwise some behaviour or functionality may be unpredictable (e.g. merging common fields into Items or reading summaries).

## Using Summaries in Collections

One of the strongest recommendations for STAC is to always provide [summaries](collection-spec/collection-spec.md#summaries) in
your collections. The core team decided not to require them, in case there are future situations where providing a summary
is too difficult. The idea is not to exhaustively summarize every single field in the collection, but to provide
a 'curated' view.

Some general thinking on what to summarize is as follows:

* Any field that is a range of data (like numbers or dates) is a great candidate to summarize, to give people a sense of what values
the data might take. For example, in overhead imagery a
[`view:off_nadir`](extensions/view/README.md#item-properties-and-item-asset-fields) range of 0 to 3 degrees would tell people this
imagery is all pretty much straight down, while a range of 15 to 40 would tell them that it's oblique imagery, and 0 to 60 that it's
a collection with lots of different look angles.

* Fields that have only one or a handful of values are also great to summarize. Collections with a single satellite may
list a single [`gsd`](item-spec/common-metadata.md#instrument) value in the summary, and it's quite useful for users to know
that all data is going to be the same resolution. Similarly, it's useful to know the names of all the
[`platform` values](item-spec/common-metadata.md#instrument) that are used in the collection.

* It is less useful to summarize fields that have numerous different discrete values that can't easily be represented
in a range. These will mostly be string fields with more than a handful of possible values. For example, consider a
'location' field that gives 3 levels of administrative region (like 'San Francisco, California, United States') to help people
understand more intuitively where a shot was taken. If your collection has hundreds of items, let alone millions, you don't want
to include all those values as a summary.

* Fields that consist of arrays are more of a judgement call. For example, [`instruments`](item-spec/common-metadata.md#instrument)
is straightforward and recommended, as the elements of the array are a discrete set of options. On the other hand,
[`proj:transform`](extensions/projection/README.md#projtransform) makes no sense to summarize: each Item describes its own
transform, so the union of all the values in the array would just be a bunch of unrelated numbers.
So if the values contained in the array are independently meaningful (not interconnected) and there aren't hundreds of potential
values, then it is likely a good candidate to summarize.
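The heuristics in the list above can be sketched as a small Python function. This is a minimal illustration, not part of the STAC specification or any STAC library: `build_summaries`, its `skip` set and the `max_distinct` threshold of 20 are hypothetical. Numeric fields with more than one value become a Stats Object, fields with one or a handful of values (including merged arrays such as `instruments`) become a set, and high-cardinality fields are left out. Whether an array like `proj:transform` is meaningful can't be detected automatically, so such fields are excluded by hand via `skip`, as is `datetime`, which is better expressed in the Collection's `extent`.

```python
from typing import Any


def build_summaries(items: list[dict[str, Any]],
                    skip: frozenset[str] = frozenset(),
                    max_distinct: int = 20) -> dict[str, Any]:
    """Hypothetical helper: summarize Item `properties` using the heuristics above."""
    # Collect every value seen for each property across all Items.
    values: dict[str, list[Any]] = {}
    for item in items:
        for key, value in item.get("properties", {}).items():
            if key not in skip:
                values.setdefault(key, []).append(value)

    summaries: dict[str, Any] = {}
    for key, vals in values.items():
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in vals)
        if numeric and len(set(vals)) > 1:
            # A range of numbers becomes a Stats Object.
            summaries[key] = {"minimum": min(vals), "maximum": max(vals)}
            continue
        try:
            # Arrays (e.g. `instruments`) are merged element-wise; scalars are collected as-is.
            distinct = {e for v in vals for e in (v if isinstance(v, list) else [v])}
        except TypeError:
            continue  # unhashable elements (e.g. objects) are left to a hand-written summary
        if len(distinct) <= max_distinct:
            # One or a handful of values become a set; high-cardinality fields are dropped.
            summaries[key] = sorted(distinct, key=str)
    return summaries


items = [
    {"properties": {"datetime": "2021-01-01T00:00:00Z", "view:off_nadir": 2.5, "gsd": 3,
                    "platform": "sat-a", "instruments": ["cam"],
                    "proj:transform": [3, 0, 100, 0, -3, 200]}},
    {"properties": {"datetime": "2021-01-02T00:00:00Z", "view:off_nadir": 0.5, "gsd": 3,
                    "platform": "sat-b", "instruments": ["cam", "lidar"],
                    "proj:transform": [3, 0, 400, 0, -3, 500]}},
]
summaries = build_summaries(items, skip=frozenset({"datetime", "proj:transform"}))
print(summaries)
```

The cut-off between "a handful" and "too many" values is a judgement call per field; a fixed threshold is only a starting point.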

We do highly recommend including an [`eo:bands`](extensions/eo/README.md#eobands) summary if your Items implement `eo:bands`,
especially if it represents just one satellite or constellation. This should be a union of all the potential bands that you
have in assets. It is fine to add the summary only at the Collection level, even if your Items don't include `eo:bands` in
their `properties`, since that is optional. This gives users of the Collection a sense of the sensor capabilities without
having to examine specific Items or aggregate across every Item.
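A minimal sketch of building that union in Python, assuming each band carries a `name` that identifies it across Items and that bands may appear in Item `properties` or on individual assets. The function name `union_eo_bands` is hypothetical. The first occurrence of each band name is kept, so band details that differ between Items would still need to be reconciled by hand.

```python
from typing import Any


def union_eo_bands(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: union of all bands found in Item properties and assets."""
    seen: dict[str, dict[str, Any]] = {}
    for item in items:
        # Bands may sit in the Item `properties` (optional) or on individual assets.
        sources = [item.get("properties", {}), *item.get("assets", {}).values()]
        for source in sources:
            for band in source.get("eo:bands", []):
                name = band.get("name")
                # Keep the first occurrence of each band name; unnamed bands are skipped.
                if name is not None and name not in seen:
                    seen[name] = band
    return list(seen.values())


items = [
    {"assets": {"B04": {"eo:bands": [{"name": "B04", "common_name": "red"}]},
                "B08": {"eo:bands": [{"name": "B08", "common_name": "nir"}]}}},
    {"assets": {"B04": {"eo:bands": [{"name": "B04", "common_name": "red"}]},
                "B03": {"eo:bands": [{"name": "B03", "common_name": "green"}]}}},
]
collection_summaries = {"eo:bands": union_eo_bands(items)}
print([band["name"] for band in collection_summaries["eo:bands"]])
```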

Note that the ranges of summaries don't have to be exact. If you are publishing a catalog that is constantly updated with
data from a high-agility satellite, you can set the `view:off_nadir` range to the expected values, based on the satellite
design, instead of having it represent only the off-nadir angles of Items already captured in the catalog.
We don't want growing catalogs to have to constantly check and recalculate their summaries whenever new data comes in; summaries are
just meant to give users a sense of what types of values they could expect.
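Because a design-based range is declared up front, a publisher only needs to catch the rare Item that falls outside it, rather than recalculating summaries on every update. The following hypothetical Python sketch assumes a 0 to 45 degree off-nadir design range and two platform names purely for illustration; `DESIGN_SUMMARIES`, `fits_summary` and `outside_summaries` are not part of any STAC tooling.

```python
from typing import Any

# Hypothetical design-based summaries: the off-nadir range comes from the satellite's
# specification (45 degrees is an assumed value), not from the Items captured so far.
DESIGN_SUMMARIES: dict[str, Any] = {
    "view:off_nadir": {"minimum": 0, "maximum": 45},
    "platform": ["sat-a", "sat-b"],
}


def fits_summary(summary: Any, value: Any) -> bool:
    if isinstance(summary, dict):
        # Stats Object: the value should lie within [minimum, maximum].
        return summary["minimum"] <= value <= summary["maximum"]
    if isinstance(value, list):
        # Array-valued field: every element should be in the set of values.
        return all(element in summary for element in value)
    return value in summary


def outside_summaries(properties: dict[str, Any], summaries: dict[str, Any]) -> list[str]:
    # Only report fields that fall outside the declared summaries; nothing is recalculated.
    return [key for key, value in properties.items()
            if key in summaries and not fits_summary(summaries[key], value)]


print(outside_summaries({"view:off_nadir": 12.3, "platform": "sat-a"}, DESIGN_SUMMARIES))
print(outside_summaries({"view:off_nadir": 50.0, "platform": "sat-c"}, DESIGN_SUMMARIES))
```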

## Use of links

The STAC specifications allow both relative and absolute links, and say that `self` links are not required, but are
16 changes: 11 additions & 5 deletions collection-spec/collection-spec.md
@@ -26,7 +26,7 @@ STAC Collections are meant to be compatible with *OGC API - Features* Collection
| license | string | **REQUIRED.** Collection's license(s), either a SPDX [License identifier](https://spdx.org/licenses/), `various` if multiple licenses apply or `proprietary` for all other cases. |
| providers | \[[Provider Object](#provider-object)] | A list of providers, which may include all organizations capturing or processing the data or the hosting provider. Providers should be listed in chronological order with the most recent provider being the last element of the list. |
| extent | [Extent Object](#extent-object) | **REQUIRED.** Spatial and temporal extents. |
| summaries | Map<string, \[*]\|[Stats Object](#stats-object)> | A map of property summaries, either a set of values or statistics such as a range. |
| summaries | Map<string, \[*]\|[Stats Object](#stats-object)> | STRONGLY RECOMMENDED. A map of property summaries, either a set of values or statistics such as a range. |
| links | \[[Link Object](#link-object)] | **REQUIRED.** A list of references to other documents. |

### Additional Field Information
@@ -42,16 +42,22 @@ Collection's license(s) as a SPDX [License identifier](https://spdx.org/licenses

#### summaries

Provides an overview of the potential values that are available as part of the `properties` in the set of STAC Items that are underneath this catalog (including
those in any sub-catalog). Summaries are used to inform users about values they can expect from items without having to crawl through them. It also helps to
fully define collections, especially if they don't link to any Items.
Collections are *strongly recommended* to provide summaries of the values of fields that users can expect in the `properties`
of STAC Items contained in this collection. This enables users to get a good sense of what the ranges and potential values of
different fields in the collection are, without having to inspect a number of items (or crawl them exhaustively to get a definitive answer).
Summaries help to fully define collections, especially if they don't link to any Items. They also give clients enough information to
build tailored user interfaces for querying the data, by presenting the potential values that are available. Summaries can be used in
collections or catalogs, and they should summarize all values in every item underneath it, including in nested sub-catalogs.
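As an illustration of how a client might turn summaries into a query interface, here is a hypothetical Python sketch: ranges map to range inputs and sets of several values map to choice lists. The widget names and the `query_controls` function are assumptions made for this example, not part of the specification.

```python
from typing import Any


def query_controls(summaries: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Hypothetical helper: suggest a UI control per summarized field."""
    controls: dict[str, dict[str, Any]] = {}
    for field, summary in summaries.items():
        if isinstance(summary, dict) and "minimum" in summary and "maximum" in summary:
            # A range suggests a slider or a pair of minimum/maximum inputs.
            controls[field] = {"widget": "range", "min": summary["minimum"], "max": summary["maximum"]}
        elif isinstance(summary, list) and len(summary) > 1:
            # A set of several values suggests a drop-down or check boxes.
            controls[field] = {"widget": "choice", "options": summary}
        # A single shared value (or an unrecognized shape) needs no control.
    return controls


summaries = {
    "eo:cloud_cover": {"minimum": 0, "maximum": 100},
    "platform": ["sat-a", "sat-b"],
    "gsd": [3],
}
print(query_controls(summaries))
```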

A summary for a field can be specified in two ways:

1. A set of all distinct values in an array: The set of values must contain at least one element and it is strongly recommended to list all values. If the field summarizes an array (e.g. `instruments`), the field's array elements of each Item must be merged to a single array with unique elements.
1. A set of all distinct values in an array: The set of values must contain at least one element and it is strongly recommended to list all values. If the field summarizes an array (e.g. [`instruments`](../item-spec/common-metadata.md#instrument)), the field's array elements of each Item must be merged to a single array with unique elements.
2. Statistics in a [Stats Object](#stats-object): Statistics by default only specify the range (minimum and maximum values), but can optionally be accompanied by additional statistical values. The range specified by the minimum and maximum can specify the potential range of values, but it is recommended to be as precise as possible.
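The two forms can be checked with a small validator. This is a sketch of the rules stated above (a non-empty set of distinct values, or a Stats Object with `minimum` and `maximum`), and it also flags the older `min`/`max` keys that this change renames. `summary_problems` is a hypothetical helper and is not a replacement for validating against the STAC JSON Schemas.

```python
from typing import Any


def summary_problems(summaries: dict[str, Any]) -> list[str]:
    """Hypothetical helper: report summaries that match neither allowed form."""
    problems: list[str] = []
    for field, summary in summaries.items():
        if isinstance(summary, list):
            # Form 1: a set of distinct values with at least one element.
            if not summary:
                problems.append(f"{field}: set of values must not be empty")
            elif any(summary.count(v) > 1 for v in summary):
                problems.append(f"{field}: values should be distinct")
        elif isinstance(summary, dict):
            # Form 2: a Stats Object, which at least specifies the range.
            if "min" in summary or "max" in summary:
                problems.append(f"{field}: use 'minimum'/'maximum' instead of 'min'/'max'")
            elif "minimum" not in summary or "maximum" not in summary:
                problems.append(f"{field}: Stats Object needs 'minimum' and 'maximum'")
            elif summary["minimum"] > summary["maximum"]:
                problems.append(f"{field}: 'minimum' is greater than 'maximum'")
        else:
            problems.append(f"{field}: must be an array of values or a Stats Object")
    return problems


example = {
    "gsd": [3],
    "platform": [],
    "view:off_nadir": {"min": 0, "max": 3},
    "eo:cloud_cover": {"minimum": 0, "maximum": 100},
}
print(summary_problems(example))
```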

It is recommended to list as many properties as reasonable so that consumers get a full overview of the properties included in the Items. Nevertheless, it is not very useful to list all potential `title` values of the Items. Also, a range for the `datetime` property may be better suited to be included in the STAC Collection's `extent` field. In general, properties that are covered by the Collection specification should not be repeated in the summaries.
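For `datetime`, the Collection's temporal extent can be derived from the Items instead of summarizing the property. A minimal Python sketch, assuming every relevant Item has a timezone-aware RFC 3339 `datetime`; Items with a null `datetime` are skipped, so ranges given only via `start_datetime`/`end_datetime` would need extra handling. `temporal_extent` is a hypothetical helper that returns only the `temporal` part of an Extent Object.

```python
from datetime import datetime
from typing import Any


def parse_rfc3339(value: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def temporal_extent(items: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical helper: derive the Collection's temporal extent from Item datetimes."""
    stamps = [item["properties"]["datetime"] for item in items
              if item.get("properties", {}).get("datetime") is not None]
    # Compare parsed instants, but keep the original strings for the output.
    start = min(stamps, key=parse_rfc3339)
    end = max(stamps, key=parse_rfc3339)
    return {"temporal": {"interval": [[start, end]]}}


items = [
    {"properties": {"datetime": "2021-02-01T10:00:00Z"}},
    {"properties": {"datetime": "2020-12-31T23:59:59Z"}},
    {"properties": {"datetime": "2021-01-15T00:00:00+01:00"}},
]
print(temporal_extent(items))
```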

See the examples folder for collections with summaries to get a sense of how to use them.

### Extent Object

The object describes the spatio-temporal extents of the Collection. Both spatial and temporal extents are required to be specified.