Servers with many collections #76

cportele · 2018-03-08T16:41:37Z

Discussed during the WFS 3.0 Hackathon:

The Core is currently designed for servers with a small number of collections. Issues that arise with a large number of collections:

The feature collections metadata response (/collections) may become large. This could be addressed by supporting paging and filtering in this resource. This could be done in an extension.
The API definition can become very large, too, and computationally intensive to compile, if each collection is listed as a separate operation. This could be addressed by using a parameter for the collection name, i.e. having only generic, parameterised /collections/{name} and /collections/{name}/items/{id} operations. Information about the available collections would then have to be obtained from the /collections operation. Another option could be to support filtering of collections in the api operation.

The paths used above are based on #64.
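To make the paging and filtering idea concrete, here is a minimal sketch of what a paged /collections listing could look like. The parameter names `limit`, `offset` and `keyword`, the response layout and the sample catalog are assumptions for illustration only; nothing here is defined by the Core.

```python
from urllib.parse import urlencode


def list_collections(collections, limit=10, offset=0, keyword=None):
    """Return one page of collection metadata plus a 'next' link if more remain.

    All parameter names are hypothetical; an extension could choose others.
    """
    # Optional case-insensitive filter on the collection title.
    if keyword is not None:
        needle = keyword.lower()
        collections = [c for c in collections if needle in c["title"].lower()]

    page = collections[offset:offset + limit]

    links = []
    next_offset = offset + limit
    if next_offset < len(collections):
        query = {"limit": limit, "offset": next_offset}
        if keyword is not None:
            query["keyword"] = keyword  # keep the filter when paging on
        links.append({"rel": "next", "href": "/collections?" + urlencode(query)})

    return {"collections": page, "links": links}


# Hypothetical catalog with 25 collections.
catalog = [{"id": f"layer{i}", "title": f"Layer {i}"} for i in range(25)]
first_page = list_collections(catalog, limit=10)
last_page = list_collections(catalog, limit=10, offset=20)
```

A client would keep following the `next` link until a page arrives without one, so the size of a single response no longer depends on the total number of collections.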


thorsten-reitz · 2018-03-09T07:11:18Z

Did you discuss an optional package construct for servers with many collections? In this way, collections could be organised in a hierarchical structure.

cportele · 2018-03-12T14:20:10Z

Discussion in web-meeting 2018-03-12: Not a pressing problem, do not address in the Core, but address as needed in an extension. Chuck will look into the issue and what can be described in the Guide.

lieberjosh · 2018-03-12T14:28:02Z

Sorry I was unable to attend the Hackathon, but it might be helpful to consider WFS 3.0 as various utility operations around a set of links to features. More complex or functional feature relationships may be better expressed in linked data entities and associated APIs such as SPARQL endpoints which then use WFS links to point at appropriate feature data including geometry. -- Josh


cmheazel · 2018-03-20T08:56:45Z

OpenAPI allows you to define a path as a template. The variables in the template can be defined as coming from an enumerated list. The URL template and enumerated lists are defined by the Server Object. See issue #90

It is also possible to include a parameter in a path. The name of a Paths Object can be a URL template. Variables in that template are described using Parameter Objects where the "in" property = "path". Parameters can also be used in the query, header, or cookie. There are lots of options.
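As a rough illustration of the two mechanisms described above, the following sketch builds OpenAPI 3.0 fragments as plain Python dictionaries. The /collections/{name} and /collections/{name}/items/{id} paths come from the first comment in this thread; the server URL, the `dataset` variable and its values are hypothetical.

```python
OK = {"200": {"description": "successful response"}}


def generic_paths():
    """Paths Object with parameterised paths: its size is independent of
    the number of collections the server offers."""
    name_param = {"name": "name", "in": "path", "required": True,
                  "schema": {"type": "string"}}
    id_param = {"name": "id", "in": "path", "required": True,
                "schema": {"type": "string"}}
    return {
        "/collections": {"get": {"responses": OK}},
        "/collections/{name}": {
            "parameters": [name_param],
            "get": {"responses": OK},
        },
        "/collections/{name}/items/{id}": {
            "parameters": [name_param, id_param],
            "get": {"responses": OK},
        },
    }


def per_collection_paths(names):
    """Paths Object with one explicit pair of paths per collection."""
    paths = {"/collections": {"get": {"responses": OK}}}
    for n in names:
        paths[f"/collections/{n}"] = {"get": {"responses": OK}}
        paths[f"/collections/{n}/items/{{id}}"] = {"get": {"responses": OK}}
    return paths


# Server Object with a URL template whose variable is limited to an
# enumerated list (hypothetical host and values).
SERVER = {
    "url": "https://example.org/{dataset}",
    "variables": {
        "dataset": {"enum": ["topo", "cadastre"], "default": "topo"},
    },
}
```

With 1000 collections the per-collection variant lists 2001 paths, while the generic variant stays at three; the price is that clients have to ask /collections for the valid values of {name}.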

Lots of options means that we can create a big mess very quickly. In Issue 90 I propose a delimiter based approach for path templates. The intent is to provide flexibility while retaining semantics. We should be able to use this approach to address the multiple collections issue.

Take a look

cmheazel · 2018-04-02T22:21:25Z

Another perspective - OpenAPI describes an interface, not a server (or service). A single OpenAPI document can describe the offerings of dozens of servers. So it is reasonable to have multiple /collections paths as long as each one is rooted on a different URL. As I read the current draft, this would be an implementation decision.

jerstlouis · 2018-08-21T17:34:57Z

How about defining a hierarchy?

e.g. NaturalEarth/Cultural/ne_10m_admin_0_countries/

Filtering makes sense as an extension, but I feel that basic 'browsing' of a tree-like structure is a critically important piece of functionality.

cmheazel · 2018-09-21T15:06:17Z

@jerstlouis Are we looking for a way to include qualified names in a URI? For example, given the path /collections/{name} we could allow {name} to be "NaturalEarth:Cultural" or "NaturalEarth:Physical", etc. If so, then we just need to identify a delimiter for the qualified names which is legal to use in a URI template.
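A small sketch of how qualified names could be packed into the single {name} path segment, assuming the colon is chosen as the delimiter. The helper names are hypothetical. Each part is percent-encoded on its own, so a colon inside a part cannot be confused with the delimiter.

```python
from urllib.parse import quote, unquote

DELIMITER = ":"  # assumed delimiter; a colon is legal inside a URI path segment
PREFIX = "/collections/"


def to_collection_path(*parts):
    """Join the name parts into one path segment, e.g. NaturalEarth:Cultural."""
    # safe="" also encodes "/" and ":" that occur inside a part.
    encoded = [quote(part, safe="") for part in parts]
    return PREFIX + DELIMITER.join(encoded)


def from_collection_path(path):
    """Split a /collections/{name} path back into its name parts."""
    if not path.startswith(PREFIX):
        raise ValueError(f"not a collection path: {path!r}")
    raw_name = path[len(PREFIX):]
    return [unquote(part) for part in raw_name.split(DELIMITER)]


path = to_collection_path("NaturalEarth", "Cultural")
parts = from_collection_path(path)
```

Because the qualified name is still a single segment, this stays compatible with an ordinary /collections/{name} template.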

jerstlouis · 2018-09-21T15:21:16Z

Well I would certainly prefer the colon to the underscore, because the colon is a forbidden character in file names on Windows platforms and less likely to be used in a folder or layer name.

However, the main point is to allow listing the different hierarchy levels without the entire contents of all sub-directories, considering cases with millions of layers (serving an entire mapping agency's data sets and cascading services from a single end-point). A delimiter alone doesn't solve that.
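The browsing behaviour described here can be sketched as a function that returns only the immediate children of one hierarchy level. The function name and the "folder"/"collection" labels are assumptions, and apart from the path quoted earlier in the thread the sample paths are made up. A server with millions of layers would use an index instead of scanning every path.

```python
def list_children(collection_paths, parent=""):
    """List the direct children of `parent` without descending further."""
    parent_parts = [p for p in parent.split("/") if p]
    depth = len(parent_parts)

    children = {}  # child name -> True while only seen as a leaf
    for path in collection_paths:
        parts = [p for p in path.split("/") if p]
        # Skip paths outside the requested branch or equal to the parent.
        if parts[:depth] != parent_parts or len(parts) <= depth:
            continue
        name = parts[depth]
        is_leaf = len(parts) == depth + 1
        # A name becomes a folder as soon as any path continues beneath it.
        children[name] = children.get(name, True) and is_leaf

    return [
        {"name": name, "type": "collection" if leaf else "folder"}
        for name, leaf in sorted(children.items())
    ]


paths = [
    "NaturalEarth/Cultural/ne_10m_admin_0_countries",
    "NaturalEarth/Cultural/example_places",      # illustrative
    "NaturalEarth/Physical/example_coastline",   # illustrative
]
top_level = list_children(paths)
cultural = list_children(paths, "NaturalEarth/Cultural")
```

Each response only grows with the number of entries at one level, not with the size of the whole tree.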

The nice thing about that combined with filtering too is that your service can also act as your catalog without needing a specialized service for that.

I argue that the list of collections (or layers) an end-point serves is not a description of the interface (or capabilities) and belongs separately. If you connect to the SEDAC WMTS service for example ( http://sedac.ciesin.columbia.edu/geoserver/gwc/service/wmts?service=wmts&request=GetCapabilities ), you get a 5.6 MB XML file which I find a ridiculous amount of data to do an initial service handshake, and then all your layers show up in your client in a very long list where you cannot find what you're looking for. Listing data layers served should be a separate operation, and that should support simple hierarchies as well as optional filtering capabilities (per geospatial or temporal extent, scale/resolution, data type, keywords, meta data fields, etc.).

jerstlouis · 2018-09-21T16:13:16Z

Quoting you earlier @cmheazel , this is kind of my whole point:

Another perspective - OpenAPI describes an interface, not a server (or service). A single OpenAPI document can describe the offerings of dozens of servers.

Shouldn't the same OpenAPI description apply to ANY service?

Is it possible to leave the actual collections listing outside?
And does/could OpenAPI support resource paths with variable depths?

Just found these OpenAPI issues which discuss this:

OAI/OpenAPI-Specification#892
OAI/OpenAPI-Specification#1459

They are proposing this:

If a “+” suffix modifier is present, e.g. "/items/{itemId+}", the path parameter can match zero or more URL path segments. We call these “multisegment” path parameters.

This would work perfectly.
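To show what the proposed "+" modifier would mean for matching, here is a sketch of a tiny template matcher. This is not existing OpenAPI behaviour and not a real library API; `match_template` is a hypothetical helper that only mirrors the quoted proposal.

```python
import re

_PARAM = re.compile(r"\{(\w+)(\+?)\}")


def match_template(template, path):
    """Match `path` against `template`; return the parameters or None.

    {name}  matches exactly one path segment (returned as a string).
    {name+} matches zero or more segments (returned as a list).
    """
    pattern = ""
    multi = set()
    pos = 0
    for m in _PARAM.finditer(template):
        pattern += re.escape(template[pos:m.start()])  # literal text
        name = m.group(1)
        if m.group(2):
            multi.add(name)
            pattern += f"(?P<{name}>.*)"      # may span "/" separators
        else:
            pattern += f"(?P<{name}>[^/]+)"   # a single segment only
        pos = m.end()
    pattern += re.escape(template[pos:])

    found = re.fullmatch(pattern, path)
    if found is None:
        return None
    return {
        name: [s for s in value.split("/") if s] if name in multi else value
        for name, value in found.groupdict().items()
    }


deep = match_template(
    "/collections/{collectionID+}/items/{id}",
    "/collections/NaturalEarth/Cultural/ne_10m_admin_0_countries/items/42",
)
```

One caveat of this sketch: a multisegment value followed by a literal such as /items/ becomes ambiguous if a collection segment is itself named "items"; the greedy match here simply picks the last occurrence.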

jerstlouis · 2018-09-21T16:51:35Z

So it seems that OpenAPI allows you to list possible values by doing /api/{collectionID} for example.
Shouldn't that be how it's done? Rather than stuffing all the layers inside /api which is the initial handshake?

And together with the multisegment path parameters it would support the use case.

OpenAPI doesn't currently support enumerating possible values for a parameter based on other parameters earlier in the path. In my opinion this is a major limitation and I filed an issue:

OAI/OpenAPI-Specification#1693

This would allow to list the valid zoom levels for a given tiling scheme, the valid tiling schemes for a given collection etc.
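The kind of dependency meant here can be sketched with a nested lookup, where the valid values of a parameter depend on the values chosen earlier in the path. The data and the `allowed_values` helper are purely illustrative; OpenAPI offers nothing like this today, which is the point of the filed issue.

```python
# Hypothetical server content: collection -> tiling scheme -> zoom levels.
TILESETS = {
    "roads": {
        "WebMercatorQuad": [0, 1, 2, 3],
        "WorldCRS84Quad": [0, 1, 2],
    },
    "rivers": {
        "WebMercatorQuad": [0, 1],
    },
}


def allowed_values(*chosen):
    """Return the valid values for the next path parameter, given the
    values already chosen for the parameters before it."""
    node = TILESETS
    for key in chosen:
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"invalid value {key!r} after {chosen!r}")
        node = node[key]
    # Dict levels enumerate their keys; the last level is a list of zooms.
    return sorted(node) if isinstance(node, dict) else list(node)


collections = allowed_values()
schemes_for_roads = allowed_values("roads")
zooms = allowed_values("roads", "WorldCRS84Quad")
```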

cmheazel · 2018-09-25T01:10:19Z

I prefer to distinguish between services and APIs. A service is an implementation of the SOA pattern where processing is performed through service-specific operations. An API is an implementation of the Resource Oriented pattern where resources are accessed using HTTP verbs and paths. Can't say that everyone buys into this but it helps me to keep things straight.

cmheazel · 2018-09-25T01:15:30Z

@jerstlouis Parameter dependencies, an interesting concept.
Would support for qualified names help? A namespace coupled with a value? Based on the discussions you listed above, I think this would be acceptable (even legal under version 3.0.1) if we choose the correct delimiter.

cmheazel · 2018-09-25T01:25:53Z

@jerstlouis Another option could be to switch to the HATEOAS pattern at some point. We have added support for alternative schema to the response media type schema. This frees you from the requirement that a response is specified in JSON schema. OpenAPI would then take you to the top-level metadata definition, which provides links to the next level, and so on. Similar to the WFS 3 approach for Collections and Collection. (just brain-storming here).
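A minimal sketch of the link-following idea: the client knows only the entry point and a sequence of link relations, and discovers every URL from the responses. The in-memory `RESOURCES` table stands in for HTTP requests, and the relation names and paths are assumptions, not taken from any draft.

```python
# Stand-in for a server: each resource only advertises links to the next level.
RESOURCES = {
    "/": {"links": [{"rel": "data", "href": "/collections"}]},
    "/collections": {
        "links": [{"rel": "child", "href": "/collections/NaturalEarth"}],
    },
    "/collections/NaturalEarth": {
        "links": [{"rel": "child", "href": "/collections/NaturalEarth/Cultural"}],
    },
    "/collections/NaturalEarth/Cultural": {"links": []},
}


def follow(fetch, start, rels):
    """Walk from `start`, taking the first link with each relation in turn."""
    url = start
    for rel in rels:
        document = fetch(url)
        hrefs = [link["href"] for link in document.get("links", [])
                 if link["rel"] == rel]
        if not hrefs:
            raise LookupError(f"no link with rel={rel!r} at {url}")
        url = hrefs[0]
    return url


target = follow(RESOURCES.__getitem__, "/", ["data", "child", "child"])
```

In this style the API definition only needs to describe the top-level resource and the link structure, not every collection path.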

cmheazel · 2018-09-25T01:33:54Z

@jerstlouis Now let's think about /api/{collectionId}. What you are asking for is separate OpenAPI documents based on the collection id. That's perfectly legal under OpenAPI. However, I would be worried about URI confusion. Across all of the multiple OpenAPI documents, is it possible for one extracted URL to point to two (or more) different resources?

pvretano · 2018-09-25T01:54:05Z

@cmheazel F.Y.I. WFS 2.5 took the HATEOAS approach. At every level there were hypermedia controls that would take you to the next resource(s). I prefer this approach.

akuckartz · 2018-09-25T04:36:40Z

Another option could be to switch to the HATEOAS pattern at some point.

👍 One reason for #167

cportele · 2018-09-25T08:29:53Z

A few thoughts:

a. This discussion is starting to look like a duplicate of #64 and #90. Maybe someone should make a concrete proposal for an extension with an approach that works both in OpenAPI and with the HATEOAS pattern, and that would continue to support the current path pattern for the simpler cases, i.e. the Core.

b. As discussed in #64 I think there is value in having consistent patterns in the URIs (in addition to hypermedia controls in the responses), at least in the Core.

c. The discussion does not consider an important resource, the dataset. If we do not take this into account, we are making a mistake. In schema.org/DCAT (key taxonomies for publishing data on the Web) datasets are important resources and we need to represent this in our resource architecture. Only this will get our datasets properly indexed by search engines, etc.

Which is why the Core discusses datasets and distributions in a way that is consistent with schema.org/DCAT. At least for the Core the rule is that the part of an API that conforms to the spec (and has paths .../api, .../collections etc.) is for one dataset. I.e., that part of the API represents a distribution of the dataset.

So, if you have multiple datasets that should be published via a single API, the approach consistent with the Core would be something like .../{datasetId}/api, not .../api/{datasetId}. Same with .../{datasetId}/collections/{collectionId}/....

Any proposal for hierarchical collections should specify clearly how datasets and distributions are represented as resources in the proposal.

jerstlouis · 2018-09-25T10:01:23Z

@cportele my mention of /api/{collectionID} was referring to the OpenAPI functionality of enumerating the possible values for {collectionID}. With this, the /api itself could potentially be the same for different services serving different datasets.

I am in fundamental disagreement with the idea that a service end-point should represent a single dataset. I think of the service as directly mapping to an organization's SDI's server, serving all datasets available within it. This makes it possible to use the end-point to implement catalog queries and the like. I have a single piece of software serving all these data sets, so why would I want more than one end-point? It doesn't make any sense to me to have multiple /api for this.

My proposal for hierarchical collections would depend on support for multi-segment paths that OpenAPI currently does not support ( /collections/{collectionID+} ). At some level within your multi-segment collectionID, you would have a 'dataset', where the meta data would reside. Some datasets already have a hierarchical structure and the current 'the whole end-point is a single data set' only accommodates a single level of 'collections' within that data set.

If we really wanted to make clear the dataset resource distinction I guess it would have to be /{dataSetId+}/collections/{collectionID+} to support multi-segment path in both the datasets as well as within the collections (for single data sets that have a more hierarchical structure). And then the /api could not be at the same level as collections without making it dataset specific...
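To illustrate how a /{dataSetId+}/collections/{collectionID+} path could be taken apart, here is a sketch that splits on the literal "collections" segment. The function name and the example path are hypothetical, and the sketch makes the ambiguity visible: it has to assume that no dataset segment is itself called "collections".

```python
def split_dataset_path(path, marker="collections"):
    """Split a path into (dataset segments, collection segments).

    Assumes the first occurrence of `marker` separates the two parts.
    """
    segments = [s for s in path.split("/") if s]
    if marker not in segments:
        raise ValueError(f"no /{marker}/ segment in {path!r}")
    index = segments.index(marker)  # first occurrence wins
    dataset, collection = segments[:index], segments[index + 1:]
    if not dataset:
        raise ValueError(f"no dataset segments before /{marker}/ in {path!r}")
    return dataset, collection


dataset, collection = split_dataset_path(
    "/agency/topo/collections/NaturalEarth/Cultural"
)
```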

cportele · 2018-09-25T10:33:42Z

@jerstlouis - I think you are mixing things. There is no question that it should be possible to use a single piece of software for serving multiple datasets (ours supports this, too). At the same time, it should also be possible to use a microservices architecture. There are multiple ways to extend the Core to allow that.

A boundary condition is that whatever we specify should be consistent with the Data on the Web Best Practices and identifying dataset and distribution resources plus providing metadata for them is an important part of it.

Whether it makes sense to support OpenAPI definitions for each dataset or not (i.e., whether to support modular APIs) is a separate discussion and I am not sure if there is one answer for all cases. It could be an option to make the /api in the Core optional and the landing page for the dataset would simply be required to point to the OpenAPI definition for the whole API that includes the paths for that distribution (or .../api could redirect, or be an alternate convenience URI for the canonical URI of the OpenAPI definition for the whole API).

By the way, in the discussion that led to the current path structure, we also discussed that it should be possible (for an API) to publish API definitions for each collection separately (see the /collections/buildings/api resource in the whiteboard image in #64).

cportele added the Future work support in an additional part of OGC API Features label Mar 12, 2018

jampukka mentioned this issue Aug 22, 2018

Conflict between CRS extension and Core #151

Closed

cportele added the OGC API: Common Issue related to general resources or requirements (see #190) label Mar 5, 2019

cportele mentioned this issue Mar 20, 2019

Collection of Collections opengeospatial/ogcapi-common#11

Open

cportele mentioned this issue Nov 2, 2019

Search for collections missing? #281

Closed

cportele mentioned this issue Jan 20, 2020

Pagination (limit parameter) for collections? opengeospatial/ogcapi-common#87

Closed
