Servers with many collections #76
Did you discuss an optional package construct for servers with many collections? In this way, collections could be organised in a hierarchical structure.
Discussion in web-meeting 2018-03-12: Not a pressing problem, do not address in the Core, but address as needed in an extension. Chuck will look into the issue and what can be described in the Guide.
Sorry I was unable to attend the Hackathon, but it might be helpful to consider WFS 3.0 as a set of utility operations around a set of links to features. More complex or functional feature relationships may be better expressed in linked data entities and associated APIs, such as SPARQL endpoints, which then use WFS links to point at the appropriate feature data, including geometry.
—Josh
On Mar 8, 2018, at 11:41 AM, Clemens Portele wrote:
Discussed during the WFS 3.0 Hackathon:
The Core right now is designed for a small number of collections. Issues with a large number of collections:
The feature collections metadata response (/collections) may become large. This could be addressed by supporting paging and filtering in this resource. This could be done in an extension.
The API definition can also become very large, and computationally intensive to compile, if each collection is listed as a separate operation. This could be addressed by using a parameter for the collection name, i.e. having only generic, parameterised /collections/{name} and /collections/{name}/items/{id} operations. Information about the available collections would then have to be obtained from the /collections operation. Another option could be to support filtering of collections in the api operation.
The paths used above are based on #64.
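A minimal sketch of the paging idea for /collections. The `limit` and `offset` parameter names, the `numberMatched` field and the shape of the `next` link are assumptions for illustration; the post only says paging could be added in an extension.

```python
from typing import Any


def page_collections(
    collections: list[dict[str, Any]], limit: int = 10, offset: int = 0
) -> dict[str, Any]:
    """Return one page of collection metadata plus self/next links.

    Hypothetical response shape: no paging extension is specified in this thread.
    """
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    page = collections[offset : offset + limit]
    links = [{"rel": "self", "href": f"/collections?limit={limit}&offset={offset}"}]
    # Only advertise a "next" page when more collections remain after this one.
    if offset + limit < len(collections):
        next_href = f"/collections?limit={limit}&offset={offset + limit}"
        links.append({"rel": "next", "href": next_href})
    return {"collections": page, "numberMatched": len(collections), "links": links}


# 25 hypothetical collections split into pages of 10.
all_collections = [{"id": f"collection_{i}"} for i in range(25)]
last_page = page_collections(all_collections, limit=10, offset=20)
print(len(last_page["collections"]), [link["rel"] for link in last_page["links"]])
```

A client simply follows `next` links until none is returned, so the server never has to serialize every collection in a single response.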
OpenAPI allows you to define a path as a template. The variables in the template can be defined as coming from an enumerated list. The URL template and enumerated lists are defined by the Server Object (see issue #90). It is also possible to include a parameter in a path: the name of a Paths Object entry can be a URL template, and the variables in that template are described using Parameter Objects where the "in" property is "path". Parameters can also be used in the query, header, or cookie. There are lots of options, and lots of options means that we can create a big mess very quickly. In issue #90 I propose a delimiter-based approach for path templates. The intent is to provide flexibility while retaining semantics. We should be able to use this approach to address the multiple collections issue. Take a look.
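To make the two mechanisms concrete, here is a hypothetical OpenAPI 3.0 fragment written as a Python dict: a Server Object URL template with an enumerated `{dataset}` variable, and one generic path with `in: path` parameters. The server URL, the `dataset` variable and the `feature_url` helper are assumptions for illustration, not taken from the draft.

```python
import re
from typing import Any

# Hypothetical OpenAPI 3.0 fragment (not from the WFS 3.0 draft).
OPENAPI_FRAGMENT: dict[str, Any] = {
    "servers": [
        {
            # Server Object URL template; server variables must declare a default.
            "url": "https://example.org/{dataset}",
            "variables": {
                "dataset": {"default": "naturalearth", "enum": ["naturalearth", "osm"]}
            },
        }
    ],
    "paths": {
        # One generic, parameterised operation instead of one operation per collection.
        "/collections/{collectionId}/items/{featureId}": {
            "get": {
                # Path parameters must be declared with required: true.
                "parameters": [
                    {"name": "collectionId", "in": "path", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "featureId", "in": "path", "required": True,
                     "schema": {"type": "string"}},
                ]
            }
        }
    },
}


def expand(template: str, values: dict[str, str]) -> str:
    """Replace every {name} in a server URL or path template with values[name]."""
    return re.sub(r"\{(\w+)\}", lambda m: values[m.group(1)], template)


def feature_url(dataset: str, collection_id: str, feature_id: str) -> str:
    """Build a full feature URL, checking the server variable against its enum."""
    server = OPENAPI_FRAGMENT["servers"][0]
    allowed = server["variables"]["dataset"]["enum"]
    if dataset not in allowed:
        raise ValueError(f"dataset must be one of {allowed}")
    path = next(iter(OPENAPI_FRAGMENT["paths"]))
    # OpenAPI appends the path to the expanded server URL.
    return expand(server["url"], {"dataset": dataset}) + expand(
        path, {"collectionId": collection_id, "featureId": feature_id}
    )


print(feature_url("naturalearth", "ne_10m_admin_0_countries", "42"))
```

The definition stays the same size whether the server holds ten collections or ten million; the trade-off, as the original post notes, is that collection ids can then only be discovered through /collections.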
Another perspective: OpenAPI describes an interface, not a server (or service). A single OpenAPI document can describe the offerings of dozens of servers. So it is reasonable to have multiple /collections paths as long as each one is rooted on a different URL. As I read the current draft, this would be an implementation decision.
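A small illustration of "rooted on a different URL": in OpenAPI 3.0 the path is appended to the Server Object URL rather than resolved against it, so one /collections path yields a distinct resource per server. The server URLs are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical servers listed in a single OpenAPI document.
SERVERS = ["https://hydro.example.org/api", "https://transport.example.org/api/"]


def full_urls(path: str, servers: list[str]) -> list[str]:
    """Append `path` to each server URL (no relative resolution), as OpenAPI does."""
    return [server.rstrip("/") + path for server in servers]


print(full_urls("/collections", SERVERS))
# Relative URL resolution would be wrong here: the leading "/" discards "/api".
print(urljoin(SERVERS[0], "/collections"))
```

The second print shows why plain concatenation is used: generic URL resolution would silently map every server's /collections onto the host root.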
How about defining a hierarchy, e.g. NaturalEarth/Cultural/ne_10m_admin_0_countries/? Filtering makes sense as an extension, but I feel that basic 'browsing' of a tree-like structure is a critically important piece of functionality.
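A minimal sketch of one-level browsing, assuming the hierarchy is kept as nested dicts (groups) with `None` marking a leaf collection. Apart from ne_10m_admin_0_countries, the names are illustrative, and `list_level` is a hypothetical helper rather than anything in a draft.

```python
from typing import Any

# Hypothetical catalog: dicts are groups, None marks a leaf collection.
CATALOG: dict[str, Any] = {
    "NaturalEarth": {
        "Cultural": {"ne_10m_admin_0_countries": None, "ne_10m_populated_places": None},
        "Physical": {"ne_10m_coastline": None},
    },
}


def list_level(tree: dict[str, Any], path: str) -> list[dict[str, str]]:
    """List only the immediate children of the group at `path`, never the whole subtree."""
    node: Any = tree
    for segment in filter(None, path.split("/")):  # tolerate leading/trailing "/"
        if not isinstance(node, dict) or segment not in node:
            raise KeyError(f"no such group: {path}")
        node = node[segment]
    if not isinstance(node, dict):
        raise KeyError(f"{path} is a collection, not a group")
    return [
        {"id": name, "type": "group" if isinstance(child, dict) else "collection"}
        for name, child in node.items()
    ]


print(list_level(CATALOG, "NaturalEarth/Cultural/"))
```

Each response is proportional to the number of direct children, which is what keeps browsing usable when the full tree is huge.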
@jerstlouis Are we looking for a way to include qualified names in a URI? For example, given the path /collections/{name} we could allow {name} to be "NaturalEarth:Cultural" or "NaturalEarth:Physical", etc. If so, then we just need to identify a delimiter for the qualified names which is legal to use in a URI template.
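A sketch of the delimiter idea, assuming ':' separates hierarchy levels inside a single {name} segment. RFC 3986 permits ':' inside a path segment, so it can stay unescaped; Python's `urllib.parse.quote` escapes it by default, hence `safe=":"`. The helper names are hypothetical.

```python
from urllib.parse import quote, unquote

DELIMITER = ":"  # the delimiter floated in this thread; an assumption, not a standard
PREFIX = "/collections/"


def qualified_name_to_path(parts: list[str]) -> str:
    """Join hierarchy levels into one {name} segment, e.g. /collections/NaturalEarth:Cultural."""
    for part in parts:
        if DELIMITER in part or "/" in part:
            raise ValueError(f"level name may not contain ':' or '/': {part!r}")
    # Keep ':' literal; percent-encode anything else that is not URL-safe.
    return PREFIX + quote(DELIMITER.join(parts), safe=DELIMITER)


def path_to_qualified_name(path: str) -> list[str]:
    """Split /collections/{name} back into levels (assumes nothing follows {name})."""
    if not path.startswith(PREFIX):
        raise ValueError(f"not a collection path: {path}")
    return unquote(path[len(PREFIX):]).split(DELIMITER)


print(qualified_name_to_path(["NaturalEarth", "Cultural"]))
print(path_to_qualified_name("/collections/NaturalEarth:Physical"))
```

One caveat: a bare relative reference such as `NaturalEarth:Cultural` would be parsed as a URI with scheme `NaturalEarth`, so clients should always build the path from the absolute /collections/ prefix.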
Well, I would certainly prefer the colon to the underscore, because the colon is a forbidden character in file names on Windows platforms and so is less likely to be used in a folder or layer name. However, the main point is to be able to list each level of the hierarchy without the entire contents of all sub-directories, when considering the case of millions of layers (serving an entire mapping agency's data sets and cascading services from a single end-point). A delimiter alone doesn't solve that. The nice thing about combining this with filtering is that your service can also act as your catalog without needing a specialized service for that. I argue that the list of collections (or layers) an end-point serves is not a description of the interface (or capabilities) and belongs separately. If you connect to the SEDAC WMTS service, for example (http://sedac.ciesin.columbia.edu/geoserver/gwc/service/wmts?service=wmts&request=GetCapabilities), you get a 5.6 MB XML file, which I find a ridiculous amount of data for an initial service handshake, and then all your layers show up in your client in a very long list where you cannot find what you're looking for. Listing the data layers served should be a separate operation, and it should support simple hierarchies as well as optional filtering capabilities (by geospatial or temporal extent, scale/resolution, data type, keywords, metadata fields, etc.).
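A sketch of the optional filtering described above, limited to spatial extent and keywords. It assumes each collection advertises a lon/lat bounding box and a keyword set; the class, functions and sample collections are hypothetical, and boxes crossing the antimeridian are not handled.

```python
from dataclasses import dataclass, field
from typing import Optional

BBox = tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass
class CollectionInfo:
    id: str
    extent: BBox
    keywords: set[str] = field(default_factory=set)


def intersects(a: BBox, b: BBox) -> bool:
    """Axis-aligned boxes overlap unless one lies entirely to one side of the other."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_collections(
    collections: list[CollectionInfo],
    bbox: Optional[BBox] = None,
    keyword: Optional[str] = None,
) -> list[str]:
    """Return the ids of collections matching every filter that was supplied."""
    wanted = keyword.lower() if keyword is not None else None
    matches = []
    for c in collections:
        if bbox is not None and not intersects(c.extent, bbox):
            continue
        if wanted is not None and wanted not in {k.lower() for k in c.keywords}:
            continue
        matches.append(c.id)
    return matches


sample = [
    CollectionInfo("canada_rivers", (-141.0, 41.0, -52.0, 84.0), {"hydrography"}),
    CollectionInfo("france_roads", (-5.0, 41.0, 10.0, 51.0), {"Transport"}),
]
print(filter_collections(sample, bbox=(-80.0, 40.0, -70.0, 50.0)))  # ['canada_rivers']
print(filter_collections(sample, keyword="transport"))  # ['france_roads']
```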
Quoting you earlier, @cmheazel, this is kind of my whole point:
Shouldn't the same OpenAPI description apply to ANY service? Is it possible to leave the actual collections listing outside? Just found these OpenAPI issues which discuss this: OAI/OpenAPI-Specification#892. They are proposing this:
This would work perfectly.
So it seems that OpenAPI allows you to list the possible values of a parameter, for example with /api/{collectionID}. Together with multi-segment path parameters, this would support the use case. However, OpenAPI doesn't currently support enumerating the possible values for a parameter based on other parameters earlier in the path. In my opinion this is a major limitation, and I filed an issue: OAI/OpenAPI-Specification#1693. This would allow listing the valid zoom levels for a given tiling scheme, the valid tiling schemes for a given collection, etc.
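Since OpenAPI cannot express such dependencies, the sketch below only shows the idea server-side: tables of which values are valid for one path parameter given the earlier ones, for a hypothetical /collections/{collectionId}/tiles/{tilingScheme}/{zoom} path. The collection ids and zoom ranges are illustrative assumptions; WebMercatorQuad and WorldCRS84Quad serve only as example tiling scheme names.

```python
# Hypothetical dependency tables: the valid values of each path parameter
# depend on the parameters that precede it in the path.
TILING_SCHEMES_BY_COLLECTION: dict[str, list[str]] = {
    "countries": ["WebMercatorQuad", "WorldCRS84Quad"],
    "coastline": ["WorldCRS84Quad"],
}
ZOOM_LEVELS_BY_SCHEME: dict[str, range] = {
    "WebMercatorQuad": range(0, 19),  # zoom 0..18 (illustrative)
    "WorldCRS84Quad": range(0, 18),  # zoom 0..17 (illustrative)
}


def validate_tile_path(collection: str, scheme: str, zoom: int) -> None:
    """Raise ValueError if a parameter is not valid given the earlier ones."""
    schemes = TILING_SCHEMES_BY_COLLECTION.get(collection)
    if schemes is None:
        raise ValueError(f"unknown collection {collection!r}")
    if scheme not in schemes:
        raise ValueError(f"{scheme!r} is not offered for {collection!r}; try {schemes}")
    if zoom not in ZOOM_LEVELS_BY_SCHEME[scheme]:
        raise ValueError(f"zoom {zoom} is not defined for {scheme!r}")


validate_tile_path("countries", "WebMercatorQuad", 5)  # passes silently
```

If a description format could carry these tables, a client could enumerate valid values at each step instead of discovering them by trial and error.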
I prefer to distinguish between services and APIs. A service is an implementation of the SOA pattern where processing is performed through service-specific operations. An API is an implementation of the Resource Oriented pattern where resources are accessed using HTTP verbs and paths. Can't say that everyone buys into this, but it helps me to keep things straight.
@jerstlouis Parameter dependencies, an interesting concept.
@jerstlouis Another option could be to switch to the HATEOAS pattern at some point. We have added support for alternative schemas to the response media type schema. This frees you from the requirement that a response is specified in JSON Schema. OpenAPI would then take you to the top-level metadata definition, which provides links to the next level, and so on, similar to the WFS 3 approach for Collections and Collection. (Just brainstorming here.)
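A sketch of the HATEOAS idea: the client starts at the landing page and walks typed links, so a deep hierarchy needs no path template at all. The resources, the "data" and "item" relation names and the `follow` helper are assumptions for illustration; a real client would fetch each resource over HTTP instead of reading a dict.

```python
from typing import Any

# Hypothetical resources keyed by path; each response carries typed links.
RESOURCES: dict[str, dict[str, Any]] = {
    "/": {"links": [{"rel": "data", "href": "/collections"}]},
    "/collections": {"links": [{"rel": "item", "href": "/collections/NaturalEarth"}]},
    "/collections/NaturalEarth": {
        "links": [{"rel": "item", "href": "/collections/NaturalEarth/Cultural"}]
    },
    "/collections/NaturalEarth/Cultural": {"links": []},
}


def follow(start: str, rels: list[str]) -> str:
    """From `start`, follow the first link of each relation type in turn.

    The client never builds a URL itself; it only knows relation names.
    """
    path = start
    for rel in rels:
        hrefs = [link["href"] for link in RESOURCES[path]["links"] if link["rel"] == rel]
        if not hrefs:
            raise LookupError(f"{path} has no {rel!r} link")
        path = hrefs[0]
    return path


print(follow("/", ["data", "item", "item"]))  # /collections/NaturalEarth/Cultural
```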
@jerstlouis Now let's think about /api/{collectionId}. What you are asking for is separate OpenAPI documents based on the collection id. That's perfectly legal under OpenAPI. However, I would be worried about URI confusion: across all of the multiple OpenAPI documents, is it possible for one extracted URL to point to two (or more) different resources?
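One way to check for this kind of URI confusion: compile each document's path templates to regular expressions and see whether a concrete path matches templates in more than one document. The two documents below are hypothetical and assumed to share one server root.

```python
import re

# Hypothetical path templates from two OpenAPI documents sharing one server root.
DOCUMENTS: dict[str, list[str]] = {
    "countries-api": ["/collections/{collectionId}"],
    "roads-api": ["/collections/roads", "/collections/{collectionId}/items/{featureId}"],
}


def template_to_regex(template: str) -> re.Pattern:
    """Compile a path template; each {var} matches exactly one path segment."""
    # re.split with a capturing group alternates literal text and variable names.
    parts = re.split(r"\{(\w+)\}", template)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+)"
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{regex}$")


def resolve(url_path: str, documents: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return every (document, template) pair whose template matches url_path."""
    return [
        (doc, template)
        for doc, templates in documents.items()
        for template in templates
        if template_to_regex(template).match(url_path)
    ]


matches = resolve("/collections/roads", DOCUMENTS)
if len(matches) > 1:
    print("ambiguous:", matches)
```

Here the literal /collections/roads in one document collides with the templated /collections/{collectionId} in the other; rooting each document on its own server URL removes this class of collision.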
@cmheazel FYI, WFS 2.5 took the HATEOAS approach. At every level there were hypermedia controls that would take you to the next resource(s). I prefer this approach.
👍 One reason for #167
A few thoughts:
a. This discussion is starting to look like a duplicate of #64 and #90. Maybe someone should make a concrete proposal for an extension with an approach that works both in OpenAPI and with the HATEOAS pattern, and that would continue to support the current path pattern for the simpler cases, i.e. the Core.
b. As discussed in #64, I think there is value in having consistent patterns in the URIs (in addition to hypermedia controls in the responses), at least in the Core.
c. The discussion does not consider an important resource, the dataset. If we do not take this into account, we are making a mistake. In schema.org/DCAT (key taxonomies for publishing data on the Web), datasets are important resources and we need to represent this in our resource architecture. Only this will get our datasets properly indexed by search engines, etc. This is why the Core discusses datasets and distributions in a way that is consistent with schema.org/DCAT. At least for the Core, the rule is that the part of an API that conforms to the spec (and has paths So, if you have multiple datasets that should be published via a single API, the approach consistent with the Core would be something like Any proposal for hierarchical collections should specify clearly how datasets and distributions are represented as resources in the proposal.
@cportele My mention of /api/{collectionID} was referring to the OpenAPI functionality of enumerating the possible values for {collectionID}. With this, the /api itself could potentially be the same for different services serving different datasets. I fundamentally disagree with the idea that a service end-point should represent a single dataset. I think of the service as directly mapping to an organization's SDI server, serving all datasets available within it. This makes it possible to use the end-point to implement catalog queries and the like. I have a single piece of software serving all these data sets, so why would I want more than one end-point? It doesn't make any sense to me to have multiple /api for this. My proposal for hierarchical collections would depend on support for multi-segment paths, which OpenAPI currently does not support (/collections/{collectionID+}). At some level within your multi-segment collectionID, you would have a 'dataset', where the metadata would reside. Some datasets already have a hierarchical structure, and the current 'the whole end-point is a single data set' model only accommodates a single level of 'collections' within that data set. If we really wanted to make the dataset resource distinction clear, I guess it would have to be /{dataSetId+}/collections/{collectionID+}, to support multi-segment paths both in the datasets and within the collections (for single data sets that have a more hierarchical structure). And then the /api could not be at the same level as collections without making it dataset-specific...
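OpenAPI has no {collectionID+} syntax, so the sketch below shows how a server might route such a multi-segment parameter with plain regular expressions. It assumes `items` is reserved as the segment that ends the collection id; the route names are hypothetical.

```python
import re

# Hypothetical routes for /collections/{collectionId+} and its items.
# The lazy ".+?" lets collectionId span several segments and stop at "/items/".
ITEM_ROUTE = re.compile(r"^/collections/(?P<collectionId>.+?)/items/(?P<featureId>[^/]+)$")
COLLECTION_ROUTE = re.compile(r"^/collections/(?P<collectionId>.+?)/?$")


def route(path: str) -> dict[str, str]:
    """Return the matched route name and its path parameters."""
    for name, pattern in (("item", ITEM_ROUTE), ("collection", COLLECTION_ROUTE)):
        match = pattern.match(path)
        if match:
            return {"route": name, **match.groupdict()}
    raise LookupError(f"no route for {path}")


print(route("/collections/NaturalEarth/Cultural/ne_10m_admin_0_countries/items/42"))
```

The weak spot shows up immediately: a request for /collections/NaturalEarth/items without a feature id, or a group literally named `items`, is indistinguishable from a deeper collection id, so reserved words or a delimiter would need to be specified.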
@jerstlouis I think you are mixing things up. There is no question that it should be possible to use a single piece of software for serving multiple datasets (ours supports this, too); at the same time, it should also be possible to use a microservices architecture. There are multiple ways to extend the Core to allow that. A boundary condition is that whatever we specify should be consistent with the Data on the Web Best Practices, and identifying dataset and distribution resources plus providing metadata for them is an important part of that. Whether it makes sense to support OpenAPI definitions for each dataset or not (i.e., whether to support modular APIs) is a separate discussion, and I am not sure there is one answer for all cases. It could be an option to make the By the way, in the discussion that led to the current path structure, we also discussed that it should be possible (for an API) to publish API definitions for each collection separately (see the