Local Resource Catalogs and req. class organization #300

Closed
jerstlouis opened this issue Jul 18, 2023 · 86 comments · Fixed by #305, #358 or #361

@jerstlouis
Member

jerstlouis commented Jul 18, 2023

Following up on past suggestions via e-mail, and based on previous discussion and clarification from @pvretano, if I understand correctly, the intent of Section 12 - Local Resource Catalogs is to allow cataloging OGC API collections, whether local or from other services, as local collections at /collections/{collectionId} (instead of as items at /collections/{collectionId}/items/{itemId}), so as to enable the concept of a super-API that could gather several data collections under a single API (which could still use remote-links to link to the access mechanisms like /coverage, /tiles, /items etc.). In addition, this would enable functionality to filter a large set of collections directly at /collections (also related to potential Common - Part 2 req classes for collection filtering), instead of requiring a special catalog collection that is metadata mixed up with all the other collections, which are actual data.

Accessing /collections as a catalog is particularly interesting for generic OGC API clients (e.g., the GDAL OGCAPI driver), as it does not require the implementation of a dedicated Records client and does not involve any additional step for the user. Furthermore, local processing resources can be integrated into the same API, and data integration/access mechanisms like DGGS and Processes - Part 3: Workflows and Chaining make for a very interesting API and system of systems.

If this could indeed be possible, I believe there might be some issue with the current requirements class layout.
In particular, the current stated dependencies of Local Resources Catalogue on the Record API and the Record Collection might be problematic.

I made the following suggestion as a potential alternative requirements classes layout to address these issues:

Requirement Classes defining resources

  • Core (concept of a record and core properties) -- should they really be called "core queryables" since they can also be used in "crawlable catalogues" that cannot be queried?

Requirement Classes defining origins

  • Records Collection (dependency on Features, with the records as a collection at /collections/{collectionId}/items/{recordId})
  • Local Resources Catalogue (records at .../{resources}/{resourceId}) where {resources} could be e.g., Collections (/collections/{collectionId}) or Processes (/processes/{processId}) or EO Scenes (e.g., /collections/sentinel2-l2a/scenes/{sceneId}). This would make it possible to apply Records advanced filtering capabilities at /processes, /collections, /collections/sentinel2-l2a/scenes/. It would be good to have a mechanism to associate specifically with a particular type of local resources, like "Common - Part 2: Collections" or "Processes", perhaps as requirements classes of those other standards (we are planning to revive the Common SWG at the September meeting, meeting in the OGC API track).

Requirement Classes defining query parameters

  • Queryable Records (q=, type=, externalId= )
  • (for datetime and bbox, Common or Features or some other spec would actually be defining them)
  • Records Filtering (filter=, not necessarily CQL2, since you could have filter-lang= and you would explicitly declare conformance to CQL2 just like with Features - Part 3)
  • Records Sorting (sort=)
  • Records property selection (properties=) -- you may want to get lots of info back for several records, but only want some key properties, which makes the results a lot smaller

The only requirements class dependency would be of everything else on "Core" (and Records Collection on OGC API - Features).

Any end-point to a resource supporting OGC API - Records "Core" requirements class is automatically a "Records API" and a "Crawlable catalogue".
If that API implements Full Text Search, Records Filtering, or Records Sorting, or Records property selection, it is a kind of "Searchable Catalog".
A "Searchable Catalog" profile could specify that all of these need to be implemented to conform to it.

Would the SWG consider any of these suggestions?

Thank you!
cc. @tomkralidis @joanma747

@pvretano
Contributor

@jerstlouis sorry I don't quite understand your initial description.

A "local resources catalogue" is a way of extending an existing endpoint to allow catalogue-like queries at that endpoint. We always use '/collections' as an example but other endpoints might be candidates too (e.g. /processes).

So you take an endpoint like /collections extend it with some additional metadata (mostly optional), implement some query parameters (like q, type, filter -- for CQL2, etc.) in addition to any that might already be defined there (like bbox) and voila, you can now query that endpoint as if it were a mini catalogue.

The response is exactly as before but the list of objects only includes those that satisfy the query predicates.
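A minimal sketch of the behaviour described above, assuming a /collections-style response held in memory. The function name, the `type` property on each entry, and the rule that comma-separated `q` terms match when any term appears in the title or description are illustrative assumptions, not taken from the draft standard.

```python
def filter_collections(response, q=None, type_=None):
    """Return a copy of a /collections-style response with only matching entries.

    Assumption: each entry may carry optional "title", "description" and
    "type" members added by the catalogue extension.
    """
    def matches(entry):
        if type_ is not None and entry.get("type") != type_:
            return False
        if q:
            # Assumption: comma-separated terms, any one of which may match.
            terms = [t.strip().lower() for t in q.split(",") if t.strip()]
            text = " ".join(
                str(entry.get(key, "")) for key in ("title", "description")
            ).lower()
            if not any(term in text for term in terms):
                return False
        return True

    # Same document shape as before; only the list of objects is narrowed.
    filtered = dict(response)
    filtered["collections"] = [
        c for c in response.get("collections", []) if matches(c)
    ]
    return filtered


example = {
    "links": [],
    "collections": [
        {"id": "a", "title": "Sentinel-2 L2A", "type": "dataset"},
        {"id": "b", "title": "Road network", "type": "dataset"},
    ],
}
print([c["id"] for c in filter_collections(example, q="sentinel")["collections"]])
```

The point of the sketch is that the response keeps its original structure, so a client that already understands /collections needs no new parsing logic.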

Is this your understanding too?

@pvretano
Contributor

pvretano commented Jul 18, 2023

B.T.W. @jerstlouis we discussed this before and I told you I would refactor the dependencies because they are not quite right. Unfortunately, my timetable never seems to align with yours!
Still, I will review your proposal and present it to the SWG.

@jerstlouis
Member Author

jerstlouis commented Jul 18, 2023

@pvretano Yes I believe so.

The issue is the way the requirements classes, and in particular the dependencies (on Records API and Records Collection) are currently defined in the spec (as you had mentioned, and yes I noticed they were not yet corrected).

So this issue is basically a blueprint on how this dependency issue could be addressed, while also simplifying things and perhaps facilitating how to "plug" Records with Common - Part 2, Processes, and a potential Coverages - Scenes requirements class.

I am also hoping that this /collections filterable Local Resource Catalog can extend to collections whose resources for data access mechanisms (e.g., /items, /coverage, the actual processes...) can actually live on another server.
e.g., I could set up an OGC API that includes both our collections and collections from your service at /collections, and processes from both our service and your service at /processes.

P.S. I think both of our timetables are way too busy to try to align them ;) But these are async, not sync, requests ;)

@jerstlouis
Member Author

Still, I will review your proposal and present it to the SWG.

Thanks a lot, I would be happy to hop in whenever that is discussed.

@pvretano
Contributor

I'll let you know when it is on the agenda...

@pvretano pvretano added this to Backlog in Part 1: Core via automation Jul 24, 2023
@pvretano
Contributor

pvretano commented Aug 7, 2023

07-AUG-2023: We discussed a bit in the SWG but no one has had time to review in detail. Homework for everyone is to review for the next SWG meeting and we can discuss then.

@pvretano pvretano moved this from Backlog to Waiting for Input/feedback in Part 1: Core Aug 7, 2023
@pvretano pvretano self-assigned this Aug 21, 2023
@pvretano
Contributor

21-AUG-2023: @pvretano will refactor the conformance classes to put the query parameters together so that there is only a single set of requirements for the query parameters that can then be referenced in the searchable and local resource catalogue conformance classes.

@pvretano
Contributor

18-MAR-2024: Had a long discussion about the current organization of the standard. @jerstlouis wants us to cut it back a bit so we will have a full discussion with @jerstlouis about this in the SWG meeting on 15-APR-2024.

@kalxas
Member

kalxas commented Mar 18, 2024

Sorry for missing the meeting (still on winter time in EU)
Does this mean we are not going to make the motion for OAB?

@jerstlouis
Member Author

jerstlouis commented Mar 18, 2024

@kalxas I don't want this to delay anything. The changes I proposed are mostly organizational, in terms of:

  • needing fewer req. classes,
  • making it easier for client implementations to work regardless of which deployment pattern is used (crawlable catalog, OGC API - Features pattern / called "searchable catalog" right now, arbitrary resource / Local Resource Catalog),
  • making it easier to reference Records for implementing local catalog of other resources (collections, processes, scenes)

I thought @pvretano mentioned on the mailing list the SWG was not yet ready to make a motion for OAB review just yet, but still I think these changes are not significant enough to prevent going to the OAB / RFC and applying them after the comment period is completed.

Based on discussion with Peter today, this is what the reshuffling could potentially look like, assuming the SWG likes the idea:

Requirement Classes defining resources

OGC API - Records "Core" (merge all of these into 1 req. class):

OGC API - Records "Dynamic Catalog"

  • 9.4. Requirements Class "Local Resources Catalog" (Deployment)
  • This would define the .../{recordResources}/{recordId} pattern where individual records are accessed, and the fact that .../{recordResources} returns a JSON response with "{recordResources}": [ ... ] where each element of the array follows the Record Core. This is probably already aligned with /collections, /processes, /items and /scenes.

OGC API - Records "Feature Records" (Records implemented as a profile of OGC API - Features, where individual features are records, merge these into single req. class)

Requirement Classes defining parameters

8.4. Requirements Class "Record Core Query Parameters" (Building Block)

8.6. Requirements Class "Sorting" (Building Block)

8.7. Requirements Class "Filtering" (Building Block)

There would not be a need for req. classes of parameters specific to a particular deployment e.g.:

Actually, there is one issue to resolve regarding "Record Collection", which I think is an issue regardless of the re-organization.

The Local Resource Catalogs do not depend on "Record Collection" right now, but the Crawlable Catalogs do.

It makes a "Local Resource Catalog" not a valid crawlable catalog, which I think is a problem as ideally the base target client should support "any" kind of OGC API - Records catalog.

@pvretano
Contributor

@jerstlouis, with all due respect and I mentioned this in the SWG meeting, this WILL delay the OAB submission. Even simply taking editorial considerations into account, broken references, renaming of files to be more consistent, etc. will require quite a bit of work to get the document back into shape for the OAB.

With regard to the "Local Resources Catalog" and "Record Collection". The basic schema of the local resource (i.e. the collection, the process list, the scene list, etc.) is not in the control of the OARec SWG.

If we make "Record Collection" a dependency for the "Local Resources Catalog" then that means that each of the SWGs that control a catalog-like local resource (e.g. Features, Processes, Coverages, etc.) would need to refactor their standards to derive the schema of their "local resource" from the collection schema in Records. This is why we took the approach that we did ... that is we describe how an existing local resource can be extended to behave like a catalogue in a non-breaking way (i.e. you simply add a bunch of additional properties which in most/all cases is not illegal and you add some query parameters to the endpoint).

As I mentioned in the SWG meeting, it would be very nice from the perspective of consistency across the OGC API standards to have all the "catalogue-like" endpoint schemas be derived from "Record Collection" but I don't think that level of consensus is achievable in the short term and certainly not before we want to release the standard for OAB review.

I can add language to the current Local Resources Catalog requirements class that says that if a "local resource" is extended as described in the requirement class then that "local resource" can be considered a starting node for a crawlable catalogue the way a record collection is considered a starting node for crawling right now.

@jerstlouis
Member Author

jerstlouis commented Mar 18, 2024

@jerstlouis, with all due respect and I mentioned this in the SWG meeting, this WILL delay the OAB submission.

What I meant is that the changes could potentially be done after the OAB / RFC, or if the SWG does not think this is a good idea, and this would cause too much delay, my suggestions can be ignored.

Again, sorry to be the PITA that I am.

The basic schema of the local resource (i.e. the collection, the process list, the scene list, etc.) is not in the control of the OARec SWG

It's not, but it so happens that all of the existing ones right now (.../collections, .../processes, .../scenes, .../items) already match that same high-level pattern:

{
   "links" : [ ... ],
   "{recordResources}" : [
      {
         "id": ...,
         "links": ...,
         ...
      }, ...
   ]
}

So there should be no refactoring needed.

I think it's okay if they explicitly ask this of future req. classes as well that want to conform to the Records Local Resource Catalogs.

Even tileMatrixSets fit the pattern.
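To illustrate the shared high-level pattern, here is a minimal sketch of how a generic client might locate the array of record-like objects in any such response. The function name and the list of known member names are hypothetical ("features" is included on the assumption that an /items response is a GeoJSON FeatureCollection); the fallback simply looks for a non-"links" array whose elements all carry an "id".

```python
def find_record_list(
    document,
    known_keys=("collections", "processes", "scenes", "features", "tileMatrixSets"),
):
    """Return (member_name, records) for a {"links": [...], "<name>": [...]} document."""
    # First try member names that existing list resources are assumed to use.
    for key in known_keys:
        value = document.get(key)
        if isinstance(value, list):
            return key, value
    # Fallback: any other array whose elements all look like records (have an "id").
    for key, value in document.items():
        if key == "links" or not isinstance(value, list) or not value:
            continue
        if all(isinstance(item, dict) and "id" in item for item in value):
            return key, value
    return None, []


name, records = find_record_list(
    {"links": [], "processes": [{"id": "buffer", "links": []}]}
)
print(name, [r["id"] for r in records])
```

A client written this way does not need to know in advance whether it is looking at collections, processes or scenes, which is the practical benefit of the common pattern.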

if a "local resource" is extended as described in the requirement class then that "local resource" can be considered a starting node for a crawlable catalogue the way a record collection is considered a starting node for crawling right now.

That would be the list of local resources, right? e.g., /collections or /processes would be the start of the crawlable catalog ? EDIT: The challenge is the lack of this extra "collection" level in the Local Resource Catalog approach -- the list of local resources is directly the catalog. This differs from the current Crawlable / Searchable Catalog deployments which are based on the OGC API - Features resources organization.

@cportele
Member

each of the SWGs that control a catalog-like local resource (e.g. Features, Processes, Coverages, etc.) would need to refactor their standards to derive the schema of their "local resource" from the collection schema in Records

Since Features is mentioned here: I assume that this is about the Collection resource in Features Core? How could that extend Records? I did not find a collection schema in Records, so I assume this is about the catalog schema. The catalog schema extends the Collection resource with additional properties, some of them required. A Catalog is a specific Collection, not the other way around. Or am I missing something?

@jerstlouis
Member Author

jerstlouis commented Mar 18, 2024

@cportele Currently there are 3 "deployment" modes:

  • Crawlable catalog
  • Records as items of a feature collection
  • Local resource catalog

A Catalog is a specific Collection

is only true for the first 2.

For local resource catalog, the intent is that /collections (for example) is itself the catalog.

I am suggesting that there could be a more rudimentary concept of a list of records implied by an OGC API - Records "Core" req. class, which is not a collection of geospatial data as defined in Common - Part 2, or a collection of geospatial features / items as defined in Features, but is simply a list (collection purely in the CS sense) of records (but of which an OGC API - Common - Part 2 / Features "Collection" is a derived class -- EDIT: but here we're talking about the FeatureCollection response at /items, not the collection description).

Local resource catalogs (e.g., /processes, /collections) could conform to that list of records, without being a full blown OGC API - Common - Part 2 / Features "Collection".

The only caveat is that the Core would need to allow non-GeoJSON catalogs. i.e., application/json content negotiation should not mean application/geo+json for those non-geospatial data lists of records, which might not support application/geo+json. So the current non-GeoJSON responses of /collections and /processes would be valid lists of records (properties directly in the records, not inside a GeoJSON "properties" : { } block).
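The difference between the two layouts can be sketched as follows: a hypothetical helper that gives a client the same flat view of a record whether its properties sit inside a GeoJSON "properties" block or directly on the object. The function name and the sample records are assumptions for illustration only.

```python
def record_properties(record):
    """Return a flat dict of record properties for either encoding.

    GeoJSON-encoded record: properties live under "properties".
    Plain JSON record (e.g. an entry of /collections or /processes):
    properties are direct members of the object.
    """
    if record.get("type") == "Feature" and isinstance(record.get("properties"), dict):
        flat = dict(record["properties"])
        # Carry the top-level feature id along so both views expose "id".
        if "id" in record:
            flat.setdefault("id", record["id"])
        return flat
    return dict(record)


geojson_record = {
    "type": "Feature",
    "id": "r1",
    "geometry": None,
    "properties": {"title": "A"},
}
plain_record = {"id": "r1", "title": "A", "links": []}
print(record_properties(geojson_record)["title"], record_properties(plain_record)["title"])
```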

I'm not sure whether the current JSON requirement class is in line with this...
It requires support for GeoJSON, even for non-geospatial records?
And it mentions application/ogc-catalog+json and application/ogc-collection+json as opposed to just application/json which is what is already used with /collections and /processes. Doesn't STAC also use simply application/json ( https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#media-type-for-stac-collections / https://github.com/radiantearth/stac-spec/blob/master/catalog-spec/catalog-spec.md#media-type-for-stac-catalogs ) ?

I'm wondering if having a very high level schema like { "{recordResources}" : [ { }, ... ] } would be a good reason to just stick to application/json ?

I'm also realizing that the current catalog.yaml schema would be too restrictive to allow existing collections / processes to be valid catalogs per that schema. That's basically the issue that I'm raising: it would be great if there was a common schema (though JSON Schema might not be able to express the "{recordResources}" : [ ] part) between the Local Resource Catalogs and the other catalog deployments.

EDIT: I am probably confused by the two different catalog collection levels of /collections/{collectionId} and /collections/{collectionId}/items that are implied by "Record catalog as an OGC API - Features collection of record features" (I find this very confusing in trying to make the pattern compatible with Local Resource Catalogs). catalog.yaml depends on collection.yaml, so this is really only for Features. In Local Resource Catalog, your .../{recordResources} is directly your catalog (e.g., .../items, .../collections, .../processes), you may want to have catalog-level properties there.

Alternatively, perhaps all of these "Building blocks" requirement classes do not need to be requirement classes and are just building blocks (as per the OGC Building Blocks registry concept), and only the Deployment requirement classes need to be req. classes? And perhaps the "Local Resource Catalogs" similarly are not really requirement classes of records, but requirement classes of whichever standard where the associated resources are defined? (Common - Part 2, Processes, Coverage Scenes...).

@pvretano
Contributor

pvretano commented Mar 18, 2024

@cportele the idea, right now, is that the catalog standard defines how to EXTEND existing OGC resources (like /collections or /processes) with additional properties and query parameters so that they behave like mini-searchable catalogues.

So, the Records standard DOES not use the "Record Collection" or catalog (the terms are used interchangeably in the standard) building block for the Searchable Catalog and Local Resources Catalog deployments ... since these deployments use already-defined "collection objects" that simply need to be extended in order to be usable as mini searchable catalogues. @jerstlouis is correct about that.

NOTE: I use "collection object" here in the general sense to mean any OGC object that is used to describe a collection of stuff the way /processes defines a collection of processes and /collections/{collectionId} defines a collection of features or records.

Since the Crawlable Catalog deployment does not have a pre-existing collection object that the Records standard could extend, we simply took/stole the Features collection object, extended it, and used that to describe a static collection of records . This object is called a "Record Collection" or "Catalog" in the standard and this is where catalog.yaml comes into play.

So you are correct when you say that a "catalog" (as defined by catalog.yaml) is a specific collection and not the other way around.

However, if we were to reorganize the catalog specification "the other way around" as @jerstlouis is suggesting then I believe that would mean that all the OGC standards that define collection objects of one sort or another -- that want those collection objects to be searchable as mini catalogs -- would need to derive the schema of those collection objects from a "Record Collection" object defined in the Records standard. The schema of that "Record Collection" object in JSON would be as defined in catalog.yaml but catalog.yaml would absorb collection.yaml rather than include it from Features. Features and the other OGC standards would then define their own collection objects (e.g. /collections/{collectionId} or /processes) by including catalog.yaml and then modifying/adding members as required to satisfy their requirements.

While this is probably a good thing (at least from the perspective of consistency across the OGC standards), in records we decided to go the other way and we let each standard define its collection object(s) in any way it wants and the Records standard simply describes how to extend those objects with additional properties [and the endpoints that serve them with additional parameters] to enable catalog-like searching. We figured that would be the path of least resistance to adoption so we went with that.

I am not opposed to doing things the other way around, as @jerstlouis is suggesting, but if that is the case then I don't think we can submit to the OAB with such a big outstanding change waiting to be applied during the public review period. The OAB would be reviewing something that looks nothing like what the standard would eventually look like.

@jerstlouis the current JSON requirements class requires GeoJSON for the Record building block only; not the Record Collection building block. There was a cut-n-paste error that I have fixed in the outstanding "carl-edits" PR.

@jerstlouis the current standard also defines a set of media types for JSON and HTML encoded records, catalogs and other collections. If a resource is encoded as HTML then its media type is always text/html. If a record is encoded as JSON then its media type is application/geo+json; application=ogc-record. If a catalog (i.e. a "Record Collection" object) is encoded as JSON then its media type would be application/ogc-catalog+json.

@fmigneault

I like @jerstlouis's proposal of "always required queries" vs "optional unless x field is used".

If the "one or more" approach is taken instead, I believe the prop=value case should be omitted from that minimum set, otherwise it just means implementing anything would be considered valid.

I think defining a non-OpenAPI way to list supported queries could be acceptable only if it limits itself to a relatively simple listing of IDs (and couldn't the /conformance endpoint already indicate that?). Otherwise, we basically are reinventing what OpenAPI already does by listing which paths support which queries using which types.

@jerstlouis
Member Author

@fmigneault

couldn't the /conformance endpoint already indicate that?). Otherwise, we basically are reinventing what OpenAPI already does by listing which paths support which queries using which types.

simple listing of IDs

a simple listing of properties would be the idea, yes

/conformance would indicate whether you support query parameters, but not specifically which query parameters are supported... I'm also wondering how (if at all) .../queryables (that can be used in CQL2 filter) relates to these Records queryables

@fmigneault

Couldn't there be respective /conf/queryables/q-param, /conf/queryables/limit-param, etc. in the /conformance list? That would solve the non-OpenAPI definition need and reuse the existing endpoint rather than adding yet another place to look for "does server X support Y".

I don't think all of them apply to /queryables (if looking at it very technically).
The type, datetime and so on are inside the documents, therefore they are valid /queryables, but the q, ids, limit are not themselves embedded in the documents to filter. Mixing them under /queryables would make comprehension of "available fields to query" useless.
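As a rough sketch of the per-parameter conformance idea floated above: a client could read the "conformsTo" list from /conformance and derive which query parameters are supported. The `/conf/queryables/<name>-param` URI pattern is only the shape suggested in this comment, not something defined by the draft standard, and the function name and example URIs are hypothetical.

```python
def supported_params(conforms_to, marker="/conf/queryables/"):
    """Extract parameter names from conformance URIs ending in '<name>-param'.

    Assumption: URIs follow the hypothetical '.../conf/queryables/q-param'
    pattern discussed in this thread.
    """
    params = set()
    for uri in conforms_to:
        _, found, tail = uri.partition(marker)
        if found and tail.endswith("-param"):
            params.add(tail[: -len("-param")])
    return params


conformance = [
    "http://example.org/spec/conf/core",
    "http://example.org/spec/conf/queryables/q-param",
    "http://example.org/spec/conf/queryables/limit-param",
]
print(sorted(supported_params(conformance)))
```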

@jerstlouis
Member Author

@fmigneault q, limit and ids should always be supported anyways from a Records end-point, because they are always applicable. So those are not really the issue.

It is these other parameters corresponding to the applicable queryables to filter on, which are inside the records. But I don't know whether this extends to what .../queryables is about, given that this was originally intended specifically for filtering with CQL2.

@pvretano regarding [ogc-rel:ogc-catalog-local], and going back to the root of this issue about a common core, couldn't this possibly simply be [ogc-rel:catalog], with the same link relation used, regardless of deployment type? This way a client could always follow that link the same way to an end-point with a catalog, supporting the relevant query parameters based on the conformance declaration.

@pvretano
Contributor

@jerstlouis @fmigneault these are the kinds of issues we get into when you try to parse things too finely.

For me, the simplest approach is that if you say you implement the Query Parameters requirement class then you implement ALL of them. If your information model does not have spatial or temporal properties then the BBOX and DATETIME query parameters are no-ops. Features, for example, does not give you an option of whether you implement bbox or not even though you can have features with no spatial or temporal information at all (i.e. geometry: null). Same for datetime.

All this pretzel bending of listing individual conformance classes for each query parameter OR having to look into the OpenAPI document OR somehow hijacking the /queryables endpoint OR supporting a subset and then conditionally supporting the others based on whether the information model has spatial or temporal properties is just too complicated. So, I think we should just stick with the simple approach. If you implement the Query Parameters conformance class you must implement ALL the query parameters even if in your particular case BBOX and DATETIME are ignored since you don't happen to have spatial or temporal information in your information model.

@jerstlouis strictly speaking you only need [ogc-rel:catalog]. I broke it out into three to help clients immediately know what kind of catalog they were heading to without having to inspect anything.

Assuming that we only have [ogc-rel:catalog] then a catalog client would, upon landing on a catalog page proceed as follows:

  1. See if the page has a "records" member. If it does then the client can read records encoded directly in the catalog document. There might also be "rel=next" links in the links section of the catalog that point to a next page of records.
  2. See if the links section of the catalog contains "rel=item" links. If it does then resolve each link to get each record of the catalog.
  3. See if the links section of the catalog contains a "rel=items" link. If it does, then you have a queryable endpoint and you can use query parameters to fetch subsets of records.

You could, conceivably, have all three (records in-line, referenced records and records retrievable from a searchable endpoint) mixed together but I have never seen that and I'm not sure who would have such a requirement.
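The three checks above can be sketched as a small client-side routine. The function name and the mode labels are hypothetical; the sketch returns a list because, as noted, the three access styles could in principle be mixed. Treating a "rel=items" link as a queryable endpoint follows the assumption stated in the steps, not a guarantee.

```python
def discover_access(catalog):
    """Return the ways records can be obtained from a catalog document."""
    modes = []
    # 1. Records encoded directly in the catalog document.
    if isinstance(catalog.get("records"), list):
        modes.append("inline")
    rels = [link.get("rel") for link in catalog.get("links", [])]
    # 2. One link per record, each to be resolved individually.
    if "item" in rels:
        modes.append("linked")
    # 3. A single items endpoint, assumed here to accept query parameters.
    if "items" in rels:
        modes.append("items-endpoint")
    return modes


page = {
    "id": "catalog",
    "links": [
        {"rel": "self", "href": "https://example.org/collections/catalog"},
        {"rel": "items", "href": "https://example.org/collections/catalog/items"},
    ],
}
print(discover_access(page))
```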

@jerstlouis
Member Author

Assuming that we only have [ogc-rel:catalog] then a catalog client would, upon landing on a catalog page proceed as follows:

I really like that approach :) This way, one could write a Records client that can discover things with a clear logic, without having to care about 3 different deployment types.

If it does, then you have a queryable endpoint

Whether the endpoint is queryable is a different story... you should know that from one of the queryable conformance classes instead. Because the Local Resource Catalog can also be queryable.

@pvretano
Contributor

Whether the endpoint is queryable is a different story... you should know that from one of the queryable conformance classes instead. Because the Local Resource Catalog can also be queryable.

The "rel=items" comes from Features and there, the query parameters are part of core and not optional. So, I believe, that if you see "rel=items", it's a queryable endpoint.

@jerstlouis
Member Author

So, I believe, that if you see "rel=items", it's a queryable endpoint.

Right, but Searchable Collections or Processes could also be queryable.

You could also set up a catalog not following the Features API, but implementing Query parameters / Filtering / Sorting.
Right now with the latest draft this would be classified as a "Local Resource Catalog" deployment type, but I am hoping nothing prevents such a deployment from cataloguing external resources? e.g., having a CSW-type catalog at a /catalog end-point instead of at /collections/{catalogId}/items.

@pvretano
Contributor

@jerstlouis all searchable and crawlable catalogs can catalog external resources. Heck, that is the primary use case! The Element84 site that you sent me does exactly that. The Element84 server catalogs all the Sentinel, Landsat, NAIP, Copernicus DEM and other data products. They did it in a bunch of homogeneous catalogs accessible here:

https://earth-search.aws.element84.com/v1/collections.

and you can, for example, access the NAIP catalog here:

https://earth-search.aws.element84.com/v1/collections/naip

and the records of the NAIP catalog here:

https://earth-search.aws.element84.com/v1/collections/naip/items

I would have created a single heterogeneous catalog containing all the metadata and you could use the type query parameter to zero in on a specified resource type but that is neither here nor there.

To be honest with you I don't think that /processes or /collections, etc. should be queryable at all. What's the point of making them queryable? It's to find processes or collections that satisfy certain query predicates ... right?

Well, what if, instead I set up a searchable catalog with the id "catalog". I would then crawl the resource tree of the local OGC deployment and harvest ALL the metadata. ALL the process summaries, ALL the collections, ALL the scenes, ALL the styles ... everything! In the end my "catalog" catalog would have a record describing every single resource I could retrieve from my local OGC API deployment. I could then do stuff like:

GET /collections/catalog/items?bbox=1,2,3,4&q=term1,term2,term3&type=collection&itemType=feature

and I would get back the subset of records for feature collections that satisfy the bbox and q predicates. I would say this is an easier way to locate local resources since you only need to hit a single endpoint.
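A minimal sketch of how a client might assemble such a request URL, using only the Python standard library. The base URL and function name are placeholders; the parameter names (bbox, q, type, itemType) are the ones used in the request above.

```python
from urllib.parse import urlencode


def catalog_search_url(base_url, catalog_id="catalog", bbox=None, q=None, **extra):
    """Build a .../collections/{catalog_id}/items URL with query parameters."""
    params = {}
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    if q:
        params["q"] = ",".join(q)
    # Any further parameters (e.g. type, itemType) are passed through as-is.
    params.update(extra)
    # safe="," keeps the commas in bbox and q readable instead of %2C.
    query = urlencode(params, safe=",")
    return f"{base_url.rstrip('/')}/collections/{catalog_id}/items?{query}"


url = catalog_search_url(
    "https://example.org/api",  # placeholder deployment root
    bbox=(1, 2, 3, 4),
    q=["term1", "term2", "term3"],
    type="collection",
    itemType="feature",
)
print(url)
```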

But why stop at "local" resource? I could also add external resources to my "catalog" catalog too ... very easily ... and have a single catalog in my deployment that registers all the local resources and all the external resources that I am interested in. Or, I could do what Element84 did and create two catalogs with ids "LocalCatalog" and "ExternalCatalog" and harvest each class of resources (local and external) into the appropriate catalog! All easily supported.

As for making a queryable /catalog endpoint ... why? Features already provides a fully fleshed out API . Why re-invent the wheel?

@jerstlouis
Member Author

jerstlouis commented May 12, 2024

As for making a queryable /catalog endpoint ... why? Features already provides a fully fleshed out API. Why re-invent the wheel?

Because I have passionately disliked the whole idea of the Features API deployment type of catalogs since the very beginning, as it mixes up data & metadata. I mentioned this at the very first OGC API hackathon in London in June 2019 and I have not changed my mind, and I am unlikely ever to change my mind about this. For me, not being forced to have /collections/{catalogId}/items is the salvation of the Records API ;)

Well, what if, instead I set up a searchable catalog with the id "catalog".

But why stop at "local" resource? I could also add external resources to my "catalog" catalog too ...

And this is exactly what I am hoping that /collections and /processes can be.
They could be BOTH catalogs (of local and/or external resources) AND live collections (with one or more data access mechanisms) / processes (with an execution end-point) at the same time. The data access mechanisms / execution end-points could be proxies or direct links to external OGC API end-points.

The great things about this are that:

  • By virtue of being "live", clients will hit them up and will run into a 404 if the external resource they catalog goes down or moves. I believe this will result in more up-to-date catalogs / fewer broken links.
  • You can point a client like GDAL or QGIS directly to a catalog and start previewing and adding data, without any special "catalog" step.
  • You can set up an OGC API that catalogs processes and collections from other reputable OGC API end-points and presents them as one API that could act like a single federated OGC API end-point.
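As a rough illustration of the last point, here is a sketch of merging collection descriptions from several end-points into one /collections-style response whose links point straight at the remote access mechanism. The function name `federate`, the example URLs and the choice to expose only an "items" link are assumptions, not something the draft prescribes.

```python
def federate(sources):
    """Merge remote collection descriptions into one listing (hypothetical helper).

    sources maps a remote OGC API base URL to a list of collection
    descriptions, each a dict with at least an "id".
    """
    merged = []
    for base_url, collections in sources.items():
        for c in collections:
            merged.append({
                "id": c["id"],
                "title": c.get("title", c["id"]),
                # Direct link to the remote access end-point; a proxy
                # deployment would rewrite this to a local URL instead.
                "links": [{
                    "rel": "items",
                    "href": f"{base_url}/collections/{c['id']}/items",
                }],
            })
    return {"collections": merged}


listing = federate({
    "https://a.example/ogcapi": [{"id": "naip", "title": "NAIP"}],
    "https://b.example/ogcapi": [{"id": "dem"}],
})
print([c["id"] for c in listing["collections"]])
```

A real deployment would also need a policy for id collisions between end-points, which this sketch ignores.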

harvest ALL the metadata. ALL the process summaries, ALL the collections, ALL the scenes, ALL the styles ...

While you could use the type parameter to search for one of these types of resource, typically you're looking to discover one of these things, so having distinct catalogs for each of these things (which could also happen to be the live resource at the same time) makes things easier in my opinion. And the scenes for example would be for a particular collection like sentinel2-l2a, so it's also nice to have a dedicated scenes catalog end-point within that collection.

In the end, opinions differ and people are likely to implement and deploy things differently.
The idea of a "common core" (the spirit of it, regardless of how req. classes are organized) allows the client to find the catalogs and use queryables/filters/sort the same way if they are supported, regardless of deployment type.

@jerstlouis
Member Author

@pvretano

all searchable and crawlable catalogs can catalog external resources.

I was saying:

classified as a "Local Resource Catalog" deployment type, but I am hoping nothing prevents such a deployment from cataloguing external resources?

Since "Searchable catalog" in the current req. classes is the Features API deployment type, I'm still not clear whether the current draft and your comment (that did not mention Local Resource Catalog) preclude the Local Resource Catalog deployment type from cataloguing external resources.

That's because the only deployment types supporting query parameters at the moment are "Searchable Catalog" (Features API) and "Local Resource Catalog".

@pvretano
Contributor

@jerstlouis your definition of local and external is too loose.

Say I have two processing servers P1 and P2. P2 has a process named A on it and P1 "cascades" or behaves as a proxy for that A process. That means that A appears on P1 as a process but when I execute A on P1, P1 actually delegates execution of A to P2.

From the client perspective, if I do a GET /processes on P1, process A will appear in the list of processes. And if I do a GET /processes/A on P1, I will get a description of A. To the client, it is totally opaque that A is actually a process on P2 and that P2 is where it actually gets executed. Of course, there might be some metadata in the process description that tells you that it is actually a remote process, but without that information the client would never know.

Now, if I enhance /processes on P1 to be a local resources catalog, I can query the /processes endpoint and A will appear in the results (assuming it satisfies whatever predicates I use). As far as the catalog is concerned, A is a process on P1.

If the process descriptions on P1 include some property in the process summary/description like isRemoteProcess, then I could query /processes on P1 with something like GET /processes?isRemoteProcess=true and see all the processes that are cascaded or proxied on P1.

I think this is what you are looking for and if it is, this is opaque to the catalog and the local resources catalog would work just fine in this situation.
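The cascading scenario can be sketched with a small in-memory model. Everything here is illustrative: the class `ProcessServer`, its methods and the extra process "B" are assumptions, and isRemoteProcess is only the example property named above, not a property defined by any specification.

```python
class ProcessServer:
    """Toy stand-in for a processing server (hypothetical, in-memory only)."""

    def __init__(self, name, local=None, cascaded=None):
        self.name = name
        # Process summaries hosted on this server, keyed by process id.
        self.local = dict(local or {})
        # Process id -> remote ProcessServer that really runs the process.
        self.cascaded = dict(cascaded or {})

    def list_processes(self, is_remote=None):
        """Mimic GET /processes, optionally filtered like ?isRemoteProcess=..."""
        summaries = [dict(p, isRemoteProcess=False) for p in self.local.values()]
        for pid, remote in self.cascaded.items():
            summaries.append(dict(remote.local[pid], isRemoteProcess=True))
        if is_remote is not None:
            summaries = [s for s in summaries if s["isRemoteProcess"] == is_remote]
        return summaries

    def execute(self, pid, inputs):
        """Run a local process, or delegate a cascaded one to its real host."""
        if pid in self.cascaded:
            return self.cascaded[pid].execute(pid, inputs)
        if pid not in self.local:
            raise KeyError(pid)
        return {"process": pid, "executedOn": self.name, "inputs": inputs}


p2 = ProcessServer("P2", local={"A": {"id": "A"}})
p1 = ProcessServer("P1", local={"B": {"id": "B"}}, cascaded={"A": p2})

print([s["id"] for s in p1.list_processes()])
print([s["id"] for s in p1.list_processes(is_remote=True)])
print(p1.execute("A", {"x": 1})["executedOn"])
```

From the client's side, A is listed and executed through P1 exactly like B; only the optional isRemoteProcess flag reveals the delegation.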

@jerstlouis
Member Author

jerstlouis commented May 12, 2024

@pvretano

To the client, it is totally opaque that A is actually a process on P2 and that P2 is where it actually gets executed.

Yes, that's what I'm hoping one can do for both /collections and /processes.

But I was also considering the case where you just have a more traditional CSW-like catalog, and like me, you are philosophically against the idea of the Features API deployment type.

This CSW-like catalog would have those core externalId queryables and would just sit at /catalog in your OGC API (with individual records either linked with rel: item or directly embedded in the response). This /catalog might be the only thing in the API in addition to the landing page / API definition / conformance, as it's purely a catalog service.

You can't currently call this a "Searchable Catalog", because per the current req. classes that implies the Features API.
You can't call this a "Crawlable Catalog", because those do not have query parameters.
The "Local Resource Catalog" is the only deployment type that fits this deployment, but all the resources in this catalog happen to be truly external.

This was one of the things that would have been fixed with the re-organization. I hope this is still a valid deployment type, even if this makes "Local Resource Catalog" a misnomer.

@kalxas
Member

kalxas commented May 12, 2024

@jerstlouis CSW was also based on WFS in a similar way that Records is now based on Features.

@kalxas
Member

kalxas commented May 12, 2024

you can just implement your catalog in /collections/catalog and you have the use case you describe above

@jerstlouis
Member Author

@kalxas

@jerstlouis CSW was also based on WFS in a similar way that Records is now based on Features.

I did not know that! Thanks for pointing that out. I'm trying to find in https://docs.ogc.org/is/12-176r7/12-176r7.html where that would be. I see a distinct GetRecords operation, but no direct reference to WFS (other than it can catalog WFS services). So I'm a bit confused?

you can just implement your catalog in /collections/catalog and you have the use case you describe above

That's exactly what I don't want to do, because in my view (and as per the resolution of opengeospatial/ogcapi-common#140) /collections is strictly for collections of geospatial data, which such a catalog is not.

@kalxas
Member

kalxas commented May 12, 2024

@jerstlouis CSW was also based on WFS in a similar way that Records is now based on Features.

I did not know that! Thanks for pointing that out. I'm trying to find in https://docs.ogc.org/is/12-176r7/12-176r7.html where that would be. I see a distinct GetRecords operation, but no direct reference to WFS (other than it can catalog WFS services). So I'm a bit confused?

There are no direct references in the final CSW document, but during the development stages of CSW, WFS was the base.
Believe me, the SWG is very committed to the way CSW worked, we are just adapting to the OGC API way now.

you can just implement your catalog in /collections/catalog and you have the use case you describe above

That's exactly what I don't want to do, because in my view (and as per the resolution of opengeospatial/ogcapi-common#140) /collections is strictly for collections of geospatial data, which such a catalog is not.

Records can also be considered as resources (in CSWRecord we had embedded GML for the spatial part, nowadays we have GeoJSON).

I understand your view but I believe we should remain in the CSW spirit that was always based on WFS.
It is too late to change the API part.

@jerstlouis
Member Author

There are no direct references in the final CSW document, but during the development stages of CSW, WFS was the base.

But this is the important difference between "depending on another spec" (as in Records Features API) vs. re-using common building blocks (as in CSW and WFS).

It is too late to change the API part.

I am not suggesting to get rid of the Features API deployment type, or change the API at all.

I am asking for clarification on whether the "Local Resource Catalog" req. class excludes cataloguing external resources. I really hope it does not, because then my use case / preferred deployment type would already be supported through the Local Resource Catalog provisions.

@kalxas
Member

kalxas commented May 12, 2024

One more thing about collections:

In CSW we had:

  • ParentIdentifier property to declare hierarchy between CSWRecords and
  • DatasetSeries record type to declare a collection.
    Those had to be combined by clients in order to make an actual hierarchy.
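
To make the client-side work concrete, here is a sketch of rebuilding a hierarchy from a flat result set. The record layout (plain dicts with "identifier", "type" and "parentIdentifier" keys) and the sample identifiers are assumptions for illustration; real CSW responses were XML and would need parsing first.

```python
from collections import defaultdict

# Flat list of records, as a client might see them after parsing a response.
records = [
    {"identifier": "s2-l2a", "type": "DatasetSeries", "parentIdentifier": None},
    {"identifier": "scene-001", "type": "Dataset", "parentIdentifier": "s2-l2a"},
    {"identifier": "scene-002", "type": "Dataset", "parentIdentifier": "s2-l2a"},
    {"identifier": "standalone", "type": "Dataset", "parentIdentifier": None},
]


def build_hierarchy(records):
    """Group child record ids under each DatasetSeries record."""
    children = defaultdict(list)
    for rec in records:
        parent = rec.get("parentIdentifier")
        if parent is not None:
            children[parent].append(rec["identifier"])
    # Only DatasetSeries records act as collections in this sketch.
    return {
        rec["identifier"]: children.get(rec["identifier"], [])
        for rec in records
        if rec["type"] == "DatasetSeries"
    }


hierarchy = build_hierarchy(records)
print(hierarchy)
```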

A few years before OGC API Records, the EO community was heavily working with OpenSearch because the collection handling was easier than CSW.

Now we have a nice collection mechanism in Common/Features/Records and we argue against using it. We are shooting ourselves in the foot here.

I would argue in the opposite direction: There is no mechanism in Features to define multiple hierarchies (collection inside a collection) and I foresee that to be a problem in the future...

@jerstlouis
Member Author

@kalxas About collections inside a collection, see opengeospatial/ogcapi-common#11 and opengeospatial/ogcapi-common#298 .

I should be finally drafting the req. class this week.

@kalxas
Member

kalxas commented May 12, 2024

There are no direct references in the final CSW document, but during the development stages of CSW, WFS was the base.

But this is the important difference between "depending on another spec" (as in Records Features API) vs. re-using common building blocks (as in CSW and WFS).

It is too late to change the API part.

I am not suggesting to get rid of the Features API deployment type, or change the API at all.

I am asking for clarification on whether the "Local Resource Catalog" req. class excludes cataloguing external resources. I really hope it does not, because then my use case / preferred deployment type would already be supported through the Local Resource Catalog provisions.

Let's not waste any more time then.
Since we agree on the API part, perhaps we should consider splitting the specification in the RFC stage and make the "Local Resource Catalog" Part 2.

@kalxas
Member

kalxas commented May 12, 2024

@kalxas About collections inside a collection, see opengeospatial/ogcapi-common#11 and opengeospatial/ogcapi-common#298 .

I should be finally drafting the req. class this week.

Thank you for the links!

@jerstlouis
Member Author

Let's not waste any more time then.

I feel that we are making some important clarifications on things that are not obvious to external implementers / readers, not wasting time.

Since we agree on the API part, perhaps we should consider splitting the specification in the RFC stage and make the "Local Resource Catalog" Part 2.

That would make the use cases for which I care about Records (/collections, /processes and /catalog) second-class use cases, and lose the opportunity that a client could access these, the crawlable catalog and the Features API approach the exact same way (which I think @pvretano already mostly addressed with #358). So I am really not in favor of moving things out to a Part 2 and lose all that.

I don't want these discussions / feedback to delay anything, so please do not delay the OAB review / RFC .

@kalxas
Member

kalxas commented May 12, 2024

Let's not waste any more time then.

I feel that we are making some important clarifications on things that are not obvious to external implementers / readers, not wasting time.

Sorry, I did not mean to say that the discussion is not productive.
On the contrary, I believe all these details/issues/clarifications should have been handled with Common 1.0 before Features 1.0 was final :)

We (SWG) are getting a lot of discussion about the fact that Records is not yet finalized, because a lot of organizations want to adopt (or even have already adopted) Records.

Since we agree on the API part, perhaps we should consider splitting the specification in the RFC stage and make the "Local Resource Catalog" Part 2.

That would make the use cases for which I care about Records (/collections, /processes and /catalog) second-class use cases, and lose the opportunity that a client could access these, the crawlable catalog and the Features API approach the exact same way (which I think @pvretano already mostly addressed with #358). So I am really not in favor of moving things out to a Part 2 and lose all that.

I don't want these discussions / feedback to delay anything, so please do not delay the OAB review / RFC .

Thank you

@tomkralidis
Contributor

Hi all: some comments:

  • if one analyzes CSW, we see the parallels to WFS (GetFeature/GetRecords, maxFeatures/maxRecords, FES, Transactions, etc.). CSW was (basically) a WFS with a more fully defined information model
  • OGC API - Records utilizes OGC API - Features in a similar manner (Core, Part 4, etc.), with a basic record model available in multiple representations
  • CSW had a deployment pattern of one CSW against a single repository of user defined hierarchy
  • OGC API - Records takes on the /collections concept of OGC API design
  • In the "metadata record is a collection item" use case, it is understood that this pattern is addressing the searchable catalogue use case/workflow. This fits records that may or may not have deeper granules

Over the past 4 years (or more), we have worked to support 3 clauses based on the use cases/workflows:

  1. Searchable catalogue
  2. Crawlable catalogue
  3. Local resources catalogue

In addition, we have worked to achieve as much harmonization with STAC as possible.

During this time, we have seen significant high profile uptake of OGC API - Records:

  • UN/WMO WIS2: 193 countries voted and approved to use OGC API - Records (Searchable catalogue API + Record core + GeoJSON) as the baseline for mission critical weather/climate/water data exchange
  • GeoConnections/Natural Resources Canada: will be deploying a geo.ca update with OGC API - Records as the API engine for discovery
  • EU/INSPIRE: is working on an OGC API - Records Good Practice
  • ESA/EOEPCA: is using OGC API - Records as the means to search for any resource as well as data granules

I see the discussion in this issue hits some very core underpinnings of metadata in OGC API. Given that one person's metadata is another person's data, and that data granularity is in reality defined by a given SDI/activity, we need to ensure Local Resources discovery is generic enough so as not to block adoption for extension/specialization by downstream API standards.

Having said this, I think OGC API - Records could benefit from having the following parts:

  • Part 1: Searchable catalogue
  • Part 2: Crawlable catalogue
  • Part 3: Local resources catalogue

IMO, this could satisfy the critical path use case of superseding CSW and allow for fulsome discussion of Parts 2/3 as needed over time. We can visit the feasibility of this during the RFC period.

@kalxas
Member

kalxas commented May 13, 2024

13-MAY-2024: @pvretano will add some informative text in the Local Resources Catalogue part to describe how the catalog embedded records object can be used to include records in endpoints such as /processes.
The SWG meeting decided that we have consensus to move forward and close this issue after the edits.
The SWG meeting also discussed the possibility of splitting the specification into multiple parts, and we decided to keep the specification as one (unless we face issues at the RFC stage).
