Relation between collection-level metadata and STAC #4
Comments
There was a good bit of discussion in the Slack: @calebrob6 wrote:
I wrote:
@andyjenkinson wrote:
@cholmes wrote:
I wrote:
|
An OGC Features API call response is a FeatureCollection anyway, is it not? Plus, the API contract makes single features always part of a hierarchical resource that contains the ID of the collection: If you only had a single Feature object in a separate file it would similarly be decoupled from any STAC collection anyway (this was my comment about backlinks from the feature to the collection). It seems to me that the description of a collection (if indeed there even needs to be one as a mandatory ingredient?) can be expressed in multiple forms: as an OGC collection, as a FeatureCollection object or a STAC Collection. Of these, the FeatureCollection is the least deviant from the minimum dependency (i.e. just GeoJSON). |
/collections/.../items is a FeatureCollection, /collections/.../items/itemId is a Feature.
Yes, until you download/extract/... individual Features. Then your relation to the Collection is gone.
Indeed, we don't necessarily need the Collection to be a STAC Collection. STAC adds a bit of overhead, but also gives us the ecosystem and extension support. Otherwise we need to do that again in fiboa. I'm relatively neutral on which path to go.
I think yes, we need a place to specify the fiboa version and extensions. Also it's a good place to expose "global" data such as license, provider, etc.
What about GeoParquet? |
Does the question about GeoParquet not also apply to a STAC collection? Whether the metadata appears in a STAC collection or a GeoJSON feature collection would have no effect on GeoParquet, would it? Both of them would need to be mapped. Regarding a collection property on a feature, is it an array? Because as I mentioned, a feature that is not presented in the context of a collection (i.e. a FeatureCollection, or discovered via a parent collection of some other kind such as in an API) can be part of multiple collections. |
So the collection metadata would be living inside the GeoParquet file in the metadata. For a FeatureCollection, it would be a bit weird. Either it would be an empty FeatureCollection (+ metadata) or it would be a duplication of all the Features (+ metadata), although we just need the metadata. The container format is just weird in the GeoParquet context. So another container for the metadata might be better. I started with STAC to have something we don't need to define ourselves at the beginning. The STAC Collection (or whatever else) could be embedded into the FeatureCollection, too.
{
"type": "FeatureCollection",
"features": [...],
"collection": {
"fiboa_version": "0.1.0",
...
}
}
The Collection Object is also embedded in GeoParquet, but for GeoJSON Features it probably always lives externally and we should probably explain to implementers how to connect them...
No, it's a string (the collection ID). Multiple collections would lead to a potential conflict in the metadata, e.g. differences in versions or extensions. I'd like to avoid that and only allow a single Collection as the responsible parent (although the feature could be part of multiple collections). But as always, this can all be discussed and changed, of course. |
Ok here's what I'm trying to say: Having a JSON collection object only makes sense for JSON features in the first place; GeoParquet has an entirely different structure. So its relevance here is about how to convert from one to the other - you need to get the Collection metadata from somewhere to embed into the Parquet header? The GeoParquet issue you described for FeatureCollection (i.e. needing to have an empty collection) is the same as the one I raised for STAC (an empty STAC collection). The only difference is that the actual features are already in GeoJSON format, so typically it's not an empty FeatureCollection at all, it's the FeatureCollection containing the actual FIBOA features. It's weird to not take this collection-level metadata from the same file as the features themselves, using the object in the GeoJSON spec that's designed for this very purpose, and then force you to provide another file following a different spec that is designed for containing a different type of object (an asset, not a vector). The reason this is particularly important is that a pure GeoJSON implementation is much neater, and makes standard GeoJSON files and the OGC Features API specification natively compatible with FIBOA - you simply ensure the required properties are included in your existing data and now you have a FIBOA implementation. You also don't need to add a back-reference collection ID inside each one of millions of Feature objects, because it's already specified in the parent FeatureCollection inside the same file/API resource. This potential for adapting existing data is a great opportunity to make FIBOA very easy to implement, one that didn't really exist with STAC. You can make your existing API endpoints and data distributions FIBOA compliant without standing up new separate endpoints.
As soon as you force a STAC collection to exist you break that, and I see no good reason for it - just copy the syntax of the subset of STAC collection metadata you need into FIBOA and you're done. Conversion to GeoParquet is very easy - it's one input file (or if you like, a merge of any number of files whose collection IDs are the same). Now, the only instances where this "FeatureCollection in the same file" solution for providing collection-level metadata doesn't work are those where a GeoJSON Feature is only available serialised into a file with a single feature, i.e. out of context of any FeatureCollection. I'm not aware of any distributions that do that but maybe it's a valid use case. Your options here are therefore one of:
Basically, I don't see any reason to force creating a second JSON file when the data is already in a FeatureCollection, but if you really don't have any FeatureCollection even though all the features are GeoJSON then you need to solve two problems - a second file to hold it, and a link inside every single feature to reference it (or some other convention about how to autonomously find it like a specific filename at the root of a directory where the feature json files are...). For this extra file in fact any JSON file would do, it doesn't have to be STAC or FeatureCollection but you could design it so it's allowed to be if the author wants (ie a STAC collection can serve also as a FIBOA collection, and so can a FeatureCollection). All of the properties are FIBOA-specified anyway, it's just right now you're doing it 'by proxy'. You want it to have an ID, version etc so just say so directly rather than saying "I require a STAC collection, because that spec requires it to have an ID". |
Also to be clear, the FeatureCollection IS the collection, it does not need to contain one like in your example. The collection is an object and it has a member called "features", which holds the actual boundary objects. This payload is typically used to represent API resources that are themselves collections (like /collections/foo/items, which gives BOTH the collection AND the items in its response). Just make fiboa_version a property of the collection and you're done. |
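A minimal sketch of what "make fiboa_version a property of the collection" could look like for a consumer, assuming the field sits at the root of the GeoJSON FeatureCollection as suggested above. The function name `fiboa_version_of` and the sample document are hypothetical and not part of any fiboa tooling.

```python
import json


def fiboa_version_of(feature_collection: dict) -> str:
    """Read fiboa_version from the root of a GeoJSON FeatureCollection.

    Hypothetical helper: it assumes the collection-level metadata lives
    directly at the root of the object, next to "features".
    """
    if feature_collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    if "fiboa_version" not in feature_collection:
        raise ValueError("fiboa_version missing at the collection root")
    return feature_collection["fiboa_version"]


# Illustrative document only; a real file would carry actual features.
doc = json.loads(
    '{"type": "FeatureCollection", "fiboa_version": "0.1.0", "features": []}'
)
print(fiboa_version_of(doc))
```

No back-reference inside the individual features is needed in this layout, because the metadata travels in the same object as the features.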
It feels like we are misunderstanding each other. Potentially better to discuss this in the fiboa call? Anyway, I'll try to clarify below. Generally, I'm happy to have global/collection-level metadata in the FeatureCollection.
It felt better to have the collection properties clearly separated; I believe it also makes conversion between formats easier. But there's not a big difference. So any of the following work for me, no strong preference from my side:
(1) fiboa Collection combined with a JSON FeatureCollection:
{
"type": "FeatureCollection",
"features": [...],
"fiboa_version": "0.1.0",
"fiboa_extensions": "0.1.0",
"license": "CC-0"
}
and/or (2) fiboa Collection inside GeoParquet: fiboa as JSON FeatureCollection (i.e. remove GeoJSON properties):
{
"fiboa_version": "0.1.0",
"fiboa_extensions": "0.1.0",
"license": "CC-0",
...
}
and/or (3) STAC Collection integrated into a JSON FeatureCollection:
{
"type": "FeatureCollection",
"features": [...],
"collection": {
"stac_version": "1.0.0",
"type": "Collection",
"fiboa_version": "0.1.0",
"fiboa_extensions": "0.1.0",
"license": "CC-0",
...
}
}
You can't combine STAC Collections and JSON FeatureCollections into a single object though, because the type property conflicts (type: Collection in STAC, type: FeatureCollection in GeoJSON). In this case you need the separation as pointed out in variant 3. The advantage of a STAC Collection is to have the pre-defined fields and ecosystem. The disadvantage is probably the added complexity. We can discuss this with the group; as I said, I'm pretty much happy with all of the variants.
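The type conflict can be illustrated with a small sketch. The two dictionaries are abbreviated samples, not complete or valid STAC or fiboa documents, and the variable names are made up for illustration.

```python
# Abbreviated sample objects (assumptions for illustration only).
feature_collection = {"type": "FeatureCollection", "features": []}
stac_collection = {
    "type": "Collection",
    "stac_version": "1.0.0",
    "fiboa_version": "0.1.0",
    "license": "CC-0",
}

# Flat merge: both objects define "type", so one value overwrites the other.
# Here the STAC value wins and the result is no longer a FeatureCollection.
merged = {**feature_collection, **stac_collection}
print(merged["type"])

# Variant 3: nest the STAC Collection under its own member instead,
# so each object keeps its own "type".
embedded = {**feature_collection, "collection": stac_collection}
print(embedded["type"], embedded["collection"]["type"])
```

Whichever object is merged last decides the value of `type`, which is why a flat merge cannot satisfy both specifications at once.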
I don't agree, we embed data that is valid for the whole file into the GeoParquet metadata, similar to what GeoParquet does with its geo-related metadata. This is used to explain and validate the GeoParquet file, e.g. define the fiboa version, add the list of extensions, and provide additional metadata that you don't want to repeat in every single row. For example license, provider etc.
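As a rough sketch of this embedding: Parquet file-level metadata is a set of key/value pairs, and GeoParquet stores its own metadata as a JSON string under the `geo` key. The helpers below serialize a collection object the same way. The key name `fiboa` and both function names are assumptions for illustration; writing the pairs into an actual file would be done with a Parquet library and is not shown.

```python
import json


def to_parquet_key_value(collection: dict, key: str = "fiboa") -> dict:
    """Serialize collection metadata as one bytes key/value pair.

    The key name "fiboa" is an assumption, chosen by analogy with the
    "geo" key that GeoParquet uses for its own JSON metadata.
    """
    return {key.encode("utf-8"): json.dumps(collection).encode("utf-8")}


def from_parquet_key_value(metadata: dict, key: str = "fiboa") -> dict:
    """Decode the collection metadata from the key/value pairs again."""
    return json.loads(metadata[key.encode("utf-8")].decode("utf-8"))


# Illustrative values only.
collection = {"fiboa_version": "0.1.0", "fiboa_extensions": [], "license": "CC-0"}
key_value = to_parquet_key_value(collection)
print(from_parquet_key_value(key_value) == collection)
```

Keeping the collection as a single JSON object means values such as the license are stored once per file instead of being repeated in every row.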
We should clarify that. If we don't need individual features, we can disallow them and always enforce FeatureCollections. That would make life simpler, indeed.
Indeed, currently the tooling asks you to provide a JSON file that contains the collection metadata during GeoParquet creation.
What is an empty STAC Collection?
That's already the case as far as I know.
I don't get it. I've never asked to implement separate endpoints?!
Isn't that what I've proposed before and again in the examples above? |
Let's make a breakout meeting for this discussion for sometime in the next few weeks. |
I've created a proposal for this in PR #21, maybe this can already be accepted as a compromise.
Would love to hear feedback. |
Originally posted by @andyjenkinson in #21 (comment) |
I think it's not a good idea, as then it's not clear (especially for individual features) which properties are collection-level and which are not. It would require a definitive set of fields in the files, which I think we don't want to aim for. For example, OGC API - Features adds additional properties to the FeatureCollection which are not collection-level metadata (numberMatched, numberReturned, pagination links). Just moving around a single object is much simpler when migrating between file formats, for example. Anyway, you can also link to an external file in an OGC API compliant way and then you don't need to embed them in a fiboa-specific object.
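The concern can be made concrete with a short sketch. It assumes collection metadata is placed at the root of an OGC API - Features response; the exclusion list only contains the members named above and is not a definitive list, and the function names and sample response are hypothetical.

```python
GEOJSON_MEMBERS = {"type", "features"}
# Response-level members named in the comment above (assumed, not exhaustive).
RESPONSE_MEMBERS = {"numberMatched", "numberReturned", "links"}


def root_members(response: dict) -> dict:
    """Naive approach: treat every non-GeoJSON root member as collection metadata."""
    return {k: v for k, v in response.items() if k not in GEOJSON_MEMBERS}


def collection_level(response: dict) -> dict:
    """Same, but with an explicit exclusion list for response-level members."""
    return {k: v for k, v in root_members(response).items() if k not in RESPONSE_MEMBERS}


# Illustrative response only.
response = {
    "type": "FeatureCollection",
    "features": [],
    "numberMatched": 1200,
    "numberReturned": 10,
    "links": [],
    "fiboa_version": "0.1.0",
    "license": "CC-0",
}
print(sorted(root_members(response)))
print(sorted(collection_level(response)))
```

The naive version picks up the paging members as well, and the second version only works as long as the exclusion list is kept complete, which is the "definitive set of fields" problem. A single nested object avoids the need for such a list.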
I think we should recommend one standard for metadata. I'm happy to discuss which that might be, whether it's STAC, Dublin Core, OGC APIs, DCAT or whatever. I've paved the way to allow this by just requiring the fiboa_version and fiboa_extensions fields. Everything else is right now open to implementors. But as we should guide users to something for now, I recommended STAC. If discussions across fiboa participants lead to something else, I'm happy to switch. I'd say open an issue and propose a different standard for collection-level metadata to start the discussion...
Isn't that already possible with this proposal as long as you can link to an external collection that includes the two required fiboa fields? Look at the individual-features example. There it's just license, not fiboa -> license...
I'd need to look at the specific APIs above, but this proposal is OGC API - Features compliant AFAIK. Any pointers where I can find documentation about the other APIs?
STAC didn't start from a green field either ;-) OpenSearch, OGC CSW, ISO 19115, even Dublin Core was in the discussion.
Is that feasible at all if the (non-Collection) metadata is not already fully aligned? Right now you'd need to align the features themselves and add the two fiboa_* properties or a link to something that includes these two fields. |
I don't really understand your comments, as all the discussion is about collection-level metadata and you're giving examples of other collection-level metadata defined by OGC API, but then saying it is non-collection metadata. I don't see how it is possible to confuse these things. Everything expressed at the root of the FeatureCollection is about the collection, and everything inside a feature is about the feature. A STAC collection is compatible with an OGC Features API collection - it's expressly stated to be so in the STAC spec. That's all I'm advocating - to take the same approach here. An OGC Features API collection is a GeoJSON FeatureCollection, these are not separate concepts - it has an id, title, description etc. which, coincidentally, are also part of the STAC collection object schema. It just also has a "features" member containing all the individual features. STAC doesn't put its properties in a separate "stac" object inside the collection, it just defines the properties at the root of the object - some of which are the same ones defined by the OGC spec (id, title). I don't see why we can't call it a FIBOA collection and take the same approach: just don't choose properties that clash with OGC Features API, reuse them, and prefix anything that's expressly only applicable to FIBOA itself like you already are (fiboa_version). Job done. FIBOA doesn't need to directly depend on STAC or refer to a second separate collection, it just needs to define a JSON document with fields that are largely the same as STAC, whilst also being compatible with an OGC API FeatureCollection like STAC is. No external files. If you don't want to mandate that a FIBOA collection is a FeatureCollection, fair enough, you'll need to include a reference to a separate collection inside every feature, but that's no problem.
I'm also confused by the suggestion of referencing additional files, which is the exact thing that isn't already part of the existing APIs I'm suggesting to try to be compatible with - you have to create a new endpoint just to provide some other file to describe the same thing you're already describing - the (feature)collection. Since we, and anyone else implementing GeoJSON FeatureCollections containing our boundaries (including OGC Features API), already have this collection object implemented, we can just add a few properties to it - just like we add them to the Feature part of the specification (and we don't do that in a special "fiboa" property, we just add them). It just seems like we are totally missing each other and don't actually have a common understanding of what the problem being solved even is, or of the basic objects in the specification (like, what even IS a FIBOA collection if it's not a collection of FIBOA features). If you want to find the documentation for the GFID API it's in the data survey, I think digifarm's is too but not sure. I think that we can quite easily make our API (specifically the GET /boundaries and GET /boundary-references) FIBOA-compliant without adding a separate collection concept or 'file' to contain it. The FeatureCollection would just have two or three extra properties. The download exports are in the same format so again, they can already be compliant as a pure FeatureCollection. We can also easily make an OGC Features API using exactly the same payload structure with some further properties without any clashes, and if we wanted to we could make downloads that split the collection and features into a separate collection JSON file with the same exact metadata as the FeatureCollection, alongside millions of other individual JSON Feature files. |
Indeed. It feels like it would be more time-efficient to talk about this in one of the next fiboa calls so that we can clarify individual questions and misunderstandings directly with examples. I really want to get us on the same page here. It might be that we disagree in certain parts, but I think we are actually not that far apart... :-) |
Results from the discussion yesterday: No relation with STAC (removed any mentions in the spec, see fbb0b76). We can re-use properties if we see a fit and they are scalar, but that applies to all existing standards, not just STAC. We generally try to keep property values simple (i.e. scalars), which e.g. for providers in STAC is not the case (array of objects), so it's not a good fit. For provider we'll create an extension. We'll generally only allow one value unless there's a common use case for providing multiple values. For provider, for example, we don't necessarily see a need. The general discussion around collection-level properties vs. feature properties will be held in fiboa/schema#3 and #26. |
Currently the spec requires collection-level metadata (such as the version and extensions) to be a STAC Collection.
We need to discuss whether this is a good idea. It predefines a couple of fields for us that we don't need to care about any longer, but also requires a temporal extent for example.
Additionally, the current wording of the spec requires embedding the STAC Collection in GeoParquet file-level metadata and in the GeoJSON FeatureCollection. The aim is to keep files complete without requiring an external dependency (except for extensions?).
The embedding feels a bit weird to me, so I recommend also providing the STAC Collections separately with an asset pointing to the GeoParquet / GeoJSON FeatureCollection.
cc @cholmes