List non-common fields in Collection (Summaries) #413
I think this is useful. I've always seen a Collection as a definition for the Items that are in it. So a user should be able to look at a Collection and get all the info they need to know about the Items without having to get Items to look at the structure. I think a name like "available_properties" is better than "varying" though, and don't think it needs a prefix. |
Regarding the name: I understand "available_properties" as "all properties available", but I currently only list the properties that are not actually listed with fixed values in "properties", so that's why I chose varying. But I'm really open for a new name, because varying was just a synonym I googled for uncommon, which felt even worse. (Non-native speaker issues, I guess)... Happy to remove the prefix. Edit: |
How about |
Just spent a good 15 minutes thinking about this... It seems to me like the ideal answer might be to have just a 'properties' object, that can contain:
So you could just have:
Though then I'd maybe want to rename it 'properties_definition' instead of 'properties'. That obviously doesn't work as an extension. If we do want to start as an extension (which in general I think is a good idea), then I think I actually like the varying:properties original suggestion the best (or variable:properties which is what I thought up before seeing the original). other_properties also works for me. I don't like the name of the extension as 'non-common' though. I'd call it 'varying properties definition extension' if we go with varying:varying. If we go with other_properties then I'd probably call it the 'other properties in collections' extension or something like that. Like I think the name of the field we use should be the same or else quite close to the name of the extension (and both should describe as best they can what the extension is about). |
Regarding the naming: I have released the openEO API based on the draft we have currently, so it's now baked in there for some time and we'll try it out in the next months to see how it works. So for now we are using other_properties, but I'm not quite happy with it. It doesn't describe the field very well. varying_properties or variable_properties would be much more intuitive, I guess. So I'd be happy to revert the change. Based on the field name we can then find a name for the extension. @cholmes I had similar ideas when I was drafting the extensions. It feels better to have them all on the same level and I would love to define it that way, but I scrapped it as it has some major drawbacks:
|
Hmmm... yeah, those are some major drawbacks. I do like the idea of a lightweight way to say what properties are in collection. But the other route is to define a fuller item definition. Of course when we say 'item schema', then it seems to beg the question of why not just use JSON Schema? It feels to me that the 'heavyweight' solution would be that a collection can/should reference the JSON Schema that is valid for the collection. Like how the boundless server generates the schema for its collections - https://stac.boundlessgeo.io/stac/schema/landsat-8-l1 A static catalog would just make that a static reference. It'd be the combination of the extension schemas, that would fully describe all the fields. In JSON Schema I imagine you can also specify the 'commons' constraints - like just say that the "eo:gsd" is 10, instead of any number. I think the common properties would still make sense to have, as to me the purpose of that is to reduce the repeating fields in each item. But when the purpose is to let clients know what to query against then maybe the heavier JSON Schema definition makes sense? |
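To make the "heavyweight" option concrete, here is a minimal sketch of what such a per-collection schema fragment could look like, written as a Python dict. `const`, `minimum` and `maximum` are standard JSON Schema keywords; the `eo:cloud_cover` entry, the variable name and the helper `check_const_fields` are assumptions for illustration only and are not taken from the Boundless server or the STAC spec.

```python
# Hypothetical JSON Schema fragment for the Item properties of one collection.
# "eo:gsd" is pinned to 10 via "const" instead of allowing any number.
item_properties_schema = {
    "type": "object",
    "properties": {
        "eo:gsd": {"const": 10},
        # Assumed example of a field that varies between Items.
        "eo:cloud_cover": {"type": "number", "minimum": 0, "maximum": 100},
    },
    "required": ["eo:gsd"],
}


def check_const_fields(properties, schema):
    """Return the names of fields whose value differs from the schema's const."""
    mismatches = []
    for name, rule in schema["properties"].items():
        if "const" in rule and name in properties and properties[name] != rule["const"]:
            mismatches.append(name)
    return mismatches
```

This only checks the `const` keyword; a real client would hand the schema to a full JSON Schema validator, which is exactly the extra complexity the thread is weighing.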
I'm personally not interested in an Item Schema, so I guess that term was not correct. What I tried to achieve here is probably better called an Item Summary: which values do the Items offer? It gives a summary of the content, either a range or the set of values I can expect and query for. I fear that JSON Schema makes things very complicated as it allows a whole lot of stuff. Of course, it offers enum, minimum and maximum for what we currently describe as extent and range, but do we really want complex schemas with anyOf, not, etc. in the collection? That's not easily parseable and readable. |
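A rough sketch of the "Item Summary" idea described above: numeric fields collapse to a range, everything else to the list of distinct values. The function name `summarize`, the `min`/`max` keys and the sample property names are assumptions, not part of any agreed extension, and the sketch assumes primitive (hashable) values only.

```python
def summarize(items):
    """Summarize item properties: a min/max range for numeric fields,
    a sorted list of distinct values for all other (primitive) fields."""
    collected = {}
    for item in items:
        for name, value in item["properties"].items():
            collected.setdefault(name, []).append(value)

    summary = {}
    for name, values in collected.items():
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if numeric:
            summary[name] = {"min": min(values), "max": max(values)}
        else:
            # Assumes hashable values; nested objects would need other handling.
            summary[name] = sorted(set(values), key=str)
    return summary


# Hypothetical sample Items.
items = [
    {"properties": {"eo:cloud_cover": 12.5, "eo:platform": "landsat-8"}},
    {"properties": {"eo:cloud_cover": 3, "eo:platform": "landsat-8"}},
]
summary = summarize(items)
```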
Yeah, I hear you on json schema getting complicated. I just fear that it's a slippery slope from item summary to item schema. Like someone will want to express anyOf / not in their summary, and we'll reinvent json schema. Alternatively I suppose we could say that only simple schemas are allowed? Like just use a subset of JSON Schema. I'm in no way set on JSON Schema for this, and like I said, I do see a space for a lightweight thing, an item summary. Do you feel the summary needs to be inside the collection json? It could potentially be cleaner to just have an Item Summary as a separate file, that has all the properties (common and varying). Or I suppose you could have Item Summary in the collection, but have it include all the properties. I guess the crux of the issue is whether it's better to not repeat, leveraging the common properties as the way to define part of the Item Summary, or it's better to have a clean Item Summary that lists all the values, with some duplication of the common fields. I think I lean a bit to the latter (but likely could be convinced otherwise), that an Item Summary extension should be stand alone, and should define all that is needed, doing its thing instead of trying to mix purposes. |
Both in Data Cube and the Non-Commons extension I would strictly only allow extents and sets of values for Summaries. Summaries should be easy to consume. If somebody wants to define a full Item Schema, that should be a different extension, and that could then just refer to JSON Schema. Yes, I need the Summary in the Collections. I need an easy way for users to look up what they can query for. That's what I designed the extension for. Whether they are merged with the common properties or not is a design decision we can discuss. I see pros and cons for both. It probably depends highly on how we want the commons extension to work. Or we duplicate stuff, as you explained. In this case, I could drop the commons extension completely for openEO, as we could just specify everything as Summary ;-) Adding JSON Schema or specifying things in a separate file would mean that I'd not use the extension and would just use what we have now as a proprietary extension specifically designed for openEO. It's just that I need what we have now. We can discuss details like names, improve smaller bits or add a new field, but I don't want to invent a big all-round extension for Item Schemas here. |
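The restriction to "extents and sets of values" could be enforced with a check along these lines. This is only a sketch: the encoding of an extent as a dict with exactly `min` and `max`, and both function names, are assumptions rather than anything the extension defines.

```python
def is_simple_summary_entry(entry):
    """True if the entry is a non-empty list of primitive values
    or an extent encoded as a dict with exactly 'min' and 'max'."""
    if isinstance(entry, list):
        return len(entry) > 0 and all(
            isinstance(v, (str, int, float, bool)) for v in entry
        )
    if isinstance(entry, dict):
        return set(entry) == {"min", "max"}
    return False


def invalid_summary_fields(summaries):
    """Return the names of summary fields that are neither an extent nor a value set."""
    return [
        name for name, entry in summaries.items()
        if not is_simple_summary_entry(entry)
    ]
```

Anything schema-like, such as an `anyOf` object, is rejected, which keeps the summary from drifting toward a schema language.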
Cool. I'm definitely happy for a lightweight extension in collections. And yes, I figured that openEO dropping the commons extension would be the logical outcome, and I think that may be ok. Thanks for humoring me - I just want to be sure that we clearly define the scope, as I can quite easily see someone coming and asking for just a little bit more in the Summary. And then that happens two or three times and suddenly we're making our own schema language. Let's talk through the pros and cons of aligning with commons in the next working session. |
Dropping the commons extension for collection-only STACs would be a logical next step, as currently we don't have any other place to describe the collections further with EO/SAR/... fields. But then we have two places to look for the data; currently it is just one place, which is easier to implement. That's quite a tough decision that we shouldn't take lightly and should discuss in a bigger group. I'll keep an eye on not reinventing the wheel (JSON Schema). |
We discussed this on the call (with @matthewhanson @hgs-msmith @jbants @joshfix @mojodna ) and agreed the best way forward would be to make the 'Item Summary' extension, which handles both common and non-common properties, and to accept some duplication with the 'commons in core' functionality of 'merging'. We didn't go deep into having 'two places to look for data', but my reaction is that they are for different purposes. The summary provides what fields to query on, and the 'properties' (commons in core) provides the fields to 'merge' in. I also am not sure if there will be huge overlap, as I'd see static catalogs going for the commons stuff, and dynamic catalogs going for the Item Summary. In a dynamic one the server will do the merge. And in the static one there are fewer people needing a 'summary' to figure out what to query, since there's no querying. |
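For comparison with the summary, the "merge" side can be sketched as below. It assumes that common values live under a `properties` key on the collection and that an Item's own value wins on conflict; both points, and the function name, are assumptions for illustration rather than confirmed spec behaviour.

```python
def merge_common_properties(collection, item):
    """Return a copy of the item with the collection's common properties merged in.
    Assumption: values set on the item itself take precedence."""
    merged = dict(collection.get("properties", {}))
    merged.update(item.get("properties", {}))
    return {**item, "properties": merged}


# Hypothetical collection and Item.
collection = {"id": "landsat-8-l1", "properties": {"eo:gsd": 10, "eo:platform": "landsat-8"}}
item = {"id": "scene-1", "properties": {"eo:cloud_cover": 5}}
full_item = merge_common_properties(collection, item)
```

A dynamic catalog would do this server-side before returning Items, while the summary answers the separate question of what can be queried.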
I've been perusing through old issues and came across this one: And it occurred to me that defining the extensions used in the collection solves this problem, as it defines what properties would be in use. If my collection uses the
Knowing which extensions are used in a Collection can definitely be useful, the question is if you had that info would you still feel you would need a summary of properties in the Items? |
There have been a couple references to an extensions property for the spec along the lines of @m-mohr raised a good point about links to specific extensions. If someone defined a custom
|
We could also just put them in the
If you defined a custom extension you would need to provide a link to it. |
I feel we are drifting a bit off here and discussing also things that should be discussed in #278, also because my initial use case changed slightly. I wrote in the first post:
After implementing this, I realized that it is in fact quite useful to list the actual values or extents, but it's not easy for non-primitive data types. Additionally, querying in openEO terms means querying with openEO processes for processing decisions, not querying against the STAC search API. Also, we don't necessarily have (public) Items, but we still need to query against fields that can't be moved to the collections (commons extension) as they are not common. I'm all for using links to solve #278 (be it schemas or something different), but I still need the Summary extension #416. For me it could also just be named "Allow all fields in collections so that collections are more useful standalone Extension" ;-) but it seemed like others have a similar need for querying against the API, so we could just make it more universally usable by calling it Item Summary extension (or so). Now we need to decide whether to split efforts between #413 and #278 or not. I could also just make #416 a proprietary openEO extension if that seems to make more sense, and you could work on #278 by defining how to link schemas. Regarding linking schemas: We shouldn't link to GitHub, but use more permanent links such as https://www.stac.cloud/schemas/v0.6.2/extensions/eo.json or so. The master branch changes with every version. And maybe this is getting a bit too complicated with schemas? Not all fields in the schema may be available? This extension aimed at giving the user easy access to what to expect behind a collection; the schema is very rough and a user would usually just read the written spec instead. Not sure which client software would parse the actual schemas and make sense of them. Edit: Oh, I missed @cholmes' comment. Have you spoken about a name for the field? I'm fine with also allowing common fields as summaries. We still need to figure out how to encode that well. Edit 2: Having item summaries would solve issues such as #216 to some extent for collections, too.
|
I'm just trying to update the PR and it is not too easy as I'm wearing two hats here.
Writing the extension spec to suit both needs is not so easy. For example the field name. For (1) There are more issues here, but I'd like to discuss them in a call instead of writing a whole paper about it here. It seems like a bigger strategic decision, which is also influenced by discussions regarding the common metadata model, extensions supported, commons extension etc. |
Discussed on sprint, see notes: https://docs.google.com/document/d/1evZHrn1kOdLTIOFaJ2_Z3G3C7MwB8N-ORG8xKCtpAD0/edit I'll come up with a PR. |
Merged to dev. |
In openEO we also need to list the non-common fields in Collections, i.e. all properties that are available in the Items but have different values. We don't necessarily need to list the actual values (or extents), but we need to inform users what they can query against. This is probably similar to what we plan to add for the assets as "asset definition" or "asset schema". So a collection with an "item schema" could be something as simple as:
I need to define this anyway for openEO in the next few days, so the question I have is whether this could also be useful for other users, so that we can combine efforts and standardize. Otherwise I'd probably start off with a proprietary extension.
This could be something that may also be useful for STAC API users that don't want to look into Items to find out what they could query against.