Multi-document yamls are inconvenient to use in ODC's cli tools #984

jeremyh · 2020-07-06T04:56:34Z

I tried to update a set of products recently that were stored in a multi-document yaml:

Running datacube product update products.yaml.gz threw an error because one of the products didn't exist yet in the datacube.
Running datacube product add products.yaml.gz failed because some of the products were different to those stored in the datacube (they need to be updated, not added!)

The only way I know to fix this is to split up the file temporarily and add them separately.

I wish I could tell it: "update ODC to match my whole file" without caring about whether they're updates or additions.

Examples of multi-doc usage are in DEA's metadata-types and products, and some datasets on the NCI.

All document commands have this inconvenience:

 datacube product add    
 datacube product update    

 datacube dataset add    
 datacube dataset update    

 datacube metadata add      
 datacube metadata update

(sourced from #983, cc @omad )

The text was updated successfully, but these errors were encountered:

Kirill888 · 2020-07-06T05:02:32Z

I think a more proper solution is to support

Selecting one part of multi-part document for both add/update
Treating each one as a separate invocation and have an option to continue on failure
Never just die with exception in user facing CLIs

Kirill888 · 2020-07-06T05:02:56Z

sorry wrong button press

Kirill888 · 2020-07-06T05:11:33Z

So you'd like "update or add" functionality? Or like "sync to this state if safe". Should it be

separate command (sync?)
option for add
option for update
option to both add and update
combined add and update into one and remove add|update

I think middle three are redundant and hence confusing, I would probably go with last one, but that requires changing CLI API in non-backward compatible ways.

omad · 2020-07-06T05:30:48Z

On YAMLs:

Multi-document YAMLs add significant complexity to several of our operations
They are useful for storing related resources together,
but, they can be confusing when editing or referring to since it's not obvious if or how many extra documents are in the same file.

Looking elsewhere for inspiration:

Kubernetes uses YAML for configuration, including multiple documents in one file, and even multiple record types in the same file
The kubectl apply command takes your suggested approach of make the database match what's in my file, regardless of whether it's adding/update, while also checking that it's a safe change.
If we're going to keep supporting multiple document YAMLs, I think we should support this operation. But I'm not sure what name to use, and I'm not sure if we should keep add and update.

WRT backward compatibility, we need to get over the fear of breaking things. So long as we do it in a sane way.

The obvious course here is to add new command sync or apply in one release, and mark add and update as deprecated, and then remove them in a future release.

Crazy Damien idea that should not be seen

Let's implement an ODC Kubernetes Operator, which you interact with to update an ODC Database via Kubernetes tools

jeremyh · 2020-07-06T05:40:00Z

I like apply.

I wouldn’t rush to remove the old ones as there’s lots of bash scripts, beginners-guides, power point presentations and tutorials that have them.

I do like the idea of supporting different record types in one file, like @omad’s example of Kubernetes. I’ve seen many people tripped up by calling the wrong command (datacube product add some-dataset.yaml etc), and not understanding why it failed. Datacube could just detect the type itself. One simple way to do this would be adding $schema field to product/metadata-types docs, similar to eo3 datasets.

jeremyh · 2020-07-06T05:52:36Z

Another option is not to remove the old ones entirely, but just make them aliases for the new command. In effect, retiring the more niche options/flags that they take. Beginner tutorials usually say datacube product add without any options.

(and I could imagine it occasionally still making a bash script clearer to write datacube product add somefile.yaml than datacube apply somefile.yaml if you have several separate steps. But that’s not a strong preference...)

Kirill888 · 2020-07-06T06:48:51Z

+1 for apply
not sure about datacube apply vs datacube product|metadata|dataset apply
do agree that having metadata and product in one yaml would be nice. Skipping non-product documents when calling datacube product apply is what I would expect, but we don't have type field, so there is no telling if this document meant to be metadata, product or dataset definition.

Kirill888 · 2020-07-06T06:53:39Z

we can of course implement guess_document_type(doc: Dict[str, Any]) -> Enum[Metadata|Product|Dataset|Unknown] but not sure how that would look exactly, or how robust it would be.

mpaget · 2020-07-06T08:18:42Z

For my 2 cents: I think datacube product|metadata|dataset apply is the right target. The concept of separate product and dataset yamls is still valid (fairly entrenched and not broken). Mixing two types into one doc (and then untangling them) seems unnecessarily duplicative and complicated. Less sure about the use cases for metadata and product types in one yaml, though.

Whereas datacube product|metadata|dataset apply just requires the add|update logic, possibly via an if in DB: update, else: add switch.

With regard to (a potential case for) mixing dataset and product types in one yaml:
Following my eo3 adventures the other week, I wrote an Assembler that ties the product.yaml name and measurements to the corresponding dataset.yaml elements, through metadata:name and note_measurements being driven by the product.yaml. Through our test cases so far, this effectively (and easily) enough ties the dataset.yaml to the right product.yaml - which may be a practical (if intermediatary) step towards closer ties between dataset and product rather than being in the same multi-doc yaml.

BTW: very happy to share the Assembler - adapted from eodataset3's - as I think it could add value to eo3 uptake in the ODC community. But that's a topic for another thread - Rob may speak to it in the next SC.

Kirill888 · 2020-07-07T00:23:37Z

@mpaget mixing dataset and product definitions in one yaml file is not something we are proposing. Although I can see it being useful in very few cases, something like a DEM stored in one giant file, so it would be one product one dataset sort of deal. Also this would not change auto-matching logic at all, one would still need to make sure that dataset and product definitions are "compatible", so metadata.product.name is still a thing you'll need to do even if everything is stored in one file (not recommended except, maybe in the case of single dataset products?).

stale · 2020-11-04T00:31:05Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale · 2021-03-04T08:48:56Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Kirill888 closed this as completed Jul 6, 2020

Kirill888 reopened this Jul 6, 2020

stale bot added the wontfix label Nov 4, 2020

Kirill888 removed the wontfix label Nov 4, 2020

stale bot added the wontfix label Mar 4, 2021

Kirill888 added pinned and removed wontfix labels Mar 8, 2021

omad added usability api cli datacube cli labels May 23, 2023

SpacemanPaul added v2 and removed usability api labels May 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi-document yamls are inconvenient to use in ODC's cli tools #984

Multi-document yamls are inconvenient to use in ODC's cli tools #984

jeremyh commented Jul 6, 2020 •

edited

Kirill888 commented Jul 6, 2020

Kirill888 commented Jul 6, 2020

Kirill888 commented Jul 6, 2020

omad commented Jul 6, 2020

jeremyh commented Jul 6, 2020 •

edited

jeremyh commented Jul 6, 2020 •

edited

Kirill888 commented Jul 6, 2020

Kirill888 commented Jul 6, 2020 •

edited

mpaget commented Jul 6, 2020 •

edited

Kirill888 commented Jul 7, 2020

stale bot commented Nov 4, 2020

stale bot commented Mar 4, 2021

Multi-document yamls are inconvenient to use in ODC's cli tools #984

Multi-document yamls are inconvenient to use in ODC's cli tools #984

Comments

jeremyh commented Jul 6, 2020 • edited

Kirill888 commented Jul 6, 2020

Kirill888 commented Jul 6, 2020

Kirill888 commented Jul 6, 2020

omad commented Jul 6, 2020

jeremyh commented Jul 6, 2020 • edited

jeremyh commented Jul 6, 2020 • edited

Kirill888 commented Jul 6, 2020

Kirill888 commented Jul 6, 2020 • edited

mpaget commented Jul 6, 2020 • edited

Kirill888 commented Jul 7, 2020

stale bot commented Nov 4, 2020

stale bot commented Mar 4, 2021

jeremyh commented Jul 6, 2020 •

edited

jeremyh commented Jul 6, 2020 •

edited

jeremyh commented Jul 6, 2020 •

edited

Kirill888 commented Jul 6, 2020 •

edited

mpaget commented Jul 6, 2020 •

edited