Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove link type #290

Merged
merged 20 commits into from
Mar 25, 2021
Merged

Conversation

matthewhanson
Copy link
Member

@matthewhanson matthewhanson commented Mar 18, 2021

Related Issue(s): #
#286 , also resolves #285

Description:
This PR removes the link_type field and the LinkType class, along with the need to loop through all links to set them as RELATIVE or ABSOLUTE.
Instead, when a link href is requested it returns an absolute link unless all the following are true:

  • The link is a hierarchical link ('root', 'child', 'parent', 'item')
  • The link has an owner (i.e., it is part of a STAC object and not a standalong Link instance)
  • The owner has a root catalog
  • The root catalog.catalog_type is CatalogType.RELATIVE_PUBLISHED

When all are true the link relative to the owner is returned.

There are some additional implications and related changes:

  • Catalog.catalog_type is now always defined, never None. It defaults to ABSOLUTE_PUBLISHED
  • A user can just set Catalog.catalog_type = Catalog.Type.X...subsequent saves will generate relative links for all saved objects
  • Specifying CatalogType to normalize_and_save and save is no longer needed, in fact it should be deprecated. It's been left in for backward compatibility. The catalog_type of the root should be favored.
  • An implication/assumption is that unless you are working with a stand-alone Item or Collection, there will always be a root catalog.
  • Fixed Remove link_types #286 so that now you can seamlessly save any node in a tree which will save it from that point down

I've not yet updated tests, instead I'm putting this WIP PR up for discussion since it's a big internal change. Afterward I can update tests

PR Checklist:

  • Code is formatted (run scripts/format)
  • Tests pass (run scripts/test)
  • This PR maintains or improves overall codebase code coverage.
  • Changes are added to the CHANGELOG. See the docs for information about adding to the changelog.

@lossyrob
Copy link
Member

Discussed this with Matt in a video chat; thinking through these and related changes, I think this will be a big improvement and the assumptions are sound. 👍

@matthewhanson
Copy link
Member Author

In updating the tests I came across a test case that is using a relative link for a hierarchical link, which would no longer be supported:
https://github.com/stac-utils/pystac/blob/develop/tests/data-files/label/label-example-1.json#L97

While moving assets along with the metadata might be a common case (and relative links for assets still work), I don't think using relative hrefs in a link should be a recommended practice. Relative links for an asset that's in the same location as the metadata makes sense, but a source link for labels points to another Item which could be in some random part of the tree and having a relative link to that is rather awkward.

I'm fine with not supporting this use case right now. Alternately we could just keep all non-hierarchical links as is when renormalizing....since a renormalization is typically going to be over the whole catalog and the relative link will still be valid.
I want to avoid trying to programmatically update relative links when renormalizing, for one it means we'll need to move back to using a link_type, but also because I think there's too many things that could go wrong that in practice this is not a valuable feature.

@lossyrob

@matthewhanson matthewhanson marked this pull request as ready for review March 19, 2021 18:03
@lossyrob
Copy link
Member

@matthewhanson using relative links to go to different parts of a Catalog tree is necessary for self-contained ML datasets. Otherwise you'll have to use absolute links to link a Label Item to it's source imagery, and then you can no longer copy the catalog around and have it be valid. I think this is a use case we need to support.

This is the sort of use case that causes cycles in the graph that is otherwise a tree - an annoying one to support but one I think is worth utilizing if we consider STAC to be something that can allow referencing of Items in a deeper way then parent->child (e.g., a derived from link).

Is there a way to avoid Link Type and also allow for these types of relative links? Where does the problem? I don't follow where in the normalization process this breaks things.

@matthewhanson
Copy link
Member Author

Good point, basically we don't want to convert relative non-hierarchical links to absolute URLs.

If we just leave relative links the way they are, if you normalize_hrefs those relative links will still be valid.

It's a problem if you wanted to move the item from one place in the catalog to another, but that wouldn't work now would it? You would use remove_item and add_item, but unless you manually update the relative link it won't be correct in it's new location.

Or if you remove the item linked to or move it, but again, that also would cause the same issue currently.

@matthewhanson
Copy link
Member Author

Leaving the links alone actually is the same way that relative assets are treated, so I think we should keep them consistent.

@lossyrob
Copy link
Member

If you are normalizing HREFs, it will resolve links recursively via resolve_links, which uses _object_links, which can be contributed to by extensions, a la the label extension specifying that the "source" link is an object link that should be resolved. So for example you are normalizing a catalog ROOT that has two subcatalogs IMAGES and LABELS, and LABELS has a single item LabelItem that has the "label" extension, and IMAGES has a single item SourceItem that is just an image.

One way the traversal could happen is with LABELS going first:

  • ROOT calls LABELS.normalize_hrefs
  • LABELS resolves the child item link calls LabelItem.normalize_hrefs
  • This causes the "source" link to get resolved
  • ROOT calls IMAGES.normalize_hrefs
  • IMAGES resolves the child item link for SourceItem - which was previously resolved and so pulls from the cache. This item and the "source" linked item are the same item. SourceItem.normalize_href is called, which modifies the source item HREF, and that is reflected in both the LabelItem's link and the SourceItem's self href.

Another traversal reads IMAGES first, but similarly resolves SourceItem, and when the "source" link of the LabelItem is read, it resolves to the cached item, and they are still refering to the same object.

It seems like your changes continue to resolve the object links, so I think the above would still apply here? The link for any of the _object_link links should not be updated by the linking object, because those are pending modification (or are already resolved and modified) by the normalize_href call in another part of the catalog.

I might be confusing myself, and others with this, but at least I think the "objects can get resolved via non-hierarchical links" is applicable to your idea of using resovled-ness as a flag determining whether or not to save something

@matthewhanson
Copy link
Member Author

I've updated this to handle all links defined by extensions _object_links.
This appears to emulate the previous behavior, except that CatalogType is not required to be specified on save. Instead the root catalog of any STAC object determines relative vs absolute based on it's catalog_type field.

Copy link
Member

@lossyrob lossyrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

I made a change to a tutorial for adding a custom extension just to add a note that the _object_links() for an extension also influence if those links will be relative or absolute based on catalog type.

Merge away!

🎉

@codecov-io
Copy link

codecov-io commented Mar 25, 2021

Codecov Report

Merging #290 (e18d4de) into develop (5a492ca) will decrease coverage by 0.03%.
The diff coverage is 98.03%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #290      +/-   ##
===========================================
- Coverage    93.68%   93.65%   -0.04%     
===========================================
  Files           33       33              
  Lines         4057     4004      -53     
===========================================
- Hits          3801     3750      -51     
+ Misses         256      254       -2     
Impacted Files Coverage Δ
pystac/extensions/base.py 84.21% <ø> (ø)
pystac/catalog.py 95.46% <93.75%> (-0.21%) ⬇️
pystac/__init__.py 100.00% <100.00%> (ø)
pystac/item.py 97.91% <100.00%> (-0.29%) ⬇️
pystac/link.py 98.07% <100.00%> (+0.64%) ⬆️
pystac/stac_object.py 96.23% <100.00%> (-0.35%) ⬇️
pystac/extensions/scientific.py 98.02% <0.00%> (+0.65%) ⬆️
pystac/extensions/single_file_stac.py 94.02% <0.00%> (+1.49%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a492ca...e18d4de. Read the comment docs.

@lossyrob
Copy link
Member

We should probably cut a new release with this change and the coupe of fixes - the last 1.0.0-beta.2 release. Want to set up a time to pair on that so there's two of us who can release?

@matthewhanson
Copy link
Member Author

@lossyrob Let's plan for Monday to release, I've got another PR (much smaller) to add self hrefs to add_items and add_child #284 and possibly another to optimize generate_subcatalogs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove link_types Saving a branch of a catalog with Normalized hrefs saves a self link
3 participants