DM-43315: Implement additional collection chain operations #990

dhirving · 2024-04-04T23:00:23Z

Added three more collection chain methods to the top-level Butler interface: extend_collection_chain, remove_from_collection_chain, and redefine_collection_chain. These methods are all "atomic" operations that can safely be used concurrently from multiple processes.
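A minimal in-memory sketch of the ordering semantics of these operations, plus the pre-existing prepend_collection_chain. ToyChainedCollection and its methods are hypothetical stand-ins, not the daf_butler API, and the sketch ignores concurrency entirely. It assumes, following the review discussion below, that prepend and extend move children already in the chain rather than duplicating them, and that removing a child that is not in the chain is a silent no-op (an assumption).

```python
from collections.abc import Iterable


class ToyChainedCollection:
    """Hypothetical in-memory stand-in for a CHAINED collection."""

    def __init__(self, children: Iterable[str] = ()) -> None:
        self.children: list[str] = list(children)

    def prepend(self, new: Iterable[str]) -> None:
        new = list(new)
        # Children already in the chain are moved to the front, not duplicated.
        self.children = new + [c for c in self.children if c not in new]

    def extend(self, new: Iterable[str]) -> None:
        new = list(new)
        # Children already in the chain are moved to the end, not duplicated.
        self.children = [c for c in self.children if c not in new] + new

    def remove_from(self, names: Iterable[str]) -> None:
        # Assumption: names that are not in the chain are silently ignored.
        drop = set(names)
        self.children = [c for c in self.children if c not in drop]

    def redefine(self, names: Iterable[str]) -> None:
        # Replace the whole list of children (the role setCollectionChain played).
        self.children = list(names)
```

With these semantics, extending ["a", "b", "c"] with ["a", "d"] yields ["b", "c", "a", "d"].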

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes
(if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

codecov · 2024-04-04T23:12:14Z

Codecov Report

Attention: Patch coverage is 98.20359% with 3 lines in your changes missing coverage. Please review.

Project coverage is 88.97%. Comparing base (20e04d2) to head (3ffbb0e).

Files	Patch %	Lines
python/lsst/daf/butler/script/collectionChain.py	91.66%	1 Missing and 1 partial ⚠️
...thon/lsst/daf/butler/registry/collections/_base.py	98.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #990      +/-   ##
==========================================
+ Coverage   88.95%   88.97%   +0.02%     
==========================================
  Files         341      341              
  Lines       44042    44123      +81     
  Branches     9069     9078       +9     
==========================================
+ Hits        39177    39260      +83     
+ Misses       3550     3549       -1     
+ Partials     1315     1314       -1
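A quick arithmetic check that the report is internally consistent. In each column, Lines = Hits + Misses + Partials, and coverage is Hits / Lines. The head value 39260 / 44123 is about 88.978%, which is displayed as 88.97%, consistent with rounding down to two decimals (as far as I know, Codecov's default). The patch denominator of 167 lines is inferred from "98.20359% with 3 lines", not reported directly.

```python
import math
from fractions import Fraction


def pct_rounded_down(hits: int, total: int, digits: int = 2) -> float:
    """Coverage percentage, rounded down to `digits` decimal places."""
    scale = 10**digits
    # Fraction keeps the division exact, so floor() is not affected by float error.
    return math.floor(Fraction(hits, total) * 100 * scale) / scale


# Lines = Hits + Misses + Partials for both the base and the head columns.
assert 39177 + 3550 + 1315 == 44042
assert 39260 + 3549 + 1314 == 44123

base = pct_rounded_down(39177, 44042)              # 88.95
head = pct_rounded_down(39260, 44123)              # 88.97 (rounding to nearest gives 88.98)
patch = pct_rounded_down(167 - 3, 167, digits=5)   # 98.20359, with 167 inferred
```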


dhirving · 2024-04-05T21:27:31Z

python/lsst/daf/butler/_butler.py

+    ) -> None:
+        """Add children to the end of a CHAINED collection.
+
+        If any of the children already existed in the chain, they will be moved


Moving existing children is less obviously useful/correct than it is for prepend. Not sure what would be better though, and this is at least consistent.

dhirving · 2024-04-05T21:33:00Z

python/lsst/daf/butler/registry/collections/_base.py

+        # registry.setCollectionChain(
+        #     parent,
+        #     registry.getCollectionChain(parent)
+        # )


Not sure whether my not doing anything to address this is "efficient" or merely "lazy". The most realistic scenario where the chain's position values could overflow is someone using a collection chain as a queue: if you only ever remove from the end of the chain, it will happen after around 32,000 prepends.

I should probably just write a migration script to make that column 32-bit.
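A toy model of the overflow described above, assuming (inferred from "around 32,000 prepends" and "make that column 32-bit", not confirmed by the diff) that each chain entry stores a position in a signed 16-bit column and that a prepend assigns one less than the current minimum. PositionedChain and its methods are hypothetical. renormalize() mimics the effect of the commented-out setCollectionChain(parent, getCollectionChain(parent)) call by rewriting positions as 0..n-1.

```python
SMALLINT_MIN, SMALLINT_MAX = -(2**15), 2**15 - 1


class PositionedChain:
    """Hypothetical chain whose rows store positions in a 16-bit column."""

    def __init__(self) -> None:
        self.rows: dict[str, int] = {}

    def _check(self, pos: int) -> int:
        if not SMALLINT_MIN <= pos <= SMALLINT_MAX:
            raise OverflowError(f"position {pos} does not fit in a 16-bit column")
        return pos

    def prepend(self, child: str) -> None:
        # Assumption: the new child goes one slot before the current first child.
        lowest = min(self.rows.values(), default=0)
        self.rows[child] = self._check(lowest - 1)

    def pop_last(self) -> str:
        # Removing from the end never raises the minimum position.
        child = max(self.rows, key=self.rows.__getitem__)
        del self.rows[child]
        return child

    def renormalize(self) -> None:
        # Rewrite positions as 0..n-1 while preserving order.
        ordered = sorted(self.rows, key=self.rows.__getitem__)
        self.rows = {child: i for i, child in enumerate(ordered)}
```

Used as a bounded FIFO queue (prepend, then pop from the end), the 32,769th prepend is the first one that no longer fits.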



TallJimbo · 2024-04-09T15:43:01Z

python/lsst/daf/butler/_butler.py

@@ -1770,3 +1803,69 @@ def prepend_collection_chain(
        transactions short.
        """
        raise NotImplementedError()
+
+    @abstractmethod
+    def extend_collection_chain(


I'm not sure which I prefer, but I'll throw it out there as food for thought: what about merging the prepend_, extend_, redefine_, and remove_from_ methods into a single modify_ method with a mode: Literal["prepend", "extend", "remove", "set"] argument? That's closer to the CLI and it slightly reduces the number of Butler methods.

I also had that thought... but I'm concerned we're going to eventually add options that only apply to some subset of the modes. Then we'll end up where we are with a lot of the other Butler methods, where it takes multiple pages of documentation and checks to explain which combinations of parameters are legal together.

One other idea I had was maybe naming them collection_chain_* instead of *_collection_chain, so they group together a little better in the docs. But the rest of the butler methods are verb-first.
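A sketch of the single-method alternative, to make the trade-off concrete. modify_collection_chain and its skip_missing flag are hypothetical; neither is part of this PR. skip_missing stands in for the kind of mode-specific option anticipated in the reply, and shows the cross-parameter validation that separate methods avoid.

```python
from typing import Literal

Mode = Literal["prepend", "extend", "remove", "set"]


def modify_collection_chain(
    chain: list[str], children: list[str], mode: Mode, *, skip_missing: bool = False
) -> list[str]:
    """Hypothetical merged entry point; returns the new list of children."""
    if skip_missing and mode != "remove":
        # The cross-parameter check that separate methods make unnecessary.
        raise TypeError("skip_missing is only valid with mode='remove'")
    if mode == "prepend":
        return children + [c for c in chain if c not in children]
    if mode == "extend":
        return [c for c in chain if c not in children] + children
    if mode == "remove":
        missing = [c for c in children if c not in chain]
        if missing and not skip_missing:
            raise KeyError(f"not in chain: {missing}")
        return [c for c in chain if c not in children]
    if mode == "set":
        return list(children)
    raise ValueError(f"unknown mode: {mode!r}")
```

Every new option adds another branch to that leading check, which is the documentation burden the reply is worried about.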


TallJimbo · 2024-04-09T15:49:23Z

python/lsst/daf/butler/registry/collections/_base.py

+        # registry.setCollectionChain(
+        #     parent,
+        #     registry.getCollectionChain(parent)
+        # )


Collection chains are very frequently LIFO stacks, but I can't think of any context in which one would be used as a FIFO queue. So I think that migration script can be very low priority.

TallJimbo · 2024-04-09T16:13:57Z

python/lsst/daf/butler/registry/collections/_base.py

+
+        child_records = self.resolve_wildcard(
+            CollectionWildcard.from_names(child_collection_names), flatten_chains=False
+        )


It seems a little wasteful to query for child rows both here and in the sanity check, though I don't know if that's likely to matter.

But more importantly, it seems like we need to move these queries after the row-level locks. As is, couldn't another process modify the chain in between the queries for its contents and the point at which we acquire the lock? Or is it that the atomic operations are precisely those that don't use the results of those queries? If that's the case, I think some comments would help.

It seems a little wasteful to query for child rows both here and in the sanity check, though I don't know if that's likely to matter.

It's definitely wasteful but I think not to an extent that is worth worrying about... in any case it's no slower than the previous version because all of this was re-used from the existing implementation.

The sanity check one is restricted to only type==CHAINED and is recursive so it's sort of a different query. I think the way these two queries are done will change up some more when we add the recursive query thing to resolve_wildcard... all we need from the second query is the IDs corresponding to the names and it does more work than necessary right now.

As is, couldn't another process modify the chain in between the queries for its contents and the point at which we acquire the lock?

These aren't queries for the chain contents -- they're queries about the user-provided list of new child collections, and do not involve the parent collection.

The lock is only on the parent collection. So moving these inside the lock wouldn't buy us anything and would make the critical section longer.
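A sketch of that lock placement, with a threading.Lock per parent standing in for the database row lock on the parent collection. ToyRegistry, its integer IDs, and extend_chain are hypothetical. Resolving the new children's names to IDs does not involve the parent, so it happens before the lock is taken; only the read-modify-write of the parent's chain is inside the critical section.

```python
import threading
from collections import defaultdict


class ToyRegistry:
    """Hypothetical registry: one lock per parent, like a row lock on the parent."""

    def __init__(self, names: list[str]) -> None:
        self._ids = {name: i for i, name in enumerate(names)}
        self._chains: dict[str, list[int]] = defaultdict(list)
        self._locks: dict[str, threading.Lock] = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()

    def _lock_for(self, parent: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks[parent]

    def extend_chain(self, parent: str, children: list[str]) -> None:
        # Name -> ID lookup only concerns the new children, so it runs
        # outside the critical section to keep the lock held briefly.
        child_ids = [self._ids[name] for name in children]
        with self._lock_for(parent):
            # Read-modify-write of the parent's chain, serialized per parent.
            current = [c for c in self._chains[parent] if c not in child_ids]
            self._chains[parent] = current + child_ids

    def chain(self, parent: str) -> list[int]:
        return list(self._chains[parent])
```

Because every writer re-reads the chain under the lock, concurrent extends do not overwrite each other's changes.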

For the child name -> ID query, the only concurrent change that affects anything is deleting one of the children (or delete and replace with a new collection of the same name). In those cases the foreign key constraint will prevent us from inserting a child that existed at the time we looked up its ID but no longer exists.
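A runnable illustration of the foreign-key argument using SQLite from the standard library. The two-table schema is a simplified, hypothetical stand-in for the registry tables, and the "other process" is simulated by a DELETE on the same connection. The child ID is looked up first, the child is then deleted, and the insert using the stale ID is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript(
    """
    CREATE TABLE collection (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE collection_chain (
        parent INTEGER NOT NULL REFERENCES collection(id),
        child INTEGER NOT NULL REFERENCES collection(id),
        position INTEGER NOT NULL,
        PRIMARY KEY (parent, child)
    );
    """
)
conn.executemany(
    "INSERT INTO collection (id, name) VALUES (?, ?)", [(1, "chain"), (2, "run")]
)

# Step 1 (outside any lock): resolve the child's name to an ID.
(child_id,) = conn.execute(
    "SELECT id FROM collection WHERE name = ?", ("run",)
).fetchone()

# Meanwhile, another process deletes the child collection.
conn.execute("DELETE FROM collection WHERE id = ?", (child_id,))

# Step 2: inserting the stale ID into the chain violates the foreign key.
try:
    conn.execute(
        "INSERT INTO collection_chain (parent, child, position) VALUES (?, ?, ?)",
        (1, child_id, 0),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```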

The sanity check thing is hairier, and as I mentioned in the comment does not guarantee that cycles cannot exist. To guarantee no cycles we'd have to recursively lock all child chained collections as well. The locking order is tricky to get right and locking multiple collections makes the availability issues with this approach worse. It's isomorphic to the locking for the denormalized, flattened chain table we discussed the other week. (That table would actually be a good way to enforce this... we could have a check constraint that parent != child.) But in any case just locking the parent doesn't help any -- this query wants to know if the parent is a recursive child of any of the children. Locking the parent only prevents its children from changing and we don't care about the parent's children.
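A sketch of what the cycle sanity check has to answer: is the parent reachable from any of the new children by following CHAINED links? would_create_cycle and the dict-of-lists representation are hypothetical. As the comment says, this only checks a snapshot; without also locking the child chains, another process could change them between the check and the insert.

```python
def would_create_cycle(
    parent: str, new_children: list[str], chains: dict[str, list[str]]
) -> bool:
    """Return True if `parent` is reachable from any new child via CHAINED links."""
    stack = list(new_children)
    seen: set[str] = set()
    while stack:
        name = stack.pop()
        if name == parent:
            return True
        if name in seen:
            continue
        seen.add(name)
        # Only CHAINED collections appear as keys; other collections are leaves.
        stack.extend(chains.get(name, []))
    return False
```

Note that the parent's own children never enter the search, which is why locking only the parent does not help this check.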

I think some comments would help.

Yeah I'll add a bit more info here.

dhirving force-pushed the tickets/DM-43315 branch from bca6e94 to 7d96ed7 April 5, 2024 21:24

dhirving commented Apr 5, 2024

dhirving marked this pull request as ready for review April 5, 2024 21:40

TallJimbo approved these changes Apr 9, 2024

dhirving and others added 12 commits April 9, 2024 11:21

Make update_chain consistent with prepend

3a3101d

Reshuffle which parts of the logic are executed in setCollectionChain vs update_chain to make update_chain more similar to prepend_collection_chain.

Deduplicate collection modification

f8cfbc9

Factor out the common pieces of update_chain and prepend_collection_chain to a context manager method.
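A sketch of the kind of refactor this commit describes: one context manager owns the shared steps (take the lock, load the current children, write them back), and each operation only expresses its own change. ToyChainStore and _modify_chain are hypothetical names, and a single store-wide threading.Lock is a simplification of a per-parent database lock.

```python
import threading
from collections.abc import Iterator
from contextlib import contextmanager


class ToyChainStore:
    """Hypothetical store sharing one read-modify-write context manager."""

    def __init__(self) -> None:
        self._chains: dict[str, list[str]] = {}
        self._lock = threading.Lock()

    @contextmanager
    def _modify_chain(self, parent: str) -> Iterator[list[str]]:
        # Shared steps: lock, load the current children, and write back the
        # list only if the with-block exits without raising.
        with self._lock:
            children = list(self._chains.get(parent, []))
            yield children
            self._chains[parent] = children

    def prepend(self, parent: str, new: list[str]) -> None:
        with self._modify_chain(parent) as children:
            children[:] = new + [c for c in children if c not in new]

    def extend(self, parent: str, new: list[str]) -> None:
        with self._modify_chain(parent) as children:
            children[:] = [c for c in children if c not in new] + new

    def get(self, parent: str) -> list[str]:
        return list(self._chains.get(parent, []))
```

If the body of the with-block raises, the exception propagates out of the generator before the write-back, so the stored chain is left unchanged.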

Re-organize collection chain methods

f110440

Group all of the top-level collection chain methods together

Pull out reusable code to test collection chain

a0e4779

Implement atomic collection chain remove

139bf42

Use atomic remove for collection chain CLI

b65fc4e

Use atomic remove for collection pop CLI

4d1f138

This makes it somewhat more concurrency-safe: at least now it won't nuke any concurrent changes by doing a complete rewrite of the chain. Also added a nicer error message for out-of-bounds indexes.
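A sketch of the approach this commit describes for the pop CLI: translate the requested indexes into child names, then remove only those names, rather than writing back a whole modified copy of the chain. pop_by_index and remove_from_chain are hypothetical helpers, and accepting only non-negative indexes is an assumption about the CLI's behavior.

```python
def remove_from_chain(chain: list[str], names: list[str]) -> None:
    """Stand-in for an atomic removal: drops only the named children."""
    chain[:] = [c for c in chain if c not in names]


def pop_by_index(chain: list[str], indices: list[int]) -> list[str]:
    """Hypothetical pop helper: map indexes to names, then remove by name."""
    snapshot = list(chain)
    names = []
    for index in indices:
        # Assumption: only non-negative indexes are accepted.
        if not 0 <= index < len(snapshot):
            raise IndexError(
                f"Index {index} is out of bounds for a chain of length {len(snapshot)}"
            )
        names.append(snapshot[index])
    # Removing by name leaves any concurrently added children in place,
    # unlike writing the whole snapshot back minus the popped entries.
    remove_from_chain(chain, names)
    return names
```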

Add atomic collection chain extend

be68211

Add redefine chain method to Butler

2baf65a

To complete the new collection chain modification interface for Butler, add a function to replace setCollectionChain.

Add towncrier

fc0d98d

Fix type specifiers for sphinx

314d7fb

Co-authored-by: Jim Bosch <jbosch@astro.princeton.edu>

Add some comments to clarify chain modification

3ffbb0e

dhirving force-pushed the tickets/DM-43315 branch from 0d6769e to 3ffbb0e April 9, 2024 18:21

dhirving merged commit 39b4492 into main Apr 9, 2024
18 checks passed

dhirving deleted the tickets/DM-43315 branch April 9, 2024 20:03
