DM-31366: Add confirmation request to butler prune-collection #561

n8pease · 2021-08-18T15:38:05Z

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

timj

Looks great. Only real comment is about reporting non-RUN collections that are being cleared out.

timj · 2021-08-18T15:40:47Z

python/lsst/daf/butler/cli/cmd/commands.py

+pruneCollection_willRemoveMsg = "The following collections will be removed:"
+pruneCollection_askContinueMsg = "Continue?"
+pruneCollection_didRemoveCollections = "Removed collections."
+pruneCollection_aborted = "Aborted."


What's the motivation for these being defined here and not embedded directly in the print messages below? I don't see them being used multiple times.

I think it promotes reuse in the future. I also think it's easier to proofread the user-facing text when it's in one place. If you want, I can embed it though (?)

I see now that you did the same thing for pruneDatasets. I'm not sure it does promote reuse, though, given the pruneCollection_ prefix and the fact that there is also a pruneDatasets_askContinueMsg = "Continue?". Wouldn't the case for reuse be significantly stronger if the "Continue?" variable were reused for both commands? Now there are two sets of these variables with no code reuse coming out of it. Can you at least assign the value of pruneCollection_askContinueMsg from the pruneDatasets one?

The strongest argument for doing this would be if we were internationalizing the commands, such that we could switch to Spanish based on the locale setting, but I imagine there is some other infrastructure we should be using for that (GNU gettext?). Are you interested in I18N?

I don't think I want to share dialog snippets between commands; the need for subtle differences can become difficult to manage (I've experienced this mostly in the shared option help text). But it's an interesting point that both commands implement a verify-do/abort pattern that could be worth factoring into a separate shared implementation. Let's do it when we have a 3rd use case ;-). In the meantime I'll remove the separate definition here.
I crossed paths only a little with I18N ages ago when I was doing more automated testing work. I'm not specifically interested in it, but am also happy to do work that needs to get done.
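
For illustration, the verify-do/abort pattern mentioned above could be factored into a shared helper along the following lines. This is only a sketch and not code from this PR: `confirm_then_run`, its parameters, and the injected `ask`/`out` callables are hypothetical names chosen for the example, and the prompt text is simply copied from the messages in the diff.

```python
from typing import Callable, Iterable


def confirm_then_run(
    will_remove_msg: str,
    items: Iterable[str],
    action: Callable[[], None],
    done_msg: str = "Removed collections.",
    aborted_msg: str = "Aborted.",
    ask: Callable[[str], str] = input,
    out: Callable[[str], None] = print,
) -> bool:
    """Show what will be removed, ask for confirmation, then act or abort.

    Returns True if the action ran, False if the user declined.
    """
    out(will_remove_msg)
    for item in items:
        out(f"  {item}")
    # Anything other than an explicit yes counts as "no" (the default is N).
    answer = ask("Continue? [y/N]: ").strip().lower()
    if answer in ("y", "yes"):
        action()
        out(done_msg)
        return True
    out(aborted_msg)
    return False
```

Passing `ask` and `out` as arguments keeps the helper testable without a terminal; a command would only need to supply its own messages and removal function.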

timj · 2021-08-18T15:45:15Z

python/lsst/daf/butler/script/pruneCollection.py

+    confirm : `bool`
+        Get results for what would be removed and return the results for
+        display & confirmation, with a completion function to run after
+        confirmation.
    """


The return value needs to be documented. The confirm docs talk more about the return parameter than about the boolean itself.

timj · 2021-08-18T15:46:55Z

python/lsst/daf/butler/script/pruneCollection.py

+    def __init__(self, confirm):
+        # if `confirm == True`, will contain the astropy table describing data
+        # that will be removed.
+        self.removeTable = None


You are using type annotations elsewhere in this file so might be worth adding some here and to the pruneCollection definition below.
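
As a rough sketch of what the two requests above (a documented return value and type annotations) might look like together: the class and function below are hypothetical stand-ins, not the code that was merged. `PruneResultSketch`, `prune_sketch`, `onConfirmation`, and the `remove` callable are assumed names, and the table is typed as `Any` so the example does not depend on astropy.

```python
from typing import Any, Callable, List, Optional


class PruneResultSketch:
    """Hypothetical result object returned by ``prune_sketch``."""

    def __init__(self, confirm: bool) -> None:
        # True if the caller should display what will be removed and ask first.
        self.confirm: bool = confirm
        # When confirm is True, describes the data that will be removed.
        self.removeTable: Optional[Any] = None
        # Completion function to call once the user has confirmed.
        self.onConfirmation: Optional[Callable[[], None]] = None


def prune_sketch(
    names: List[str], confirm: bool, remove: Callable[[str], None]
) -> PruneResultSketch:
    """Remove collections, optionally deferring removal until confirmed.

    Parameters
    ----------
    names : `list` [`str`]
        Names of the collections to remove.
    confirm : `bool`
        If `True`, remove nothing yet; describe what would be removed instead.
    remove : callable
        Function that removes a single collection by name (an assumption).

    Returns
    -------
    result : `PruneResultSketch`
        If ``confirm`` is `True`, ``removeTable`` lists what will be removed
        and ``onConfirmation`` performs the removal; otherwise both are `None`.
    """
    result = PruneResultSketch(confirm)

    def do_remove() -> None:
        for name in names:
            remove(name)

    if confirm:
        result.removeTable = list(names)
        result.onConfirmation = do_remove
    else:
        do_remove()
    return result
```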

timj · 2021-08-18T15:54:23Z

python/lsst/daf/butler/script/pruneCollection.py

+            show_uri=False,
+        )
+
+        collections: Dict[str, int] = defaultdict(lambda: 0)


I think we may want to seed this dict with the output of registry.queryCollections() on that collection with includeChains, because we want to be able to report that chained collections will be deleted as well as RUNs. If you prune a chained collection and that collection includes another chained collection, I don't think it will turn up in the list at the moment. We should make it appear with ndatasets = 0 or something (or maybe saying "CHAIN" or "CALIBRATION"), or else add all the chain/calibration collections at the end.

Interesting, you're saying that butler.registry.queryDatasets will expand the first chained collection, but not subsequent chained collections?

No. I don't think I'm saying that at all. I'm saying that datasets are by definition only associated with RUN collections. You are therefore only reporting RUN collections in your summary even though there may be some CHAINED collections also being removed. I'm just asking that you report those chained collections.

@timj please have a look & make sure this does what you want now, thanks!

Perfect:

$ butler prune-collection ../pipelines_check/DATA_REPO_WITH_COMP/ demo_collection_exe
Searching collections...
The following collections will be removed:
Collection                   Collection Type Number of Datasets
---------------------------- --------------- ------------------
demo_collection_exe          CHAINED         -
demo_collection_exe/YYYYMMDD RUN             21
HSC/calib                    CALIBRATION     -
HSC/raw/all                  RUN             1
refcats                      RUN             2
Continue? [y/N]:

That's a complete report of the scary deletes that are about to happen.
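
The reporting logic discussed in this thread can be sketched as follows. This is an illustrative assumption about the approach, not the merged implementation: `summarize` is a hypothetical function, and its two inputs stand in for whatever the registry queries return (every collection with its type, chains included, and the RUN name of each dataset found).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize(
    collections: Iterable[Tuple[str, str]],
    dataset_runs: Iterable[str],
) -> List[Tuple[str, str, str]]:
    """Return (name, type, count) rows for every collection to be removed.

    ``collections`` holds (name, type) pairs for all collections found,
    including chained ones; ``dataset_runs`` holds one RUN name per dataset.
    """
    counts = Counter(dataset_runs)
    rows = []
    for name, ctype in collections:
        # Datasets belong only to RUN collections; other types get a dash.
        count = str(counts[name]) if ctype == "RUN" else "-"
        rows.append((name, ctype, count))
    return rows


# Example input mirroring the report shown above.
found = [
    ("demo_collection_exe", "CHAINED"),
    ("demo_collection_exe/YYYYMMDD", "RUN"),
    ("HSC/calib", "CALIBRATION"),
    ("HSC/raw/all", "RUN"),
    ("refcats", "RUN"),
]
runs = ["demo_collection_exe/YYYYMMDD"] * 21 + ["HSC/raw/all"] + ["refcats"] * 2
for row in summarize(found, runs):
    print(*row)
```

Seeding the rows from the full collection list, rather than from the datasets alone, is what makes the CHAINED and CALIBRATION entries appear even though no datasets are counted against them.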

timj · 2021-08-30T19:45:16Z

python/lsst/daf/butler/script/pruneCollection.py

-        raise TypeError(
-            f"Cannot prune {e.collectionType} collection {e.collectionType.name} with --purge.") from e
+    class CollectionInfo:
+        def __init__(self, count, type=None):


Can you please put a little bit of documentation in this class? In particular, type annotations. Note that if you turn mypy on for this file it really isn't going to be happy about self.count being an int or a string when code later on assumes that it will be just fine to increment it. Given that it's a private class you can get away with it but you'd probably want to write this differently once we go to mypy everywhere (in some future dream land).

Added docs & type annotations. Let me know if it looks OK. During development I temporarily turned off the disallow_untyped_defs exception for the scripts folder and made sure there were no reported untyped defs for this file, so hopefully this file is totally taken care of now.

timj · 2021-08-31T16:20:54Z

python/lsst/daf/butler/script/pruneCollection.py

+    class CollectionInfo:
+        """Lightweight container to hold the type of collection and the number
+        of datasets in the collection if applicable."""
+        def __init__(self, count: Union[None, int], type: Union[str]):


You can write the first one as Optional[int] and why is the second one a union at all?

timj · 2021-08-31T16:21:38Z

python/lsst/daf/butler/script/pruneCollection.py

+        of datasets in the collection if applicable."""
+        def __init__(self, count: Union[None, int], type: Union[str]):
+            self.count = count
+            self.type: str = type


Does it need this type annotation if you are annotating the argument? You could simplify this by using a dataclass

The type annotation was vestigial, thanks for catching it (too bad mypy does not warn about the duplicate declaration).

I did not know about dataclass (and had lamented that namedtuple is (of course) immutable). Thanks for letting me know about that.
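
A minimal sketch of the dataclass suggestion, assuming the same two fields as the `CollectionInfo` class in the diff above. The `add_datasets` method is a hypothetical addition for this example; it shows one way to keep incrementing an `Optional[int]` count acceptable to a type checker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CollectionInfo:
    """Lightweight container for a collection's dataset count and type.

    ``count`` is None for collection types that hold no datasets directly.
    """

    count: Optional[int]
    type: str

    def add_datasets(self, n: int = 1) -> None:
        # Treat a missing count as zero so the addition is always int + int.
        self.count = (self.count or 0) + n


chain = CollectionInfo(count=None, type="CHAINED")
run = CollectionInfo(count=0, type="RUN")
run.add_datasets()
run.add_datasets(2)
```

Unlike a namedtuple, a dataclass instance is mutable by default, and the generated `__init__` takes its argument annotations from the field declarations, so no separate attribute annotation is needed.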

timj approved these changes Aug 18, 2021


n8pease force-pushed the tickets/DM-31366 branch from 749e31e to 9dc4d10 Compare August 30, 2021 17:39

timj reviewed Aug 30, 2021


n8pease force-pushed the tickets/DM-31366 branch from 9dc4d10 to f8f4825 Compare August 31, 2021 15:53

timj reviewed Aug 31, 2021


whitespace changes

70037da

n8pease force-pushed the tickets/DM-31366 branch from f8f4825 to 1c4257e Compare September 2, 2021 16:32

n8pease added 2 commits September 2, 2021 11:47

add confirmation step to 'butler prune-collection'

7f93694

add news fragment

67bb02b

n8pease force-pushed the tickets/DM-31366 branch from 1c4257e to 67bb02b Compare September 2, 2021 16:47

n8pease merged commit 4ccda24 into master Sep 2, 2021

n8pease deleted the tickets/DM-31366 branch September 2, 2021 18:02


timj Aug 18, 2021

n8pease Aug 27, 2021

timj Aug 27, 2021

n8pease Aug 30, 2021

timj Aug 18, 2021

timj Aug 18, 2021

timj Aug 18, 2021

n8pease Aug 19, 2021

timj Aug 19, 2021

n8pease Aug 30, 2021

timj Aug 30, 2021

timj Aug 30, 2021

n8pease Aug 31, 2021

timj Aug 31, 2021

timj Aug 31, 2021

n8pease Sep 2, 2021 •

edited
