DM-38210: Use butler.get() rather than deprecated getDirect() #227

timj · 2023-03-03T18:03:30Z

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

pruneCollections was deprecated and removed.

timj · 2023-03-30T20:55:53Z

python/lsst/ctrl/mpexec/singleQuantumExecutor.py

@@ -582,7 +582,7 @@ def writeMetadata(
                    ref = ref.unresolved()
                self.butler.put(metadata, ref)


@TallJimbo this fails if we don't unresolve the ref because of this test code for replace run in test_cmdLineFwk:

# re-run with --replace-run (--inputs is ignored, as long as it hasn't # changed) args.replace_run = True args.output_run = "output/run2" fwk.runPipeline(copy.deepcopy(qgraph), taskFactory, args)

and you get a resolved ref for run output/run1 being put into output/run2 and it failing because it wants to put it into output/run1.

@andy-slac Interestingly we also get another failure in the simplest possible test:

qgraph = fwk.makeGraph(self.pipeline, args) # run whole thing fwk.runPipeline(qgraph, taskFactory, args)

where the problem is:

'output/20230330T204714Z' != 'output/20230330T204715Z'

with error: No collection with name 'output/20230330T204714Z' found. which makes me wonder how we are generating two different versions of that timestamped output run.

Of course that second test failure with the timestamp is random because it depends on timing. I can also get testSimpleQGraphClobberOutputs to fail with the same timestamp mismatch as testSimpleQGraph. (it's the _metadata dataset each time)

If we don't unresolve refs then in runPipeline we should probably take output run from QuantumGraph. But I'm not sure how this will interact with all other options.

As I say on Slack, recalculating the graph each time in the --replace-run test does work. This is going to have to be required unless --replace-run calls some kind of method on the graph to recalculate all the output dataset refs with a new run (and then you lose provenance in the graph that is on disk).

Now that put accepts a DatasetRef without question, we can no longer reuse the same graph when we change the output run.

TallJimbo

I expected to see some changes that would prevent the --replace-run problem from happening in real usage (not just tests). Did I just misunderstand whether that was necessary?

If not, could you just have it raise if you pass --replace-run while providing an existing QG?

timj · 2023-03-31T19:17:40Z

I deliberately left the behavior the same (ie unresolving the DatasetRef with full butler). This allows replace-run to work as before. If I change this code to raise if the ref run and butler run are different that's a change in behavior that I assumed we wanted to avoid until @andy-slac tackled the removal of unresolved dataset ref (and is wrapped up in my comment on the jira ticket as future work).

TallJimbo · 2023-03-31T20:42:40Z

Ah, sounds good - so we don't need to change anything now, but we will in order to drop unresolved refs.

timj · 2023-03-31T22:54:46Z

tests/test_cmdLineFwk.py

@@ -762,7 +761,8 @@ def testSimpleQGraphReplaceRun(self):
        # changed)
        args.replace_run = True
        args.output_run = "output/run2"
-        fwk.runPipeline(copy.deepcopy(qgraph), taskFactory, args)
+        qgraph = fwk.makeGraph(self.pipeline, args)


I made this change to demonstrate the fix but I wonder if we are punting to fixing this properly with unresolved ref removal that maybe I should not include this change here. @andy-slac would you rather this test failed once you remove unresolved refs or are you happy that this is the right fix for replace-run testing anyhow and so should keep this fix?

I'm OK with this change. I guess once we have the ability to rewrite output run in a graph we will add a new unit test for that.

timj added 3 commits March 3, 2023 10:59

Use butler.get() rather than deprecated getDirect()

a4d777f

Use Butler.put instead of now deprecated putDirect

eb60108

Use the safer and more explicit removeRuns API

d712735

pruneCollections was deprecated and removed.

timj commented Mar 30, 2023

View reviewed changes

Replace-run test requires we recreate the graph now

e6a9c8b

Now that put accepts a DatasetRef without question, we can no longer reuse the same graph when we change the output run.

TallJimbo approved these changes Mar 31, 2023

View reviewed changes

timj commented Mar 31, 2023

View reviewed changes

timj merged commit 70ca830 into main Apr 3, 2023
9 checks passed

timj deleted the tickets/DM-38210 branch April 3, 2023 18:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DM-38210: Use butler.get() rather than deprecated getDirect() #227

DM-38210: Use butler.get() rather than deprecated getDirect() #227

timj commented Mar 3, 2023 •

edited

timj Mar 30, 2023

timj Mar 30, 2023 •

edited

andy-slac Mar 30, 2023

timj Mar 30, 2023

TallJimbo left a comment

timj commented Mar 31, 2023

TallJimbo commented Mar 31, 2023

timj Mar 31, 2023

andy-slac Mar 31, 2023

		@@ -582,7 +582,7 @@ def writeMetadata(
		ref = ref.unresolved()
		self.butler.put(metadata, ref)

DM-38210: Use butler.get() rather than deprecated getDirect() #227

DM-38210: Use butler.get() rather than deprecated getDirect() #227

Conversation

timj commented Mar 3, 2023 • edited

Checklist

timj Mar 30, 2023

Choose a reason for hiding this comment

timj Mar 30, 2023 • edited

Choose a reason for hiding this comment

andy-slac Mar 30, 2023

Choose a reason for hiding this comment

timj Mar 30, 2023

Choose a reason for hiding this comment

TallJimbo left a comment

Choose a reason for hiding this comment

timj commented Mar 31, 2023

TallJimbo commented Mar 31, 2023

timj Mar 31, 2023

Choose a reason for hiding this comment

andy-slac Mar 31, 2023

Choose a reason for hiding this comment

timj commented Mar 3, 2023 •

edited

timj Mar 30, 2023 •

edited