DM-20845: Support re-run of pipetask on the same output collection #25

andy-slac · 2019-09-17T17:43:39Z

With the --skip-existing option we now also support skipping Quanta at run time, so the same QGraph can be executed again after an unfinished previous attempt and will effectively resume from the point where it stopped. As before, --skip-existing during QGraph generation means skipping Quanta whose outputs already exist.

The new option --clobber-output can be used to overwrite datasets that already exist in the output collection:

when generating QGraph all existing outputs are ignored (including regular outputs and initOutputs),
when executing QGraph the existing outputs are removed prior to Quantum execution.
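
The run-time behaviour of the two options described above can be sketched with a toy in-memory model. This is only an illustration: `ToyQuantum`, `ToyCollection` and `run_quantum` are hypothetical names, not part of ctrl_mpexec or the Butler API. Two points are assumptions not stated in this PR: a Quantum is skipped only when all of its outputs exist, and skipping is checked before clobbering.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQuantum:
    """Hypothetical stand-in for a Quantum: a task name plus output dataset names."""
    task_name: str
    outputs: list


@dataclass
class ToyCollection:
    """Hypothetical stand-in for an output collection: a set of dataset names."""
    datasets: set = field(default_factory=set)


def run_quantum(quantum, collection, *, skip_existing=False, clobber_output=False):
    """Return "skipped" or "executed" for one Quantum.

    Assumption: skip only if every output exists; check skip before clobber.
    """
    existing = [name for name in quantum.outputs if name in collection.datasets]
    if skip_existing and existing and len(existing) == len(quantum.outputs):
        return "skipped"
    if existing and not clobber_output:
        raise RuntimeError(f"outputs already exist for {quantum.task_name}: {existing}")
    # With clobber_output the existing outputs are removed prior to execution.
    for name in existing:
        collection.datasets.discard(name)
    # "Execution" in this toy model just records the outputs as written.
    collection.datasets.update(quantum.outputs)
    return "executed"
```

Running the same Quantum a second time raises in this model unless one of the two flags is given, which mirrors the motivation for adding them.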

Adds code to build a simple quantum graph with a trivial task and a mock Butler, but a real in-memory sqlite registry. Adds a new unit test that executes that graph and checks the resulting task outputs.

TallJimbo · 2019-09-18T16:21:19Z

python/lsst/ctrl/mpexec/mpGraphExecutor.py

@@ -125,7 +129,7 @@ def _executeQuantaMP(self, iterable, butler, taskFactory):

            # Add it to the pool and remember its result
            _LOG.debug("Sumbitting %s", qdata)
-            args = (taskDef.taskClass, taskDef.config, qdata.quantum, butler, taskFactory)
+            args = (taskDef.taskClass, taskDef.config, qdata.quantum, butler, taskFactory, self.skipExisting)


This is getting to be a lot of positional arguments. Maybe pass as kwargs instead (and maybe even require some/all of them to be kwargs in the signature)?

I changed the method to accept kwargs only.
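
For readers unfamiliar with the idiom, a keyword-only signature looks roughly like the sketch below. The function name `execute_task` and its parameter names are hypothetical; the actual signature adopted in this PR is not shown in the thread.

```python
def execute_task(*, task_class, config, quantum, butler, task_factory, skip_existing=False):
    """Toy function whose arguments must all be passed by keyword.

    The bare ``*`` makes every following parameter keyword-only, so a caller
    cannot silently swap two positional arguments.
    """
    return {
        "task_class": task_class,
        "config": config,
        "quantum": quantum,
        "butler": butler,
        "task_factory": task_factory,
        "skip_existing": skip_existing,
    }


# The caller builds a dict instead of a positional tuple.
kwargs = dict(task_class="AddTask", config=None, quantum="q0",
              butler=None, task_factory=None, skip_existing=True)
result = execute_task(**kwargs)
```

A positional call such as `execute_task("AddTask", None, "q0", None, None, True)` raises `TypeError`. With a multiprocessing pool, such a dict can be passed through the `kwds` argument of `Pool.apply_async`.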

TallJimbo · 2019-09-18T16:25:02Z

python/lsst/ctrl/mpexec/preExecInit.py

+        types against the types of objects produced by tasks. Ideally we
+        would like to check that object data is identical too but presently
+        there is no generic way to compare objects. In the future we can
+        potentially introduce some extensible mechanism for that.


I'm now wondering if we should just explicitly avoid this whole problem with initOutputs and say that you have to use --skip-init-writes if you want to use --skip-existing when running (but maybe not when just building a graph).

Potentially we can enable --skip-init-writes ourselves. The real question is whether we want to verify that existing initOutputs in the butler are compatible with what the pipeline expects. I think there is some usefulness in that, even if we cannot currently verify object contents for all types of objects. It may also be better to verify that initOutputs exist when we do --skip-init-writes, to avoid crashes downstream.

I was thinking that forcing the user to pass --skip-init-writes explicitly was a way of emphasizing to them that no checking would be done and hence they were responsible for guaranteeing consistency.
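
The type-only compatibility check discussed in this thread could look roughly like the following. It is a minimal sketch under assumptions: `check_init_output` is a hypothetical helper, not code from preExecInit.py, and it compares exact types only, since the thread notes there is no generic way to compare object contents.

```python
def check_init_output(name, stored, produced):
    """Return True if a stored initOutput can be reused, False if none is stored.

    Raises TypeError when the stored object's type differs from the type of
    the object the task produces. Contents are deliberately not compared.
    """
    if stored is None:
        # Nothing in the store yet, so the caller still has to write it.
        return False
    if type(stored) is not type(produced):
        raise TypeError(
            f"initOutput {name!r}: stored type {type(stored).__name__} "
            f"does not match produced type {type(produced).__name__}"
        )
    return True
```

Note that two objects of the same type with different contents pass this check, which is exactly the gap the reviewers are debating.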

TallJimbo · 2019-09-18T16:26:23Z

python/lsst/ctrl/mpexec/singleQuantumExecutor.py

@@ -67,6 +70,9 @@ def execute(self, taskClass, config, quantum):
            Single Quantum instance.
        """
        self.setupLogging(taskClass, config, quantum)
+        if self.skipExisting and self.quantumOutputsExist(quantum):
+            _LOG.info("Quantum execution skipped due to existing outputs.")


It would be good to include quantum.dataId and quantum.taskName in this log message.

TallJimbo · 2019-09-18T16:34:42Z

python/lsst/ctrl/mpexec/preExecInit.py

+                    if ref is not None:
+                        # It is not enough to remove dataset from collection,
+                        # it has to be removed from butler too.
+                        self.butler.remove(ref)


Actually deleting the dataset is probably okay if it was originally added to only this collection. But it's actually a big problem if it's already part of some other collection, and it wouldn't even be necessary if it was originally part of some other collection. Of course, the usual case is the one you're guarding against, where not deleting it will definitely cause problems, so what you have here is probably the best thing to do right now. But it suggests to me that we really cannot let templates depend only on collection and not run, and when we fix that we should only disassociate (and maybe then "garbage collect" unassociated datasets) here.
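
The "disassociate, then garbage collect" idea can be illustrated with a toy model in which collections are plain sets of dataset IDs. Nothing here is Butler or registry API; `disassociate` and the data layout are hypothetical and only show the bookkeeping involved.

```python
def disassociate(collections, collection, dataset_id):
    """Remove dataset_id from one collection in a toy membership map.

    ``collections`` maps a collection name to a set of dataset IDs.
    Returns True when the dataset is no longer a member of any collection,
    meaning it would be safe to garbage collect in this model.
    """
    collections[collection].discard(dataset_id)
    return not any(dataset_id in members for members in collections.values())


collections = {"run1": {"ds1", "ds2"}, "shared": {"ds2"}}
orphaned_ds1 = disassociate(collections, "run1", "ds1")  # only in run1
orphaned_ds2 = disassociate(collections, "run1", "ds2")  # still in "shared"
```

In this model `ds1` becomes an orphan and could be deleted, while `ds2` must be kept because another collection still refers to it.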

TallJimbo · 2019-09-18T16:41:41Z

python/lsst/ctrl/mpexec/preExecInit.py

                    # check if it is there already
                    _LOG.debug("Retrieving InitOutputs for task=%s key=%s dsTypeName=%s",
                               task, name, attribute.name)
                    objFromStore = self.butler.get(attribute.name, {})
                    if objFromStore is not None:
-                        # types are supposed to be identical
+                        # Types are supposed to be identical.
+                        # TODO: Check that object contets is identical too.


Typo: contents

TallJimbo · 2019-09-18T16:48:13Z

tests/testUtil.py

@@ -207,6 +220,15 @@ def makeSimpleQGraph(nQuanta=5, pipeline=None):
    pipeline : `~lsst.pipe.base.Pipeline`
        If `None` then one-task pipeline is made with `AddTask` and
        default `AddTaskConfig`.
+    butler : `~lsst.daf.butler.Butler`, optional
+        Data butler instance, this should be an instance retruned from a


typo: returned

This option allows one to restart execution after a failure in the middle of graph execution (and after fixing the issue). It skips writing initOutputs that already exist. It also skips execution of all quanta that already have their output data in the output collection.

andy-slac added 4 commits September 17, 2019 01:15

Update example code for changes in pipelines API

8b36294

Add actual handling of --profile option

c1f7eca

Remove unused method from util module

48b3933

Extend unit test to support QGraph execution.

93d4f2f

Adds code to build simple quantum graph with a trivial task and mock Butler, but real in-memory sqlite registry. New unit test to execute that graph and check resulting task outputs.

andy-slac force-pushed the tickets/DM-20845 branch from 02eafa4 to 6d012c6 Compare September 17, 2019 17:46

TallJimbo approved these changes Sep 18, 2019


andy-slac force-pushed the tickets/DM-20845 branch from bb086f9 to db2f9e1 Compare September 18, 2019 20:59

andy-slac added 2 commits September 18, 2019 16:07

Add support for --clobber-output option

11362ae

Switch _executePipelineTask to keyword-only args

feda286

andy-slac force-pushed the tickets/DM-20845 branch from db2f9e1 to feda286 Compare September 18, 2019 21:07

andy-slac merged commit ae73b45 into master Sep 18, 2019

timj deleted the tickets/DM-20845 branch April 23, 2020 17:18
