DM-39655: Use a list of "fallback" pipelines in Prompt Processing #75
Conversation
The TODO proposed enforcing that there be only one coadd type in the central repo. Since the repo will be /repo/embargo in practice, such enforcement will not be possible.
This change removes the requirement that the output run be part of a long-lived chain in the local repo, allowing more flexibility in how the output run is determined in the future.
This change makes all use of the output run collection explicit.
This change does not affect MiddlewareInterface, which was only using the run explicitly, but it does break several tests that assumed the Butler always had a run.
The class now stores the datestamp on construction and generates the run names on demand. This preserves the existing behavior, but will be much easier to adapt to a design where potentially multiple runs are attempted.
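As a rough illustration of that design (the class and method names here are hypothetical, not the actual MiddlewareInterface API):

```python
import datetime


class RunNamer:
    """Sketch: fix the datestamp once, derive run names on demand."""

    def __init__(self, instrument: str):
        self._instrument = instrument
        # The datestamp is captured at construction, so every run name
        # generated later shares the same date, even across midnight.
        self._date = datetime.date.today().isoformat()

    def get_output_run(self, pipeline_file: str) -> str:
        # Run names are generated lazily, one per pipeline file, which
        # extends naturally to attempting multiple runs per visit.
        name = pipeline_file.rsplit("/", 1)[-1].removesuffix(".yaml")
        return f"{self._instrument}/prompt/{self._date}/{name}"
```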
The output chain, which combined an output run, raw inputs, and a defaults collection that excluded raws, was originally based on the ap_verify dataset organization, which in turn imitated the collection style created by pipetask. However, a distinction between raw and default collections will not be needed in production, and the pipetask layout is difficult to reconcile with a workflow that may use a different run for every visit. The output runs were removed from the "output" collection in e38bda3. This commit removes the output collection itself, replacing it with direct use of the defaults collection. make_local_repo and prep_butler now guarantee that the run collection is part of the default chain in the dev environment. The removal of the <instrument>/prompt chain is a behavioral change, though one that should only affect the local repo and not the central repo.
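A hedged sketch of what that guarantee can look like with `lsst.daf.butler` registry calls (the helper itself is made up, and the real make_local_repo/prep_butler logic may differ):

```python
from lsst.daf.butler import Butler, CollectionType


def ensure_run_in_chain(butler: Butler, run: str, chain: str) -> None:
    """Sketch only: make ``run`` a child of the chained collection ``chain``."""
    butler.registry.registerRun(run)
    butler.registry.registerCollection(chain, CollectionType.CHAINED)
    children = list(butler.registry.getCollectionChain(chain))
    if run not in children:
        # Prepend so the new run takes precedence in collection searches.
        butler.registry.setCollectionChain(chain, [run] + children)
```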
Moving the choice of pipeline to the caller makes it possible to prepare a choice of pipelines depending on procedural rather than object state, and to get the run(s) corresponding to any or all pipelines the caller may need.
The test actually triggers (and tests for) a failure to define visits; quantum graph generation is not even attempted before the expected exception is raised.
Pipeline execution now loops over the list of available pipelines until it finds one that can be run. All other operations can simply be applied to all pipelines.
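A minimal sketch of that loop, with stand-in callables rather than the real middleware entry points:

```python
import logging

_log = logging.getLogger(__name__)


def run_first_viable(pipeline_files, build_quantum_graph, execute):
    """Try each pipeline in order; execute the first one with a usable graph.

    ``build_quantum_graph`` and ``execute`` are placeholders for the real
    middleware calls, which are not shown here.
    """
    for pipeline_file in pipeline_files:
        qgraph = build_quantum_graph(pipeline_file)
        if not qgraph:
            # An empty graph means this pipeline can't process the visit;
            # fall back to the next one.
            _log.warning("No work found for %s; trying the next pipeline.",
                         pipeline_file)
            continue
        # Past this point, partial execution creates datasets, so don't
        # fall back any further: either succeed or raise.
        execute(qgraph)
        return pipeline_file
    raise RuntimeError("No pipeline could be run for this visit.")
```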
Applying operations to all pipelines involves renaming the get_pipeline_file method to get_pipeline_files and replacing None return values with an empty list. I've not tried to support more generic sequences in the spec, since it's much harder to test and unlikely to be needed in an internal class.
This is a breaking change to the pipeline config format, which now always maps visit parameters to lists of files rather than single files.
Since runs and configs are generated for *all* pipelines before attempting graph generation, and runs are named based on the pipeline file, pipelines with the same name would interfere with each other. Allowing different subsets of a pipeline (using # notation) should be safe, but I'm disallowing it to avoid confusion over exactly what got run.
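That constraint could be enforced with a validation step along these lines (a hypothetical helper, not the actual implementation):

```python
import os


def check_pipeline_names(pipeline_files):
    """Reject subset notation and duplicate filenames, per the rules above."""
    names = []
    for path in pipeline_files:
        if "#" in path:
            # Subsets should be safe in principle, but are disallowed to
            # avoid confusion over exactly what got run.
            raise ValueError(f"Pipeline subsets are not supported: {path}")
        names.append(os.path.basename(path))
    if len(names) != len(set(names)):
        raise ValueError(f"Pipeline filenames must be unique: {names}")
```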
Fallback pipelines can be used for visits that aren't compatible with the full AP pipeline, or when faster processing is desired.
```python
    in_collections=output_run,
)
if exports:
    exported = True
```
If one pipeline yields outputs and the outputs are exported, is there a scenario in which another pipeline has anything to be exported? I'm wondering whether we still need to go to the next item of the for loop after `exported` is True.
I don't expect more than one collection to contain outputs, but I'm not confident that that will always be true in the presence of failures or future code changes (e.g., doing pre-execution as a separate step, or expanding fallback to more situations). This way seems more robust.
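The loop shape being discussed looks roughly like this (the names are hypothetical); the point is the deliberate absence of an early exit:

```python
def export_outputs(output_runs, find_datasets, export_datasets):
    """Check every pipeline's output run, not just the first one with hits."""
    exported = False
    for output_run in output_runs:
        exports = find_datasets(in_collections=output_run)
        if exports:
            export_datasets(exports)
            exported = True
        # Deliberately no break: at most one run is expected to contain
        # outputs, but scanning them all is robust to partial failures and
        # future changes (e.g., pre-execution as a separate step).
    return exported
```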
```
``(survey="<survey>")=[<pipelines>]``. The zero or more pipelines
are comma-delimited, and each pipeline path may contain environment
variables. The list may be replaced by the keyword "None" to mean
no pipeline should be run. No key or value may contain the "="
```
In slaclab/rubin-usdf-prompt-processing@537d212 I see you replaced all the `None`'s with `[]` for those surveys with no pipelines to run. From this PR it looks like either way should work. Am I misunderstanding?
No, that's correct -- the replacement of `None` with `[]` was not strictly necessary. See the commit message for 537d212. 🙂
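For concreteness, entries in this format might look like the following (the survey names, environment variable, and file names are made up; per the exchange above, `None` and `[]` are interchangeable for an empty list):

```
(survey="TestSurvey")=[${CONFIG_DIR}/ApPipe.yaml,${CONFIG_DIR}/SingleFrame.yaml]
(survey="Dark")=None
```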
"see previous logs for details.") | ||
continue | ||
# Past this point, partial execution creates datasets. | ||
# Don't retry -- either fail (raise) or break. |
Do I understand correctly that, for now, the only scenario that triggers a fallback is a failure to build the quantum graph?
Yes, at least for now. Do you know of any other scenarios where this is a good strategy?
Got another one that might be relevant here: the quantum graph builds fine, but the needed datasets aren't actually available, so execution fails. For example, the needed refcats aren't in the central repo in the searched collections. It can show up as something like:

```
FileNotFoundError: Not enough datasets (0) found for non-optional connection
...
```
I assume this happens only with `PrerequisiteInput`? Or does it also affect templates?
Looks like only with `PrerequisiteInput`: https://github.com/lsst/pipe_base/blob/main/python/lsst/pipe/base/connections.py#L852-L855

I haven't traced the middleware enough to understand whether the qgraph would be built successfully in this case (I don't know when `adjustQuantum` happens). When I encountered this error, I don't think the PP fallback pipeline ran, so it seems the qgraph failed to build. But it's not generating an empty qgraph, which is the case the PP fallback pipeline is designed for. [I'm not sure of anything...]
It looks like, despite the docstring, `adjustQuantum` is actually run right before the quantum is executed.
Sorry for the confusion on Slack! I've filed DM-41596 to address this case.
This PR adds support for multiple pipelines that may be attempted for the same visit, in sequence. This required a significant reorganization of `MiddlewareInterface`, moving decisions that depend on the choice of pipeline to as late in the workflow as possible. However, the new code has fewer implicit dependencies, making it more robust to future changes.