Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-35494 Allow execution butler creation to transfer datasets #195

Merged
merged 2 commits into from Jul 14, 2022

Conversation

PaulPrice
Copy link
Contributor

@PaulPrice PaulPrice commented Jul 8, 2022

Requires lsst/pipe_base#261

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@PaulPrice PaulPrice requested a review from timj July 8, 2022 02:47
@codecov
Copy link

codecov bot commented Jul 8, 2022

Codecov Report

Merging #195 (d6a2db6) into main (269e72b) will increase coverage by 0.03%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##             main     #195      +/-   ##
==========================================
+ Coverage   82.42%   82.45%   +0.03%     
==========================================
  Files          47       47              
  Lines        3750     3751       +1     
  Branches      675      675              
==========================================
+ Hits         3091     3093       +2     
+ Misses        481      480       -1     
  Partials      178      178              
Impacted Files Coverage Δ
python/lsst/ctrl/mpexec/cli/script/qgraph.py 40.00% <ø> (ø)
python/lsst/ctrl/mpexec/cmdLineFwk.py 61.93% <ø> (ø)
python/lsst/ctrl/mpexec/taskFactory.py 87.09% <0.00%> (+2.72%) ⬆️
python/lsst/ctrl/mpexec/cli/opt/optionGroups.py 100.00% <100.00%> (ø)
python/lsst/ctrl/mpexec/cli/opt/options.py 100.00% <100.00%> (ø)
python/lsst/ctrl/mpexec/mock_task.py 88.09% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 269e72b...d6a2db6. Read the comment docs.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that BPS is going to want to be a bit cleverer for this and we should be including an option to specify the execution butler datastore root. Writing all the data into the place where we also write the execution butler could cause confusion for when jobs copy the execution butler to each node. It's also going to be a bit odd that the execution butler is copied locally and all the jobs write their outputs to the original execution butler location. Decoupling the location of the execution butler from the location of the execution datastore seems like it should be an option. Please add a --execution-datastore-root (or similar, bearing in mind that this option will have to exist for graph-backed butler to work for you cc/ @andy-slac ) option which can default to empty and imply it will go to the execution butler root if --transfer is set. It's a shame that transfer=auto can't trigger based on this datastore root override but then there would be no way to do what you want of copying the files into the execution butler tree.

@@ -415,6 +415,18 @@
),
is_flag=True,
)

transfer_option = MWOptionDecorator(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that this duplicates lsst.data.butler.cli.opt.options.transfer_option. Instead we could use the standard option and set a default of direct and interpret that to mean no transfer. The option in daf_butler seems to imply that default=None should work but I don't know what click will do with that. You can override the help by calling transfer_option(help="xyz")

@@ -88,6 +89,8 @@ def __init__(self) -> None:
ctrlMpExecOpts.qgraph_dot_option(),
ctrlMpExecOpts.save_execution_butler_option(),
ctrlMpExecOpts.clobber_execution_butler_option(),
ctrlMpExecOpts.datastore_root_option(),
transfer_option(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you not wanting to override the help text?

@@ -415,6 +415,15 @@
),
is_flag=True,
)

datastore_root_option = MWOptionDecorator(
"--datastore-root",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--target-datastore-root?

@@ -135,6 +137,9 @@ def qgraph( # type: ignore
QuantumGraph.
clobber_execution_butler : `bool`
It True overwrite existing execution butler files if present.
transfer : `str`
Transfer mode for execution butler creation. The default option,
"none", means that no files will be transferred.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is none coming in as an option? It's not the available in the standard list. auto will not quite work if the source and destination are both POSIX because it will try symlink before it tries copy if the target and source roots differ. That might be fine though since you are likely going to want an explicit "copy" here with a new root. "direct" would leave all the existing files where they are but let you write new files to the new root.

You could make "auto" the default if no root is specified and "copy" the default if a new root is given.

@@ -135,6 +137,12 @@ def qgraph( # type: ignore
QuantumGraph.
clobber_execution_butler : `bool`
It True overwrite existing execution butler files if present.
target_datastore_root : `str` or `None`
URI location for the execution butler's datastore.
transfer : `str` or `None`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is ever going to be None is it?

@PaulPrice PaulPrice force-pushed the tickets/DM-35494 branch 2 times, most recently from eee35fb to 9876a84 Compare July 14, 2022 18:11
In some instances, it is helpful/important to transfer files
into the execution butler (e.g., original butler datastore is
on a slow disk). Added option:
    pipetask qgraph ... --target-datastore-root=<root>
to allow transfer of files into a new datastore. Also added
a --transfer option for fine-grained control of the transfer
mode, which defaults to "copy" if there's a new datastore.
Fixed a few types to make mypy happy:
python/lsst/ctrl/mpexec/taskFactory.py:98: error: Config has no attribute connections
python/lsst/ctrl/mpexec/taskFactory.py:112: error: Argument config to PipelineTask has incompatible type Config; expected Optional[PipelineTaskConfig]
python/lsst/ctrl/mpexec/mock_task.py:121: error: Need type annotation for failCondition
python/lsst/ctrl/mpexec/mock_task.py:130: error: Need type annotation for failException
python/lsst/ctrl/mpexec/mock_task.py:165: error: PipelineTaskConfig has no attribute dataIdMatch
python/lsst/ctrl/mpexec/mock_task.py:167: error: PipelineTaskConfig has no attribute failException
@PaulPrice PaulPrice merged commit 1ad71ff into main Jul 14, 2022
@PaulPrice PaulPrice deleted the tickets/DM-35494 branch July 14, 2022 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants