New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-35494 Allow execution butler creation to transfer datasets #195
Conversation
Codecov Report
@@ Coverage Diff @@
## main #195 +/- ##
==========================================
+ Coverage 82.42% 82.45% +0.03%
==========================================
Files 47 47
Lines 3750 3751 +1
Branches 675 675
==========================================
+ Hits 3091 3093 +2
+ Misses 481 480 -1
Partials 178 178
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry that BPS is going to want to be a bit cleverer for this and we should be including an option to specify the execution butler datastore root. Writing all the data into the place where we also write the execution butler could cause confusion for when jobs copy the execution butler to each node. It's also going to be a bit odd that the execution butler is copied locally and all the jobs write their outputs to the original execution butler location. Decoupling the location of the execution butler from the location of the execution datastore seems like it should be an option. Please add a --execution-datastore-root
(or similar, bearing in mind that this option will have to exist for graph-backed butler to work for you cc/ @andy-slac ) option which can default to empty and imply it will go to the execution butler root if --transfer
is set. It's a shame that transfer=auto
can't trigger based on this datastore root override but then there would be no way to do what you want of copying the files into the execution butler tree.
@@ -415,6 +415,18 @@ | |||
), | |||
is_flag=True, | |||
) | |||
|
|||
transfer_option = MWOptionDecorator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that this duplicates lsst.data.butler.cli.opt.options.transfer_option
. Instead we could use the standard option and set a default of direct
and interpret that to mean no transfer. The option in daf_butler seems to imply that default=None
should work but I don't know what click will do with that. You can override the help by calling transfer_option(help="xyz")
@@ -88,6 +89,8 @@ def __init__(self) -> None: | |||
ctrlMpExecOpts.qgraph_dot_option(), | |||
ctrlMpExecOpts.save_execution_butler_option(), | |||
ctrlMpExecOpts.clobber_execution_butler_option(), | |||
ctrlMpExecOpts.datastore_root_option(), | |||
transfer_option(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you not wanting to override the help text?
@@ -415,6 +415,15 @@ | |||
), | |||
is_flag=True, | |||
) | |||
|
|||
datastore_root_option = MWOptionDecorator( | |||
"--datastore-root", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--target-datastore-root
?
@@ -135,6 +137,9 @@ def qgraph( # type: ignore | |||
QuantumGraph. | |||
clobber_execution_butler : `bool` | |||
It True overwrite existing execution butler files if present. | |||
transfer : `str` | |||
Transfer mode for execution butler creation. The default option, | |||
"none", means that no files will be transferred. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is none
coming in as an option? It's not the available in the standard list. auto
will not quite work if the source and destination are both POSIX because it will try symlink before it tries copy if the target and source roots differ. That might be fine though since you are likely going to want an explicit "copy" here with a new root. "direct" would leave all the existing files where they are but let you write new files to the new root.
You could make "auto" the default if no root is specified and "copy" the default if a new root is given.
f547692
to
7f3a830
Compare
@@ -135,6 +137,12 @@ def qgraph( # type: ignore | |||
QuantumGraph. | |||
clobber_execution_butler : `bool` | |||
It True overwrite existing execution butler files if present. | |||
target_datastore_root : `str` or `None` | |||
URI location for the execution butler's datastore. | |||
transfer : `str` or `None` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is ever going to be None
is it?
eee35fb
to
9876a84
Compare
In some instances, it is helpful/important to transfer files into the execution butler (e.g., original butler datastore is on a slow disk). Added option: pipetask qgraph ... --target-datastore-root=<root> to allow transfer of files into a new datastore. Also added a --transfer option for fine-grained control of the transfer mode, which defaults to "copy" if there's a new datastore.
9876a84
to
2c3f865
Compare
Fixed a few types to make mypy happy: python/lsst/ctrl/mpexec/taskFactory.py:98: error: Config has no attribute connections python/lsst/ctrl/mpexec/taskFactory.py:112: error: Argument config to PipelineTask has incompatible type Config; expected Optional[PipelineTaskConfig] python/lsst/ctrl/mpexec/mock_task.py:121: error: Need type annotation for failCondition python/lsst/ctrl/mpexec/mock_task.py:130: error: Need type annotation for failException python/lsst/ctrl/mpexec/mock_task.py:165: error: PipelineTaskConfig has no attribute dataIdMatch python/lsst/ctrl/mpexec/mock_task.py:167: error: PipelineTaskConfig has no attribute failException
Requires lsst/pipe_base#261
Checklist
doc/changes