New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-25385 make a pipetask command based on click #58
Conversation
c67f6f6
to
443052a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay but I'd like @andy-slac to take a look.
|
||
class show_option: # noqa: N801 | ||
|
||
defaultHelp = ("Dump various info to standard output. Possible items are: `config', " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be better as a """
quoted string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timj as far as I can tell doing it that way puts the newlines and whitespace (if each new line does not begin at column 0) in the value of the string (is there a way to prevent that?). Click does it's own text wrapping for help strings and depends on console width. The extra whitespace messes that up. I guess I could format the """
str for narrowest expected console, but I'm not sure that would be a real win over just doing it this way & letting the wookie win.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's disappointing. The help text is very hard to read in that context. The alternative is to use the triple-quoted string and run it through a helper function that replaces every newline with a space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, that and textwrap.dedent
FTW
_ACTION_ADD_INSTRUMENT) | ||
|
||
|
||
def build(config_file=[], config=[], delete=[], instrument=[], order_pipeline=None, pipeline_dot=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python advises against using mutables as defaults because those mutables can be changed by this function and that would change the defaults to everyone. Since you are only using these for iteration you can use a tuple as a default value.
The pipeline instance that was built. | ||
""" | ||
|
||
class Args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cute. We stuff random properties into an empty class? Couldn't we use a simple dataclass here so that we can do some kind of checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wanting to avoid managing the opaque args
data structure much/any. Of course it's the idiom that comes from the argparse impl, IMHO it should be replaced entirely by member vars to the function or a smaller bunch of classes that contain related data members. I was hoping to avoid making that big of a change right now. But, doing one or the other is probably the Right Thing to do. I'll ponder, it might be worth a discussion on Slack... 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know any of the history (so this trick was used in the implementation by @andy-slac and you migrated it over?). At the very least a comment like you just made in the class description for Args would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the args
object with a bunch of parser-defined properties is what you get from python's argparse
module, e.g. the top example at https://docs.python.org/3/library/argparse.html. It's a common pattern (anti-pattern IMHO) to then pass around that object to your impl functions and let them pluck the members they need out of it.
I can at least add a comment 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll have to get rid of the argparse.Namespace
thing at some point (that is the class that is currently passed around and it does the same thing as Args
). I was hoping to get more input from other user about how pipetask API should look like and refactor things accordingly but I do not see any users that need that API, everyone seems to be happy with pipetask CLI 🙁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some fixes needed, please check comments:
- options/actions that build pipeline should be executed in the order in which they appear on command line
- unit test for click should be separate from unit test for framework API
- small bunch of docstring and help/metavar comments
bin.src/pipetask2
Outdated
|
||
from lsst.ctrl.mpexec.cli.pipetask import main | ||
|
||
if __name__ == '__main__': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double quotes for all strings.
def __call__(self, f): | ||
return click.option("--order-pipeline", | ||
required=self.required, | ||
help=self.help)(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a boolean flag, should it have is_flag=True
parameter?
def __call__(self, f): | ||
return click.option("-p", "--pipeline", | ||
required=self.required, | ||
help=self.help)(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metavar="PATH"
? Or type=click.Path(...)
?
return click.option("--pipeline-dot", | ||
help=self.help, | ||
required=self.required, | ||
type=click.Path(exists=True, writable=True))(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does exists=True
mean it has to exist? This is for output only, it does not need to exist, but will be overwritten if it exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does mean that. I'll fix.
def __call__(self, f): | ||
return click.option("-s", "--save-pipeline", | ||
help=self.help, | ||
required=self.required)(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this is an output file, either give it metavar
or type=Path
?
|
||
|
||
@click.command(cls=PipetaskCLI, context_settings=dict(help_option_names=["-h", "--help"])) | ||
@log_level_option() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess adding option here means that you have to do:
pipetask2 --log-level=DEBUG build ....
but cannot do
pipetask2 build --log-level=DEBUG ....
for current pipetask
we intentionally define all options for each sub-command (so you can do latter but not former). Wt should keep the same behavior for pipetask2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand. It was a deliberate decision in butler to move the log level before the subcommand. butler and pipetask should agree. Why do we require that --log-level
comes after? We can change our minds on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pipetask initially implemented common options to appear before the command and command-specific options after the command (much like git
and other tools) but that caused confusion and I had to make it uniform. Announcement for this was here: https://community.lsst.org/t/pipetask-command-line-interface-changes/3923, I don't remember where complaints about initial command line were, on slack or some ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to agree on how butler and pipetask works. @TallJimbo do you have a preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that other tools with subcommands (certainly including git
- at least the way I always use it) allow options that are not specific to the subcommand to come either before or after, and if we can do that with Click, there's no problem.
I believe argparse
requires such options to always go before the subcommand, which is a UI disaster in this case: it requires the user to remember whether some option is specific to a subcommand, even though it's also used by (i.e. "specific to"), say, 4 of the 5 other subcommands, and hence it's really demanding that the user know the interface of all of the subcommands they aren't trying to call. Telling argparse
that all options are specific to subcommands and hence must go after the subcommand was a workaround for this: it's at least not hard to remember "all options go after the subcommand".
Do we know if Click has the same behavior as argparse
? If so, I think we have no choice but to do the same workaround, even if it means fighting Click, unless we can make the case that some options are so completely universal that it's easy to remember that they always must go before the subcommand. But I hope it doesn't come to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to tell Click to allow an option in either place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a way to tell Click to allow an option in either place?
Kind of: we can allow the command (pipetask
) and subcommands (e.g. build
) to both take an option. If those options need to be mutually exclusive then we have to hand roll that in implementation.
But, I had thought of --log-level
and butler --help
as being specific to the butler
(or pipetask
) command. butler --help
definitely is, and we do support butler <subcommand> --help
. As you'd expect, the later results in totally different output from the former. butler --help
will prevent any further subcommands from running (it prints help and exits). The logging setting will affect logging from butler
execution as well as execution of the subcommand it calls, it didn't seem like we needed subcommands to specify their own help level.
I don't think it makes sense for some subcommand options to go before the subcommand on the CLI and some to go after it. When there are common subcommands we define each in a shareable way, and it is very simple to add a shared option to the subcommand that accepts it, making it so that all options can go after the subcommand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if every one else thinks it should be butler command --log-level
then change it to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allow options that are not specific to the subcommand to come either before or after, and if we can do that with Click, there's no problem.
Click does not seem to want to support e.g. butler create ~/foo --log-level DEBUG
in an automatic sort of way, without making --log-level
an option of create
. There might be a way to code around this but if possible I think we should not fight it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I'm definitely not proposing allowing subcommand-specific options before the subcommand (just the converse). And I agree --help
currently works the way it should.
--log-level
is one of the few things I might imagine being fine to require before the subcommand, as it could be sufficiently global. I'm not sure it actually is that global, and that's a question of whether logging control for task execution is really the same as logging control for butler (and really whether it's just one logging control for everything). If it is, fine - but if not we may need to actually split this up into e.g. --butler-log-level LEVEL
or --task-log-level LEVEL
or something more general like --log-level NAME=LEVEL
, before we think about what goes before or after the subcommand.
args.pipeline_actions.append(_ACTION_CONFIG(c)) | ||
|
||
for c in config_file: | ||
args.pipeline_actions.append(_ACTION_CONFIG_FILE(c)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a big issue here. The -c
and -C
actions have to be executed in the order in which they appear on the command line, i.e.:
pipetask build -C override1.py -c param1=val1 -C override2.py -c param2=val2
is not the same as
pipetask build -c param1=val1 -c param2=val2 -C override1.py -C override2.py
I think you new parser implicitly re-orders those options to look like latter case, but we have to keep the relative order of -c/-C actions.
The same applies to other actions, i.e.:
pipetask build ... --delete task --task task
is not the same as
pipetask build ... --task task --delete task
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pondering that this might be a serious click inhibitor...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope it should be possible to do same sort of smart trick in click, similar to what was done in argparse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or alternatively we could drop all these options that build pipelines and force everyone to do YAML. I do not know how many people are sill using -t/-c/-C but I remember there was a talk that YAML is our future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something for @natelust or @MichelleGower to comment on I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can capture the command-line inputs to the command. It's not built into the api but I can subclass, the Click authors claim it's designed and supported to be extensible this way.
The pipeline instance that was built. | ||
""" | ||
|
||
class Args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll have to get rid of the argparse.Namespace
thing at some point (that is the class that is currently passed around and it does the same thing as Args
). I was hoping to get more input from other user about how pipetask API should look like and refactor things accordingly but I do not see any users that need that API, everyone seems to be happy with pipetask CLI 🙁
try: | ||
pipeline = f.makePipeline(args) | ||
except Exception as exc: | ||
raise ClickException(f"Failed to build pipeline: {exc}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise ClickException(...) from exc
is how modern Python wants it to be.
tests/test_cmdLineFwk.py
Outdated
self.click = False | ||
self._testMakePipeline() | ||
self.click = True | ||
self._testMakePipeline() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that I want click tests here. This unit tests is for the framework API and should not depend on command line parsing. It currently prepares parsed command line options itself, but it does not use existing parser for that. I do not want to mix things, it may well be that API refactoring will happen and it will look very different and may even move to some other common package together with test. In that sense testing click stuff should certainly be in this package but it's better to keep that separate from API testing even if it means some code duplication.
21c08ac
to
90f422b
Compare
for pipelineAction in (task_option.optionKey, delete_option.optionKey, config_option.optionKey, | ||
config_file_option.optionKey, instrument_parameter.optionKey): | ||
kwargs.pop(pipelineAction) | ||
kwargs['pipeline_actions'] = makePipelineActions(ctx.obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit hackish. Could you instead just define a callback for each option that updates a common list? Or maybe click already has a way to append values from multiple options to the same parameter? Then you could use the same approach as I did in CmdLineParser - use those _ACTION...
as type converters for individual options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The click api really doesn't support any of these approaches. I think the best way would be to pivot to chaining subcommands, and making each of the action options a type of subcommand. Discussing that in Slack now.
pipelineActions.append(_ACTION_CONFIG_FILE(args[i+1])) | ||
elif args[i] in instrumentFlags: | ||
pipelineActions.append(_ACTION_ADD_INSTRUMENT(args[i+1])) | ||
return pipelineActions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is args
an unparsed command line? You may need a bit more care while parsing it this way, there may be complicated cases which simple iteration does not handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a tuple of command line options and values, not entirely "unparsed". It's checked by code that comes before it, once we've gotten here there shouldn't be anything wildly unexpected. There's unit test code in tests/test_cliUtils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a docstring tho, that should have been done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it's parsed it could still be potentially problematic in some weird corner cases. Suppose there is a very odd option which can accept any string value and someone passes string "-c" to that option:
pipeline build --foo -c -t TaskClass
That would probably break something.
this allows us to keep pipetask while rewriting it in click. eventually the exising pipetask command will be removed and pipetask2 will be renamed to pipetask.
ed50a1c
to
a8172c3
Compare
No description provided.