Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-25385 make a pipetask command based on click #58

Merged
merged 6 commits into from Jul 15, 2020
Merged

Conversation

n8pease
Copy link
Contributor

@n8pease n8pease commented Jun 21, 2020

No description provided.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay but I'd like @andy-slac to take a look.


class show_option: # noqa: N801

defaultHelp = ("Dump various info to standard output. Possible items are: `config', "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be better as a """ quoted string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timj as far as I can tell doing it that way puts the newlines and whitespace (if each new line does not begin at column 0) in the value of the string (is there a way to prevent that?). Click does it's own text wrapping for help strings and depends on console width. The extra whitespace messes that up. I guess I could format the """ str for narrowest expected console, but I'm not sure that would be a real win over just doing it this way & letting the wookie win.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's disappointing. The help text is very hard to read in that context. The alternative is to use the triple-quoted string and run it through a helper function that replaces every newline with a space.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, that and textwrap.dedent FTW

_ACTION_ADD_INSTRUMENT)


def build(config_file=[], config=[], delete=[], instrument=[], order_pipeline=None, pipeline_dot=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python advises against using mutables as defaults because those mutables can be changed by this function and that would change the defaults to everyone. Since you are only using these for iteration you can use a tuple as a default value.

The pipeline instance that was built.
"""

class Args:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cute. We stuff random properties into an empty class? Couldn't we use a simple dataclass here so that we can do some kind of checking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wanting to avoid managing the opaque args data structure much/any. Of course it's the idiom that comes from the argparse impl, IMHO it should be replaced entirely by member vars to the function or a smaller bunch of classes that contain related data members. I was hoping to avoid making that big of a change right now. But, doing one or the other is probably the Right Thing to do. I'll ponder, it might be worth a discussion on Slack... 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know any of the history (so this trick was used in the implementation by @andy-slac and you migrated it over?). At the very least a comment like you just made in the class description for Args would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the args object with a bunch of parser-defined properties is what you get from python's argparse module, e.g. the top example at https://docs.python.org/3/library/argparse.html. It's a common pattern (anti-pattern IMHO) to then pass around that object to your impl functions and let them pluck the members they need out of it.
I can at least add a comment 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to get rid of the argparse.Namespace thing at some point (that is the class that is currently passed around and it does the same thing as Args). I was hoping to get more input from other user about how pipetask API should look like and refactor things accordingly but I do not see any users that need that API, everyone seems to be happy with pipetask CLI 🙁

Copy link
Collaborator

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some fixes needed, please check comments:

  • options/actions that build pipeline should be executed in the order in which they appear on command line
  • unit test for click should be separate from unit test for framework API
  • small bunch of docstring and help/metavar comments


from lsst.ctrl.mpexec.cli.pipetask import main

if __name__ == '__main__':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double quotes for all strings.

def __call__(self, f):
return click.option("--order-pipeline",
required=self.required,
help=self.help)(f)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a boolean flag, should it have is_flag=True parameter?

def __call__(self, f):
return click.option("-p", "--pipeline",
required=self.required,
help=self.help)(f)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metavar="PATH"? Or type=click.Path(...)?

return click.option("--pipeline-dot",
help=self.help,
required=self.required,
type=click.Path(exists=True, writable=True))(f)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does exists=True mean it has to exist? This is for output only, it does not need to exist, but will be overwritten if it exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does mean that. I'll fix.

def __call__(self, f):
return click.option("-s", "--save-pipeline",
help=self.help,
required=self.required)(f)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this is an output file, either give it metavar or type=Path?



@click.command(cls=PipetaskCLI, context_settings=dict(help_option_names=["-h", "--help"]))
@log_level_option()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess adding option here means that you have to do:

pipetask2 --log-level=DEBUG build ....

but cannot do

pipetask2 build --log-level=DEBUG ....

for current pipetask we intentionally define all options for each sub-command (so you can do latter but not former). Wt should keep the same behavior for pipetask2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. It was a deliberate decision in butler to move the log level before the subcommand. butler and pipetask should agree. Why do we require that --log-level comes after? We can change our minds on this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipetask initially implemented common options to appear before the command and command-specific options after the command (much like git and other tools) but that caused confusion and I had to make it uniform. Announcement for this was here: https://community.lsst.org/t/pipetask-command-line-interface-changes/3923, I don't remember where complaints about initial command line were, on slack or some ticket.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to agree on how butler and pipetask works. @TallJimbo do you have a preference?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that other tools with subcommands (certainly including git - at least the way I always use it) allow options that are not specific to the subcommand to come either before or after, and if we can do that with Click, there's no problem.

I believe argparse requires such options to always go before the subcommand, which is a UI disaster in this case: it requires the user to remember whether some option is specific to a subcommand, even though it's also used by (i.e. "specific to"), say, 4 of the 5 other subcommands, and hence it's really demanding that the user know the interface of all of the subcommands they aren't trying to call. Telling argparse that all options are specific to subcommands and hence must go after the subcommand was a workaround for this: it's at least not hard to remember "all options go after the subcommand".

Do we know if Click has the same behavior as argparse? If so, I think we have no choice but to do the same workaround, even if it means fighting Click, unless we can make the case that some options are so completely universal that it's easy to remember that they always must go before the subcommand. But I hope it doesn't come to that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to tell Click to allow an option in either place?

Copy link
Contributor Author

@n8pease n8pease Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to tell Click to allow an option in either place?

Kind of: we can allow the command (pipetask) and subcommands (e.g. build) to both take an option. If those options need to be mutually exclusive then we have to hand roll that in implementation.

But, I had thought of --log-level and butler --help as being specific to the butler (or pipetask) command. butler --help definitely is, and we do support butler <subcommand> --help. As you'd expect, the later results in totally different output from the former. butler --help will prevent any further subcommands from running (it prints help and exits). The logging setting will affect logging from butler execution as well as execution of the subcommand it calls, it didn't seem like we needed subcommands to specify their own help level.

I don't think it makes sense for some subcommand options to go before the subcommand on the CLI and some to go after it. When there are common subcommands we define each in a shareable way, and it is very simple to add a shared option to the subcommand that accepts it, making it so that all options can go after the subcommand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if every one else thinks it should be butler command --log-level then change it to that.

Copy link
Contributor Author

@n8pease n8pease Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allow options that are not specific to the subcommand to come either before or after, and if we can do that with Click, there's no problem.

Click does not seem to want to support e.g. butler create ~/foo --log-level DEBUG in an automatic sort of way, without making --log-level an option of create. There might be a way to code around this but if possible I think we should not fight it.

Copy link
Member

@TallJimbo TallJimbo Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'm definitely not proposing allowing subcommand-specific options before the subcommand (just the converse). And I agree --help currently works the way it should.

--log-level is one of the few things I might imagine being fine to require before the subcommand, as it could be sufficiently global. I'm not sure it actually is that global, and that's a question of whether logging control for task execution is really the same as logging control for butler (and really whether it's just one logging control for everything). If it is, fine - but if not we may need to actually split this up into e.g. --butler-log-level LEVEL or --task-log-level LEVEL or something more general like --log-level NAME=LEVEL, before we think about what goes before or after the subcommand.

args.pipeline_actions.append(_ACTION_CONFIG(c))

for c in config_file:
args.pipeline_actions.append(_ACTION_CONFIG_FILE(c))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a big issue here. The -c and -C actions have to be executed in the order in which they appear on the command line, i.e.:

pipetask build -C override1.py -c param1=val1 -C override2.py -c param2=val2

is not the same as

pipetask build -c param1=val1 -c param2=val2 -C override1.py -C override2.py 

I think you new parser implicitly re-orders those options to look like latter case, but we have to keep the relative order of -c/-C actions.

The same applies to other actions, i.e.:

pipetask build ... --delete task --task task

is not the same as

pipetask build ... --task task --delete task

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pondering that this might be a serious click inhibitor...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope it should be possible to do same sort of smart trick in click, similar to what was done in argparse.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or alternatively we could drop all these options that build pipelines and force everyone to do YAML. I do not know how many people are sill using -t/-c/-C but I remember there was a talk that YAML is our future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something for @natelust or @MichelleGower to comment on I think.

Copy link
Contributor Author

@n8pease n8pease Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can capture the command-line inputs to the command. It's not built into the api but I can subclass, the Click authors claim it's designed and supported to be extensible this way.

The pipeline instance that was built.
"""

class Args:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to get rid of the argparse.Namespace thing at some point (that is the class that is currently passed around and it does the same thing as Args). I was hoping to get more input from other user about how pipetask API should look like and refactor things accordingly but I do not see any users that need that API, everyone seems to be happy with pipetask CLI 🙁

try:
pipeline = f.makePipeline(args)
except Exception as exc:
raise ClickException(f"Failed to build pipeline: {exc}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise ClickException(...) from exc is how modern Python wants it to be.

self.click = False
self._testMakePipeline()
self.click = True
self._testMakePipeline()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that I want click tests here. This unit tests is for the framework API and should not depend on command line parsing. It currently prepares parsed command line options itself, but it does not use existing parser for that. I do not want to mix things, it may well be that API refactoring will happen and it will look very different and may even move to some other common package together with test. In that sense testing click stuff should certainly be in this package but it's better to keep that separate from API testing even if it means some code duplication.

@n8pease n8pease force-pushed the tickets/DM-25385 branch 3 times, most recently from 21c08ac to 90f422b Compare June 30, 2020 18:42
for pipelineAction in (task_option.optionKey, delete_option.optionKey, config_option.optionKey,
config_file_option.optionKey, instrument_parameter.optionKey):
kwargs.pop(pipelineAction)
kwargs['pipeline_actions'] = makePipelineActions(ctx.obj)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit hackish. Could you instead just define a callback for each option that updates a common list? Or maybe click already has a way to append values from multiple options to the same parameter? Then you could use the same approach as I did in CmdLineParser - use those _ACTION... as type converters for individual options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The click api really doesn't support any of these approaches. I think the best way would be to pivot to chaining subcommands, and making each of the action options a type of subcommand. Discussing that in Slack now.

pipelineActions.append(_ACTION_CONFIG_FILE(args[i+1]))
elif args[i] in instrumentFlags:
pipelineActions.append(_ACTION_ADD_INSTRUMENT(args[i+1]))
return pipelineActions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is args an unparsed command line? You may need a bit more care while parsing it this way, there may be complicated cases which simple iteration does not handle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a tuple of command line options and values, not entirely "unparsed". It's checked by code that comes before it, once we've gotten here there shouldn't be anything wildly unexpected. There's unit test code in tests/test_cliUtils.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a docstring tho, that should have been done.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it's parsed it could still be potentially problematic in some weird corner cases. Suppose there is a very odd option which can accept any string value and someone passes string "-c" to that option:

pipeline build --foo -c -t TaskClass

That would probably break something.

this allows us to keep pipetask while rewriting it in click.
eventually the exising pipetask command will be removed and
pipetask2 will be renamed to pipetask.
@n8pease n8pease merged commit e4df7f2 into master Jul 15, 2020
@andy-slac andy-slac deleted the tickets/DM-25385 branch November 13, 2020 21:43
@andy-slac andy-slac restored the tickets/DM-25385 branch November 13, 2020 21:43
@andy-slac andy-slac deleted the tickets/DM-25385 branch November 13, 2020 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants