Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement fish_getopt (or argparse) command #4190

Closed
krader1961 opened this issue Jul 6, 2017 · 29 comments
Closed

implement fish_getopt (or argparse) command #4190

krader1961 opened this issue Jul 6, 2017 · 29 comments
Assignees
Milestone

Comments

@krader1961
Copy link
Contributor

krader1961 commented Jul 6, 2017

Nine months ago I asked if we should implement a fish version of the getopt command. At the time I was assured we would have a DocOpt solution by now. It's not clear that a DocOpt based solution will be implemented in the near future. And after recently looking at several of our functions that do option parsing I can no longer live with the current state of affairs. For parity with builtin commands we need a way to use wgetopt_long() from fish script.

We most definitely do not want to model our solution on the GNU getopt command. Not least of which because it requires eval'ing the output and eval is evil. Nor do I care for the bash/zsh getopts command. It makes option parsing way too hard for a script by modeling the behavior too closely on the internal getopt_long() implementation. And thus requiring the script to have a lot of boilerplate code just to handle error conditions that can be dealt with by the implementation.

By making this a builtin we can do some useful things like create locally scoped variables in response to parsing the arguments. We can also avoid the need to specify a list of short flags and another list of long flags. The core idea is that every long flag has a corresponding short flag along with an indication of whether the option is a boolean or has an optional or mandatory argument. Also, whether the flag can appear more than once. Short flags don't have to have a corresponding long flag. The short flag associated with a long flag can be exposed or hidden as a valid flag.

Each option specification is composed of

  1. a short flag letter (which is mandatory in the specification),
  2. a / if it can be used as a short flag else - if it is only a short version of the long flag but it can't be used as a short flag (this is optional if there is no long flag name),
  3. an optional long flag name,
  4. nothing if the flag is a boolean that takes no argument, else
  5. a plus-sign if it requires one value and the flag can be used more than once, else
  6. a single colon if it requires one value and the flag can be used at most once, else
  7. two colons if it takes an optional value and the flag can be used at most once.

Each short flag letter will result in a var name of the form _flag_X, where X is the short flag letter. It will be set with local scope (i.e., as if the script had done set -l _flag_X) if the corresponding flag (whether short or long) is seen in the arguments. If the flag is not a boolean the flag var will have zero or more values corresponding to the values collected when the args are parsed. If the flag is a boolean the value is a count of how many times the flag was seen. If the flag was not seen the flag var will not be set.

In the following examples if a flag is not seen when parsing the arguments then the corresponding _flag_X var will not be set.
Some examples:

  1. h/help means that both -h and --help are valid. The flag is a boolean and can be used more than once. If either flag is used then _flag_h will be set to the count of how many times the flags were seen.

  2. h-help means that only --help is valid. The flag is a boolean and can be used more than once. If the long flag is used then _flag_h will be set to the count of how many times the long flag was seen.

  3. n/name: means that both -n and --name are valid. It requires a value and can be used at most once. If the flag is seen then _flag_n will be set with the single mandatory value associated with the flag.

  4. n/name:: means that both -n and --name are valid. It accepts an optional value and can be used at most once. If the flag is seen then _flag_n will be set with the value associated with the flag if one was provided else it will be set with no values.

  5. n-name+ means that only --name is valid. It requires a value and can be used more than once. If the flag is seen then _flag_n will be set with the values associated with each occurrence of the flag.

  6. x means that only -x is valid. It is a boolean can can be used more than once. If it is seen then _flag_x will be set to the count of how many times the flag was seen.

  7. x:, x::, and x+ are similar to the n/name examples above but there is no long flag alternative to the short flag -x.

  8. x- is not valid since there is no long flag name and therefore the short flag, -x, has to be usable. This is obviously true whether or not the specification also includes one of :, ::, +.

After parsing the arguments the argv var is set (with local scope) to any values not already consumed during flag processing.

If an error occurs during fish_getopt processing it will exit with a non-zero status. The specific value will indicate the nature of the problem. The specific exit status codes are TBD. The implementation will write error messages to stderr consistent with the builtin commands. For example, if a flag that requires an argument is seen without an argument then the equivalent of calling builtin_missing_argument() will occur and $status will be set to a distinct, non-zero, value.

Flag specifications can be specified as a single value with each flag separated by a comma. Alternatively, you can specify each flag individually via -o or --option. For example,

fish_getopt 'h/help,1-search,n/name:' $argv

or

fish_getopt -o 'h/help' -o '1-search' -o 'n/name:' $argv

In the above examples _flag_h will be set to the number of times -h or --help is seen, _flag_1 will be set to the number of times --search is seen, and _flag_n will be set if -n or --name is seen and is guaranteed to have a single value.

@krader1961 krader1961 added this to the fish 2.7.0 milestone Jul 6, 2017
@krader1961 krader1961 self-assigned this Jul 6, 2017
@krader1961
Copy link
Contributor Author

See src/builtin_history.cpp for an example of where we have long options with no short option analog available to the user. For example, history --search has no short flag analog. In this proposal that maps to fish_getopt -o '1-search' to create a _flag_1 var if --search is seen. But the short flag -1 is not valid.

@krader1961
Copy link
Contributor Author

krader1961 commented Jul 6, 2017

See also issue #3417 opened nine months ago and closed because it was believed a DocOpt based solution would be implemented three months ago if not earlier. And the original idea is now 4.6 years old.

I love the idea of a more human friendly syntax for specifying how command arguments are processed. But it doesn't appear that a "DocOpt" implementation is in our near future. Whereas a builtin based on our wgetopt_long() function as described above is something I think I can complete in a couple of weeks.

@faho
Copy link
Member

faho commented Jul 6, 2017

The design in general looks solid. I especially like integrating it directly with the shell, saving stuff in variables.

One thing I don't quite like is how prominent the short options are. For one, I'd prefer if the variable names used the long version (if there is one, of course).

Also,

Flag specifications can be specified as a single value with each flag separated by a comma. Alternatively, you can specify each flag individually via -o or --option. For example,

This desperately needs "--" as a separator, since it's about parsing options. So what I'd suggest is actually just to make "--" mandatory. Every argument before it is a flag specification, and every one after it is an argument to check. That would remove the need for "-o".

@krader1961
Copy link
Contributor Author

krader1961 commented Jul 6, 2017

I'd prefer if the variable names used the long version (if there is one, of course).

That's okay by me but I'm ambivalent. I was using the short flag for consistency with how we use wgetopt_long() in the builtin_*() functions. But that isn't really needed here. But there is one problem with using the long flag in the corresponding var name: it can contain characters, like- (dash), that aren't legal in var names. Obviously we can map them using the same heuristic as string escape --style=var. But if we always use the short flag letter then there isn't any confusion.

This desperately needs "--" as a separator, since it's about parsing options....

Agreed, 100%. And no bundling of the option specs by separating them by commas. That was simply a nod to how the getopt command does it. Far better to just use this pattern:

set -l options 'h/help' 'n/name:'
fish_getopt $options -- $argv
or return

if set -q _flag_h
   printf "help\n"
   return 0
end

@krader1961
Copy link
Contributor Author

krader1961 commented Jul 6, 2017

Regarding using the short or long name when creating the _flag_* var: let's do both.

Also, if a flag can only occur once we'll actually let it be used more than once but only the last value seen will be saved.

We'll also support a --require-order flag that corresponds to starting the wgetopt_long() short options string with a +. Also known as "posixly-correct". We don't currently use the return-in-order behavior (short options starting with -) anywhere in our code. And it is a very unusual option parsing mode so we won't implement that at this time.

@zanchey
Copy link
Member

zanchey commented Jul 9, 2017

Of all the option parsing code in the world, the Python argparse module has always struck me as having the best design. getopt's weird DSL is impenetrable; although the proposed syntax is better, I wonder if we could make this more descriptive.

What about using the same sort of options that complete uses? It's pretty verbose, but perhaps the syntax could be tightened up a little.

@krader1961
Copy link
Contributor Author

Well, I already have a working implementation -- in under 400 lines of code. What I've outlined here is similar in flavor to Python's argparse. We obviously can't do it the way that module does because fish script doesn't support OOP patterns.

I started writing a man page last night. I'm going to rewrite the two functions (umask and trap) that currently use getopt today then create a PR. That way there are concrete examples of using the new implementation people can examine. We can then discuss whether that's the pattern we want (short of a DocOpt based solution) or if it should be altered.

@zanchey
Copy link
Member

zanchey commented Jul 10, 2017

If we wanted to do both, we could add a fish_opt function that had a more descriptive syntax, then use it like this:

fish_getopt --options (fish_opt --short h --long help [--long-only] [--required] [--multi]) (fish_opt --short l --long list ...) ...

The fish_opt function would produce the DSL string.

@krader1961
Copy link
Contributor Author

Not a fan of that idea, @zanchey. While we don't want fish script to be as terse as a typical Perl program I think the level of verbosity you're proposing actually hinders comprehension once you have more than one option. Take a look at the PR I opened which includes a couple of functions converted to the new argparse command (I decided I didn't like my original name for it).

@krader1961
Copy link
Contributor Author

Having said that I'm not opposed to implementing a fish_opt command that produces the DSL as output. I don't think I would recommend it over just writing the DSL specs by hand since 99% of the use cases involve just two patterns: boolean vars and vars that require exactly one value. With both types of vars having a short and long flag. That is, h/help and m/mode: are going to be practically the only two styles of option spec you're ever likely to see. The other variations are there solely for parity with what we can do using wgetopt_long() from builtin commands.

Also, I think this pattern is preferable to trying and put it all on one command line:

set -l options (fish_opt --short h ...)
set options $options (fish_opt --short p --long print ...)
argparse $options -- $argv
or return

@krader1961 krader1961 changed the title implement fish_getopt function implement fish_getopt (or argparse) command Jul 10, 2017
@krader1961
Copy link
Contributor Author

I implemented a fish_opt function and in the process found that an assert() I had in the C++ code was incorrect. Updated PR in a few minutes that includes the fix for the argparse C++ code and a fish_opt implementation.

@krader1961
Copy link
Contributor Author

@faho raised a couple of points in this comment on PR #4204 that are better addressed here.

I have two concerns:

The first is how this would deal with incompatible options - see e.g. our abbr function, which takes options like "--add" and "--remove". These currently cannot be specified simultaneously, so if two of them are given it should be an error. Of course it's possible to do something like set -q _flag_add; and set -q _flag_remove, but that doesn't scale particularly well. Is there a possible addition here to make that easier?

I guess the question is how often does this pattern occur in fish functions? Obviously it does occur. For example, you'll also find it in the fish_opt function I added last night to that PR in response to @zanchey's suggestion above. A related pattern is where at least one of several flags must be present. Do these occur often enough to warrant complicating the argparse implementation? Yes for the first case. Maybe for the second.

I propose -x / --exclusive flags that take an argument of comma separated short or long flags that are mutually exclusive. It can be used more than once to define multiple sets of mutually exclusive flags.

We could also add a -m / --mandatory flag that also takes a comma separated list of short or long flags at least one of which must be specified. But I'm going to defer doing that since this is harder to justify.

On a related note it might be nice to have a way to validate the value associated with a flag. We often want to ensure a value is a valid int within a certain range of values or is a string from a finite set of values. Including a function name in the option spec is the obvious solution. The function is called when the flag value is parsed and it returns zero if the value is valid else one. We would provide a few functions to handle common cases such as validating the value is an int. The user is free to create functions for more complex validation.

Another is how this works with the popular "subcommand" argument. Imagine you implemented git with this. It has some global options that can be given before the subcommand, and then loads of subcommand-specific options.

The solution here is to use the --require-order flag and do the arg parsing in two steps. The first one handles global options that appear before the subcommand. The second is then done based on the subcommand seen. For example:

set -l global_opts 'h/help' 'v/verbose' 'l/long'
argparse --require-order --min-args 1 $global_opts -- $argv
or return

set -l subcommand $argv[1]
set -e argv[1]
set -l sub-opts
switch $subcommand
    case sub1:
        set sub-opts 'x' 'w/why'
    case sub2:
        set sub-opts 'a/any' 't/token:'
    case '*':
         echo 'Unrecognized sub command "$subcommand'" 2>&1
         return 1
end

argparse $sub-opts -- $argv
or return

P.S., The argparse command would benefit from adding an explicit function scope option to set as proposed in issue #565 .

@krader1961
Copy link
Contributor Author

Adding support for a --exclusive flag has added roughly 100 lines (~30% more lines) to the implementation. And that is just the core implementation without unit tests. I still think this feature is worthwhile but it is a good example for why such changes require discussion.

@faho
Copy link
Member

faho commented Jul 11, 2017

@faho raised a couple of points in this comment on PR #4204 that are better addressed here.

Yes, sorry!

I propose -x / --exclusive flags that take an argument of comma separated short or long flags that are mutually exclusive. It can be used more than once to define multiple sets of mutually exclusive flags.

Yes, that'll work.

On a related note it might be nice to have a way to validate the value associated with a flag. We often want to ensure a value is a valid int within a certain range of values or is a string from a finite set of values. Including a function name in the option spec is the obvious solution.

The obvious drawback of that is that we only have global functions. So if you just need to validate one thing you're still polluting the global namespace.

It might be possible to allow arbitrary code that will just be called with the value as an additional argument (as if called as somecode $argv).

So you'd do argparse 'n/some-number(test 0 -lt)' to validate that the "--some-number" option was given a positive integer. Since this can only have one argument, quoting is unnecessary. If you just give it a function name, it'll pass that argument as $argv.

Anyway, this doesn't need to be included now.


The solution here is to use the --require-order flag and do the arg parsing in two steps.

Ah, so "--require-order" requires the order of "options before non-option arguments"?

That should be renamed since it's not intuitive that it could also be used for this purpose, but I can't come up with a great name either. "--until-first-nonoption"? "--options-before-arguments"?

--min-args 1

What does that do? I can't find it in the current code.

@krader1961
Copy link
Contributor Author

krader1961 commented Jul 11, 2017

The --require-order terminology is from the wgetopt_long implementation. I used it only because I also couldn't think of a good phrase. If you grep 'short_options = L"\\+' src/* you'll see the places in the builtins where we use this behavior.

The --min-args and --max-args will be in the commit after I add the --exclusive flag. Consider the share/functions/fish_opt.fish script which has this block after calling argparse:

    if set -q argv[1]
        printf (_ "%s: Too many arguments") fish_opt >&2
        return 1
    end

Most commands have (or should have) an explicit check that they were handed no non-flag arguments, or exactly one, or at least one, etc. So --max-args and --min-args allows that boilerplate code to be omitted by moving the check into argparse.

As for flag value validation that will definitely not be part of this change. I'll open a new issue to discuss how that feature might work.

@krader1961
Copy link
Contributor Author

@ridiculousfish, or anyone else, if you want to object to this please do so immediately. Otherwise I'm going to merge this in the next 48 hours. Just as soon as I finish the documentation and add more unit tests. I would be happy to see a DocOpt based solution become our standard for both builtin and fish script commands. However, that doesn't appear likely in the near future given that #478 has been open almost five years. Furthermore, it is unclear that a DocOpt based solution will maintain compatibility with current argument parsing behavior. Practicality trumps purity.

My solution obviously maintains compatibility with existing behavior for builtin commands since it does not change how builtin commands parse their arguments. It can change how fish script commands parse their arguments but only if they are modified to use this new builtin. And even then the change in behavior is almost certainly for the better by being more consistent with how builtin commands parse their arguments.

@faho
Copy link
Member

faho commented Jul 12, 2017

The --require-order terminology is from the wgetopt_long implementation. I used it only because I too couldn't think of a good phrase.

"--stop-at-nonoption"?

So --max-args and --min-args allows that boilerplate code to be omitted by moving the check in to argparse.

Nice!

@maxnordlund
Copy link
Contributor

maxnordlund commented Jul 12, 2017

I've been waiting on the DocOpt for quite some time and agree that this is a realistic way forward. But I found the syntax unclear. It would be better to make it more like how the flags are used, and us common glob patterns for bool/single/multi:

fish_getopt 'h/help?' 'v/verbose*' 'n/name=' 'o/output=+' -- $argv

fish_getopt 'h/help?' '{add,commit}=subcommand' -- $argv
# Or just use $argv[1] instead of the =subcommand above
switch $subcommand
  case add:
    fish_getopt 'p/patch' -- $argv
  case commit
    fish_getopt 'a/all' -- $argv
end

fish_getopt 'x' 'y?' 'z=' -- $argv

@faho
Copy link
Member

faho commented Jul 12, 2017

It would be better to make it more like how the flags are used, and us common glob patterns for bool/single/multi:

Ooh, I like that idea!

So, with that we have these multiplicities:

  • Zero or one - expressed as "?"
  • Exactly once - this would most closely match "no glob character" - "h/help" would be mandatory once
  • Arbitrary (0..infinity) - "*"
  • At least once - "+"

If something takes an argument, you add a "=" to the spec - "output=" takes an argument. If the argument is optional, you use "=?". "*" and "+" don't really make sense here.

So "o/output+=?" would mean at least one output option with an optional argument.

I like this syntax much more, but I see two issues:

  • The default is mandatory once, which isn't all that usual, but I can't see a better analogue to glob syntax

  • For at-least-once or arbitrary with an optional argument, what do we set the flag variable to for occurences that lack the argument? E.g. if we have "o/output+=?", if someone passes command -o=something -o -o=somethingelse. It might not be a problem to make this undetectable - if you use the presence of the flag to turn on a default and the argument to use something else, you don't care about the flag without the argument.

@maxnordlund
Copy link
Contributor

maxnordlund commented Jul 12, 2017

So, with that we have these multiplicities:
...

Yes, that's what I meant. Also since you got it right it fits right in with the least surprise principle.

The default is mandatory once, which isn't all that usual, but I can't see a better analogue to glob syntax

Yeah, but I'm just gonna assume it's common enough to be fine for now. One way would be to default boolean flags to optional (which kinda already are) and just use ? for flags with arguments.

For at-least-once or arbitrary with an optional argument, what do we set the flag variable to for occurences that lack the argument? E.g. if we have "o/output+=?", if someone passes command -o=something -o -o=somethingelse. It might not be a problem to make this undetectable - if you use the presence of the flag to turn on a default and the argument to use something else, you don't care about the flag without the argument.

I would expect the empty string, or perhaps 0. If you use the empty string though it would simplify the flags with arguments specifier.

  • "=" for mandatory single argument
  • "=?" for one optional argument
  • "=+" for at least one argument
  • "=*" for arbitrary arguments
  • "~="/"~=+" for -o and -o=stuff

But it looks really funky, and reversing them looks even funkier "~+="/"+~=".

Another idea is to use parentheses "(n/name=)+" "n/name=?" "(n/name=?)+"

@faho
Copy link
Member

faho commented Jul 12, 2017

One way would be to default boolean flags to optional (which kinda already are) and just use ? for flags with arguments.

I think that would be even more surprising.

Another possibility is to not offer mandatory-once here at all. It's not all that common (mostly it's used for options that are really subcommands, e.g. pacman -S), and if it is really needed it's still possible to test later. It's actually not offered in the current implementation IIUC.

"o/output" would be an optional boolean, and "o/output=" would be an optional value. "o/output=?" would be an optional flag with an optional argument. "o/output*" would be an optional boolean that can be given more than once, and "o/output*=" would be a flag that can be given more than once, but needs a value every time.

Come to think of it, "at least once" is also rarely used. I can't come up with any examples off the top of my head.

"=+" for at least one argument

I can't recall any command using that. Traditionally, command --flag arg1 arg2 would interpret only "arg1" as argument to "--flag" and "arg2" as a general argument to command.

(Maybe reverse them "?=" "+=" "*=")

What does "o/output*=" mean then? I imagine "o/output*=?" means the flag can be given zero to infinity times and each occurence can have an argument. "o/output*=" means the flag can be given zero to infinity times and each one must have an argument.

@maxnordlund
Copy link
Contributor

Sounds good, all the required stuff could be solved with actual flags to fish_getopt itself, like with the exclusive option. Though I like the {add,commit,pull} syntax as well.

@krader1961
Copy link
Contributor Author

The default is mandatory once, which isn't all that usual, but I can't see a better analogue to glob syntax.

Yeah, this is a deal breaker. Also, the default (no special modifier chars after the flag name) has to be a simple boolean flag since that is far and away the most common use case.

I think much of this alternative syntax is pushing too much logic into the argparse implementation. Keep in mind that the primary goal is 100% compatibility with wgetopt_long() used by all our builtins. I'm also not going to implement subcommands in the manner proposed. That adds a lot of complexity both in terms of the implementation and documenting the behavior for zero benefit. It doesn't give us anything we can't already do in a straightforward manner using the --require-order (or whatever we end up calling it) flag.

I don't have any objection to

  • using = rather than : to mean one mandatory value and only the last instance is saved,
  • =? rather than :: to mean it optionally takes a value and only the last instance is saved, and
  • =+ rather than + to mean each instance of the flag requires a value and we save each instance.

The : and :: syntax was simply for consistency with wgetopt_long(). Something 95% of people writing fish script won't care about.

@maxnordlund
Copy link
Contributor

Yes subcommands is far off but I in favour of thinking a ahead a bit to try to avoid shooting oneself in the foot.

@krader1961
Copy link
Contributor Author

@maxnordlund, See this comment for how subcommands can be handled in a straightforward manner using the current design. Keep in mind too that fish script that has subcommands is rare. So absent a compelling argument it doesn't make sense to implement support for it in argparse.

@maxnordlund
Copy link
Contributor

Yes I read that earlier, my thinking was to use the {} syntax to designate exclusive options, which together with multiple step parsing would allow for subcommands without to much hassle.

krader1961 added a commit that referenced this issue Jul 13, 2017
krader1961 added a commit that referenced this issue Jul 13, 2017
krader1961 added a commit that referenced this issue Jul 13, 2017
This implements a `fish_opt` command that provides a way for people
to create option specs for the `argparse` command as an alternative to
creating such strings by hand.

Fixes #4190
krader1961 added a commit that referenced this issue Jul 13, 2017
krader1961 added a commit that referenced this issue Jul 13, 2017
Convert our two functions that use `getopt` to use our new `argparse`
builtin.

Fixes #4190
krader1961 added a commit that referenced this issue Jul 13, 2017
This implements some unit tests for the new `argparse` command and fixes
a couple of bugs those tests brought to light.

Fixes #4190
@krader1961
Copy link
Contributor Author

...my thinking was to use the {} syntax to designate exclusive options, which together with multiple step parsing would allow for subcommands without to much hassle.

I decided earlier to implement an alternative mechanism for specifying mutually exclusive flags: the --exclusive flag which can be repeated to define independent sets of mutually exclusive flags. That, however, is independent of support for subcommands.

Now that I have merged my implementation for the argparse command we need to start changing all fish functions to use it. Rather than continuing to use their own idiosyncratic methods for parsing their arguments.

@wilriker
Copy link

wilriker commented Nov 27, 2017

I really like this new feature! I was waiting for something like this for ages now, so thanks for providing something like this!

One question though: is there a nice way to define a default value for a variable if a flag is not set? At least I do not see anything as part of argparse so how would you do this with the least amount of code?

Best regards
Manuel

@faho
Copy link
Member

faho commented Nov 27, 2017

One question though: is there a nice way to define a default value for a variable if a flag is not set? At least I do not see anything as part of argparse so how would you do this with the least amount of code?

@wilriker: As always, set -q var; or set -l var defaultvalue. So if you had e.g. a --path=/some/path option, you'd do set -q _flag_path; or set -l _flag_path /default/path.

Note that, because of the way local scope currently works, you need to use or and not if not set -q _flag_path.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants