Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability for users to elide ':' or '=' when CLI authors pass a #7297

Merged
merged 7 commits into from
Mar 8, 2018
Merged

Add ability for users to elide ':' or '=' when CLI authors pass a #7297

merged 7 commits into from
Mar 8, 2018

Conversation

c-blake
Copy link
Contributor

@c-blake c-blake commented Mar 4, 2018

non-empty partial symbol table. Behavior should be identical to the
old behavior if empty partial symbol tables are passed. "Partialness"
of the symbol table refers to the fact that one need only specify
option keys that are toggles/booleans/do not take arguments, hence
the "NoArg" suffixes in shortNoArg and longNoArg.

commandLineParams() returns seq[TaintedString], so use that consistently
in getopt() and initOptParser(seq[TaintedString]) dropping the taint at
the quoting stage just as with the paramStr() logic.

Fix capitalization inconsistency of cmdLongOption.

Export OptParser.cmd and OptParser.pos so that, at least in principle,
users of this API can handle "--" option processing termination or some
"git-like" sub-command stop word with a separate option sub-syntax.
{ Eg., case p.key of "": echo "trailing non-option args: ", p.cmd[p.pos..^1]
or case p.kind of cmdArgument: if p.key == "mysubcmd": .... } Really,
searching for the last delimiter before p.pos is probably needed to frame
the trailing text..Not the nicest API, but still possible with effort.

This is a follow up to https://forum.nim-lang.org/t/3592 where it sounded
like it would be welcome.

non-empty partial symbol table.  Behavior should be identical to the
old behavior if empty partial symbol tables are passed.  "Partialness"
of the symbol table refers to the fact that one need only specify
option keys that are toggles/booleans/do not take arguments, hence
the "NoArg" suffixes in shortNoArg and longNoArg.

commandLineParams() returns seq[TaintedString], so use that consistently
in getopt() and initOptParser(seq[TaintedString]) dropping the taint at
the quoting stage just as with the paramStr() logic.

Fix capitalization inconsistency of cmdLongOption.

Export OptParser.cmd and OptParser.pos so that, at least *in principle*,
users of this API can handle "--" option processing termination or some
"git-like" sub-command stop word with a separate option sub-syntax.
{ Eg., ``case p.key of "": echo "trailing non-option args: ", p.cmd[p.pos..^1]``
or ``case p.kind of cmdArgument: if p.key == "mysubcmd": ...``. }  Really,
searching for the last delimiter before p.pos is probably needed to frame
the trailing text..Not the nicest API, but still possible with effort.
@c-blake
Copy link
Contributor Author

c-blake commented Mar 4, 2018

There are obviously some judgement call choices in this idea. E.g., one could flip the sense of the partial table to have the CLI author specify which options take args instead of which do not take args. In my experience, programs with many option keys (like, e.g., the nim compiler itself) are about 2/3 option keys with args. So NoArgs is the "smaller" case needing less maintenance. It would not be hard to take either kind of list -- taking and not taking args, or some other parameter determining the logical sense of the tables and so on and so on, but that can get complicated to explain and CLI authors can mispredict which fork is easier for them (e.g., starting off with mostly toggles and evolving to mostly argument-taking). Just fixing on the "special" case of no arguments seemed simple, effective, and maybe on average ~3X easier than the typical full symbol table.

@c-blake
Copy link
Contributor Author

c-blake commented Mar 4, 2018

Also, I am happy to add some material to the documentation comment/changelog.md and assume you would want that, but I figured first I would see if you liked the approach.

inShortState: bool
shortNoArg: string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be of type set[char]. longNoArg should still be a seq to avoid the dependency I guess. Also @[] is easier to use.

@@ -78,11 +80,14 @@ when declared(os.paramCount):
# we cannot provide this for NimRtl creation on Posix, because we can't
# access the command line arguments then!

proc initOptParser*(cmdline = ""): OptParser =
proc initOptParser*(cmdline = "",
shortNoArg="", longNoArg: seq[string] = @[]): OptParser =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, shortNoArg should be set[char].

@Araq
Copy link
Member

Araq commented Mar 5, 2018

In my experience, programs with many option keys (like, e.g., the nim compiler itself) are about 2/3 option keys with args

Yes, plus IMO keys with args are to be preferred (--switch:on|off) for flexibility.

@c-blake
Copy link
Contributor Author

c-blake commented Mar 5, 2018

I made the requested changes. Personally, I find string constants easier, but I can see both sides.

This is slightly off-topic, but while I have your parseopt.nim attention, I was wondering what you think of the idea of having more principled/documented quoting by using os.parseCmdLine to tokenize any cmdline strings into cmdline seqs/iterators before parseopt picks them apart into option keys, vals, and regular arguments. On Windows this would seem cleaner/more complete/less fragile. On Unix the creating process already does that tokenization. If avoiding two passes over the command is desired os.parseCmdLine could be turned into an {iterator, proc} pair.

@Araq
Copy link
Member

Araq commented Mar 5, 2018

I was wondering what you think of the idea of having more principled/documented quoting by using os.parseCmdLine to tokenize any cmdline strings into cmdline seqs/iterators before parseopt picks them apart

Yeah, probably we should do something like this...

@Araq
Copy link
Member

Araq commented Mar 5, 2018

Personally, I find string constants easier, but I can see both sides.

Well ... if we're serious about Unicode set[char] cannot work but then you need to use an UTF-8 iterator over the string constant...

@dom96
Copy link
Contributor

dom96 commented Mar 5, 2018

IMO we shouldn't overcomplicate parseopt like this. What happened to the stance of no symbol table in parseopt?

@c-blake
Copy link
Contributor Author

c-blake commented Mar 6, 2018

Well, it's only a fully optional partial symbol table with the only extra work being for boolean/toggle/no-argument option keys. If CLI authors want the old behavior, the defaults give it. If CLI authors want to provide CLI users with more POSIX-style option syntax then this PR enables that, too. Do you really want to force CLI authors wanting to provide more standard/common option syntax to use external to libraries/roll their own?

@Araq
Copy link
Member

Araq commented Mar 6, 2018

What happened to the stance of no symbol table in parseopt?

The PR seems simple enough and I think parseopt is not flexible enough for a standard library without this feature. Yes, I changed my mind.

@c-blake
Copy link
Contributor Author

c-blake commented Mar 6, 2018

Ok. Well, I documented the changes for this PR. This should be purely added functionality not a breaking change.

Doing the full tokenization first with os.parseCmdLine as mentioned above in this thread would basically be a re-write of the parser against input token streams vs current input string. While we both think that is probably a good idea, it is also a much less simple PR.

@c-blake
Copy link
Contributor Author

c-blake commented Mar 6, 2018

Oh, one thing the documentation reminded me to ask you about is whether you like NoVal/NoValue better than the current NoArg as suffices for short & long. "Argument" is often overloaded/used a couple of ways in this context (non-option arguments and argumetns to option keys) which can be potentially confusing.

@c-blake
Copy link
Contributor Author

c-blake commented Mar 6, 2018

I went ahead and optimistically changed to NoVal and consider this PR good to go unless you really loved the "NoArg" naming. Other possibilities you might like better might be "Bool" or "Toggle" - if you prefer to refer more to the semantics of the option keys than the syntax of their parsing. It is a syntax optionality adjustment, though. The naming doesn't matter much to me, but we probably should not change once it's settled.

@mjoud
Copy link

mjoud commented Mar 7, 2018

This is a nice improvement but would it be possible to add some additional usage examples?

@c-blake
Copy link
Contributor Author

c-blake commented Mar 7, 2018

@mjoud - fair enough. The theme of the documentation for this module seems to be to keep things pretty terse. I added one for the final getopt iterator (and altered the existing getopt example to have 'l' be a value-free short option so they make for an easy side-by-side. Added a few more doc snippets near procs, too.

An example handling "--" would be pretty complex. I think it would be better long-term to just add such handling to the parser to switch over to a cmdArgument-only mode, but I have been trying to keep this PR small.

## (as provided by the ``OS`` module) is taken.
## (as provided by the ``OS`` module) is taken. If ``shortNoVal`` is
## provided command users need not delimit short option keys and values
## with a ':' or '='. If ``longNoVal`` is provided command users need not
Copy link

@ghost ghost Mar 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is provided command users need not maybe this can be changed?

@Araq
Copy link
Member

Araq commented Mar 7, 2018

This requires a test case, apart from that it's fine.

@c-blake
Copy link
Contributor Author

c-blake commented Mar 7, 2018

Ok. Done.

It would be nice someday to re-do the parser to be token-oriented from os.parseCmdLine or os.commandParams rather than the current character-oriented way. I basically did that parser already in https://github.com/c-blake/cligen/blob/master/parseopt3.nim and am happy to donate it. It also emits better error messages for users mistyped commands. That requires more testing on Windows in RTL mode than is easy for me to do, though.

Also, this is way off-topic, but there is a comment in the code related to POSIX not making argv available that indicates you may be interested in a trick I figured out about 20 years ago. You can infer argv from the C global environ on virtually every platform (I tested like 7 or 8 OSes back in the late 90s when Unix diversity was higher). I don't know about Windows (where it isn't necessary), but all the various Unix kernels just lay out the command argument memory area the environment variable memory area. The only variation I saw was whether the array of pointers comes before or after the area of string data being pointed to (and only HP-UX varied from the pack in that way).

c-blake pushed a commit to c-blake/cligen that referenced this pull request Mar 7, 2018
@Araq
Copy link
Member

Araq commented Mar 8, 2018

all the various Unix kernels just lay out the command argument memory area the environment variable memory area.

Ok, but I don't want to make usage of this knowledge. :-)

@Araq Araq merged commit 551d7b7 into nim-lang:devel Mar 8, 2018
@c-blake
Copy link
Contributor Author

c-blake commented Mar 8, 2018

Yah...I suspected you would say that. :-)

zah pushed a commit to zah/grip-lang that referenced this pull request Mar 19, 2018
…m-lang#7297)

* Add ability for users to elide ':' or '=' when CLI authors pass a
non-empty partial symbol table.  Behavior should be identical to the
old behavior if empty partial symbol tables are passed.  "Partialness"
of the symbol table refers to the fact that one need only specify
option keys that are toggles/booleans/do not take arguments, hence
the "NoArg" suffixes in shortNoArg and longNoArg.

commandLineParams() returns seq[TaintedString], so use that consistently
in getopt() and initOptParser(seq[TaintedString]) dropping the taint at
the quoting stage just as with the paramStr() logic.

Fix capitalization inconsistency of cmdLongOption.

Export OptParser.cmd and OptParser.pos so that, at least *in principle*,
users of this API can handle "--" option processing termination or some
"git-like" sub-command stop word with a separate option sub-syntax.
{ Eg., ``case p.key of "": echo "trailing non-option args: ", p.cmd[p.pos..^1]``
or ``case p.kind of cmdArgument: if p.key == "mysubcmd": ...``. }  Really,
searching for the last delimiter before p.pos is probably needed to frame
the trailing text..Not the nicest API, but still possible with effort.

* Make requested changes from string to seq[char]
(see nim-lang#7297)

* Document new behavior and elaborate on some special cases.

* NoArg => NoVal to be less ambiguous/more clear.

* Add more documentation and an example snippet.

* Tweak language. Clarify still using ':'/'=' is ok.

* Add a test case for new NoVal behavior.
c-blake pushed a commit to c-blake/cligen that referenced this pull request Apr 25, 2018
``parseopt`` more token-oriented (see
    nim-lang/Nim#7297 (comment)
), but hasn't seemed interested in parseopt3 re-write.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants