(WIP) Add support for quoting parameters #293

BeeeWall · 2018-03-01T22:27:29Z

This adds a class (CommandTokenizer) that takes text and splits it on customizable whitespace characters, with support for quoting and escaping spaces using customizable quote and escape characters. The methods are intended to be similar to those of StreamTokenizer (hence the name). See here for some more details.

Note that currently the class is not used, as it is not finalized yet and (as I mentioned in the issue) I have to edit CommandLine.java with a normal text editor, not my IDE, so it is easier to keep it separate until completion.

codecov-io · 2018-03-01T22:29:23Z

Codecov Report

Merging #293 into master will decrease coverage by 2.99%.
The diff coverage is 22.95%.

@@            Coverage Diff             @@
##             master     #293    +/-   ##
==========================================
- Coverage     89.07%   86.07%    -3%     
- Complexity      281      296    +15     
==========================================
  Files             4        5     +1     
  Lines          3852     4035   +183     
  Branches        933      964    +31     
==========================================
+ Hits           3431     3473    +42     
- Misses          209      343   +134     
- Partials        212      219     +7

Impacted Files	Coverage Δ	Complexity Δ
...ain/java/picocli/interactive/CommandTokenizer.java	`22.95% <22.95%> (ø)`	`15 <15> (?)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2cfe6aa...e1a12cf.

remkop

I've done a quick review and added some comments.

Thanks for working on this! It's really starting to take shape!

remkop · 2018-03-02T09:20:47Z

src/main/java/picocli/CommandTokenizer.java

@@ -0,0 +1,371 @@
+package picocli;


Let's put this class in a new package picocli.interactive. I'm thinking that this package may grow in the future to contain additional functionality for interactive command line applications.

Except this is useful for normal as well as interactive programs. I'm planning (if you're OK with it) to add the finalized version in as an inner class of CommandLine, similar to everything else, and to help with the whole "one file" thing. But if I add anything else for interactive stuff, I'll do that in a separate package (and pull request).

In normal usage the operating system shell will take care of the tokenizing and quoting so this is really for interactive use as far as I can see. There are all kinds of interesting things that can be done for picocli-based interactive shells, including things like autocompletion. I really think this should live in a new picocli.interactive package.

Ok, will do. I wasn't sure if the OS handled all that or if picocli did (all my testing has been interactive, so I had apparently forgotten how non-interactive apps work 🙄).

remkop · 2018-03-02T09:22:10Z

src/main/java/picocli/CommandTokenizer.java

+import picocli.CommandLine.Option;
+import picocli.CommandLine.Parameters;
+
+public class CommandTokenizer {


Would be great to have some javadoc that describes what this class does, ideally with an example of how it can be used.

Good idea. I'm waiting until it's done though, in case we decide anything needs to be changed.

remkop · 2018-03-02T09:33:10Z

src/main/java/picocli/CommandTokenizer.java

+		// Remove blanks at start and end of a string
+		private boolean trimBlanks = true;
+
+		public CommandTokenizer(String cmd) {


Both constructors start parsing the specified input immediately before users have had a chance to configure the tokenizer. It may be better to postpone the call to parse until the user calls the tokens or nextToken method.

Oops, you're right. Didn't think about that, will fix.
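
A minimal sketch of the deferred-parse idea, assuming a hypothetical `LazyTokenizer` class (the class, the `whitespace` setter, and the caching are illustrative names and choices, not the PR's actual code). The constructor only stores the input; splitting happens on the first call to `tokens()`, so configuration applied in between takes effect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CommandTokenizer; all names are illustrative only.
class LazyTokenizer {
    private final String input;
    private String whitespace = " \t";   // configurable separator characters
    private String[] tokens;             // stays null until tokens() is first called

    LazyTokenizer(String input) {
        this.input = input;              // store only; do not parse yet
    }

    // Changing the configuration invalidates any cached result.
    LazyTokenizer whitespace(String chars) {
        this.whitespace = chars;
        this.tokens = null;
        return this;
    }

    // Parsing is deferred until the tokens are actually requested.
    String[] tokens() {
        if (tokens == null) {
            tokens = parse(input);
        }
        return tokens.clone();
    }

    // Simplified split on the configured separators (no quoting or escaping).
    private String[] parse(String cmd) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char ch : cmd.toCharArray()) {
            if (whitespace.indexOf(ch) >= 0) {
                if (current.length() > 0) {
                    result.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(ch);
            }
        }
        if (current.length() > 0) {
            result.add(current.toString());
        }
        return result.toArray(new String[0]);
    }
}
```

With this shape, `new LazyTokenizer("a,b c").whitespace(",").tokens()` splits on the comma instead of the default separators, because nothing was parsed before the setter ran.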

remkop · 2018-03-02T09:47:17Z

src/main/java/picocli/CommandTokenizer.java

+			this.tokens = parse(input);
+		}
+
+		private String[] parse(String cmd) {


Can we add unit tests for this method and the other public parse method? We can make this method package-private to allow unit tests to call it.

Sure. I've never done unit testing before, but for something like this, I think it should be simple.
I might actually make them public, so that people can reuse the object. In that case, I would also add a no-arg constructor.

If you’ve never created unit tests before it will be a revelation. :-) Unit tests are a great way to shake out bugs and create confidence that future changes didn’t break anything. They are crucial to keeping software “soft”.

Please see the picocli tests for examples.

BeeeWall · 2018-03-27T01:46:29Z

Huh, somehow the javadocs broke, and that is failing Travis. Not sure how to fix it.

BeeeWall · 2018-03-27T01:46:50Z

I think I included all the parameters and returns.

remkop · 2018-03-27T10:07:28Z

The problem seems to be the @see javadoc tags. You probably should not surround the reference with curly braces. See the documentation for the @see javadoc tag.
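
For illustration, a small sketch of the two reference styles, using a hypothetical `SeeTagExample` class that is not part of the PR. `@see` is a block tag written on its own line without braces, while curly braces are reserved for inline tags such as `{@link}`.

```java
/**
 * Splits a command line into tokens.
 *
 * @see java.io.StreamTokenizer
 * @see #describe()
 */
class SeeTagExample {
    /**
     * Inline references use braces, for example {@link java.io.StreamTokenizer}.
     *
     * @return a short description of this class
     */
    String describe() {
        return "example";
    }
}
```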

BeeeWall · 2018-03-28T22:33:14Z

Currently, I only have unit tests for the String parse method, but I will add tests for the Scanner method.

BeeeWall · 2018-03-28T22:37:34Z

...and the javadoc broke again 😤

remkop · 2018-03-29T15:51:17Z

You'll get there! 😄

BeeeWall · 2018-03-29T16:09:52Z

Any idea what broke the javadoc this time? There are a ton of errors in the log, and I've never used javadoc before.

remkop · 2018-03-29T21:31:03Z

(Away from PC). Most of those are warnings, not errors. If you search for “error” you’ll quickly find it.

remkop · 2018-03-29T23:39:34Z

I found a few:

error: self-closing element not allowed <br />
error: tag not allowed here: <tbody>
error: reference not found {@link #tokens()}

BeeeWall · 2018-03-30T19:07:15Z

I'm not sure how to fix the issue. In my manual testing, it succeeds. I'm not sure what JUnit is doing that I'm not (or vice versa) that causes the issue. I'm pretty sure it is splitting on the space in the comments test, but when I manually run parse on it, it doesn't split there. I've tried both the String and Scanner versions to see if there is some difference, but the result seems to be the same.

remkop · 2018-03-31T03:24:49Z

The test failure says:

picocli.interactive.CommandTokenizerTest > testComments FAILED
    java.lang.AssertionError: array lengths differed, expected.length=1 actual.length=2
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.internal.ComparisonCriteria.assertArraysAreSameLength(ComparisonCriteria.java:76)
        at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:37)
        at org.junit.Assert.internalArrayEquals(Assert.java:532)
        at org.junit.Assert.assertArrayEquals(Assert.java:283)
        at org.junit.Assert.assertArrayEquals(Assert.java:298)
        at picocli.interactive.CommandTokenizerTest.testComments(CommandTokenizerTest.java:51)

Unsure why. You can try printing the result of the parse to stdout so you can see what the actual results are in the Travis CI log.

BeeeWall · 2018-03-31T05:59:03Z

Yep, it's splitting on the space. Like I said, I have zero idea why that would happen; whenever I try the test input manually it works, but in JUnit it doesn't.

remkop · 2018-03-31T06:01:41Z

Could it have something to do with the difference in environment? Different behaviour when running on Linux?

BeeeWall · 2018-03-31T06:04:06Z

Possibly? I'm not using JUnit, because AIDE does not support it. I'm using a mini interactive shell that returns the tokenizer output.

BeeeWall · 2018-03-31T06:04:59Z

Maybe a difference between the JVM and the Dalvik VM (or ART, I think), because AIDE works by dexing stuff?

BeeeWall · 2018-03-31T17:42:10Z

The issue was that the code to trim blanks was in the Scanner method, not the String one. It's weird that switching to the String one in my manual testing didn't point that out, I must've still been using the Scanner one somehow.

remkop · 2018-04-01T01:25:33Z

Good catch!

saaadel · 2018-04-09T23:13:16Z

From:
#324 (comment)

Feedback:

Throws ArrayIndexOutOfBoundsException for an empty command line or for a command line containing only spaces. I think it would be better to return an empty array; ArgumentTokenizer handles this and returns a zero-length array.
If there are multiple spaces between params, they show up as empty params in the array. Example: "abc   xyz" (three spaces between params) -> Returns: [abc, , , xyz]. FYI, ArgumentTokenizer skips those repeated spaces.
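
The two behaviours requested above can be sketched as follows, assuming a hypothetical `SketchTokenizer` with hard-coded double-quote handling (this is not the PR's implementation, which has configurable characters). Blank input yields a zero-length array, and runs of whitespace never produce empty tokens unless an empty string is explicitly quoted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the PR's CommandTokenizer.
class SketchTokenizer {
    static String[] parse(String cmd) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inToken = false;    // true once a token has started (even an empty quoted one)
        boolean inQuotes = false;   // true between a pair of double quotes
        for (int i = 0; i < cmd.length(); i++) {
            char ch = cmd.charAt(i);
            if (ch == '"') {
                inQuotes = !inQuotes;
                inToken = true;
            } else if (Character.isWhitespace(ch) && !inQuotes) {
                // Only emit a token if one was started; repeated spaces are skipped.
                if (inToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
            } else {
                current.append(ch);
                inToken = true;
            }
        }
        if (inToken) {
            tokens.add(current.toString());
        }
        return tokens.toArray(new String[0]);
    }
}
```

Tracking "a token has started" separately from the buffer contents is what lets `""` survive as an empty argument while plain repeated spaces are dropped.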

Also question:
Why do you have two methods with separate implementations?
new CommandTokenizer().parse(str)
new CommandTokenizer().parse(scanner)

I mean, why not replace the parse(str) implementation with a short delegation?

public String[] parse(String str) {
  return parse(new Scanner(str));
}

BeeeWall · 2018-04-10T00:15:52Z

Will fix those issues, thanks!

Currently, the Scanner method actually calls the String method, it just has extra code to allow newlines to be escaped in an interactive shell. But I'll look into using less code for that, so that it is clearer that it is mostly the same.
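
A rough sketch of that delegation, assuming a hypothetical `ContinuationReader` class: the Scanner overload only joins lines that end in a backslash and then hands the result to the String overload, so trimming and splitting live in one place. As a simplification, a line ending in an escaped backslash (`\\`) is not treated specially here.

```java
import java.util.Scanner;

// Hypothetical sketch; class and method names are illustrative only.
class ContinuationReader {
    // Reads one logical command: a line ending in a backslash continues on the next line.
    static String readCommand(Scanner in) {
        StringBuilder cmd = new StringBuilder();
        while (in.hasNextLine()) {
            String line = in.nextLine();
            if (line.endsWith("\\")) {
                cmd.append(line, 0, line.length() - 1); // drop the backslash and keep reading
            } else {
                cmd.append(line);
                break;
            }
        }
        return cmd.toString();
    }

    // The Scanner overload only joins lines, then defers to the String overload.
    static String[] parse(Scanner in) {
        return parse(readCommand(in));
    }

    // All trimming and splitting happens here, so both entry points behave the same.
    static String[] parse(String cmd) {
        String trimmed = cmd.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```

Keeping the trim step in the String overload also avoids the earlier situation where blanks were trimmed on one code path but not the other.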

saaadel · 2018-04-10T02:39:27Z

Do you plan to add symbol escaping in Windows notation? I mean the caret symbol and so on. For example: ^" is an escaped quote (instead of the backslash notation used on *nix).
Maybe in another method? Or with an optional flag in the current methods?

remkop · 2018-04-10T04:16:53Z

Shouldn't this be possible in CommandTokenizer already by adding the caret ^ character to the escapePatterns, or am I missing something?

saaadel · 2018-04-10T04:58:11Z

if (ch < 0) {
	this.quoteChars.clear();
}

IMHO this is a dirty way to do it; add separate methods to clear the lists.

saaadel · 2018-04-10T05:13:39Z

@remkop

Shouldn't this be possible in CommandTokenizer already by adding the caret ^ character to the escapePatterns, or am I missing something?

Not so simple, he-he 😄

See there: https://ss64.com/nt/syntax-esc.html

simple chars:
^^ = ^
^" = "
^& = &
^| = |
etc.
but percents
%% = %
and exclamation marks, boom! 💥 (for extension mode only - we need bool flag here)
^^! = !
Escaping the pipeline: symbol escaped twice! (for second command in the pipe, or more for next commands in the pipe). For example: abc | echo ^^^&

When a pipe is used, the expressions are parsed twice. First when the expression before the pipe is executed and a second time when the expression after the pipe is executed. So to escape any characters in the second expression double escaping is needed:

^^^& = & (escaped twice if second command in the pipe)

Note: abc|xyz is correct pipe with two commands, without spaces between. it's correct for *nix too.

PS. I think it will be better to drop pipes parsing in first version, because it's complex for windows and *nix. So just drop the last rule with variable escape length.

BeeeWall · 2018-04-10T12:28:50Z

I'm not dealing with pipes yet; that would probably be really tough, because I think I'd have to change System.out, parse it to a String, and probably more. But just using a caret as an escapePattern is already possible if you add it, as remkop said, and the rest seems to mostly be pipe related. The percent and exclamation points would require me to build them into the code, which would require yet another flag, and I don't think they could really be customized. I might be able to make a Map with "special escapes" or something, but I'm not sure.
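
To make the "simple chars" case concrete, here is a sketch with a configurable escape character, assuming a hypothetical `EscapeCharSketch.split` helper rather than the PR's escapePatterns API. It covers only the one-character escapes (`^^`, `^"`, `^&`, `^|`); the `%%` rule, `^^!`, and double escaping after a pipe are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the PR's CommandTokenizer.
class EscapeCharSketch {
    // Splits on spaces; the character following escapeChar is taken literally.
    static List<String> split(String cmd, char escapeChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;   // true when the previous character was escapeChar
        boolean started = false;   // true once the current token has begun
        for (char ch : cmd.toCharArray()) {
            if (escaped) {
                current.append(ch);
                escaped = false;
            } else if (ch == escapeChar) {
                escaped = true;
                started = true;
            } else if (ch == ' ') {
                if (started) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    started = false;
                }
            } else {
                current.append(ch);
                started = true;
            }
        }
        if (started) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Passing `'^'` gives cmd.exe-style single-character escapes and passing `'\\'` gives the *nix-style ones, with no other change to the loop.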

BeeeWall · 2018-04-10T12:36:13Z

If you look, there are already methods that take a boolean, which clear and set to defaults (depending on if it is false or true, respectively). I had actually planned to remove the whole "wipe if under 0" thing, if you look I actually did remove it for commentChar, so I must've forgotten. Either way, the methods take a char, which is unsigned, so it can't ever even be under 0, that was when I was using ints (because StreamTokenizer does).
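
A tiny demonstration of the point about `char`, using a hypothetical `CharRangeDemo` class: Java's `char` is an unsigned 16-bit type, so a `ch < 0` guard can only fire when the parameter is an `int`.

```java
// Hypothetical demo class; not part of the PR.
class CharRangeDemo {
    // Mirrors the leftover guard: for a char argument this can never be true.
    static boolean isNegative(char ch) {
        return ch < 0;
    }

    // The same guard with an int argument, the type StreamTokenizer-style methods accept.
    static boolean isNegative(int ch) {
        return ch < 0;
    }
}
```

Casting -1 to `char` wraps around to 65535, so the char overload returns false for every possible input, while the int overload returns true for -1.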

saaadel · 2018-04-11T03:41:30Z

About flags: I think it would be better to move all of them into a factory that returns the actual parser via an interface. That way we can extend the factory with custom flags and add new parser implementations for new shells.

remkop · 2018-04-12T00:38:56Z

I’m a bit confused. I thought we were talking about the tokenizer parser. Are you talking about the picocli.CommandLine.Interpreter parser? That class is not published API.

remkop · 2018-04-12T01:12:04Z

When I read POSIX parser I assumed you meant a parser for the POSIX.1-2017 Utility Argument Syntax. This is implemented in picocli.CommandLine.Interpreter.

I realize now you may have been talking about POSIX.1-2017 Token Recognition instead.

BeeeWall · 2018-04-12T03:44:21Z

I should probably actually read that, I have just been doing this based off of my current shell knowledge and never even thought of the POSIX standard.
Like I said, I'll look into some sort of special escapes, which could allow for some changes, and the variable number of escape characters would probably be implemented as an option somewhere else, if I end up creating pipe support (a full interactive shell will probably take a while, to get stuff like pipes done).
When everything is done for the tokenizer, etc, I'll probably create presets for Windows and POSIX syntaxes.

saaadel · 2018-04-12T04:14:16Z

Yep, I meant about token recognition for POSIX shells.

remkop · 2018-04-12T04:34:19Z

@PorygonZRocks your approach sounds reasonable to me. Adding Presets to imitate the behaviour of specific OS shells would be a nice convenience layer on top of the tokenizer class.
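
One possible shape for such presets, sketched as a hypothetical `ShellPreset` enum. The characters below are assumptions about each shell's usual escape and quote characters (backslash for POSIX shells, caret for batch, backtick for PowerShell), not values taken from the PR.

```java
// Hypothetical presets; the characters are assumptions, not the PR's actual defaults.
enum ShellPreset {
    POSIX('\\', "\"'"),
    BATCH('^', "\""),
    POWERSHELL('`', "\"'");

    private final char escapeChar;
    private final String quoteChars;

    ShellPreset(char escapeChar, String quoteChars) {
        this.escapeChar = escapeChar;
        this.quoteChars = quoteChars;
    }

    // The character that makes the following character literal.
    char escapeChar() {
        return escapeChar;
    }

    // True if ch opens or closes a quoted section under this preset.
    boolean isQuote(char ch) {
        return quoteChars.indexOf(ch) >= 0;
    }
}
```

A tokenizer could accept one of these constants to set several options at once, while still allowing individual setters to override them afterwards.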

BeeeWall · 2018-09-29T13:24:15Z

Oops, I forgot about this. Anyway, I'm going through the code, and just realized that it may be better to just have the methods take a String rather than a char. Would you like me to keep the char methods in addition to the String ones, or get rid of them?

remkop · 2018-09-30T00:19:06Z

I recently did some work on an interactive shell when helping Micronaut migrate their interactive CLI to picocli. Their CLI was based on JLine 2. JLine has support for command history, command line completion, ANSI and more. Have you had a chance to look at it?

I found that JLine and picocli complement each other very well, there is no overlap in features. JLine is well done and I don't want to rewrite JLine in picocli. Instead, I am planning to provide some classes and documentation to help users integrate JLine and picocli, starting with a completer and some examples.

It turns out that JLine includes a tokenizer. I'm not sure yet what to do with this PR. Can you take a look at the above and let me know your thoughts?

BeeeWall · 2018-09-30T00:42:30Z

That sounds a lot better, for some reason I had thought JLine was made with native libraries.

BeeeWall · 2018-09-30T00:45:48Z

I may still look at creating a simpler way to use the two to create an interactive shell easily, but looks like this tokenizer isn't needed.

remkop · 2018-09-30T01:15:16Z

I’m thinking of renaming the module to picocli-jline2-shell to make it more generic. That way other functionality to support interactive shells can be added later.

Suggestions for such features or documentation are welcome!

remkop requested changes Mar 2, 2018

remkop mentioned this pull request Mar 7, 2018

Command Line History Accessible via Up Arrow #296

Closed

remkop mentioned this pull request Apr 4, 2018

How to parse single string, not string array? #324

Closed

remkop mentioned this pull request May 18, 2018

Interactive CLI #380

Closed

PorygonZRocks added 14 commits September 23, 2018 14:29

Add CommandTokenizer class

208a4e5

Make parse methods public, don't call parse in constructor

1b13bed

Add Javadoc for CommandTokenizer and move it to new package

204a03c

Change some chars to Strings (should fix TravisCI)

9a2cd1d

Remove braces from @see tags

58ddd3e

Add unit tests for parsing a String

92b81d1

Change <br /> to <br>, fix table, fix incorrect method reference

7956365

Add summary for defaults table

d644325

Tests should not be static

983df08

Change curChar to String (should fix checking if contains curChar)

3c3a8e7

Fix endindex in substring for curChar and preChar

0f94b0f

Print out array for comments test (for troubleshooting Travis)

dca1063

Trim blanks from Strings, not just Scanners

ffd3c96

Remove a bunch of unneeded code, move resetSyntax farther down

eaa3a23

BeeeWall force-pushed the commandtokenizer branch from f3aa509 to eaa3a23 Compare September 23, 2018 19:42

PorygonZRocks added 2 commits September 29, 2018 10:12

Add String versions of *Char methods, add Bourne shell and batch presets

5513816

Change batch preset, rename Bourne preset to POSIX, add PowerShell pr…

e1a12cf

…eset

BeeeWall closed this Sep 30, 2018

BeeeWall deleted the commandtokenizer branch September 30, 2018 00:48
