Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(WIP) Add support for quoting parameters #293

Closed
wants to merge 16 commits into from

Conversation

BeeeWall
Copy link

@BeeeWall BeeeWall commented Mar 1, 2018

This adds a class (CommandTokenzer) that takes text, and splits it based on customizable whitespace characters, and has support for quoting/escaping spaces with customizable quote and escape characters. The methods are intended to be similar to that of StreamTokenizer (hence the name). See here for some more details.

Note that currently the class is not used, as it is not finalized yet and (as I mentioned in the issue) I have to edit CommandLine.java with a normal text editor, not my IDE, so it is easier to keep it separate until completion.

@codecov-io
Copy link

codecov-io commented Mar 1, 2018

Codecov Report

Merging #293 into master will decrease coverage by 2.99%.
The diff coverage is 22.95%.

Impacted file tree graph

@@            Coverage Diff             @@
##             master     #293    +/-   ##
==========================================
- Coverage     89.07%   86.07%    -3%     
- Complexity      281      296    +15     
==========================================
  Files             4        5     +1     
  Lines          3852     4035   +183     
  Branches        933      964    +31     
==========================================
+ Hits           3431     3473    +42     
- Misses          209      343   +134     
- Partials        212      219     +7
Impacted Files Coverage Δ Complexity Δ
...ain/java/picocli/interactive/CommandTokenizer.java 22.95% <22.95%> (ø) 15 <15> (?)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2cfe6aa...e1a12cf. Read the comment docs.

Copy link
Owner

@remkop remkop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a quick review and added some comments.

Thanks for working on this! It's really starting to take shape!

@@ -0,0 +1,371 @@
package picocli;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put this class in a new package picocli.interactive. I'm thinking that this package may grow in the future to contain additional functionality for interactive command line applications.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except this is useful for normal as well as interactive programs. I'm planning (if you're OK with it) to add the finalized version in as an inner class of CommandLine, similar to everything else, and to help with the whole "one file" thing. But if I add anything else for interactive stuff, I'll do that in a separate package (and pull request).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In normal usage the operating system shell will take care of the tokenizing and quoting so this is really for interactive use as far as I can see. There are all kinds of interesting things that can be done for picocli-based interactive shells, including things like autocompletion. I really think this should live in a new picocli.interactive package.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will do. I wasn't sure if the OS handled all that or if picocli did (all my testing has been interactive, so I apparently had forgotten how non interactive apps worked 🙄).

import picocli.CommandLine.Option;
import picocli.CommandLine.Parameters;

public class CommandTokenizer {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to have some javadoc that describes what this class does, ideally with an example of how it can be used.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I'm waiting until it's done though, in case we decide anything needs to be changed.

// Remove blanks at start and end of a string
private boolean trimBlanks = true;

public CommandTokenizer(String cmd) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both constructors start parsing the specified input immediately before users have had a chance to configure the tokenizer. It may be better to postpone the call to parse until the user calls the tokens or nextToken method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, you're right. Didn't think about that, will fix.

this.tokens = parse(input);
}

private String[] parse(String cmd) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add unit tests for this method and the other public parse method? We can make this method package-private to allow unit tests to call it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I've never done unit testing before, but for something like this, I think it should be simple.
I might actually make them public, so that people can reuse the object. In that case, I would also add a no-arg constructor.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you’ve never created unit tests before it will be a revelation. :-) Unit tests are a great way to shake out bugs and create confidence that future changes didn’t break anything. They are crucial to keeping software “soft”.

Please see the picocli tests for examples.

@BeeeWall
Copy link
Author

Huh, somehow the javadocs broke, and that is failing Travis. Not sure how to fix it.

@BeeeWall
Copy link
Author

I think I included all the parameters and returns.

@remkop
Copy link
Owner

remkop commented Mar 27, 2018

The problem seems to be the @see javadoc tags. You probably should not surround the reference with curly braces. See the documentation for the @see javadoc tag.

@BeeeWall
Copy link
Author

Currently, I only have unit tests for the String parse method, but I will add tests for the Scanner method.

@BeeeWall
Copy link
Author

...and the javadoc broke again 😤

@remkop
Copy link
Owner

remkop commented Mar 29, 2018

You'll get there! 😄

@BeeeWall
Copy link
Author

Any idea what broke the javadoc this time? There are a ton of errors in the log, and I've never used javadoc before.

@remkop
Copy link
Owner

remkop commented Mar 29, 2018

(Away from PC). Most of those are warnings, not errors. If you search for “error” you’ll quickly find it.

@remkop
Copy link
Owner

remkop commented Mar 29, 2018

I found a few:

  • error: self-closing element not allowed <br />
  • error: tag not allowed here: <tbody>
  • error: reference not found {@link #tokens()}

@BeeeWall
Copy link
Author

BeeeWall commented Mar 30, 2018

I'm not sure how to fix the issue. During my manual testing, it should succeed. I'm not sure what JUnit is doing that I'm not (or vice-versa) that causes the issue. I'm pretty sure what is happening is it is splitting on the space in the comments test, but when I manually run parse on it, it doesn't split there. I've tried both the String and Scanner versions, to see if there is some difference, but it seems to be the same.

@remkop
Copy link
Owner

remkop commented Mar 31, 2018

The test failure says:

picocli.interactive.CommandTokenizerTest > testComments FAILED
    java.lang.AssertionError: array lengths differed, expected.length=1 actual.length=2
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.internal.ComparisonCriteria.assertArraysAreSameLength(ComparisonCriteria.java:76)
        at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:37)
        at org.junit.Assert.internalArrayEquals(Assert.java:532)
        at org.junit.Assert.assertArrayEquals(Assert.java:283)
        at org.junit.Assert.assertArrayEquals(Assert.java:298)
        at picocli.interactive.CommandTokenizerTest.testComments(CommandTokenizerTest.java:51)

Unsure why. You can try printing the result of the parse to stdout so you can see what the actual results are in the Travis CI log.

@BeeeWall
Copy link
Author

BeeeWall commented Mar 31, 2018

Yep, it's splitting on the space. Like I said, I have zero idea why that would happen, whenever I try the test input, it works, but in JUnit it isn't.

@remkop
Copy link
Owner

remkop commented Mar 31, 2018

Could it have something to do with the difference in environment? Different behaviour when running on Linux?

@BeeeWall
Copy link
Author

Possibly? I'm not using JUnit, because AIDE does not support it. I'm using a mini interactive shell that returns the tokenizer output.

@BeeeWall
Copy link
Author

BeeeWall commented Mar 31, 2018

Maybe a difference between the JVM and the Dalvik VM (or ART, I think), because AIDE works by dexing stuff?

@BeeeWall
Copy link
Author

The issue was that the code to trim blanks was in the Scanner method, not the String one. It's weird that switching to the String one in my manual testing didn't point that out, I must've still been using the Scanner one somehow.

@remkop
Copy link
Owner

remkop commented Apr 1, 2018

Good catch!

@saaadel
Copy link

saaadel commented Apr 9, 2018

From:
#324 (comment)

Feedback:

  • Throws ArrayIndexOutOfBoundsException for empty command line or for command line with spaces only. I think it will be better to return empty array, ArgumentTokenizer works and returns zero length array.
  • if we will have multiple spaces between params - we will get this as empty params in the array. Example: "abc xyz" (three spaces between params) -> Returns: [abc, , , xyz]. FYI ArgumentTokenizer skips those repeatable spaces

Also question:
Why you have two methods with separate implementation?
new CommandTokenizer().parse(str)
new CommandTokenizer().parse(scanner)

I mean, why you not replaced parse(str) implementation with short code?

parse(str) {
  return parse(new Scanner(cmd))
}

@BeeeWall
Copy link
Author

BeeeWall commented Apr 10, 2018

Will fix those issues, thanks!

Currently, the Scanner method actually calls the String method, it just has extra code to allow newlines to be escaped in an interactive shell. But I'll look into using less code for that, so that it is clearer that it is mostly the same.

@saaadel
Copy link

saaadel commented Apr 10, 2018

do you plan to add symbol escaping in windows notation? I mean, caret symbol and something. For example: ^" - escaped quotes (instead of a slash notation for *nix)
Maybe in another method? or with optional flag in the current methods?

@remkop
Copy link
Owner

remkop commented Apr 10, 2018

Shouldn't this be possible in CommandTokenizer already by adding the caret ^ character to the escapePatterns, or am I missing something?

@saaadel
Copy link

saaadel commented Apr 10, 2018

if (ch < 0) {
	this.quoteChars.clear();
}

IMHO it's dirty way, add another methods to clear lists.

@saaadel
Copy link

saaadel commented Apr 10, 2018

@remkop

Shouldn't this be possible in CommandTokenizer already by adding the caret ^ character to the escapePatterns, or am I missing something?

Not so simple, he-he 😄

See there: https://ss64.com/nt/syntax-esc.html

  • simple chars:
    ^^ = ^
    ^" = "
    ^& = &
    ^| = |
    etc.

  • but percents
    %% = %

  • and exclamation marks, boom! 💥 (for extension mode only - we need bool flag here)
    ^^! = !

  • Escaping the pipeline: symbol escaped twice! (for second command in the pipe, or more for next commands in the pipe). For example: abc | echo ^^^&

When a pipe is used, the expressions are parsed twice. First when the expression before the pipe is executed and a second time when the expression after the pipe is executed. So to escape any characters in the second expression double escaping is needed:

^^^& = & (escaped twice if second command in the pipe)

Note: abc|xyz is correct pipe with two commands, without spaces between. it's correct for *nix too.

PS. I think it will be better to drop pipes parsing in first version, because it's complex for windows and *nix. So just drop the last rule with variable escape length.

@BeeeWall
Copy link
Author

I'm not dealing with pipes yet, that would probably be really tough, because I think I'd have to change System.out, parse it to a String, and probably more. But just using a caret as an eecapePattern is already possible if you add it, as remkop said, and the rest seems to mostly be pipe related. The percent and exclamation points would require me to build them into the code, which would require yet another flag, and would not really be able to be customized, I don't think. I might be able to make a Map with "special escapes" or something, but not sure.

@BeeeWall
Copy link
Author

If you look, there are already methods that take a boolean, which clear and set to defaults (depending on if it is false or true, respectively). I had actually planned to remove the whole "wipe if under 0" thing, if you look I actually did remove it for commentChar, so I must've forgotten. Either way, the methods take a char, which is unsigned, so it can't ever even be under 0, that was when I was using ints (because StreamTokenizer does).

@saaadel
Copy link

saaadel commented Apr 11, 2018

About flags, I think it will be better to move all of them to the factory, that returns actual parser via interface. So, we can extend the factory with custom flags, and to add new parser implementations for new shells

@remkop
Copy link
Owner

remkop commented Apr 12, 2018

I’m a bit confused. I thought we were talking about the tokenizer parser. Are you talking about the picocli.CommandLine.Interpreter parser? That class is not published API.

@remkop
Copy link
Owner

remkop commented Apr 12, 2018

When I read POSIX parser I assumed you meant a parser for the POSIX.1-2017 Utility Argument Syntax. This is implemented in picocli.CommandLine.Interpreter.

I realize now you may have been talking about POSIX.1-2017 Token Recognition instead.

@BeeeWall
Copy link
Author

I should probably actually read that, I have just been doing this based off of my current shell knowledge and never even thought of the POSIX standard.
Like I said, I'll look into some sort of special escapes, which could allow for some changes, and the variable number of escape characters would probably be implemented as an option somewhere else, if I end up creating pipe support (a full interactive shell will probably take a while, to get stuff like pipes done).
When everything is done for the tokenizer, etc, I'll probably create presets for Windows and POSIX syntaxes.

@saaadel
Copy link

saaadel commented Apr 12, 2018

Yep, I meant about token recognition for POSIX shells.

@remkop
Copy link
Owner

remkop commented Apr 12, 2018

@PorygonZRocks your approach sounds reasonable to me. Adding Presets to imitate the behaviour of specific OS shells would be a nice convenience layer on top of the tokenizer class.

@remkop remkop mentioned this pull request May 18, 2018
@BeeeWall
Copy link
Author

Oops, I forgot about this. Anyway, I'm going through the code, and just realized that it may be better to just have the methods take a String rather than a char. Would you like me to keep the char methods in addition to the String ones, or get rid of them?

@remkop
Copy link
Owner

remkop commented Sep 30, 2018

I recently did some work on an interactive shell when helping Micronaut migrate their interactive CLI to picocli. Their CLI was based on JLine 2. JLine has support for command history, command line completion, ANSI and more. Have you had a chance to look at it?

I found that JLine and picocli complement each other very well, there is no overlap in features. JLine is well done and I don't want to rewrite JLine in picocli. Instead, I am planning to provide some classes and documentation to help users integrate JLine and picocli, starting with a completer and some examples.

It turns out that JLine includes a tokenizer. I'm not sure yet what to do with this PR. Can you take a look at the above and let me know your thoughts?

@BeeeWall
Copy link
Author

That sounds a lot better, for some reason I had thought JLine was made with native libraries.

@BeeeWall
Copy link
Author

I may still look at creating a simpler way to use the two to create an interactive shell easily, but looks like this tokenizer isn't needed.

@BeeeWall BeeeWall closed this Sep 30, 2018
@BeeeWall BeeeWall deleted the commandtokenizer branch September 30, 2018 00:48
@remkop
Copy link
Owner

remkop commented Sep 30, 2018

I’m thinking to rename the module to picocli-jline2-shell to make it more generic. That way other functionality to support interactive shells can be added later.

Suggestions for such features or documentation are welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants