Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POSIX: Invalid handling of parameter values starting with double quotes ("), e.g. --order-by='"quoted" value' #959

Closed
sandreas opened this issue Sep 10, 2022 · 29 comments · Fixed by #1048
Assignees
Labels
bug Something isn't working

Comments

@sandreas
Copy link

Important: This is a follow-up issue (#891), since it was closed without possibility to give feedback, reopen or answering my follow-up questions. I really think this is clearly a bug of spectre and it should be further investigated, because input values are MODIFIED without any developer interaction... It may also be possible, that this leads to a SECURITY FLAW because parameters are mistreated, but I'm no expert.

This also may not be reproducible in any OS (e.g. on Windows) AND also not via Debugger or IDE, since command line parameter handling is different. To reproduce the issue, I recommend using Linux and compile the given test program in release mode.

The last answer I got stated this about argument parsing:

Unfortunately, this is out of our hands. We are just parsing the args that .NET gets from its command line handler. In fact, on Windows I get different results from you with different examples.

I think, this is wrong, because at least something in spectre seems to Trim whitespaces (Details: #891 (comment)) beause the following example does not modify parameters, while spectre does.

// Program.cs, dotnet 6
    Console.WriteLine("== Environment.CommandLine ==");
    Console.WriteLine(Environment.CommandLine);
    Console.WriteLine("");

    Console.WriteLine("==  Environment.GetCommandLineArgs() ==");
    foreach(var a in Environment.GetCommandLineArgs()){
        Console.WriteLine(a);
    }
    Console.WriteLine("");

    Console.WriteLine("== args ==");
    foreach(var b in args){
        Console.WriteLine(b);
    }
    return 0;

Information

  • OS: Linux, Ubuntu 18.04 LTS
  • Version: 0.45
  • Terminal: xterm-256color, zsh

Describe the bug
Parameter values cannot start with double quotes ("). They are either replaced or handled completely wrong. Additionally, parameter values are TRIMMED (remove leading spaces), so adding a space to workaround this is not possible.

The following example even cuts off the parameter value and treats everything after the space as EXTRA argument, in the following case value is handled as positional argument and not part of --order-by:

cli-tester test --order-by '"quoted" value' input.mp3

This may be a security flaw under specific circumstances.

More examples of invalid handling:

# `quoted` instead of `"quoted" value` and cut off handling as positional argument
cli-tester test --order-by '"quoted" value' input.mp3
cli-tester test --order-by='"quoted" value' input.mp3

# `quoted` instead of `"quoted"` 
cli-tester test --order-by '"quoted"' input.mp3

# `quoted` instead of ` "quoted"` 
cli-tester test --order-by ' "quoted"' input.mp3

The following examples work like expected, if not starting with double quotes or single quotes are used

# `value "quoted" like expected
cli-tester test --order-by 'value "quoted"' input.mp3

# `'quoted' value` like expected
cli-tester test --order-by "'quoted' value" input.mp3

# `\"quoted" value' value` like expected
cli-tester test --order-by='\"quoted" value' input.mp3

To Reproduce

Expected behavior
I would expect double quotes (") and spaces ( ) CAN be a valid part of a parameter value and should not be replaced or parsed out in any way.

@sandreas sandreas added the bug Something isn't working label Sep 10, 2022
@phil-scott-78
Copy link
Contributor

I took a look some more in this. I still comes down to how each shell is gonna handle the quotes and each shell is wildly different. Cmd.exe, pwsh, and even running through the debugger all vary wildly in the scenarios you posted with no consistency. But, on your shell and only this shell it does include the quotes in the args and theoretically we could include them. Went ahead and created a unit test shows the failure.

    [Fact]
    public void Quote_Tests()
    {
        // Given
        var app = new CommandAppTester();
        app.Configure(config =>
        {
            config.PropagateExceptions();
            config.AddCommand<GenericCommand<FooCommandSettings>>("test");
        });

        // When
        var result = app.Run(new[]
        {
            "test", "\"Rufus\"",
        });

        // Then
        result.ExitCode.ShouldBe(0);
        result.Settings.ShouldBeOfType<FooCommandSettings>().And(foo => foo.Qux.ShouldBe("\"Rufus\""));
    }

This is a pretty extreme edge case - relying on shells to act consistently with quotes is simply a non-starter as the link in the previous issue talk about. If you or anyone else want to give fixing this bug a shot, consider it up for grabs though, and you are welcome to see if you can't get the parser updated to take this scenario into account. But personally, this is not a behavior I'd ever rely on for end users and I would highly recommend something other than nested quotes in the arguments if you are going to have an app running in multiple shells..

@sandreas
Copy link
Author

sandreas commented Sep 11, 2022

I took a look some more in this.

Thank you for your quick feedback.

I still comes down to how each shell is gonna handle the quotes and each shell is wildly different. Cmd.exe, pwsh, and even running through the debugger all vary wildly in the scenarios you posted with no consistency. But, on your shell and only this shell it does include the quotes in the args and theoretically we could include them. Went ahead and created a unit test shows the failure.

I'm really sorry, if I don't understand the inner workings of shells as much as you do - but I'm willing to learn something here. I'll take a look into this and find out, if I can do anything to support you to fix this behaviour without breaking anything else.

This is a pretty extreme edge case - relying on shells to act consistently with quotes is simply a non-starter as the link in the previous issue talk about. If you or anyone else want to give fixing this bug a shot, consider it up for grabs though, and you are welcome to see if you can't get the parser updated to take this scenario into account. But personally, this is not a behavior I'd ever rely on for end users and I would highly recommend something other than nested quotes in the arguments if you are going to have an app running in multiple shells..

This edge case comes from a real world scenario, otherwise I'm pretty sure, nobody would have found it :-) I'm developing a command line audio tagger named tone and the issue happened, when I tried to set a description for one of my audio files:

tone tag --meta-description='"The aliens will kill us all" said Mr. Jones...' 'an-audio-book.m4b'

But I'll investigate the behaviour of other command line parsers and make a detailed comparison. As I see it, somethere in the Parser or Tokenizer the quotes get removed. I'm on it.

I'll get back to you as soon as possible.

@sandreas
Copy link
Author

I think I found something to investigate. If you look at the code here and below, you see, that the following chars/strings are treated in a special way and I raised issues for handling exactly these wrong under specific circumstances:

and

and

So I think these are all Problems in the CommandTreeTokenizer... I'll do my best.

@sandreas
Copy link
Author

sandreas commented Sep 11, 2022

So after some further investigation, I think that the CommandTreeTokenizer does a few things that are problematic...

Lets say, that you have an array of arguments like this:

cli-tester test --order-by " " --order-by "-birthday" --order-by '"a quoted field" containing more text'

the tokenizer will:

  • consume the space without treating it as a value
  • on character - always call ScanOptions and treat the args value as Option, although it could also be a value, which is the case for -birthday
  • consume " without treating it as part of the value, which is the case for `"a quoted field" containing more text'

I think here is some work to do - the tokenizer should not silently ignore valid chars or make invalid assumptions about the context (e.g. assume that every args value starting with - is automatically an option)... this is something the lexer, parser or context should be aware of...

@sandreas
Copy link
Author

sandreas commented Sep 12, 2022

This is a pretty extreme edge case - relying on shells to act consistently with quotes is simply a non-starter as the link in the previous issue talk about.

@phil-scott-78
While I think a problem in the Tokenizer / Parser logic is no extreme edge case any more and fixing this would take me deep knowledge about how spectre console parses, tokenizes and transfers args to Settings, am I really the right person to look at this?

I did not find any unit tests for the tokenizer... did I miss them? If not, how can I ensure I don't break something when I try to fix this?

Sorry, I just want to prevent putting much work into something that is not useful to anyone.

@sandreas
Copy link
Author

sandreas commented Sep 26, 2022

@phil-scott-78 @patriksvensson
I would really love to get a reply here, I'm trying to help improving this great library... can someone of you at least confirm this problem is part of the Tokenizer / Parser and not an edge case and tell me what the next steps are?

@FrankRay78
Copy link
Contributor

I took a look some more into this. It still comes down to how each shell is gonna handle the quotes and each shell is wildly different.

I have tested this myself and can confirm that my Windows shell does not perform as you found yourself @sandreas

at least something in spectre seems to Trim whitespaces (Details: #891 (comment)) because the following example does not modify parameters, while spectre does.

I have found the following example does modify parameters on my local machine:

// Program.cs, dotnet 6
    Console.WriteLine("== Environment.CommandLine ==");
    Console.WriteLine(Environment.CommandLine);
    Console.WriteLine("");

    Console.WriteLine("==  Environment.GetCommandLineArgs() ==");
    foreach(var a in Environment.GetCommandLineArgs()){
        Console.WriteLine(a);
    }
    Console.WriteLine("");

    Console.WriteLine("== args ==");
    foreach(var b in args){
        Console.WriteLine(b);
    }
    return 0;

When run, it produces the following output, which clearly shows the quoting of the command line arguments are being modified before even being made available to the .Net console app:

image

As this in independent of spectreconsole, I struggle to see how consistent handling of various quoted arguments can be addressed within spectre itself, and as @phil-scott-78 pointed out above, spectre is at the mercy of the executing shell when it comes to the initial handling of arguments.

@sandreas
Copy link
Author

sandreas commented Oct 26, 2022

As this in independent of spectreconsole, I struggle to see how consistent handling of various quoted arguments can be addressed within spectre itself, and as @phil-scott-78 pointed out above, spectre is at the mercy of the executing shell when it comes to the initial handling of arguments.

I think we have a misunderstanding here. Windows shell does not support single quotes (') as enclosures but handles it as normal char (as I described here and you find more details here). This means that you have a very consistent behaviour in args handling but you have to call your console app different on Windows cmd to get the same results:

# windows
ConsoleApp.exe --order-by """quoted"" value" input.mp3

# linux
ConsoleApp --order-by '"quoted" value'
ConsoleApp --order-by "\"quoted\" value" # variant with escaped double quotes

So it's not about the windows shell does things WRONG, it just has different rules for calling your app. Same as * gets expanded on Linux by the SHELL BEFORE calling your app and not treated as string argument:

# linux
ConsoleApp *
# expands to files in ./, its like calling this for your app
# ConsoleApp input1.mp3 input2.mp3 

(e.g. args will be input1.mp3 input2.mp3and not *) while

# use the * as string argument that can be treated by the app
ConsoleApp '*'

will give you one single arg (*) and the shell will expand nothing. That's why you have to single quote the star if you are using the find command line tool:

find . -name '*.mp3'

Can you confirm the described behaviour?

As a result I would not even handle single quotes in spectre.console in any way. It's just not supported for quoting using Windows. If the user calls:

ConsoleApp --order-by '"quoted" value'

on Windows, he is calling it wrong - not the app is handling it wrong.

@FrankRay78
Copy link
Contributor

Ok, apologies if I totally misunderstood - I see your comments are about what happens once quotes have made their way into the argument (one way or another). I’ll take another more detailed look…..

@sandreas
Copy link
Author

sandreas commented Oct 26, 2022

Ok, apologies if I totally misunderstood - I see your comments are about what happens once quotes have made their way into the argument (one way or another). I’ll take another more detailed look…..

No need to apologize... As long as I get responses to my issue postings, I'm fine with writing whole essays about what I think needs to be improved ;-) Sometimes command line is very tricky because of the many layers involved.

There is a OPERATING SYSTEM, aTERMINAL, a SHELL and an APP - all with different options, encodings and possibilities.

One of my golden rules in software development is: Don't you ever fiddle around with strings - and boy prevent modifying them without a valid encoding/decoding pair of functions
If you take the strings as is, you won't get problems and everything is straightforward to handle and the most dangerous functions in most languages besides eval are Substring and Regex.Replace ;)

Since spectre.console is the most versatile and still best command line library I could find, I have a huge interest that this gets fixed. Thank you for your effort and for taking me seriously.

@FrankRay78
Copy link
Contributor

Ok, I can reproduce what you have reported across several bug reports and the discussion thread (#1025) with the following unit test:

    [Theory]
    [InlineData("\"\"")]
    [InlineData("\" \"")]

    // Special character handling
    [InlineData("-R")]
    [InlineData("-Rufus")]
    [InlineData("--R")]
    [InlineData("--Rufus")]

    // Double-quote handling
    [InlineData("\"Rufus\"")]
    [InlineData("\" Rufus\"")]
    [InlineData("\"-R\"")]
    [InlineData("\"-Rufus\"")]
    [InlineData("\" -Rufus\"")]

    // Single-quote handling
    [InlineData("'Rufus'")]
    [InlineData("' Rufus'")]
    [InlineData("'-R'")]
    [InlineData("'-Rufus'")]
    [InlineData("' -Rufus'")]
    public void Quote_Tests(string actualAndExpected)
    {
        // Given
        var app = new CommandAppTester();
        app.Configure(config =>
        {
            config.PropagateExceptions();
            config.AddCommand<GenericCommand<FooCommandSettings>>("test");
        });

        // When
        var result = app.Run(new[]
        {
            "test", actualAndExpected,
        });

        // Then
        result.ExitCode.ShouldBe(0);
        result.Settings.ShouldBeOfType<FooCommandSettings>().And(foo => foo.Qux.ShouldBe(actualAndExpected));
    }

All of the test cases (except for the 5 single quote scenarios) fail, see:

image

I don't mind having a look into what may be required to fix these failing tests, which if successful, would be a massive step forward in addressing many edge case issues regarding command line argument handling.

@sandreas
Copy link
Author

sandreas commented Oct 27, 2022

Ok, I can reproduce what you have reported across several bug reports and the discussion thread (#1025) with the following unit test:

Awesome :-)

I don't mind having a look into what may be required to fix these failing tests, which if successful, would be a massive step forward in addressing many edge case issues regarding command line argument handling.

Great, thank you very much. Maybe we should keep in mind that fixing these unit-tests are not gonna fix the design problem, that EVERY argument is going through the method CommandTreeTokenizer.Parse:

This design problem does not only result in unnecessary parsings (Performance, Security), but not regarding the previous type of Token for an argument leads to other bugs:

So maybe it's worth also taking a look at:

  • var tokenizerResult = CommandTreeTokenizer.Tokenize(context.Arguments);
    Here, _configuration.Commands[*].Parameters (class CommandModel, IList<CommandInfo>, IList<CommandParameter>) is not regarded in any way, which in my opinion would be required to check, if the previous argument was one of these specific TokenTypes to check, if it has to be parsed or just used as is:
    • SUBCOMMAND (e.g commit in git commit)
    • FLAG (e.g. --force)
    • OPTION (e.g. --message or -m)
    • VALUE (e.g. "my message" in -m "my message")
    • ARGUMENT (e.g. some-file.txt in git add some-file.txt)
    • OPTION=VALUE (e.g. --message=MyMessage)
    • SHORTOPTION=VALUE (e.g. -mMyMessage)
    • MULTISHORTFLAG (e.g. -fd for --force --debug
  • Here I think the args array is parsed without knowing the TokenType of the arg before, so every arg is parsed by a tokenizer - even the VALUE args. As I described here, there are several assumptions that can be made depending on the TokenType of the arg before the current one and most of them do not need any parsing.

I'm very excited that now it is clear, what I meant :-) Thank you for your hard work.

@FrankRay78
Copy link
Contributor

As part of fixing the ‘flags cannot be set’ issue, I introduced a HadSeparator flag on command tree token which allows the CommandTreeParser some knowledge of context, see here: https://github.com/spectreconsole/spectre.console/blob/dded816d97f5a6cbf9c1831aa00fe46684d1ac5e/src/Spectre.Console.Cli/Internal/Parsing/CommandTreeToken.cs

however, this is more of a tactical implementation when compared to your design suggestion above. I don’t know the codebase well enough to refactor extensively yet, however the command tree parser looks easily backed by an interface and dependency injected.

For now, I’ll see what I can do tactically to fix the unit tests, so it can be released sooner than later. Then look at perhaps a new implementation of the parser following a statemachine or similar. Which can be injected and validated with existing unit tests, instead of attempting open heart surgery on the currrent codebase.

@sandreas
Copy link
Author

sandreas commented Oct 27, 2022

You sir, deserve a cookie!

Maybe it would be a good idea to implement the args parser framework / library agnostic.

class SubCommand {
    public string Name;
}
class Flag {
    public string Name;
    public string[] Aliases; // e.g. for shortName
}

class Option {
    public string Name;
    public string[] Aliases; // e.g. for shortName
    public string? Value; // not a real type because every arg is at first a string - this can be done by the parser
}
class Argument {
    public int index;
    public string? Value; // not a real type because every arg is at first a string - this can be done by the parser
}
class SubCommand {
    public IEnumerable<SubCommand> SubCommands {get; set;}
    public IEnumerable<Flag> Flags {get; set;}
    public IEnumerable<Option> Options {get; set;}
    public IEnumerable<Argument> Arguments {get; set;}
}
var tokenizer = new Tokenizer("=");
var subCommands = new List<SubCommand>();
// configure subcommands with arguments and stuff
var parser = new ArgsParser(subCommands, tokenizer);
var result = parser.Parse(args);

Just a thought...

@FrankRay78
Copy link
Contributor

See this @sandreas: https://gist.github.com/FrankRay78/de3e30dafbe44265cba2372e809326a1

Regarding handling of quotes, hyphens and spaces - do you see any cases I've missed?

@sandreas
Copy link
Author

Regarding handling of quotes, hyphens and spaces - do you see any cases I've missed?

Maybe. I don't see the following (maybe this is intended):

  • "\"quoted\" string" (Quoted string that does not end with quotes)
  • " \"quoted\"" (space then quoted string)
  • "" (empty string)
  • " " (space only)
  • " " (multiple spaces - think I found a markdown bug in githubs parser - this also shows only one space ;-))
  • "-" (dash only)
  • "-size" (simple string with dash)
  • "\t" (tab)
  • "\r\n\t" (multiple whitespace that should never happen)
  • "👋🏻" (an emoji)
  • "🐎👋🏻🔥❤️" (multiple emojis)
  • "\"🐎👋🏻🔥❤️\" is an emoji sequence" (multiple emojis quoted with additional text)

I'm sorry ¯\_(ツ)_/¯... If you think one of these cases is not necessary, just leave it out ;-)

@FrankRay78
Copy link
Contributor

I'm starting to suspect that unit testing the POSIX compliant parsing should trim each argument received in the string[] args array, to ensure we are testing as closely as possible to the expected behaviour in the wild. That's because the surrounding shell should split the command line arguments by space, before converting them into an array and passing them in to the .Net console app. That means the only place we see spaces within each arg is when they are contained within enclosed quotes.

@sandreas
Copy link
Author

sandreas commented Oct 29, 2022

I'm starting to suspect that unit testing the POSIX compliant parsing should trim each argument received in the string[] args array

Oh, please don't do that! Trying to fix shell behaviour within a library is not the right place and this is exactly what led to these problems. My suggestion stays the same: Do not modify the arguments in any way.

That means the only place we see spaces within each arg is when they are contained within enclosed quotes

And this is the correct behaviour...

Let's say you want to develop a string splitting command line app with a delimiter (unix cut):

echo "field1 field2 field 3" | cut -d " " -f 2
# field2

Result if you trim each arg: You can't write this app... it just trims away the space... If you really plan to do this, this should not be the default behaviour... maybe this could be a global setting (e.g. Tokenizer.TrimArguments = true), which I truly discourage...

@FrankRay78
Copy link
Contributor

Oh, please don't do that!

Ok fair enough. I'm just in the middle of mental gymnastics, trying to think of all the unit test cases to support. And what not to support.

My PR isn't far away now so you will be able to try it out with your Tone soon enough....

@sandreas
Copy link
Author

Ok fair enough. I'm just in the middle of mental gymnastics, trying to think of all the unit test cases to support. And what not to support.

No offense. I'm just trying to put in my two cents preventing possible mistakes from the past.

My PR isn't far away now so you will be able to try it out with your Tone soon enough....

Sounds awesome, thank you very much for your huge effort.

@FrankRay78
Copy link
Contributor

I've just created this PR #1036 @sandreas (and will fix any build errors that may appear.....)

As mentioned in the PR, the following three PR's when taken jointly represent a significant enhancement to existing command line parsing behaviour: #1036, #1029, #732

If it would be useful, I could merge them all into a local 'preview' branch for you to build and test Tone against?

@sandreas
Copy link
Author

sandreas commented Nov 1, 2022

I've just created this PR #1036 @sandreas (and will fix any build errors that may appear.....)

Awesome, thank you.

If it would be useful, I could merge them all into a local 'preview' branch for you to build and test Tone against?

Well, this is nice, but if you've built a little command line app and tested some of the the discussed use cases manually besides the unit tests, I don't think it is worth the effort... Personally I still don't think it is worth to write and maintain a complex Tokenizer for command line argument parsing but hey, I'm so happy that you took care of this, so I won't complain. Thank you so much, this bothered me for some months now. I can't wait to see the next release and integrate it into my tools.

The only suggestion for improvement I would like to make: The project maintainers should take issues like this more seriously and respond more quickly, if it is in their time allotment. I was really frustrated about how my issues were treated (or better ignored for months) until you took care of it and I was about to switch to https://github.com/Tyrrrz/CliFx at least for command line parsing.

@patriksvensson
Copy link
Contributor

@sandreas

The only suggestion for improvement I would like to make: The project maintainers should take issues like this more seriously and respond more quickly, if it is in their time allotment. I was really frustrated about how my issues were treated (or better ignored for months) until you took care of it and I was about to switch to https://github.com/Tyrrrz/CliFx at least for command line parsing.

We're doing this in our free time. We all have family and work outside of this project. There is no SLA for anything, and everything in this repository is licensed "AS IS".

@sandreas
Copy link
Author

sandreas commented Nov 1, 2022

We're doing this in our free time. We all have family and work outside of this project. There is no SLA for anything, and everything in this repository is licensed "AS IS".

@patriksvensson No offense... I got family, too and I know pretty well, what you are talking about. Just tried to help you with this great project and I felt like there is no interest in additional manpower... this is pretty subjective and I bet not everyone feels like that - maybe it was just a series of unfortunate events. I really hope you just take it as a feedback to improve something, not as criticism and hopefully we can work together in the future improving more stuff of spectre.console :-)

@FrankRay78
Copy link
Contributor

Having never done any real open-source development previously, this has been a real baptism of fire. I'm grateful for both of your help and can understand the different perspectives. Hopefully in the fullness of time, I could perhaps help with PR reviews and testing etc @patriksvensson to shoulder some of the load.

@FrankRay78
Copy link
Contributor

This is the combined PR @sandreas across the various command line parsing issues, in case you want to build and try it out: #1048

@sandreas
Copy link
Author

sandreas commented Nov 3, 2022

Thanks a lot @FrankRay78
Let's hope it gets accepted :-)

@FrankRay78
Copy link
Contributor

This issue can be closed/marked as complete @patriksvensson, now that PR #1048 has been successfully merged.

@patriksvensson
Copy link
Contributor

@FrankRay78 Sounds good to me 😉

@FrankRay78 FrankRay78 self-assigned this Dec 11, 2022
@FrankRay78 FrankRay78 linked a pull request Dec 11, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Done 🚀
Development

Successfully merging a pull request may close this issue.

4 participants