Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: os.Args parsing is non-equivalent to Python's sys.argv commandLineToArgv on Windows #39311

Open
nu8 opened this issue May 29, 2020 · 6 comments
Open

Comments

@nu8
Copy link

@nu8 nu8 commented May 29, 2020

Bear with me. If you make a Python program like this:

import sys
s = sys.argv[1]
print(s)

You get this result with Windows:

C:\> python-2.6.2.amd64\python.exe a.py "a ""b"" c"
a "b" c

C:\> python-2.5.4.amd64\python.exe a.py "a ""b"" c"
a "b

The reason is because, with version 2.6.2 (2009), Python switched from
msvcr71.dll to msvcr90.dll. The newer version of the C Runtime, released in
2008, offers improved parsing of the command line arguments:

http://daviddeley.com/autohotkey/parameters/parameters.htm#WINCRULES

With the old version, you have to do this:

C:\> python-2.5.4.amd64\python.exe a.py "a ""b"""" c"
a "b" c

The reason I bring this up, is that is appears Go is still using the parsing
from the old version. If you make a program like this:

package main
import "os"
func main() {
   s := os.Args[1]
   println(s)
}

You get this result:

C:\> a.exe "a ""b"" c"
a "b

I found the source here:

go/src/os/exec_windows.go

Lines 158 to 173 in 8f4151e

// commandLineToArgv splits a command line into individual argument
// strings, following the Windows conventions documented
// at http://daviddeley.com/autohotkey/parameters/parameters.htm#WINARGV
func commandLineToArgv(cmd string) []string {
var args []string
for len(cmd) > 0 {
if cmd[0] == ' ' || cmd[0] == '\t' {
cmd = cmd[1:]
continue
}
var arg []byte
arg, cmd = readNextArg(cmd)
args = append(args, string(arg))
}
return args
}

I think it would make sense to update the parsing code.

@randall77
Copy link
Contributor

@randall77 randall77 commented May 29, 2020

This looks intentional. There's even a comment that says:

				// use "Prior to 2008" rule from
				// http://daviddeley.com/autohotkey/parameters/parameters.htm
				// section 5.2 to deal with double double quotes

Not sure why that was chosen. @alexbrainman may know, he wrote the code.

@odeke-em odeke-em changed the title commandLineToArgv invalid parsing os: os.Args parsing is non-equivalent to Python's sys.argv commandLineToArgv May 29, 2020
@odeke-em odeke-em changed the title os: os.Args parsing is non-equivalent to Python's sys.argv commandLineToArgv os: os.Args parsing is non-equivalent to Python's sys.argv commandLineToArgv on Windows May 29, 2020
@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented May 29, 2020

Not sure why that was chosen. @alexbrainman may know, he wrote the code.

We designed Go command line parser to work exactly like CommandLineToArgv Windows API

https://docs.microsoft.com/en-us/windows/win32/api/shellapi/nf-shellapi-commandlinetoargvw

Originally Go os package called CommandLineToArgv Windows API directly. But then we discovered that Shell32.dll (that contains the API) takes too long to load. So we replaced the DLL loading with pure Go code.

But we still test that our parser produces the same result as CommandLineToArgv Windows API. You can see it in os.TestCmdArgs. In fact you can try and add whatever string you like to the test, and, if the test pass, you know, the string is parsed like CommandLineToArgv Windows API.

I think it would make sense to update the parsing code.

I don't see why it makes sense to update parsing code. Can you explain more.

Thank you.

Alex

@nu8
Copy link
Author

@nu8 nu8 commented May 30, 2020

Windows provides two functions that return command line input, GetCommandLineW
and __p___wargv:

__p___wargv is an internal function, so it is only available via argv inside
a C main function. So Go must use GetCommandLineW/CommandLineToArgvW for
this purpose:

go/src/os/exec_windows.go

Lines 101 to 107 in 8f4151e

cmd := windows.UTF16PtrToString(syscall.GetCommandLine())
if len(cmd) == 0 {
arg0, _ := Executable()
Args = []string{arg0}
} else {
Args = commandLineToArgv(cmd)
}

That is unfortunate because the code for __p___wargv is different from the
code for CommandLineToArgvW. With regard to quote escaping, the code for
__p___wargv was improved with the release of Visual C++ 2008. So nothing says
Go must upgrade to this newer style of parsing, but I think it would be a good
idea. With the new parsing, you can quote everything, then escape Quote with an
extra Quote:

C:\> python-2.6.2.amd64\python.exe a.py "a "" b & c"
a " b & c

while with the old style, Quotes cannot be quoted, and must be double escaped:

C:\> python-2.5.4.amd64\python.exe a.py "a "\^"" b & c"
a " b & c
@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented May 31, 2020

Windows provides two functions that return command line input, GetCommandLineW
and __p___wargv:

There are more than two ways to encode parameters on Windows. You provided the ref yourself

http://daviddeley.com/autohotkey/parameters/parameters.htm#WIN

So Go must use GetCommandLineW/CommandLineToArgvW for
this purpose:

Go does uses GetCommandLineW. But we don't use CommandLineToArgvW, we have written pure Go command line parser similar to CommandLineToArgvW.

That is unfortunate because the code for __p___wargv is different from the
code for CommandLineToArgvW

Yes that parser / rules are different. But I don't see why it is unfortunately. Either parser is fine.

What is more important, in my opinion, that rules don't change from version to version. Go users can keep compiling their programs with new version of Go, and their executable will behave the same way as before.

Alex

@nu8
Copy link
Author

@nu8 nu8 commented May 31, 2020

Since I discovered this issue recently, I have presented it to several groups.
And not one time, has someone said "oh, we knew about both parsing methods, but
chose to use the CommandLineToArgvW method because $reason.

So it seems that most people didnt realize two parsing styles were even
available. And if thats the case, thats fine. But since I discovered it, I felt
a duty to at least let the proper people know of the issue.

If you decide to do nothing with it, fine. But Microsoft improved the parsing
12 years ago for a reason. Its because the old parsing is painful to use beyond
the simplest examples.

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Jun 2, 2020

But since I discovered it, I felt
a duty to at least let the proper people know of the issue.

Oh. Thank you very much for reporting this issue. This is my personal view that Go should not change the parser. But others might think differently.

If you decide to do nothing with it, fine. But Microsoft improved the parsing
12 years ago for a reason. Its because the old parsing is painful to use beyond
the simplest examples.

I will let others decide what to do here.

Thank you.

Alex

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.