support compiler_param_file features on Windows #49
For context for any other readers, this fixes #35. Worth reading that also.
Thanks so much for doing this, @LoSealL!
I'm delighted that this route is working and really appreciate your good efforts here.
I'd love it if we cleaned things up just a bit before merging, though. Comments inline
refresh.template.py
Outdated
@@ -147,6 +148,17 @@ def _get_headers_msvc(compile_args: typing.List[str], source_path: str):
# End: template filled by Bazel
))


# Write header_cmd to a temporary file, so we can use it as a parameter to cl.exe,
# because Windows cmd has a limitation of 8KB of command line length. So here we make a threshold of len(compile_args) < 8000
WIN_CMD_LIMITS = 8000
Worth documenting the 8191--and why you chose a smaller limit? I assume it's because the length calculation is only an approximation of the real one with escaping: https://docs.python.org/3/library/subprocess.html#converting-an-argument-sequence-to-a-string-on-windows
Another (maybe better) approach could be to try-except whatever exception it throws when the command is too long.
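To make the escaping concern concrete, here is a minimal sketch of measuring the command line after quoting rather than before. It leans on subprocess.list2cmdline, the helper CPython uses to join arguments on Windows; that helper is not part of the documented public API, so treat its use here as an assumption. The names exceeds_cmd_limit and WINDOWS_CMD_LIMIT are made up for illustration.

```python
import subprocess

# The cmd.exe command-line length limit discussed in this thread, in characters.
WINDOWS_CMD_LIMIT = 8191

def exceeds_cmd_limit(args, limit=WINDOWS_CMD_LIMIT):
    """Return True if the escaped command line would be longer than `limit`."""
    # list2cmdline applies the quoting rules subprocess uses on Windows,
    # so the measured length includes the quotes and backslashes it adds.
    return len(subprocess.list2cmdline(args)) > limit

short_cmd = ["cl.exe", "/c", "main.cpp"]
# Arguments containing spaces get wrapped in quotes, which adds to the length.
long_cmd = ["cl.exe"] + ["/Isome include dir"] * 1000

print(exceeds_cmd_limit(short_cmd))
print(exceeds_cmd_limit(long_cmd))
```

Measuring the joined string instead of summing argument lengths is what would let a threshold sit closer to 8191 without guessing at the escaping overhead.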
Sure, will add better doc here
Sounds good. Thoughts on the pros and cons of the try-except approach?
try-except would conflict with the following logic:
for line in header_search_process.stderr.splitlines():
We can't detect whether the error comes from a compile failure or a command line overflow. Not sure if the error code can help.
Maybe I'm missing something, but I was thinking that there's probably a (Python) Error thrown if the line is longer than the command length limit (after escaping and all that). Like this one https://stackoverflow.com/questions/2381241/what-is-the-subprocess-popen-max-length-of-the-args-parameter? You could test by exceeding the limit to subprocess.run and seeing what it throws!
(Let's definitely not rely on parsing an error message out of stderr--your hardcoded 8000 is more reliable than that!)
@cpsauer Tested on my Windows 10 21H2:
For powershell, the user can type more than 8191 characters on the console, while for cmd, the user can't type any more once the command line reaches 8191 characters.
If you execute a command from a script (i.e. .ps1 or .cmd/.bat), cmd may silently ignore characters beyond the 8191 limit, while powershell doesn't. For example:
CMD:
set a=1234...(more than 8191 chars)....EOL
:: will only show 1234....(to 8191 length), can't echo "EOL", and no error
echo %a%
PWSH:
$env:a="1234....(more than 8191 chars)...EOL"
# show full $a string, everything is OK
echo $env:a
But if your file path is too long (by default it can't be longer than 254), most applications will fail with an error (just as in your link) in both the cmd and powershell consoles.
Ah, sure, but I think we're talking past each other somehow.
I meant that I'd guess Python would throw an Error if the args to subprocess.run were too long, even with check=False, rather than letting the error happen in the Windows shell. That is, is an error like the one linked thrown if you temporarily replace header_cmd in the subprocess command with "a "*10000 (or similar)?
Oh, got your point.
Just tested: subprocess.call returns error code 1, with the stderr output "command line too long".
We could check for return code 1 or for the stderr text (but the text is in the locale's language).
Refer to https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/1bc92ddf-b79e-413c-bbaa-99a5281a6c90
The msvc compiler (only tested cl.exe) normally returns error code 2 (for things like command errors, compile errors...)
We will use error code 1 to detect the command-line-too-long problem
Oh, bummer. So no Python error, then. Sorry for pushing in the wrong direction.
Do whatever you think is safest and most robust here. If you're confident code 1 is always this command-length-issue, then do that. And if not, then let's fall back to your old solution. (Again, sorry.)
Thanks so much, @LoSealL!
06577f3 to b9587c6
b9587c6 to a3d3651
refresh.template.py
Outdated
if header_search_process.returncode == 1:
    # Write header_cmd to a temporary file, so we can use it as a parameter file to cl.exe, because Windows cmd has a limitation of 8191 charactors.
    # See https://docs.microsoft.com/en-us/troubleshoot/windows-client/shell-experience/command-line-string-limitation
    # To overcome this issue, we can call command parameters from a params file and use the '@' switch to pass the file name.
    # E.g. cl.exe @params_file.txt
    # If the return code is 1, it means the command line is too long (>=8191), so we write the command to a temp file and call from it.
    temp_params = tempfile.NamedTemporaryFile('w')
    # should skip cl.exe the 1st line
    temp_params.write('\n'.join(header_cmd[1:]))
    header_cmd = [header_cmd[0], f'@{temp_params}']
    header_search_process = _search_headers(header_cmd)
    # close and delete the temp file we created
    temp_params.close()
Thanks for documenting things so well, and sharing code via the local fn!
A few last things:
1) I didn't read the docs on NamedTemporaryFile carefully enough the first time. I'm now seeing:
"Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later)."
...which seems like it would cause problems. How do you think we should solve this? (Perhaps we do need to use tempfile.mkstemp after all...sorry.)
2) I think we want to be careful to make sure that we're flushing the write buffer before calling subprocess.run (or disabling buffering).
The docstring in the source code says that NamedTemporaryFile does use mkstemp to create the file... I'd like to double-check that
def NamedTemporaryFile(mode='w+b', buffering=-1, encoding=None,
                       newline=None, suffix=None, prefix=None,
                       dir=None, delete=True, *, errors=None):
    """Create and return a temporary file.
    Arguments:
    'prefix', 'suffix', 'dir' -- as for mkstemp.
    'mode' -- the mode argument to io.open (default "w+b").
    'buffering' -- the buffer size argument to io.open (default -1).
    'encoding' -- the encoding argument to io.open (default None)
    'newline' -- the newline argument to io.open (default None)
    'delete' -- whether the file is deleted on close (default True).
    'errors' -- the errors argument to io.open (default None)
    The file is created as mkstemp() would do it.
    Returns an object with a file-like interface; the name of the file
    is accessible as its 'name' attribute. The file will be automatically
    deleted when it is closed unless the 'delete' argument is set to False.
    """
    prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)
    flags = _bin_openflags
    # Setting O_TEMPORARY in the flags causes the OS to delete
    # the file when it is closed. This is only supported by Windows.
    if _os.name == 'nt' and delete:
        flags |= _os.O_TEMPORARY
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
    try:
        file = _io.open(fd, mode, buffering=buffering,
                        newline=newline, encoding=encoding, errors=errors)
        return _TemporaryFileWrapper(file, name, delete)
    except BaseException:
        _os.unlink(name)
        _os.close(fd)
        raise

def _mkstemp_inner(dir, pre, suf, flags, output_type):
    """Code common to mkstemp, TemporaryFile, and NamedTemporaryFile."""
    names = _get_candidate_names()
    if output_type is bytes:
        names = map(_os.fsencode, names)
    for seq in range(TMP_MAX):
        name = next(names)
        file = _os.path.join(dir, pre + name + suf)
        _sys.audit("tempfile.mkstemp", file)
        try:
            fd = _os.open(file, flags, 0o600)
        except FileExistsError:
            continue    # try again
        except PermissionError:
            # This exception is thrown when a directory with the chosen name
            # already exists on windows.
            if (_os.name == 'nt' and _os.path.isdir(dir) and
                _os.access(dir, _os.W_OK)):
                continue
            else:
                raise
        return (fd, _os.path.abspath(file))
    raise FileExistsError(_errno.EEXIST,
                          "No usable temporary file name found")
Pasted the code here. It calls _mkstemp_inner, which is the same as mkstemp, and it uses a _RandomNameSequence to generate the temp file name. TMP_MAX is 10000 to guarantee there's no name conflict.
Now the test code:
import os
import tempfile
import unittest
from concurrent.futures.thread import ThreadPoolExecutor

def _create_temp(bin):
    with tempfile.NamedTemporaryFile('w') as f:
        bin.append(f.name)
        f.write('abc')

class TestNamedTemporaryFile(unittest.TestCase):
    def test_mt_create(self):
        TEMP_BIN = []
        Q = 1000000
        pool = ThreadPoolExecutor(os.cpu_count())
        [pool.submit(_create_temp, TEMP_BIN) for _ in range(Q)]
        pool.shutdown(wait=True)
        self.assertEqual(len(TEMP_BIN), Q)
        self.assertEqual(len(set(TEMP_BIN)), Q)

if __name__ == '__main__':
    unittest.main()
This test passes here.
Huh, interesting!
I wonder what part prompted them to add that reopen warning to the docs.
Aha! Looks like the thing that causes the problem with NamedTemporaryFile vs mkstemp in the source above is O_TEMPORARY.
Explained over at https://stackoverflow.com/questions/15169101/how-to-create-a-temporary-file-that-can-be-read-by-a-subprocess
And flushing comes up there, too.
I also want to make sure we aren't accidentally waiting on each other. I'm still waiting on you fixing this one before merging, right? And you're not waiting on me?
@cpsauer Ah, I see what you mean now, and there is indeed a potential risk in having cl.exe read a temp file that is still open (though it works for me...). Let's delete it manually, as that post did.
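A minimal sketch of the manual-delete approach, for illustration only: the helper name params_file is hypothetical, and the one-argument-per-line layout simply mirrors the '\n'.join in the diff above. Passing delete=False avoids O_TEMPORARY, and leaving the with block closes (and therefore flushes) the file before any other process tries to read it.

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def params_file(args):
    """Yield the path of a closed temp file holding one argument per line."""
    # delete=False keeps Windows from opening the file with O_TEMPORARY,
    # so another process can open it by name.
    with tempfile.NamedTemporaryFile('w', encoding='utf-8', delete=False) as f:
        f.write('\n'.join(args))
    # The file is closed (and flushed) at this point, before the caller uses it.
    try:
        yield f.name
    finally:
        os.remove(f.name)  # manual cleanup replaces delete-on-close

header_cmd = ['cl.exe', '/c', 'main.cpp', '/showIncludes']
with params_file(header_cmd[1:]) as path:
    # Skip cl.exe itself; everything else moves into the params file.
    spilled_cmd = [header_cmd[0], f'@{path}']
    with open(path, encoding='utf-8') as f:
        contents = f.read()

print(spilled_cmd[0], contents.splitlines())
```

In the real script, the subprocess call would go where the file is read back here; the read only stands in for cl.exe so the sketch runs anywhere.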
refresh.template.py
Outdated
if compile_only_flag not in compile_args:
    raise ValueError(f"{compile_only_flag} not found in compile_args: {compile_args}")
source_index = compile_args.index(compile_only_flag) + 1
I think index will already throw a ValueError if not found.
It just throws a KeyError. I think this exception will better let the user know which compile_args entry is missing.
[At least on my system]
>>> "".index("foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
But regardless, this is going to lead to a hard crash, right? Like this isn't something we expect to show the user.
But w/o these two lines it still hard-crashes if neither /c nor -c is found...
Totally agree!
It'll always say "-c not found...", though, because of the previous line, right?
Not super opposed or anything; I'm just not sure how much this adds.
You're right! What about changing the message to
if compile_only_flag not in compile_args:
    raise ValueError(f"/c or -c (required for parsing sources) is not found in compile_args: {compile_args}")
Sure, but I really think this is okay without! They should always be present from Bazel, and we'd get a crash that the user will report
OK, I will just delete them. Maybe I only encountered this during my own development.
Oh! I'm silly. You're adding it so that the contents of the array itself will be in the error message for easy debugging, right? That's nice, because then people will actually report the compile command as part of the error, even if we don't anticipate this case arriving during normal use.
(Sorry I was slow to understand.)
^ To be clear, if easy, could you re-add your code from #49 (comment). Again, my apologies for being slow to understand what you were up to.
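For readers following along, the check being re-added might look roughly like this. It is only a sketch: the function name find_source_index is hypothetical, and how compile_only_flag gets chosen is an assumption, since the line before the diff isn't shown in this thread.

```python
def find_source_index(compile_args):
    """Return the index of the argument that follows the compile-only flag."""
    # Assumption: prefer /c (MSVC style) and fall back to -c.
    compile_only_flag = '/c' if '/c' in compile_args else '-c'
    if compile_only_flag not in compile_args:
        # list.index would raise ValueError on its own, but this message
        # puts the full argument list into the crash report.
        raise ValueError(
            f"/c or -c (required for parsing sources) is not found in compile_args: {compile_args}")
    return compile_args.index(compile_only_flag) + 1

print(find_source_index(['cl.exe', '/nologo', '/c', 'main.cpp']))  # 3

try:
    find_source_index(['cl.exe', 'main.cpp'])
except ValueError as error:
    # The message carries the whole compile command for easy bug reports.
    print(error)
```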
a3d3651 to 12d5082
12d5082 to f90cfc3
@cpsauer I updated the commit
Sweet. But perhaps we should take one of the more robust answers from that stack overflow post?
You may have a deeper understanding of what's going on in windows and O_TEMPORARY? Perhaps it's opened safely by msvc anyway? Not sure if you saw the part of that post about being able to reopen it using O_TEMPORARY? Still do think we should make sure we flush, though.
For |
What do you mean by a more robust answer? Can you paste the code? And for the CL error code, as this page says, it is a Windows error code and there are lots of possible values...
Hey @LoSealL: I finally got back to my Windows machine and took a shot at the remaining things. I think we definitely are all set on turning off compiler_param_file to aquery. ☑️ Simple, done, and clearly right. For spilling arguments to command files when things get too long: That said, I'm still far from confident that we're handling all the cases right.
Thoughts? Or ideas on how to do this well? I wonder if, e.g., SCons would have some good code to look at here. Also, separate, but any chance I could ask you to give the new And more generally, if there are other Windows issues you're seeing, I'd love to know. Thanks so much, |
@cpsauer Updated: I can't even reproduce my issue today, which I thought could be caused by compiling and linking too many source files. Maybe it was solved by upgrading the VS version? So for now, we can draw a conclusion here:
Added more tests to the sample repo; please check there. Bazel itself can't support path with space with
Currently no issues found here.
Sorry, I can't reproduce either today. So it's right to stick to your solution.
For a temp file that we write and read internally, utf-8 is OK, because we both write and read with utf-8. (CL.exe reads utf-8 normally.) Issues only happen when Python reads a non-utf-8 file (e.g. a file saved by native Notepad, or echo output written to a file under a non-utf-8 code page).
Great! Thanks for working to clarify things, @LoSealL. I quickly added some escaping in case Bazel ever fixes its implementation. And then I merged this in. Could I ask you to proofread the latest version, test it, and report back?
Sure, and many thanks for your great support!
And thank you for sticking with it! I know it's been a long adventure, and I really appreciate your help, start to finish
Some last miscellaneous things:
Just to confirm: You can aquery successfully without the And in your test repo, I'm wondering if the following explains what was going on:
Anyway, please do tell me if you have any concerns with what we've got here! I'd love to hear them. And mostly, thank you for all your efforts and for checking and testing!
@LoSealL, could I ask one more Windows favor? [To explain the importance: I'm trying to speed up the header finding by examining dependency files Bazel has cached. On macOS/Linux, there are .d files, listing headers in makefile format, and I'd like to know if there's something similar on Windows. fe5e048 is the commit if you're curious.]
@cpsauer No, there's no such
Highlights:
- With --feature=compiler_param_file on the command line or in the .bazelrc file, _get_file can read the actual compile args in the file.
Lowlights:
- With --feature=compiler_param_file, the user must build the target once, because bazel aquery won't actually generate the .params file. (Which I think is a Bazel limitation.)
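As a rough illustration of the highlight above, reading arguments back out of a params file could look like the following. This is a sketch, not the tool's actual _get_file: the function name expand_params_args is hypothetical, and it assumes a UTF-8 file with one argument per line.

```python
import os
import tempfile

def expand_params_args(compile_args):
    """Replace each '@file' argument with the arguments listed in that file."""
    expanded = []
    for arg in compile_args:
        if arg.startswith('@') and os.path.isfile(arg[1:]):
            with open(arg[1:], encoding='utf-8') as f:
                # One argument per line; skip blank lines.
                expanded.extend(line for line in f.read().splitlines() if line)
        else:
            # Not a params reference, or the file has not been generated yet.
            expanded.append(arg)
    return expanded

# Demo with a throwaway params file standing in for one written by a build.
with tempfile.TemporaryDirectory() as tmp:
    params_path = os.path.join(tmp, 'main.obj.params')
    with open(params_path, 'w', encoding='utf-8') as f:
        f.write('/nologo\n/c\nmain.cpp\n')
    result = expand_params_args(['cl.exe', f'@{params_path}'])

print(result)
```

When the file doesn't exist yet (the lowlight case, before the target has been built once), the '@...' argument is left untouched rather than raising.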