Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support compiler_param_file features on Windows #49

Merged
merged 8 commits into from
May 31, 2022
58 changes: 50 additions & 8 deletions refresh.template.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
"""

import sys
import tempfile
LoSealL marked this conversation as resolved.
Show resolved Hide resolved
if sys.version_info < (3,7):
sys.exit("\n\033[31mFATAL ERROR:\033[0m Python 3.7 or later is required. Please update!")
# 3.7 backwards compatibility required by @lummax in https://github.com/hedronvision/bazel-compile-commands-extractor/pull/27. Try to contact him before upgrading.
Expand Down Expand Up @@ -38,7 +39,7 @@
# Downside: We could miss headers conditionally included, e.g., by platform.
# Implementation: skip source files we've already seen in _get_files, shortcutting a bunch of slow preprocessor runs in _get_headers and output. We'd need a threadsafe set, or one set per thread, because header finding is already multithreaded for speed (same magnitudespeed win as single-threaded set).
# Anticipated speedup: ~2x (30s to 15s.)
# A better one would be to cache include information.
# A better one would be to cache include information.
# We could check to see if Bazel has cached .d files listing the dependencies and use those instead of preprocessing everything to regenerate them.
# If all the files listed in the .d file have older last-modified dates than the .d file itself, this should be safe. We'd want to check that bazel isn't 0-timestamping generated files, though.
# We could also write .d files when needed, saving work between runs.
Expand All @@ -50,7 +51,7 @@ def _print_header_finding_warning_once():
# Shared between platforms

# Just log once; subsequent messages wouldn't add anything.
if _print_header_finding_warning_once.has_logged: return
if _print_header_finding_warning_once.has_logged: return
_print_header_finding_warning_once.has_logged = True

print("""\033[0;33m>>> While locating the headers you use, we encountered a compiler warning or error.
Expand Down Expand Up @@ -147,6 +148,17 @@ def _get_headers_msvc(compile_args: typing.List[str], source_path: str):
# End: template filled by Bazel
))

# Write header_cmd to a temporary file, so we can use it as a parameter to cl.exe,
LoSealL marked this conversation as resolved.
Show resolved Hide resolved
# because Windows cmd has a limitation of 8KB of command line length. So here we make a threshold of len(compile_args) < 8000
WIN_CMD_LIMITS = 8000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth documenting the 8191--and why you chose a smaller limit? I assume because length calculate is an approximation of the real one with escaping https://docs.python.org/3/library/subprocess.html#converting-an-argument-sequence-to-a-string-on-windows

Another, (maybe better) approach could be try-except whatever exception it throws when the command is too long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will add better doc here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S.g. Thoughts on the pro/con of the try-except approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try-except will be conflict with the following logic:

for line in header_search_process.stderr.splitlines():

We can't detect whether the error comes from compile failure or command line overflow. Not sure if errorcode can help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but I was thinking that there's probably a (Python) Error thrown if the line is longer than the command length limit (after escaping and all that). Like this one https://stackoverflow.com/questions/2381241/what-is-the-subprocess-popen-max-length-of-the-args-parameter? You could test by exceeding the limit to subprocess.run and seeing what it throws!
(Let's definitely not rely on parsing an error message out of stderr--your hardcoded 8000 is more reliable than that!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpsauer Tested on my Windows 10 21H2:
For powershell, it can support user to type more than 8191 charactors on the console, while for cmd, user can't type any more if command line reaches 8191 charactors.
If you execute command from a script (i.e. .ps1 or .cmd/.bat), the cmd may ignore charactors that exceeds 8191 limitation, while powershell doesn't. For example:

CMD:

set a=1234...(more than 8191 chars)....EOL
:: will only show 1234....(to 8191 length), can't echo "EOL", and no error
echo %a%

PWSH:

$env:a="1234....(more than 8191 chars)...EOL"
# show full $a string, everything is OK
echo $env:a

But if your file path is too long (by default it can't longer than 254), most applications will fail with an error (just as your link) with both cmd and powershell console.

Copy link
Contributor

@cpsauer cpsauer May 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sure, but I think we're talking past each other somehow.

I meant that I'd guess Python would throw an Error if the args to subprocess.run were too long, even with check=False, rather then letting the error happen in the Windows shell. That is, is an error like the one linked thrown if you temporarily replace header_cmd in the subprocess command with "a "*10000 (or similar)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, got your point.
Just test and subprocess.call will return error code 1, with an stderr output "command line too long".

We may check return code 1 or stderr text (the text is in locale language)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/1bc92ddf-b79e-413c-bbaa-99a5281a6c90
msvc compiler (only tested cl.exe) normally returns error code 2 (such as command error, compile error...)
We will use error code 1 for command line too long problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, bummer. So no Python error, then. Sorry for pushing in the wrong direction.

Do whatever you think is safest and most robust here. If you're confident code 1 is always this command-length-issue, then do that. And if not, then let's fall back to your old solution. (Again, sorry.)

if len('\n'.join(header_cmd)) >= WIN_CMD_LIMITS:
fd, temp_params = tempfile.mkstemp(text=True)
LoSealL marked this conversation as resolved.
Show resolved Hide resolved
with open(temp_params, 'w') as f:
# should skip cl.exe the 1st line
f.write('\n'.join(header_cmd[1:]))
LoSealL marked this conversation as resolved.
Show resolved Hide resolved
os.close(fd)
header_cmd = [header_cmd[0], f'@{temp_params}']
LoSealL marked this conversation as resolved.
Show resolved Hide resolved

header_search_process = subprocess.run(
header_cmd,
stderr=subprocess.PIPE,
Expand Down Expand Up @@ -205,7 +217,7 @@ def _get_headers(compile_args: typing.List[str], source_path: str):
# As an alternative approach, you might consider trying to get the headers by inspecing the Middlemen actions in the aquery output, but I don't see a way to get just the ones actually #included--or an easy way to get the system headers--without invoking the preprocessor's header search logic.
# For more on this, see https://github.com/hedronvision/bazel-compile-commands-extractor/issues/5#issuecomment-1031148373

# Rather than print a scary compiler error, warn gently
# Rather than print a scary compiler error, warn gently
if not os.path.isfile(source_path):
if not _get_headers.has_logged_missing_file_error: # Just log once; subsequent messages wouldn't add anything.
_get_headers.has_logged_missing_file_error = True
Expand All @@ -216,7 +228,7 @@ def _get_headers(compile_args: typing.List[str], source_path: str):
That way everything is generated, browsable and indexed for autocomplete.
However, if you have *already* built your code, and generated the missing file...
Please make sure you're supplying this tool with the same flags you use to build.
You can either use a refresh_compile_commands rule or the special -- syntax. Please see the README.
You can either use a refresh_compile_commands rule or the special -- syntax. Please see the README.
[Supplying flags normally won't work. That just causes this tool to be built with those flags.]
Continuing gracefully...\033[0m""", file=sys.stderr)
return set()
Expand All @@ -227,6 +239,28 @@ def _get_headers(compile_args: typing.List[str], source_path: str):
_get_headers.has_logged_missing_file_error = False


def _params_file_to_source_files(compile_args: typing.List[str]):
LoSealL marked this conversation as resolved.
Show resolved Hide resolved
"""Converts a list of params files to a list of source files."""
params_files = list(filter(lambda f: f.startswith("@") and f.endswith(".params"), compile_args))
# Assume there is only one params file in Bazel of one target
# And we parse compile_args generally, so it may not have params file but normal arguments
assert len(params_files) <= 1, "Only one params file is supported"
for params_file in params_files:
params_file = pathlib.Path.cwd() / params_file.strip('@')
if params_file.exists():
with open(params_file, encoding=locale.getpreferredencoding()) as f:
new_compile_args = [line.strip() for line in f.readlines()]
# concat with the compiler
return compile_args[0:1] + new_compile_args
else:
raise FileNotFoundError(f"""params file {params_file.name} is not created yet.
With --features=compiler_param_file enabled, the .params file won't be created until the target is built.
You have to build the target to generate the .params file. This is a bazel limit.
""")
# return original compile args if no params file or it's broken
return compile_args


def _get_files(compile_args: typing.List[str]):
"""Gets the ({source files}, {header files}) clangd should be told the command applies to."""
# Bazel puts the source file being compiled after the -c flag, so we look for the source file there.
Expand All @@ -240,10 +274,16 @@ def _get_files(compile_args: typing.List[str]):
# Concretely, the message usually has the form "action 'Compiling foo.cpp'"" -> foo.cpp. But it also has "action 'Compiling src/tools/launcher/dummy.cc [for tool]'" -> external/bazel_tools/src/tools/launcher/dummy.cc
# If we did ever go this route, you can join the output from aquery --output=text and --output=jsonproto by actionKey.
# For more context on options and how this came to be, see https://github.com/hedronvision/bazel-compile-commands-extractor/pull/37
SOURCE_EXTENSIONS = ('.c', '.cc', '.cpp', '.cxx', '.c++', '.C', '.m', '.mm', '.cu', '.cl', '.s', '.asm', '.S')
if os.name == 'nt': # is windows
## if --features=compiler_param_file is enabled on Windows, compile args are stored in .params files
## try to parse .params file to get actual compile args, will do nothing if no .params file is found
compile_args = _params_file_to_source_files(compile_args)
compile_only_flag = '/c' if '/c' in compile_args else '-c' # For Windows/msvc support
if compile_only_flag not in compile_args:
raise ValueError(f"{compile_only_flag} not found in compile_args: {compile_args}")
source_index = compile_args.index(compile_only_flag) + 1
Copy link
Contributor

@cpsauer cpsauer May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think index will already throw a ValueError if not found.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just throws a KeyError, I think this exception will better let user know witch compile_args is missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[At least on my system]

>>>"".index("foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found

But regardless, this is going to lead to a hard crash, right? Like this isn't something we expect to show the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But w/o this two lines it still goes to a hard crash if both /c or -c not found...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree!
It'll always say "-c not found...", though, bc of the previous line, though, right?
Not super opposed or anything, I'm not sure how much this adds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! What about change typo to

if compile_only_flag not in compile_args:
  raise ValueError(f"/c or -c (required for parsing sources) is not found in compile_args: {compile_args}")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but I really think this is okay without! They should always be present from Bazel, and we'd get a crash that user will report

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will just delete them. Maybe I only encounter during my development.

Copy link
Contributor

@cpsauer cpsauer May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I'm silly. You're adding it so that the contents of the array itself will be in the error message for easy debugging, right? That's nice, because then people will actually report the compile command as part of the error, even if we don't anticipate this case arriving during normal use.

(Sorry I was slow to understand.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ To be clear, if easy, could you re-add your code from #49 (comment). Again, my apologies for being slow to understand what you were up to.

source_file = compile_args[source_index]
SOURCE_EXTENSIONS = ('.c', '.cc', '.cpp', '.cxx', '.c++', '.C', '.m', '.mm', '.cu', '.cl', '.s', '.asm', '.S')
assert source_file.endswith(SOURCE_EXTENSIONS), f"Source file not found after {compile_only_flag} in {compile_args}"
assert source_index + 1 == len(compile_args) or compile_args[source_index + 1].startswith('-') or not compile_args[source_index + 1].endswith(SOURCE_EXTENSIONS), f"Multiple sources detected after {compile_only_flag}. Might work, but needs testing, and unlikely to be right given Bazel's incremental compilation. CMD: {compile_args}"

Expand Down Expand Up @@ -414,6 +454,8 @@ def _get_commands(target: str, flags: str):
# Shush logging. Just for readability.
'--ui_event_filters=-info',
'--noshow_progress',
# Disable param file, only useful for Windows. No effect on Linux.
LoSealL marked this conversation as resolved.
Show resolved Hide resolved
'--features=-compiler_param_file',
] + shlex.split(flags) + sys.argv[1:]

aquery_process = subprocess.run(
Expand Down Expand Up @@ -476,7 +518,7 @@ def _ensure_external_workspaces_link_exists():
except OSError:
print("\033[0;31m>>> //external already exists, but it isn't a symlink or Windows junction. //external is reserved by Bazel and needed for this tool. Please rename or delete your existing //external and rerun. More details in the README if you want them.\033[0m", file=sys.stderr) # Don't auto delete in case the user has something important there.
exit(1)

# Normalize the path for matching
# First, workaround a gross case where Windows readlink returns extended path, starting with \\?\, causing the match to fail
if is_windows and current_dest.startswith('\\\\?\\'):
Expand All @@ -495,7 +537,7 @@ def _ensure_external_workspaces_link_exists():
else:
source.symlink_to(dest, target_is_directory=True)
print(f"""\033[0;32m>>> Automatically added //external workspace link:
This link makes it easy for youand for build toolingto see the external dependencies you bring in. It also makes your source tree have the same directory structure as the build sandbox.
This link makes it easy for you--and for build tooling--to see the external dependencies you bring in. It also makes your source tree have the same directory structure as the build sandbox.
It's a win/win: It's easier for you to browse the code you use, and it eliminates whole categories of edge cases for build tooling.\033[0m""", file=sys.stderr)


Expand All @@ -508,7 +550,7 @@ def _ensure_gitignore_entries():
'/.cache/', # Where clangd puts its indexing work
]

# Separate operations because Python doesn't have a built in mode for read/write, don't truncate, create, allow seek to beginning of file.
# Separate operations because Python doesn't have a built in mode for read/write, don't truncate, create, allow seek to beginning of file.
open('.gitignore', 'a').close() # Ensure .gitignore exists
with open('.gitignore') as gitignore:
lines = [l.rstrip() for l in gitignore]
Expand Down