Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse inputs from ActionGraphContainer instead of arguments #37

Conversation

alexander-born
Copy link

@alexander-born alexander-born commented Mar 25, 2022

For some targets the assert for multiple source files in _get_files stops the compile commands extraction.
This is due to some generated directories ending with source file extensions.

example:

'-isystem', 'bazel-out/k8-opt/bin/someip-posix.cc'

This change does not interpret arguments as source files if the previous argument is -isystem.

@alexander-born alexander-born changed the title Remove isystem dirs with source extensions Don't interpred isystem dirs with source extensions as source files Mar 25, 2022
@alexander-born alexander-born changed the title Don't interpred isystem dirs with source extensions as source files Don't interpret isystem dirs with source extensions as source files Mar 25, 2022
@cpsauer
Copy link
Contributor

cpsauer commented Mar 29, 2022

Hey, @alexander-born!

Good catch--and definitely not a case I'd thought of. I hadn't realized people named directories with source extensions, but yep, that'll definitely trip us up. Sorry for being a little slow on the reply here. Know that I really appreciate you and your helping make this tool better.

Here's what I'm thinking while reading:

  1. Heh, like with extensionless headers, I rather wish people didn't put source extensions on directories. But here we are.
  2. Hmm, isn't this a somewhat more general problem? Some other cases that occur are -isystem w/o a space before the directory or = (if that's allowed)--but also for all the other types of header search path flags (like -I).
  3. Seems hard to handle all those cases. What do you think about, in the case of multiple source files, instead checking to see if any are directories and disqualifying them from the list? Any other, cleaner ideas I've missed?
  4. [minor and probably obviated by 3)] Don't we already import itertools? Should we nest the helper?

Thanks for your thoughtfulness and care!
Chris

@alexander-born
Copy link
Author

alexander-born commented Mar 29, 2022

  1. Completely agree.
  2. Yes this solution is not general enough, we need to catch all of these search paths.
  3. Good idea, I'll try that.

@cpsauer
Copy link
Contributor

cpsauer commented Mar 29, 2022

Thank you! You're great :)

@cpsauer
Copy link
Contributor

cpsauer commented Mar 29, 2022

@alexander-born, a potentially better alternative to directory-checking occurred while I was going to sleep last night:

[A potential problem with directory checking is that perhaps the directory might not exist yet if they're build outputs.]

Instead, we're already getting information about the inputs a compile command uses from aquery. What about trying to filter the candidate source files from the inputs Bazel says the action takes? That list should exist even if the files themselves haven't been generated yet. We could maybe even just filter through the inputs list for source files, rather than parse the command line at all, but we'd need to test that Bazel never hides the source behind "middlemen"...

To see what Bazel thinks the inputs are--and to make sure that they don't include those problematically named directories--you can run, e.g., bazel aquery "mnemonic('CppCompile',//...)" | less

We're already parsing the aquery output in the python code. See the aquery_output parameter to _convert_compile_commands. It's in a slightly less human-readable format, a jsonproto.

You can see that raw format either by dumping it from python or via, e.g., bazel aquery "mnemonic('CppCompile',//...) --output=jsonproto" | less

Thoughts?
-C

P.S. To show you an example of human-readable aquery output for a CppCompile action--and maybe save you some time (note "Inputs:" list)

action 'Compiling kstring_test.cpp'
  Mnemonic: CppCompile
  Target: //:kstring_test
  Configuration: darwin-fastbuild
  Execution platform: @local_config_platform//:host
  ActionKey: 2af4f0466d83be63669629d78d2553b1fd071310f90f59a8634a2e39c123e414
  Inputs: [bazel-out/darwin-fastbuild/internal/_middlemen/_S_S_Ckstring_Utest-BazelCppSemantics_build_arch_darwin-fastbuild, external/bazel_tools/tools/cpp/grep-includes.sh, external/local_config_cc/cc_wrapper.sh, external/local_config_cc/libtool, external/local_config_cc/libtool_check_unique, external/local_config_cc/make_hashed_objlist.py, external/local_config_cc/wrapped_clang, external/local_config_cc/wrapped_clang_pp, external/local_config_cc/xcrunwrapper.sh, kstring_test.cpp]
  Outputs: [bazel-out/darwin-fastbuild/bin/_objs/kstring_test/kstring_test.d, bazel-out/darwin-fastbuild/bin/_objs/kstring_test/kstring_test.o]
  ExecutionInfo: {requires-darwin: '', supports-xcode-requirements-set: ''}
  Command Line: (exec external/local_config_cc/wrapped_clang_pp \
    '-D_FORTIFY_SOURCE=1' \
    -fstack-protector \
    -fcolor-diagnostics \
    -Wall \
    -Wthread-safety \
    -Wself-assign \
    -fno-omit-frame-pointer \
    -O0 \
    -DDEBUG \
    '-std=c++11' \
    'DEBUG_PREFIX_MAP_PWD=.' \
    -iquote \
    . \
    -iquote \
    bazel-out/darwin-fastbuild/bin \
    -iquote \
    external/com_google_googletest \
    -iquote \
    bazel-out/darwin-fastbuild/bin/external/com_google_googletest \
    -iquote \
    external/bazel_tools \
    -iquote \
    bazel-out/darwin-fastbuild/bin/external/bazel_tools \
    -isystem \
    external/com_google_googletest/googlemock \
    -isystem \
    bazel-out/darwin-fastbuild/bin/external/com_google_googletest/googlemock \
    -isystem \
    external/com_google_googletest/googlemock/include \
    -isystem \
    bazel-out/darwin-fastbuild/bin/external/com_google_googletest/googlemock/include \
    -isystem \
    external/com_google_googletest/googletest \
    -isystem \
    bazel-out/darwin-fastbuild/bin/external/com_google_googletest/googletest \
    -isystem \
    external/com_google_googletest/googletest/include \
    -isystem \
    bazel-out/darwin-fastbuild/bin/external/com_google_googletest/googletest/include \
    -MD \
    -MF \
    bazel-out/darwin-fastbuild/bin/_objs/kstring_test/kstring_test.d \
    '-frandom-seed=bazel-out/darwin-fastbuild/bin/_objs/kstring_test/kstring_test.o' \
    -isysroot \
    __BAZEL_XCODE_SDKROOT__ \
    -F__BAZEL_XCODE_SDKROOT__/System/Library/Frameworks \
    -F__BAZEL_XCODE_DEVELOPER_DIR__/Platforms/MacOSX.platform/Developer/Library/Frameworks \
    '-mmacosx-version-min=12.3' \
    -no-canonical-prefixes \
    -pthread \
    '-std=c++20' \
    -no-canonical-prefixes \
    -Wno-builtin-macro-redefined \
    '-D__DATE__="redacted"' \
    '-D__TIMESTAMP__="redacted"' \
    '-D__TIME__="redacted"' \
    -target \
    x86_64-apple-macosx \
    -c \
    kstring_test.cpp \
    -o \
    bazel-out/darwin-fastbuild/bin/_objs/kstring_test/kstring_test.o)
# Configuration: 2bc885e6d31c22261d787b93d674f32836182b8da879aaf54b77618b9a74d265
# Execution platform: @local_config_platform//:host
  ExecutionInfo: {requires-darwin: '', supports-xcode-requirements-set: ''}

@alexander-born
Copy link
Author

Wow you're good. Directory checking did indeed not work because as you correctly assumed, they did not exist yet 👍

I will try your suggested solution.

@cpsauer
Copy link
Contributor

cpsauer commented Mar 30, 2022

heh, well, honored to be of help. Wish only that I'd gotten to it earlier.

@alexander-born
Copy link
Author

Unfortunately in my case the CppCompile action does not have a "inputs" list.

  "actions": [{                                                                                                                                                                                                                               
    "targetId": 1,                                                                                                                                                                                                                            
    "actionKey": "fa368b279aaaf9cfd3b576187cd74c5d3ed31a43153663ea89901529c343d7f6",                                                                                                                                                          
    "mnemonic": "CppCompile",                                                                                                                                                                                                                 
    "configurationId": 1,                                                                                                                                                                                                                     
    "arguments": [...],
    "inputDepSetIds": [1],
    "outputIds": [5, 6],
    "discoversInputs": true,
    "primaryOutputId": 5
  }],

@alexander-born
Copy link
Author

Would it make sense to use the arguements where the previous argument is -c?

-c Only run preprocess, compile, and assemble steps

On a first glance it seems that the input source files are always prepended with the -c argument. Not sure if this is a general rule.

@cpsauer
Copy link
Contributor

cpsauer commented Mar 30, 2022

The inputs are sneakily in there! See the "inputDepSetIds"?
(They're doing things by ID, I think, to avoid having entries referenced from multiple rules be duplicated. You'll have to do a little traversal of their structure to find things. Running aquery with --output=jsonproto will give a sense for how they've set things up.)

[If we can, I think we should try to avoid using the proximity to -c, since that's just by coincidence.]

Getting late over here! I'd better crash for now, but good luck! And don't hesitate to write back.

@alexander-born
Copy link
Author

Ah I see, thanks.

@alexander-born alexander-born force-pushed the remove_isystem_dirs_with_source_extensions branch from d5118d5 to b74a4e5 Compare March 30, 2022 11:28
@alexander-born
Copy link
Author

alexander-born commented Mar 30, 2022

I just pushed changes to parse the input files from the ActionGraphContainer. I hope this does not increase runtime too much.

@alexander-born alexander-born changed the title Don't interpret isystem dirs with source extensions as source files Parse inputs from ActionGraphContainer instead of command line arguments Mar 30, 2022
@alexander-born alexander-born changed the title Parse inputs from ActionGraphContainer instead of command line arguments Parse inputs from ActionGraphContainer instead of arguments Mar 30, 2022
@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

Wahoo! Testing.

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

Shucks. Okay, looks like we've got (at least) one case with two cpp files in the input.

AssertionError: Multiple sources detected. Might work, but needs testing, and unlikely to be right given bazel. source_files: ['actualfile.cpp', 'external/androidndk/ndk/toolchains/llvm/prebuilt/darwin-x86_64/lib/python2.7/site-packages/lldb/macosx/heap/heap_find.cpp']

I'll dig a bit

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

Very weird. external/androidndk/ndk/toolchains/llvm/prebuilt/darwin-x86_64/lib/python2.7/site-packages/lldb/macosx/heap/heap_find.cpp is indeed a cpp file, not used in the compile command.

So I think we should maybe just intersect the two sets of input candidates?

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

I'm considering parsing the file out of the action name instead. It's only available in the --text output, but I could match on actionKey. Another downside is that it excludes the external/ prefix. Hmm.

@alexander-born
Copy link
Author

That's unfortunate. I am just guessing here. Maybe we don't need to consider transitive dep sets?

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

Oh, good point! I'd think, too, that the source file would always be directly depended on. Let's see

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

Tried commenting out your good depset traversal code, but no dice. It seems like the source file isn't included directly. I'll dig.

In the meantime, to confirm, the code to try disabling is the block starting with if hasattr(self.dep_sets[id], "transitiveDepSetIds"):, right?

@alexander-born
Copy link
Author

Correct.

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

[Ugh, I'm sorry this parsing was so gross.]

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

Looks like the source file is always one transitive layer deep, but it's probably dangerous to depend on that vs just intersecting the two.

@alexander-born
Copy link
Author

alexander-born commented Mar 31, 2022

Ok, so creating the intersection it is?

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

I think so, but gimme one sec.

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

I'm trying to decide whether doing that intersection is cleaner than getting them from the action message.
That is, the action 'Compiling kstring_test.cpp', above.

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

The benefit of that would be easy to modify, non-recursive code, and clearly simple and fast, though you've already done a great job with the other.

The downside (as above) would be that the action message is a little different with tools and external workspaces. For example: action 'Compiling src/tools/launcher/dummy.cc [for tool]' -> external/bazel_tools/src/tools/launcher/dummy.cc

I can imagine solving this with some intersect logic like making sure the path in the action name is contained in the path from the command line, but things like paths with spaces are going to be a bad time.

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

I think that's likely to be worse. Let's do the intersect.

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

Do you want to do, or should I? (Getting pretty late here, so I'd do tm.)

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

Honestly, should we just take advantage of the proximity to the -c? Maybe that's the cleanest of our options.

I'm regretting dismissing that earlier. I'd originally thought this would be a much easier filtering operation.

@alexander-born
Copy link
Author

alexander-born commented Mar 31, 2022

I see the following options:

  1. intersect the two lists
  2. use the first file?
  3. use proximity to -c

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

Okay, yeah, after all that, I think my vote is for using -c, a great find by you.
It's simple and easy to put an assert on. Despite being a bit of a hack, it's radically simpler (assuming it works), and we'll know fast if it doesn't.

@alexander-born
Copy link
Author

Give me a few minutes.

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

Thanks for being great. I'm so sorry we're likely wasting the good recursive traversal you wrote. But I love your resourcefulness.

@alexander-born alexander-born force-pushed the remove_isystem_dirs_with_source_extensions branch from b74a4e5 to 2424c89 Compare March 31, 2022 06:36
@alexander-born
Copy link
Author

No worries regarding wasting the traversal code, sometimes one has to explore a few options and it didn't take too much time.

@alexander-born
Copy link
Author

Can you try the current changes?

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

Yep! On it.
(Works for you, yeah?)

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

heh, yikes, just ran the pull and overwrote another repo. One sec.
(In particular the repo I was testing from.)

@alexander-born
Copy link
Author

Works for me yes.

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

Works for me! (Though my other repo is still a bit messed)
If you want, you can consider this done! I'll merge it in shortly with a smile.

[I'm still going to finish fix that other repo. And I'm going to massage things just a little here, adding one performance trick I noticed while reading the docs exploring, documenting the assumption we made, and defending it with an assert.]

You should feel super good about this! I really appreciate your tracking down a new problem case, finding us the best solution, and being willing to explore around to find the best option.

@alexander-born
Copy link
Author

Thanks for your kind words and very good support!

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

And likewise :)

@cpsauer
Copy link
Contributor

cpsauer commented Mar 31, 2022

<Commits actually merged, but GitHub didn't recognize, so I'll manually close.>

@alexander-born, if you get a chance, after you update, could you double check that this fixes things for you?

Cheers!
Chris

@cpsauer cpsauer closed this Mar 31, 2022
@alexander-born
Copy link
Author

alexander-born commented Mar 31, 2022

Thanks, yes the compile commands generation works now for the previously problematic targets!

@alexander-born alexander-born deleted the remove_isystem_dirs_with_source_extensions branch March 31, 2022 08:15
cpsauer added a commit that referenced this pull request Aug 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants