Add attributes to control header and external source inclusion #44
Awesome project, BTW. Really helpful to me at work and for personal projects.
Hi, @dave-hagedorn! I'm delighted to hear you're finding this tool useful. Thanks for a high-quality contribution! I'm super down to merge in the option to turn off header finding. It makes sense that tools would want just the set of compile commands you'd run for a full build and wouldn't necessarily need headers to work around clangd/clangd#123. Thanks for adding. I'm also down with an option to exclude external workspaces, though I'd imagine the project would need to be really big to need it with headers off, right? Is that the case you're finding yourself in? Thanks also for fixing the trailing whitespace that's crept in. Sorry about that. If you know of a good, automated (GitHub bot?) solution to auto-remove trailing whitespace that gets pushed, I'd love to know. A few last things to get us over the finish line: Again, thanks so much! I really appreciate your going back and making the tool even better than you found it. P.S. I'm curious what leads you to prefer ccls over clangd!
I also want to double check on one more thing: The primary problem you're running into is that the tools you're using don't want the headers, and that compile_commands.json is too big--and not that the tool is too slow, right?
Hi Chris, Thanks for all the info and fast response. (1) - Good point. I forgot about generated files. I'll try your aquery inspection technique. If that turns into a rabbit hole I would just NOT exclude bazel-out/* but still exclude absolute paths - does that work for you? (2) - My personal preference here is to leave the script inputs in one place and propagate them throughout the code, as this makes running the script standalone for testing/dev easier. But if you have a preference here I'll defer to that, just let me know. (3) - So it is - good catch :) (4) - Sure, can do. Do you want me to remove the (5) - I originally had To your last question - yup. Your tool is fast regardless, but the time it takes clangd to process a 10k+ line compile_commands.json file is prohibitive. I've also had trouble with ccls trying to consume headers in compile_commands.json. I use both ccls and clangd. ccls indexes a lot faster and has CodeLens indicators for cross-references, both of which are really useful to me when exploring large code bases (WebRTC comes to mind). I've used clangd for the same but the indexing took over an hour vs ~10 min, and I'm right-clicking all the time to get the "find references" item in the context menu. On the other hand I've found clangd seems to have now caught up and surpassed ccls in terms of code completion, especially on C++20 code, and is more forgiving of incorrect syntax. ccls sometimes fails to complete the line you're typing until it's well formed, whereas clangd doesn't seem to have this limitation. I also really like the inlay hints in clangd 14.
Fun fact - aquery text output has a
Digging some more I think I can link the targetID for an action with its target in the json proto
Thanks for being great and responsive yourself :) Things are looking great--looks like you've resolved most of the above. To answer the questions around (1):
On (2): I still kinda think we should directly templatize and scope things to where the settings are used. But if you're finding that running standalone is a big boost to your development productivity, then we should lean into that. Has it been useful to run independently? I'd previously been thinking that the template wasn't really that runnable without Bazel. One minor thought: Okay if in refresh_compile_commands we have exclude_headers and exclude_external_workspaces default to False rather than None, just so it's totally obvious to readers how to read things? Finally, thank you so much for the color on clangd vs ccls. That's really helpful and good for me to know. Sounds like you're still seeing a big indexing speed win vs clangd 14, contrary to MaskRay/ccls#880? If you're curious about how I'd thought about it previously, search "CCLS" in https://github.com/hedronvision/bazel-compile-commands-extractor/blob/main/ImplementationReadme.md. (And when I hear back from you, I'll update it to include what I've learned from you.) Also, I bet the clangd folks would love to hear your feedback--both on the codelens and on indexing speed--if you'd be willing to file an issue over there. They're responsive and super great in my experience. -CS
@dave-hagedorn, I checked on the Windows absolute path sub-case while I was with that desktop.... And msvc does indeed emit absolute paths for all the headers. But if we do decide we want to exclude entries for external headers when exclude_external_workspaces is turned on, then I don't think that should be too hard to handle properly. As an outline, instead of removing all absolute paths, we'd include them if they were in the Bazel workspace (gleaned from the environment variable--see the set working directory code) but not external. Definitely worth delegating the logic to a library function, though, since Windows paths are a bit trickier with, e.g., / and \ both valid.
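The outline above (keep absolute paths that fall inside the workspace but not under external) could be sketched roughly as follows. This is only an illustration, not the code from the PR: the function name `is_in_workspace_not_external` and its parameters are hypothetical, and the workspace root is assumed to be passed in by the caller. It leans on `pathlib`'s pure path classes so that `/` and `\` are both accepted for Windows-style paths.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_in_workspace_not_external(path_str, workspace_root, windows=False):
    """Hypothetical helper: True if the path is inside workspace_root but not under its external/ directory.

    Note: '..' components are not resolved in this sketch.
    """
    path_cls = PureWindowsPath if windows else PurePosixPath
    root = path_cls(workspace_root)
    # Joining leaves an absolute path_str as-is and anchors a relative one at the root.
    path = root / path_str
    try:
        relative = path.relative_to(root)
    except ValueError:
        return False  # Outside the workspace, e.g. a system or toolchain header.
    if not relative.parts:
        return True  # The workspace root itself.
    first = relative.parts[0]
    if windows:
        first = first.lower()  # Windows file names are case-insensitive.
    return first != "external"
```

For example, with a workspace at `C:\ws`, `C:/ws/src/a.h` would be kept, while `C:\ws\external\grpc\a.h` and an MSVC include directory outside the workspace would both be filtered out.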
(Merged in the latest changes in from master.)
@dave-hagedorn, I know this has been a bit more involved, but could I still get your help getting it over the finish line at some point?
Hi @cpsauer. Sorry - I got sidetracked and am now travelling for work. I will be able to pick this up again next week. Some updates - I did try subtracting external targets via query, but nothing I could find worked. Regarding your other suggestions, no problem, I can make the changes. Nice catch on the MSVC paths. Thanks for checking this. Makes sense - can normalize the paths and filter out anything not in the workspace. Regarding your point about This is driven by my context/experience, but I find that tradeoff OK, vs indexing a possibly large set (at least in my experience) of external headers. If a workspace pulls in say grpc, boost, abseil, etc... plus the system headers - you can still get a very large compile_commands.json full of headers you rarely (in my opinion) navigate or edit. Another alternative is to instead add a regex type attribute that would allow more fine-grained control of which files to exclude, although I think external/not-external is a simpler and common enough use case.
Sweet! Thanks for replying. Hope the trip is going well. (Definitely no need to temporarily close--just wanted to check in and make sure I hadn't driven you away by being too exacting. Thanks for being a good sport about all this and a great contributor!) Re aquery: Your great jsonproto traversal it is, then! Thanks so much for experimenting. Re Thanks for being great and thoughtful about everything, Windows included!
Hi @cpsauer , back and able to wrap this up. Here's my suggestion for headers - let's decouple sources from headers, and be explicit about how to exclude both. I think this is also what you are saying For sources - there are two kinds: in workspace and external workspace With that I'm thinking attributes:
With this, a user can:
I think this covers all possible use cases.
I favour the first form. Just looking at one of my projects, I see 368 external headers and 131 system headers. Having the ability to omit indexing the system headers would speed up indexing quite a bit.
Force-pushed from 3880182 to 6c45b7c
emit_headers - controls whether header files are included in the generated compile_commands.json. emit_externals - same, but for external sources or headers - bazel external workspace sources, etc. Both default to True - the existing behaviour. Some tools like ccls don't work well with headers in the compile_commands.json, and some large projects can generate very large compile_commands.json databases when including all external files.
Rename attributes - start with exclude_. Rework external exclusion algorithm - use action targets and look for prefix. Tweak how python script passes options around.
Remove extra whitespace. Tweak macro - default args are set to None, rely on inner rule's attribute's default values.
Changed attributes to exclude_external_sources and exclude_headers
Force-pushed from 6c45b7c to 5a768e1
Hey, Dave! Thanks for getting back so thoughtfully.
Right there with you on decoupling. I think your names and framing are clean and great. Thanks for getting us to better thinking.
On the interface, my one concern is whether we'll be able to reliably distinguish system headers from other external headers. As an example, on some platforms, Bazel links to system/builtin headers via //external. (I know it does this on Android, at least, and I'd guess, maybe LLVM toolchains.) We could try to carefully tease these apart, but I think it might be hard to distinguish from the local_repository case more generally.
Proposed Solution: If we don't see separating system headers from external ones as a major use case, perhaps exclude_headers should be limited to three tiers for now: "all", "external", and None, with system headers treated as external headers. Thoughts?
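To make the three-tier proposal concrete, here is a minimal Python sketch of how such a setting might be applied to a list of discovered headers. It is an illustration only, not the merged implementation: `filter_headers` and `_is_external` are hypothetical names, and the rule used to spot external headers (absolute paths, or paths starting with `external/`) is an assumption that ignores the subtler cases raised in this thread, such as generated headers.

```python
import posixpath


def _is_external(header_path):
    # Assumption for this sketch: absolute paths are system/toolchain headers
    # (treated as external, per the proposal), and workspace-relative paths
    # under external/ come from external workspaces.
    return posixpath.isabs(header_path) or header_path.startswith("external/")


def filter_headers(headers, exclude_headers=None):
    """Apply a three-tier exclude_headers setting: "all", "external", or None."""
    if exclude_headers == "all":
        return []  # A real tool could skip the (slow) header search entirely here.
    if exclude_headers == "external":
        return [h for h in headers if not _is_external(h)]
    if exclude_headers is None:
        return list(headers)
    raise ValueError(f"Unknown exclude_headers value: {exclude_headers!r}")
```

With `"external"`, a list such as `["src/a.h", "external/abseil/b.h", "/usr/include/stdio.h"]` would be reduced to just `["src/a.h"]`.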
[Skip these next bullets unless you disagree and reading more might save a round trip.]
- I do agree that major libraries, especially those from the OS, are more likely to have docs that obviate the need to view their headers. But the line is a bit blurry, and it feels like "all", "external", and None have clearer use cases.
- Similarly, super granular control, like letting users specify which external workspaces to exclude, seems probably excessive if the goal is just quicker browsing. CPU time is so cheap compared to human time.
- I'm hoping clangd will at some point traverse the include graph on its own, obviating the need to find headers at all. I also really like doing things right. So I'm always torn about how much complexity we should pack into the header finding code. It's required lots already.
- Note to self: Excluding external headers was indeed a lot more subtle than it first appeared. (See also review comment about generated headers.)
Could I also ask you to take a polishing pass through the implementation thereafter?
I'm seeing some minor things--typos, an unneeded include, etc.--but also some bigger things--like generated headers or running the (slow) header search when excluding headers. I'll mark some quickly that I see, but I'm more generally hoping I could ask you to give things a double check.
Then, assuming you've tested and everything, let's merge! Thanks for bearing with me and getting these cases handled really right.
Chris
P.S. Thanks for solving the rebasing--and sorry if the bad one was from me.
Change exclude_headers options to just all and external. Optimize some header exclusion cases (thanks @cpsauer). Update docs.
Hi @cpsauer, I think I've taken into account most (all?) of your recent suggestions. I did update README.md, feel free to amend this as needed. At this point I don't have a lot more time to devote to this PR. Any glaring fixes I'm happy to take on but for any refinements, can I ask that you help get this across the finish line? I'm also fine to polish in a follow-up PR. Let me know if this works for you, thanks!
refresh.template.py (outdated)
    if not file_exists:
        if not _get_files.has_logged_missing_file_error: # Just log once; subsequent messages wouldn't add anything.
            _get_files.has_logged_missing_file_error = True
            print(f"""\033[0;33m>>> A source file you compile doesn't (yet) exist: {source_path}
Just tried the new version of this PR: I think this should be source_file here.
I got the following error:
File "/home/user/.cache/bazel/_bazel_q456457/1ccf5f150b64de76858888618e3183bd/execroot/ddad/bazel-out/k8-fastbuild/bin/refresh_compile_commands.runfiles/ddad/refresh_compile_commands.py", line 302, in _get_files
Continuing gracefully...\033[0m""", file=sys.stderr)
NameError: name 'source_path' is not defined
Thanks @alexander-born. Fixed.
Moved missing file error check to _get_files, names changed
Also improve assert messaging and remove dead assert.
Plus, fix a misplaced comment and some typos, old and new
And do a little associated cleanup
…ters Sad, but probably necessary to keep most folks moving forward
for more information, see https://pre-commit.ci
Wahoo! Merged! Thanks @dave-hagedorn for some great new features--and breadth of support. Appreciate your bearing with me, even though filtering external headers turned out to be substantially trickier than expected. [And please, if it looks like I messed something up in my attempt to go over things and make any improvements I could find, say something!]
And sorry for the slowness. Doing my best--just a little overloaded here.