Add internal instrumentation for tracking stack sizes
Showing 5 changed files with 133 additions and 38 deletions.
@@ -1 +1 @@
-"8C04:1728"
+"8C09:1607"
c74a0f2
Deep ones are those that go through all the expression nodes and possibly recurse.
I know of a particularly expensive one: llvm/llvm-project#73336.
c74a0f2
Hmm, that's strange for Clang-18. Thanks for the data. I've never seen a build of cppfront take more than about 30 sec on any compiler I've tried with any optimization settings... FWIW, on my laptop, a full build of cppfront using MSVC 2022, GCC 10, and Clang 12 currently takes 6, 17, and 17 seconds (respectively) with default optimizations, and 16, 25, and 33 seconds (respectively) with -O2.

For cppfront's own build time: Do you have a suggestion for how to make this better on Clang? I'd welcome a PR if there's low-hanging fruit cppfront could do. Otherwise, it seems like an issue Clang should fix in their optimizer, as we seem to be hitting the worst case of an optimization that is new since Clang 12, so I'd leave it to them to fix.
For cppfront run time: I considered caching the result of this computation, but that shouldn't make much difference because it would only help if the is_fold_expression query were called many times on the same node. However, the lowering pass only calls it in two places: once per expression-list (which could repeat a bit for complex expressions as we recurse, but shouldn't be a major cost), and once per return of an expression-list.

I mainly added this instrumentation to measure cppfront execution stack size: I've noticed that cppfront built with all major compilers, including MSVC invoked from the command line (CL.EXE), generates a cppfront.exe with reasonable stack sizes. However...
1. (main issue) When I build it using MSVC in the VS IDE (with, AFAICT, identical settings to the CL.EXE build), I get 10-20x larger stack sizes for the same code, and sometimes cppfront built in that mode stack-overflows when lowering very complex expressions with the default 1MB stack size. In that build, the biggest function stack frame is for emit(declaration_node,...), which often uses 28K of stack. For comparison, it uses 2.8K when built with the same compiler on the command line with CL.EXE, and 0.5K / 1.2K on Clang / GCC with default optimization flags.
2. (secondary, not a pain point so far) On Clang, emit(statement_node,...) for some reason wants 8.9K in debug mode (0.6K with -O2). With the other compilers it's between 0.2K and at most 3.5K in various debug/release builds.

The main pain point is (1), and I haven't been able to find the source of the stack bloat there yet, so I added instrumentation to help measure the problem. It surprised me because I know I'm not storing much data on the stack, so a default 1MB stack size really ought to be more than enough even for very complex code, even with the recursion.
c74a0f2
I know my compile times can be lower.
I'm currently compiling cppfront as part of my project, which includes some flags that have a compile-time hit.
Also, my build of GCC HEAD is under-performant, because I prefer that it builds in only 30 min.
I have been unable to similarly optimize the build time of Clang HEAD, so it's as fast as Clang 16.

I know why your compile times might be faster.
I think you're not using libc++ when compiling with Clang.
I have noticed that cppfront compiles faster with libstdc++.
Also, earlier compiler versions are known to not support std::ranges.
Later versions do, which adds a bit of a fixed difference to the compile times.
Caching the result of primary_expression_node::is_fold_expression for case expression_list should do the job (#886).

Interesting thing about the stack sizes.
I have never found myself needing to debug those.
I'll see if Qt Creator has something integrated that can help.
c74a0f2
Yes, I'm just realizing now that I'm pretty spoiled, because my typical inner-loop full build of cppfront is 6 seconds with MSVC debug. And when I do my pre-commit "warning-clean" sanity checks against GCC and Clang, those builds are still sub-20sec (since I only bother with debug mode; plus I don't really feel that time because I run them in parallel with the full regression tests, which take ~4 minutes all told).

I'm familiar with "optimize for space" and "optimize for speed" compiler flags (on MSVC, they've long been /Os and /Ox), but those target general space/speed rather than stack; in fact, /Os generates 3x more stack usage in my current test case (see below). In exploring compiler flags, I've noticed that GCC -fconserve-stack and MSVC /Ox (general max optimizations) generate near-identical stack sizes (61K largest stack) for my test case, which FWIW is this, extracted from PR #701 (thanks @filipsajdak!):

Cppfront is fine with this, except on MSVC when built in the IDE (not on the command line), where it just barely overflows the 1M default stack size. So for now I've just set my IDE build's stack size to 4M, and that's all; there's no problem on any command-line build on any compiler.
c74a0f2
Did this start happening after merging #506?
If so, it might be due to the added guards.
#907 now does the same during parse.
AFAIK, MSVC in Debug mode uses debug iterators, which increase iterator sizes.
Since many data structures use iterators, that compounds.
And MSVC probably does similar things for abstractions other than iterators.
Maybe you can get similar results for Clang and GCC in Debug mode
if you opt into some options with similar effects.
I have yet to do this.