Add internal instrumentation for tracking stack sizes
hsutter committed Dec 10, 2023
1 parent 2d9382d commit c74a0f2
Showing 5 changed files with 133 additions and 38 deletions.
2 changes: 1 addition & 1 deletion regression-tests/test-results/version
@@ -1,5 +1,5 @@

-cppfront compiler v0.3.0 Build 8C04:1728
+cppfront compiler v0.3.0 Build 8C09:1607
Copyright(c) Herb Sutter All rights reserved

SPDX-License-Identifier: CC-BY-NC-ND-4.0
2 changes: 1 addition & 1 deletion source/build.info
@@ -1 +1 @@
-"8C04:1728"
+"8C09:1607"
90 changes: 90 additions & 0 deletions source/common.h
@@ -871,6 +871,96 @@ static cmdline_processor::register_flag cmd_gen_version(
[]{ cmdline.gen_version(); }
);

static auto flag_internal_debug = false;
static cmdline_processor::register_flag cmd_internal_debug(
0,
"_debug",
"Generate internal debug instrumentation",
[]{ flag_internal_debug = true; }
);


//-----------------------------------------------------------------------
//
// Internal instrumentation
//
//-----------------------------------------------------------------------
//

class stackinstr
{
struct entry
{
ptrdiff_t delta;
ptrdiff_t cumulative;
std::string_view func_name;
std::string_view file;
int line;
char* ptr;

entry(
std::string_view n,
std::string_view f,
int l,
char* p
)
: delta { entries.empty() ? 0 : std::abs(entries.back().ptr - p) }
, cumulative{ entries.empty() ? 0 : entries.back().cumulative + delta }
, func_name { n }
, file { f }
, line { l }
, ptr { p }
{ }
};
static std::vector<entry> entries;
static std::vector<entry> deepest;
static std::vector<entry> largest;

static auto print(auto&& ee, std::string_view label) {
std::cout << "\n=== Stack debug information: " << label << " stack ===\n";
for (auto& e: ee)
if (e.ptr) {
std::cout
<< " " << std::setw(6)
<< ((std::abs(e.delta) < 1000000)? std::to_string(e.delta) : "-----") << " "
<< std::setw(8)
<< ((std::abs(e.delta) < 1000000)? std::to_string(e.cumulative) : "-------") << " "
<< e.func_name << " (" << e.file << ":" << e.line << ")\n";
}
}

public:
struct guard {
guard( std::string_view name, std::string_view file, int line, char* p ) {
if (flag_internal_debug) {
entries.emplace_back(name, file, line ,p);
if (ssize(deepest) < ssize(entries)) {
deepest = entries;
}
if (largest.empty() || largest.back().cumulative < entries.back().cumulative) {
largest = entries;
}
}
}
~guard() {
if (flag_internal_debug) {
entries.pop_back();
}
}
};

static auto print_entries() { print( entries, "Current" ); }
static auto print_deepest() { print( deepest, "Deepest" ); }
static auto print_largest() { print( largest, "Largest" ); }
};

std::vector<stackinstr::entry> stackinstr::entries;
std::vector<stackinstr::entry> stackinstr::deepest;
std::vector<stackinstr::entry> stackinstr::largest;

#define STACKINSTR stackinstr::guard _s_guard{ __func__, __FILE__, __LINE__, reinterpret_cast<char*>(&_s_guard) };


}

#endif
13 changes: 9 additions & 4 deletions source/cppfront.cpp
@@ -17,12 +17,12 @@

#include "to_cpp1.h"

-static auto enable_debug_output_files = false;
+static auto flag_debug_output = false;
static cpp2::cmdline_processor::register_flag cmd_debug(
9,
"debug",
-"Emit compiler debug output files",
-[]{ enable_debug_output_files = true; }
+"Emit compiler debug output",
+[]{ flag_debug_output = true; }
);

auto main(
@@ -101,10 +101,15 @@ auto main(
}

// And, if requested, the debug information
-if (enable_debug_output_files) {
+if (flag_debug_output) {
c.debug_print();
}
}

+if (flag_internal_debug) {
+stackinstr::print_deepest();
+stackinstr::print_largest();
+}

return exit_status;
}
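The diff defines the `STACKINSTR` macro, but no call sites are visible in the files shown here. The underlying idea is to take the address of a guard object living in each instrumented function's frame: the distance between consecutive guards approximates how much stack each nested call added. Below is a minimal standalone sketch of that technique, assuming C++17. The names `frame_probe`, `probe_guard`, `PROBE`, `recurse`, and `run_probe_demo` are hypothetical and not part of cppfront, and unlike the committed version this sketch always records (cppfront gates recording on `flag_internal_debug`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical standalone sketch of the same idea as stackinstr (not cppfront code).
struct frame_probe {
    struct entry {
        std::string_view func;        // function that created the probe
        std::uintptr_t   addr;        // address of the probe object in that frame
        std::ptrdiff_t   delta;       // distance from the previous probe (0 for the first)
        std::ptrdiff_t   cumulative;  // running total of deltas
    };
    static inline std::vector<entry> current;  // probes alive right now
    static inline std::vector<entry> deepest;  // snapshot of the deepest chain seen
};

struct probe_guard {
    explicit probe_guard(std::string_view func) {
        auto addr = reinterpret_cast<std::uintptr_t>(this);
        auto& cur = frame_probe::current;
        std::ptrdiff_t delta = 0, cumulative = 0;
        if (!cur.empty()) {
            // Compare as integers: subtracting pointers to unrelated objects is
            // formally undefined. Stacks usually grow downward, so take |difference|.
            auto prev = cur.back().addr;
            delta = static_cast<std::ptrdiff_t>(prev > addr ? prev - addr : addr - prev);
            cumulative = cur.back().cumulative + delta;
        }
        cur.push_back({func, addr, delta, cumulative});
        if (frame_probe::deepest.size() < cur.size()) frame_probe::deepest = cur;
    }
    ~probe_guard() {
        assert(!frame_probe::current.empty());
        frame_probe::current.pop_back();
    }
    probe_guard(const probe_guard&) = delete;
    probe_guard& operator=(const probe_guard&) = delete;
};

// Place at the top of a function body to instrument it.
#define PROBE probe_guard probe_guard_instance{__func__}

// Recursive demo function; the local buffer makes each frame noticeably large.
int recurse(int depth) {
    PROBE;
    volatile char buffer[1024] = {};
    buffer[0] = static_cast<char>(depth);
    int below = depth == 0 ? 0 : recurse(depth - 1);
    return below + buffer[0];
}

// Runs the demo and returns the deepest chain of probes that was observed.
inline std::vector<frame_probe::entry> run_probe_demo(int depth) {
    frame_probe::deepest.clear();
    recurse(depth);
    return frame_probe::deepest;
}
```

Calling `run_probe_demo(4)` should yield five entries whose `delta` values roughly reflect the size of one `recurse` frame. The exact numbers depend on compiler, flags, and inlining, which is the kind of variation discussed in the comments on this commit.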

5 comments on commit c74a0f2

@JohelEGP
Contributor

Deep ones are those that go through all the expression nodes and possibly recurse.
I know of a particularly expensive one: llvm/llvm-project#73336.

@hsutter
Owner Author

@hsutter commented on c74a0f2 Dec 10, 2023

Hmm, that's strange for Clang-18. Thanks for the data. I've never seen a build of cppfront take more than about 30 sec on any compiler I've tried with any optimization settings... FWIW, on my laptop, a full build of cppfront using MSVC 2022, GCC 10, and Clang 12 currently takes 6, 17, and 17 seconds (respectively) with default optimizations, and takes 16, 25, and 33 seconds (respectively) with -O2.

For cppfront's own build time: Do you have a suggestion for how to make this better on Clang? I'd welcome a PR if there's low-hanging fruit cppfront could do. Otherwise, it seems like an issue Clang should fix in their optimizer as we seem to be hitting a new (since Clang 12) optimization's worst case, so I'd leave it to them to fix.

For cppfront run time: I considered caching the result of this computation, but that shouldn't make much difference because it would only help if the is_fold_expression query were called many times on the same node. However, the lowering pass only calls it in two places (once per expression-list, which could repeat a bit for complex expressions as we recurse, but shouldn't be a major cost; and once per return of an expression-list).


I mainly added this instrumentation to measure cppfront execution stack size: I've noticed that cppfront built with all major compilers including MSVC invoked from the command line (CL.EXE) generates a cppfront.exe with reasonable stack sizes. However...

  1. (main issue) When I build it using MSVC in the VS IDE (with, AFAICT, settings identical to those of the CL.EXE build), I get 10-20x larger stack sizes for the same code, and sometimes cppfront built in that mode stack-overflows when lowering very complex expressions with the default 1MB stack size. In that build, the biggest function stack frame is for emit(declaration_node,...), which often uses 28K of stack. For comparison, the same function uses 2.8K when built with the same compiler on the command line with CL.EXE, and 0.5K / 1.2K on Clang / GCC with default optimization flags.

  2. (secondary, not a pain point so far) On Clang, emit(statement_node,...) for some reason wants 8.9K in debug mode (0.6K with -O2). With the other compilers it's between 0.2K and 3.5K across various debug/release builds.

The main pain point is (1), and I haven't been able to find the source of the stack bloat there yet, so I added instrumentation to help measure the problem. It surprised me because I know I'm not storing much data on the stack, so a default 1MB stack size really ought to be more than enough even for very complex code, even with the recursion.
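To put the reported frame sizes in perspective, here is a back-of-the-envelope calculation. It assumes, as a simplification, that every frame on the recursive path is as large as emit(declaration_node,...), that "1MB" means 1 MiB, and that "K" means 1024 bytes; it ignores stack used by main, the runtime, and other functions. The helper `max_nested_frames` and the constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t KiB = 1024;

// How many frames of frame_bytes fit into stack_bytes (integer division).
constexpr std::size_t max_nested_frames(std::size_t stack_bytes, std::size_t frame_bytes) {
    assert(frame_bytes > 0);
    return stack_bytes / frame_bytes;
}

constexpr std::size_t default_stack = 1024 * KiB;  // 1 MiB

// Approximate emit(declaration_node,...) frame sizes reported above, in bytes.
constexpr std::size_t frame_vs_ide = 28 * KiB;  // MSVC, VS IDE build (~28K)
constexpr std::size_t frame_cl_exe = 2867;      // MSVC, CL.EXE build (~2.8K)
constexpr std::size_t frame_clang  = 512;       // Clang, default flags (~0.5K)

static_assert(max_nested_frames(default_stack, frame_vs_ide) == 36);
static_assert(max_nested_frames(default_stack, frame_cl_exe) == 365);
static_assert(max_nested_frames(default_stack, frame_clang)  == 2048);
```

Under these assumptions, a 10x larger frame shrinks the affordable nesting from roughly 365 levels to roughly 36, which would help explain why 1MB is ample for the command-line build but only marginal for the IDE build.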

@JohelEGP
Contributor

@JohelEGP commented on c74a0f2 Dec 10, 2023

I know my compile times can be lower.

I'm currently compiling cppfront in my project.
It includes some flags which have a compile-time hit.

Also, my build of GCC HEAD is under-performant: I prefer that it build in only 30 min.
I have been unable to similarly optimize the build time of Clang HEAD, so it's as fast as Clang 16.

I know why your compile times might be faster.

I think you're not using libc++ when compiling with Clang.
I have noticed that cppfront compiles faster with libstdc++.

Also, earlier compiler versions are known not to support std::ranges.
Later versions do, which adds a roughly fixed cost to the compile times.


> For cppfront's own build time: Do you have a suggestion for how to make this better on Clang? I'd welcome a PR if there's low-hanging fruit cppfront could do.

Caching the result of primary_expression_node::is_fold_expression for the expression_list case should do the job (#886).
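A minimal sketch of what such caching (memoization) could look like, assuming the syntax tree is not mutated after the first query. `expr_node`, `is_ellipsis`, `cached_is_fold`, and `evaluations` are hypothetical simplifications for illustration; they are not cppfront's `primary_expression_node`, nor necessarily what #886 does.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical, simplified expression node (not cppfront's primary_expression_node).
// A node counts as a fold expression if it, or any node below it, is marked.
struct expr_node {
    bool is_ellipsis = false;  // hypothetical marker
    std::vector<std::unique_ptr<expr_node>> children;

    // Memoized query: the subtree walk runs at most once per node.
    bool is_fold_expression() const {
        if (!cached_is_fold) {
            bool result = is_ellipsis;
            for (auto const& child : children) {
                if (result) break;  // already known to be a fold
                assert(child != nullptr);
                result = child->is_fold_expression();
            }
            cached_is_fold = result;
            ++evaluations;  // counts uncached computations
        }
        return *cached_is_fold;
    }

    mutable std::optional<bool> cached_is_fold;  // empty until first queried
    static inline int evaluations = 0;           // instrumentation for the demo
};
```

`mutable` lets a `const` query fill the cache. With nested expression lists, repeated queries during recursive lowering then hit the cache instead of re-walking whole subtrees; if nodes could change after parsing, the cache would need invalidating.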


Interesting thing about the stack sizes.
I have never found myself needing to debug those.
I'll see if QtCreator has something integrated that can help.

@hsutter
Owner Author

@hsutter commented on c74a0f2 Dec 11, 2023

Yes, I'm just realizing now that I'm pretty spoiled, because my typical inner-loop full build of cppfront is 6 seconds with MSVC debug. Only when I do my pre-commit "warning-clean" sanity checks do I build with GCC and Clang, and even those builds are still under 20 seconds (since I only bother with debug mode; plus I don't really feel that time because I run them in parallel with the full regression tests, which take ~4 minutes all told).

I'm familiar with "optimize for space" and "optimize for speed" compiler flags (on MSVC, they've long been /Os and /Ox), but that's for general space/speed rather than stack -- in fact, /Os generates 3x more stack usage in my current test case (see below). In exploring compiler flags, I've noticed that GCC -fconserve-stack and MSVC /Ox (general max optimizations) generate near-identical stack sizes (61K largest stack) for my test case. Which FWIW is this, extracted from PR #701 (thanks @filipsajdak !):

main: () = {
    close_to := :(v) -> _ = :(x) -> bool = {
        return std::abs(v$ - x) < std::max<std::common_type_t<std::decay_t<decltype(x)>,std::decay_t<decltype(v$)>>>(std::numeric_limits<std::decay_t<decltype(x)>>::epsilon(), std::numeric_limits<std::decay_t<decltype(v$)>>::epsilon());
    };
    _ = close_to(1);
}

Cppfront handles this fine, except that when built with MSVC in the IDE (not on the command line), it just barely overflows the default 1M stack size. So for now I've just set my IDE build's stack size to 4M, and that's all -- there's no problem with any command-line build on any compiler.

@JohelEGP
Contributor

> 1. ... and sometimes cppfront built in that mode stack-overflows when lowering very complex expressions with the default 1MB stack size

Did this start happening after merging #506?
If so, it might be due to the added guards.
#907 now does the same during parse.

AFAIK, MSVC in Debug mode uses debug iterators, which increase iterator sizes.
Since many data structures use iterators, the effect compounds.
And MSVC probably does similar things for abstractions other than iterators.

Maybe you can get similar results for Clang and GCC in Debug mode
if you opt into some options with similar effects.
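One quick way to see the iterator-size effect on a given toolchain is to report `sizeof` for a standard iterator under different build modes. The switches named in the comments are real library macros (MSVC STL's `_ITERATOR_DEBUG_LEVEL`, libstdc++'s `_GLIBCXX_DEBUG`), but whether they account for the stack growth seen here is an assumption. `iterator_footprint` and `measure_iterator_footprint` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Checked "debug" iterators typically carry bookkeeping beyond a raw pointer.
// Library-specific switches (must be set consistently for the whole program):
//   MSVC STL:  _ITERATOR_DEBUG_LEVEL (defaults to 2 in Debug builds)
//   libstdc++: _GLIBCXX_DEBUG (debug mode with "safe" iterators)
// This sketch just reports the sizes for whatever mode it is compiled in.
struct iterator_footprint {
    std::size_t raw_pointer;
    std::size_t vector_iterator;
    std::size_t vector_object;
};

inline iterator_footprint measure_iterator_footprint() {
    iterator_footprint result{
        sizeof(int*),
        sizeof(std::vector<int>::iterator),
        sizeof(std::vector<int>),
    };
    // Sanity check: mainstream implementations wrap at least a pointer.
    assert(result.vector_iterator >= result.raw_pointer);
    return result;
}
```

Compiling this once normally and once with `-D_GLIBCXX_DEBUG` (GCC with libstdc++) should typically report a larger `vector_iterator` in the second build. Every local iterator in a recursive function grows its frame by that difference.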

> I'll see if QtCreator has something integrated that can help.

I have yet to do this.
