Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack object turned into invalid state during exception-caused stack unwinding #44366

Open
boris-kolpackov opened this issue Feb 25, 2020 · 16 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla c++ confirmed Verified by a second party platform:windows

Comments

@boris-kolpackov
Copy link
Contributor

Bugzilla Link 45021
Version 13.0
OS Windows NT
Attachments Stack trace
CC @boris-kolpackov,@DougGregor,@karen-arutyunov,@zygoloid,@rnk

Extended Description

We have a program that crashes in a destructor called for a stack object
that appears to have never been constructed. The destructor is called
during stack unwinding caused by throwing (and catching) an exception.

This only happens with Clang targeting MSVC and only when the program
is compiled with optimization (-O3) and linked statically (but still
using the DLL msvcrt). This affects Clang 9.0.1 and 10.0.0-rc2.

So far we were unable to create a small reproducer and I am not very
hopeful that we will since the issue appears to be specific to the
code setup/path that we have. However, I would be happy to provide
instructions on how to reproduce the issue under a debugger in the
program directly (it is open source). Below I am also including extra
details in case they may help.

You can see the two complete build logs that only differ in the static
linking bit here (this is using Clang 9.0.1):

https://stage.build2.org/?builds=build2&cf=windows_10-clang_9.0_llvm_msvc_16.4-*O3

In particular, the C++ compiler invocations are the same sans a few
macros that setup DLL importing in the shared case.

Below is a stack trace. The debugger attaches the the following message
to the "b.exe!butl::fdclose(int fd) Line 1482 C++" line:

"Unhandled exception at 0x00007FFCCC24D5B8 (ucrtbase.dll) in b.exe: An
invalid parameter was passed to a function that considers invalid
parameters fatal."

The crash happens when an exception is thrown at line 1250[1] trying to
destroy the fdpipe object. The only fdpipe object that could possibly
be being destroyed here is constructed below at line 1520[2]. So it
feels like something is wrong with the exception handing tables.

[1] https://github.com/build2/build2/blob/master/libbuild2/test/script/runner.cxx#L1250
[2] https://github.com/build2/build2/blob/master/libbuild2/test/script/runner.cxx#L1520

ucrtbase.dll!00007ffccc24d5b8() Unknown
ucrtbase.dll!00007ffccc24d481() Unknown
ucrtbase.dll!00007ffccc24d559() Unknown
ucrtbase.dll!00007ffccc226d33() Unknown
b.exe!butl::fdclose(int fd) Line 1482 C++
[Inline Frame] b.exe!butl::auto_fd::reset(int fd) Line 12 C++
[Inline Frame] b.exe!butl::auto_fd::~auto_fd() Line 27 C++
[Inline Frame] b.exe!butl::fdpipe::~fdpipe() Line 821 C++
b.exe!build2::test::script::run_pipe(build2::test::script::scope & sp, std::_Vector_const_iterator<std::_Vector_val<std::_Simple_typesbuild2::test::script::command>> bc, std::_Vector_const_iterator<std::_Vector_val<std::_Simple_typesbuild2::test::script::command>> ec, butl::auto_fd ifd, unsigned __int64 ci, unsigned __int64 li, const build2::location & ll, bool diag) Line 1994 C++
[External Code]
[Inline Frame] b.exe!build2::fail_mark_base::fail_mark_base::::operator()(const butl::diag_record & r) Line 447 C++
b.exe!build2::fail_mark_base::fail_mark_base::::__invoke(const butl::diag_record & r) Line 443 C++
[Inline Frame] b.exe!butl::diag_record::flush() Line 192 C++
b.exe!butl::diag_record::~diag_record() Line 206 C++
b.exe!build2::test::script::run_pipe(build2::test::script::scope & sp, std::_Vector_const_iterator<std::_Vector_val<std::_Simple_typesbuild2::test::script::command>> bc, std::_Vector_const_iterator<std::_Vector_val<std::_Simple_typesbuild2::test::script::command>> ec, butl::auto_fd ifd, unsigned __int64 ci, unsigned __int64 li, const build2::location & ll, bool diag) Line 1250 C++
b.exe!build2::test::script::run_pipe(build2::test::script::scope & sp, std::_Vector_const_iterator<std::_Vector_val<std::_Simple_typesbuild2::test::script::command>> bc, std::_Vector_const_iterator<std::_Vector_val<std::_Simple_typesbuild2::test::script::command>> ec, butl::auto_fd ifd, unsigned __int64 ci, unsigned __int64 li, const build2::location & ll, bool diag) Line 1820 C++
b.exe!build2::test::script::run_expr(build2::test::script::scope & sp, const std::vector<build2::test::script::expr_term,std::allocatorbuild2::test::script::expr_term> & expr, unsigned __int64 li, const build2::location & ll, bool diag) Line 2049 C++
b.exe!build2::test::script::default_runner::run(build2::test::script::scope & sp, const std::vector<build2::test::script::expr_term,std::allocatorbuild2::test::script::expr_term> & expr, build2::test::script::command_type ct, unsigned __int64 li, const build2::location & ll) Line 2084 C++
b.exe!build2::test::script::parser::exec_lines(std::_Vector_iterator<std::_Vector_val<std::_Simple_typesbuild2::test::script::line>> i, std::_Vector_iterator<std::_Vector_val<std::_Simple_typesbuild2::test::script::line>> e, unsigned __int64 & li, build2::test::script::command_type ct) Line 3210 C++
b.exe!build2::test::script::parser::exec_scope_body() Line 3115 C++
[Inline Frame] b.exe!build2::test::script::parser::execute(build2::test::script::scope & sc, build2::test::script::script & s, build2::test::script::runner & r) Line 2910 C++
b.exe!build2::test::script::execute_impl(build2::test::script::scope & s, build2::test::script::script & scr, build2::test::script::runner & r) Line 2920 C++
[Inline Frame] b.exe!build2::test::script::parser::exec_scope_body::::operator()(const build2::diag_frame * ds, build2::test::script::scope & s, build2::test::script::script & scr, build2::test::script::runner & r) Line 3075 C++
b.exe!build2::scheduler::async<`lambda at C:\temp\cfg\build2-0.13.0-a.0.20200224125633.f749aab34924\libbuild2\test\script\parser.cxx:3069:39',const build2::diag_frame *&,std::reference_wrapperbuild2::test::script::scope,std::reference_wrapperbuild2::test::script::script,std::reference_wrapperbuild2::test::script::runner>(unsigned __int64 start_count, std::atomic & task_count, build2::test::script::parser::exec_scope_body:: && f, const build2::diag_frame * & a, std::reference_wrapperbuild2::test::script::scope && a, std::reference_wrapperbuild2::test::script::script && a, std::reference_wrapperbuild2::test::script::runner && a) Line 25 C++
[Inline Frame] b.exe!build2::scheduler::async(std::atomic & task_count, build2::test::script::parser::exec_scope_body:: && f, const build2::diag_frame * & a, std::reference_wrapperbuild2::test::script::scope && a, std::reference_wrapperbuild2::test::script::script && a, std::reference_wrapperbuild2::test::script::runner && a) Line 85 C++
b.exe!build2::test::script::parser::exec_scope_body() Line 3068 C++
[Inline Frame] b.exe!build2::test::script::parser::execute(build2::test::script::scope & sc, build2::test::script::script & s, build2::test::script::runner & r) Line 2910 C++
b.exe!build2::test::script::parser::execute(build2::test::script::script & s, build2::test::script::runner & r) Line 2890 C++
[Inline Frame] b.exe!build2::test::perform_script_impl(const build2::target & t, const build2::test::testscript & ts, const butl::basic_path<char,butl::dir_path_kind> & wd, const build2::test::common & c) Line 370 C++
b.exe!build2::test::rule::perform_script::::operator()(const build2::diag_frame * ds, build2::test::script::scope_state & r, const build2::target & t, const build2::test::testscript & ts, const butl::basic_path<char,butl::dir_path_kind> & wd) Line 553 C++
[Inline Frame] b.exe!build2::scheduler::async(unsigned __int64 start_count, std::atomic & task_count, build2::test::rule::perform_script:: && f, const build2::diag_frame * && a, std::reference_wrapperbuild2::test::script::scope_state && a, std::reference_wrapper && a, std::reference_wrapper && a, std::reference_wrapper<const butl::basic_path<char,butl::dir_path_kind>> && a) Line 25 C++
b.exe!build2::test::rule::perform_script(build2::action a, const build2::target & t, unsigned __int64 pass_n) Line 544 C++
[External Code]
[Inline Frame] b.exe!std::_Func_class<build2::target_state,build2::action,const build2::target &>::operator()(build2::action _Args, const build2::target & _Args) Line 969 C++
b.exe!build2::execute_recipe(build2::action t, build2::target & r, const std::function<build2::target_state (build2::action, const build2::target &)> &) Line 1020 C++
b.exe!build2::execute_impl(build2::action t, build2::target &) Line 1583 C++
[Inline Frame] b.exe!build2::execute::::operator()(const build2::diag_frame * ds, build2::target & t) Line 1696 C++
[Inline Frame] b.exe!build2::scheduler::async(unsigned __int64 start_count, std::atomic & task_count, build2::execute:: && f, const build2::diag_frame * && a, std::reference_wrapperbuild2::target && a) Line 25 C++
b.exe!build2::execute(build2::action a, const build2::target & ct, unsigned __int64 start_count, std::atomic * task_count) Line 1691 C++
[Inline Frame] b.exe!build2::execute_async(build2::action a, const build2::target & t, unsigned __int64 sc, std::atomic & tc, bool fail) Line 561 C++
b.exe!build2::execute(const butl::small_vectorbuild2::value,1 &, build2::action a, build2::action_targets & ts, unsigned short diag, bool prog) Line 349 C++
b.exe!build2::main(int argc, char * * argv) Line 1595 C++
[External Code]

@boris-kolpackov
Copy link
Contributor Author

assigned to @zmodem

@karen-arutyunov
Copy link
Mannequin

karen-arutyunov mannequin commented Feb 28, 2020

We have managed to reproduce the issue with a single (though not exactly trivial) source file.

Normally, the attached program creates a pipe, closes its ends and exits:

c:\tmp>clang++ -g test.cxx

c:\tmp>a.exe
pipe: 3 4
fdclose: 4
fdclose: 3
exit

However, when it is built optimized it tries to close the pipe input twice and crashes inside the _close() function:

c:\tmp>clang++ -O3 -g test.cxx

c:\tmp>a.exe
pipe: 3 4
fdclose: 4
fdclose: 3
fdclose: 3

The last fdclose() call is not expected to happen.
The stack trace is attached.

@karen-arutyunov
Copy link
Mannequin

karen-arutyunov mannequin commented Feb 28, 2020

@karen-arutyunov
Copy link
Mannequin

karen-arutyunov mannequin commented Feb 28, 2020

@boris-kolpackov
Copy link
Contributor Author

It would be really helpful if this bug is at least looked at before the final 10.0.0 release. Because it is only triggered by throwing an exception, this could be a latent issue present in a lot more codebases than just ours. So I took the liberty to raise the importance to release blocker.

@rnk
Copy link
Collaborator

rnk commented Feb 28, 2020

Thanks for the reproducer, I did some investigation. I think it's not that a destructor is running twice when it should not, it's that somehow a stale ofd field is being read in the ofd destructor.

The main thing that seems to cause this is SROA. We end up with a PHI on the ofd cleanuppad:
ehcleanup17: ; preds = %ehcleanup, %"?fdopen_pipe@@ya?AUfdpipe@@XZ.exit"
%ofd.sroa.11.0 = phi i32 [ -1, %ehcleanup ], [ %2, %"?fdopen_pipe@@ya?AUfdpipe@@XZ.exit" ], !dbg !​1282
%ofd.sroa.0.1 = phi i32 [ %ofd.sroa.0.0, %ehcleanup ], [ %1, %"?fdopen_pipe@@ya?AUfdpipe@@XZ.exit" ], !dbg !​1282
%10 = cleanuppad within none [], !dbg !​1332

Anything that makes the PHI go away, such as removing potentially throwing operations from run_pipe or the extra callbacks destructor, hides the problem.

I'm not sure we should hold up 10.0 for this bug, but I will keep looking into it.

@rnk
Copy link
Collaborator

rnk commented Feb 29, 2020

little more reduced
Reduced the program a bit.

@boris-kolpackov
Copy link
Contributor Author

Any progress on this? I would have liked to help but unfortunately this is out of my depth. I would be happy to do any legwork (testing, etc) if needed.

@boris-kolpackov
Copy link
Contributor Author

We've run into this issue in another, unrelated, program. I've done some experiments with results described below in hopes it will help to fix this.

Let's call the original code that triggers this A and this second program B. Let's also call the reduced version of A that is attached to this report (the latest version by Reid Kleckner) A'.

One thing that I was wondering is why nobody else seems to be running into this and one potential answer is that most (all) other projects used clang-cl rather than the vanilla driver like we do. So I tried clang-cl on A' and the result does not trigger the issue with /O2. In the end, I've tracked the difference down to -fexternc-nounwind that clang-cl passes. So if we pass this option to vanilla clang, then A' does not trigger the issue, neither with -O2 nor with -O3.

This, however, does not help A: for the original code the only thing that seems to help is lowering the optimization level to -O1.

B is in between: with -fexternc-nounwind the issue is gone with -O2 but not with -O3.

@boris-kolpackov
Copy link
Contributor Author

Still an issue in Clang 11.0.1 and all the observations regarding -fexternc-nounwind and -O2/-O3 still hold.

@rnk
Copy link
Collaborator

rnk commented Mar 1, 2021

Sorry, I was on leave over the summer, so I didn't see the 2020 June/July updates. I haven't had time to read over this again, but if the problem is related to fexternc-nounwind, doesn't that mean the program is invoking UB by throwing an exception from an extern "C" function? In any case, I'll add this to our next bug triage round and do more thorough investigation later.

@boris-kolpackov
Copy link
Contributor Author

No, an exception is not thrown from an extern "C" function. My guess is -fexternc-nounwind makes a difference because it affects the shape of the generated code (because the compiler makes different assumptions/provisions when calling extern "C" functions with/without this option).

@rnk
Copy link
Collaborator

rnk commented Mar 4, 2021

Hans said he'd take a look at this.

@boris-kolpackov
Copy link
Contributor Author

This still appears to be an issue in Clang 12.0.1 however, things have shifted around:

  1. The original reproducer nor longer triggers it.

  2. The reduced reproducer still triggers it with -O2 and -O3 provided -fexternc-nounwind is not passed.

I haven't tried the test mentioned in comment #​9 as "program B" but could do so if you think it would be helpful.

@boris-kolpackov
Copy link
Contributor Author

This bug is still present in Clang 13.0.0 with the same state as for 12.0.1 (see the previous comment):

  1. The original reproducer nor longer triggers it.

  2. The reduced reproducer still triggers it with -O2 and -O3 provided -fexternc-nounwind is not passed.

We've also hit another similar issue that is likely a manifestation of the same underlying bug in Clang (i.e., it is triggered when an exception is thrown but only in -O2/-O3 optimized code). We've tried to reduce it to something that we can attach but so far unsuccessfully (things tend to change/disappear with slight code changes). One thing that is different in this case is that if running under the Visual Studio debugger, the process gets interrupted with "Unhandled exception" that points to the throw statement that is actually handled. Also the stack trace at this point looks wrong (no main(), etc).

@rnk
Copy link
Collaborator

rnk commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#45064

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@llvmbot llvmbot added the confirmed Verified by a second party label Jan 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla c++ confirmed Verified by a second party platform:windows
Projects
None yet
Development

No branches or pull requests

4 participants