Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitizer test failures in symbolize_stack.cpp on AArch64 #55460

Closed
ilovepi opened this issue May 14, 2022 · 5 comments
Closed

Sanitizer test failures in symbolize_stack.cpp on AArch64 #55460

ilovepi opened this issue May 14, 2022 · 5 comments
Labels
backend:AArch64 bug Indicates an unexpected problem or unintended behavior compiler-rt

Comments

@ilovepi
Copy link
Contributor

ilovepi commented May 14, 2022

Fuchsia's clang CI has found a test failure in symbolize_stack.cpp for AArch64.

Due to another failure, we're not exactly sure when this first appeared, but the failing bot can be found here: https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-arm64/b8814202726650721633/overview

For both UBSAN and LSAN, the test fails due to an unexpected warning causing matching to fail

==180122==WARNING: Symbolizer buffer too small

We saw the failure disappear briefly between 1ecc3d8 and c74753f, but I don't see anything in that blame list, or following blame list (7ff7001 to ac7a9ef) that suggests this behavior should be different.

@ilovepi ilovepi added bug Indicates an unexpected problem or unintended behavior compiler-rt labels May 14, 2022
@llvmbot
Copy link
Collaborator

llvmbot commented May 14, 2022

@llvm/issue-subscribers-bug

@llvmbot
Copy link
Collaborator

llvmbot commented May 14, 2022

@llvm/issue-subscribers-backend-aarch64

@ilovepi
Copy link
Contributor Author

ilovepi commented May 16, 2022

This appears to be a more fundamental issue, since we're now occasionally seeing this on x64 linux bots as well.

https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8814193995900305713/overview

Some of the failing bots only have unrelated changes, such as a change to clangd (71cb8c8)

I'm looking at reproducing the issue since it also shows up on x64 linux. Since it seems to come and go, my worry is that there is some subtle race in the runtime code, or that a precondition may no longer hold because of a change elsewhere.

@ilovepi
Copy link
Contributor Author

ilovepi commented May 20, 2022

So after a lot of digging we determined the root cause of the issue. The test in symbolize_stack.cpp(

) calls a recursive template, designed to create a function with a long function name. The purpose of the test is to ensure that the symbolizer does not crash when the name is very long.

The SymbolizerProcess class defines a buffer of fixed length(16K)(

static const uptr kBufferSize = 16 * 1024;
), and uses that buffer during symbolization.

However, in ReadFromSymbolizer() (

) when reading into the buffer the symbolizer code checks if the buffer is full, and if so, basically aborts reading( ), and discards all data read so far. The buffer has the null terminator written to its first character, and a pointer to it is returned back up the call chain. Thus, when the symbolizer data is eventually flushed , it is truncated. The long function name in question can be over 45K long, far exceeding the size of the underlying buffer.

The test in symbolize_stack.cpp is trying to match a set of nested c++ vectors in FileCheck, essentially matching vector<.*vector.*vector<.*vector.*vector<. Now typically the test seems to have been lucky in how the diagnostic output makes it into FileCheck, and in general there are usually at least 5 nested vectors available in the output, and the test passes.

In our CI, though, we have some fairly long path names, and further there have been some changes to how demangling is done recently. I think the confluence of these factors started causing us to hit the test failure with more regularity in our CI.

I do want to point out that this doesn't appear to be a bug in the demangler, but to be a subtle design flaw in how online symbolization works. Typically when reading into a fixed buffer, you keep working until the buffer is full (or you reach the end of the input), and then process whatever you've already received and then after you can discard the old contents, you start filing the buffer again until you've reached the end of the input. Now this is code that is doing online symbolization, and it seems unlikely to me that this code would generally be allowed to block. Given that I believe blocking would be required to operate in the way I've described, I'm unsure of the correct course of action here.

Sure, we can increase the buffer size. Setting it to the size of a huge page seems like a reasonable compromise. But that only kind of punts the problem down the line. Maybe the fix here is to stop dropping the read contents, and instead allow the symbolizer to flush what it's already ready -- even though it may be incomplete. Afterall, it's already printed a warning that the buffer was to small.

All of the quick solutions I can see probably render the test useless.

  • increasing the buffer size means we'd need to also increase the number of recursions
  • reducing the number of recursions means we are no longer testing that long function names don't trigger a crash
  • matching on earlier function names may silently ignore crashes

ilovepi added a commit that referenced this issue May 25, 2022
Addresses tests flakes described in
#55460

The test being updated can fail in FileCheck to match when given long
enough stack traces. This can be problematic when file system paths
become long enough to cause the majority of the long function name to
become truncated. We found in our CI that the truncated output would
often fail to match, thereby causing the test to fail when it should not.

Here we change the test to match on sybolizer output that should be more
reliable than matching inside the long function name.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D126102
@ilovepi
Copy link
Contributor Author

ilovepi commented May 25, 2022

7f54399 lands a suppression to remove the flake. I'm going to leave this open until we land a more thorough fix of the underlying issue.

@ilovepi ilovepi closed this as completed in acfeb1a Jun 7, 2022
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
Addresses tests flakes described in
llvm/llvm-project#55460

The test being updated can fail in FileCheck to match when given long
enough stack traces. This can be problematic when file system paths
become long enough to cause the majority of the long function name to
become truncated. We found in our CI that the truncated output would
often fail to match, thereby causing the test to fail when it should not.

Here we change the test to match on sybolizer output that should be more
reliable than matching inside the long function name.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D126102
arichardson pushed a commit to CTSRD-CHERI/compiler-rt that referenced this issue Sep 12, 2023
Addresses tests flakes described in
llvm/llvm-project#55460

The test being updated can fail in FileCheck to match when given long
enough stack traces. This can be problematic when file system paths
become long enough to cause the majority of the long function name to
become truncated. We found in our CI that the truncated output would
often fail to match, thereby causing the test to fail when it should not.

Here we change the test to match on sybolizer output that should be more
reliable than matching inside the long function name.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D126102
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 bug Indicates an unexpected problem or unintended behavior compiler-rt
Projects
None yet
Development

No branches or pull requests

3 participants