Sanitizer test failures in symbolize_stack.cpp on AArch64 #55460
This appears to be a more fundamental issue, since we're now occasionally seeing it on x64 Linux bots as well. Some of the failing bots only have unrelated changes, such as a change to clangd (71cb8c8). I'm looking at reproducing the issue, since it also shows up on x64 Linux. Because it seems to come and go, my worry is that there is some subtle race in the runtime code, or that a precondition may no longer hold because of a change elsewhere.
So after a lot of digging we determined the root cause of the issue. The test in symbolize_stack.cpp(
The SymbolizerProcess class defines a buffer of fixed length(16K)(
However, in
The test in

In our CI, though, we have some fairly long path names, and there have also been some recent changes to how demangling is done. I think the confluence of these factors caused us to hit the test failure with more regularity in our CI.

I do want to point out that this doesn't appear to be a bug in the demangler, but rather a subtle design flaw in how online symbolization works. Typically, when reading into a fixed buffer, you keep reading until the buffer is full (or you reach the end of the input), process what you've received, discard the old contents, and then start filling the buffer again, repeating until you've reached the end of the input. However, this code is doing online symbolization, and it seems unlikely to me that it would generally be allowed to block. Given that I believe blocking would be required to operate in the way I've described, I'm unsure of the correct course of action here.

Sure, we can increase the buffer size. Setting it to the size of a huge page seems like a reasonable compromise, but that only punts the problem down the line. Maybe the fix is to stop dropping the read contents and instead allow the symbolizer to flush what it has already read, even though it may be incomplete. After all, it has already printed a warning that the buffer was too small. All of the quick solutions I can see probably render the test useless.
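To make the "fill, process, discard, refill" pattern described above concrete, here is a minimal sketch. It is not the compiler-rt `SymbolizerProcess` code: `FakePipe`, `DrainInChunks`, and `ReadAll` are hypothetical names invented for illustration, and the fake pipe never blocks, whereas a real pipe read can, which is exactly the concern raised above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pipe to the symbolizer: each Read() hands out
// at most `max_chunk` bytes, mimicking short reads. It never blocks.
class FakePipe {
 public:
  FakePipe(std::string data, std::size_t max_chunk)
      : data_(std::move(data)), max_chunk_(max_chunk) {
    assert(max_chunk_ > 0);
  }

  // Copies up to `len` bytes into `dst`; returns 0 only at end of input.
  std::size_t Read(char *dst, std::size_t len) {
    std::size_t remaining = data_.size() - pos_;
    std::size_t n = std::min(len, std::min(max_chunk_, remaining));
    std::copy_n(data_.data() + pos_, n, dst);
    pos_ += n;
    return n;
  }

 private:
  std::string data_;
  std::size_t max_chunk_;
  std::size_t pos_ = 0;
};

// Drains `pipe` through a fixed buffer of `capacity` bytes. Whenever the
// buffer fills up, or the input ends, the buffered bytes are handed to
// `sink(data, size)` and the buffer is reused, so nothing is dropped.
template <typename Sink>
void DrainInChunks(FakePipe &pipe, std::size_t capacity, Sink sink) {
  assert(capacity > 0);
  std::vector<char> buffer(capacity);
  std::size_t used = 0;
  for (;;) {
    std::size_t n = pipe.Read(buffer.data() + used, capacity - used);
    used += n;
    bool at_end = (n == 0);
    if (used == capacity || at_end) {
      if (used > 0) sink(buffer.data(), used);  // flush what we have so far
      used = 0;                                 // discard and start refilling
    }
    if (at_end) break;
  }
}

// Convenience wrapper: reassembles the flushed pieces into one string.
std::string ReadAll(std::string input, std::size_t capacity,
                    std::size_t max_chunk) {
  FakePipe pipe(std::move(input), max_chunk);
  std::string out;
  DrainInChunks(pipe, capacity, [&out](const char *data, std::size_t size) {
    out.append(data, size);
  });
  return out;
}
```

With this shape, an output longer than the buffer is delivered in several pieces rather than truncated. The cost is that the consumer has to cope with partial output, and a real implementation would have to decide whether waiting on the next read is acceptable.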
Addresses test flakes described in #55460. The test being updated can fail to match in FileCheck when given long enough stack traces. This becomes a problem when file system paths are long enough that most of the long function name is truncated. We found in our CI that the truncated output would often fail to match, causing the test to fail when it should not. Here we change the test to match on symbolizer output that should be more reliable than matching inside the long function name.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D126102
7f54399 lands a suppression to remove the flake. I'm going to leave this open until we land a more thorough fix of the underlying issue.
Fuchsia's clang CI has found a test failure in symbolize_stack.cpp for AArch64.
Due to another failure, we're not exactly sure when this first appeared, but the failing bot can be found here: https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-arm64/b8814202726650721633/overview
For both UBSAN and LSAN, the test fails because an unexpected warning causes matching to fail.
We saw the failure disappear briefly between 1ecc3d8 and c74753f, but I don't see anything in that blame list, or in the following blame list (7ff7001 to ac7a9ef), that suggests this behavior should be different.