Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rtsan] Restrict arches and disable android #98268

Merged
merged 3 commits into from
Jul 10, 2024

Conversation

cjappl
Copy link
Contributor

@cjappl cjappl commented Jul 10, 2024

Follow on to #92460 (reporting old android builds failing) and #98264 (powerpc failing to discover gtests).

Getting back to stability by getting back down to a very basic set of supported arches and OS's. In the future if there is demand we can build it back up. Especially when I understand how to test these systems, or have people who want to do the work on them.

@cjappl
Copy link
Contributor Author

cjappl commented Jul 10, 2024

CC @vitalybuka @glandium

@cjappl
Copy link
Contributor Author

cjappl commented Jul 10, 2024

I am signing off for the night here in a sec, back on tomorrow morning to monitor for more failures and issues.

If someone has a better idea than "disable all architectures" it seems like everything is working on all these archs, except for powerpc is failing to discover gtests. I don't know if we should go conservative (disable everything) or if we have a more precise solution to this problem. This review is the "big hammer"

@cjappl
Copy link
Contributor Author

cjappl commented Jul 10, 2024

Alright, I think there is still something fishy going on. There are some errors coming to my inbox that look like this:

Step 12 (ninja check 2) failure: stage 2 checked (failure)
******************** TEST 'RealtimeSanitizer-Unit :: ./Rtsan-aarch64-NoInstTest/failed_to_discover_tests_from_gtest' FAILED ********************
Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'RealtimeSanitizer-Unit :: ./Rtsan-s390x-NoInstTest/failed_to_discover_tests_from_gtest' FAILED ********************

Anyone have any bright ideas as to why on some systems these tests would be non-discoverable? Digging into it now, but open to any theories

Here is a representative build showing the problem:
https://lab.llvm.org/buildbot/#/builders/98/builds/119

@cjappl
Copy link
Contributor Author

cjappl commented Jul 10, 2024

Is there any better way of troubleshooting issues like this? (on less common archs, different OS's that I don't commonly have access to). The approach of "submit something, and see what blows up, then respond ASAP to fix it" is more reactive than proactive.

Any advice appreciated, I apologize for the newbie mistakes, still getting my sea legs under me in LLVM world.

@cjappl
Copy link
Contributor Author

cjappl commented Jul 10, 2024

Alright, I cannot reproduce on my ubuntu docker image, nor my MacOS laptop. Any guidance appreciated on how to more reasonably reproduce and test changes. (Or any specific theories on why gtest discovery may be grumpy on some systems)

I have brought the other RTSan codeowner @davidtrevelyan up to speed on everything going on. I am traveling for the next few days, so I'll have spotty availability. Please ping both of us on an issue if you want it to be visible as soon as possible.

Thank you all, especially @vitalybuka for your patience hanging in there through these growing pains.

@vitalybuka vitalybuka self-requested a review July 10, 2024 18:28
@vitalybuka
Copy link
Collaborator

LGTM

@vitalybuka vitalybuka merged commit b81fcd0 into llvm:main Jul 10, 2024
6 checks passed
@cjappl cjappl deleted the RestrictArchesAndNoAndroid branch July 10, 2024 19:12
@cjappl
Copy link
Contributor Author

cjappl commented Jul 10, 2024

1 Failure so far from the build bot. CC @davidtrevelyan , I'm disappearing for my trip very soon

The Buildbot has detected a new failure on builder clang-aarch64-sve-vla while building compiler-rt.


BUILD FAILED: failed stage 1 checked (failure)

Step 7 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'RealtimeSanitizer-Unit :: ./Rtsan-aarch64-NoInstTest/failed_to_discover_tests_from_gtest' FAILED ********************

********************

vitalybuka added a commit that referenced this pull request Jul 10, 2024
@dyung
Copy link
Collaborator

dyung commented Jul 10, 2024

1 Failure so far from the build bot. CC @davidtrevelyan , I'm disappearing for my trip very soon

The Buildbot has detected a new failure on builder clang-aarch64-sve-vla while building compiler-rt.


BUILD FAILED: failed stage 1 checked (failure)

Step 7 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'RealtimeSanitizer-Unit :: ./Rtsan-aarch64-NoInstTest/failed_to_discover_tests_from_gtest' FAILED ********************

********************

I'm trying to stabilize one of my bots (https://lab.llvm.org/buildbot/#/builders/174) and while testing the ToT offline, I am seeing this test fail as well on it. It was previously failing last night until the RTSAN feature was disabled. The failures since then are unrelated MemProfiler tests that are failing that I am trying to get fixed, but just wanted to give you a heads up that I am also seeing the failure on my bot.

@chapuni
Copy link
Contributor

chapuni commented Jul 10, 2024

I see the failure as well in my private builders (aarch64-ubuntu and x86_64-ubuntu).

@cjappl
Copy link
Contributor Author

cjappl commented Jul 10, 2024

If someone has a reproducible way to get it, and can propose a fix we would be eternally grateful!

We have yet to see it on our local builds.

@dyung
Copy link
Collaborator

dyung commented Jul 10, 2024

If someone has a reproducible way to get it, and can propose a fix we would be eternally grateful!

We have yet to see it on our local builds.

My bot is a fairly standard Ubuntu 20.04 x86_64 machine. Do you have a similarly setup machine? Running the cmake and build commands might work to try and reproduce the issue.

@Caslyn
Copy link
Contributor

Caslyn commented Jul 10, 2024

Hi @cjappl - would you be able to revert this change while digging into it? We are seeing this error on our x64/arm64 clang builders as well:

https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-arm64/b8742774943333305809/overview

@dyung
Copy link
Collaborator

dyung commented Jul 10, 2024

If someone has a reproducible way to get it, and can propose a fix we would be eternally grateful!
We have yet to see it on our local builds.

My bot is a fairly standard Ubuntu 20.04 x86_64 machine. Do you have a similarly setup machine? Running the cmake and build commands might work to try and reproduce the issue.

Taking a quick look at my bot, trying to run the unittest binary immediately results in a segfault:

~/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/lib/rtsan/tests$ ./Rtsan-x86_64-NoInstTest 
Segmentation fault (core dumped) 

This seems to explain why we see this output. From llvm/utils/lit/lit/formats/googletest.py, there is this comment on line 126:

                    # This doesn't look like a valid gtest file.  This can
                    # have a number of causes, none of them good.  For
                    # instance, we could have created a broken executable.
                    # Alternatively, someone has cruft in their test
                    # directory.  If we don't return a test here, then no
                    # failures will get reported, so return a dummy test name
                    # so that the failure is reported later.

@glandium
Copy link
Contributor

I haven't actually bisected, but I suppose this caused this new bustage when building compiler-rt for Windows targets:

CMake Error at cmake/Modules/AddCompilerRT.cmake:357 (add_library):
  Error evaluating generator expression:

    $<TARGET_OBJECTS:RTLSanCommon.x86_64>

  Objects of target "RTLSanCommon.x86_64" referenced but no such target
  exists.
Call Stack (most recent call first):
  lib/asan/CMakeLists.txt:225 (add_compiler_rt_runtime)

chapuni added a commit that referenced this pull request Jul 11, 2024
Some failures in Rtsan-aarch64-NoInstTest have been reported.

This reverts commit ed17431.
This reverts commit b81fcd0.
@vitalybuka
Copy link
Collaborator

Sorry, RTLSanCommon is mine and unrelated. Should be fixed unready

@vitalybuka
Copy link
Collaborator

@chapuni Thanks for the revert!
@cjappl has no committer access and I didn't pay attention to the thread, fixing lsan related issues.

@chapuni
Copy link
Contributor

chapuni commented Jul 11, 2024

Reverted.

Could anyone show me the command line of linking the unittest with -v -Wl,-v in a succeeded case?

@davidtrevelyan
Copy link
Contributor

Reverted.

Could anyone show me the command line of linking the unittest with -v -Wl,-v in a succeeded case?

Many thanks @chapuni for your help on this. My link command was generated by cmake and is as follows (I added the extra flags you suggested manually at the end):

cd /host/build_ubuntu/projects/compiler-rt/lib/rtsan/tests && ../../../../../bin/clang++ RtsanNoInstTestObjects.rtsan_preinit.cpp.aarch64.o RtsanNoInstTestObjects.rtsan_test_context.cpp.aarch64.o RtsanNoInstTestObjects.rtsan_test_main.cpp.aarch64.o RtsanNoInstTestObjects.gtest-all.cc.aarch64.o RtsanNoInstTestObjects.gmock-all.cc.aarch64.o /host/build_ubuntu/projects/compiler-rt/lib/rtsan/tests/libRTRtsanTest.aarch64.a -o /host/build_ubuntu/projects/compiler-rt/lib/rtsan/tests/./Rtsan-aarch64-NoInstTest -resource-dir=/host/build_ubuntu/./lib/../lib/clang/19 -Wl,-rpath,/host/build_ubuntu/./lib/../lib/clang/19/lib/aarch64-unknown-linux-gnu -lstdc++ -no-pie -ldl -lrt -lm -pthread -march=armv8-a -v -Wl,-v

I get the following output:

clang version 19.0.0git
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /host/build_ubuntu/bin
Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/13
Selected GCC installation: /usr/lib/gcc/aarch64-linux-gnu/13
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/usr/bin/ld" -EL -z relro --hash-style=gnu --eh-frame-hdr -m aarch64linux -dynamic-linker /lib/ld-linux-aarch64.so.1 -o /host/build_ubuntu/projects/compiler-rt/lib/rtsan/tests/./Rtsan-aarch64-NoInstTest /lib/aarch64-linux-gnu/crt1.o /lib/aarch64-linux-gnu/crti.o /usr/lib/gcc/aarch64-linux-gnu/13/crtbegin.o -L/host/build_ubuntu/./lib/../lib/clang/19/lib/aarch64-unknown-linux-gnu -L/usr/lib/gcc/aarch64-linux-gnu/13 -L/lib/aarch64-linux-gnu -L/usr/lib/aarch64-linux-gnu -L/lib -L/usr/lib RtsanNoInstTestObjects.rtsan_preinit.cpp.aarch64.o RtsanNoInstTestObjects.rtsan_test_context.cpp.aarch64.o RtsanNoInstTestObjects.rtsan_test_main.cpp.aarch64.o RtsanNoInstTestObjects.gtest-all.cc.aarch64.o RtsanNoInstTestObjects.gmock-all.cc.aarch64.o /host/build_ubuntu/projects/compiler-rt/lib/rtsan/tests/libRTRtsanTest.aarch64.a -rpath /host/build_ubuntu/./lib/../lib/clang/19/lib/aarch64-unknown-linux-gnu -lstdc++ -ldl -lrt -lm -v -lstdc++ -lm -lgcc_s -lgcc -lpthread -lc -lgcc_s -lgcc /usr/lib/gcc/aarch64-linux-gnu/13/crtend.o /lib/aarch64-linux-gnu/crtn.o
GNU ld (GNU Binutils for Ubuntu) 2.42

and the Rtsan-aarch64-NoInstTest executable runs (and passes) just fine. I'm using an ubuntu:latest image, with the default apt installations of cmake and build-essential packages, on an arm64 Darwin host machine. I'm going to investigate soon whether I can reproduce the issue on an older version of ubuntu, 20.04

@davidtrevelyan
Copy link
Contributor

A quick update:

  1. I can now reproduce the segfault running Rtsan-aarch64-NoInstTest by using an ubuntu:20.04 image with cmake 3.24, but I cannot reproduce the segfault on ubuntu:latest.
  2. Trapping the segfault in the debugger, I can see that dlsym is segfaulting during the first call to __interception::InterceptFunction, which is triggered by the .preinit_array when we INTERCEPT_FUNCTION(calloc);. The arguments to dlsym appear to be OK: dlsym(RTLD_NEXT, "calloc").

Has anyone seen this sort of thing before?

Here's my backtrace for reference:

* thread #1, name = 'Rtsan-aarch64-N', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x0000fffff7dd48b4 libdl.so.2`___lldb_unnamed_symbol13$$libdl.so.2 + 236
    frame #2: 0x0000fffff7dd4290 libdl.so.2`dlsym + 104
    frame #3: 0x000000000044ff38 Rtsan-aarch64-NoInstTest`__interception::InterceptFunction(char const*, unsigned long*, unsigned long, unsigned long) at interception_linux.cpp:42:21
    frame #4: 0x000000000044ff34 Rtsan-aarch64-NoInstTest`__interception::InterceptFunction(name="calloc", ptr_to_real=0x00000000004a23c8, func=4517200, trampoline=4517200) at interception_linux.cpp:61
    frame #5: 0x000000000044f9ac Rtsan-aarch64-NoInstTest`__rtsan::InitializeInterceptors() at rtsan_interceptors.cpp:347:3
    frame #6: 0x0000fffff7fdaa3c ld-2.31.so`___lldb_unnamed_symbol57$$ld-2.31.so + 268

@davidtrevelyan
Copy link
Contributor

@cjappl and I have drafted a fix for this, see above! Seems like dlsym was trying to call calloc during the interceptor initialization. This problem is solved in the other sanitizers so our solution is essentially the same as those.

aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
Follow on to llvm#92460 (reporting old android builds failing) and llvm#98264
(powerpc failing to discover gtests).

Getting back to stability by getting back down to a very basic set of
supported arches and OS's. In the future if there is demand we can build
it back up. Especially when I understand how to test these systems, or
have people who want to do the work on them.

---------

Co-authored-by: David Trevelyan <david.trevelyan@gmail.com>
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
Some failures in Rtsan-aarch64-NoInstTest have been reported.

This reverts commit ed17431.
This reverts commit b81fcd0.
MaskRay pushed a commit that referenced this pull request Jul 22, 2024
…it to avoid segfault in dlsym (#98679)

Follows #98268 with a fix for a
segfault during preinit on `ubuntu:20.04` environments. Previously,
`rtsan` was not handling the situation where `dlsym` calls `calloc`
during the interceptors initialization, resulting in a call to a
function at a null address.

@cjappl and I took inspiration from the solution in `nsan`, but we
re-used the sanitizer internal allocator instead of our own static
buffer. This PR also re-enables the existing non-instrumented `rtsan`
tests for `x86_64` and `arm64` architectures.

---------

Co-authored-by: Chris Apple <cja-private@pm.me>
sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024
…it to avoid segfault in dlsym (llvm#98679)

Follows llvm#98268 with a fix for a
segfault during preinit on `ubuntu:20.04` environments. Previously,
`rtsan` was not handling the situation where `dlsym` calls `calloc`
during the interceptors initialization, resulting in a call to a
function at a null address.

@cjappl and I took inspiration from the solution in `nsan`, but we
re-used the sanitizer internal allocator instead of our own static
buffer. This PR also re-enables the existing non-instrumented `rtsan`
tests for `x86_64` and `arm64` architectures.

---------

Co-authored-by: Chris Apple <cja-private@pm.me>
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…it to avoid segfault in dlsym (#98679)

Summary:
Follows #98268 with a fix for a
segfault during preinit on `ubuntu:20.04` environments. Previously,
`rtsan` was not handling the situation where `dlsym` calls `calloc`
during the interceptors initialization, resulting in a call to a
function at a null address.

@cjappl and I took inspiration from the solution in `nsan`, but we
re-used the sanitizer internal allocator instead of our own static
buffer. This PR also re-enables the existing non-instrumented `rtsan`
tests for `x86_64` and `arm64` architectures.

---------

Co-authored-by: Chris Apple <cja-private@pm.me>

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251347
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants