runtime: thread sanitizer failing on ppc64le #35547

Open
randall77 opened this issue Nov 12, 2019 · 9 comments

@randall77 randall77 commented Nov 12, 2019

Thread sanitizer is failing on the linux-ppc64le-buildlet and linux-ppc64le-power9osu builders.

The test is from the x/net repository, but I'm not sure that matters much.

From https://build.golang.org/log/7ba95758a5a8b138b85ba4a9f87a543c3177e884 :

--- FAIL: TestRace (16.94s)
    --- FAIL: TestRace/test_0 (16.04s)
        socket_test.go:363: race not detected for test 0: err:<nil> out:FATAL: ThreadSanitizer CHECK failed: ./gotsan.cpp:14108 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff)
    --- FAIL: TestRace/test_1 (0.91s)
        socket_test.go:363: race not detected for test 1: err:<nil> out:FATAL: ThreadSanitizer CHECK failed: ./gotsan.cpp:14108 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff)
FAIL
FAIL	golang.org/x/net/internal/socket	16.953s

Some internal check in tsan is failing, causing this test to fail.
We claim to support the race detector in this config, so this test should pass.

@laboger laboger commented Nov 13, 2019

Once again, I am not able to reproduce this failure. Maybe I'm not building and running this the way the buildlet does.

I checked out the latest golang.org/x/net and rebuilt it with upstream Go. The tests all passed on my debian buster power9 machine.

I see the test says it is testing in module mode. Not exactly sure what that means. Is there documentation on how to run all the golang.org/x repos? I did copy the old go.mod file it was using and set up GOMOD to point to it and it still passed.

@randall77 randall77 commented Nov 13, 2019

I hate unreproducible bugs :(

Try checking out Go and x/net from the two hashes listed in that error log.
Then maybe try using a gomote instead of your machine.

I don't think this failure would have anything to do with modules.

Maybe just try writing a helloworld race detector example and see if it works?
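
A minimal sketch of what such a helloworld check could look like (the function name `racyCounter` is made up for illustration). Run with `go run -race`, the unsynchronized increment should normally trigger a `WARNING: DATA RACE` report when the detector is working.

```go
package main

import (
	"fmt"
	"sync"
)

// racyCounter starts n goroutines that all increment the same variable
// without any synchronization. The WaitGroup only waits for completion;
// it does not order the increments relative to each other.
func racyCounter(n int) int {
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter++ // unsynchronized read-modify-write: the intended data race
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println("counter:", racyCounter(2))
}
```

If the tsan runtime itself aborts at startup, as in the log above, the program would presumably print the CHECK failure instead of a race report.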

@laboger laboger commented Nov 13, 2019

I ran all the race tests I know about in golang. They all passed on my power8 and power9.

In runtime/race:
go test -race


Passed 348 of 348 tests (100.00%, 0+, 0-)
0 expected failures (0 has not fail)
--- PASS: TestRace (6.42s)
=== RUN   TestIssue8102
--- PASS: TestIssue8102 (0.00s)
=== RUN   TestIssue9137
--- PASS: TestIssue9137 (0.00s)
=== RUN   TestNonGoMemory
--- PASS: TestNonGoMemory (0.00s)
=== RUN   TestRandomScheduling
--- PASS: TestRandomScheduling (0.01s)
PASS
ok  	runtime/race	15.214s

In golang.org/x/net at the commit from the failure:

:~/gocode/src/golang.org/x/net/internal/socket$ go test -test.v
=== RUN   TestSocket
=== RUN   TestSocket/Option
--- PASS: TestSocket (0.01s)
    --- PASS: TestSocket/Option (0.01s)
=== RUN   TestControlMessage
--- PASS: TestControlMessage (0.00s)
=== RUN   TestUDP
=== RUN   TestUDP/Message
=== RUN   TestUDP/Messages
--- PASS: TestUDP (0.00s)
    --- PASS: TestUDP/Message (0.00s)
    --- PASS: TestUDP/Messages (0.00s)
=== RUN   TestRace
=== RUN   TestRace/test_0
=== RUN   TestRace/test_1
--- PASS: TestRace (1.98s)
    --- PASS: TestRace/test_0 (0.98s)
    --- PASS: TestRace/test_1 (0.99s)
PASS
ok  	golang.org/x/net/internal/socket	1.997s
boger@yanny6:~/gocode/src/golang.org/x/net/internal/socket$ uname -a
Linux yanny6 4.19.0-5-powerpc64le #1 SMP Debian 4.19.37-5+deb10u1 (2019-07-19) ppc64le GNU/Linux

go tool dist test race

##### Testing race detector
ok  	runtime/race	7.897s
ok  	flag	0.025s
ok  	net	0.104s
ok  	os	0.115s
ok  	os/exec	0.083s
ok  	encoding/gob	0.036s
ok  	flag	0.028s
ok  	os/exec	0.077s

ALL TESTS PASSED (some were excluded)

I do not have gomote access.
Isn't the error message about personality related to some kind of system or kernel setting?

@laboger laboger commented Nov 13, 2019

If I look at the code from LLVM from the error output I see this:

#elif SANITIZER_PPC64V2
  // Disable ASLR for Linux PPC64LE.
  int old_personality = personality(0xffffffff);
  if (old_personality != -1 && (old_personality & ADDR_NO_RANDOMIZE) == 0) {
    VReport(1, "WARNING: Program is being run with address space layout "
               "randomization (ASLR) enabled which prevents the thread and "
               "memory sanitizers from working on powerpc64le.\n"
               "ASLR will be disabled and the program re-executed.\n");
    CHECK_NE(personality(old_personality | ADDR_NO_RANDOMIZE), -1);
    ReExec();
  }

So I believe that error message is saying the test is supposed to be able to set the personality to ADDR_NO_RANDOMIZE but that failed.
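
To make the condition easier to follow, here is a rough Go transliteration of just the decision logic in that snippet. It is only a sketch: the names `needsReExec` and `withASLRDisabled` are invented here, and the constant value is the one Linux defines for `ADDR_NO_RANDOMIZE` in `<sys/personality.h>`.

```go
package main

import "fmt"

// addrNoRandomize mirrors ADDR_NO_RANDOMIZE from <sys/personality.h> on Linux.
const addrNoRandomize = 0x0040000

// needsReExec mirrors the if-condition in the LLVM snippet: the query
// personality(0xffffffff) succeeded (result is not -1) and the
// ADDR_NO_RANDOMIZE bit is not set yet.
func needsReExec(oldPersonality int) bool {
	return oldPersonality != -1 && oldPersonality&addrNoRandomize == 0
}

// withASLRDisabled returns the value the snippet passes to the second
// personality() call, the one whose result is checked by CHECK_NE.
func withASLRDisabled(oldPersonality int) int {
	return oldPersonality | addrNoRandomize
}

func main() {
	for _, old := range []int{0x0, addrNoRandomize, -1} {
		fmt.Printf("old=%#x needsReExec=%v\n", old, needsReExec(old))
	}
}
```

The CHECK in the log fires on the second call, the one that sets the flag, not on the initial query.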

@ceseo ceseo commented Nov 13, 2019

Can someone disable ASLR in the ppc64le builders (i.e. echo 0 | sudo tee /proc/sys/kernel/randomize_va_space) and see if it works? I can't reproduce on VMs or bare metal systems and I don't have a Docker instance available at the moment.
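
For anyone who wants to check the current setting before changing it, a small sketch that reads the same sysctl file and interprets the value (the helper name `describeASLR` is made up; the meanings of 0, 1 and 2 follow the kernel's sysctl documentation):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// describeASLR interprets the contents of /proc/sys/kernel/randomize_va_space.
func describeASLR(raw string) string {
	switch strings.TrimSpace(raw) {
	case "0":
		return "disabled"
	case "1":
		return "partial (stack, mmap base and VDSO randomized)"
	case "2":
		return "full (heap/brk randomized as well)"
	default:
		return "unknown"
	}
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/randomize_va_space")
	if err != nil {
		fmt.Println("could not read ASLR setting:", err)
		return
	}
	fmt.Println("ASLR:", describeASLR(string(data)))
}
```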

@randall77 randall77 commented Nov 13, 2019

Interesting. So that check is failing, so it fails to re-exec itself (which presumably would work?). I don't understand what that check is trying to achieve - is it just that -1 is a special case meaning "failure"? Anyone have the docs to the personality function?

Instead of fixing the reexec in the race detector, we could set ADDR_NO_RANDOMIZE on the binary that go build -race (and go run -race) generates.

@laboger laboger commented Nov 14, 2019

Look up the documentation for the Linux personality syscall (you can google it). In this case ADDR_NO_RANDOMIZE is not a setting on the binary but a flag in the personality of the process. If the syscall returns -1, the kernel could not change the personality, and errno (EINVAL, for example) says why.

To me it looks like it continues to try to ReExec regardless of the syscall result, even though the syscall fails. But the comments say the race detector doesn't work in ASLR mode (due to unexpected address ranges), and this test is supposed to detect a race condition but does not, so it fails.

I don't know why it would fail to change the personality in this case -- it must work in some cases because this code is executed whenever using race on ppc64le and distros have had ASLR enabled for a while. Maybe if we had the errno that would help.

@laboger laboger commented Nov 18, 2019

OK, I think I found the issue. When a Docker container is set up, there is a default seccomp security profile with a list of syscalls that are not allowed. The personality syscall is on that list. There is a way to define the container so it allows this syscall.
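
A quick way to see whether a process is actually running under a seccomp filter is the `Seccomp:` field in `/proc/self/status`. The sketch below parses that field; the function name `seccompMode` is invented here, and the probe only reports whether a filter is active, not which syscalls it blocks.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// seccompMode extracts the "Seccomp:" field from the text of a
// /proc/<pid>/status file and names the mode (0, 1 or 2).
func seccompMode(status string) string {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Match the exact key so that "Seccomp_filters:" is ignored.
		if len(fields) == 2 && fields[0] == "Seccomp:" {
			switch fields[1] {
			case "0":
				return "disabled"
			case "1":
				return "strict"
			case "2":
				return "filter"
			}
		}
	}
	return "unknown"
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("could not read status:", err)
		return
	}
	fmt.Println("seccomp mode:", seccompMode(string(data)))
}
```

If this reports "filter" inside the container, the profile can be replaced or relaxed with Docker's `--security-opt seccomp=...` option.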

@laboger laboger commented Dec 4, 2019

Can I get gomote access so I can try to understand this problem a bit more? I'm curious why some race tests work fine (I think they must all have to do the personality syscall) but this one does not. Also, I am still not able to reproduce it.
