Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: 1.21 regression in test times with -race #61852

Closed
howardjohn opened this issue Aug 8, 2023 · 11 comments
Closed

cmd/go: 1.21 regression in test times with -race #61852

howardjohn opened this issue Aug 8, 2023 · 11 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. RaceDetector

Comments

@howardjohn
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.21.0 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

Setup mostly empty project:

$ go mod init fake
$ cat <<EOF>test_test.go
package main

import "testing"

func TestNothing(t *testing.T) {
}
EOF

Now test combinations of -race and 1.20 and 1.21:

$ go1.21.0 test -c -race .; time ./fake.test
real    1.009
$ go1.21.0 test -c .; time ./fake.test
real    0.004
$ go1.20 test -c -race .; time ./fake.test
real    0.022
$ go1.20 test -c .; time ./fake.test
real    0.004

You can see 1.21 + -race hangs for exactly 1s after all tests complete

@howardjohn
Copy link
Contributor Author

go tool trace doesn't see the 1s delay

Strace:
strace.txt

Important part copied here:

2023-08-08T09:38:38.792942 [pid 1147219] futex(0xc000090148, FUTEX_WAKE_PRIVATE, 1) = 1
2023-08-08T09:38:38.792964 [pid 1147219] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
2023-08-08T09:38:38.792976 [pid 1147222] <... futex resumed>)      = 0
2023-08-08T09:38:38.793068 [pid 1147222] futex(0xc000090148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
2023-08-08T09:38:38.793102 [pid 1147219] <... nanosleep resumed>NULL) = 0
2023-08-08T09:38:39.779020 [pid 1147219] futex(0x6f4e00, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} <unfinished ...>
2023-08-08T09:38:39.779184 [pid 1147218] <... nanosleep resumed>0x7ffe82549c48) = 0
2023-08-08T09:38:39.779247 [pid 1147218] exit_group(0)             = ?
2023-08-08T09:38:39.779495 [pid 1147223] <... futex resumed>)      = ?
2023-08-08T09:38:39.779535 [pid 1147222] <... futex resumed>)      = ?
2023-08-08T09:38:39.779561 [pid 1147219] <... futex resumed>)      = ?

@seankhliao
Copy link
Member

Duplicate of #61655

@seankhliao seankhliao marked this as a duplicate of #61655 Aug 8, 2023
@seankhliao seankhliao closed this as not planned Won't fix, can't repro, duplicate, stale Aug 8, 2023
@howardjohn
Copy link
Contributor Author

I don't agree this is a duplicate nor is it resolved. #61655 was closed as a duplicate of #20364 which is from 6 years ago.

My issue is about a regression from 1.20 to 1.21. In 1.20 the test exits instantly, in 1.21 it takes 1s. This is not documented in the release notes, nor does it seem intended

@cespare
Copy link
Contributor

cespare commented Aug 8, 2023

I'm going to reopen this.

It's true that this looks like exactly the same issue as #20364. However, I tested Go releases going back several years and the last major release where I saw the 1s sleep was go 1.13.15. In go 1.14.15 and all later releases I tested before 1.21, there is no sleep. I'm not sure what changed, but the sleep has been gone long enough that I'd say this counts as a regression at this point.

I'm guessing it had to do with some race detector change this cycle but I can't locate any relevant issues/discussion with a few searches. The next step is perhaps to bisect so we can locate where it started happening. (I don't have time to do this at the moment but I can take a look later if nobody beats me to it.)

@cespare cespare reopened this Aug 8, 2023
@howardjohn
Copy link
Contributor Author

howardjohn commented Aug 8, 2023

Just a guess llvm/llvm-project@11081a6. If we hit SKIP_TEST=1 then we exit before we export atexit_sleep_ms=0.

Not sure where SKIP_TEST comes from though

edit: maybe not, I think that is setting it for the "test" itself

@howardjohn
Copy link
Contributor Author

howardjohn commented Aug 8, 2023

Actually maybe this: https://reviews.llvm.org/D112603 (llvm/llvm-project@b332134#diff-39729b589bbf9d353b9558f115735f4ec75c3f56539af5d1a2ff1c998d9fb20eR722).

flags()->atexit_sleep_ms -> flags()->atexit_sleep_ms * 1000. So before, the default was probably '1000', but it slept for 1/1000 of the time.

However, I would have thought this meant in 1.20 setting atexit_sleep_ms=1000000 would sleep for 1s but it doesn't; it seems atexit_sleep_ms does nothing at all in Go 1.20, even set to large values.

So whatever the case is, it seems more like "atexit_sleep_ms did nothing from Go 1.13 to 1.20, and Go 1.21 fixed it".

IMO a default of 0s is ideal, so explicitly setting that to 0 while keeping the fix of atexit_sleep_ms seems the best of both worlds, but I will have to defer that to maintainers.

cc @dvyukov

@seankhliao seankhliao added RaceDetector NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 8, 2023
@ianlancetaylor
Copy link
Contributor

So whatever the case is, it seems more like "atexit_sleep_ms did nothing from Go 1.13 to 1.20, and Go 1.21 fixed it".

Yes.

This was fixed, accidentally, by https://go.dev/cl/486615 for #59711. When using the race detector, when the program calls os.Exit or returns from main, the race detector calls its own finalization routine. That includes sleeping for (by default) 1000 milliseconds. However, before CL 486615, the Go runtime would quickly decide to preempt the thread, and send it a SIGURG. The signal would interrupt the sleep, but the race detector doesn't notice that, and just carries on exiting the program. After CL 486615, the Go runtime thinks that the thread is making a system call, so it doesn't try to preempt it, so the sleep completes.

So for the race detector this is arguably a bug fix. The race detector now honors the atexit_sleep_ms flag documented at https://github.com/google/sanitizers/wiki/ThreadSanitizerFlags, whereas before, for Go programs, that flag was ignored.

So this really is #20364. It's just that the sleep broke when we introduced signal-based preemption, and nobody noticed. Now it is fixed. I don't think there is anything to change in Go, so I'm closing this issue (again). Please comment if you disagree.

@ianlancetaylor ianlancetaylor closed this as not planned Won't fix, can't repro, duplicate, stale Aug 8, 2023
@howardjohn
Copy link
Contributor Author

To be honest I think this behavior is not a good default and should be treated as new behavior, even if it was caused by fixing a bug.

However that's a personal preference so if the project doesn't agree no problem - easy enough to workaround it in every project I work on.

That being said it does seem to warrant a release note on the new behavior and how to disable it?

@mdwhatcott
Copy link

mdwhatcott commented Aug 10, 2023

@ianlancetaylor - Couple thoughts:

  • I agree with @howardjohn in suggesting that some mention of this should have been made in the Go 1.21 release notes.
  • The default value of atexit_sleep_ms is 1000ms. Not having any idea how this value was selected, it seems somewhat arbitrary. Would a shorter duration, maybe 100ms, be almost as effective, and also far less noticeable to most users? (I'm not sure how to evaluate the risks/tradoffs of different values. For example, how bad is it to just use 0 as the value?)

@ianlancetaylor
Copy link
Contributor

The race detector is built on top of Thread Sanitizer, which is what implements atexit_sleep_ms. Discussions about changing that value are better directed to the Thread Sanitizer maintainers.

@cristaloleg
Copy link

The default value of atexit_sleep_ms is 1000ms. Not having any idea how this value was selected,

@mdwhatcott see https://groups.google.com/g/golang-nuts/c/HBJQaijCwGo/m/P1NNQN7wyMAJ

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. RaceDetector
Projects
None yet
Development

No branches or pull requests

7 participants