os/signal: fixedbugs/issue21576.go flake on linux-ppc64le-power9osu builder #34836
Comments
One of my colleagues observed the same failure on an arm64 platform. We guess it might be because too many test cases accessing /tmp at the same time slow down the build phase. I'm also a little confused about why a larger timeout isn't set.
Yes, I think a timeout could explain that problem. The situation would be clearer if the test printed the error value. I think it would be fine to increase the timeout. Since we expect the test to pass, we can set the timeout to be much larger. Yes, it will take longer to fail, but the length of time it takes for a test to fail is not all that interesting. Let's make it at least one minute.
We print out the output from .CombinedOutput() which I thought would capture the actual error, but sure we can add the error value.
Sure, I've sent CL https://go-review.googlesource.com/c/go/+/200519 updating the timeout to 1 minute. To sidetrack a little, linux-ppc64le-power9osu has been failing with mysterious errors such as in #34658 where the source code in the test doesn't close the connection but somehow we are getting back
Change https://golang.org/cl/200519 mentions this issue:
@shawn-xdji can you please try that patch, keeping it at 5 * time.Second but just printing out the error value, if possible? Thanks.
Well, on my arm device, I got an error value when running all.bash, as shown below. The error value is "signal: killed". Does this error value mean that the goroutine was killed before the deadlock check? FAIL fixedbugs/issue21576.go 12.680s This error occurs frequently on an arm device with 200+ cores, especially when the device is handling a lot of jobs. On an arm device with 128 cores, this error almost never happens.
@odeke-em Printing the error value helps because in this case there is no output. @dianhong01 Thanks, an error value of "signal: killed" confirms that the test timed out. (When the timeout expires, the parent will send
Cool, thanks! I've added the error value as well and increased the timeout to 1 minute. @shawn-xdji please don't hesitate to ping here again in case of anything.
@ianlancetaylor, Hi Ian, if I set the context timeout to, say, 5 seconds and let the child 'go run' sleep for 10 seconds, the child process is killed after 10 seconds. Is that because the parent sends SIGKILL at T+10s, or because the child does not respond to any signal during that period? Thanks. @odeke-em, @dianhong01 is the colleague I mentioned. Attaching a case revised from the one provided by @odeke-em, with the blocking channel removed to bypass the deadlock.
You can tell by looking at the exit status. An exit status of "signal: killed" means that it was killed by the parent process. I'm not sure what exit status will be used for a timeout, but it's definitely not "signal: killed". |
From the linux-ppc64le-power9osu builder (https://build.golang.org/log/8415a2011de6d02d89ecb3587ca3ac2735d86b3d):
In #21576 (comment), @odeke-em notes:
Does that explain this failure?
Would it be possible to code the test in a way that does not depend on timing? (What is the harm in having a higher timeout, assuming that the test has not regressed?)