runtime/pprof: test failures on freebsd-arm-paulzhol due to too few samples #52656

Closed
bcmills opened this issue May 2, 2022 · 6 comments
Labels
arch-arm Issues solely affecting the 32-bit arm architecture. OS-FreeBSD WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

@bcmills
Member

bcmills commented May 2, 2022

greplogs -l -e 'pprof_test.go:\d+: too few samples' --since=2022-03-10
2022-05-02T04:48:54-7a22c8a/freebsd-arm-paulzhol (TestTimeVDSO)
2022-05-02T04:45:15-d872374/freebsd-arm-paulzhol (TestCPUProfile, TestCPUProfileInlining)
2022-05-01T00:05:20-edab07d/freebsd-arm-paulzhol (TestMathBigDivide, TestCPUProfileLabel)
2022-04-30T00:14:28-11a650b/freebsd-arm-paulzhol (TestCPUProfile, TestCPUProfileMultithreaded, TestMorestack)

The last similar failures in the logs before then were from #51568, which still appears to be fixed. This builder is very slow — is it possible that it is swapping so hard that it is unable to profile a running process reliably?

attn @paulzhol; CC @prattmic @rhysh

@bcmills bcmills added OS-FreeBSD arch-arm Issues solely affecting the 32-bit arm architecture. labels May 2, 2022
@bcmills bcmills added this to the Backlog milestone May 2, 2022
@paulzhol
Member

paulzhol commented May 2, 2022

High chance it's swapping. If it's a single test, and it is just sleeping, it could be a cpufreq thing.
I've rebuilt the KVM host with HZ_1000 (was 250) and PREEMPT_VOLUNTARY (was PREEMPT_NONE), maybe it will help.

BTW, tried retrybuilds -key .gobuildkey-host-freebsd-arm-paulzhol -loghash 13036880f6517a2b6c0f759c3453aba7f5d64b6f -builder freebsd-arm-paulzhol to re-run one of the builds. I'm getting a permission error:

Restarting {Builder:freebsd-arm-paulzhol Hash:d8723745bacd1960139ea866e61377a0a75aec2f LogURL:https://build.golang.org/log/13036880f6517a2b6c0f759c3453aba7f5d64b6f}
rpc error: code = PermissionDenied desc = unexpected HTTP status code received from server: 403 (Forbidden); transport: received unexpected content-type "text/html; charset=UTF-8"

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label May 2, 2022
@bcmills
Member Author

bcmills commented May 2, 2022

Thanks. We can let the builder sit for a while and then recheck whether this failure mode is still occurring.

(FWIW, it isn't necessary to run retrybuilds for this kind of failure, since these days have already been triaged and likely won't be examined again.)

@paulzhol
Member

I still see these failures after removing much of the memory pressure caused by the tmpfs mount #50540 (comment).
I do see concurrent tests running alongside the runtime/pprof one. Empirically, I think I saw cmd/compile/internal/test compiling away. So basically 2 cpuHogger threads compete with at least two compile threads for CPU time from the kernel.
Maybe there is a way to isolate this test from the others?

@bcmills
Member Author

bcmills commented May 16, 2022

It looks like the test function is explicitly retrying with longer and longer attempts to try to coax the scheduler into actually running the threads:
https://cs.opensource.google/go/go/+/master:src/runtime/pprof/pprof_test.go;l=440-474;drc=335569b59804f8d14bdb9c7ee2e8b0c2268226ae

But it also appears that maxDuration is hard-coded to 5s. Seems like it ought to just use t.Deadline instead.

@gopherbot

Change https://go.dev/cl/406614 mentions this issue: runtime/pprof: eliminate arbitrary deadline in testCPUProfile

@bcmills
Member Author

bcmills commented May 16, 2022

I don't think these tests need to be isolated — the properties they're checking for appear to be reasonable, even under load.

However, I do think it's reasonable to eliminate the apparently-arbitrary limit on the profile duration.
