testing: why AllocsPerRun tests fail when GOMAXPROCS>1 #5000
It is flaky with the old scheduler as well; here is what I've just observed on revision 15951 from Wed Feb 27 11:59:36 2013 -0800:

```
$ ./fmt.test -test.v -test.run=Mallocs -test.cpu=1,2,3,4,5,6,7,8,9,10
=== RUN TestCountMallocs
--- PASS: TestCountMallocs (0.00 seconds)
=== RUN TestCountMallocs-2
--- PASS: TestCountMallocs-2 (0.00 seconds)
=== RUN TestCountMallocs-3
--- PASS: TestCountMallocs-3 (0.00 seconds)
=== RUN TestCountMallocs-4
--- PASS: TestCountMallocs-4 (0.00 seconds)
=== RUN TestCountMallocs-5
--- PASS: TestCountMallocs-5 (0.00 seconds)
=== RUN TestCountMallocs-6
--- PASS: TestCountMallocs-6 (0.00 seconds)
=== RUN TestCountMallocs-7
--- FAIL: TestCountMallocs-7 (0.00 seconds)
	fmt_test.go:607: Fprintf(buf, "%s"): got 1.19 allocs, want <=1
=== RUN TestCountMallocs-8
--- PASS: TestCountMallocs-8 (0.00 seconds)
=== RUN TestCountMallocs-9
--- PASS: TestCountMallocs-9 (0.00 seconds)
=== RUN TestCountMallocs-10
--- PASS: TestCountMallocs-10 (0.00 seconds)
FAIL
```
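For context, the failing check looks roughly like the following. This is a minimal sketch of a malloc-count test built on testing.AllocsPerRun, not the exact fmt test (names and constants are illustrative): a single GC-driven allocation, such as one from settype_flush, landing inside the measured runs is enough to push an average of 1 up to something like 1.19.

```go
package allocs_test

import (
	"bytes"
	"fmt"
	"testing"
)

// TestFprintfAllocs is an illustrative malloc-count test:
// testing.AllocsPerRun reports the average number of heap allocations
// per call of f, and the test fails when that average exceeds the
// expected bound.
func TestFprintfAllocs(t *testing.T) {
	var buf bytes.Buffer
	avg := testing.AllocsPerRun(100, func() {
		buf.Reset()
		fmt.Fprintf(&buf, "%s", "hello")
	})
	if avg > 1 {
		t.Errorf("Fprintf(buf, %q): got %v allocs, want <=1", "%s", avg)
	}
}
```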
Here it is:

```
#0:  0x1c340 runtime.mallocgc            src/pkg/runtime/zmalloc_darwin_amd64.c:36
#1:  0x1d56b runtime.settype_flush       src/pkg/runtime/zmalloc_darwin_amd64.c:449
#2:  0xe0ee  gc                          src/pkg/runtime/mgc0.c:1822
#3:  0xe01e  runtime.gc                  src/pkg/runtime/mgc0.c:1790
#4:  0x1c51c runtime.mallocgc            src/pkg/runtime/zmalloc_darwin_amd64.c:116
#5:  0x1f34b gostringsize                src/pkg/runtime/zstring_darwin_amd64.c:47
#6:  0x1f95f runtime.slicebytetostring   src/pkg/runtime/zstring_darwin_amd64.c:300
#7:  0x7c390 fmt.Sprintf                 src/pkg/fmt/print.go:229
#8:  0x3fa0f fmt_test.func·006           src/pkg/fmt/fmt_test.go:597
#9:  0x2a110 testing.AllocsPerRun        src/pkg/testing/allocs.go:32
#10: 0x323ca fmt_test.TestCountMallocs   src/pkg/fmt/fmt_test.go:609
#11: 0x2d02a testing.tRunner             src/pkg/testing/testing.go:347
#12: 0x14b00 runtime.goexit              src/pkg/runtime/proc.c:1217
```
This issue is simple to solve if we omit efficiency considerations: the fix would be to pass "true" as the argument to settype_flush() in mgc0.c. However, I don't recommend this fix because it wastes memory and increases the number of system calls. The efficient solution appears to be a new memory allocator for internal use by Go's runtime, with support for explicit deallocation only (like malloc+free in C, though with simpler code). The allocated memory would be completely separate from garbage-collected memory. Work on this allocator cannot start before Go 1.1 is released.
> This issue is simple to solve if we omit efficiency considerations: the fix would be to pass "true" as the argument to settype_flush() in mgc0.c.

Yeah, it does not look like a good solution. I have a very dirty "solution": https://golang.org/cl/7625044/diff/5001/src/pkg/testing/allocs.go

It fixes

```
./fmt.test -test.v -test.run=Mallocs -test.cpu=1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
```

on my machine. I can think of other hacky solutions:
1. Provide testing.runtime_allocTestStart/End(), which would somehow offset the stats (use system memory in settype_flush(), explicitly offset the stats, etc.).
2. Just explicitly offset the mem stats in settype_flush(), i.e. pretend that it does not allocate.
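For illustration, here is a minimal sketch of what a testing-side workaround along these lines could look like, assuming that forcing GOMAXPROCS=1 for the duration of the measurement keeps GC-driven allocations from other procs (such as settype_flush) out of the counts. This is only a sketch, not necessarily what the CL above does:

```go
package allocstest

import "runtime"

// allocsPerRunPinned is an illustrative variant of testing.AllocsPerRun
// that forces GOMAXPROCS=1 while measuring, so allocations performed on
// behalf of other procs are less likely to be charged to f.
func allocsPerRunPinned(runs int, f func()) float64 {
	// Restore the previous GOMAXPROCS value when we return.
	defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))

	// Warm up once so lazy initialization is not counted.
	f()

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < runs; i++ {
		f()
	}
	runtime.ReadMemStats(&after)

	return float64(after.Mallocs-before.Mallocs) / float64(runs)
}
```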
What if we add a field to MemStats that counts the number of allocations made for internal runtime reasons? Then we let testing.AllocsPerRun ignore that, perhaps with a parameter. AllocsPerRun was not in Go 1, so we can still change it. It's ugly, but it could work reasonably well if we are careful about what we consider an internal runtime reason -- e.g., we would have to be sure that it doesn't correlate with something the program is doing.
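A rough sketch of that idea from the testing side. InternalMallocs and the readMemStats callback below are hypothetical stand-ins for a MemStats field the runtime does not actually have:

```go
package allocstest

// memStatsWithInternal is a hypothetical extension of runtime.MemStats:
// the extra InternalMallocs counter stands for allocations the runtime
// performed for its own bookkeeping (e.g. settype_flush buffers).
// No such field exists in the real MemStats.
type memStatsWithInternal struct {
	Mallocs         uint64 // all heap allocations, as in runtime.MemStats today
	InternalMallocs uint64 // hypothetical: allocations made by the runtime itself
}

// allocsPerRunIgnoringRuntime shows how an AllocsPerRun-style helper could
// subtract the runtime's own allocations from the per-run average, given a
// readMemStats callback that fills in the hypothetical counter above.
func allocsPerRunIgnoringRuntime(runs int, f func(), readMemStats func(*memStatsWithInternal)) float64 {
	var before, after memStatsWithInternal
	readMemStats(&before)
	for i := 0; i < runs; i++ {
		f()
	}
	readMemStats(&after)

	user := (after.Mallocs - before.Mallocs) - (after.InternalMallocs - before.InternalMallocs)
	return float64(user) / float64(runs)
}
```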
I have some ideas on how to eliminate settype_flush() entirely (as part of making malloc/GC faster), so perhaps it is easier/simpler to "undo" memstats in settype_flush() after mallocgc(), pretending that it does not allocate. (But we must be careful to redo them again when the memory is freed, so as not to cause a persistent skew.)
> 1. By eliminating settype_flush() entirely do you mean to set the typeinfo in function mallocgc()?

Yes, but I don't have the exact plan.

> 2. I do not see how to implement the redo operation when the memory is freed.

Won't settype_sysfree() do? I mean we must know when the span is freed/reused.
Hit some more flakiness in the http tests that is probably this issue: https://golang.org/issue/5005?c=15
Another flake: http://build.golang.org/log/72e239cad5bccb88912758d623659d9231c23c4b

I think we need to offset malloc stats in settype_flush() (pretending that it does not allocate) and revert it back in settype_sysfree() for now.
My input, copied from the mailing list:

AllocsPerRun is inherently flaky. There could be background stuff going on we'll never control. Perhaps we should instead write our tests not to be so sensitive (and build-breaking). I suggest all AllocsPerRun tests should be skipped if -test.short is true, so they are evaluated by the test owners but not in every build configuration in the world.
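In a test, that would amount to a guard like the following (an illustrative sketch; TestCountMallocsGuarded is not the real test name):

```go
package allocs_test

import "testing"

// Illustrative guard: the malloc-count checks only run when the test owner
// runs the full (non -short) suite, so builders running with -short are not
// broken by background runtime allocations.
func TestCountMallocsGuarded(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping malloc count test in short mode")
	}
	// ... AllocsPerRun checks as in the earlier sketch ...
}
```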
The problem comes in with automated builds that regularly run all the tests. Flakiness causes noise that constantly needs to be triaged. Should automated builds not run the long tests? Is there some third class of tests, -test.flaky=true? In my simplistic world view, tests should either pass with any and all values of GOMAXPROCS, GOGC, GOGCTRACE, test.cpu, etc., or control for these variables, or get fixed, or get deleted.
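The testing package has no such flag; a package could approximate a third class of tests with its own opt-in flag. A hypothetical, purely illustrative sketch:

```go
package allocs_test

import (
	"flag"
	"testing"
)

// flaky is a hypothetical package-level test flag (not part of the testing
// package): running `go test -flaky` would opt in to tests known to be
// sensitive to GOMAXPROCS, GOGC, background GC activity, and so on.
var flaky = flag.Bool("flaky", false, "run tests known to be flaky")

func TestCountMallocsFlaky(t *testing.T) {
	if !*flaky {
		t.Skip("skipping flaky malloc count test; run with -flaky to enable")
	}
	// ... AllocsPerRun checks ...
}
```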
This issue was updated by revision f578726.

Should reduce the flakiness a little. Malloc counting is important to general testing but not to the build dashboard, which uses -short.

R=golang-dev, dsymonds
CC=golang-dev
https://golang.org/cl/12866047
This issue was closed.