
runtime: it looks like many slice operations become 20%+ slower on tip #49744

Open
go101 opened this issue Nov 23, 2021 · 8 comments

Comments

@go101 commented Nov 23, 2021

What version of Go are you using (go version)?

$ go version
go version devel go1.18-91abe4be0e Sat Nov 20 08:47:36 2021 +0000 linux/amd64

What did you do?

https://play.golang.org/p/HM5X5crNSr4

What did you expect to see?

Similar performance.

What did you see instead?

Tip is much slower for both of the benchmarks.

The following is the benchstat results:

_CreateOnOneLargeBlock-4      4.78µs ± 4%    7.50µs ± 8%  +56.85%  (p=0.000 n=10+10)
_CreateOnManySmallBlocks-4    17.5µs ±11%    21.9µs ± 1%  +25.13%  (p=0.000 n=10+10)

This can also be verified by running `go test -bench=. -benchmem slice_test.go` in the src/runtime directory. Most benchmarks take more time than with v1.17.3, except for a few such as BenchmarkAppend*/*.

@ianlancetaylor (Contributor) commented Nov 23, 2021

@zikaeroh (Contributor) commented Nov 23, 2021

I was curious and bisected with the given test cases. I found the following:

  • Setting GOGC=off eliminates the difference between 1.17 and 1.18.
  • The beginning of the regression (about half of the perf loss) starts at CL 358674, when the pacer redesign was enabled. Setting GOEXPERIMENT=nopacerredesign recovers this loss.
  • Keeping GOEXPERIMENT=nopacerredesign and re-bisecting shows the other half of the regression coming from CL 353975.


@ALTree (Member) commented Nov 23, 2021

Thanks for bisecting. cc @mknyszek too, then.


@mknyszek (Contributor) commented Nov 23, 2021

I suspect this is just the new minimum heap size (512 KiB). Microbenchmarks suffer because they have a small heap, so the runtime GCs more often to save memory. Instead of GOEXPERIMENT=nopacerredesign, try modifying defaultHeapMinimum at https://cs.opensource.google/go/go/+/master:src/runtime/mgcpacer.go;l=56?q=defaultHeapMinimum&ss=go%2Fgo and see whether performance recovers.

We're planning to roll back just the new minimum heap size for 1.18 and try again later.


@zikaeroh (Contributor) commented Nov 23, 2021

I can confirm that restoring defaultHeapMinimum to 4 << 20 eliminates the slowdown in my testing.


@mknyszek (Contributor) commented Nov 23, 2021

Cool. I think this is a non-issue for this release, but it has helped find a number of finalizer bugs and the like in our tests (and elsewhere), so I'm leaving it in for just a little longer.

I'll leave this open until we roll back the heap minimum.


@mpx (Contributor) commented Nov 24, 2021

Random thought: if/when PGO (profile-guided optimization) lands, perhaps it could be used to determine a reasonable minimum heap? This could be useful for short-lived utilities.


@aclements (Member) commented Nov 24, 2021

@mpx, that's an interesting idea. I think first we're going to try to see if we can just make the GC scale down better. My hypothesis is that the smaller minimum heap size is primarily a problem because of the high constant overheads of starting a GC (it used to be the "small heap amortization problem"—we didn't account properly for all sources of GC work and non-heap work could dominate for small heaps—but the new pacer fixes that). There's probably always going to be some amount of cost for things that genuinely run on very small heaps (after all, a smaller minimum heap trades CPU for RAM in that regime), but my instinct is that it shouldn't be nearly as high as what we're seeing. If we can fix it that way, it may simply obviate the issue.

