Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: decreasing heap makes performance worse #27545

LK4D4 opened this issue Sep 6, 2018 · 4 comments

runtime: decreasing heap makes performance worse #27545

LK4D4 opened this issue Sep 6, 2018 · 4 comments


Copy link

@LK4D4 LK4D4 commented Sep 6, 2018

Hello, I'm trying to mitigate #18155 - trying to remove some persistent buffers, and encountered something that I don't understand.

What version of Go are you using (go version)?

Tried Go 1.10.4 and Go 1.11

Does this issue reproduce with the latest release?


What operating system and processor architecture are you using (go env)?


What did you do?

So, I have two branches, diff between them looks like

diff --git a/go/mysql/conn.go b/go/mysql/conn.go
index 2ac69b54a..53f518d62 100644
--- a/go/mysql/conn.go
+++ b/go/mysql/conn.go
@@ -166,7 +166,7 @@ type Conn struct {
        // Call startEphemeralPacket(length) to get a buffer. If length
        // is smaller or equal than connBufferSize-4, this buffer will be used.
        // Otherwise memory will be allocated for it.
-       buffer []byte
+       //buffer []byte

        // Keep track of how and of the buffer we allocated for an
        // ephemeral packet on the read and write sides.
@@ -179,6 +179,7 @@ type Conn struct {
        // allocated temporary buffers for writes and reads respectively.
        currentEphemeralWriteBuffer *[]byte
        currentEphemeralReadBuffer  *[]byte
+       //_                           [connBufferSize]byte

 // bufPool is used to allocate and free buffers in an efficient way.
@@ -193,7 +194,7 @@ func newConn(conn net.Conn) *Conn {
                reader:   bufio.NewReaderSize(conn, connBufferSize),
                writer:   bufio.NewWriterSize(conn, connBufferSize),
                sequence: 0,
-               buffer:   make([]byte, connBufferSize),
+               //buffer:   make([]byte, connBufferSize),

That is - I'm removing unused buffer field which is instantiated with 16kb []byte slice. This makes performance worse: benchmark shows 10-20% increase in running time.
Cpu profile shows that version without buffer spends much more time in function gcAssistAlloc. And blocking profile shows that much more time spent in runtime.gcParkAssist.

I wrote a script to reproduce the problem (it's not that isolated code, sadly :():

#!/usr/bin/env bash
set -e
set -x

mkdir -p "$TMP_GOPATH/src/"
mkdir -p "$RESULT_PATH"
if [ ! -d "$REPO_PATH" ]; then
        git clone "$REPO_PATH"

go get
go get

"$TMP_GOPATH/bin/govendor" sync
git checkout origin/test_branch
go test -run=^$ -bench=BenchmarkParallelMediumQueries -benchmem -cpuprofile="$RESULT_PATH/good.cpu" -trace="$RESULT_PATH/good.trace" ./go/mysql > "$RESULT_PATH/good.txt"
git checkout origin/test_branch_no_buffer
go test -run=^$ -bench=BenchmarkParallelMediumQueries -benchmem -cpuprofile="$RESULT_PATH/bad.cpu" -trace="$RESULT_PATH/bad.trace" ./go/mysql > "$RESULT_PATH/bad.txt"

"$TMP_GOPATH/bin/benchcmp" "$RESULT_PATH/good.txt" "$RESULT_PATH/bad.txt"

Copy it somewhere, chmod +x and run. It will produce benchcmp output and CPU profiles and traces in /tmp/gc_vitess_result. govendor sync might take some time, but eventually it'll do it. You might need to run it couple of times(to warm things up) before it starts to produce consistent result.

Also, you might notice line + //_ [connBufferSize]byte in my diff - if I uncomment it, then there is no performance hit. I tried to allocate the same amount of memory in benchmark function itself with no luck.
When I disable GC - performance becomes better-same.

Also, there is another two fields in this struct - *bufio.Reader and *bufio.Writer, removal of them(pooling them from sync.Pool actually) has same consequences, removal of all three - makes benchmark 50-70% worse.
Let me know if I could provide more info. I'll try to write more isolated benchmark in the meantime.

Benchmark code

What did you expect to see?

Performance is better of the same after removing allocation.

What did you see instead?

Performance is worse after removing allocation.

@RLH @aclements I would appreciate any input on this problem.

/cc @Tahara @sougou

@andybons andybons changed the title [GC] Decreasing heap makes performance worse runtime: decreasing heap makes performance worse Sep 6, 2018
@andybons andybons added this to the Unplanned milestone Sep 6, 2018
Copy link

@aclements aclements commented Sep 6, 2018

Hi @LK4D4. Could you post the output of GODEBUG=gctrace=1 for both benchmark configurations?

I'm particularly wondering about the heap size and how it compares. If the heap is relatively small, there's a known amortization failure (#19839 and #23044) that can make GC more expensive on smaller heaps.

Copy link
Contributor Author

@LK4D4 LK4D4 commented Sep 6, 2018

Here are gctraces.
Yeah, looks like the heap of the second one is 6Mb smaller and they both quite small. I'll close this as a duplicate then.

Copy link

@aclements aclements commented Sep 6, 2018

Those look like just benchmark results and not gctraces.

But if you confirmed that the heaps are small, then, yeah, I'd say this is the GC amortization failure.

Copy link
Contributor Author

@LK4D4 LK4D4 commented Sep 6, 2018

@aclements sorry :/ Have so many different outputs at this point.
Closing as duplicate of #19839

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.