Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: Excessive allocations when using the Token() API #56299

Open
dop251 opened this issue Oct 18, 2022 · 7 comments · May be fixed by #56307
Open

encoding/json: Excessive allocations when using the Token() API #56299

dop251 opened this issue Oct 18, 2022 · 7 comments · May be fixed by #56307
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@dop251
Copy link

dop251 commented Oct 18, 2022

What version of Go are you using (go version)?

$ go version
go version
go version go1.19 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="auto"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/dop/Library/Caches/go-build"
GOENV="/Users/dop/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/dop/Documents/workspace/pkg/mod"
GOOS="darwin"
GOPATH="/Users/dop/Documents/workspace"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.19/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.19/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.19"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/dop/Documents/workspace/go/test/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/26/dlsc0mm136n0sdrsvb_p90y80000gn/T/go-build87975269=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I have been trying to investigate the difference in performance between parsing the same schemaless JSON using Decoder.Decode() and using Decoder.Token() with a simple handler. The code can be found here: https://go.dev/play/p/_QvFQQSTB0R

What did you expect to see?

A comparable performance.

What did you see instead?

name          old time/op    new time/op    delta
DecodeJson-4    21.7µs ± 2%     5.5µs ± 2%  -74.66%  (p=0.000 n=10+10)

name          old alloc/op   new alloc/op   delta
DecodeJson-4    6.09kB ± 0%    2.29kB ± 0%  -62.40%  (p=0.000 n=9+9)

name          old allocs/op  new allocs/op  delta
DecodeJson-4       211 ± 0%        33 ± 0%  -84.36%  (p=0.000 n=10+10)

"Old" here is the version which uses Token()

What is especially striking is the difference in the number of allocations and the allocation size. The only allocations that I make are for maps and slices, but Decode() does them too. I thought something was off and decided to investigate.

So far I have found one reason:

  • Token() uses readValue() to read primitive values and map keys. readValue() resets the state of the scanner at the beginning (
    dec.scan.reset()
    )
  • After the value is read the scanner ends up in stateEndValue(), which thinks it has read the top-level value (because parseState is empty at this point):
    if n == 0 {
  • It then calls stateEndTop() which checks the current character for being space (which it won't be because we're actually in the middle of a value) and then goes ahead an allocates and sets scanner.err
    s.error(c, "after top-level value")

This error (and therefore the allocation) is completely unnecessary because the error gets dropped when Token() calls readValue() again.

I tried to fix it by introducing a new flag in the scanner called 'inStream', then setting the flag at the beginning of Token() (and resetting it on defer) and then checking the flag in stateEndValue() to avoid allocating the error. It's not the most elegant solution, but it appears to be working:

name          old time/op    new time/op    delta
DecodeJson-4    21.7µs ± 2%    12.4µs ± 2%  -42.83%  (p=0.000 n=10+9)

name          old alloc/op   new alloc/op   delta
DecodeJson-4    6.09kB ± 0%    3.68kB ± 0%  -39.54%  (p=0.000 n=9+9)

name          old allocs/op  new allocs/op  delta
DecodeJson-4       211 ± 0%        76 ± 0%  -63.98%  (p=0.000 n=10+10)

Note it's still somewhat off the Decode() performance but it's a significant improvement. I can submit a PR with my changes.

@dr2chase
Copy link
Contributor

dr2chase commented Oct 18, 2022

Thanks for looking into this. I'm not sure if submitted PRs to Go before or not, if not, please have a look at https://go.dev/doc/contribute#sending_a_change_github. (Gopherbot will catch any missed steps.)

@dsnet and/or @mvdan , can you have a look at this?

@dr2chase dr2chase added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 18, 2022
@mvdan
Copy link
Member

mvdan commented Oct 18, 2022

The current Token API is slow due to its nature. See #40128 for a proposal to change the API.

That said, I don't oppose improvements to the existing API - as long as they don't break existing correctness guarantees. If you can send a change as described in https://go.dev/doc/contribute, we can discuss there. It's hard to judge whether or not this change is reasonable without seeing it in full.

dop251 added a commit to dop251/go that referenced this issue Oct 18, 2022
…aming mode (Token API)

When the scanner is used by the Token API it always resets the state before so that the scanner behaves as if it was parsing a top-level value,
which causes it to allocate and set the 'err' field because the following character is not a space. This error value is completely unnecessary
because it's dropped by the next invocation of readValue().

Fixes golang#56299
@gopherbot
Copy link

gopherbot commented Oct 18, 2022

Change https://go.dev/cl/443778 mentions this issue: encoding/json: reduce the number of allocations when decoding in streaming mode (Token API)

@dsnet
Copy link
Member

dsnet commented Oct 18, 2022

What is the benchmark being run? It's not one of the standard ones in the package.

@dop251
Copy link
Author

dop251 commented Oct 18, 2022

The benchmark is run on this: https://go.dev/play/p/_QvFQQSTB0R

I could add it to the PR if necessary.

@dsnet
Copy link
Member

dsnet commented Oct 18, 2022

I figured it out eventually. It would help to explicitly say that:

func BenchmarkDecodeJson(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		DecodeStd() // or DecodeWithDecoder
	}
}

that's the piece of information that was missing.

@dop251
Copy link
Author

dop251 commented Oct 18, 2022

My bad, sorry.

@seankhliao seankhliao added this to the Backlog milestone Nov 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
6 participants