encoding/json: excessive allocations when using the Token() api #56299
Comments
Thanks for looking into this. I'm not sure whether you have submitted PRs to Go before; if not, please have a look at https://go.dev/doc/contribute#sending_a_change_github. (Gopherbot will catch any missed steps.)
The current That said, I don't oppose improvements to the existing API, as long as they don't break existing correctness guarantees. If you can send a change as described in https://go.dev/doc/contribute, we can discuss there. It's hard to judge whether or not this change is reasonable without seeing it in full.
…aming mode (Token API) When the scanner is used by the Token API it always resets the state before so that the scanner behaves as if it was parsing a top-level value, which causes it to allocate and set the 'err' field because the following character is not a space. This error value is completely unnecessary because it's dropped by the next invocation of readValue(). Fixes golang#56299
Change https://go.dev/cl/443778 mentions this issue:
What is the benchmark being run? It's not one of the standard ones in the package.
The benchmark is run on this: https://go.dev/play/p/_QvFQQSTB0R I could add it to the PR if necessary.
I figured it out eventually. It would help to explicitly say that:
func BenchmarkDecodeJson(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		DecodeStd() // or DecodeWithDecoder
	}
}
that's the piece of information that was missing.
My bad, sorry.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
What did you do?
I have been trying to investigate the difference in performance between parsing the same schemaless JSON using Decoder.Decode() and using Decoder.Token() with a simple handler. The code can be found here: https://go.dev/play/p/_QvFQQSTB0R
What did you expect to see?
A comparable performance.
What did you see instead?
"Old" here is the version which uses Token()
What is especially striking is the difference in the number of allocations and the allocation size. The only allocations that I make are for maps and slices, but Decode() does them too. I thought something was off and decided to investigate.
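One rough way to look at allocation counts outside a benchmark is testing.AllocsPerRun. The sketch below is an assumption-laden illustration, not the benchmark from this issue: the sample document and the function names allocsDecode and allocsToken are made up, and the Token() loop only drains tokens without building maps or slices, so the two numbers are not a like-for-like comparison. The results also depend on the Go version.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
	"testing"
)

// sample is an arbitrary document chosen for illustration.
const sample = `{"name":"gopher","tags":["a","b"],"n":[1,2,3]}`

// allocsDecode reports average allocations for one Decoder.Decode() pass.
func allocsDecode() float64 {
	return testing.AllocsPerRun(100, func() {
		var v any
		if err := json.NewDecoder(strings.NewReader(sample)).Decode(&v); err != nil {
			panic(err)
		}
	})
}

// allocsToken reports average allocations for draining the same input
// token by token, without building any result structure.
func allocsToken() float64 {
	return testing.AllocsPerRun(100, func() {
		dec := json.NewDecoder(strings.NewReader(sample))
		for {
			_, err := dec.Token()
			if err == io.EOF {
				return
			}
			if err != nil {
				panic(err)
			}
		}
	})
}

func main() {
	fmt.Println("Decode allocs/run:", allocsDecode())
	fmt.Println("Token  allocs/run:", allocsToken())
}
```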
So far I have found one reason: Token() uses readValue() to read primitive values and map keys. readValue() resets the state of the scanner at the beginning (go/src/encoding/json/stream.go, line 90 in 7cf06f0). The scanner then reaches stateEndValue(), which thinks it has read the top-level value because parseState is empty at this point (go/src/encoding/json/scanner.go, line 281 in 7cf06f0). That hands over to stateEndTop(), which checks whether the current character is a space (which it won't be, because we're actually in the middle of a value) and then goes ahead and allocates and sets scanner.err (go/src/encoding/json/scanner.go, line 331 in 7cf06f0).
This error (and therefore the allocation) is completely unnecessary because the error gets dropped when Token() calls readValue() again.
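The chain described above can be illustrated with a toy model. This is a heavily simplified sketch, not the real scanner: toyScanner, readPrimitive, and the errAllocs counter are hypothetical names, and only the reset, the empty-parseState check, and the space check are modelled.

```go
package main

import "fmt"

// toyScanner is a hypothetical stand-in for the encoding/json scanner.
// It models only the fields discussed in the issue.
type toyScanner struct {
	parseState []int // stack of open objects/arrays; empty means "top level"
	err        error
	errAllocs  int // counts how many error values were created
}

// reset models the state reset done at the start of readValue().
func (s *toyScanner) reset() {
	s.parseState = s.parseState[:0]
	s.err = nil
}

func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\r' || c == '\n'
}

// endValue models stateEndValue(): with an empty parseState it assumes a
// top-level value has just ended and defers to endTop.
func (s *toyScanner) endValue(c byte) {
	if len(s.parseState) == 0 {
		s.endTop(c)
	}
}

// endTop models stateEndTop(): a non-space byte after a top-level value
// is recorded as an error, which costs an allocation.
func (s *toyScanner) endTop(c byte) {
	if !isSpace(c) {
		s.err = fmt.Errorf("invalid character %q after top-level value", c)
		s.errAllocs++
	}
}

// readPrimitive models one readValue() call made on behalf of Token():
// reset first, then inspect the byte that follows the literal.
func (s *toyScanner) readPrimitive(next byte) {
	s.reset() // also drops any error left over from the previous call
	s.endValue(next)
}

// countWastedErrors feeds the bytes that follow successive primitives and
// returns how many errors were created along the way.
func countWastedErrors(following []byte) int {
	s := &toyScanner{}
	for _, c := range following {
		s.readPrimitive(c)
	}
	return s.errAllocs
}

func main() {
	// Bytes that typically follow a key or value inside a document.
	fmt.Println(countWastedErrors([]byte{',', ':', ']', ' '}))
}
```

In this model every primitive followed by a non-space byte creates an error that the next reset throws away, which mirrors the unnecessary allocation described above.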
I tried to fix it by introducing a new flag in the scanner called 'inStream', then setting the flag at the beginning of Token() (and resetting it on defer) and then checking the flag in stateEndValue() to avoid allocating the error. It's not the most elegant solution, but it appears to be working.
Note it's still somewhat off the Decode() performance, but it's a significant improvement. I can submit a PR with my changes.
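The flag-based idea can be sketched on a toy model as follows. This is an illustration under stated assumptions, not the submitted change: streamScanner, token, and decode are hypothetical names, and the real patch may place the check differently.

```go
package main

import "fmt"

// streamScanner is a hypothetical, simplified scanner model.
type streamScanner struct {
	parseState []int
	err        error
	errAllocs  int  // counts how many error values were created
	inStream   bool // hypothetical flag: true while a Token()-style call is running
}

// endValue models the end-of-value check with the proposed guard added.
func (s *streamScanner) endValue(c byte) {
	if len(s.parseState) > 0 {
		return // still inside an object or array
	}
	if s.inStream {
		return // the streaming caller resets the state anyway; skip the error
	}
	if c != ' ' && c != '\t' && c != '\r' && c != '\n' {
		s.err = fmt.Errorf("invalid character %q after top-level value", c)
		s.errAllocs++
	}
}

// token models Token(): set the flag for the duration of the call and
// clear it on return via defer.
func (s *streamScanner) token(next byte) {
	s.inStream = true
	defer func() { s.inStream = false }()
	s.err = nil
	s.endValue(next)
}

// decode models a non-streaming read, where the error must still be reported.
func (s *streamScanner) decode(next byte) error {
	s.err = nil
	s.endValue(next)
	return s.err
}

func main() {
	s := &streamScanner{}
	s.token(',')
	fmt.Println("errors after token():", s.errAllocs)
	fmt.Println("decode error:", s.decode(','))
}
```

The point of the guard is that the non-streaming path keeps its existing error behaviour, while the streaming path stops creating an error value that nobody reads.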