encoding/json: speed up the decoding scanner #28923
#5683 covered the overall speed issues of encoding and decoding JSON. It's now closed, since the package has gotten noticeably faster in the last few Go releases. For example, after a series of patches I sent during 1.12:
Notice, however, that decoding is about five times slower than encoding. While it makes sense that decoding is more work, a 5x difference suggests that there are bottlenecks.
Here are the encoding top 10 functions by cpu, as reported by
It's been optimized to a point where
And here are the decoding top 10 functions by cpu, as reported by
The big difference here is that all 10 functions are in the json package itself. In particular, the scanner's
There's another issue about speeding up the decoder, #16212. It's about doing reflect work ahead of time, like the encoder does. However, out of that top 10, only
So I propose that we refactor or even redesign the scanner to make it faster. The main bottleneck at the moment is the
The godoc for the
I tried using a function switch, and even a function jump table, but neither gave noticeable speed-ups. In fact, they all made
I have a few ideas in mind to try to make the work per byte a bit faster, but I'm limited by the "one unknown step call per byte" design, as I tried to explain above. I understand that the current design makes the JSON scanner simpler to reason about, but I wonder if there's anything we can do to make it faster while keeping it reasonably simple.
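To make the "one unknown step call per byte" design concrete, here is a minimal sketch of a scanner in that style. It is not the real encoding/json scanner: `miniScanner`, the state functions, and `validString` are hypothetical names, and the sketch only understands a single string literal with one-character escapes. The point it illustrates is that the loop makes one indirect call through `s.step` for every input byte, and the callee is not known at compile time.

```go
package main

// Result codes returned by a step function (hypothetical, reduced set).
const (
	scanContinue = iota // byte accepted, keep going
	scanEnd             // the value just ended
	scanError           // syntax error
)

// miniScanner holds the current state as a function value.
// Each input byte costs one indirect call through step.
type miniScanner struct {
	step func(s *miniScanner, c byte) int
}

// stateBegin expects the opening quote of a string literal.
func stateBegin(s *miniScanner, c byte) int {
	if c == '"' {
		s.step = stateInString
		return scanContinue
	}
	return scanError
}

// stateInString handles bytes inside the quotes.
func stateInString(s *miniScanner, c byte) int {
	switch {
	case c == '"':
		s.step = stateDone
		return scanEnd
	case c == '\\':
		s.step = stateInStringEsc
	case c < 0x20:
		return scanError // raw control characters are not allowed
	}
	return scanContinue
}

// stateInStringEsc handles the byte after a backslash.
// Assumption: \uXXXX escapes are left out to keep the sketch short.
func stateInStringEsc(s *miniScanner, c byte) int {
	switch c {
	case '"', '\\', '/', 'b', 'f', 'n', 'r', 't':
		s.step = stateInString
		return scanContinue
	}
	return scanError
}

// stateDone rejects any byte after the closing quote.
func stateDone(s *miniScanner, c byte) int { return scanError }

// validString reports whether data is exactly one string literal.
func validString(data []byte) bool {
	s := &miniScanner{step: stateBegin}
	for i, c := range data {
		switch s.step(s, c) { // one unknown call per byte
		case scanError:
			return false
		case scanEnd:
			return i == len(data)-1
		}
	}
	return false // input ended before the closing quote
}
```

Calling `validString` on a quoted string returns true, and it returns false for unterminated or unquoted input. Because `s.step` changes as the input is consumed, the compiler cannot inline the per-byte work into the loop, which is the cost described above.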
This issue is to track small speed-ups in the scanner, but also to discuss larger refactors to it. I'm milestoning for 1.13, since at least the small speed-ups can be merged for that release.
Here's a potential idea. Quoted strings are very common in JSON, so we could make the scanner tokenise those with specialised code. Only a few states are involved in strings, and they're hot paths - for example,
Of course, the disadvantage is that we break with the "one step call == one byte advanced" rule. I think we need to break that rule, if we hope to make the scanner significantly faster.
I had a realisation yesterday - the decoder scans the input bytes twice, both in
On the bright side, this means we can introduce cheap speed-ups without redesigning the scanner. The tokenization happens during the second scan, at which point there can't be any syntax errors. https://go-review.googlesource.com/c/go/+/151042 uses this for a shortcut, since we can iterate over a string literal's bytes in a much faster way. Forty lines of code gives a ~7% speed-up, which I think is worth it.
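The kind of shortcut described here can be sketched as follows. This is an illustration rather than the code from the CL: `skipStringLiteral` is a hypothetical helper, and it relies on the assumption that the input already passed the first, validating scan, so it never has to report a syntax error.

```go
package main

// skipStringLiteral returns the index just past the closing quote of the
// string literal that starts at data[start].
//
// Assumption: data was already validated, so data[start] is the opening
// quote, every backslash is followed by at least one more byte, and a
// closing quote exists. Nothing here checks for syntax errors.
func skipStringLiteral(data []byte, start int) int {
	i := start + 1 // skip the opening quote
	for i < len(data) {
		switch data[i] {
		case '\\':
			i += 2 // skip the backslash and the escaped byte
		case '"':
			return i + 1 // position after the closing quote
		default:
			i++
		}
	}
	return len(data) // unreachable for validated input
}
```

For the input `{"a\"b": 1}` with `start` set to 1, the function returns 7, the index of the colon. The loop is a plain switch over bytes with no indirect calls, which is why it can be much cheaper than stepping the state machine once per byte.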
Here are the final numbers after the three CLs above:
I'll leave it here during the freeze - looking forward to feedback and reviews once the 1.13 tree opens. So far, I haven't redesigned or restructured the scanner; I've simply introduced shortcuts in the decoder.
If these kinds of optimizations are welcome, I might look into more of them for 1.13 once this first batch is reviewed.
Decoder.Decode and Unmarshal actually scan the input bytes twice - the first time to check for syntax errors and the length of the value, and the second to perform the decoding.

It's in the second scan that we actually tokenize the bytes. Since syntax errors aren't a possibility, we can take shortcuts.

In particular, literals such as quoted strings are very common in JSON, so we can avoid a lot of work by special casing them.

    name                  old time/op    new time/op    delta
    CodeDecoder-8         10.3ms ± 1%    9.1ms ± 0%     -11.89%  (p=0.002 n=6+6)
    UnicodeDecoder-8      342ns ± 0%     283ns ± 0%     -17.25%  (p=0.000 n=6+5)
    DecoderStream-8       239ns ± 0%     230ns ± 0%     -3.90%   (p=0.000 n=6+5)
    CodeUnmarshal-8       11.0ms ± 0%    9.8ms ± 0%     -11.45%  (p=0.002 n=6+6)
    CodeUnmarshalReuse-8  10.3ms ± 0%    9.0ms ± 0%     -12.72%  (p=0.004 n=5+6)
    UnmarshalString-8     104ns ± 0%     92ns ± 0%      -11.35%  (p=0.002 n=6+6)
    UnmarshalFloat64-8    93.2ns ± 0%    87.6ns ± 0%    -6.01%   (p=0.010 n=6+4)
    UnmarshalInt64-8      74.5ns ± 0%    71.5ns ± 0%    -3.91%   (p=0.000 n=5+6)

    name                  old speed      new speed      delta
    CodeDecoder-8         189MB/s ± 1%   214MB/s ± 0%   +13.50%  (p=0.002 n=6+6)
    UnicodeDecoder-8      40.9MB/s ± 0%  49.5MB/s ± 0%  +20.96%  (p=0.002 n=6+6)
    CodeUnmarshal-8       176MB/s ± 0%   199MB/s ± 0%   +12.93%  (p=0.002 n=6+6)

Updates #28923.

Change-Id: I7a5e2aef51bd4ddf2004aad24210f6f50e01eaeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/151042
Run-TryBot: Daniel Martí <firstname.lastname@example.org>
TryBot-Result: Gobot Gobot <email@example.com>
Reviewed-by: Brad Fitzpatrick <firstname.lastname@example.org>
readValue is a hot function, clocking in at ~13% flat CPU use in CodeDecoder. In particular, looping over the bytes is slow. That's partially because the code contains a bounds check at the start of the loop.

The source of the problem is that scanp is a signed integer, and comes from a field, so the compiler doesn't know that it's non-negative. Help it with a simple and comparatively cheap hint.

While at it, use scanp as the index variable directly, removing the need for a duplicate index variable which is later added back into scanp.

    name           old time/op   new time/op   delta
    CodeDecoder-8  11.3ms ± 1%   11.2ms ± 1%   -0.98%  (p=0.000 n=9+9)

    name           old speed     new speed     delta
    CodeDecoder-8  172MB/s ± 1%  174MB/s ± 1%  +0.99%  (p=0.000 n=9+9)

Updates #28923.

Change-Id: I138f83babdf316fc97697cc18f595c3403c1ddb7
Reviewed-on: https://go-review.googlesource.com/c/go/+/170939
Run-TryBot: Daniel Martí <email@example.com>
TryBot-Result: Gobot Gobot <firstname.lastname@example.org>
Reviewed-by: Brad Fitzpatrick <email@example.com>
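The bounds-check hint from the readValue change can be sketched like this. The `reader` type and `skipSpaces` method are hypothetical stand-ins, not the decoder's real code; what the sketch keeps is the shape of the problem, a signed index loaded from a struct field, and one plausible form of the hint, an explicit `scanp >= 0` in the loop condition.

```go
package main

// reader is a hypothetical, reduced stand-in for a decoder with a buffer
// and a scan position stored in a field.
type reader struct {
	buf   []byte
	scanp int // signed and loaded from a field, so not provably >= 0
}

// skipSpaces advances r.scanp past any leading JSON whitespace.
func (r *reader) skipSpaces() {
	scanp := r.scanp
	// The "scanp >= 0" condition is the hint: together with
	// "scanp < len(r.buf)" it lets the compiler prove the index below
	// is in range. Assumption: whether the check is really removed
	// depends on the compiler version.
	for ; scanp >= 0 && scanp < len(r.buf); scanp++ {
		c := r.buf[scanp]
		if c != ' ' && c != '\t' && c != '\n' && c != '\r' {
			break
		}
	}
	// scanp is used as the loop index directly, so there is no second
	// index variable to add back afterwards.
	r.scanp = scanp
}
```

One way to see whether a bounds check remains is to build with `-gcflags=-d=ssa/check_bce/debug=1`, which reports the lines where the compiler kept one.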
We can work out how many bytes can be unquoted trivially in rescanLiteral, which already iterates over a string's bytes.

Removing the extra loop in unquoteBytes simplifies the function and speeds it up, especially when decoding simple strings, which are common.

While at it, we can remove unnecessary checks like len(s)<2 and s=='"'. Add a comment explaining why.

    name           old time/op   new time/op   delta
    CodeDecoder-8  11.2ms ± 0%   11.1ms ± 1%   -1.63%  (p=0.000 n=9+10)

    name           old speed     new speed     delta
    CodeDecoder-8  173MB/s ± 0%  175MB/s ± 1%  +1.66%  (p=0.000 n=9+10)

Updates #28923.

Change-Id: I2436a3a7f8148a2f7a6a4cdbd7dec6b32ef5e20c
Reviewed-on: https://go-review.googlesource.com/c/go/+/151157
Run-TryBot: Daniel Martí <firstname.lastname@example.org>
TryBot-Result: Gobot Gobot <email@example.com>
Reviewed-by: Brad Fitzpatrick <firstname.lastname@example.org>
In the common case, structs have a handful of fields and most inputs match struct field names exactly.

The previous code would do a linear search over the fields, stopping at the first exact match, and otherwise using the first case insensitive match.

This is unfortunate, because it means that for the common case, we'd do a linear search with bytes.Equal. Even for structs with only two or three fields, that is pretty wasteful.

Worse even, up until the exact match was found via the linear search, all previous fields would run their equalFold functions, which aren't cheap even in the simple case.

Instead, cache a map along with the field list that indexes the fields by their name. This way, a case sensitive field search doesn't involve a linear search, nor does it involve any equalFold func calls.

This patch should also slightly speed up cases where there's a case insensitive match but not a case sensitive one, as then we'd avoid calling bytes.Equal on all the fields. Though that's not a common case, and there are no benchmarks for it.

    name           old time/op    new time/op    delta
    CodeDecoder-8  11.0ms ± 0%    10.6ms ± 1%    -4.42%  (p=0.000 n=9+10)

    name           old speed      new speed      delta
    CodeDecoder-8  176MB/s ± 0%   184MB/s ± 1%   +4.62%  (p=0.000 n=9+10)

    name           old alloc/op   new alloc/op   delta
    CodeDecoder-8  2.28MB ± 0%    2.28MB ± 0%    ~       (p=0.725 n=10+10)

    name           old allocs/op  new allocs/op  delta
    CodeDecoder-8  76.9k ± 0%     76.9k ± 0%     ~       (all equal)

Updates #28923.

Change-Id: I9929c1f06c76505e5b96914199315dbdaae5dc76
Reviewed-on: https://go-review.googlesource.com/c/go/+/172918
Run-TryBot: Daniel Martí <email@example.com>
TryBot-Result: Gobot Gobot <firstname.lastname@example.org>
Reviewed-by: Brad Fitzpatrick <email@example.com>
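The lookup strategy from the field-cache change can be sketched as below. All names here (`field`, `structFields`, `newStructFields`, `lookup`) are hypothetical, and `strings.EqualFold` stands in for the package's own per-field equalFold functions, so the folding rules may differ from the real decoder. The sketch shows the order of operations: an exact-match map lookup first, and a linear case-insensitive search only as a fallback.

```go
package main

import "strings"

// field is a hypothetical, reduced description of one struct field.
type field struct {
	name  string
	index int
}

// structFields caches the field list together with a map that indexes
// the fields by their exact name.
type structFields struct {
	list      []field
	nameIndex map[string]int
}

// newStructFields builds the cache once, for example per struct type.
func newStructFields(names ...string) structFields {
	sf := structFields{nameIndex: make(map[string]int, len(names))}
	for i, n := range names {
		sf.list = append(sf.list, field{name: n, index: i})
		if _, dup := sf.nameIndex[n]; !dup {
			sf.nameIndex[n] = i // the first field with a given name wins
		}
	}
	return sf
}

// lookup finds the field for a JSON object key, or returns nil.
func (sf structFields) lookup(key []byte) *field {
	// Fast path: exact match through the map, with no linear search
	// and no case folding.
	if i, ok := sf.nameIndex[string(key)]; ok {
		return &sf.list[i]
	}
	// Slow path: first case-insensitive match, in field order.
	// Assumption: strings.EqualFold replaces the real equalFold funcs.
	for i := range sf.list {
		if strings.EqualFold(sf.list[i].name, string(key)) {
			return &sf.list[i]
		}
	}
	return nil
}
```

With fields `Name` and `Age`, looking up `Age` takes the map path, looking up `name` falls back to the case-insensitive search and finds `Name`, and looking up an unknown key returns nil.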
This is a re-submission of CL 151157, since it was reverted in CL 190909 due to an introduced crash found by a fuzzer. The revert CL included regression tests, while this CL includes a fixed version of the original change.

In particular, what we forgot in the original optimization was that we still need the length and trailing quote checks at the beginning of unquoteBytes. Without those, we could end up in a crash later on.

We can work out how many bytes can be unquoted trivially in rescanLiteral, which already iterates over a string's bytes.

Removing the extra loop in unquoteBytes simplifies the function and speeds it up, especially when decoding simple strings, which are common.

While at it, we can remove the check that s=='"', since all call sites already meet that condition.

    name           old time/op   new time/op   delta
    CodeDecoder-8  10.6ms ± 2%   10.5ms ± 1%   -1.01%  (p=0.004 n=20+10)

    name           old speed     new speed     delta
    CodeDecoder-8  183MB/s ± 2%  185MB/s ± 1%  +1.02%  (p=0.003 n=20+10)

Updates #28923.

Change-Id: I8c6b13302bcd86a364bc998d72451332c0809cde
Reviewed-on: https://go-review.googlesource.com/c/go/+/190659
Run-TryBot: Daniel Martí <firstname.lastname@example.org>
TryBot-Result: Gobot Gobot <email@example.com>
Reviewed-by: Peter Weinberger <firstname.lastname@example.org>