encoding/json: speed up the decoding scanner #28923
#5683 covered the overall speed issues of encoding and decoding JSON. It's now closed, since the package has gotten noticeably faster in the last few Go releases. For example, after a series of patches I sent during 1.12:
Notice, however, that decoding is about five times slower than encoding. While it makes sense that decoding is more work, a 5x difference suggests there are real bottlenecks.
Here are the top 10 encoding functions by CPU, as reported by pprof. It's been optimized to a point where much of the remaining cost lies outside the package itself.
And here are the decoding top 10 by CPU, as reported by pprof.
The big difference here is that all 10 functions are in the json package itself. In particular, the scanner's step functions take up a large share of the time.
There's another issue about speeding up the decoder, #16212. It's about doing the reflect work ahead of time, like the encoder does. However, that wouldn't address most of the top 10 above, since only a fraction of the decoding cost is reflect-related.
So I propose that we refactor or even redesign the scanner to make it faster. The main bottleneck at the moment is the design of one step function call per input byte.
The godoc for the scanner's step field describes this per-byte design.
I tried using a function switch, and even a function jump table, but neither gave noticeable speed-ups. In fact, they all made the scanner slower.
I have a few ideas in mind to try to make the work per byte a bit faster, but I'm limited by the "one unknown step call per byte" design, as I tried to explain above. I understand that the current design makes the JSON scanner simpler to reason about, but I wonder if there's anything we can do to make it faster while keeping it reasonably simple.
This issue is to track small speed-ups in the scanner, but also to discuss larger refactors to it. I'm milestoning for 1.13, since at least the small speed-ups can be merged for that release.
Here's a potential idea. Quoted strings are very common in JSON, so we could make the scanner tokenise those with specialised code. Only a few states are involved in strings, and they're hot paths - for example, stateInString and stateInStringEsc.
Of course, the disadvantage is that we break with the "one step call == one byte advanced" rule. I think we need to break that rule, if we hope to make the scanner significantly faster.
I had a realisation yesterday - the decoder scans the input bytes twice: first in checkValid, to ensure the JSON is valid, and again when decoding the values.
On the bright side, this means we can introduce cheap speed-ups without redesigning the scanner. The tokenization happens during the second scan, at which point there can't be any syntax errors. https://go-review.googlesource.com/c/go/+/151042 uses this for a shortcut, since we can iterate over a string literal's bytes in a much faster way. Forty lines of code gives a ~7% speed-up, which I think is worth it.
Here are the final numbers after the three CLs above:
I'll leave it here during the freeze - looking forward to feedback and reviews once the 1.13 tree opens. So far, I haven't redesigned or restructured the scanner, I've simply introduced shortcuts in the decoder.
If these kinds of optimizations are welcome, I might look into more of them for 1.13 once this first batch is reviewed.