Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: unnecessary allocation in (*scanner).error #10335

Closed
toffaletti opened this issue Apr 3, 2015 · 12 comments

Comments

Projects
None yet
9 participants
@toffaletti
Copy link

commented Apr 3, 2015

I've got a long running process where pprof --alloc_objects reported this:

11288877697 of 18066163372 total (62.49%)
Dropped 1073 nodes (cum <= 90330816)
Showing top 40 nodes out of 301 (cum >= 135464561)
      flat  flat%   sum%        cum   cum%
 861225002  4.77%  4.77% 1518495517  8.41%  encoding/json.(*scanner).error
 561339051  3.11%  7.87%  561339051  3.11%  reflect.Value.MapIndex
 554128561  3.07% 10.94%  554128561  3.07%  reflect.New
 479534186  2.65% 13.60%  479534186  2.65%  strconv.quoteWith

The encoding/json.(*scanner).error surprised me and I went digging thinking the process was handling lots of invalid JSON. What I think is going on is the scanner allocates several temporary objects for every JSON value even when the JSON is valid just in case the input stream ends on the next step.

stateEndTop is calling (*scanner).error which does:

s.err = &SyntaxError{"invalid character " + quoteChar(c) + " " + context, s.bytes}

and quoteChar does:

s := strconv.Quote(string(c))
return "'" + s[1:len(s)-1] + "'"

I made a really nasty hack to get rid of the allocations in the common case. I present it in shame:

+var commaErr = &SyntaxError{"invalid character ',' after top-level value", 0}
+var braceErr = &SyntaxError{"invalid character '}' after top-level value", 0}
+var brackErr = &SyntaxError{"invalid character ']' after top-level value", 0}
+
 // error records an error and switches to the error state.
 func (s *scanner) error(c int, context string) int {
    s.step = stateError
+   if context == "after top-level value" && s.bytes == 0 {
+       switch c {
+       // special cases - common avoid allocation
+       case ',':
+           s.err = commaErr
+           return scanError
+       case '}':
+           s.err = braceErr
+           return scanError
+       case ']':
+           s.err = brackErr
+           return scanError
+       }
+   }
    s.err = &SyntaxError{"invalid character " + quoteChar(c) + " " + context, s.bytes}
    return scanError
 }
@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 3, 2015

Thanks! A few things:

  • We can't accept patches via issues. To contribute code, please see golang.org/doc/contribute.html
  • It seems like a better fix might just be to store c in the scanner in case it is needed.
  • Are there any benchmarks that cover this code? What is the impact on them, running with -benchmem? You'll want to use benchcmp to compare them.

@josharian josharian changed the title encoding/json excessive allocation in (*scanner).error encoding/json: unnecessary allocation in (*scanner).error Apr 3, 2015

@josharian josharian added the repo-main label Apr 3, 2015

@toffaletti

This comment has been minimized.

Copy link
Author

commented Apr 3, 2015

Thanks, I don't think that code is fit for contribution though. I will provide a benchmark to reproduce the issue as soon as I can isolate it.

@toffaletti

This comment has been minimized.

Copy link
Author

commented Apr 3, 2015

The benchmark is pretty trivial, but reproducing the allocations seems to require unmarshalling into a struct. I wasn't able to reproduce using an empty interface.

I went to http://www.json.org/example.html got a large sample of a JSON object, pasted it into https://mholt.github.io/json-to-go/ to generate a struct. Then the benchmark is simply:

func BenchmarkJSON(b *testing.B) {
    var o Autogenerated
    for n := 0; n < b.N; n++ {
        for i := 0; i < 1000; i++ {
            err := json.Unmarshal([]byte(testJSON), &o)
            if err != nil {
                b.Fatal(err)
            }
        }
    }
}
$ benchcmp old.txt new.txt 
benchmark         old ns/op     new ns/op     delta
BenchmarkJSON     133828654     137355079     +2.64%

benchmark         old allocs     new allocs     delta
BenchmarkJSON     161100         63083          -60.84%

benchmark         old bytes     new bytes     delta
BenchmarkJSON     6981457       5524665       -20.87%

I wish I had more time to work on this and properly contribute a fix, but I unfortunately don't. Apologies, this is the best I can do right now.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2015

No worries, and thanks, this is useful as-is.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2015

Here's a smaller, self-contained benchmark for whoever looks at this. (Not me.) The newline in the byte slice is important.

func BenchmarkIssue10335(b *testing.B) {
    b.ReportAllocs()
    var s struct{}
    j := []byte(`{"a":{
}}`)
    for n := 0; n < b.N; n++ {
        if err := Unmarshal(j, &s); err != nil {
            b.Fatal(err)
        }
    }
}

With the hack above:

benchmark               old ns/op     new ns/op     delta
BenchmarkIssue10335     1744          1028          -41.06%

benchmark               old allocs     new allocs     delta
BenchmarkIssue10335     10             4              -60.00%

benchmark               old bytes     new bytes     delta
BenchmarkIssue10335     432           288           -33.33%

@josharian josharian added this to the Go1.5 milestone Apr 4, 2015

@bradfitz

This comment has been minimized.

Copy link
Member

commented Apr 5, 2015

@potocnyj

This comment has been minimized.

Copy link
Contributor

commented Apr 5, 2015

FWIW, I did a little experimentation with this yesterday and found that this input should also showcase the problem:

[]byte(`{"a":{ }}`)`

The newline in the input isn't the significant part, it's the fact that the two closing braces have no whitespace between them.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 7, 2015

Thanks, @potocnyj. You're right. To reproduce, you need both whitespace (newline or space as you have it) as well as the two closing braces next to each other.

@bradfitz bradfitz modified the milestones: Go1.5Maybe, Go1.5 Apr 7, 2015

@bradfitz bradfitz unassigned adg Apr 7, 2015

@bradfitz

This comment has been minimized.

Copy link
Member

commented Apr 7, 2015

I retracted my change.

@peterwald

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2015

@gopherbot

This comment has been minimized.

Copy link

commented Apr 25, 2015

CL https://golang.org/cl/9074 mentions this issue.

@rsc rsc closed this in a13606e Jun 18, 2015

@mikioh mikioh modified the milestones: Go1.5, Go1.5Maybe Jun 18, 2015

@golang golang locked and limited conversation to collaborators Jun 25, 2016

@gopherbot

This comment has been minimized.

Copy link

commented Nov 16, 2016

CL https://golang.org/cl/33276 mentions this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.