
proposal: encoding/json: new MarshalAppender interface to make custom marshallers more efficient #34701

philpearl opened this issue Oct 4, 2019 · 9 comments


@philpearl
Contributor

commented Oct 4, 2019

What version of Go are you using (go version)?

$ go version
go version go1.13 darwin/amd64

Does this issue reproduce with the latest release?

Yes. And with the head of the master branch.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="off"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/phil/Library/Caches/go-build"
GOENV="/Users/phil/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/phil/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.13/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.13/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/0j/nhsrh_152tg44s03bvq7mrh80000gn/T/go-build757131500=/tmp/go-build -gno-record-gcc-switches -fno-common"

Problem

If you marshal a struct with a time.Time field in it, there are additional allocations per time field. There appear to be three reasons for this.

  1. the json.Marshaler interface forces you to allocate a []byte to return the result rather than following an 'append' or 'io.Writer' pattern
  2. the json.Marshaler interface isn't trusted to return good JSON, so there's additional overhead validating and compacting the returned data when copying it into the underlying buffer
  3. If the MarshalJSON method has a non-pointer receiver, there's an allocation when creating the interface value used to call it

These inefficiencies exist for any type that implements MarshalJSON, but time.Time is a big issue in code I've been involved with.
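
For illustration, here is a minimal sketch of the kind of custom marshaller that hits all three problems. The Timestamp wrapper type is hypothetical and exists only to make the allocations concrete:

package example

import "time"

// Timestamp is a hypothetical wrapper type used only to illustrate the costs above.
type Timestamp struct {
	t time.Time
}

// MarshalJSON must build and return a fresh []byte on every call (point 1),
// the encoder then re-validates and compacts whatever it returns (point 2),
// and the value receiver means the value is boxed into a json.Marshaler
// interface before the call, costing another allocation (point 3).
func (ts Timestamp) MarshalJSON() ([]byte, error) {
	return []byte(`"` + ts.t.Format(time.RFC3339Nano) + `"`), nil
}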

Proposal

I think we can remove the first two of these by creating a new interface that follows the append pattern and trusts implementations to produce compact, valid JSON.

// MarshalAppender is implemented by types that can marshal themselves into compact and
// valid JSON. Implementations should append their JSON to the data parameter and return 
// the resulting byte slice. MarshalAppender can be implemented more efficiently than
// Marshaler. If a type implements both Marshaler and MarshalAppender then 
// MarshalAppender is used in preference.
type MarshalAppender interface {
	MarshalAppendJSON(data []byte) ([]byte, error)
}

I think an "Append" model is preferable to using io.Writer as the standard library has many append methods, e.g. time.Time has AppendFormat, and there are many Append functions in strconv. I suspect that means an interface following the append pattern will be less surprising and easier to use efficiently.

I've knocked together a prototype implementation, with the following results on a simple benchmark (interestingly, I can't find a benchmark in the existing codebase that covers using a custom marshaler).
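
For context, a benchmark along those lines might look roughly like the sketch below. This is a hypothetical reconstruction, not the actual code behind the numbers that follow; it assumes the Timestamp type from the earlier example plus imports of encoding/json, testing and time.

// Hypothetical benchmark sketch: marshals a struct whose field has a custom
// marshaller, so the per-field overhead shows up in allocs/op.
func BenchmarkMarshalMarshaler(b *testing.B) {
	v := struct {
		When Timestamp `json:"when"`
	}{When: Timestamp{t: time.Now()}}

	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := json.Marshal(&v); err != nil {
			b.Fatal(err)
		}
	}
}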

benchstat before.txt after.txt 
name                       old time/op    new time/op    delta
MarshalMarshaler-8            128ns ± 0%      94ns ± 8%  -26.25%  (p=0.000 n=7+8)
MarshalMarshalerEncoder-8     132ns ± 3%      90ns ± 1%  -32.12%  (p=0.001 n=7+6)

name                       old alloc/op   new alloc/op   delta
MarshalMarshaler-8            96.0B ± 0%     24.0B ± 0%  -75.00%  (p=0.000 n=8+8)
MarshalMarshalerEncoder-8     80.0B ± 0%      8.0B ± 0%  -90.00%  (p=0.000 n=8+8)

name                       old allocs/op  new allocs/op  delta
MarshalMarshaler-8             4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=8+8)
MarshalMarshalerEncoder-8      3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=8+8)

I'd be very pleased to attempt to contribute this if the proposal is accepted.

Apologies if I've not followed the proposal process properly - I don't know what I'm doing!

@gopherbot gopherbot added this to the Proposal milestone Oct 4, 2019
@gopherbot gopherbot added the Proposal label Oct 4, 2019
@philpearl philpearl closed this Oct 4, 2019
@philpearl philpearl reopened this Oct 4, 2019
@networkimprov

commented Oct 7, 2019

cc @mvdan

@wI2L

Contributor

commented Oct 11, 2019

Regarding the validation and compaction that is applied to the JSON data returned by MarshalJSON implementations, I think this should still be done, even for an Append model.
However, the ability to disable it could be addressed separately, perhaps with an encoder option, just as we already do for HTML character escaping.
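
For reference, the existing per-encoder option works like this; the analogous option for trusting marshaller output does not exist in encoding/json and is sketched only as a comment:

package main

import (
	"encoding/json"
	"log"
	"os"
)

func main() {
	enc := json.NewEncoder(os.Stdout)

	// Existing per-encoder option: disable HTML character escaping.
	enc.SetEscapeHTML(false)

	// A hypothetical analogue (not part of encoding/json today) could skip
	// the validation/compaction of custom marshaller output, e.g.:
	//
	//	enc.SetCompactMarshalerOutput(false)

	if err := enc.Encode(map[string]string{"hello": "<world>"}); err != nil {
		log.Fatal(err)
	}
}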

@philpearl

Contributor Author

commented Oct 11, 2019

@wI2L could you expand on why it's necessary? Folk can always validate their JSON themselves using json.Valid. Is there a history of people producing bad marshallers that this was introduced to address?
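
A quick illustration of the json.Valid point (json.Valid is an existing function in encoding/json):

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// json.Valid reports whether its input is syntactically valid JSON, so a
	// marshaller's output can be checked in that package's own tests rather
	// than being re-validated on every call to Marshal.
	fmt.Println(json.Valid([]byte(`{"hello":"world"}`))) // true
	fmt.Println(json.Valid([]byte(`{"hello":}`)))        // false
}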

@wI2L

Contributor

commented Oct 11, 2019

@philpearl AFAIK, there is no mention of this behavior in the documentation, neither in the Marshal nor the Marshaler comments. I presume this is not an oversight, but rather an implicit behavior inherent to the fact that Marshal is supposed/expected to return valid JSON, but don't quote me on that.
My thought is that not doing validation/compaction of the data returned by implementations of the interface you propose would create a gap with the existing behavior, which would have to be documented, making it clear that the behavior is not the same.
With that said, I do think we should be able to deactivate this behavior when performance matters.

@lofcek

commented Oct 12, 2019

When I read this for the first time I thought it was a bright idea. However, I've changed my mind.

  • A similar pattern for Unmarshaler would not work, would it?
  • For small JSON documents the current json.Marshaler is good enough. For bigger ones (say, kilobytes) your approach is better. If the result doesn't have to be in contiguous memory, I might prefer to write to a [][]byte (because it doesn't need reallocations).
  • For even bigger documents I would prefer to write to an io.Writer (typically a file). However, projects like easyjson already address that.
  • Finding a general solution for all these cases is probably impossible, and this proposal solves only a limited part of them.
@mvdan

Member

commented Oct 13, 2019

I'm surprised that this already has over thirty thumbs up after being posted just over a week ago. Have there been discussions about this proposal elsewhere?

While I agree that the current interface has some performance limitations, I'm not convinced that this is a change we want to make right now. It would make the json package significantly more complex, and there are other aspects to performance, such as the validation and compaction that @wI2L brings up.

There are a large number of proposals that want to significantly change the json package. If we accepted even a small portion of them, the package's API would quickly double in size and become unmanageable. This is why many of them are under the Proposal-Hold label, to all be considered at once when it's time to design a new iteration of the package.

I think this proposal falls under the same category. The design choices that make encoding/json a bit inefficient also make it much simpler to use than the faster json encoders and decoders out there. I think that's a good tradeoff to make for the standard library.

Until large design refactors happen at some point in the future, I'd personally recommend looking at third-party json packages with performance at the center of their design. For example, there are many that use code generation.

@creker

commented Oct 13, 2019

@mvdan there's a blog post about this that was reposted on reddit (https://www.reddit.com/r/golang/comments/ddqpbt/bad_go_adventures_with_json_marshalling/). That's how I personally found out about this proposal.

@philpearl

Contributor Author

commented Oct 13, 2019

Oh god, someone put it on reddit. Save me from the comments!

@philpearl

Contributor Author

commented Oct 13, 2019

One reason I'm pursuing this is that I've tried to change a large codebase to use a code generation based JSON library (easyjson) and there are issues with doing so, including the following.

  • you have to remember to regenerate the code all the time (yes, we've written some randomised tests for this but it's still hard to keep everything in step)
  • in places we've embedded structs, which works great with reflection based marshalling/unmarshalling, but can go disastrously wrong with embedded interfaces
  • sometimes these libraries are just wrong - so we need to test everything marshals/unmarshals via different JSON implementations and make fixes.

In my head, if there are easy wins that improve the standard JSON library's performance (and I'm more concerned with garbage than straight-line speed), then that's a much nicer path for folk who run into performance issues than a lift-and-shift to a whole new JSON library.

I'm also struck by the irony that the main use for custom marshallers (other than time.Time and json.RawMessage) appears to be building nullable values without using pointers - presumably to try to avoid allocations.

I liked the idea of this change because it enables improvements without impacting the normal user much, if at all. Surely not many folk write custom marshallers, and those that do can, I hope, deal with a little extra complication?
