Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: allow per-Encoder/per-Decoder registration of marshal/unmarshal functions #5901

Open
rsc opened this issue Jul 17, 2013 · 21 comments

Comments

@rsc
Copy link
Contributor

@rsc rsc commented Jul 17, 2013

For example, if a user wants to marshal net.IP with custom code, we should provide a way
to do that, probably a method on *Encoder. Similarly for *Decoder.

Same for encoding/xml.
@robpike

This comment has been minimized.

Copy link
Contributor

@robpike robpike commented Aug 20, 2013

Comment 1:

Labels changed: removed go1.3.

@rsc

This comment has been minimized.

Copy link
Contributor Author

@rsc rsc commented Nov 27, 2013

Comment 2:

Labels changed: added go1.3maybe.

@rsc

This comment has been minimized.

Copy link
Contributor Author

@rsc rsc commented Dec 4, 2013

Comment 3:

Labels changed: added release-none, removed go1.3maybe.

@rsc

This comment has been minimized.

Copy link
Contributor Author

@rsc rsc commented Dec 4, 2013

Comment 4:

Labels changed: added repo-main.

@rsc rsc added accepted labels Dec 4, 2013
@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Jan 2, 2017

I did some work on this sometime back in October, with CL https://go-review.googlesource.com/c/31091 to get the conversation started.
Perhaps we can work on it for Go1.9.

In there I introduced Encoder.RegisterEncoder

func (enc *Encoder) RegisterEncoder(t reflect.Type, fn func(interface{}) ([]byte, error))

and Decoder.RegisterDecoder

func (dec *Decoder) RegisterDecoder(t reflect.Type, fn func([]byte) (interface{}, error))
@sebastien-rosset

This comment has been minimized.

Copy link

@sebastien-rosset sebastien-rosset commented Sep 14, 2017

We have a similar requirement for custom serialization. Our specific use cases are:

  1. Data anonymization. For example, omit IP addresses in the JSON output.
  2. Data masking. For example, omit specific fields depending on user privileges.
  3. Omit empty values (e.g. empty strings). This is useful to generate compact documents.
    In all three cases, the decision to omit fields is done at runtime with contextual information.

We are using CiscoM31@1e9514f. In our case, the client interface is implemented by doing a simple lookup in a map, so there is no need to register hundreds of custom marshaller (we have lots of structs).
We could have used a technique similar to https://go-review.googlesource.com/c/31091.

@harrisonhjones

This comment has been minimized.

Copy link

@harrisonhjones harrisonhjones commented Dec 16, 2019

@rsc is there still interest in this feature on your end? I have a use-case for it and would be happy to implement it.

[Edit 1] Spelling
[Edit 2]

I wonder if the function signature should be

func (enc *Encoder) RegisterMarshaller(t reflect.Type, f func(reflect.Value) ([]byte, error))

with all standard Encoders exposed so that users can leverage them. Example redactor:

package main

import (
	"bytes"
	"encoding/json"
	"os"
	"reflect"
	"strings"
	"unicode/utf8"
)

func main() {
	enc := json.NewEncoder(os.Stdout)
	text := "My password, foo, is totally secure"

	enc.RegisterMarshaller(reflect.TypeOf(""), StringMarshaller)
	enc.Encode(text)
	// Output
	// "My password, foo, is totally secure"

	enc.RegisterMarshaller(reflect.TypeOf(""), func(value reflect.Value) ([]byte, error) {
		return StringMarshaller(reflect.ValueOf(strings.Replace(value.String(), "foo", "[REDACTED]", -1)))
	})
	enc.Encode(text)
	// Output
	// "My password, [REDACTED], is totally secure"
}

// Largely taken from `func (e *encodeState) `string(s string, escapeHTML bool)` in `encoding/json/encode.go`
// This would exist in encoding/json.
func StringMarshaller(value reflect.Value) ([]byte, error) {
	e := bytes.Buffer{}
	s := value.String()
	escapeHTML := false // TODO: Refactor StringEncoder into a 'htmlEscaping' one and a non 'htmlEscaping' one.

	e.WriteByte('"')
	start := 0
	for i := 0; i < len(s); {
		if b := s[i]; b < utf8.RuneSelf {
			if json.HTMLSafeSet[b] || (!escapeHTML && json.SafeSet[b]) {
				i++
				continue
			}
			if start < i {
				e.WriteString(s[start:i])
			}
			e.WriteByte('\\')
			switch b {
			case '\\', '"':
				e.WriteByte(b)
			case '\n':
				e.WriteByte('n')
			case '\r':
				e.WriteByte('r')
			case '\t':
				e.WriteByte('t')
			default:
				// This encodes bytes < 0x20 except for \t, \n and \r.
				// If escapeHTML is set, it also escapes <, >, and &
				// because they can lead to security holes when
				// user-controlled strings are rendered into JSON
				// and served to some browsers.
				e.WriteString(`u00`)
				e.WriteByte(json.Hex[b>>4])
				e.WriteByte(json.Hex[b&0xF])
			}
			i++
			start = i
			continue
		}
		c, size := utf8.DecodeRuneInString(s[i:])
		if c == utf8.RuneError && size == 1 {
			if start < i {
				e.WriteString(s[start:i])
			}
			e.WriteString(`\ufffd`)
			i += size
			start = i
			continue
		}
		// U+2028 is LINE SEPARATOR.
		// U+2029 is PARAGRAPH SEPARATOR.
		// They are both technically valid characters in JSON strings,
		// but don't work in JSONP, which has to be evaluated as JavaScript,
		// and can lead to security holes there. It is valid JSON to
		// escape them, so we do so unconditionally.
		// See http://timelessrepo.com/json-isnt-a-javascript-subset for discussion.
		if c == '\u2028' || c == '\u2029' {
			if start < i {
				e.WriteString(s[start:i])
			}
			e.WriteString(`\u202`)
			e.WriteByte(json.Hex[c&0xF])
			i += size
			start = i
			continue
		}
		i += size
	}
	if start < len(s) {
		e.WriteString(s[start:])
	}
	e.WriteByte('"')

	return e.Bytes(), nil
}
@dsnet

This comment has been minimized.

Copy link
Member

@dsnet dsnet commented Dec 18, 2019

I have a use-case for this as well, but it's a bit specialized. Essentially:

  1. There is a type that I own (as a library author) that is used by many users where they pass it through encoding/json when they shouldn't.
  2. I would like to make changes to my type, but it breaks these users. I have no intention to support the use of this type with encoding/json, but I can't just break these users since there are are sufficient number of them.
  3. I would like to use the feature proposed here to migrate these users to an implementation of marshal/unmarshal that preserves the exact behavior as it exists today (even if its buggy and wrong).
  4. This now frees me to make the changes I want to make.

For prior art, the cmp package has a Comparer option that allows the caller to override the comparison of any specific type in the tree. The ability for users to specify custom comparisons has proven itself to be immensely powerful and flexible.

Even though I'd like to have something like this, I do have some concerns:

  • It is already the case that users often complain that encoding/json is too slow. The addition of this feature may make things slower. For cmp, the need to check whether any given value node matches a custom comparer is a significant source of slow down.
  • A decision needs to be made whether the type override only supports concrete types or interfaces. Concrete types are easier to implement efficiently. However, supporting interfaces provides significant flexibility, but also brings in significant implementation and performance costs.
  • What's the expected behavior when json.Marshal encounters a type T, and the type override has a custom marshaler specified for *T? If the value is addressable, then it makes sense to address the value and call the override function. What if the value is not addressable?
@harrisonhjones

This comment has been minimized.

Copy link

@harrisonhjones harrisonhjones commented Dec 18, 2019

Thanks for the response! Some comments below:

It is already the case that users often complain that encoding/json is too slow. The addition of this feature may make things slower. For cmp, the need to check whether any given value node matches a custom comparer is a significant source of slow down.

I'd have to run a benchmark but I would expect a map lookup to be fairly quick. Perhaps others are concerned with a different scale of "slow" than I.

A decision needs to be made whether the type override only supports concrete types or interfaces. Concrete types are easier to implement efficiently. However, supporting interfaces provides significant flexibility, but also brings in significant implementation and performance costs.

Interesting. I hadn't considered supporting interfaces. At the moment I only need concrete type overrides but if there are users out there that would benefit from an interface check I'd be willing to at least prototype it.

[Edit] Perhaps we could rename RegisterMarshaller to RegisterConcreteMarshaller and introduce RegisterInterfaceMarshaller later if users needed it?

What's the expected behavior when json.Marshal encounters a type T, and the type override has a custom marshaler specified for *T? If the value is addressable, then it makes sense to address the value and call the override function. What if the value is not addressable?

This actually came up in a discussion with a colleague. My preference would be to require users that want to custom marshal both T and *T to have to declare both marshaler overrides. You could use a single custom marshaler but would require both overrides.

...
var foo = ""
enc.RegisterMarshaller(reflect.TypeOf(foo), StringMarshaller)
enc.RegisterMarshaller(reflect.TypeOf(&foo), StringMarshaller)
...

func StringMarshaller(value reflect.Value) ([]byte, error) {
    ... check if value is a *String, if so deref ...
    ...
}

Alternately perhaps we modify the signiture of RegisterMarshaller to be something like:

func (enc *Encoder) RegisterMarshaller(reflect.Type, bool, func(reflect.Value) ([]byte, error))

where, if the bool was "true" it would match both T and *T and if it was false it would only match T or *T, depending on the passed in type. Looks a bit yucky to me though could be made to look better if we used the optional pattern.

@harrisonhjones

This comment has been minimized.

Copy link

@harrisonhjones harrisonhjones commented Dec 30, 2019

@dsnet I imagine you might be busy with the holidays but I wanted to give you a friendly ping on this. Any thoughts on my response?

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Dec 31, 2019

Change https://golang.org/cl/212998 mentions this issue: encoding/json: implement type override for serialization

@dsnet

This comment has been minimized.

Copy link
Member

@dsnet dsnet commented Dec 31, 2019

While the logic certainly uses reflect under the hood, I'm not sure if we should expose that in the public API. I propose the following API instead:

// RegisterFunc registers a custom encoder to use for specialized types.
// The input f must be a function of the type func(T) ([]byte, error).
//
// When marshaling a value of type R, the function f is called
// if R is identical to T for concrete types or
// if R implements T for interface types.
// Precedence is given to registered encoders that operate on concrete types,
// then registered encoders that operate on interface types
// in the order that they are registered, then the MarshalJSON method, and
// lastly the default behavior of Encode.
//
// It panics if T is already registered or if interface{} is assignable to T.
func (e *Encoder) RegisterFunc(f interface{})

// RegisterFunc registers a custom decoder to use for specialized types.
// The input f must be a function of the type func([]byte, T) error.
//
// When unmarshaling a value of type R, the function f is called
// if R is identical to T for concrete types or
// if R implements T for interface types.
// Precedence is given to registered decoders that operate on concrete types,
// then registered decoders that operate on interface types
// in the order that they are registered, then the UnmarshalJSON method, and
// lastly the default behavior of Decode.
//
// It panics if T is already registered or if interface{} is assignable to T.
func (d *Decoder) RegisterFunc(f interface{})

Arguments for this API:

  • Most users of this API probably do not want to deal with reflect.Type and reflect.Value, but rather want to deal with concrete types. For this reason, json.Unmarshal takes in an interface{} rather than an reflect.Value.
  • Using interface{} looses type safety, but the other proposed signatures also have type safety issues. For a signature like: func (enc *Encoder) RegisterMarshaller(t reflect.Type, f func(reflect.Value) ([]byte, error)), we still can't statically guarantee that t is the same type as the type that function f expects as it's input.
  • It matches what's done in other API that also have type-safety issues. For example, sort.Slices takes in an interface{} with the requirement that it be a slice kind, rather than a reflect.Value.

The proposed API combined with the ability to handle interfaces, allows you to do something like:

e := json.NewEncoder(w)
e.RegisterFunc(protojson.Marshal)
e.Encode(v)

where protojson.Marshal is a function that matches the expected function signature. It enables the standard encoding/json package to now be able to properly serialize all proto.Message types without forcing or expecting every concrete proto.Message type to implement the json.Marshaler interface.

Since this is something I need for my other work, I uploaded a CL with my prototype implementation that I've been testing with actual code.

Some additional thoughts:

  • One concern is that json.Decoder.DisallowUnknownFields specifies a top-level option that should in theory affect the results of all unmarshal operations recursively. It is already a problem today that custom json.Unmarshaler implementations do not properly respect this option (and has actually been a real problem). It's unfortunate that we're adding another way where options are not properly propagated downward. However, the severity of the problem is not as bad as custom json.Unmarshaler implementations since the location where options are specified is most likely also the place where custom encoder functions are registered, so options can be properly replicated at the top-level.
  • The implementation checks for custom functions to handle the current value type first, and if the value is addressable, also tries checking for a pointer to that value type. This matches the behavior of encoding/json for how it handles checking for json.Marshaler and json.Unmarshaler implementations.
  • The implementation does not call decoder functions if the receiver type is a pointer/interface and the input JSON is null. It also does not call encoder functions if the receiver type is a pointer/interface and is nil. This matches the behavior of encoding/json for how it handles json.Marshaler and json.Unmarshaler implementations.
  • I haven't benchmarked it yet, but expect decoding to take minimal performance hit and encoding to take no performance hit when there are no custom encoders/decoders.
  • Generics might enable greater type safety for the proposed API, but the current proposal for generics does not allow type parameterization at the granularity of individual methods.
@dsnet dsnet changed the title encoding/json: allow override type marshaling proposal: encoding/json: allow override type marshaling Dec 31, 2019
@dsnet

This comment has been minimized.

Copy link
Member

@dsnet dsnet commented Dec 31, 2019

Adding Proposal label as there are at least two proposed APIs and this issue proposes added API to encoding/json.

@dmitshur dmitshur removed this from the Unplanned milestone Jan 1, 2020
@dsnet

This comment has been minimized.

Copy link
Member

@dsnet dsnet commented Jan 3, 2020

This proposal can be used to give users to ability to address the following issues on their own:

  • #10275: encoding: make stdlib types implement TextMarshaler/BinaryMarshaler consistently
  • #21990: encoding/json: support struct tag for time.Format in JSON Marshaller/Unmarshaller
  • #29678: net: add MarshalText/UnmarshalText to HardwareAddr
  • #33564: crypto/ecdsa: make PublicKey implement json.Unmarshaler and json.Marshaler

Overall, I've been concerned with the proliferation of magic methods where more and more types have MarshalJSON, UnmarshalJSON, MarshalText, or UnmarshalText methods attached. As the maintainer of a number of widely used libraries, I've been pushing back against users who ask for these methods to be added. A type override gives users the flexibility to do what they want, rather than forcing library authors to be responsible for maintaining these methods.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Jan 15, 2020
@rsc

This comment has been minimized.

Copy link
Contributor Author

@rsc rsc commented Jan 16, 2020

Why is it RegisterFunc and not Register?

@rsc

This comment has been minimized.

Copy link
Contributor Author

@rsc rsc commented Jan 22, 2020

Overall this seems reasonable to me. I would push back a little about not needing, say, MarshalText on types. It's good if types can create their own good representations by default when that makes sense. I would hope the per-Encoder registration would be necessary for exceptions, not common cases.

For example even if we had this registration it would still seem good to accept proposals like #33564.

@rsc rsc changed the title proposal: encoding/json: allow override type marshaling proposal: encoding/json: allow per-Encoder/per-Decoder registration of marshal/unmarshal functions Jan 22, 2020
@rsc rsc moved this from Incoming to Active in Proposals Jan 22, 2020
@dsnet

This comment has been minimized.

Copy link
Member

@dsnet dsnet commented Jan 31, 2020

Why is it RegisterFunc and not Register?

Could be either. I don't feel strong about this name.

@rsc

This comment has been minimized.

Copy link
Contributor Author

@rsc rsc commented Feb 5, 2020

Let's use json.Register like gob.Register (different kind of argument but same idea).
Otherwise this seems like a likely accept.

@rsc rsc moved this from Active to Likely Accept in Proposals Feb 5, 2020
@dsnet

This comment has been minimized.

Copy link
Member

@dsnet dsnet commented Feb 5, 2020

To clarify, are you suggesting that the type-override registration be a top-level Register function or are you merely suggesting that the Decoder and Encoder methods be named Register out of consistency with gob.Register?

@rsc

This comment has been minimized.

Copy link
Contributor Author

@rsc rsc commented Feb 12, 2020

@dsnet, only that the method on Decoder/Encoder be called Register.
It will be clear at call sites that a func is being passed; it doesn't add much to call it RegisterFunc.

@rsc

This comment has been minimized.

Copy link
Contributor Author

@rsc rsc commented Feb 12, 2020

No change in consensus, so accepting.

@rsc rsc moved this from Likely Accept to Accepted in Proposals Feb 12, 2020
@rsc rsc changed the title proposal: encoding/json: allow per-Encoder/per-Decoder registration of marshal/unmarshal functions encoding/json: allow per-Encoder/per-Decoder registration of marshal/unmarshal functions Feb 12, 2020
@rsc rsc modified the milestones: Proposal, Backlog Feb 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Accepted
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants
You can’t perform that action at this time.