Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/json: add omitzero option #45669

Open
palsivertsen opened this issue Apr 21, 2021 · 25 comments
Open

proposal: encoding/json: add omitzero option #45669

palsivertsen opened this issue Apr 21, 2021 · 25 comments

Comments

@palsivertsen
Copy link

palsivertsen commented Apr 21, 2021

The omitempty json tag is kind of confusing to use when working with nested structs. The following example illustrates the most basic case using an empty struct for argument's sake.

type Foo struct {
	EmptyStruct struct{} `json:",omitempty"`
}

The "EmptyStruct" field is a struct without any fields and can be empty, that is equal to it's zero value. But when I try to marshal it to json the field is still included in the resulting json object. Reading the encoding/json documentation about the definition of empty it does not mention empty structs:

The "omitempty" option specifies that the field should be omitted from the encoding if the field has an empty value, defined as false, 0, a nil pointer, a nil interface value, and any empty array, slice, map, or string.

It feels weird that adding the omitempty tag to struct fields is allowed if it will never have the desired effect. If structs will never be considered empty shouldn't there at least be a compiler warning when adding this tag to a struct field?

Working with json time.Time

This behavior causes some confusion when working with time.Time in json structures. Go's zero values for primitive types are fairly reasonable and "guessable" from a non-gopher point of view. But the zero value for time.Time, January 1, year 1, 00:00:00.000000000 UTC, is less common. A more(?) common zero value for time is January 01, 1970 00:00:00 UTC. To avoid confusion when working with json outside the world of go it would be nice to have a way to omit zero value dates.

A commonly suggested workaround to this problem is to use pointers, but pointers might be undesirable for a number of reasons. They are for example cumbersome to assign values to:

type Foo struct {
	T *time.Time
}
_ = Foo{
	T: &time.Now(), // Compile error
}
_ = Foo{
	T: &[]time.Time{time.Now()}[0], // Weird workaround
}

The time.Time documentation also recommends to pass the type as value, not pointer since some methods are not concurrency-safe:

Programs using times should typically store and pass them as values, not pointers. That is, time variables and struct fields should be of type time.Time, not *time.Time.

A Time value can be used by multiple goroutines simultaneously except that the methods GobDecode, UnmarshalBinary, UnmarshalJSON and UnmarshalText are not concurrency-safe.

This playground example illustrates three different uses of empty structs where I'd expect the fields to be excluded from the resulting json. Note that the time.Time type has no exposed fields and uses the time.Time.NarshalJSON() function to marshal itself into json.

Solution

Would it make sense to add a new tag, like omitzero, that also excludes structs?

There's a similar proposal in #11939, but that changes the definition of empty in the omitempty documentation.

@mvdan
Copy link
Member

mvdan commented Apr 21, 2021

Pretty similar to #22480. I think this needs to be put on hold along with #11939. We are aware that the API of encoding/json has issues, and we're working on it.

@seankhliao seankhliao changed the title Omit zero values when encoding json propsoal: encoding/json: add omitzero option Apr 22, 2021
@seankhliao seankhliao changed the title propsoal: encoding/json: add omitzero option proposal: encoding/json: add omitzero option Apr 22, 2021
@gopherbot gopherbot added this to the Proposal milestone Apr 22, 2021
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) May 11, 2021
@rsc
Copy link
Contributor

rsc commented May 12, 2021

Hold for #22480.

@rsc rsc moved this from Incoming to Hold in Proposals (old) May 12, 2021
@rsc
Copy link
Contributor

rsc commented May 12, 2021

Placed on hold.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Jun 1, 2022

Taking off hold after closing #52803 as a duplicate of this one.

The big question is if we add ,omitzero do we also add a method to let "am I zero" be user-controllable?

Or perhaps we should allow MarshalJSON to return nil as in #50480.

@rsc rsc moved this from Hold to Active in Proposals (old) Jun 1, 2022
@rsc
Copy link
Contributor

rsc commented Jun 1, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@smikulcik
Copy link

smikulcik commented Jun 1, 2022

Personally I don't like baking the omission policy into struct tags. Things like omitempty, omitnil, omitzero are all limited by that one policy. I kinda liked my proposal for #52803 since it allowed that extra flexibility.

the IsEmpty() proposal also adds this flexibility, but it seems odd to further the abstraction of what it means for a property to be "empty". Arn't we just trying to determine if we want to omit a field or not?

@icholy
Copy link

icholy commented Jun 2, 2022

Or perhaps we should allow MarshalJSON to return nil as in #50480.

Say we go down that path and I write a type like this:

type Null[T any] struct {
	Valid bool
	Value T
}

func (n Null[T]) MarshalJSON() ([]byte, error) {
	if !n.Valid {
		return nil, nil
	}
	return json.Marshal(n.Valid)
}

Now this type can only be used as an omitzero field.

@rsc

@joeshaw
Copy link
Contributor

joeshaw commented Jun 2, 2022

@icholy I believe the return-nil option would be instead of omitzero, not in addition to it.

@icholy
Copy link

icholy commented Jun 2, 2022

@joeshaw having a nil return result in a null in non omitzero or omitempty fields would change the behavior of existing code.

@joeshaw
Copy link
Contributor

joeshaw commented Jun 2, 2022

@icholy Returning nil would not result in null in the JSON (that is done by returning []byte("null")), it would cause the field to be omitted entirely, under the proposal from #50480. Returning nil today causes invalid JSON to be produced.

However, @dsnet points out in #52803 (comment) that nil as a return value might cause existing code which calls MarshalJSON to break because it is not expecting this behavior.

@icholy
Copy link

icholy commented Jun 2, 2022

@joeshaw I want to have a general purpose Null[T] type that will be omitted in the case of omitempty or omitzero and will result in null otherwise.

@mitar
Copy link
Contributor

mitar commented Jun 5, 2022

@icholy So you return string null when you want JSON null, or nil when you want to omit it?

@icholy
Copy link

icholy commented Jun 5, 2022

@mitar my json.Marshaler implementation has no way of knowing if it's being used in an omitempty field.

@mitar
Copy link
Contributor

mitar commented Jun 5, 2022

True. I think this is a broader issue: MarshalJSON not being able to known tags on the field. So I think this is a different issue than this one: how could MarshalJSON know those tags. Maybe something like what Format has (e.g., fmt.State or verb rune). But currently MarshalJSON always overrides and ignores tags. There are simply two ways to "configure" JSON marshaling: or through imperative code (by implementing MarshalJSON) or declaratively through tags. Sadly, those two word do not work together and even more: you cannot implement in MarshalJSON same things you can through tags (see #50480).

@rsc rsc removed the Proposal-Hold label Jun 8, 2022
@inteon
Copy link

inteon commented Jun 14, 2022

  1. Lets start by agreeing on a definition for empty vs zero, here is my proposal:

    • empty JSON-values: false, 0, "", null, [], {}
    • zero JSON-values: empty JSON-values and objects containing only empty JSON-values eg. {a: null, b: 0}
  2. In case we can agree on the definition in 1, we notice that the current JSON encoder implementation is not conforming this definition.
    Since structs that encode to {} are not skipped for fields with the omitempty tag.
    Now we have to decide if it is desirable to fix this: is this a bug or a feature 😉?
    The https://github.com/go-json-experiment/json project seems to acknowledge this error and fixes this issue.

  3. The omitempty tag is not compatible with a custom MarshalJSON function.
    If MarshalJSON returns null, this value is not omitted.
    In my opinion this is a bug, a field tagged with omitempty should never result in e.g. a null JSON-value.
    .
    Letting MarshalJSON return nil could be an option to generally omit a value, however this is not conditional on whether the omitempty tag is present or not.

  4. Finally, many developers want an option to indicate that a custom object is empty.

    • eg. value true instead of false should be considered empty for your usecase and should be omitted
    • eg. the empty value for a time.Time value should be considered as empty
  5. Other proposals like adding omitzero or an extra ShouldOmit() interface are not discussed here, since I think they are no bugs and less plausible to be implemented.


Bugs in 2 and 3 can be fixed as follows:
encode.go - line 750:

// fast path omitEmpty
if f.omitEmpty && isEmptyValue(fv) {
	continue
}
omitEmptyResetLocation := e.Len()

e.WriteByte(next)
if opts.escapeHTML {
	e.WriteString(f.nameEscHTML)
} else {
	e.WriteString(f.nameNonEsc)
}
opts.quoted = f.quoted

startLen := e.Len()
f.encoder(e, fv, opts)
newLen := e.Len()

if f.omitEmpty && (newLen-startLen) <= 5 {
	// using `Next` and `bytes.NewBuffer` we can modify the end of the
	// underlying slice efficiently (without doing any copying)
	fullBuf := e.Next(newLen) // extract underlying slice from buffer
	switch string(fullBuf[startLen:newLen]) {
	case "false", "0", "\"\"", "null", "[]":
		// reconstruct buffer without zero value
		e.Buffer = *bytes.NewBuffer(fullBuf[:omitEmptyResetLocation])
		continue
	default:
		// reconstruct original buffer
		e.Buffer = *bytes.NewBuffer(fullBuf)
	}
}
next = ','

Feature 4 can be added as follows:
encode.go - line 341:

// Emptyable is the interface implemented by types that
// can provide a function to determine if they are empty.
type Emptyable interface {
	IsEmpty() bool
}

var emptyableType = reflect.TypeOf((*Emptyable)(nil)).Elem()

func isEmptyValue(v reflect.Value) bool {
	// first check via the type if `Emptyable` is implemented
	// this way, we can prevent the more expensive `Interface`
	// conversion if not required
	if v.Type().Implements(emptyableType) {
		if v, ok := v.Interface().(Emptyable); ok {
			return v.IsEmpty()
		}
	}

	switch v.Kind() {
	case reflect.Array, reflect.Map, reflect.Slice, reflect.String:
		return v.Len() == 0
	case reflect.Bool:
		return !v.Bool()
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return v.Int() == 0
	case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
		return v.Uint() == 0
	case reflect.Float32, reflect.Float64:
		return v.Float() == 0
	case reflect.Interface, reflect.Ptr:
		return v.IsNil()
	}
	return false
}

NOTE: Instead of adding a Emptyable interface, the fix for 2 and 3 could be combined with a MarshalJSON function that returns an empty value. Meaning that if we accept the proposed fix for 2 and 3, we also fix 4.

NOTE2 (edit): by replacing case "false", "0", "\"\"", "null", "[]", "{}": with case "false", "0", "\"\"", "null", "[]":; the solution for 2 and 3 becomes a solution for 3 only, this makes it maybe easier to get accepted? -> I'll probably create a PR for this change to gather some extra feedback.

@joeshaw I tried to make this analysis as complete as possible. Please let me know in case I missed something.

@rsc
Copy link
Contributor

rsc commented Jun 15, 2022

/cc @dsnet for any thoughts about how we might want to proceed here.

@inteon
Copy link

inteon commented Jun 17, 2022

@rcs @joeshaw @dsnet #53425 is an implementation that solves 'bug' 3 (as described above).
As mentioned in https://go-review.googlesource.com/c/go/+/412856, it currently awaits approval/ disapproval.
Any ideas on how to get this approved?

@dsnet
Copy link
Member

dsnet commented Jun 17, 2022

@inteon, the difficult lies not in the implementation but
in making sure that we have semantics that works well with the entire ecosystem.
I think it would be helpful to avoid going too much in the weeds of what the implementation would look like.
We should discuss implementation details at high-level to the degree that:

  1. we know it is actually possible to implement (e.g., that it depends on functionality that Go reflection provides), or
  2. what general performance implications a given design will have (e.g., that it won't require a two-pass scan of the value).

Let's step back and ignore how the json package currently omits object members.

In a brand new world, how would we like object members to be omitted?

  • Should it be omitted based on the Go type system?
    • Perhaps based on the Go value itself where certain Go values
      are defined as being omitted (e.g., false, MyStruct{}, etc.)?
    • Perhaps based on some special method on the Go value (e.g., IsZero or ShouldOmit)?
  • Should it be omitted based on the JSON type system?
    • Perhaps based on whether the struct field value encodes as certain JSON values
      which are defined as being omitted (e.g., null, "", {}, etc.)?
    • Customized encoding (e.g., via MarshalJSON) would affect whether
      the JSON value is to be treated as being omitted.

Which is correct?
Both have legitimate usages and neither covers all of the common use-cases by itself.
Therefore, I believe we should support both.

Omit based on Go value

Let's suppose there was a omitgozero tag that omits a Go struct field
if the field is the zero value (per reflect.Value.IsZero).
If the field implements an IsZero() bool method,
then that method is called instead to determine whether to omit the field.

Properties of this approach:

  • A "zero value" is well-defined in the Go language.
    • In contrast, today's definition of being a Go
      "false, 0, a nil pointer, a nil interface value, and any empty array, slice, map, or string"
      is harder to remember.
    • Also, today's definition does not handle several common use-cases, such as:
      • Being able to omit a nil slice, but not an empty, non-nil slice.
        • See "omit based on JSON value" for omitting empty slices (regardless of nil-ness).
      • Being able to omit the zero value of a Go struct or array.
    • This works well with types like netip.Addr where the zero value is the only invalid value.
  • In general, the "zero Go value" definition has the nice property that
    marshaling a Go struct will emit only the non-zero fields such that unmarshaling the JSON output
    back into a Go struct will generally return the same result (per reflect.DeepEqual)
  • Checking for the "zero value" has first-class support in Go reflection via reflect.Value.IsZero.
  • Because it has first-class support, it can be optimized in ways that the json package simply cannot hope to achieve.
    For example, this optimization makes reflect.Value.IsZero an order of magnitude faster for comparable types.
    • Alternative definitions that exclude unexported fields are both more complicated to explain and
      not as performant since json would have to implement its own recursive walk through a struct.
      • Attempts to exclude unexported fields or non-JSON serializable fields seems to actually want
        omission to be defined in terms of the JSON value (see below) and not the Go value.
      • It cannot leverage the performance of reflect.Value.IsZero.
  • Respecting a custom IsZero method allows users to provide their own custom definition of "zero".
    • This works well with time.Time.IsZero which handles cases of zero time beyond just being time.Time{}.

Omit based on JSON value

Let's suppose there was a omitjsonempty tag that omits a Go struct field
if it would have encoded as an empty JSON value,
which we define as being a JSON null, "", {}, or [].

Properties of this approach:

  • The application of this is recursively effective:
    • For example consider the following:
      type MyStruct struct {
          XXX string         `json:"-"`
          Foo string         `json:",omitgozero"`
          Bar *MyStruct      `json:",omitjsonempty"`
          Baz []int          `json:",omitjsonempty"`
          Raw json.RawMessge `json:",omitjsonempty"`
      }
      json.Marshal(MyStruct{
          XXX: "ignored",               // this is omitted since it is ignored
          Bar: &MyStruct{
              Baz: []int{},             // this is omitted since is encodes as `[]`
          },                            // this is omitted since it encodes as `{}`
          Raw: json.RawMessage("null"), // this is omitted since it encodes as `null`
      })                                // thus, this just ouputs `{}`
    • This allows omitjsonempty to handle cases that omitgozero cannot.
  • Whether a field value is to be omitted is dependent on any custom JSON
    serialization that would occur (e.g., via MarshalJSON).
  • We do not support omitting arbitrary JSON values.
    For our definition, every empty JSON value has a fixed width (at most 4 bytes).
    This is necessary for when we implement true streaming
    where it is easier and faster to ensure that we never flush the empty JSON value
    so that the encoder can unwrite it if necessary.
  • It might seem slow to encode a JSON value only to unwrite it.
    In many cases, the internal implementation can short-circuit the actual
    encoding if it knows that it would have been an empty JSON value.
  • Our definition of an empty JSON value does not include 0 since it is
    ill-defined whether -0, 0e123456, 0.000000, -0.000e+999 are included.
    Also, 0 is not what most users think of as being "empty".

Composability

Both omitgozero and omitjsonempty would be composable.
You can specify both if you want, where the effect is the logical OR of the two.
That is, if the object member would be omitted under
either omitgozero or omitjsonempty then it is omitted.

In many cases, the behavior of omitgozero and omitjsonempty will be identical.
Users should use omitgozero when both have the same effect.

Compatability

The existing semantics of omitempty cannot be changed because we are bound
by the Go 1 compatability guarantees.

We would need to do one of the following:

  1. Give new semantics to a new struct tag (e.g., omitjsonempty), or
  2. Have omitempty respect a new method that does appear by accident
    in existing code (e.g., OmitJSONField() bool), or
  3. Have a new major version of the json package.

Personally, I like the names of omitzero and omitempty since they are shorter,
but changing the semantics of omitempty in the current json package is not acceptable,
which suggest that we may need new names (unless we have a new major-version of the json package).

@inteon
Copy link

inteon commented Jun 18, 2022

@dsnet Thank you for your extended answer

Ok, so 'fixing' omitempty combined with MarshalJSON is considered a breaking change.
I guess adding support for IsZero or IsEmpty would also be considered breaking, since these methods could appear by accident in existing code.
-> this insight can already result in closing a lot of issues/ PRs on this repo

I generally agree with your assessment, but I think that it is important to note that even when using a new major version of the json package, developers would have much more trouble changing all json tags. e.g. because they embed a struct from a library. Therefore, I'm not really keen on the idea of changing the tag names or adding tags... 🤔
My preference here would be to keep omitempty but change its meaning to the meaning you described for omitjsonempty.

Am I correct to assume that all work on a possible json v2 is done here: https://github.com/go-json-experiment/json (looks very promising)? Could you give a timeline on when you would be open to receive feedback/ PRs on this future work?

@icholy
Copy link

icholy commented Jun 18, 2022

I'd prefer a omitzero with omitgozero semantics. Then an omitnull which omits any value which would have marshalled to null.

@dsnet
Copy link
Member

dsnet commented Jun 18, 2022

I think that it is important to note that even when using a new major version of the json package, developers would have much more trouble changing all json tags. e.g. because they embed a struct from a library.

Agreed. This is something I'm concerned about.

It's unclear whether this is an advantage or disadvantage, but if the new "json" package redefines the meaning of omitempty to be omitjsonempty, existing usages will behave the same for most cases (excluding numeric and boolean types; perhaps we should fold false and 0 into the definition of omitjsonempty? 🤷‍♂️).

Am I correct to assume that all work on a possible json v2 is done here: https://github.com/go-json-experiment/json (looks very promising)? Could you give a timeline on when you would be open to receive feedback/ PRs on this future work?

This is just an experimental project worked on by @dsnet, @mvdan, @johanbrandhorst, @rogpeppe, and @ChrisHines. We are not operating in any official capacity (other than @rsc being aware of the work). This work will be presented in the future as a Go proposal when it's ready. Even then, the possible results are highly varied.

The project started off somewhat ad-hoc when @mvdan tweeted his thoughts about a hypothetical v2 "json" package. From the beginning, the project had an ambitious scope trying to address most of the open issues with the JSON package in a unified way such that the features are mostly orthogonal and don't interact poorly with each other.

Progress has been slow since none of us work on Go full-time, and contribute on the side. For myself, @tailscale has been very generous in giving me the freedom to work on this aside from my other duties there.

Of relevance to this issue, go-json-experiment currently implements what I described above as omitgozero and omitjsonempty as just omitzero and omitempty.

@rsc
Copy link
Contributor

rsc commented Jul 13, 2022

Met with @dsnet and others yesterday. Let's put this on hold until we figure out what to do with their work.

@rsc
Copy link
Contributor

rsc commented Jul 13, 2022

Placed on hold.
— rsc for the proposal review group

@rsc rsc moved this from Active to Hold in Proposals (old) Jul 13, 2022
@marwatk
Copy link
Contributor

marwatk commented Jul 14, 2022

As someone currently vendoring encoding/xml for this reason I'd like to point out this discussion applies equally there.

I'd also like to echo @smikulcik and say that while new tags like omitgozero, omitjsonempty or omitzero would help they will be inevitably abused and make code less clear in situations where the real goal is omitting a field in some non-zero case.

For example, only allowing omission based on predetermined reasons will force a dev to essentially lie in IsZero:

type EphemeralTime time.Time

type SomeType struct {
  // ValidUntil is only present while the thing is valid
  ValidUntil EphemeralTime `json:"valid_until,omitzero"`
}

func (t EphemeralTime) IsZero() {
  if t.Now().After(t) {
    return true // Lie just to get it omitted
  }
  // Actual zero check
}

While that's contrived for demonstration, there are other use-cases like non-zero defaults you may want to exclude.

If, instead, there was some mechanism for omitfunc either via tag or predefined interface (e.g. OmitJSON) the user is allowed to control when to omit, which is the real goal and avoids another proposal after this one for the next reason someone needs to omit a field.

@lucastheisen
Copy link

lucastheisen commented Jul 14, 2022

I agree with @marwatk . It is not an uncommon case. Think of serializing an application configuration file. That configuration is likely full of non-zero default values. But when serializing that configuration i would not want to include everything, only the customized fields. It is much easier to maintain that way knowing that only the things i customize are listed. The use case here would be a configuration front end that reads in current configuration, then merges in your changes (from say a web page) then serializes that out to the configuration file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Hold
Development

No branches or pull requests