Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: unclear documentation for how json.Unmarshal and json.Unmarshaler interact with null #48646

Open
piersy opened this issue Sep 27, 2021 · 7 comments

Comments

@piersy
Copy link

@piersy piersy commented Sep 27, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/pierspowlesland/.cache/go-build"
GOENV="/home/pierspowlesland/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/pierspowlesland/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/pierspowlesland/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.1"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2076678693=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Ran this code - https://play.golang.org/p/dHiWmj8X7KD

Which shows json.Unmarshal overwriting a struct with nil when passed []byte("null")

What did you expect to see?

I'm not sure exactly.

The go-playground example seems to be operating correctly if looking at the documentation for json.Unmarshal

The JSON null value unmarshals into an interface, map, pointer, or slice by setting that Go value to nil. Because null is often used in JSON to mean “not present,” unmarshaling a JSON null into any other Go type has no effect on the value and produces no error.

But if looking at the documentation for json.Unmarshaller, it seems to not be operating correctly since it states that using json.Unmarshal to unmarshal []byte("null") should be a no-op.

By convention, to approximate the behavior of Unmarshal itself, Unmarshalers implement UnmarshalJSON([]byte("null")) as a no-op.

What did you see instead?

json.Unmarshal overwriting a struct with nil when passed []byte("null")

@mvdan
Copy link
Member

@mvdan mvdan commented Sep 28, 2021

You're unmarshaling into a pointer, and as per the docs you quoted, that gets set to nil when unmarshalling null. So that behavior is correct and properly documented. It doesn't really matter that the pointer is pointing to a struct; it could be pointing to a float64 and it would be the same reasoning.

The "by convention" bit is from f444b48. It does seem inconsistent at first sight, but note that it says "to approximate". My guess is that that's the best option we have - if you read the commit message, you'll see why we must always call UnmarshalJSON, even for nulls. And, to properly call UnmarshalJSON, the decoder may have to initialize a pointer to be non-nil, which already makes this case inconsistent when compared to when there's no UnmarshalJSON.

If we made the two cases explicitly consistent, then UnmarshalJSON may not be called on nulls, and RawMessage would not work.

I'm definitely open to suggestions to make the docs clearer. Though I don't think explaining all this nuance in detail in the godoc is a good idea :) cc @dsnet

@piersy
Copy link
Author

@piersy piersy commented Sep 28, 2021

Hi @mvdan, thanks for getting back to me so quickly!

I had a look at f444b48 and I think I my confusion stemmed from not having a clear understanding of the different approaches to unmarshaling null into values and pointers.

As I understand it unmarshaling null into a pointer sets that pointer to nil, whereas unmarshaling null into a value leaves the value untouched.

I think this behaviour could be made a little more explicit in the doc for UnmarshalJSON

Something like:

By convention, to approximate the behavior of Unmarshal when unmarshaling into a non pointer, Unmarshalers implement UnmarshalJSON([]byte("null")) as a no-op.

In the case of a pointer Unmarshal will handle null by setting the pointer to nil and UnmarshalJSON will not be called.

Also I'm not 100% sure I follow the reasoning for raw message not working if UnmarshalJSON is not called on nulls, could you elaborate a bit?

@mvdan
Copy link
Member

@mvdan mvdan commented Sep 28, 2021

Also I'm not 100% sure I follow the reasoning for raw message not working if UnmarshalJSON is not called on nulls, could you elaborate a bit?

RawMessage is a slice, so if the regular rules were followed, then unmarshaling null into it would simply set the slice to nil and not call UnmarshalJSON - thus making RawMessage rather pointless as a mechanism to always capture the original JSON input.

In the case of a pointer Unmarshal will handle null by setting the pointer to nil and UnmarshalJSON will not be called.

Did you verify that this is the case?

The way you modified the first sentence makes sense to me. Just beware that it's not just about pointers - the regular rules include all nilable type kinds that json knows how to work with: interfaces, maps, pointers, and slices. So it's slightly wrong if you talk about "pointer" and "non-pointer". Perhaps it would be more correct to modify the first sentence to say nilable or "pointer-like". Would like to hear @dsnet's thoughts too.

@piersy
Copy link
Author

@piersy piersy commented Sep 29, 2021

Hi @mvdan, thank you for clarifying that.

In the case of a pointer Unmarshal will handle null by setting the pointer to nil and UnmarshalJSON will not be called.

Did you verify that this is the case?

This is the code l used to verify this - https://play.golang.org/p/jD7nQmcKlwL

Thanks for pointing out that it's more than just pointer or non pointer. I actually don't think nillable can be used either, since we can see from the code example that the json.RawMessage instance r is not set to nil despite it being nilable. I believe this is because even though its underlying type is []byte, it's not a slice type, because it is a defined type (as opposed to aliased). So when unmarshaling null into it, it is not set to nil.

Given this, I think the following could be a more correct attempt to make the docs clearer.

If the Unmarshaler's type is one of interface, map, pointer or slice, Unmarshal will handle the json "null" value by setting the unmarshaler to nil. Otherwise UnmarshalJSON will be called, and to approximate the behavior of Unmarshal itself, Unmarshalers implement UnmarshalJSON([]byte("null")) as a no-op.

@mknyszek mknyszek changed the title encoding/json: Confusion over json.Unmarshal/json.Unmarshaller documentation encoding/json: unclear documentation for how json.Unmarshal and json.Unmarshaler interact with null Oct 4, 2021
@mknyszek mknyszek added this to the Backlog milestone Oct 4, 2021
@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Oct 4, 2021

CC @dsnet

@dsnet
Copy link
Member

@dsnet dsnet commented Oct 4, 2021

The documented rules in Unmarshal should be treated as an ordered list, rather than a unordered set.

As such, the 4th paragraph says:

To unmarshal JSON into a value implementing the Unmarshaler interface, Unmarshal calls that value's UnmarshalJSON method, including when the input is a JSON null. Otherwise, if the value implements encoding.TextUnmarshaler and the input is a JSON quoted string, Unmarshal calls that value's UnmarshalText method with the unquoted form of the string.

While, it is the 11th paragraph that says:

The JSON null value unmarshals into an interface, map, pointer, or slice by setting that Go value to nil. Because null is often used in JSON to mean “not present,” unmarshaling a JSON null into any other Go type has no effect on the value and produces no error.

Therefore, the presence of an UnmarshalJSON method should take precedence over anything that the 11th paragraph says. The exception is for pointers, but that's also called out in paragraph 3.

I don't think the 11th paragraph should have anything about Unmarshalers mentioned in it because that case should have been specified by a preceding paragraph (i.e., paragraph 4).

@mvdan
Copy link
Member

@mvdan mvdan commented Oct 9, 2021

@dsnet maybe we should clarify that "the following rules" apply from highest to lowest priority. I was probably reading them as if they were an unordered set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants