Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: json.Number accepts quoted values by default #34472

Open
vearutop opened this issue Sep 23, 2019 · 28 comments
Open

encoding/json: json.Number accepts quoted values by default #34472

vearutop opened this issue Sep 23, 2019 · 28 comments
Labels
Milestone

Comments

@vearutop
Copy link
Contributor

@vearutop vearutop commented Sep 23, 2019

What version of Go are you using (go version)?

$ go version
go version go1.13 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/viacheslav.poturaev/Library/Caches/go-build"
GOENV="/Users/viacheslav.poturaev/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY="github.com/hellofresh"
GONOSUMDB="github.com/hellofresh"
GOOS="darwin"
GOPATH="/Users/viacheslav.poturaev/go"
GOPRIVATE="github.com/hellofresh"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.13/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.13/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/h7/7qg3nbt91bb6mgk2xtqqpwc40000gp/T/go-build850653985=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I've decoded a JSON value of "123" into json.Number.
https://play.golang.org/p/9Nwjn3rBFxI

What did you expect to see?

I've expected to see an error.

What did you see instead?

I've seen a successful result and an int64 value of 123 accessible with .Int64().

I could not find any documentation that json.Number accepts quoted string values by default and was expecting the opposite (as per "being precise" motivation expressed in #22463 (comment)).

Not sure if this is a documentation or implementation issue.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Sep 23, 2019

Looks like the Number validation changed very recently (in CL 195045; see #14702).

Is this still reproducible using gotip?

@vearutop

This comment has been minimized.

Copy link
Contributor Author

@vearutop vearutop commented Sep 23, 2019

Same behavior with go version devel +361ab73 Mon Sep 23 05:57:54 2019 +0000 darwin/amd64.

@vearutop

This comment has been minimized.

Copy link
Contributor Author

@vearutop vearutop commented Sep 23, 2019

I think such behavior is historical, at least in go1.12 it is still same. Maybe the documentation has to be more explicit.

@bcmills bcmills removed the WaitingForInfo label Sep 23, 2019
@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Sep 23, 2019

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Sep 23, 2019

Thanks for checking.

@bcmills bcmills added this to the Unplanned milestone Sep 23, 2019
@mvdan

This comment has been minimized.

Copy link
Member

@mvdan mvdan commented Sep 23, 2019

Some changes related to json.Number from @breml were merged recently, perhaps he has thoughts.

Without looking deeply into the issue, remember that json.Number is a string type underneath, so perhaps this was meant to be supported by design.

@kentquirk

This comment has been minimized.

Copy link

@kentquirk kentquirk commented Sep 23, 2019

json.Number is a string; the Int64() and Float64() methods return an error if the conversion is invalid.

I can't think of why this would need to exist unless it were intended to work this way.

To force conformance to a specific type in JSON, you could use the type directly instead of json.Number.

I think this could be clarified in documentation but changing the behavior would cause previously-working code to break.

@vearutop

This comment has been minimized.

Copy link
Contributor Author

@vearutop vearutop commented Sep 23, 2019

@kentquirk being a string for raw JSON value is ok, being a quoted string is less ok.

As you would not expect the following code to return valid integer:

strconv.Atoi(`"123"`)

JSON.org defines values as:

A value can be a string in double quotes, or a number, or true or false or null, or an object or an array.

Which makes an explicit difference between double-quoted string and number.

The documentation says:

A Number represents a JSON number literal.

I also think this issue in likely about clarifying documentation given the potential impact of behavior change. Though maybe it would make sense to revisit behavior for Go 2.

@breml

This comment has been minimized.

Copy link
Contributor

@breml breml commented Sep 23, 2019

I quickly checked the code and I have come to the following conclusions:

if v.Type() == numberType && (!fromQuoted || !isValidNumber(string(s))) {
	return fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", item)
}
  • I also quickly checked the test cases in decode_test.go and I could not find a test case testing this case, so I assume, it is not intended behavior.
  • Respective test cases could be easily added after the #14702 related test cases
    // #14702
    {
    in: `invalid`,
    ptr: new(Number),
    err: &SyntaxError{
    msg: "invalid character 'i' looking for beginning of value",
    Offset: 1,
    },
    },
    {
    in: `"invalid"`,
    ptr: new(Number),
    err: fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", `"invalid"`),
    },
    {
    in: `{"A":"invalid"}`,
    ptr: new(struct{ A Number }),
    err: fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", `"invalid"`),
    },
    {
    in: `{"A":"invalid"}`,
    ptr: new(struct {
    A Number `json:",string"`
    }),
    err: fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into json.Number", `invalid`),
    },
    {
    in: `{"A":"invalid"}`,
    ptr: new(map[string]Number),
    err: fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", `"invalid"`),
    },
@kentquirk

This comment has been minimized.

Copy link

@kentquirk kentquirk commented Sep 24, 2019

I feel like the entire point of json.Number is to deal with the complex mess of things that JS programs will do, because JS is way too casual about this stuff.

If you have JSON coming from JS that you don't control, you may see numbers encoded as strings. Worse, you may see them sometimes encoded as strings and sometimes as floats or integers, even for the same field. Your options to consume that in a Go API are:

  • treat it as json.RawMessage and then evaluate the byte stream and unmarshal it manually
  • treat is as a json.Number and use .Int64 or .Float64()

I would argue that the latter is preferable -- because if you knew for sure that it was it not a string, you could have simply unmarshaled it directly into an int or float directly.

If you're going to insist that json.Number force the value to be not-a-string, the correct fix should be to remove it entirely, because it is otherwise useless.

@vearutop

This comment has been minimized.

Copy link
Contributor Author

@vearutop vearutop commented Sep 24, 2019

Additional purpose of json.Number is dealing with float64 53-bit precision of int64.

@kentquirk

This comment has been minimized.

Copy link

@kentquirk kentquirk commented Sep 24, 2019

I think that's a good idea but can you point me to any code that supports that? If you marshal an int64 to a string you may store a value that doesn't successfully convert to a JS Number but as far as I can tell, json.Number won't care and can't help you there.

@Gobd

This comment has been minimized.

Copy link

@Gobd Gobd commented Sep 27, 2019

I don't think this behaviour should change because I rely on it as I'm sure many other people do.

@mvdan

This comment has been minimized.

Copy link
Member

@mvdan mvdan commented Oct 8, 2019

@Gobd unless I'm missing something, this would be trivial to fix when upgrading to Go 1.14, if the change is done in that release. The json.Number fields in question would just need to get the ,string tag option added. Adding that option shouldn't break earlier Go versions either.

Non-API breaking changes should be kept at a minimum for sure, but given that the program in question is relying on the opposite of what the documentation says, and it would have a very simple fix, I don't see why we should be conservative. Particularly when the new behavior would be more consistent.

There's also the argument that json.Number is especially useful since it allows both non-strings and strings when decoding at the same time. That was never documented and I think it was clearly a mistake. I personally think that keeping that behavior is wrong; why should json.Number allow both at the same time, while other types like int and float64 do not? It would make encoding/json more inconsistent.

I would argue that the latter is preferable -- because if you knew for sure that it was it not a string, you could have simply unmarshaled it directly into an int or float directly.

No - the purpose of json.Number has nothing to do with decoding strings. Its sole purpose is to not lose precision and formatting when encoding/decoding. See https://codereview.appspot.com/6202068.

If it has been used by some to accept strings and non-strings quickly, that's simply using an unintended and undocumented side effect, which I'm arguing against at the start of my comment.

@mvdan

This comment has been minimized.

Copy link
Member

@mvdan mvdan commented Oct 8, 2019

To support the idea that this was never an intended or documented use of json.Number - applying @breml's fix above doesn't break any of the existing tests or examples.

@vearutop

This comment has been minimized.

Copy link
Contributor Author

@vearutop vearutop commented Oct 8, 2019

The json.Number fields in question would just need to get the ,string tag option added.

The upgrade would be more complicated in case of non-object, e.g. "123" or ["123", 456, 789.0].

@Gobd

This comment has been minimized.

Copy link

@Gobd Gobd commented Oct 8, 2019

What benefit does @breml's fix have over documenting & adding tests to show that it supports strings & non-strings? I think that would be prefered vs changing current behaviour for what seems like no benefit.

@mvdan

This comment has been minimized.

Copy link
Member

@mvdan mvdan commented Oct 10, 2019

@Gobd the benefit is consistency, and actually following what has been documented since the beginning, including the purpose of the string option. The more special cases the json package documents and implements, the harder it is to understand and maintain in the long run.

The upgrade would be more complicated in case of non-object

Please read my longer comment above; I mention this specific use case.

For example, here's a simple piece of code to accept both a string and a non-string for integer values (not using json.Number since that accepts both today):

https://play.golang.org/p/bQBzvSvMv1e

It requires ten more lines of code, but I think that's completely acceptable when your restriction is that you need to accept different kinds of JSON at once. And as said before, json.RawMessage or interface{} are always an option.

@puellanivis

This comment has been minimized.

Copy link

@puellanivis puellanivis commented Feb 13, 2020

This behavior has been supporting quoted numbers since before go1.0. Changing it now (regardless of how good an argument could be made for it) seems like a bad idea. Just because an argument could be made that programs that are relying upon this behavior are incorrect, or buggy, does not change that people who are relying upon this behavior will feel like there was a violation of the go1 compat guarantee because their programs that worked before have stopped working now.

I will also note that the specification behavior in the proto3 JSON mapping is to always quote int64 values, because it is actually impossible by specification to make a guarantee of more than 53-bits of accuracy with JSON. That Go’s encoding/json permits higher than 53-bits of accuracy is irrelevant to the fact that proxies or any other handling of JSON is technically by spec permitted to drop anything more than 53-bits of accuracy.

So, it remains, the only way to guarantee full int64 range and accuracy is to quote the number in a string. Making json.Number restrictive to only unquoted numbers makes supporting this method of guaranteeing numeric precision more difficult.

@mvrhov

This comment has been minimized.

Copy link

@mvrhov mvrhov commented Feb 18, 2020

This change just broke the "field":"" which previously worked and the use of json.Number was on purpose. I don't have the option to change the json input I'm getting.

@mvdan

This comment has been minimized.

Copy link
Member

@mvdan mvdan commented Feb 18, 2020

@mvrhov there has been no change yet. If there was, this issue would not be open with the NeedsDecision label.

@puellanivis it's indeed a tough choice. If we can't come to an agreement on changing it (which seems to be the case), then it's probably too late or painful to change it. I can have a look at a documentation change for 1.15, to at least have the documentation reflect the internal behavior that too many people have been relying on already.

@breml

This comment has been minimized.

Copy link
Contributor

@breml breml commented Feb 18, 2020

@mvdan A documentation change would not be enough. If we decide to raise the status of the current behavior to "fully supported", then we need to add test cases for this as well.

In general, I am still in favor of this change, because in my opinion the ,string tag is there to support unmarshaling of quoted (numeric) values.

In regards to the go1 compat guarantee I do not agree with @puellanivis. I do not think, that this fix would be a violation, because this behavior is nowhere documented or specified. If we would treat this as a violation, we would end up in a situation, where we can never fix a bug in Go or the stdlib, because someone could potentially rely on the buggy behavior and it would break their code.

@breml

This comment has been minimized.

Copy link
Contributor

@breml breml commented Feb 18, 2020

Additionally, I would like to mention, that the documentation of type Number in the encoding/json package states the following:

A Number represents a JSON number literal.

In the JSON specification, a number literal is defined as (link):

integer fraction exponent

So the type Number is clearly specified and a JSON number literal is never quoted.

@puellanivis

This comment has been minimized.

Copy link

@puellanivis puellanivis commented Feb 19, 2020

I am not arguing that just because someone might depend upon a buggy behavior, that bugs cannot be fixed.

I am saying that changing unspecified behavior has to be done intelligently, and with respect to the users that might (accidentally) be relying upon that unspecified behavior.

Even if we “fix” all of this, or introduce a ,string tag, or some other way to continue supporting the current behavior, we should not just shove out a “your code now fails” patch all in one version. We should properly document that the behavior was unspecified, that it was never intended, and that it will stop working in the future.

And now after reading the two main standards ECMA-404, and RFC 8259, I don’t even think a change is necessary.

The ECMA-404 standard specifically says:

The goal of this specification is only to define the syntax of valid JSON texts. Its intent is not to provide any semantics or interpretation of text conforming to that syntax

And RFC 8259 also only defines only a grammar, not semantics.

So, even with json.Number accepting quoted integers, this package accepts all text that conforms to the JSON grammars that both standards layout. And there are no standards pointing to any MUST semantics here, both of them only cover syntax. And a number encoded in a string is valid syntax.

@breml

This comment has been minimized.

Copy link
Contributor

@breml breml commented Feb 20, 2020

@puellanivis I am not sure, if I get your point in your last comment.

The documentation of json.Number states: "A Number represents a JSON number literal.".
A JSON number literal has a clearly defined syntax and this syntax does not include any quotes.

Of course the encoding/json package accepts all text that conforms to the JSON grammars, but if you want to decode a quoted string (maybe containing a character sequence, that is a valid JSON number literal), json.Number is not the right target type, because the documentation limits the usage of json.Number to "JSON number literals".

@puellanivis

This comment has been minimized.

Copy link

@puellanivis puellanivis commented Feb 20, 2020

Regardless of what the documentation says, it has, since before go1.0, accepted string-quoted numeric values as well. The documentation and the implementation are in disagreement. The solution is to either fix the code, or fix the documentation.

Because the JSON standard does not dictate semantics, there is no ground to stand on to say that we cannot have json.Number support quoted numeric values like it has for over 9 years.

Nothing breaks if we fix the documentation, it is a strict superset of functionality. However, many things will break if we force the code to fit the documentation.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Feb 20, 2020

To put it a different way: we can consider this an implementation bug (“json.Number accepts quoted values when its documentation says it should only accept number literals”), or a documentation bug (“json.Number's documentation says it represents a JSON number literal, when it actually represents any reasonable JSON encoding of a number”).

@puellanivis's point, as I understand it, is that the choice between those interpretations should be informed by existing usage.

@puellanivis

This comment has been minimized.

Copy link

@puellanivis puellanivis commented Feb 20, 2020

Yes, that is way more clearly what I wanted to say.

Though, I wouldn’t say that I am firmly against changing the code to fit the documentation, but it should at least be carefully measured about what the effect either decision would lead to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants
You can’t perform that action at this time.