proposal: encoding/json: add a function to report whether a []byte represents a JSON array before unmarshal #53170
Comments
Running a parser/validator over the entire input just to check that it's a valid JSON array, only to then run it again to unmarshal the input, doesn't seem very efficient. If this is to be used in the context of a |
Good point! I agree that just calling it "unsafe" is a weak argument, not to mention imprecise. What I meant by "unsafe" is that, IMHO, implementing this kind of verification by hand is less reliable than using a peer-reviewed feature of the standard library. Also, could you elaborate on the usage of a The whole point of utilizing the |
That's my point @carlmjohnson. With a With the proposed function, that method would be something like this:
func (nn *names) UnmarshalJSON(b []byte) error {
	if json.Array(b) {
		if err := json.Unmarshal(b, (*[]named)(nn)); err != nil {
			return err
		}
	} else {
		var n named
		if err := json.Unmarshal(b, &n); err != nil {
			return err
		}
		*nn = names{n}
	}
	return nil
}
Which I believe performs better since it avoids a second invocation of |
Given that the subsequent If the goal is only to choose which type to pass into the second argument of Perhaps this generalizes as:
func FirstToken(data []byte) (Token, error) {
	// ...
}
My intent here is that this is functionally equivalent to calling The motivating example would then look like this:
if json.NextToken(b) == json.Delim('[') {
	if err := json.Unmarshal(b, (*[]named)(nn)); err != nil {
		return err
	}
} else {
	var n named
	if err := json.Unmarshal(b, &n); err != nil {
		return err
	}
	*nn = names{n}
}
Perhaps for improved readability this would also benefit from some extra constants:
const ObjectOpen = Delim('{')
const ObjectClose = Delim('}')
const ArrayOpen = Delim('[')
const ArrayClose = Delim(']')
...which would then allow The
switch json.NextToken(b) {
case json.ArrayOpen:
	// ...
case json.ObjectOpen:
	// ...
}
... though admittedly most of the other implementations of |
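The first-token idea in the comment above can already be approximated with the existing Decoder.Token method. The following is a minimal sketch under stated assumptions, not the proposed API: firstToken, startsWithArray and decodeNames are hypothetical local helpers (the comment itself calls the function both FirstToken and NextToken), and the named/names types are modeled loosely on the thread's example with an assumed "name" field.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// firstToken is a hypothetical helper (not part of encoding/json). It
// returns the first JSON token of data without decoding the rest.
func firstToken(data []byte) (json.Token, error) {
	return json.NewDecoder(bytes.NewReader(data)).Token()
}

// startsWithArray reports whether the first token of data opens an array.
func startsWithArray(data []byte) (bool, error) {
	tok, err := firstToken(data)
	if err != nil {
		return false, err
	}
	return tok == json.Delim('['), nil
}

// named and names are illustrative types; the field is an assumption.
type named struct {
	Name string `json:"name"`
}

type names []named

// UnmarshalJSON accepts either a single object or an array of objects.
func (nn *names) UnmarshalJSON(b []byte) error {
	isArray, err := startsWithArray(b)
	if err != nil {
		return err
	}
	if isArray {
		// Convert to *[]named so this method is not invoked recursively.
		return json.Unmarshal(b, (*[]named)(nn))
	}
	var n named
	if err := json.Unmarshal(b, &n); err != nil {
		return err
	}
	*nn = names{n}
	return nil
}

// decodeNames is a small convenience wrapper for the usage in main.
func decodeNames(b []byte) (names, error) {
	var nn names
	err := json.Unmarshal(b, &nn)
	return nn, err
}

func main() {
	for _, in := range []string{`{"name":"a"}`, `[{"name":"a"},{"name":"b"}]`} {
		nn, err := decodeNames([]byte(in))
		fmt.Println(len(nn), err)
	}
}
```

Note that reading the first token does not validate the rest of the input; in this sketch the later json.Unmarshal call is what rejects malformed data.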
Problem statement
Many web applications use JSON as their primary data exchange format. Some of them even accept different JSON shapes as valid arguments for a single method. For example, a common use case is to accept both of the following JSON documents as request bodies on an HTTP API endpoint:
However, implementing this kind of behavior with the library as it stands is not trivial, and may lead to inefficient or even wrong implementations. There may be several other ways to do this, but I will focus on two:
1. Unmarshal the received JSON into an interface{}, then use a type switch to decide which concrete type the unmarshaled value should be converted to.
2. Inspect the []byte in place to check whether it starts with { or [.
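For illustration, the two approaches listed above might look roughly like this. It is only a sketch: the named type, its "name" field, and both function names are assumptions made for the example and do not come from the issue.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// named is a hypothetical payload type used only for illustration.
type named struct {
	Name string `json:"name"`
}

// decodeViaInterface sketches approach 1: unmarshal into interface{} and
// switch on the dynamic type. The input ends up being decoded twice.
func decodeViaInterface(b []byte) ([]named, error) {
	var v interface{}
	if err := json.Unmarshal(b, &v); err != nil {
		return nil, err
	}
	switch v.(type) {
	case []interface{}:
		var out []named
		err := json.Unmarshal(b, &out)
		return out, err
	case map[string]interface{}:
		var n named
		if err := json.Unmarshal(b, &n); err != nil {
			return nil, err
		}
		return []named{n}, nil
	default:
		return nil, fmt.Errorf("unexpected JSON value of type %T", v)
	}
}

// decodeViaFirstByte sketches approach 2: skip leading JSON whitespace and
// look at the first remaining byte before choosing the target type.
func decodeViaFirstByte(b []byte) ([]named, error) {
	trimmed := bytes.TrimLeft(b, " \t\r\n")
	if len(trimmed) > 0 && trimmed[0] == '[' {
		var out []named
		err := json.Unmarshal(b, &out)
		return out, err
	}
	var n named
	if err := json.Unmarshal(b, &n); err != nil {
		return nil, err
	}
	return []named{n}, nil
}

func main() {
	for _, in := range []string{`{"name":"a"}`, `[{"name":"a"},{"name":"b"}]`} {
		viaIface, err1 := decodeViaInterface([]byte(in))
		viaByte, err2 := decodeViaFirstByte([]byte(in))
		fmt.Println(viaIface, err1, viaByte, err2)
	}
}
```

The hand-written byte check is short, but it has to get whitespace handling right on its own, which is the kind of detail the proposal would rather leave to the standard library.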
Even though option 1 relies only on native Go features, it is far from optimal because of the underlying reflection and the unmarshal itself. Assuming the raw []byte is valid JSON, there's no need to unmarshal it entirely just to detect whether it represents a list. As for option 2, although it is faster than option 1, many developers may consider it error-prone and unsafe to deploy in a production environment. I like its simplicity, but I think it could be even simpler.
Proposal
As a solution, I propose adding a new exported function to package encoding/json that reports whether a raw []byte represents a JSON array (i.e. is valid JSON and starts with [). The signature could be something as follows, similar to Valid:
With this function available, one could easily implement the use case described above as follows:
As for its underlying implementation, I guess it could not only use the same checkValid function to detect whether data represents valid JSON before searching for [, but also use the already implemented methods of the scanner
struct as a reference for the JSON traversal. I'd love to contribute a PR if you agree it's a valid addition to the library.
OBS: I'm still not sure about the name itself. I admit I don't feel very comfortable with the noun Array, but it's the same noun used in the JSON RFC. Should it be called slice? Any ideas?
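The implementation idea described above (validate first, then look for a leading [) can be approximated outside the standard library with the exported json.Valid. This is only a sketch: isJSONArray is a hypothetical name, and a real implementation inside encoding/json could presumably use the unexported scanner to avoid the separate pass over leading whitespace.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isJSONArray is a hypothetical stand-in for the proposed function. It
// reports whether data is valid JSON whose top-level value is an array.
func isJSONArray(data []byte) bool {
	if !json.Valid(data) {
		return false
	}
	// Skip the whitespace characters JSON allows before a value, then
	// check the first significant byte.
	for _, c := range data {
		switch c {
		case ' ', '\t', '\r', '\n':
			continue
		default:
			return c == '['
		}
	}
	return false
}

func main() {
	for _, in := range []string{`[1, 2]`, `{"a": [1]}`, `[1,`} {
		fmt.Printf("%q -> %v\n", in, isJSONArray([]byte(in)))
	}
}
```

As the first comment in the thread points out, a check like this scans the whole input once for validation before json.Unmarshal scans it again, so it trades some efficiency for a simpler call site.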