Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: clarify Decoder.InputOffset semantics #42571

Open
dsnet opened this issue Nov 13, 2020 · 3 comments
Open

encoding/json: clarify Decoder.InputOffset semantics #42571

dsnet opened this issue Nov 13, 2020 · 3 comments
Labels
Documentation NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Nov 13, 2020

The documentation on Decoder.InputOffset currently says:

The offset gives the location of the end of the most recently returned token and the beginning of the next token.

This explanation is somewhat incorrect since the offset cannot be a single value if there are intervening whitespace in-between two JSON tokens. Furthermore, the json package implicitly handles colons and commas, which are technically JSON tokens according to RFC 7159, section 2. May the offset ever be past these implicit colons and commas, or will it always be before them? The "token" in this context is a somewhat ambiguous.

The documentation should be clarified to say one of the following:

  • The offset may be between the end of the most recently returned token (exclusive), and before the beginning of the next token that may be returned (inclusive).
  • The offset points to the next character immediately after the most recently returned token.
  • The offset points to the first character of the next token that may be returned.

I chose the phrase "may be returned" to indicate that colons and commas aren't included since they aren't ever returned by the json API. I haven't checked the implementation yet to see what it actually returns.

The first definition gives more flexibility to how the Decoder may actually be implemented. The latter two definitions gives the user more assurance about what InputOffset actually means.

Updates #29688

@mvdan
Copy link
Member

mvdan commented Nov 13, 2020

I agree that the current definition is a bit ambiguous, happy to review a CL. I assume we should just document the current behavior.

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 13, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Nov 13, 2020
@Windsooon
Copy link

Windsooon commented Sep 9, 2021

I love the third idea, how about:

"The offset points to the first character of the next token that may be returned. (except colons or commas)"

The current comments and implement at src/encoding/json/stream.go didn't treat colons or commas as Token:

// A Token holds a value of one of these types:
//
// Delim, for the four JSON delimiters [ ] { }
// bool, for JSON booleans
// float64, for JSON numbers
// Number, for JSON numbers
// string, for JSON string literals
// nil, for JSON null

It looks like we have to change a lot of comments if we want to fix it.

@dsnet
Copy link
Member Author

dsnet commented Sep 9, 2021

"The offset points to the first character of the next token that may be returned. (except colons or commas)"

This is the most user friendly semantic, but breaks in other ways since it requires reading from the input stream until it locates the next token. This would cause the Decoder to block on a read (e.g., from a network socket) in cases when it otherwise would not have.

It looks like we have to change a lot of comments if we want to fix it.

A JSON token is a well defined in RFC section 2. I don't think we need to enumerate every possible case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants