Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

separate unsigned integer decoding to support full range of uint64 #782

Closed
wants to merge 1 commit into from

Conversation

jmank88
Copy link
Contributor

@jmank88 jmank88 commented May 22, 2022

Drafted a quick fix for #781

Happy to refactor, add tests, etc.

@pelletier
Copy link
Owner

Thank you for submitting the PR! I think it's directionally right, but I'd love to see if we can refactor it.

Maybe having a parseInteger function return the cleaned up bytes + base (like (data []byte, int base), and have unmarshalInteger pick which strconv.Parse* function to call.

An other option is to pass a boolean to parseInteger to tell it whether to use signed or unsigned.

For performance I think the latter is better (because it allows to ultimately remove the underscores cleaning that allocates), but the first one is probably easier to reason about. If you're willing to update the PR I'm happy with either; we can always revisit it later. As for testing, as long as the coverage / report action is happy, I'm happy :)

@pelletier pelletier added the bug Issues describing a bug in go-toml. label May 23, 2022
@jmank88
Copy link
Contributor Author

jmank88 commented May 23, 2022

I thought the latter sounded better initially too, but it seems to require returning the two different types back up the stack still. I went with a version of the former, which actually ended up enabling further simplification, but now I've simplified away errors that were tested, so might need to dial it back... 🤔

case reflect.Interface:
i, err := parseInteger(value.Data)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface case is still ambiguous, and thus limited to int64 currently.

@pelletier
Copy link
Owner

Thinking more about it, I remembered that the TOML spec only handles int64 numbers:

https://github.com/toml-lang/toml/blob/cbf3b13128fb717517afbd22f8fcb665a0b0b035/toml.md?plain=1#L459-L461

I think that's why I initially just went with strconv.ParseInt, even for unsigned numbers. Did you encounter this issue in the wild or were you testing specifically for overflows?

@jmank88
Copy link
Contributor Author

jmank88 commented May 24, 2022

Oh, curious.

Did you encounter this issue in the wild or were you testing specifically for overflows?

We are converting from a legacy config which contains uint64 fields (as well as big.Int - which I guess is not actually supported natively either 🤔 ). Regardless of whether or not this is a fundamental problem, it was the asymmetry of being able to Encode/Marshal large values that cannot be Decoded/Unmarshaled that brought this to our attention. Should the encoder be limited to producing valid toml? Or is that on the users?

@moorereason
Copy link
Contributor

Much discussion about this in toml-lang/toml#538.

As I understand it: implementations MAY support integers larger than int64 but can't guarantee that other implementations will support anything beyond int64. It would be valid but irregular TOML.

@pelletier
Copy link
Owner

Sorry for the slow response. I'm trying to think this through. I am a bit hesitant to support integers outside int64, as that's what the spec describes. The reason for it is that by deviating too much from the specification, people using go-toml just fine for some files may be locked into using it without realizing it – meaning the data is no longer portable, because other libraries may have different behavior for numbers greater than max int64. For folks who expect larger numbers, a type implementing encoding.TextMarshaler may be better suited – and actually portable. If we go down that route, I think @jmank88 is right: the library should prevent the user from serializing a uint64 greater than max int64, so that there is symmetry between encoding and decoding.

@jmank88
Copy link
Contributor Author

jmank88 commented May 28, 2022

No problem. We came to similar a conclusion, and have accepted that we'll have to use quoted strings for large ints and high precision floats. Closing this since a pivot to limiting encoded values instead would be a fundamentally different PR.

@jmank88 jmank88 closed this May 28, 2022
pelletier added a commit that referenced this pull request Jun 1, 2022
As brought up on #782, there is an asymetry between numbers go-toml can encode
and decode. Specifically, unsigned numbers larger than `math.Int64` could be
encoded but not decoded.

We considered two options: allow decoding of those numbers, or prevent those
numbers to be decoded.

Ultimately we decided to narrow the range of numbers to be encoded. The TOML
specification only requires parsers to support at least the 64 bits integer
range. Allowing larger numbers would create non-standard TOML documents, which
may not be readable (at best) by other implementations. It is a safer default to
generate documents valid by default. People who wish to deal with larger numbers
can provide a custom type implementing `encoding.TextMarshaler`.

Refs #781
pelletier added a commit that referenced this pull request Jun 1, 2022
As brought up on #782, there is an asymetry between numbers go-toml can encode
and decode. Specifically, unsigned numbers larger than `math.Int64` could be
encoded but not decoded.

We considered two options: allow decoding of those numbers, or prevent those
numbers to be decoded.

Ultimately we decided to narrow the range of numbers to be encoded. The TOML
specification only requires parsers to support at least the 64 bits integer
range. Allowing larger numbers would create non-standard TOML documents, which
may not be readable (at best) by other implementations. It is a safer default to
generate documents valid by default. People who wish to deal with larger numbers
can provide a custom type implementing `encoding.TextMarshaler`.

Refs #781
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issues describing a bug in go-toml.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants