separate unsigned integer decoding to support full range of uint64 #782

jmank88 · 2022-05-22T13:18:34Z

Drafted a quick fix for #781

Happy to refactor, add tests, etc.

pelletier · 2022-05-23T11:42:48Z

Thank you for submitting the PR! I think it's directionally right, but I'd love to see if we can refactor it.

Maybe having a parseInteger function return the cleaned up bytes + base (like (data []byte, int base), and have unmarshalInteger pick which strconv.Parse* function to call.

An other option is to pass a boolean to parseInteger to tell it whether to use signed or unsigned.

For performance I think the latter is better (because it allows to ultimately remove the underscores cleaning that allocates), but the first one is probably easier to reason about. If you're willing to update the PR I'm happy with either; we can always revisit it later. As for testing, as long as the coverage / report action is happy, I'm happy :)

jmank88 · 2022-05-23T23:37:27Z

I thought the latter sounded better initially too, but it seems to require returning the two different types back up the stack still. I went with a version of the former, which actually ended up enabling further simplification, but now I've simplified away errors that were tested, so might need to dial it back... 🤔

jmank88 · 2022-05-23T23:40:35Z

unmarshaler.go

 	case reflect.Interface:
+		i, err := parseInteger(value.Data)


This interface case is still ambiguous, and thus limited to int64 currently.

pelletier · 2022-05-24T09:52:02Z

Thinking more about it, I remembered that the TOML spec only handles int64 numbers:

https://github.com/toml-lang/toml/blob/cbf3b13128fb717517afbd22f8fcb665a0b0b035/toml.md?plain=1#L459-L461

I think that's why I initially just went with strconv.ParseInt, even for unsigned numbers. Did you encounter this issue in the wild or were you testing specifically for overflows?

jmank88 · 2022-05-24T12:14:18Z

Oh, curious.

Did you encounter this issue in the wild or were you testing specifically for overflows?

We are converting from a legacy config which contains uint64 fields (as well as big.Int - which I guess is not actually supported natively either 🤔 ). Regardless of whether or not this is a fundamental problem, it was the asymmetry of being able to Encode/Marshal large values that cannot be Decoded/Unmarshaled that brought this to our attention. Should the encoder be limited to producing valid toml? Or is that on the users?

moorereason · 2022-05-24T14:23:29Z

Much discussion about this in toml-lang/toml#538.

As I understand it: implementations MAY support integers larger than int64 but can't guarantee that other implementations will support anything beyond int64. It would be valid but irregular TOML.

pelletier · 2022-05-28T17:09:33Z

Sorry for the slow response. I'm trying to think this through. I am a bit hesitant to support integers outside int64, as that's what the spec describes. The reason for it is that by deviating too much from the specification, people using go-toml just fine for some files may be locked into using it without realizing it – meaning the data is no longer portable, because other libraries may have different behavior for numbers greater than max int64. For folks who expect larger numbers, a type implementing encoding.TextMarshaler may be better suited – and actually portable. If we go down that route, I think @jmank88 is right: the library should prevent the user from serializing a uint64 greater than max int64, so that there is symmetry between encoding and decoding.

jmank88 · 2022-05-28T21:25:10Z

No problem. We came to similar a conclusion, and have accepted that we'll have to use quoted strings for large ints and high precision floats. Closing this since a pivot to limiting encoded values instead would be a fundamentally different PR.

As brought up on #782, there is an asymetry between numbers go-toml can encode and decode. Specifically, unsigned numbers larger than `math.Int64` could be encoded but not decoded. We considered two options: allow decoding of those numbers, or prevent those numbers to be decoded. Ultimately we decided to narrow the range of numbers to be encoded. The TOML specification only requires parsers to support at least the 64 bits integer range. Allowing larger numbers would create non-standard TOML documents, which may not be readable (at best) by other implementations. It is a safer default to generate documents valid by default. People who wish to deal with larger numbers can provide a custom type implementing `encoding.TextMarshaler`. Refs #781

pelletier added the bug Issues describing a bug in go-toml. label May 23, 2022

support decoding full uint64 range - fixes pelletier#781

e098af8

jmank88 force-pushed the uint64-full-range branch from c8227ce to e098af8 Compare May 23, 2022 23:30

jmank88 commented May 23, 2022

View reviewed changes

englercj mentioned this pull request May 25, 2022

uint64_t values cause compiler errors marzer/tomlplusplus#122

Closed

jmank88 closed this May 28, 2022

pelletier mentioned this pull request Jun 1, 2022

Encode: add bound check for uint64 > math.Int64 #785

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

separate unsigned integer decoding to support full range of uint64 #782

separate unsigned integer decoding to support full range of uint64 #782

jmank88 commented May 22, 2022

pelletier commented May 23, 2022

jmank88 commented May 23, 2022

jmank88 May 23, 2022

pelletier commented May 24, 2022

jmank88 commented May 24, 2022

moorereason commented May 24, 2022

pelletier commented May 28, 2022

jmank88 commented May 28, 2022

separate unsigned integer decoding to support full range of uint64 #782

separate unsigned integer decoding to support full range of uint64 #782

Conversation

jmank88 commented May 22, 2022

pelletier commented May 23, 2022

jmank88 commented May 23, 2022

jmank88 May 23, 2022

Choose a reason for hiding this comment

pelletier commented May 24, 2022

jmank88 commented May 24, 2022

moorereason commented May 24, 2022

pelletier commented May 28, 2022

jmank88 commented May 28, 2022