fmt: %q quotes out-of-range codepoints to strings containing invalid runes #51526
Comments
Change https://go.dev/cl/390424 mentions this issue:
I think this is happening in strconv.QuoteRune, not fmt. It's also what the Unicode replacement character is for. That rune is the explicit error indicator.
Seems plausible, but
Maybe? Again, at the very least I would expect to see an explicit description in the documentation: the word “replacement” doesn't even appear in the
I was about to write what @rsc said. I don't think fmt should behave differently from strconv in this regard. But perhaps documentation could be improved. This is not what I would call a "recovered error", or even a formatting error (such as use of an invalid verb or missing argument). This is just a question of how to represent invalid UTF-8, and there is lots of consistent precedent for that.
Change https://go.dev/cl/390436 mentions this issue:
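To make the comparison with strconv concrete, here is a small sketch (not taken from the thread) that checks whether the two packages quote invalid code points identically. `sameQuoting` is a hypothetical helper name and the sample values are assumptions chosen for illustration; `strconv.QuoteRune`, `utf8.ValidRune`, and `fmt.Sprintf` are standard library calls.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// sameQuoting is a hypothetical helper reporting whether fmt's %q verb and
// strconv.QuoteRune produce identical text for r.
func sameQuoting(r rune) bool {
	return fmt.Sprintf("%q", r) == strconv.QuoteRune(r)
}

func main() {
	// 'x' is valid; the rest are a surrogate half, a value above 0x10FFFF,
	// and a negative value (all illustrative choices).
	for _, r := range []rune{'x', 0xD800, 0x110000, -1} {
		fmt.Printf("valid=%-5v strconv=%s fmt=%q same=%v\n",
			utf8.ValidRune(r), strconv.QuoteRune(r), r, sameQuoting(r))
	}
}
```

If the two packages agree, as the comments above suggest, every row reports `same=true`, and the invalid inputs are indistinguishable in both columns.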
Also switch float64 NaN encoding to use hexadecimal, and accept hexadecimal encoding for all other integer types too. (That gives us the flexibility to change the encodings in either direction in the future without breaking earlier Go versions.)

Out-of-range runes encoded using "%q" were previously replaced with the Unicode replacement character, losing their values.

Out-of-range ints and uints on 32-bit platforms were previously rejected. Now they are wrapped instead: an “interesting” case with a large int or uint found on a 64-bit platform likely remains interesting on a 32-bit platform, even if the specific values differ.

To verify the above changes, I have made TestMarshalUnmarshal accept (and check for) arbitrary differences between input and output, and added test cases that include values in valid but non-canonical encodings.

I have also added round-trip fuzz tests in the opposite direction for most of the types affected by this change, verifying that a marshaled value unmarshals to the same bitwise value.

Updates #51258
Updates #51526
Fixes #51528

Change-Id: I7727a9d0582d81be0d954529545678a4374e88ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/390424
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
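As an illustration of why a hexadecimal fallback preserves values that "%q" loses, here is a minimal sketch of a lossless rune round trip. It is not the code from the change itself: `encodeRune` and `decodeRune` are hypothetical names, and the choice of a `0x`-prefixed 32-bit encoding for invalid runes is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// encodeRune is a hypothetical encoder: valid code points become quoted
// character literals; anything else becomes a hexadecimal integer so the
// exact 32-bit value survives.
func encodeRune(r rune) string {
	if utf8.ValidRune(r) {
		return strconv.QuoteRune(r)
	}
	return fmt.Sprintf("%#x", uint32(r))
}

// decodeRune is the hypothetical inverse of encodeRune.
func decodeRune(s string) (rune, error) {
	if strings.HasPrefix(s, "'") {
		// For a character literal, Unquote returns a one-character string.
		u, err := strconv.Unquote(s)
		if err != nil {
			return 0, err
		}
		r, _ := utf8.DecodeRuneInString(u)
		return r, nil
	}
	n, err := strconv.ParseUint(s, 0, 32)
	if err != nil {
		return 0, err
	}
	return rune(uint32(n)), nil // reinterpret the 32 bits as a rune
}

func main() {
	for _, r := range []rune{'x', '\n', utf8.RuneError, 0xD800, 0x110000, -1} {
		enc := encodeRune(r)
		dec, err := decodeRune(enc)
		fmt.Printf("%s round-trips: %v (err=%v)\n", enc, dec == r, err)
	}
}
```

With this scheme a surrogate half, a value above 0x10FFFF, and a negative value each decode back to the original bits instead of collapsing to U+FFFD.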
Change https://go.dev/cl/390816 mentions this issue:
…ints and runes

Also switch float64 NaN encoding to use hexadecimal, and accept hexadecimal encoding for all other integer types too. (That gives us the flexibility to change the encodings in either direction in the future without breaking earlier Go versions.)

Out-of-range runes encoded using "%q" were previously replaced with the Unicode replacement character, losing their values.

Out-of-range ints and uints on 32-bit platforms were previously rejected. Now they are wrapped instead: an “interesting” case with a large int or uint found on a 64-bit platform likely remains interesting on a 32-bit platform, even if the specific values differ.

To verify the above changes, I have made TestMarshalUnmarshal accept (and check for) arbitrary differences between input and output, and added test cases that include values in valid but non-canonical encodings.

I have also added round-trip fuzz tests in the opposite direction for most of the types affected by this change, verifying that a marshaled value unmarshals to the same bitwise value.

Updates #51258
Updates #51526
Fixes #51528

Change-Id: I7727a9d0582d81be0d954529545678a4374e88ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/390424
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
(cherry picked from commit 7419bb3)
Reviewed-on: https://go-review.googlesource.com/c/go/+/390816
Trust: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
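The "wrapped instead of rejected" behavior for integers can be sketched as follows. This is an assumption-laden illustration, not the code from the change: `parseWrapped32` is a hypothetical helper, and `int32` stands in for the platform-sized `int` on a 32-bit system.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseWrapped32 is a hypothetical helper: it parses s as a 64-bit integer
// (decimal or 0x-prefixed hexadecimal) and wraps the result to 32 bits
// rather than reporting a range error.
func parseWrapped32(s string) (int32, error) {
	n, err := strconv.ParseInt(s, 0, 64)
	if err != nil {
		return 0, err
	}
	return int32(n), nil // the conversion keeps only the low 32 bits
}

func main() {
	for _, s := range []string{"42", "2147483648", "0x100000001"} {
		v, err := parseWrapped32(s)
		fmt.Println(s, "->", v, err)
	}
}
```

A value recorded on a 64-bit platform therefore still yields some input on a 32-bit one, even though the specific value differs.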
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?
Yes.
What did you do?
Called `fmt.Sprintf("%q", x)` with various `int32` values that are not valid as UTF-8 runes (https://go.dev/play/p/wbvbvuKIWCC?v=gotip).

What did you expect to see?
The documentation for the `fmt` package says that `%q` formats the value as “a single-quoted character literal safely escaped with Go syntax.”

Per https://go.dev/ref/spec#Rune_literals, “The escapes `\u` and `\U` represent Unicode code points so within them some values are illegal, in particular those above `0x10FFFF` and surrogate halves.”

So I expected `fmt.Sprintf` to emit either an invalid `\u` or `\U` literal (since these values are too large for `\x` or `\0` escapes), or an explicit format-error string of some sort (along the lines of https://pkg.go.dev/fmt#hdr-Format_errors).

At the very least, I expected to see some explicit description of the invalid-range behavior in the `fmt` package docs!

What did you see instead?
Invalid codepoints uniformly (and lossily) quoted to the Unicode replacement character:
(CC @robpike)