fmt: inconsistent formatting of unicode with %c and %q #14569
(Note that strconv should only be considered within the rune/int32 range, so minInt64 and maxInt64 are not relevant here: the example program explicitly converts to rune in the call to strconv.)
I would expect fmt %q to behave like strconv.QuoteRune within the rune/int32 range (including negative numbers). E.g. -1 and 1114112 print a quoted utf8.RuneError.
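To make the strconv side of that comparison concrete, here is a minimal sketch. The helper name quotedAsRuneError is made up for illustration. It only compares strconv.QuoteRune output against the quoted form of utf8.RuneError, so it says nothing about what a particular fmt version prints.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// quotedAsRuneError is a hypothetical helper: it reports whether
// strconv.QuoteRune renders r exactly as it renders utf8.RuneError (U+FFFD).
func quotedAsRuneError(r rune) bool {
	return strconv.QuoteRune(r) == strconv.QuoteRune(utf8.RuneError)
}

func main() {
	// -1 and 0x110000 (1114112) are both outside the valid code point range
	// but still fit in a rune (int32); 'x' is an ordinary valid rune.
	for _, r := range []rune{-1, 0x110000, 'x'} {
		fmt.Printf("%d: strconv=%s replaced=%t\n", r, strconv.QuoteRune(r), quotedAsRuneError(r))
	}
}
```

The two invalid values come back quoted as the replacement character, while 'x' does not.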
For values outside the int32 range I would expect fmt %c and %q to behave similarly. Either both print a badVerb error string or both print a utf8.RuneError (quoted in the case of %q). If they print an error string, then %U probably should too.
Another possibility is that fmt rejects any invalid Unicode code point with a badVerb error string for %c, %q, and %U.
Can/Should fmt be changed to handle these cases more consistently?
Having looked at this a bit more, I would argue the following:
Returning a badVerb error string for any integer with %q or %c is a bug in my opinion, since the documentation defines these verbs as valid for any integer. I also don't see any other case in fmt where badVerb triggers based on value rather than on type.
If, however, "the character represented by the corresponding Unicode code point" means that a code point with no corresponding character should be an error, then the current behavior of returning utf8.RuneError for other invalid runes below utf8.MaxRune is a bug. Not returning an error, however, is explicitly documented in the code as "// If the character is not valid Unicode, it will print '\ufffd'.".
Either way it seems inconsistent with the documentation to me.
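As an illustration of what "badVerb based on type" looks like, here is a small sketch. The wrapper formatChar is a made-up name. It shows %c accepting integer operands and producing the `%!verb(type=value)` error string only when the operand type does not support the verb.

```go
package main

import "fmt"

// formatChar is a hypothetical wrapper around fmt.Sprintf("%c", v) so the
// result can be inspected for operands of different types.
func formatChar(v interface{}) string {
	return fmt.Sprintf("%c", v)
}

func main() {
	fmt.Println(formatChar('A'))       // rune operand: prints A
	fmt.Println(formatChar(int64(65))) // any integer type is accepted: prints A
	fmt.Println(formatChar("hi"))      // wrong type: prints %!c(string=hi)
	fmt.Println(formatChar(1.5))       // wrong type: prints %!c(float64=1.5)
}
```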
My proposed resolution would therefore be to return utf8.RuneError (escaped for %q) for any invalid rune, regardless of the integer type or whether it is > utf8.MaxRune.
This should also make it easier to check for an invalid Unicode code point: instead of checking for both an error string and utf8.RuneError, one only needs to check for the latter. The character for RuneError would be RuneError before and after the change.
Also, this behavior can be implemented solely in the fmtC (better renamed and moved to fmt_c) and fmt_qc functions, with no range checks outside these functions.
As this came up again in report #40175, I would like to continue the discussion here.
My idea would still be that badVerb does not trigger for value ranges but only for types, and that all integers that do not map to a valid Unicode code point are printed as the RuneError ('\uFFFD') rune. This would align with the
Otherwise I think it should be documented that