Some procs in the `unicode` module produce confusing results when given invalid UTF-8 as input.
Some procs like `toRunes` sometimes map invalid bytes to `0xFFFD`, which is undocumented.
I've found 9a59842, which changed the behavior of `fastRuneAt` from "raise assertion error on invalid UTF-8" to "return garbage on invalid UTF-8".
Here's what the Unicode conformance document says on handling invalid UTF-8:
> For example, in UTF-8 every code unit of the form 110xxxxx must be followed by a code unit of the form 10xxxxxx. A sequence such as 110xxxxx 0xxxxxxx is ill-formed and must never be generated. When faced with this ill-formed code unit sequence while transforming or interpreting text, a conformant process must treat the first code unit 110xxxxx as an illegally terminated code unit sequence—for example, by signaling an error, filtering the code unit out, or representing the code unit with a marker such as U+FFFD replacement character.
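To make the three options from the conformance text concrete, here is a minimal sketch in Python (not Nim) that feeds the ill-formed sequence `110xxxxx 0xxxxxxx` to Python's built-in UTF-8 decoder under each of its standard error handlers. The helper names `decode_strict`, `decode_filtering`, and `decode_with_marker` are made up for illustration, and the sketch makes no claim about how Nim's `unicode` module behaves or should behave.

```python
# Ill-formed input matching the quoted passage: a lead byte of the form
# 110xxxxx (0xC3) followed by a byte of the form 0xxxxxxx (0x28, "(").
ill_formed = b"\xc3\x28"


def decode_strict(data: bytes) -> str:
    """Option 1: signal an error on the first ill-formed sequence."""
    return data.decode("utf-8", errors="strict")


def decode_filtering(data: bytes) -> str:
    """Option 2: filter the offending code unit out."""
    return data.decode("utf-8", errors="ignore")


def decode_with_marker(data: bytes) -> str:
    """Option 3: represent the offending code unit with U+FFFD."""
    return data.decode("utf-8", errors="replace")


# Option 1 raises UnicodeDecodeError for this input.
try:
    decode_strict(ill_formed)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Options 2 and 3 both keep the well-formed "(" that follows the bad byte.
filtered = decode_filtering(ill_formed)
marked = decode_with_marker(ill_formed)

print(strict_failed, repr(filtered), repr(marked))
```

In all three cases only the lead byte `0xC3` is treated as ill-formed; the following `0x28` is still decoded as an ordinary `(`. Whichever option a library picks, the point raised above is that the choice should be documented.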