If the colorized string contains UTF-8 characters, the string returned by strip_style() is no longer marked with the encoding "UTF-8"; its encoding is "unknown". The reason is that strip_style() calls gsub() with the parameter useBytes=TRUE. The manual for gsub() states:
The main effect of ‘useBytes = TRUE’ is to avoid errors/warnings
about invalid inputs and spurious matches in multibyte locales,
but for ‘regexpr’ it changes the interpretation of the output. It
inhibits the conversion of inputs with marked encodings, and is
forced if any input is found which is marked as ‘"bytes"’ (see
‘Encoding’).
If R is running under a locale which is non-UTF-8 (i.e. the value of l10n_info()[["UTF-8"]] is FALSE), this may lead to various "interesting" side effects.
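To make the mechanism concrete, here is a minimal base-R sketch. It does not call crayon: the pattern `ansi_pattern` is a simplified stand-in that I am assuming for illustration, not crayon's actual regular expression. The sketch only prints the encoding mark of the result rather than asserting a value, since the mark you see may depend on the R version.

```r
# A UTF-8 string wrapped in ANSI colour codes (red on, default foreground).
plain  <- "\u00e4bc"                      # \u escapes yield a UTF-8 marked string
styled <- paste0("\033[31m", plain, "\033[39m")
Encoding(styled)                          # "UTF-8"

# Simplified, hypothetical pattern for SGR sequences (NOT crayon's own regex).
ansi_pattern <- "\033\\[[0-9;]*m"

# Same style of call as described above: gsub() with useBytes = TRUE.
stripped <- gsub(ansi_pattern, "", styled, perl = TRUE, useBytes = TRUE)

# The bytes are exactly those of the unstyled string ...
identical(charToRaw(stripped), charToRaw(plain))   # TRUE

# ... but the encoding mark may no longer be "UTF-8"
# (this issue reports "unknown").
Encoding(stripped)
```

Because the bytes are unchanged and only the mark differs, the problem stays invisible in a UTF-8 locale and shows up only where the native encoding is something else.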
Btw., I also suggest using cli instead of crayon. cli handles UTF-8 strings correctly and does not rely on base R's nchar(), which is incorrect in many cases. It also gives you RGB colors, bright variants, lots of ANSI string operations, etc.
Demonstration:
The result is:
Now, consider this code:
The result depends on the locale:
Under a UTF-8 locale (e.g. en_US.UTF-8), the results are:
Under another locale, e.g. en_US.ISO-8859-15, the results are:
What would be the expected output under ISO-8859-15?
Why is that important?
CRAN uses an en_US.ISO-8859-15 locale on one of its check platforms. Any package that uses UTF-8 characters in combination with crayon may hit unexpected, hard-to-track bugs when checked by CRAN. This very situation happened to me (package colorDF), and the bug was infuriating to replicate and track down.
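For anyone hitting the same thing, one possible workaround is to restore the UTF-8 mark after stripping. The sketch below is only an illustration under my own assumptions: `strip_ansi_keep_utf8()` is a hypothetical helper, not a crayon function, and its pattern is a simplified stand-in for crayon's ANSI regex.

```r
# Hypothetical helper (not part of crayon): strip SGR colour codes, then
# restore the UTF-8 mark on every element that carried it before stripping.
strip_ansi_keep_utf8 <- function(x) {
  was_utf8 <- Encoding(x) == "UTF-8"
  # Simplified pattern; useBytes = TRUE leaves the bytes untouched.
  out <- gsub("\033\\[[0-9;]*m", "", x, perl = TRUE, useBytes = TRUE)
  if (any(was_utf8)) {
    # Only the ANSI codes were removed, so the remaining bytes are still
    # valid UTF-8 and can safely be marked as such again.
    Encoding(out[was_utf8]) <- "UTF-8"
  }
  out
}

x <- c(paste0("\033[31m", "\u00e4bc", "\033[39m"), "plain ASCII")
out <- strip_ansi_keep_utf8(x)
Encoding(out)   # "UTF-8" "unknown" (pure ASCII strings are never marked)
```

This only re-marks elements that were UTF-8 on input, so strings in the native encoding are left alone.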