We currently let a vector that is consistently in a single non-UTF-8 encoding (like an all-Latin1 vector) pass through obj_maybe_translate_encoding() without translation. We do this for two reasons:
a) It performs better for those vectors
b) It matches what base R does
However, the new vec_order() will always convert to UTF-8, and it would certainly simplify things if we always converted to UTF-8 everywhere.
I don't think it would change the actual behavior of any functions that use this translation helper, since you'd still end up with a single consistent encoding in which to compare.
I think this would remove the need for obj_maybe_translate_encoding2(), because you could always expect that the result of obj_maybe_translate_encoding() would be in UTF-8, making it comparable to any other UTF-8 vector.
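To illustrate the comparability argument, here is a minimal sketch in Python rather than vctrs' own C code. It models an R string as raw bytes plus an encoding mark; `EncodedString`, `translate_utf8()`, and `bytes_equal()` are hypothetical names invented for this example, not vctrs functions. It only shows why two vectors that are each internally consistent can still fail a bytewise comparison unless both are normalized to UTF-8.

```python
from typing import List, Tuple

# Hypothetical model of an R string: raw bytes plus a declared encoding mark.
EncodedString = Tuple[bytes, str]


def translate_utf8(vec: List[EncodedString]) -> List[EncodedString]:
    """Unconditionally re-encode every element to UTF-8."""
    return [(raw.decode(enc).encode("utf-8"), "utf-8") for raw, enc in vec]


def bytes_equal(x: List[EncodedString], y: List[EncodedString]) -> List[bool]:
    """Compare two vectors element-wise on their raw bytes only."""
    return [a[0] == b[0] for a, b in zip(x, y)]


# The same text, stored once as Latin1 and once as UTF-8.
latin1_vec = [("caf\u00e9".encode("latin-1"), "latin-1")]
utf8_vec = [("caf\u00e9".encode("utf-8"), "utf-8")]

# Each vector is internally consistent, but their bytes differ.
naive = bytes_equal(latin1_vec, utf8_vec)

# After always translating to UTF-8, a bytewise comparison is safe.
normalized = bytes_equal(translate_utf8(latin1_vec), translate_utf8(utf8_vec))
```

In this model, a helper that always returns UTF-8 can be applied to each input independently, which is the property that would make a separate two-argument variant unnecessary.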
It would also greatly simplify the translation code, especially for lists, which already have weird translation rules: the code does a first pass through the list to check whether any element needs translation, then does a second pass to translate all elements if any of them did.
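The difference between the two list strategies can be sketched roughly as follows, again as a simplified Python model rather than the actual vctrs implementation. The predicate `vec_needs_translation()` is an assumption made for this sketch (a vector "needs translation" when its elements do not all share one encoding mark); the real rule in vctrs may differ. All function names here are hypothetical.

```python
from typing import List, Tuple

EncodedString = Tuple[bytes, str]  # hypothetical: raw bytes plus encoding mark
Vector = List[EncodedString]


def translate_utf8(vec: Vector) -> Vector:
    """Unconditionally re-encode every element to UTF-8."""
    return [(raw.decode(enc).encode("utf-8"), "utf-8") for raw, enc in vec]


def vec_needs_translation(vec: Vector) -> bool:
    # Assumed rule for this sketch: mixed encoding marks require translation.
    return len({enc for _, enc in vec}) > 1


def list_translate_two_pass(lst: List[Vector]) -> List[Vector]:
    # Pass 1: scan every element to see whether any needs translation.
    if not any(vec_needs_translation(vec) for vec in lst):
        return lst  # consistently encoded input is let through untouched
    # Pass 2: translate all elements, not just the ones that triggered it.
    return [translate_utf8(vec) for vec in lst]


def list_translate_always(lst: List[Vector]) -> List[Vector]:
    # Single pass: no scan, every element ends up in UTF-8.
    return [translate_utf8(vec) for vec in lst]


latin1 = ("caf\u00e9".encode("latin-1"), "latin-1")
utf8 = ("caf\u00e9".encode("utf-8"), "utf-8")

consistent = [[latin1], [latin1]]    # every element is Latin1
mixed = [[latin1], [latin1, utf8]]   # one element mixes encodings
```

Under these assumptions, both strategies agree on `mixed`, while on `consistent` the two-pass version returns the Latin1 input unchanged and the always-translate version returns UTF-8. The single-pass version trades the fast path for simpler code and a guaranteed output encoding.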
Also consider renaming obj_maybe_translate_encoding() to proxy_translate_utf8().
Would likely make #1245 more straightforward