We currently let a vector that is consistently in a non-UTF-8 encoding (such as a vector that is entirely Latin-1) pass through obj_maybe_translate_encoding() without translation. We do this for two reasons:
a) It is faster for those vectors, since no translation is performed
b) It matches what base R does
However, the new vec_order() will always convert to UTF-8, and it would certainly simplify things if we always converted to UTF-8 everywhere.
I don't think this would change the behavior of any function that uses this translation helper, because the result would still be in a single, consistent encoding that can be compared.
I think this would remove the need for obj_maybe_translate_encoding2(), because the result of obj_maybe_translate_encoding() would always be in UTF-8 and therefore directly comparable to any other UTF-8 vector.
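To make the difference between the two policies concrete, here is a minimal Python model, not the vctrs C code. It assumes each element of a character vector is a `(raw bytes, declared encoding)` pair and ignores ASCII and native-encoding marks. The names `to_utf8`, `translate_current`, and `translate_proposed` are hypothetical. Under the current policy, a consistently Latin-1 vector and a UTF-8 vector each come back unchanged, so their bytes still differ; under the proposed policy, both come back as UTF-8 and compare equal on their own.

```python
from typing import List, Tuple

# Hypothetical model: one element = (raw bytes, declared encoding).
Elt = Tuple[bytes, str]


def to_utf8(elt: Elt) -> Elt:
    """Re-encode a single element as UTF-8 (no-op if already UTF-8)."""
    raw, enc = elt
    if enc == "utf-8":
        return elt
    return (raw.decode(enc).encode("utf-8"), "utf-8")


def translate_current(x: List[Elt]) -> List[Elt]:
    """Current policy (modeled): leave x untouched if it uses one encoding."""
    encodings = {enc for _, enc in x}
    if len(encodings) <= 1:
        return x
    return [to_utf8(e) for e in x]


def translate_proposed(x: List[Elt]) -> List[Elt]:
    """Proposed policy: always return UTF-8."""
    return [to_utf8(e) for e in x]


latin1 = [("é".encode("latin-1"), "latin-1")]
utf8 = [("é".encode("utf-8"), "utf-8")]

# Current: each vector is consistent, so neither is translated and the bytes differ.
print(translate_current(latin1)[0] == translate_current(utf8)[0])    # False
# Proposed: both end up as UTF-8, so they compare equal without a two-argument helper.
print(translate_proposed(latin1)[0] == translate_proposed(utf8)[0])  # True
```

In this model, comparing two independently translated vectors under the current policy would need a helper that looks at both inputs at once, which is the kind of case the proposal expects to disappear.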
It would also greatly simplify the translation code, especially for lists, which already have unusual translation rules: a first pass through the list checks whether any element needs translation, and if any does, a second pass translates every element.
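The list rules can be sketched with the same hypothetical `(bytes, encoding)` model. This is a simplified Python illustration, not the actual vctrs implementation. It assumes "needs translation" means the encodings seen across the whole list are mixed. `list_translate_current` mirrors the two-pass description, while `list_translate_proposed` shows how an always-UTF-8 rule collapses into a single pass that handles each element independently.

```python
from typing import List, Tuple

# Hypothetical model: one element = (raw bytes, declared encoding).
Elt = Tuple[bytes, str]


def to_utf8(elt: Elt) -> Elt:
    """Re-encode a single element as UTF-8 (no-op if already UTF-8)."""
    raw, enc = elt
    if enc == "utf-8":
        return elt
    return (raw.decode(enc).encode("utf-8"), "utf-8")


def list_translate_current(xs: List[List[Elt]]) -> List[List[Elt]]:
    """Two-pass rule (modeled): translate everything only if encodings are mixed."""
    # Pass 1: collect the encodings used across every element of the list.
    encodings = {enc for x in xs for _, enc in x}
    if len(encodings) <= 1:
        return xs
    # Pass 2: at least one element needs translation, so translate all of them.
    return [[to_utf8(e) for e in x] for x in xs]


def list_translate_proposed(xs: List[List[Elt]]) -> List[List[Elt]]:
    """Single pass: every element is translated to UTF-8 on its own."""
    return [[to_utf8(e) for e in x] for x in xs]


all_latin1 = [[("é".encode("latin-1"), "latin-1")], [(b"a", "latin-1")]]
print(list_translate_current(all_latin1) is all_latin1)  # True: left as Latin-1
print(list_translate_proposed(all_latin1))               # every element now UTF-8
```

The proposed version needs no cross-element state, which is the simplification the issue is after.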
Also consider renaming it to proxy_translate_utf8().
Would likely make #1245 more straightforward