Consider making obj_maybe_translate_encoding() always translate to UTF-8 #1246

@DavisVaughan

Would likely make #1245 more straightforward

We currently let a vector that is consistently in a single non-UTF-8 encoding (such as an all-Latin1 vector) pass through obj_maybe_translate_encoding() without translation. We do this for two reasons:

a) It is faster for those vectors
b) It matches what base R does

However, the new vec_order() will always convert to UTF-8, and it would certainly simplify things if we always converted to UTF-8 everywhere.

I don't think it would change the actual behavior of any functions that use this translation helper, since you'd still end up with a consistent encoding that you can compare in.

I think this would remove the need for obj_maybe_translate_encoding2(), because you could always expect that the result of obj_maybe_translate_encoding() would be in UTF-8, making it comparable to any other UTF-8 vector.
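To make the comparability argument concrete, here is a minimal sketch in Python rather than in vctrs' own C code. It models a string as raw bytes plus a declared encoding; `RString` and `translate_utf8()` are hypothetical names for illustration, not vctrs API. Under those assumptions, it shows that translating each vector on its own is enough for the two to be compared, with no two-argument helper needed.

```python
from typing import List, Tuple

# Hypothetical model of an R string: raw bytes plus a declared encoding.
RString = Tuple[bytes, str]


def translate_utf8(x: List[RString]) -> List[RString]:
    """Re-encode every element to UTF-8, whatever its current encoding."""
    return [(raw.decode(enc).encode("utf-8"), "utf-8") for raw, enc in x]


latin1_vec = [("é".encode("latin-1"), "latin-1")]
utf8_vec = [("é".encode("utf-8"), "utf-8")]

# Untranslated, the same character is stored as different bytes.
raw_equal = latin1_vec[0][0] == utf8_vec[0][0]

# After translating the Latin-1 vector by itself, a plain byte
# comparison against any UTF-8 vector is meaningful.
translated_equal = translate_utf8(latin1_vec)[0][0] == utf8_vec[0][0]
```

Here `raw_equal` is false and `translated_equal` is true, which is the property that would make a separate obj_maybe_translate_encoding2() unnecessary.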

It would also greatly simplify the translation code, especially for lists, which already have unusual translation rules: a first pass checks whether any element needs translation, and if one does, a second pass translates every element.
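The difference between the two list strategies can be sketched as follows. This is again a Python model with hypothetical names, not the vctrs implementation. In particular, the "needs translation" rule used here (an element mixes more than one declared encoding) is a simplifying assumption, and R's special handling of ASCII strings is ignored.

```python
from typing import List, Tuple

# Hypothetical model of an R string: raw bytes plus a declared encoding.
RString = Tuple[bytes, str]
RList = List[List[RString]]


def to_utf8(s: RString) -> RString:
    raw, enc = s
    return (raw.decode(enc).encode("utf-8"), "utf-8")


def needs_translation(vec: List[RString]) -> bool:
    # Assumed rule: a vector needs translation when it mixes encodings.
    return len({enc for _, enc in vec}) > 1


def translate_list_two_pass(lst: RList) -> RList:
    # Pass 1: check whether any element needs translation.
    if not any(needs_translation(vec) for vec in lst):
        return lst
    # Pass 2: translate all elements, not only the ones that needed it.
    return [[to_utf8(s) for s in vec] for vec in lst]


def translate_list_always(lst: RList) -> RList:
    # Proposed behavior: one pass, and the result is always UTF-8.
    return [[to_utf8(s) for s in vec] for vec in lst]


all_latin1: RList = [[(b"\xe9", "latin-1")], [(b"\xfc", "latin-1")]]
mixed: RList = [
    [(b"\xe9", "latin-1"), (b"\xc3\xa9", "utf-8")],
    [(b"\xfc", "latin-1")],
]
```

In this model the two-pass version returns `all_latin1` untouched, so callers cannot assume its output is UTF-8, while both versions agree on `mixed`. The always-translate version drops the conditional and gives the same guarantee in every case.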


Also consider renaming it to proxy_translate_utf8()
