is_utf8 helper #1
Comments
This will need C code to distinguish between "unknown" encoding and truly "all ASCII", basically an enhanced `do_encoding()` that also checks `IS_ASCII()`. Eventually this could be added upstream.
What's the difference between unknown and ASCII? I thought unknown implied ASCII.
Need to double-check.
There's a dedicated ASCII bit that is set if the code of every character is 127 or less.
But that never gets used in
Also, none of the internal string representation stuff is in the exported API, so I think that means doing checks in R with
ASCII implies unknown, but not the other way round. It will be difficult to detect pure ASCII strings using `Encoding()` only.
Yes, but it'll be accurate >95% of the time, I'd imagine
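To make the distinction above concrete, here is a minimal sketch in plain R, with no C code. `is_ascii()` is a hypothetical helper name, not part of base R or of this package. It inspects bytes directly, since `Encoding()` reports `"unknown"` for pure ASCII strings and for unmarked native strings alike. Returning `NA` for missing elements is an assumption.

```r
# Hypothetical helper: TRUE where every byte of an element is 127 or less.
# Missing values yield NA (an assumption, not behavior taken from the thread).
is_ascii <- function(x) {
  stopifnot(is.character(x))
  vapply(
    x,
    function(s) {
      if (is.na(s)) return(NA)
      # charToRaw() exposes the underlying bytes regardless of the declared encoding
      all(as.integer(charToRaw(s)) <= 127L)
    },
    logical(1),
    USE.NAMES = FALSE
  )
}

x <- c("abc", "caf\u00e9")
Encoding(x)   # pure ASCII elements are reported as "unknown"
is_ascii(x)
```

A byte-level check like this is slower than reading the internal ASCII bit from C, but it stays within exported R functions.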
- New `encoding()`, returns `"ASCII"` for pure ASCII strings and behaves identically to `base::Encoding()` otherwise.
- New `all_utf8()`, returns an atomic logical that indicates if all elements of a character vector are UTF-8 encoded; this includes pure ASCII strings (#1).
- Remove `Encoding<-` override, with documentation (#7).
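The behavior described in this commit message could be approximated in pure R roughly as follows. This is a sketch under stated assumptions, not the package's actual implementation: the `_sketch` function names are hypothetical, the ASCII test is done on bytes rather than via the internal ASCII bit, and ignoring `NA` elements in `all_utf8_sketch()` is an assumption.

```r
# Sketch only: the real encoding() and all_utf8() may be implemented differently.
encoding_sketch <- function(x) {
  stopifnot(is.character(x))
  enc <- Encoding(x)
  # An element counts as pure ASCII when none of its bytes exceeds 127.
  ascii <- vapply(
    x,
    function(s) !is.na(s) && all(as.integer(charToRaw(s)) <= 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  enc[ascii] <- "ASCII"
  enc
}

# Assumption: NA elements are ignored when deciding whether all elements are UTF-8.
all_utf8_sketch <- function(x) {
  enc <- encoding_sketch(x[!is.na(x)])
  all(enc %in% c("ASCII", "UTF-8"))
}

# Build a latin1-marked string from raw bytes to avoid locale-dependent literals.
latin1 <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(latin1) <- "latin1"

encoding_sketch(c("abc", enc2utf8("caf\u00e9"), latin1))
all_utf8_sketch(c("abc", enc2utf8("caf\u00e9")))
all_utf8_sketch(c("abc", latin1))
```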
We need a helper function that checks whether a character vector is all UTF-8.
Not sure how to distinguish this from a class-based test for utf8.