Convert 4 byte utf8 to two surrogate utf16 and back not implemented #851
Comments
@caolanm what do you think?
Theoretically, C++11 provides such transformations in the
@cuellius I know that well. I was writing about the u16_u8 and u8_u16 functions in hunspell/csutil.cxx!
@caolanm The comments in the u8_u16 and u16_u8 functions in hunspell/csutil.cxx say that conversion of 4-byte UTF-8 sequences is not implemented yet. I think the algorithm in #851 (comment) could be useful.
@caolanm I'm working on it.
@caolanm Unicode 15.0 includes an Arabic extension block whose code points are above 0xFFFF.
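The UTF-8 to UTF-16 converter function cannot convert 4-byte UTF-8 sequences to two UTF-16 surrogates, and the reverse conversion is missing as well. This could be resolved with a
4-byte UTF-8 -> UTF-32 -> two UTF-16 surrogates conversion. See the Unicode FAQ on "UTF-8, UTF-16, UTF-32".
The reverse conversion can be implemented with the same logic. The first byte of a 4-byte UTF-8 sequence is in the range 0xF0 to 0xF7 (bit pattern 11110xxx), of which only 0xF0 to 0xF4 encode valid code points.
This is needed for languages and scripts whose code points are above 0xFFFF, for example the Old Hungarian script, I think.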
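
The conversion described above could be sketched roughly as follows. This is only an illustration under assumptions: the function names `utf8_4byte_to_surrogates` and `surrogates_to_utf8_4byte` are hypothetical, they are not Hunspell's `u8_u16`/`u16_u8`, and they handle exactly one 4-byte sequence or one surrogate pair at a time.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Hunspell): decode one 4-byte UTF-8
// sequence to a UTF-32 code point, then split it into two UTF-16 surrogates.
// Returns false if the input is not a valid 4-byte sequence.
bool utf8_4byte_to_surrogates(const unsigned char* s,
                              char16_t& high, char16_t& low) {
  // Lead byte must match 11110xxx, continuation bytes must match 10xxxxxx.
  if ((s[0] & 0xF8) != 0xF0) return false;
  for (int i = 1; i < 4; ++i)
    if ((s[i] & 0xC0) != 0x80) return false;

  // UTF-8 -> UTF-32: 3 + 6 + 6 + 6 payload bits.
  std::uint32_t cp = (static_cast<std::uint32_t>(s[0] & 0x07) << 18) |
                     (static_cast<std::uint32_t>(s[1] & 0x3F) << 12) |
                     (static_cast<std::uint32_t>(s[2] & 0x3F) << 6) |
                     static_cast<std::uint32_t>(s[3] & 0x3F);
  // Reject overlong encodings and values beyond the Unicode range.
  if (cp < 0x10000 || cp > 0x10FFFF) return false;

  // UTF-32 -> UTF-16: subtract 0x10000, split the 20 bits into 10 + 10.
  cp -= 0x10000;
  high = static_cast<char16_t>(0xD800 + (cp >> 10));
  low = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
  return true;
}

// Hypothetical helper for the reverse direction: combine a surrogate pair
// into a UTF-32 code point and encode it as 4 UTF-8 bytes in out[0..3].
// Returns false if the pair is not a valid high/low surrogate pair.
bool surrogates_to_utf8_4byte(char16_t high, char16_t low,
                              unsigned char* out) {
  if (high < 0xD800 || high > 0xDBFF) return false;
  if (low < 0xDC00 || low > 0xDFFF) return false;

  // UTF-16 -> UTF-32.
  std::uint32_t cp = 0x10000 +
                     ((static_cast<std::uint32_t>(high) - 0xD800) << 10) +
                     (static_cast<std::uint32_t>(low) - 0xDC00);

  // UTF-32 -> UTF-8.
  out[0] = static_cast<unsigned char>(0xF0 | (cp >> 18));
  out[1] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
  out[2] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
  out[3] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
  return true;
}
```

For example, the Old Hungarian letter U+10C80 is the UTF-8 bytes F0 90 B2 80, and the sketch maps it to the surrogate pair 0xD803 0xDC80 and back. A real patch would also need to fit this into the existing loop and buffer handling of the converter functions, which this sketch does not attempt.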