Support UTF-8 encoded surrogate pairs #5

Closed
shooshx opened this issue May 19, 2016 · 2 comments

Comments

shooshx commented May 19, 2016

Some systems emit UTF-8 strings with surrogate pairs encoded as two 3-byte sequences. utfcpp does not support this encoding and throws an invalid code point exception when it encounters one.
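
To make the byte layout concrete, here is a minimal sketch of how a code point above U+FFFF ends up as two 3-byte sequences. The function names are hypothetical and are not part of utfcpp; the sketch assumes the input is in the range U+10000..U+10FFFF and does no validation.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one 16-bit surrogate (U+D800..U+DFFF)
// using the ordinary 3-byte UTF-8 bit layout 1110xxxx 10xxxxxx 10xxxxxx.
std::string encode_surrogate_3byte(std::uint32_t unit) {
    std::string s;
    s.push_back(static_cast<char>(0xE0 | (unit >> 12)));
    s.push_back(static_cast<char>(0x80 | ((unit >> 6) & 0x3F)));
    s.push_back(static_cast<char>(0x80 | (unit & 0x3F)));
    return s;
}

// Hypothetical helper: produce the 6-byte form described above.
// Assumes cp is in U+10000..U+10FFFF (not checked).
std::string encode_as_surrogate_pair_bytes(std::uint32_t cp) {
    const std::uint32_t v = cp - 0x10000;
    const std::uint32_t high = 0xD800 + (v >> 10);    // leading surrogate
    const std::uint32_t low = 0xDC00 + (v & 0x3FF);   // trailing surrogate
    return encode_surrogate_3byte(high) + encode_surrogate_3byte(low);
}
```

For U+1F600 this yields the bytes ED A0 BD ED B8 80, whereas standard UTF-8 encodes the same code point in four bytes as F0 9F 98 80.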

Owner

nemtrif commented May 21, 2016

That is invalid UTF-8, so the library is doing the right thing.

nemtrif closed this as completed May 21, 2016

Author

shooshx commented May 22, 2016

According to https://en.wikipedia.org/wiki/UTF-8#CESU-8, that is indeed invalid UTF-8, but there is a derivative standard called CESU-8 in which it is in fact valid.
There are systems (Android OS, for instance) that do emit this encoding, so the question of whether it is valid is irrelevant.
I would argue that "the right thing" is whatever is most useful and offers the broadest support. Otherwise, any user who wants to use utfcpp on Android is going to run into this issue and will eventually need to replace utfcpp with something else.
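
As a possible workaround on the caller's side, the input could be normalized before it reaches the library. The following is only a sketch under stated assumptions: `cesu8_to_utf8` is a hypothetical helper, not a utfcpp function, and it rewrites only well-formed 3-byte surrogate pairs, copying every other byte through unchanged (including lone surrogates, which remain invalid).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of utfcpp): merge each pair of 3-byte
// encoded surrogates into one standard 4-byte UTF-8 sequence.
std::string cesu8_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    const auto byte = [&in](std::size_t i) {
        return static_cast<unsigned char>(in[i]);
    };
    std::size_t i = 0;
    while (i < in.size()) {
        // High surrogate: ED A0..AF 80..BF; low surrogate: ED B0..BF 80..BF.
        const bool is_pair =
            i + 5 < in.size() &&
            byte(i) == 0xED && (byte(i + 1) & 0xF0) == 0xA0 &&
            (byte(i + 2) & 0xC0) == 0x80 &&
            byte(i + 3) == 0xED && (byte(i + 4) & 0xF0) == 0xB0 &&
            (byte(i + 5) & 0xC0) == 0x80;
        if (!is_pair) {
            out.push_back(in[i]);  // pass everything else through untouched
            ++i;
            continue;
        }
        const std::uint32_t high =
            0xD000u | ((byte(i + 1) & 0x3Fu) << 6) | (byte(i + 2) & 0x3Fu);
        const std::uint32_t low =
            0xD000u | ((byte(i + 4) & 0x3Fu) << 6) | (byte(i + 5) & 0x3Fu);
        const std::uint32_t cp =
            0x10000u + ((high - 0xD800u) << 10) + (low - 0xDC00u);
        // Standard 4-byte UTF-8: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        i += 6;
    }
    return out;
}
```

The converted string can then be checked with `utf8::is_valid` and processed by utfcpp as usual. This sketch does not handle the other Modified UTF-8 quirk some platforms use (the two-byte C0 80 encoding of NUL).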
