Preg split not splitting some unicodes #665
Comments
These are Unicode surrogate code points. They don't correspond to a valid Unicode character. This is mojibake. You can't split them meaningfully into characters if they are not characters in the first place. What exactly is your expectation here?
From the PCRE docs:
We may want to document that surrogates are not supported; converting to UTF-8 first may yield the desired result.
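A minimal sketch of the "convert to UTF-8 first" idea, assuming the input arrives as a JSON string literal holding the escapes from the report (the variable names are illustrative, not from the thread). `json_decode()` combines the surrogate pair into the single code point U+2DA54 and returns it as UTF-8, which the `/u` pattern can then split:

```php
<?php
// Assumption: the obfuscated text is available as a JSON string literal.
// Single quotes keep the backslashes literal, so json_decode() sees the escapes.
$json = '"\ud876\ude54"';

// json_decode() joins the surrogate pair into one code point (U+2DA54)
// and returns it encoded as UTF-8.
$str = json_decode($json);

// With valid UTF-8 input, the /u pattern splits per code point.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

echo count($chars), "\n";       // prints 1
echo bin2hex($chars[0]), "\n";  // prints f0ada994
```

This only helps when the surrogates come in valid high/low pairs; a lone surrogate has no UTF-8 form and cannot be handled this way.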
Yes, the characters may not be meaningful.
It is a string of obfuscated data. I have to extract the Unicode code
point value of each character, add a constant, and then get the character
corresponding to the resulting code point value.
I tried to write a method to deobfuscate the string. In Java it works
correctly, but in PHP it fails.
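For reference, the procedure described above could look roughly like this in PHP when the input is valid UTF-8. This is only a sketch: `shift_code_points` and the offset of `1` are hypothetical, the real constant is not given in the thread, and the mbstring extension (PHP 7.2 or later for `mb_ord`/`mb_chr`) is assumed.

```php
<?php
// Hypothetical helper: shift every code point of a UTF-8 string by $offset.
function shift_code_points(string $text, int $offset): string
{
    // preg_split() with /u returns false when $text is not valid UTF-8.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $out = '';
    foreach ($chars as $char) {
        // mb_chr() returns false if the shifted value is not a valid code point
        // (for example, if it lands in the surrogate range).
        $shifted = mb_chr(mb_ord($char, 'UTF-8') + $offset, 'UTF-8');
        if ($shifted === false) {
            throw new RangeException('Shifted value is not a valid code point');
        }
        $out .= $shifted;
    }
    return $out;
}

echo shift_code_points('abc', 1), "\n"; // prints "bcd"
```

Note that this shifts whole code points, so it will not reproduce the output of code that shifts individual UTF-16 code units.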
On Monday, June 7, 2021 at 17:15, Kamil Tekiela ***@***.***> wrote:
These are Unicode surrogate code points. They don't correspond to a valid
Unicode character. This is mojibake. You can't split them meaningfully into
characters if they are not characters in the first place. What exactly is
your expectation here?
Java works with UTF-16; PHP's PCRE works with UTF-8.
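To illustrate the difference, here is a hedged sketch (variable names are illustrative, and the mbstring extension is assumed) that takes the reported character as UTF-8 and re-encodes it as UTF-16BE. The 16-bit units it prints are what a Java `char` would hold, which is why Java sees two units where PCRE in UTF-8 mode sees one code point:

```php
<?php
// One code point (U+2DA54) as a UTF-8 string, decoded from the reported escapes.
$str = json_decode('"\ud876\ude54"');

// Re-encode as UTF-16BE and read it as big-endian 16-bit code units.
$utf16 = mb_convert_encoding($str, 'UTF-16BE', 'UTF-8');
$units = array_values(unpack('n*', $utf16)); // unpack() keys start at 1, so reindex

printf("code points: %d\n", mb_strlen($str, 'UTF-8')); // prints "code points: 1"
printf("UTF-16 code units: %d\n", count($units));      // prints "UTF-16 code units: 2"
printf("%04X %04X\n", $units[0], $units[1]);           // prints "D876 DE54"
```

If the deobfuscation constant was designed around UTF-16 code units, the arithmetic would need to be done on `$units` rather than on code points.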
From manual page: https://php.net/function.preg-split
These JSON-encoded Unicode characters in a string are not split by the method:
\ud876\ude54
preg_split('//u', $str, null, PREG_SPLIT_NO_EMPTY);