grapheme_strlen shows different length of emoji ZWJ Sequence when compared to native #370
Comments
I forgot to share the code snippet used for the results above: https://3v4l.org/OPBFq#v8.0.10
Would you agree that this issue can be closed once #369 is merged? In other words, we wouldn't provide the most recent regexp to people who use older PCRE versions. Alternatively, would you mind looking at improving this regexp? I'm sure I generated it, but I don't remember how. There might be a script somewhere in this repo, or maybe in https://github.com/tchwork/utf8
Thanks for asking for my input. This package requires PHP 7.1, which seems to use PCRE 8.38 according to 3v4l.org: https://3v4l.org/S1bPl Among the PHP versions made available by 3v4l, PCRE 8.32 is used on PHP versions below 5.5.9, but I'm not sure if this will always be the case. Is it possible for PHP 7.1+ to be running PCRE 8.32?
It seems PCRE 8.32 made its way into PHP core in 2013: php/php-src@357ab3c It was then replaced with 8.35 in 2014: php/php-src@dd0e96c I guess it's fine to drop support for the old PCRE_VERSION. It would be ideal if this could be enforced in composer.json through https://jubianchi.github.io/semver-check/#/^10%20||%20^8.34/8.34%202013-12-15 or https://jubianchi.github.io/semver-check/#/%3E%208.32/8.34%202013-12-15
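For illustration only, here is a minimal sketch of how a runtime check against PCRE_VERSION could look. Both function names are hypothetical and not part of the polyfill. It assumes the constant has the form "version, space, release date", which is the format this thread describes, and it drops the date before calling version_compare().

```php
<?php

/**
 * Hypothetical helper (not part of the polyfill): returns the numeric part
 * of a PCRE version string, dropping a trailing release date if present.
 */
function pcre_version_number(string $pcreVersion = PCRE_VERSION): string
{
    // Keep only what comes before the first space, e.g. "8.38" from "8.38 2015-11-23".
    return explode(' ', $pcreVersion, 2)[0];
}

/**
 * Hypothetical check: is the given PCRE version at least $minimum?
 */
function pcre_is_at_least(string $minimum, string $pcreVersion = PCRE_VERSION): bool
{
    return version_compare(pcre_version_number($pcreVersion), $minimum, '>=');
}

// Report whether the running PHP build links against PCRE2 (10+).
var_dump(pcre_is_at_least('10.0'));
```

Passing the version string as a parameter keeps the helpers easy to exercise with sample values instead of only the running build's constant.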
Actually, only PCRE2 (10+) is able to handle the initial |
I'm going to close this because nobody worked on it. People should upgrade to PCRE 10+ (or contribute a fix here ;) )
Take the following emoji for instance: 👩👩👦👦
This emoji consists of four different emojis glued together by Zero Width Joiner characters, as seen on https://emojipedia.org/family-woman-woman-boy-boy/.
When checking the length with grapheme_strlen(), it returns 1, while this library returns 4.
This is possibly due to a bug in the GRAPHEME_CLUSTER_RX regex.
This bug should only happen on PCRE_VERSION < 8.32. However, when combined with bug #369, it applies to every PCRE_VERSION that contains a date timestamp, which seems to be the default format.
Therefore, the grapheme_strlen function in this polyfill is likely to provide incorrect results, such as in this example:
Expected result of grapheme_strlen('👩👩👦👦'): 1
Actual result with the custom cluster, grapheme_strlen('👩👩👦👦'): 4
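A small script along these lines could help reproduce the comparison on a given PHP build. It is only a sketch: the helper names are made up for this example, and it counts clusters with PCRE's generic \X escape rather than the polyfill's own GRAPHEME_CLUSTER_RX, so it shows what the underlying PCRE version does, not what the polyfill returns. The emoji is written with explicit escapes so that the Zero Width Joiners (U+200D) cannot be lost when copying.

```php
<?php

// Family emoji: woman, ZWJ, woman, ZWJ, boy, ZWJ, boy.
$family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F466}\u{200D}\u{1F466}";

/**
 * Hypothetical helper: number of Unicode code points in a UTF-8 string.
 */
function count_code_points(string $text): ?int
{
    $count = preg_match_all('/./us', $text);

    return is_int($count) ? $count : null;
}

/**
 * Hypothetical helper: number of extended grapheme clusters according to
 * PCRE's \X. The result depends on the PCRE version PHP is built against.
 */
function count_clusters_with_pcre(string $text): ?int
{
    $count = preg_match_all('/\X/u', $text);

    return is_int($count) ? $count : null;
}

echo 'PCRE_VERSION: ', PCRE_VERSION, PHP_EOL;
echo 'Code points: ', count_code_points($family), PHP_EOL;
echo 'Clusters via \X: ', count_clusters_with_pcre($family), PHP_EOL;

// The native function is only available when the intl extension is loaded.
if (function_exists('grapheme_strlen')) {
    echo 'grapheme_strlen: ', grapheme_strlen($family), PHP_EOL;
} else {
    echo 'grapheme_strlen: not available (intl extension missing)', PHP_EOL;
}
```

Running it on several PHP versions, for example on 3v4l.org, should show whether the cluster count changes with PCRE_VERSION.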