Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require mbstring with v3 #157

Closed
boesing opened this issue Aug 5, 2021 · 6 comments
Closed

Require mbstring with v3 #157

boesing opened this issue Aug 5, 2021 · 6 comments
Labels
Enhancement Won't Fix This will not be worked on

Comments

@boesing
Copy link
Member

boesing commented Aug 5, 2021

Feature Request

Q A
New Feature yes
BC Break yes

Summary

As this library supports PSR-6 and PSR-16, it also needs ways to handle multibyte strings which are allowed as per these recommendations.
To fix #154 properly, a multibyte check needs to be done when a key exceeds a certain limit.
This is actually only the case for the redis adapter, so until v3, the redis adapter will decrease its maximum key length to 65534 as 65535 is the maximum number to be used as a specifier for pcre2.

https://3v4l.org/GURWn

@boesing boesing added this to the 3.0.0 milestone Aug 5, 2021
@boesing boesing added this to To do in Cache v3 via automation Aug 5, 2021
@ghostwriter
Copy link
Contributor

Actually you may not need the mbstring extension:

if (strlen($string) != strlen(utf8_decode($string)))
{
    echo 'string is unicode';
}

@boesing
Copy link
Member Author

boesing commented Aug 5, 2021

@ghostwriter As written in Slack, this would then require ext-xml.

@ghostwriter
Copy link
Contributor

@boesing right.

also came up with this:

if (strlen($string) !== strlen(html_entity_decode($string, ENT_QUOTES , 'ISO-8859-1')))
{
    echo 'string is unicode';
}
// another method

html_entity_decode is an internal function. https://3v4l.org/UZOM1

var_dump(in_array('html_entity_decode', get_defined_functions()['internal'])); // bool(true)

@boesing
Copy link
Member Author

boesing commented Aug 6, 2021

The main thing I dont like here is, that strlen is slow depending on the key length. Here, we have to do it twice.

There were performance checks in #154 (comment) regarding performance and thus I'd like to avoid doing it twice. Thus, requiring mbstring seems the most proper way of veryfing the length of a string with a minimal footprint.

@boesing
Copy link
Member Author

boesing commented Aug 6, 2021

Closing this with the rationale, that allowing keys longer than 65534 will lead to performance problems when validating cache keys. Another reason is, that mb_strlen does not return the byte length but the string length.
UTF-8 characters which consume 4-bytes should be treated as 4 characters but with mb_strlen they're treated as 1 character.

@boesing boesing closed this as completed Aug 6, 2021
Cache v3 automation moved this from To do to Done Aug 6, 2021
@boesing boesing added the Won't Fix This will not be worked on label Aug 6, 2021
@boesing boesing removed this from Done in Cache v3 Aug 6, 2021
@boesing boesing removed this from the 3.0.0 milestone Aug 6, 2021
@ghostwriter
Copy link
Contributor

good point.

For the record I was trying to find a solution to the particular issue. I’m not against adding mbstring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Won't Fix This will not be worked on
Projects
None yet
Development

No branches or pull requests

2 participants