mb_detect_encoding(): wrong results with null $encodings #9008
Comments
I don't know if it helps or not, but after installing Composer dependencies, add `echo mb_detect_encoding("Oops!") . PHP_EOL;`, then run `./vendor/bin/phpunit --filter testRetrievesAllClasses tests/Integration/Analysis/ClasslikeListProviderTest.php`. The output on my system is:
Although passing `+`, i.e. `echo mb_detect_encoding("+") . PHP_EOL;`, outputs:
Amazingly enough, adding
If you try `var_dump(mb_detect_order());`, you'll see that its size increases from 2 (containing only "ASCII" and "UTF-8") to 80. Amazing!
Indeed: https://gitlab.com/Serenata/Serenata/-/blob/master/src/Bootstrap.php#L55-61
@cmb69, nice catch. 👏 So maybe the bug actually is, the order of the encodings passed to
Simple reproducer: https://3v4l.org/arIJ5. The difference is because the second php-src/ext/mbstring/mbstring.c Lines 2503 to 2518 in edb173c
@alexdowad, shouldn't we call
@cmb69, what path through the code is taken if php-src/ext/mbstring/mbstring.c Lines 2555 to 2557 in edb173c
@alexdowad, this is not about php-src/ext/mbstring/mbstring.c Lines 2730 to 2734 in edb173c
And since php-src/ext/mbstring/mbstring.c Lines 2742 to 2748 in edb173c
However, if
Ah, I see. One option is to automatically filter out non-text encodings from `current_detect_order_list` whenever it's set. Do you think that's a good or bad idea?
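To make the filtering idea concrete, here is a rough userland sketch in PHP rather than the C code in mbstring.c. The helper name `filter_text_encodings()` and the list of names treated as non-text are assumptions for illustration only; the actual set handled inside php-src may differ.

```php
<?php
// Assumed list of "non-text" entries (transfer encodings rather than
// character sets). This is an illustrative guess, not taken from php-src.
const NON_TEXT_ENCODINGS = ['BASE64', 'UUENCODE', 'HTML-ENTITIES', 'QUOTED-PRINTABLE'];

/**
 * Hypothetical helper: returns $encodings without the non-text entries,
 * preserving the original order and reindexing from 0.
 */
function filter_text_encodings(array $encodings): array
{
    return array_values(array_filter(
        $encodings,
        // Compare case-insensitively, since encoding names are not case-sensitive.
        static fn (string $name): bool => !in_array(strtoupper($name), NON_TEXT_ENCODINGS, true)
    ));
}

// Example: only the text encodings survive, in their original order.
var_dump(filter_text_encodings(['ASCII', 'UUENCODE', 'UTF-8', 'Base64']));
```

A caller who has put every known encoding into the detect order could pass `filter_text_encodings(mb_detect_order())` explicitly as the second argument instead of `null`.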
Thanks to @machitgarha for finding this problem, by the way.
Ah, right. That may confuse users though (
Maybe it's better to just copy The non-text encodings will go away in a future release of PHP anyways.
Passing `null` to `$encodings` is supposed to behave like passing the result of `mb_detect_order()`. Therefore, we need to remove the non-text encodings from the `elist` in this case as well. Thus, we duplicate the global `elist`, so we can modify it.
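The equivalence this commit message relies on can be checked from userland with a small sketch like the one below. The helper name `null_matches_detect_order()` is hypothetical, and the check only says something about the current detect order; with the default order the two calls are expected to agree on any version, so reproducing the bug would additionally require a detect order that contains non-text encodings, as described in this thread.

```php
<?php
/**
 * Hypothetical helper: reports whether passing null and passing
 * mb_detect_order() as $encodings give the same result for $text.
 */
function null_matches_detect_order(string $text): bool
{
    $withNull  = mb_detect_encoding($text, null);
    $withOrder = mb_detect_encoding($text, mb_detect_order());

    return $withNull === $withOrder;
}

// The two sample strings mentioned in the comments above.
foreach (['Oops!', '+'] as $sample) {
    var_dump(null_matches_detect_order($sample));
}
```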
`mb_detect_encoding()` sometimes not working with `null` as its second argument
* PHP-8.1: Fix GH-9008: mb_detect_encoding(): wrong results with null $encodings
@cmb69 and @alexdowad, thank you for fixing the issue!
Description
First of all, a special thanks to the progress made by all developers!
Calling `mb_detect_encoding()` with the second argument being `null` may return a really strange result, while changing it to `mb_detect_order()` fixes the issue. Based on the official PHP documentation, they must behave the same.
I spent something like four hours on this, but cannot figure out what exactly the problem is. I cannot even create a small reproducible example; maybe the problem is environment- or context-dependent.
Steps to Reproduce
1. Check out Serenata at commit `d3c9dcb3426a9b5ffe442a436e2179063ea6c9d7` (current master).
2. Run `composer install`.
3. Run `./vendor/bin/phpunit`.
You should see that there are lots of failures (346). So what the hell? Now edit the file `src/Analysis/SourceCodeReading/TextToUtf8Converter.php`: go to line 15 and change the second argument of `mb_detect_encoding()` from `null` to `mb_detect_order()`. Then re-run `./vendor/bin/phpunit`. With this simple change, the issue is (almost) fixed and only one failure remains.
I'm pretty sure that this is a new bug introduced in PHP 8.1. I compiled both PHP 8.0.19 and 8.1.8 the same way, with default configurations, and ran PHPUnit under both.
Under PHP 8.0, passing either value as the second argument works perfectly, and ALL tests pass. Under PHP 8.1, I expect `$encoding` to hold either `ASCII` or `UTF-8` (or maybe even `false`), but it strangely equals `UUENCODE`. The one remaining failed test after the change is also related to `mb_detect_encoding()`: I expect the same result there, but it randomly returns `UUENCODE` or `UTF-7`, causing the `+` signs of the string to disappear after `mb_convert_encoding()` (line 22 of the file).
PHP Version
PHP 8.1.7, PHP 8.1.8
Operating System
Fedora Workstation 36