Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mb_string issue for multiple locales #235

Closed
wants to merge 1 commit into from

Conversation

tuupke
Copy link
Contributor

@tuupke tuupke commented Apr 21, 2021

For specific locales, the '8BIT' encoding is undefined. The '8bit'
encoding however, is defined for all locales. When this encoding is used
for a locale that does not support it, the runtime will exit with a
fatal error.

This issue does not occur often, only when the relevant locales have
been generated and are loaded using setlocale. Below is a testscript
which can be used to verify this for all installed locales. Note, before
testing, ensure that all (relevant) locales have been generated,
otherwise the testscript will not show expected output.

This was tested on Ubuntu 20.10 and Arch (both up-to-date) for php
7.0 through 8.0 for all locales. The following locales where found to be
not working as expected:

  • az_AZ
  • crh_UA
  • ku_TR
  • ku_TR.utf8
  • tr_CY 8BIT
  • tr_CY.utf8
  • tr_TR
  • tr_TR.utf8
  • tt_RU@iqtelif
<?php

// Register an exception handler to throw exceptions on warnings
// so the actual issue can be caught
set_error_handler(function ($errno, $errstr, $errfile, $errline ) {
    throw new ErrorException($errstr, $errno, 0, $errfile, $errline);
});

echo phpversion()."\n";

// Load all locales and construct a string containing mb characters
$locales = array_filter(explode("\n", `locale -a`));
$buffer = "ğĞıİöÖüÜşŞçÇ";

foreach ($locales as $locale) {
    setlocale(LC_ALL, $locale);
    try {
        $norm = mb_strlen($buffer, '8bit');
    } catch (Exception $e) {
        echo "$locale 8bit failed\n";
    } catch (Error $e) {
        echo "$locale 8bit failed\n";
    }

    try {
        $norm = mb_strlen($buffer, '8BIT');
    } catch (Exception $e) {
        echo "$locale 8BIT failed\n";
    } catch (Error $e) {
        echo "$locale 8BIT failed\n";
    }
}

This can be tested using docker:
docker run --rm -v $(pwd):/test php:7.0 bash -c 'apt update -qqq; apt install locales-all -y -qqq; php /test/test.php'

This also fixes the issue reported in #232 and perhaps the issue reported in #200.

For specific locales, the '8BIT' encoding is undefined. The '8bit'
encoding however, is defined for all locales. When this encoding is used
for a locale that does not support it, the runtime will exit with a
fatal error.

This issue does not occur often, only when the relevant locales have
been generated and are loaded using `setlocale`. Below is a testscript
which can be used to verify this for all installed locales. Note, before
testing, ensure that all (relevant) locales have been generated,
otherwise the testscript will not show expected output.

This was tested on Ubuntu 20.10 and Arch (both up-to-date) for php
7.0 through 8.0 for all locales. The following locales where found to be
not working as expected:

 - az_AZ
 - crh_UA
 - ku_TR
 - ku_TR.utf8
 - tr_CY 8BIT
 - tr_CY.utf8
 - tr_TR
 - tr_TR.utf8
 - tt_RU@iqtelif

```
<?php

// Register an exception handler to throw exceptions on warnings
// so the actual issue can be caught
set_error_handler(function ($errno, $errstr, $errfile, $errline ) {
    throw new ErrorException($errstr, $errno, 0, $errfile, $errline);
});

echo phpversion()."\n";

// Load all locales and construct a string containing mb characters
$locales = array_filter(explode("\n", `locale -a`));
$buffer = "ğĞıİöÖüÜşŞçÇ";

foreach ($locales as $locale) {
    setlocale(LC_ALL, $locale);
    try {
        $norm = mb_strlen($buffer, '8bit');
    } catch (Exception $e) {
        echo "$locale 8bit failed\n";
    } catch (Error $e) {
        echo "$locale 8bit failed\n";
    }

    try {
        $norm = mb_strlen($buffer, '8BIT');
    } catch (Exception $e) {
        echo "$locale 8BIT failed\n";
    } catch (Error $e) {
        echo "$locale 8BIT failed\n";
    }
}
```

This can be tested using docker:
`docker run --rm -v $(pwd):/test php:7.0 bash -c 'apt update -qqq; apt install locales-all -y -qqq; php /test/test.php'`
@tuupke
Copy link
Contributor Author

tuupke commented Apr 21, 2021

Since this issue also occurs on older versions, it might be a good idea to apply the same change to older branches too. Especially since projects stuck on php7 will need to use the old(er) versions.

@SamMousa
Copy link
Collaborator

Direct this PR to the V4 branch please!

@tuupke tuupke changed the base branch from master to v4 April 22, 2021 08:52
@tuupke tuupke changed the base branch from v4 to master April 22, 2021 08:53
@tuupke
Copy link
Contributor Author

tuupke commented Apr 22, 2021

Having changed the PR without having checked the differences between v4 and master, it is probably easier to close this PR and I rebase that commit on the v4 and create a new one. Though it seems to me that current master is still experiencing this issue. Though tbf, I haven't looked into this thoroughly at all.

@SamMousa
Copy link
Collaborator

Current master is a WIP for v5 and it definitely also has this issue. But v4 is in a releasable state so if you direct it there I will merge and cherry-pick to master later.

@tuupke
Copy link
Contributor Author

tuupke commented Apr 22, 2021

Makes sense, PR#236 created, closing this one.

@tuupke tuupke closed this Apr 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants