Fix UKRAINIAN LETTERS I in CP866 #11213

Closed
wants to merge 1 commit into from

vstavskyi

No description provided.

@nielsdos requested a review from alexdowad on May 10, 2023, 08:25
@nielsdos
Member

Thanks for your contribution.
Please note that you need to target the lowest supported bugfix branch. In your case that means the branch named PHP-8.1, not PHP-8.0.28 (and not PHP-8.1.x, just PHP-8.1). We can then merge it upwards to all versions that need the patch.

@vstavskyi closed this on May 10, 2023
@youkidearitai
Contributor

In PHP 8.1 and newer, this code has been replaced by ext/mbstring/libmbfl/filters/mbfilter_singlebyte.c.

@vstavskyi
Author

@youkidearitai
A more complicated patch is needed for mbfilter_singlebyte.c.

@youkidearitai
Contributor

I have a question. Sorry if it is irrelevant, as I don't know Ukrainian well.
According to the Wikipedia CP866 page, code page 866 does not seem to include U+0406 and U+0456.

I saw that there are Ukrainian and Belarusian variants, called CP1125, CP866U, and CP866NAV or RUSCII. These character encodings include U+0406 and U+0456.
Unfortunately, PHP mbstring does not include these character encodings.

Should we add these character encodings?
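
One way to check which of these code pages a given PHP build offers is to ask mbstring directly. This is a minimal sketch, assuming the mbstring extension is loaded. The candidate names are simply the ones mentioned above, and the check only looks at canonical encoding names, not aliases:

```php
<?php
// Candidate names taken from the discussion above; whether each one is
// recognised depends on the PHP version and build.
$candidates = ['CP866', 'CP1125', 'CP866U', 'CP866NAV', 'RUSCII'];

// mb_list_encodings() returns the canonical names of the supported encodings.
$supported = mb_list_encodings();

foreach ($candidates as $name) {
    $status = in_array($name, $supported, true) ? 'listed' : 'not listed';
    echo $name, ': ', $status, PHP_EOL;
}
```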

@vstavskyi
Author

vstavskyi commented May 10, 2023

Ukrainian letters i look like Latin letters i,
but they're missing in CP866.
Wikipedia Ukrainian alphabet

$s = "\u{0406}\u{0456}";
mb_convert_encoding($s, 'CP866');

Result: 0x3F 0x3F
With my patch, the result is 0x49 0x69

@youkidearitai
Contributor

> Ukrainian letters i look like Latin letters i, but they're missing in CP866

@vstavskyi You said that CP866 does not convert the Ukrainian letters i. However, a character encoding is based on a character set, and CP866 does not include the Ukrainian letters i. Therefore, these letters cannot be converted to CP866.

I think CP866 is an incomplete character encoding from the point of view of Ukrainian.
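
The point about the character set can be illustrated with a round-trip check. This is a minimal sketch, assuming the mbstring extension is loaded; `survives_round_trip` is a hypothetical helper written for this illustration, not part of mbstring:

```php
<?php
// Hypothetical helper (not part of mbstring): true when a UTF-8 string comes
// back unchanged after conversion to $encoding and back to UTF-8.
function survives_round_trip(string $utf8, string $encoding): bool
{
    $encoded = mb_convert_encoding($utf8, $encoding, 'UTF-8');
    $decoded = mb_convert_encoding($encoded, 'UTF-8', $encoding);
    return $decoded === $utf8;
}

// U+0406 and U+0456: the Ukrainian letters i, upper and lower case.
$ukrainianI = "\u{0406}\u{0456}";

// Bytes produced for characters the target encoding cannot represent.
echo bin2hex(mb_convert_encoding($ukrainianI, 'CP866', 'UTF-8')), PHP_EOL;

// Whether the letters survive the trip through CP866.
var_dump(survives_round_trip($ukrainianI, 'CP866'));
```

Mapping the letters to 0x49 and 0x69 would not make this check pass either, because those bytes decode back to the Latin letters I and i (U+0049 and U+0069), not to U+0406 and U+0456.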

@vstavskyi
Author

Thank you.

3 participants