Use mb_substr() for correct abbreviation of non-ASCII characters #651

Merged
merged 1 commit into from
Sep 16, 2024

Conversation

xalt7x
Contributor

xalt7x commented May 21, 2024

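Truncating a string with substr(), or any other method that shortens it by a number of bytes, can split a multibyte UTF-8 character and leave an invalid sequence that is displayed as �. Switching to mb_substr(), which counts characters instead of bytes, fixes this.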

@xalt7x
Contributor Author

xalt7x commented May 21, 2024

The problem is easily reproducible with Cyrillic/Ukrainian characters (e.g., "Джон Дое" as the User/Owner name, or "Навички обслуговування клієнтів" string for "Key Skills").

[Screenshot: fix_cyrillic_abbreviation]

Additional information:

If you’re working with UTF-8 strings, you may lose characters when you extract part of them with the PHP substr() function. This happens because substr() counts bytes, while UTF-8 characters are not restricted to one byte: each Unicode character is encoded in a variable length of 1 to 4 bytes.
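
The difference can be reproduced with a minimal sketch like the one below. The helper names `abbreviateByBytes()` and `abbreviateByChars()` are hypothetical and are not taken from the OpenCATS code; they only illustrate the byte-versus-character behaviour described above. The sketch assumes the `mbstring` extension is enabled and that the source file is saved as UTF-8.

```php
<?php
// Hypothetical helpers for illustration only (not the actual OpenCATS functions).

// Takes the first BYTE of the string. For a multibyte UTF-8 character this
// returns only a fragment of the character, which is not valid UTF-8.
function abbreviateByBytes(string $text): string
{
    return substr($text, 0, 1);
}

// Takes the first CHARACTER of the string, however many bytes it occupies.
function abbreviateByChars(string $text): string
{
    return mb_substr($text, 0, 1, 'UTF-8');
}

$name = 'Джон Дое';

$byteCut = abbreviateByBytes($name);
$charCut = abbreviateByChars($name);

// The byte-based cut leaves an incomplete sequence; browsers render it as the
// replacement character.
var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)

// The character-based cut keeps the whole first letter.
var_dump($charCut === 'Д');                     // bool(true)
```

Passing the encoding explicitly to `mb_substr()` is a design choice in this sketch: it avoids depending on the configured internal encoding of the PHP installation.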

@RussH
Member

RussH commented Sep 16, 2024

Thanks @xalt7x !

RussH merged commit e7c1ab1 into opencats:master on Sep 16, 2024
2 checks passed