misc/codepoint_width: handle partially ill-formed UTF-8 #17792

Merged
kasper93 merged 1 commit into mpv-player:master from afishhh:partially-ill-formed-wcwidth on Apr 21, 2026

Conversation

@afishhh (Contributor) commented Apr 21, 2026

Previously the function just bailed on invalid input. This change instead makes it count how many replacement characters would be shown by a terminal that complies with the Unicode specification's recommendation here: https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G66453
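
For readers unfamiliar with the recommendation, here is a minimal sketch of the counting idea, assuming the linked section is the "substitution of maximal subparts" practice: each maximal prefix of a well-formed sequence, or each stray byte, becomes one U+FFFD. The function name `count_replacements` is hypothetical and this is not the code from this PR; it only counts replacement characters and does not compute display width.

```c
#include <stddef.h>

/* Hypothetical helper, not mpv's implementation: counts how many U+FFFD
 * a decoder would emit for s[0..len) when every maximal subpart of an
 * ill-formed sequence is replaced by exactly one U+FFFD. */
static size_t count_replacements(const unsigned char *s, size_t len)
{
    size_t count = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char lead = s[i++];
        if (lead < 0x80)
            continue; /* ASCII is always well-formed */

        int need; /* continuation bytes still expected */
        /* Allowed range for the NEXT byte; only the first continuation
         * byte is ever restricted beyond 80..BF. */
        unsigned char lo = 0x80, hi = 0xBF;

        if (lead >= 0xC2 && lead <= 0xDF) {
            need = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            need = 2;
            if (lead == 0xE0)
                lo = 0xA0; /* rejects overlong 3-byte forms */
            if (lead == 0xED)
                hi = 0x9F; /* rejects surrogates */
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            need = 3;
            if (lead == 0xF0)
                lo = 0x90; /* rejects overlong 4-byte forms */
            if (lead == 0xF4)
                hi = 0x8F; /* rejects values above U+10FFFF */
        } else {
            /* 80..C1 and F5..FF can never start a sequence. */
            count++;
            continue;
        }

        /* Consume continuation bytes while they keep the prefix valid. */
        while (need > 0 && i < len && s[i] >= lo && s[i] <= hi) {
            i++;
            need--;
            lo = 0x80;
            hi = 0xBF;
        }

        /* An incomplete sequence is one maximal subpart: one U+FFFD.
         * The offending byte is not consumed and is examined next. */
        if (need > 0)
            count++;
    }
    return count;
}
```

Under these assumptions, the byte string `61 F1 80 80 E1 80 C2 62 80 63 80 BF 64` yields 6 replacement characters: one each for `F1 80 80`, `E1 80`, and `C2`, then one for each of the three stray continuation bytes.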

Fixes #17773 (comment)


I wasn't sure whether it would be better to go through all the call sites of bstr_decode_utf8 and adjust them to this bstr * change, so I just split out an inner function. There are fewer than 10 call sites, though, so I could also change the behavior of bstr_decode_utf8 itself.

As for whether this is correct: it seems to work, and the examples from the spec pass. While researching I also found https://hsivonen.fi/broken-utf-8/, which links a test page that I quickly converted into assertions for the test here: afishhh@a7d4080. Those also pass, but I left them out since there are a lot of them and I thought it might be overkill.
My terminal (kitty) also passes these tests; I don't know about others.

I can also confirm that \xff + aaaaaaaaaaaaaaa... in term-status-msg no longer fills the terminal with junk.

Previously the function just bailed on invalid input, this instead makes
it count how many replacement characters would be shown by a terminal
complying with the Unicode specification's recommendation here:
https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G66453

Fixes mpv-player#17773 (comment)

@kasper93 (Member) left a comment

LGTM, thanks!

@kasper93 kasper93 merged commit 7ec11ad into mpv-player:master Apr 21, 2026
29 checks passed
@afishhh afishhh deleted the partially-ill-formed-wcwidth branch April 22, 2026 09:55