Improve console.lua item width heuristic with term_disp_width#17773

Merged
kasper93 merged 2 commits into mpv-player:master from afishhh:console-menu-wcwidth on Apr 21, 2026

Conversation

@afishhh
Contributor

@afishhh afishhh commented Apr 18, 2026

player/{lua,javascript}: add mp.utils.terminal_display_width function

console.lua: use terminal_display_width when picking longest menu item

Previously this function would just count bytes which was very easily
broken by incantations like OK which take up 52 bytes while being just
2 normal width characters visually.

terminal_display_width correctly counts OK as 2 visual characters.

Note: GitHub strips all these in their web UI but there are many Unicode
combining characters after each letter of the OKs above that produce
a "cursed" text effect.

Fixes #17772
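
To illustrate the gap between byte count and visual width described above, here is a minimal C sketch. It is not mpv's `term_disp_width`: the helper names `decode_utf8` and `approx_display_width` are hypothetical, the input is assumed to be well-formed UTF-8, and only the combining diacritical marks block (U+0300 to U+036F) is treated as zero-width.

```c
#include <stddef.h>

// Decode one code point from well-formed UTF-8 at s and store its byte
// length in *len. Hypothetical helper, not mpv's bstr_decode_utf8.
static unsigned decode_utf8(const unsigned char *s, size_t *len)
{
    if (s[0] < 0x80) {
        *len = 1;
        return s[0];
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *len = 2;
        return ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *len = 3;
        return ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
    }
    *len = 4;
    return ((s[0] & 0x07u) << 18) | ((s[1] & 0x3Fu) << 12) |
           ((s[2] & 0x3Fu) << 6) | (s[3] & 0x3Fu);
}

// Approximate display width: combining diacritical marks (U+0300..U+036F)
// count as 0 columns, every other code point as 1. A real implementation
// also has to handle wide characters and other zero-width ranges.
static int approx_display_width(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    int width = 0;
    while (*s) {
        size_t len;
        unsigned cp = decode_utf8(s, &len);
        if (cp < 0x0300 || cp > 0x036F)
            width++;
        s += len;
    }
    return width;
}
```

With the input `"O\xCC\x81\xCC\x82K\xCC\x83"` (an `O` followed by two combining marks and a `K` followed by one), the byte length is 8 while this sketch reports a width of 2.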


Considerations:

  • bikeshed the function's name?
  • term_disp_width skips ESC [ sequences and handles tabs in a particular way which probably doesn't match ASS so those may break this somewhat

Alternatives:

  • Expose just grapheme segmentation instead (more effort)

@na-na-hi
Contributor

This is better implemented as a native function rather than a command to avoid command calling overhead. The whole reason of doing this in the first place is for performance, and this calculation will be potentially done on lots of items.

@guidocella
Contributor

This changes the time to calculate the longest item in the property list from 50 microseconds to 0.2 seconds.

@afishhh
Contributor Author

afishhh commented Apr 19, 2026

> This changes the time to calculate the longest item in the property list from 50 microseconds to 0.2 seconds.

Oh, I assumed commands were fast enough for this since escape-ass is also a command.

> This is better implemented as a native function

Native function? Like in player/{javascript,lua}.c's utils_fns? Or is there some other mechanism I'm not aware of.

@guidocella
Contributor

Yeah but escape-ass is only called on the few visible items. I don't know if a lua.c would be fast enough either for all items.

@guidocella
Contributor

The easiest fix is to use prompt .. ("a"):rep(9) .. "\n" .. longest_item in width_overlay.data. Then it will be at least as long as the prompt + aaaaaaaaa. And you can always scroll horizontally with shift + left/right.

@afishhh
Contributor Author

afishhh commented Apr 19, 2026

> This changes the time to calculate the longest item in the property list from 50 microseconds to 0.2 seconds.

On my system the current approach brings up the time on properties to ~0.1s, putting the iteration itself in a command gets it down to 0.025s but I wouldn't be surprised if most of that is still overhead from copying the whole list and strings.

Probably the only way to make the call fast enough is to operate on Lua strings without copying them, which may be a bit too arcane.

> The easiest fix is to use prompt .. ("a"):rep(9) .. "\n" .. longest_item in width_overlay.data. Then it will be at least as long as the prompt + aaaaaaaaa. And you can always scroll horizontally with shift + left/right.

Well that's a little bit of a disappointing fix, grapheme segmentation can definitely be fast enough so it's unfortunate to give up because of call overhead.

Still definitely better than having the menu be like 30 pixels wide but even in my original case it results in ugly barely cut-off lines:
[image: screenshot of the menu with barely cut-off lines]

@guidocella
Contributor

guidocella commented Apr 19, 2026

> On my system the current approach brings up the time on properties to ~0.1s, putting the iteration itself in a command gets it down to 0.025s but I wouldn't be surprised if most of that is still overhead from copying the whole list and strings.

Yeah it's similar for me if I build with optimizations.

local str = 'どっちもこっちも\n'

local s = mp.get_time()
for i = 1, 10000 do
    mp.command_native({'terminal-display-width', str})
end
print('10k calls: ' .. mp.get_time() - s)

str = str:rep(10000)

s = mp.get_time()
mp.command_native({'terminal-display-width', str})
print('1 call with 10k items: ' .. mp.get_time() - s)

10k calls: 0.177s
1 call with 10k items: 0.015s

Doing the measurement within the command with 10k concatenated items:
mp.command_native({'terminal-display-width', ('どっちもこっち\n'):rep(10000)}): 0.017s

Calling a modified command that calls term_disp_width 10k times with 1 item:
mp.command_native({'terminal-display-width', 'どっちもこっち\n'}): 0.019s

So the overhead is from multiple commands, not from copying strings.

EDIT: I enabled optimization but still had ASan enabled here lol, it will be faster without it.

@kasper93
Member

kasper93 commented Apr 19, 2026

> This is better implemented as a native function rather than a command to avoid command calling overhead. The whole reason of doing this in the first place is for performance, and this calculation will be potentially done on lots of items.

I agree. Commands are mostly for users to use; here we expose a C function for our internal use. No need to go through a command.

> Oh, I assumed commands were fast enough for this since escape-ass is also a command.

Arguably, it could also be a native function. It's not, however, called in hot loops, unlike term_disp_width. Also, escape-ass is more generic and can be useful for external scripts too, while disp-width is pretty specific to our use case. Either way, maybe we can change that.

Generally there is reluctance to add native entry points, but I feel like Lua integration in mpv is almost core level at this point, so we shouldn't be scared of adding internal entry points if it helps make the code clearer or better.

@avih
Member

avih commented Apr 19, 2026

> > This is better implemented as a native function rather than a command to avoid command calling overhead. The whole reason of doing this in the first place is for performance, and this calculation will be potentially done on lots of items.

> I agree. Commands are mostly for users to use; here we expose a C function for our internal use. No need to go through a command.

I disagree. I think that whatever is exposed to builtin scripts should also be exposed to 3rd party scripts.

So that we don't end up with scripts which only we can implement and 3rd party authors can't.

No comment on what the best solution is for this specific issue, but I have a strong opinion that we should not use hidden APIs in internal scripts.

EDIT: and non-script clients too, although I don't know how much this specific function would be useful to non-script clients, but this should be the rule that we try to follow.

@guidocella
Contributor

I just realized this but we can simply skip the calculation and use the whole window width if there are many items. The property list and history always use the whole window anyway.

@kasper93
Member

> I disagree. I think that whatever is exposed to builtin scripts should also be exposed to 3rd party scripts.

No one says otherwise.

Comment thread player/command.c Outdated
char *text = cmd->args[0].v.s;

const unsigned char *cut_pos;
int width = term_disp_width(bstr0(text), INT_MAX, &cut_pos);
Contributor

If we are going to do this then it should also accept an optional max_width argument so it can exit early when the limit is hit instead of hardcoding INT_MAX.

Contributor Author

I guess it doesn't hurt but I'm not sure whether there are valid use-cases for this. At least not without also exposing cut_pos.

Contributor

For the menu specifically, you can pass the OSD width, and if any of the items hits it we can stop processing further items; the particular item that hit the limit also won't be needlessly processed in full.
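
The early-exit idea can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `width_upto` and `longest_width` are hypothetical names, every code point is counted as one column (no wide or combining characters), and the limit is given in columns, which for a variable-width OSD font is itself only an approximation.

```c
#include <stddef.h>

// Hypothetical helper: counts one column per code point (any byte that is
// not a UTF-8 continuation byte) and stops once max_width columns have been
// counted. *cut_pos receives the position where counting stopped.
static int width_upto(const char *str, int max_width, const char **cut_pos)
{
    const unsigned char *s = (const unsigned char *)str;
    int width = 0;
    while (*s) {
        if ((*s & 0xC0) != 0x80) {   // start of a new code point
            if (width == max_width)
                break;               // limit reached, stop early
            width++;
        }
        s++;
    }
    *cut_pos = (const char *)s;
    return width;
}

// Longest width among the items. Stops scanning further items as soon as
// one of them has reached the limit.
static int longest_width(const char *const *items, size_t count, int limit)
{
    int longest = 0;
    for (size_t i = 0; i < count && longest < limit; i++) {
        const char *cut;
        int w = width_upto(items[i], limit, &cut);
        if (w > longest)
            longest = w;
    }
    return longest;
}
```

With a limit of 5, a list containing a 10-character item returns 5 after that item and never looks at the remaining ones.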

Contributor

How many characters fit depends on the OSD width, font family and size though. The idea is to calculate the terminal width of all items and then use compute_bounds only on the longest item to get the accurate variable font width. I don't know if you can approximate the max needed width and end early based on that.

Ending early for individual items is not important because we already truncate huge lines when the menu is opened, originally because json-subprocess-result made libass completely freeze mpv.

Contributor Author

I think it's fine to omit this functionality until a valid use-case for it is found, an optional argument can always be added later (though it'd have to be a bit funny and return an object with cut_pos too I guess).

@afishhh afishhh force-pushed the console-menu-wcwidth branch from 43adbcb to 3bb9e6d Compare April 19, 2026 19:27
@afishhh
Contributor Author

afishhh commented Apr 19, 2026

Re-implemented this as a function in mp.utils.

It now takes <10ms when listing properties which is definitely better.

Did forget to add an interface-changes file, will do that in next push (perhaps when it's decided whether we want to expose cutting too).

Also I think the thing about it incorrectly returning 8 was just me lying, or maybe measured something different (will check later whether I can reproduce a wrong width, perhaps I measured with ytsubconcerter ZWSP junk which would still make the result wrong).

Member

@kasper93 kasper93 left a comment

LGTM, thanks!

@guidocella
Contributor

It takes 6ms for 11k properties now. Nice.

@verygoodlee
Contributor

context_menu.lua has a similar width calculation function, should also apply utils.terminal_display_width to it.

local longest = ""
for _, title in pairs(titles) do
    if #title > #longest then
        longest = title
    end
end

@verygoodlee
Contributor

verygoodlee commented Apr 20, 2026

Note that term_disp_width doesn't accept invalid input; it always returns 0 when it encounters invalid UTF-8.

mpv/misc/codepoint_width.c

Lines 679 to 683 in 1be21a3

int cp = bstr_decode_utf8(str, &str);
// Stop processing on any invalid input
if (cp < 0)
    return 0;

In this case, the 2nd item is actually much wider than the 1st item, but its calculation result is 0 due to invalid character.

local input = require 'mp.input'

mp.add_forced_key_binding('a', function()
    input.select({
        prompt = 'Foobar:',
        items = {
            ('A'):rep(20),
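            -- The original multi-byte character was lost in extraction; any
            -- one works. Here: the first byte of U+3042, which alone is invalid UTF-8.
            ('\227\129\130'):sub(1, 1) .. ('A'):rep(200),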
        },
    })
end)
[image: screenshot 2026-04-20 104627]

@afishhh
Contributor Author

afishhh commented Apr 20, 2026

> Note that term_disp_width doesn't accept invalid input; it always returns 0 when it encounters invalid UTF-8.

The correct behavior here is probably to make it consider maximal subparts of ill-formed sequences to have a width of 1, under the assumption the terminal will replace them with a U+FFFD replacement character (this is the behavior kitty docs recommend, not sure how universal it is). It shouldn't bail out on invalid UTF-8, setting term-status-msg to \xffaaaaaaaaaaa... with a sufficient amount of a's to make it wrap will break the display because of this.

So this is something that should be fixed in term_disp_width (one thing I notice is that it'd require bstr_decode_utf8 to be able to return a next offset even if it fails to decode a codepoint).
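
A rough C sketch of that behavior, for illustration only: each maximal subpart of an ill-formed sequence counts as one column, as if the terminal rendered it as U+FFFD. The names `utf8_step` and `width_with_replacements` are hypothetical, well-formed code points are all counted as one column, and this is not the code that later landed in term_disp_width.

```c
#include <stddef.h>

// Consume one well-formed UTF-8 sequence or one maximal subpart of an
// ill-formed sequence from s (n > 0 bytes available). Returns the number of
// bytes consumed and sets *valid accordingly.
static size_t utf8_step(const unsigned char *s, size_t n, int *valid)
{
    unsigned char b = s[0], lo = 0x80, hi = 0xBF;
    size_t need;
    *valid = 0;
    if (b < 0x80) {
        *valid = 1;
        return 1;
    } else if (b >= 0xC2 && b <= 0xDF) {
        need = 1;
    } else if (b >= 0xE0 && b <= 0xEF) {
        need = 2;
        if (b == 0xE0) lo = 0xA0;   // reject overlong forms
        if (b == 0xED) hi = 0x9F;   // reject surrogates
    } else if (b >= 0xF0 && b <= 0xF4) {
        need = 3;
        if (b == 0xF0) lo = 0x90;   // reject overlong forms
        if (b == 0xF4) hi = 0x8F;   // reject values above U+10FFFF
    } else {
        return 1;                   // stray continuation or invalid lead byte
    }
    for (size_t i = 1; i <= need; i++) {
        if (i >= n || s[i] < lo || s[i] > hi)
            return i;               // maximal subpart ends before byte i
        lo = 0x80;
        hi = 0xBF;
    }
    *valid = 1;
    return need + 1;
}

// Width of the first n bytes of str, counting each ill-formed subpart as
// one column. *replaced receives the number of such subparts.
static int width_with_replacements(const char *str, size_t n, int *replaced)
{
    const unsigned char *s = (const unsigned char *)str;
    int width = 0;
    *replaced = 0;
    while (n > 0) {
        int valid;
        size_t used = utf8_step(s, n, &valid);
        if (!valid)
            (*replaced)++;
        width++;
        s += used;
        n -= used;
    }
    return width;
}
```

For a lone lead byte followed by four `A`s, this sketch reports a width of 5 with one replacement instead of bailing out with 0.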

> context_menu.lua has a similar width calculation function, should also apply utils.terminal_display_width to it.

Will apply it to that later, thanks.

afishhh added 2 commits April 20, 2026 22:56
Previously this function would just count bytes which was very easily
broken by incantations like `OK` which take up 52 bytes while being just
2 normal width characters visually.

`terminal_display_width` correctly counts `OK` as 2 visual characters.

Note: GitHub strips all these in their web UI but there are many Unicode
combining characters after each letter of the `OK`s above that produce
a "cursed" text effect.

Fixes mpv-player#17772
@afishhh afishhh force-pushed the console-menu-wcwidth branch from 3bb9e6d to f16257b Compare April 20, 2026 20:57
@afishhh
Contributor Author

afishhh commented Apr 20, 2026

Added a file to interface-changes and adapted the snippet pointed out by @verygoodlee.

The ill-formed UTF-8 issue still exists but it should probably be addressed in a separate PR.
It seems like a pretty rare case (admittedly #17772 is also pretty rare but I'd expect it's more common than invalid UTF-8, unless I'm wrong and someone genuinely has a case where invalid UTF-8 ends up in one of these menus).

@na-na-hi
Contributor

> admittedly #17772 is also pretty rare but I'd expect it's more common than invalid UTF-8

If mpv is not built with uchardet then invalid UTF-8 is pretty common. Many music files have metadata encoded in GB18030 and SHIFT_JIS.

@kasper93
Member

> > admittedly #17772 is also pretty rare but I'd expect it's more common than invalid UTF-8

> If mpv is not built with uchardet then invalid UTF-8 is pretty common. Many music files have metadata encoded in GB18030 and SHIFT_JIS.

That's true, we can resolve this in next PR.


There is no documentation for mp.utils.terminal_display_width, but since it is a mostly internal function, we can leave it at that.

@verygoodlee
Contributor

> It seems like a pretty rare case (admittedly #17772 is also pretty rare but I'd expect it's more common than invalid UTF-8, unless I'm wrong and someone genuinely has a case where invalid UTF-8 ends up in one of these menus).

The string slicing function string.sub() in Lua is based on bytes. If the author of a 3rd-party script does not notice this, they might cut off the string at the wrong position, which has even occurred in a 1st-party script (#15944).
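
The same pitfall exists for any byte-based truncation. As a small C illustration, a cut can be moved back to the nearest code point boundary; `utf8_safe_cut` is a hypothetical helper and assumes the input itself is well-formed UTF-8.

```c
#include <stddef.h>

// Largest length <= max_bytes at which s (len bytes of well-formed UTF-8)
// can be cut without splitting a multi-byte sequence.
static size_t utf8_safe_cut(const char *s, size_t len, size_t max_bytes)
{
    if (max_bytes >= len)
        return len;
    size_t cut = max_bytes;
    // If the byte at the cut is a continuation byte (10xxxxxx), the cut
    // falls inside a sequence, so back up to that sequence's lead byte.
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}
```

Cutting the 4-byte string `"\xE3\x81\x82" "A"` at 1 or 2 bytes yields 0 here, because either cut would leave a partial sequence behind.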

@afishhh
Contributor Author

afishhh commented Apr 21, 2026

master...afishhh:mpv:partially-ill-formed-wcwidth implements handling of ill-formed UTF-8 in term_disp_width.

I can submit that first if illegal UTF-8 is such a concern.

@kasper93 kasper93 merged commit ac3b604 into mpv-player:master Apr 21, 2026
30 checks passed
afishhh added a commit to afishhh/mpv that referenced this pull request Apr 21, 2026
Previously the function just bailed on invalid input, this instead makes
it count how many replacement characters would be shown by a terminal
complying with the Unicode specification's recommendation here:
https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G66453

Fixes mpv-player#17773 (comment)
@afishhh afishhh deleted the console-menu-wcwidth branch April 21, 2026 17:32
kasper93 pushed a commit that referenced this pull request Apr 21, 2026
Previously the function just bailed on invalid input, this instead makes
it count how many replacement characters would be shown by a terminal
complying with the Unicode specification's recommendation here:
https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G66453

Fixes #17773 (comment)
Development

Successfully merging this pull request may close these issues.

Calculated select menu width too small with many Unicode combining characters

7 participants