Give API to measure the space that a string occupies #218

Open
be5invis opened this issue Jul 8, 2018 · 12 comments
Labels
Area-Server: Down in the muck of API call servicing, interprocess communication, eventing, etc.
Issue-Feature: Complex enough to require an in depth planning process and actual budgeted, scheduled work.
Product-Conhost: For issues in the Console codebase
Milestone

Comments

@be5invis

be5invis commented Jul 8, 2018

This is an extension to #57.
For a given console/PTY, assuming the font family and size are specified, take a string and return the space (a bit mask of the character matrix?) it would occupy.

@zadjii-msft added the Issue-Feature and Product-Conhost labels Jul 9, 2018
@zadjii-msft
Member

IIRC, determining the width of a string is actually a pretty hard problem. There are all sorts of crazy Unicode edge cases to handle, and there are some assumptions we make manually in the code (e.g. box-drawing chars are single-width always).

Adding @adiviness as he has been working in that area quite a lot.

I'd really doubt that we'd be adding another API to conhost. Is there an equivalent API on *nix that we could use for inspiration?

@be5invis
Author

> box-drawing chars are single-width always

That has been false for CJK languages since the 1980s with some fonts, though it is true with others (like my Sarasa Gothic).

@be5invis
Author

I have seen multiple libraries trying to "guess" the actual width of a string, like

https://github.com/martinheidegger/varsize-string
https://github.com/sindresorhus/widest-line
https://github.com/sindresorhus/string-width

If we have an accurate API then it would greatly help people writing console applications.

@vapier

vapier commented Jul 21, 2018

on the unix side, there isn't a great API for terminal emulators. the closest are the wcwidth and wcswidth functions. they effectively operate on code points.

most programs (both terminal emulators and editors/tools) tend to just use wcwidth which mostly works OK as long as you stick to the simpler things: code points that are unambiguously narrow (1) or wide (2) or unprintable (-1), or you assume combining characters (0) always "attach" to the previous printable codepoint. anything involving zero width joiners is out the window, as are any scripts involving more complicated rules, or RTL scripts.
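As a rough illustration of the per-code-point model described above, here is a minimal Python sketch that approximates wcwidth/wcswidth using only the standard unicodedata module. The names codepoint_width and string_width are hypothetical, and the classification rules are a simplification (an assumption for illustration), not the tables any particular libc or terminal actually uses.

```python
import unicodedata


def codepoint_width(ch: str) -> int:
    """Approximate wcwidth() for one code point (simplified, hypothetical rules)."""
    if ch == "\x00":
        return 0  # NUL occupies no cells
    category = unicodedata.category(ch)
    if category == "Cc":
        return -1  # control characters are treated as unprintable
    if category in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters (e.g. ZWJ)
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide / Fullwidth
    return 1  # everything else is assumed narrow


def string_width(text: str) -> int:
    """Approximate wcswidth(): sum of widths, or -1 if anything is unprintable."""
    total = 0
    for ch in text:
        width = codepoint_width(ch)
        if width < 0:
            return -1
        total += width
    return total
```

This covers the "simpler things" only: it has no notion of joiners changing the rendered shape, script-specific shaping, or RTL layout.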

common examples of complicated rules:

  • ತ್ಯ is ತ್ (U+0CA4 U+0CCD) followed by ಯ (U+0CAF). wcwidth would count those as 1/0/1 (which is correct), but when considered together, it should be 1.
  • ఫ్ట్వే is ఫ్ (U+0C2B U+0C4D) followed by ట్ (U+0C1F U+0C4D) followed by వే (U+0C35 U+0C47). wcwidth would count those as 1/0/1/0/1/0 (which is correct), but when considered together, it should be 1.

wcswidth should be able to calculate the right answer, but i don't think most implementations handle these graphemes correctly either.
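The two examples above can be reproduced with a short Python sketch. It lists the per-code-point widths and then applies a deliberately naive heuristic (an assumption for illustration, not the Unicode segmentation rules): a consonant that directly follows a virama is folded into the previous cluster. The helper names are hypothetical, and whether a cluster really fits in one cell is still up to the renderer and font.

```python
import unicodedata

VIRAMA_CCC = 9  # canonical combining class assigned to virama signs


def per_codepoint_widths(text: str) -> list:
    """Width of each code point in isolation (simplified wcwidth-style rules)."""
    widths = []
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            widths.append(0)  # combining marks and format characters
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            widths.append(2)  # wide / fullwidth
        else:
            widths.append(1)
    return widths


def naive_cluster_count(text: str) -> int:
    """Count clusters, joining any spacing character that follows a virama."""
    clusters = 0
    prev = ""
    for ch, width in zip(text, per_codepoint_widths(text)):
        joined = prev != "" and unicodedata.combining(prev) == VIRAMA_CCC
        if width > 0 and not joined:
            clusters += 1
        prev = ch
    return clusters


kannada = "\u0ca4\u0ccd\u0caf"                    # U+0CA4 U+0CCD U+0CAF
telugu = "\u0c2b\u0c4d\u0c1f\u0c4d\u0c35\u0c47"   # U+0C2B U+0C4D U+0C1F U+0C4D U+0C35 U+0C47

print(per_codepoint_widths(kannada), naive_cluster_count(kannada))
print(per_codepoint_widths(telugu), naive_cluster_count(telugu))
```

Summing the per-code-point widths gives 2 and 3 cells, while the cluster view gives 1 for both strings, which is the disagreement described above.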

the original question was about the rendering box needed for a particular grapheme in a particular font. this shouldn't matter, but in practice, a lot of fonts (including monospace ones) aren't consistent in their widths/heights. they can be narrower or wider than a single cell requiring manual intervention to center/scale them in the respective cells. freetype/fontconfig are the standard font related libraries in the unix world for rendering.

along those lines, wide-characters (i.e. CJK) should be taking up two cells even if the font gets it wrong. otherwise you easily run out of sync with the console's idea of cursor location and the remote application's idea of cursor location. i grok that this might be a fundamental limitation in the existing Windows console code and is not trivial to resolve.

hth.

@JFLarvoire

JFLarvoire commented Aug 31, 2018

+1 for having such a function.
I have a command-line tool called dirc.exe for comparing directories side by side, and the column alignment breaks if file names contain 0-width characters or double-width ideograms.
I tried using open-source versions of wcswidth(), but this is not fool-proof: The Windows console does not always size characters as expected by that function. What we really need is the console itself telling us what it will display.
And contrary to what @zadjii-msft said in his comment above, I think this is easy to do: The console can simply write the string into a private hidden screen buffer (using the very same code it uses for displaying it in the visible buffer), and report how much the hidden buffer cursor moved.
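No such hidden-buffer API exists in this thread, so the following is a different but related technique: on terminals that answer the ANSI cursor position query (ESC [ 6 n, answered with ESC [ row ; col R), an application can print the string and ask where the cursor ended up. This Python sketch keeps the I/O abstract; write and read_reply are hypothetical callables the caller must supply (for example, wrappers around a tty in raw mode), and the function names are assumptions as well.

```python
import re
from typing import Callable, Tuple

CURSOR_QUERY = "\x1b[6n"  # Device Status Report: request cursor position
CURSOR_REPLY = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cursor_report(reply: str) -> Tuple[int, int]:
    """Extract (row, column) from a cursor position report."""
    match = CURSOR_REPLY.search(reply)
    if match is None:
        raise ValueError("not a cursor position report: %r" % reply)
    return int(match.group(1)), int(match.group(2))


def measure_columns(text: str,
                    write: Callable[[str], None],
                    read_reply: Callable[[], str]) -> int:
    """Return how many columns the cursor advanced while printing text.

    Assumes the text fits on the current line (no wrapping) and that the
    terminal answers the query; neither is guaranteed in practice.
    """
    write(CURSOR_QUERY)
    _, before = parse_cursor_report(read_reply())
    write(text)
    write(CURSOR_QUERY)
    _, after = parse_cursor_report(read_reply())
    return after - before
```

Unlike a hidden buffer, this actually draws the text, so a real caller would have to erase it afterwards and handle terminals that never reply.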

@kghost

kghost commented Oct 15, 2018

The most important thing is not how you measure the width; what matters is that the terminal app's and the console app's measurements agree with each other. When the widths don't match, it messes up all ncurses apps as well as tmux/screen.

So instead of providing another platform-dependent function, I strongly suggest using a widely used library like utf8proc (with this patch) to determine character width. It mostly follows the Unicode standard.

There are also characters whose width is situational, depending on the locale. Make sure your app can handle this, or just use the library.
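One concrete source of situational width is the East Asian Width property "A" (Ambiguous), which includes many box-drawing characters. A minimal Python sketch, assuming the caller knows whether the environment treats ambiguous characters as wide (the ambiguous_is_wide flag and the cell_width name are hypothetical):

```python
import unicodedata


def cell_width(ch: str, ambiguous_is_wide: bool = False) -> int:
    """Cells for one spacing character, with a switch for ambiguous width."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2  # always wide
    if eaw == "A":
        # Ambiguous: narrow in most Western contexts, wide in legacy CJK ones
        return 2 if ambiguous_is_wide else 1
    return 1


print(unicodedata.east_asian_width("\u2500"))  # U+2500 BOX DRAWINGS LIGHT HORIZONTAL
print(cell_width("\u2500"), cell_width("\u2500", ambiguous_is_wide=True))
```

The application and the terminal both have to pick the same answer for the flag, which is the agreement problem described above.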

@miniksa added the Area-Server label Jan 18, 2019
@miniksa added this to the Backlog milestone Jan 18, 2019
@be5invis
Author

@kghost @miniksa
What we ultimately need is proper support for text shaping in the console, which is not a well-studied area. The concept may be close to justification, which is another area that is not well studied...
But how about letting the terminal (the one using ConPTY) guide Conhost on how to associate cells with text runs? Then it could have arbitrarily complex rendering/layout/justification. Just imagine complex scripts like the Indic scripts working in the console!

@zadjii-msft
Member

From @alabuzhev in #10592

It is not uncommon for text mode apps to organise and display data in a table-like way with multiple columns. To do so, for any arbitrary string an app should be able to calculate its visible length, i.e. the number of screen cells occupied, and truncate it or append with spaces to fit into the desired column.

Historically the most popular way to do so is to just take the string size in characters, e.g. string.size() (here and below "character" means wchar_t), assuming that each character occupies exactly one cell. It is extremely easy and for the USA and Europe it usually "just works". Except when it doesn't. Sooner or later unusual characters go slipping through the cracks and that assumption goes out with a bang: the rendered string is actually longer (or shorter) than expected and all the following characters are shifted. And all the following lines as well. Oops. You've probably seen that already somewhere.
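To make the column-fitting problem concrete, here is a minimal Python sketch that truncates or pads a string to a fixed number of cells using an estimated display width instead of the character count. The names char_cells, display_width and fit_to_cells are hypothetical, and the width rules are a simplified guess from Unicode data, which, as the rest of this comment explains, may not match what a given renderer does.

```python
import unicodedata


def char_cells(ch: str) -> int:
    """Estimated cells for one code point (simplified, hypothetical rules)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide / fullwidth
    return 1


def display_width(text: str) -> int:
    """Estimated number of cells the whole string occupies."""
    return sum(char_cells(ch) for ch in text)


def fit_to_cells(text: str, cells: int) -> str:
    """Truncate text to at most `cells` cells, then pad with spaces."""
    kept = []
    used = 0
    for ch in text:
        width = char_cells(ch)
        if used + width > cells:
            break  # the next character would overflow the column
        kept.append(ch)
        used += width
    return "".join(kept) + " " * (cells - used)


print(len("日本語"), display_width("日本語"))
print(repr(fit_to_cells("日本語", 5)))
```

Here the character count is 3 while the estimated width is 6, so a column sized by length would overflow; the 5-cell fit keeps two ideographs and pads one space because the third would not fit.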

So, to make sure that everything works even with unusual characters, apps need to do something smarter and treat different characters differently. There are ways to do that, e.g. using external libraries or Unicode data directly. There's one, just one small problem with that approach: text mode apps don't and can't render anything directly. The actual rendering happens in a different process in an unpredictable way: the number of occupied cells could depend on the OS version, the console host, the console mode, the API used, the output codepage, the active font, the colour of the character (yes), and so on and so forth.

In other words, to do the right thing, it's not enough to fully support Unicode and take into account character widths, grapheme clusters etc. An app needs to ask itself "what would the renderer do?" first. And it's not exactly trivial to find out. Even the methods that worked in the past, e.g. checking the OS version or querying the console font, are now deprecated and either don't work without advanced magic or don't work at all in Terminal.

So, are there any reasonable ways / recommendations to predict the renderer behaviour and say for sure "if I print this particular string, the cursor will move exactly N characters to the right"? (not even to mention RTL, that's a different PITA).

@viktor-podzigun

fyi: there is new Unicode Terminal Complex Script Support, or TCSS proposal

@tig

tig commented Jan 23, 2024

> fyi: there is new Unicode Terminal Complex Script Support, or TCSS proposal

@DHowett-MSFT - I'm super interested in helping with this. At the minimum, you can count on Terminal.Gui as being a test case. Please feel free to reach out (tig (at) kindel (dot) com).

@zadjii-msft
Member

> fyi: there is new Unicode Terminal Complex Script Support, or TCSS proposal

In fact, @DHowett is listed on the author line of that proposal 😅

@j4james
Collaborator

j4james commented Jan 23, 2024

Note that there's also Contour's Unicode Core proposal, which has already been adopted by a number of other terminals, and at least one application that I'm aware of.

9 participants