This issue was moved to a discussion.

Fundamental Flaws inherent in the design of AtlasEngine #16132

Closed
AffluentOwl opened this issue Oct 11, 2023 · 5 comments
Labels
Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting

Comments

@AffluentOwl

AffluentOwl commented Oct 11, 2023

Windows Terminal version

No response

Windows build number

No response

Other Software

No response

Steps to reproduce

  1. Install an OpenType font using contextual alternates like Numderline. https://thume.ca/numderline/
  2. Set the terminal font to this font
  3. Type 123456 in the terminal

Expected Behavior

The underlines properly display under digits in the thousands places as configured by the font.

Actual Behavior

The text displays as 123456 with no underlines.

@AffluentOwl AffluentOwl added Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels Oct 11, 2023
@AffluentOwl
Author

AffluentOwl commented Oct 11, 2023

Correct me if something has changed in the implementation, but the general concept underlying AtlasEngine is that a series of Unicode code points (an extended grapheme cluster) is treated as a unit that can be shaped and rendered on its own. The basic assumption inherent in the design is that distant code points cannot affect the rendering of other code points. However, this just isn't the way Unicode/OpenType text rendering is specified.

Some exceptions with some brief research:

  1. The Numderline font, for example, shows a case where OpenType GSUB tables give a digit in the thousands position an underline, while the exact same digit in the ones position gets none. In particular, this example shows that even standard ASCII characters cannot be special-cased along a fast rendering path.
  2. While this might be considered a stylistic enhancement in English, this feature is required in languages like Arabic. Because Arabic is a script language where entire words turn into a single grapheme, this might not show bugs easily in the current implementation. However, I suspect that with more careful investigation the underlying issue could be found in another language (or perhaps even in Arabic) where two graphemes depend upon each other across a grapheme boundary.
  3. In the general case, OpenType tables provide a Turing-complete language which can operate over strings of infinite size, so no code point in a string is safe to be pre-shaped or shaped in isolation.
  4. Consider that SVGs can be stored in OpenType fonts, including animations, which prevents static pre-rendering.
  5. Far-reaching Unicode code points like U+206E (national digit shapes) affect all glyphs which appear after them in a run, and would override the nominal glyph for the digit 1 with the digit for that language, like the special digit glyphs in Arabic (and many other languages). These act a lot like ANSI colors in that they can appear anywhere in a document and affect arbitrary spans. The same really goes for any character in Unicode Chapter 4.12, Characters with Unusual Properties, under "Complex expression format control (scoped)". Especially the Bidirectional Ordering Controls chapter and the Stateful Format Controls (23.2 + 23.3).

Input

U+202E THIS IS A TEST 123 U+202C 789 

Becomes

‮ THIS IS A TEST 123 ‬ 789 

The crux of the issue is that the concept of AtlasEngine fundamentally violates the documented requirement of Uniscribe to operate on "entire paragraphs" of text at a time, as the smallest possible unit. Some of these issues likely can't be fixed without rewriting Uniscribe. Others might be worth fixing by reimplementing small parts of Uniscribe outside of it (like range-tracking the RTL stack, as is done for colors now). And this will be an ongoing issue: each newly released version of Unicode will need to be reviewed for new exceptions the Terminal needs to implement, rather than being transparently taken care of by Uniscribe.
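To make the scoped-control point concrete, here is a minimal Python sketch of "range tracking the RTL stack". It is only an illustration under loud assumptions: the helper names `override_states` and `visual_order` are made up for this example, only LRO/RLO/PDF are handled (embeddings, isolates and the rest of the Unicode Bidirectional Algorithm are ignored), and it says nothing about how Uniscribe or the Terminal actually work.

```python
import unicodedata

# Bidi override controls: LEFT-TO-RIGHT OVERRIDE, RIGHT-TO-LEFT OVERRIDE,
# and POP DIRECTIONAL FORMATTING.
LRO, RLO, PDF = "\u202d", "\u202e", "\u202c"


def override_states(text):
    """Pair every visible character with the override in force ('L', 'R' or None).

    Hypothetical helper: tracks only LRO/RLO/PDF with a simple stack.
    """
    stack, out = [], []
    for ch in text:
        if ch == RLO:
            stack.append("R")
        elif ch == LRO:
            stack.append("L")
        elif ch == PDF:
            if stack:
                stack.pop()
        else:
            out.append((ch, stack[-1] if stack else None))
    return out


def visual_order(text):
    """Very rough display order: reverse each maximal run under an RTL override."""
    result, run = [], []
    for ch, direction in override_states(text):
        if direction == "R":
            run.append(ch)
        else:
            result.extend(reversed(run))
            run = []
            result.append(ch)
    result.extend(reversed(run))
    return "".join(result)


sample = RLO + "THIS IS A TEST 123" + PDF + " 789"
print(unicodedata.name(RLO), "/", unicodedata.bidirectional(RLO))
print(visual_order(sample))  # 321 TSET A SI SIHT 789

# Shaping the digits as an isolated chunk loses the state set far earlier:
print(override_states("123"))
```

The last line is the interesting one: the same three digits carry no override when looked at in isolation, so any per-chunk cache would need the tracked state as part of its key.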

However, if the performance benefits are deemed worth the non-conformance with OpenType / Uniscribe for certain scenarios -- it should be documented exactly what features are missing from the Windows Terminal's custom implementation of OpenType and Uniscribe so users can make an informed decision if AtlasEngine is best for their use case or if their use case demands higher correctness.

[1] https://unicode.org/reports/tr29/
[2] https://blog.janestreet.com/commas-in-big-numbers-everywhere/
[3] https://litherum.blogspot.com/2019/03/addition-font.html
[4] https://colorfonts.langustefonts.com/howto.html
[5] https://www.unicode.org/versions/Unicode15.0.0/ch04.pdf
[6] https://www.unicode.org/versions/Unicode15.0.0/ch23.pdf
[7] https://learn.microsoft.com/en-us/windows/win32/intl/displaying-text-with-uniscribe

@zadjii-msft
Member

Huh, interesting...
image

Can you share what version of the Terminal you're using, and your settings.json file? We're pretty sure this is supposed to work 😄

@zadjii-msft zadjii-msft added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Oct 11, 2023
@DHowett
Member

DHowett commented Oct 11, 2023

(For the rest of your notes that don't pertain specifically to Numderline but to Unicode clustering, shaping, and our compliance as a whole, thanks for writing them up so concisely! We'll need to wait until @lhecker is back from his time off before we have a comprehensive response though.)

@AffluentOwl
Author

AffluentOwl commented Oct 11, 2023

Sorry I wasn't able to come up with a good minimal repro yet of the purest form of what I wanted to demonstrate, as I was running into other bugs (design choices?). Playing with this more, I think that the current implementation seems to be turning real lines from the file into pseudo-lines that break on N bytes of data instead of N glyphs of data (or some measured width). This completely breaks the rendering in the middle of glyphs (regardless of AtlasEngine).

1 - Line Break on Bytes

1..100 | ForEach-Object { Write-Host "a" -NoNewLine }; 1..10 | ForEach-Object { Write-Host "`u{0364}`u{0365}" -NoNewLine }

Actual

image

Expected

Notepad
image

Edge/Chrome/Harfbuzz
image

Word/Uniscribe (Clips to line extents, but renders vertically)
image

Breaking the combining glyph is definitely undesirable. The stacking of all the combining marks on top of each other, however, is due to some flag passed to Uniscribe, perhaps designed to constrain line height, but Word shows it could be changed to Chrome-style rendering with anti-aliased text + transparency.
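As a rough illustration of why the break lands in the middle of a glyph, the Python sketch below compares a wrap that counts code points with one that counts clusters. The `simple_clusters` and `wrap` helpers are hypothetical, and the clustering is a deliberate simplification (a combining mark attaches to the preceding character); real extended grapheme clusters follow the full rules in UAX #29. The width of 105 is chosen only so that the naive break falls inside the run of marks.

```python
import unicodedata


def simple_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified stand-in for UAX #29 extended grapheme clusters.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Me", "Mc"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def wrap(text, width):
    """Wrap on cluster boundaries, counting one column per cluster."""
    clusters = simple_clusters(text)
    return ["".join(clusters[i:i + width]) for i in range(0, len(clusters), width)]


# Same data as the repro: 100 x "a", then 10 x (U+0364 U+0365).
text = "a" * 100 + "\u0364\u0365" * 10

print(len(text))                    # code points
print(len(text.encode("utf-8")))    # bytes
print(len(simple_clusters(text)))   # clusters (columns in this model)

# Naive wrap on code points: the second row starts with an orphaned mark.
naive = [text[i:i + 105] for i in range(0, len(text), 105)]
print(unicodedata.category(naive[1][0]))

# Cluster-aware wrap: everything fits on one 105-column row.
print(len(wrap(text, 105)))
```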

2 - No Unicode Line Breaking

The next related issue is that the line breaks do not use anything close to the Unicode Line Breaking rules. So this happens:

1..10 | ForEach-Object { Write-Host "111000" -NoNewLine }; Write-Host " " -NoNewLine; 1..10 | ForEach-Object { Write-Host "111000" -NoNewLine }

Actual

image

Expected

Notepad
image

I think this could more arguably be justified, or perhaps offered to users as an option to use proper Unicode line breaking or not. But the main issue which becomes obvious is that the underlines no longer underline the expected sets of 3 digits.
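The effect on the underlines can be modelled in a few lines of Python. This is a sketch under a stated assumption: it pretends the font underlines every second group of three digits, counted from the right end of a digit run, which is a simplified stand-in for what Numderline's contextual alternates do. `underline_mask` and `mask_when_split` are hypothetical names, and the split positions are arbitrary.

```python
def underline_mask(digits):
    """True for digits in every second group of three, counted from the right."""
    n = len(digits)
    return [((n - 1 - i) // 3) % 2 == 1 for i in range(n)]


def mask_when_split(digits, column):
    """Mask obtained if the run is shaped as two separate pieces split at `column`."""
    return underline_mask(digits[:column]) + underline_mask(digits[column:])


print(underline_mask("123456"))  # [True, True, True, False, False, False]

run = "111000" * 10              # one 60-digit run, as in the repro
whole = underline_mask(run)

# A break that happens to fall on a six-digit boundary keeps the grouping...
print(mask_when_split(run, 54) == whole)
# ...but a break at any other column shifts the groups in the first piece.
print(mask_when_split(run, 50) == whole)
```

In this model the first piece is grouped from its own right edge, so a wrap at an arbitrary column changes which digits count as "thousands".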

3 - Irreversible window resizes

Also, the way lines are recombined when resizing the window feels quite janky if the user has no scrollback buffer, because as the user widens and narrows the window, they lose their data, as the resize operation is not reversible. To me, the notion of a true logical line understood by the system would feel more natural. The user has no way to guarantee they can scroll back, since they might not be able to control whether 1 long line consumes their whole 1000 lines of scrollback buffer.
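A toy Python model of the difference, purely for illustration: `wrap_rows` re-wraps from stored logical lines, while `resize_rows_only` is a hypothetical lossy buffer that keeps only wrapped rows, capped at `max_rows`. Neither is claimed to match the Terminal's actual buffer; the point is only that capping wrapped rows makes narrow-then-widen lose data, whereas re-wrapping logical lines does not.

```python
def wrap_rows(logical_lines, width):
    """Wrap each logical line into rows of at most `width` characters."""
    rows = []
    for line in logical_lines:
        # An empty logical line still occupies one row.
        rows.extend([line[i:i + width] for i in range(0, len(line), width)] or [""])
    return rows


def resize_rows_only(rows, old_width, new_width, max_rows):
    """Hypothetical lossy model: only wrapped rows are stored, capped at max_rows."""
    lines, current = [], ""
    for row in rows:
        current += row
        # A row shorter than the old width is assumed to end a line.
        if len(row) < old_width:
            lines.append(current)
            current = ""
    if current:
        lines.append(current)
    # Keep only the newest rows that fit in the buffer.
    return wrap_rows(lines, new_width)[-max_rows:]


line = "".join(str(i % 10) for i in range(40))   # one 40-character logical line
rows = wrap_rows([line], 10)                      # 4 rows at width 10

# Rows-only buffer with room for 4 rows: narrow to 5 columns, then widen again.
narrow = resize_rows_only(rows, 10, 5, max_rows=4)
back = resize_rows_only(narrow, 5, 10, max_rows=4)
print("".join(back) == line)                      # False: the first half is gone

# Logical-line model: re-wrapping at the old width gives the old rows back.
print(wrap_rows([line], 10) == rows)              # True
```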

4 - Scoped Control Characters

I tested with the RTL override and it didn't seem supported at all by the terminal. But these seem like a pretty scary / open question.

Write-Host "`u{202E}ABC`u{202C}_`u{202E}" -NoNewLine; 1..100 | ForEach-Object { Write-Host "ABC" -NoNewLine }

Actual

image

Expected

Notepad
image

U+206E (National Digit Shapes) also seems to be ignored. Requires changing Control Panel -> Regional Format -> Arabic (Saudi Arabia).

Write-Host "1234567890 `u{206E}1234567890"

Actual

image

Expected

Notepad
image

5 - Wide Spanning OpenType lookup tables

So the first repro example I gave doesn't hold up, as the assumption I wrote about the engine doesn't seem to be true at the moment (but perhaps that's the next step in the works?). Related to it, I think that the terminal currently gets lucky that it has not yet implemented #1860 with support for infinitely wide lines, because that will open up the full extent of this bug, assuming whole lines need to be shaped all at once and will sometimes be too big to all be in memory/processed at once.

But I think it is reasonable for users to expect paragraphs of text they output on the terminal to still support contextual alternates (like Numderline) and other shaping within their paragraph (long line in this case) without the terminal injecting its own formatting / breaking the user's formatting.

I'd argue this is a quite common occurrence, more so than the first-glance estimate suggests (1 in 80 characters being a forced line break, a rate of 1.25%), because users with small, resized, or actively resizing windows will go through every width and hit all of those edge cases.

@microsoft-github-policy-service microsoft-github-policy-service bot added Needs-Attention The core contributors need to come back around and look at this ASAP. and removed Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something labels Oct 11, 2023
@zadjii-msft zadjii-msft added the Needs-Discussion Something that requires a team discussion before we can proceed label Oct 18, 2023
@lhecker
Member

lhecker commented Oct 24, 2023

Sorry for responding late. I forgot to set myself a reminder for responding to this.
Allow me to respond to each of the 5 points above in order:

1 - Line Break on Bytes

That issue is fixed in the latest AtlasEngine version in Windows Terminal 1.18 and later:
image

2 - No Unicode Line Breaking

That is unfortunately something we do intentionally. Terminals traditionally do not adhere to many parts of the Unicode spec since they were designed before Unicode was a thing. For instance, vim has a built-in functionality to print Hebrew/Arabic text in reverse, because it expects that the hosting terminal doesn't support RTL overrides/detection. The same is true for line breaks and it's traditionally expected that proper word-wise line breaks don't exist. But we don't properly support grapheme clusters either and that's making the issue worse than it should be. We're tracking this with #8000 and I'm actively working on implementing grapheme cluster support right now (and have been for a while).
Additionally, a new Unicode working group is currently forming to discuss these issues and it's possible that in the future we might have a Unicode spec that specifies how this should be handled.
Once we have grapheme cluster support, we could consider adding support for TR14 line breaks, but I would not be in favor of implementing it right away, because I suspect it would not be widely used at all (most terminals don't implement anything like that either, after all) and thus not be worth maintaining.

3 - Irreversible window resizes

We're tracking this at #15976. It'll unfortunately take a while to get this addressed.

4 - Scoped Control Characters

RTL overrides are tracked in #12711. I'll look into the U+206E support.

5 - Wide Spanning OpenType lookup tables

I'm not entirely sure I understand you there... Are you saying we should shape entire lines of text at a time, without the terminal breaking them into lines to fit them into the viewport width? (This might be difficult to achieve due to the previous "No Unicode Line Breaking" point.)


All in all, none of the above are related to AtlasEngine specifically yet, apart from the U+206E support. We could open smaller, more specific issues instead.

@carlos-zamora carlos-zamora added Needs-Attention The core contributors need to come back around and look at this ASAP. and removed Needs-Attention The core contributors need to come back around and look at this ASAP. labels Oct 25, 2023
@zadjii-msft zadjii-msft removed the Needs-Discussion Something that requires a team discussion before we can proceed label Oct 30, 2023
@microsoft microsoft locked and limited conversation to collaborators Nov 1, 2023
@carlos-zamora carlos-zamora converted this issue into discussion #16252 Nov 1, 2023
