Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partially Addresses #2616. Support combining sequences that don't normalize #2932

Merged
merged 21 commits into from
Oct 29, 2023

Conversation

BDisp
Copy link
Collaborator

@BDisp BDisp commented Oct 24, 2023

Partially addresses (but does not fix) #2616

Pull Request checklist:

  • I've named my PR in the form of "Fixes #issue. Terse description."
  • My code follows the style guidelines of Terminal.Gui - if you use Visual Studio, hit CTRL-K-D to automatically reformat your files before committing.
  • My code follows the Terminal.Gui library design guidelines
  • I ran dotnet test before commit
  • I have made corresponding changes to the API documentation (using /// style comments)
  • My changes generate no new warnings
  • I have checked my code and corrected any poor grammar or misspellings
  • I conducted basic QA to assure all features are working

@BDisp BDisp requested a review from tig as a code owner October 24, 2023 21:43
UnitTests/ConsoleDrivers/ContentsTests.cs Outdated Show resolved Hide resolved
UnitTests/ConsoleDrivers/ContentsTests.cs Outdated Show resolved Hide resolved
@tig
Copy link
Collaborator

tig commented Oct 25, 2023

PR to this PR incoming with suggested fixes to Unit Tests...

@BDisp BDisp marked this pull request as ready for review October 25, 2023 16:56
Copy link
Collaborator

@tig tig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please merge #2934 and see my comment.

I don't think this fix does what you think it does.

Terminal.Gui/ConsoleDrivers/ConsoleDriver.cs Outdated Show resolved Hide resolved
@BDisp
Copy link
Collaborator Author

BDisp commented Oct 25, 2023

@tig I already finished this.

@tig
Copy link
Collaborator

tig commented Oct 25, 2023

@tig I already finished this.

I don't think your test in TabView is working. Here's what it should look like:

image

Here's what I see:

image

Another test (in Unicode scenario):

sb.Append ('e');
sb.Append ('\u0301');
sb.Append ('\u0301');
testlabel = new Label ($"Should be an e with two accents: {sb}") { X = 20, Y = Pos.Y (label), Width = Dim.Percent (50), CanFocus = true, HotKeySpecifier = new Rune ('&') };
Win.Add (testlabel);

image

Like this:

image

Copy link
Collaborator

@tig tig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't think this is working. See my comments.

UICatalog/Scenarios/TabViewExample.cs Outdated Show resolved Hide resolved
Terminal.Gui/Drawing/Cell.cs Outdated Show resolved Hide resolved
@tig
Copy link
Collaborator

tig commented Oct 25, 2023

Perhaps we shouldn't try normalizing at all? Like this:

image

Re: NetDriver - I suspect Console.Write can't handle combining marks.

@BDisp
Copy link
Collaborator Author

BDisp commented Oct 25, 2023

Perhaps we shouldn't try normalizing at all? Like this:

There is no problem on the AddRune because on the UpdateScreen I can get the correct string in the output variable,

Re: NetDriver - I suspect Console.Write can't handle combining marks.

Yes I believe is that.

@tig
Copy link
Collaborator

tig commented Oct 25, 2023

Perhaps we shouldn't try normalizing at all? Like this:

There is no problem on the AddRune because on the UpdateScreen I can get the correct string in the output variable,

Do you have an objection to replacing the current code in this PR with the code I suggested above that removes the calls to normalize? (with modifications to deal with CombiningMarks being a list)

@BDisp
Copy link
Collaborator Author

BDisp commented Oct 25, 2023

Do you have an objection to replacing the current code in this PR with the code I suggested above that removes the calls to normalize?

@tig did you already saw what it do? You are replacing the previous p

Do you have an objection to replacing the current code in this PR with the code I suggested above that removes the calls to normalize?

This is worst.

image

@BDisp
Copy link
Collaborator Author

BDisp commented Oct 25, 2023

Absolute chaos.

image

@BDisp
Copy link
Collaborator Author

BDisp commented Oct 25, 2023

@tig sorry but I don't have more head to handle with this. I think it's reasonable for now, attended to the difficult degree you added to this.

@BDisp BDisp marked this pull request as draft October 26, 2023 13:33
@BDisp
Copy link
Collaborator Author

BDisp commented Oct 26, 2023

Without normalize WindowsDriver returns as follow. Separated combining marks output as double ´´. See the dealignment at right.

Captura de ecrã 2023-10-26 164927

@BDisp BDisp marked this pull request as ready for review October 26, 2023 17:31
@tig
Copy link
Collaborator

tig commented Oct 26, 2023

I dove in deep this morning and I think I understand how all this should work much better. I composed a message on my home computer but didn't have a chance to finish my thoughts before I had to run. As soon as I get home I'll finish it.

The net is: We can make combining marks work on both NetDriver and WindowsDriver, but the current design of AddRune/AddStr/UpdateScreen will not work. In addition TextFormatter (and others) are doing things in such a way that will not work.

@@ -175,11 +175,22 @@ public void AddRune (Rune rune)
// Normalize to Form C (Canonical Composition)
string normalized = combined.Normalize (NormalizationForm.FormC);

Contents [Row, Col - 1].Rune = (Rune)normalized [0]; ;
if (Contents [Row, Col - 1].Rune != default && Contents [Row, Col - 1].Rune != (Rune)' ') {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not sufficient. Many, many, characters cannot be combined with a combining mark, not just ' '. Try, for example [.

WT (and other platforms) have sophisticated algorithms for figuring out whether a combining mark can combine with a base character. I am trying to find more details on these to see what we can do.

You can easily test this in WT and PS with (Caskadia Cove Nerd Font):

image

In Unicode, not all base characters can be combined with combining characters. Whether a base character can be combined with a combining character depends on the rules defined in the Unicode Standard. The Unicode Consortium, the organization responsible for maintaining the Unicode Standard, specifies these rules.

Here are some general guidelines to determine if a base character can be combined with a combining character:

Character Composition Model: Unicode follows a character composition model, which means that some characters can be composed from a base character and one or more combining characters. This allows for a wide range of characters and diacritics to be represented in text.

Compatibility: Unicode defines compatibility characters and compatibility composition rules to ensure that combining characters work as expected with base characters. If a base character is defined as a "combining character target," it means that it can be combined with one or more combining characters.

Normalization Forms: Unicode defines normalization forms (NFC, NFD, NFKC, NFKD) to handle character composition and decomposition. NFC (Normalization Form C) is the most commonly used form for combining characters, and it ensures that text is represented in a composed form when possible.

Character Properties: Unicode assigns specific properties to each character, including whether it can be used as a base character and whether it can combine with combining characters. You can refer to the Unicode Character Database (UCD) to check the properties of individual characters.

Combining Class: Combining characters are assigned a "combining class" value, which determines their combining behavior. Base characters and combining characters are combined according to their combining class values.

Compatibility Decomposition: Some base characters can be combined using compatibility decomposition mappings, even if they don't have explicit combining characters. These mappings are defined in the Unicode Standard.

To determine whether a specific base character can be combined with a combining character, you can consult the Unicode Standard documentation, specifically the Unicode Character Database (UCD) and the Unicode Technical Reports related to normalization and composition. Additionally, Unicode-aware text processing libraries and programming languages often provide functions or methods for handling character composition and decomposition, making it easier to work with combined characters in text processing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tig I only had to do that because of the CharacterMap scenario using the space to separate the runes to avoid incorrect output. I don't have enough knowledge to do better.

@tig
Copy link
Collaborator

tig commented Oct 27, 2023

@tig try this code on WindowsDriver on the WriteToConsole method. It seems to support true color but I need help.

Happy to. May be a few days though.

I'm not seeing how this is relevant to combining sequences though. Can you explain?

@BDisp
Copy link
Collaborator Author

BDisp commented Oct 27, 2023

No it's not. Forget.

@tig
Copy link
Collaborator

tig commented Oct 27, 2023

  • Combining Marks have GetColumns() == 0
  • TextFormatter.Draw is currently doing var runeWidth = Math.Max (rune.GetColumns (), 1); and thus incrementing the column when a combining mark is encountered.
  • AddRune always assumes a combining mark is 0 width and should be combined with the preceding Cell (unless Col == 0).
  • AtlasEngine (Console.Write("a\u0301\u0301")) DOES render multiple CMs correctly.

XaOqxdo 1

However, it ALSO moves the cursor to the right for each CM. Thus Console.Write("a\u0301\u0301\u0328x") results in:

image

Note, this is the same thing you see with echo "[au{0301}u{0301}x]":

image

If I SetCursorPostion() to the row/col to just to the right of the a, WT 'forgets' the CM.

In otherwords, internally AtlasEngine is holding each CM as a used column.

AtlasEngine also, treats individual CMs (Console.Write("\u0301")) as 1 column width glyphs if in the 1st column.

image

Unless, I'm missing something, until AtlasEngine changes, on Windows, we cannot really support NON-NORMALIZED combining marks in a way that users would expect. Any glyph made up of a base char and n or more combining chars will use base.ColumnWidth() + n columns.

I think for us to support combining marks correctly we need to do the following:

  • Always try to normalize. This at least lets the most common combining mark use-case (base + 1 CM) work in many cases.
  • Have ConsoleDriver.AddRune treat a combining mark as a 1 column wide glyph. E.g. if rune.IsCombiningMark() == true then ConsoleWidth == 1.
  • Have ConsoleDriver.AddStr enumerate the runes in str and add them to ConsoleDriver.Contents such that Cell.CombiningMarks is setup correctly (if str[0] is a CM, special case).
  • Drivers' UpdateScreen
    • treat any Cell.Rune.IsCombiningMark() == true as taking 1 column.
    • use Cell.Rune.CombiningMarks().Count > 0 to append the marks to output.

@tig
Copy link
Collaborator

tig commented Oct 27, 2023

Please see my PR to this PR: BDisp#162

I do not think #2616 can be fixed at this time. I think the best we can do is basically ignore CMs that can't be normalized. My PR to your PR does this.

@BDisp
Copy link
Collaborator Author

BDisp commented Oct 27, 2023

Please see my PR to this PR: BDisp#162

I do not think #2616 can be fixed at this time. I think the best we can do is basically ignore CMs that can't be normalized. My PR to your PR does this.

Done @tig. Thanks.

@BDisp
Copy link
Collaborator Author

BDisp commented Oct 27, 2023

@tig why you write u301 twice here \u0301\u0301\u0328?

@tig
Copy link
Collaborator

tig commented Oct 27, 2023

@tig why you write u301 twice here \u0301\u0301\u0328?

It was just an example of 3 CMs.

@tig tig changed the title Fixes #2616. Support combining sequences that don't normalize Partially Addresses #2616. Support combining sequences that don't normalize Oct 27, 2023
@BDisp
Copy link
Collaborator Author

BDisp commented Oct 27, 2023

@tig why you write u301 twice here \u0301\u0301\u0328?

It was just an example of 3 CMs.

I see. For have sure it could be normalized. You don't know if there is 3 different combining marks that can normalize on terminal? It would be better.

@tig
Copy link
Collaborator

tig commented Oct 28, 2023

@tig why you write u301 twice here \u0301\u0301\u0328?

It was just an example of 3 CMs.

I see. For have sure it could be normalized. You don't know if there is 3 different combining marks that can normalize on terminal? It would be better.

\u0301\u0301\u0328 will not normalize! It results in

image

e\u0301\u0301\u0328 will not normalize. It results in:

image

You can pile on as many \u0301 as you want:

image

e\u0301 WILL normalize! AtlasEngine does NOT attempt to normalize (note used cell before ]. Also note the if you copy the glyph and paste it you get e\u301 back. If it were normalizing you'd get \u00e9 (Latin Small Letter E With Acute - é U+000e9).

image

@tig tig merged commit aa8b952 into gui-cs:v2_develop Oct 29, 2023
1 check passed
@BDisp BDisp deleted the v2_combining-normalize-fix_2616 branch October 29, 2023 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants