NPP incorrect display unicode character with circle or gap #14822

Open
zev-zakaryan opened this issue Mar 2, 2024 · 3 comments

@zev-zakaryan

Description of the Issue

Notepad++ displays some Unicode characters incorrectly, with a circle (dummy consonant) and gaps, as shown in the image.
The example shows Thai text in Notepad++ (above) compared to Windows Notepad (below).

  • The Notepad++ side shows a circle as a dummy consonant even though the tone mark " ้" properly follows "ข"; the same happens, inconsistently, with the vowel " ู" after "ม".
  • There are some gaps and overlapping areas.
    [Image: NotepadPP]

Steps to Reproduce the Issue

  1. Copy the example text "หาข้อมูล" and paste it many times in a row.
  2. Look at the rendered text.

Expected Behavior

The text should be displayed consistently, as in Windows Notepad, with no circles or gaps like those highlighted on the Notepad++ side.

Actual Behavior

There are circles and gaps, as highlighted on the Notepad++ side.

Debug Information

Notepad++ v8.6.2 (64-bit)
Build time : Jan 14 2024 - 02:16:00
Path : D:\npp\notepad++.exe
Command Line :
Admin mode : OFF
Local Conf mode : ON
Cloud Config : OFF
OS Name : Windows 10 Pro (64-bit)
OS Version : 1909
OS Build : 18363.1854
Current ANSI codepage : 1252
Plugins :
JSMinNPP (1.2205)
mimeTools (3)
NppConverter (4.5)
NppExport (0.4)

@mpheath
Contributor

mpheath commented Mar 3, 2024

I can see this behavior in SciTE as well, so it seems to be Scintilla related. Circles appear with an upper or lower accent mark.
I see the issue when the text is pasted inline, but not when each "หาข้อมูล" is on its own line or when the occurrences are separated with a pipe | .
The columns where the circles display are constant: whatever the length of the line, they appear at the same column positions.

The difference from one column with a circle to the next shows a pattern.

col diff
100
199 99
364 165
463 99
628 65
727 99
892 165
991 99
1156 165
1255 99
1420 165
1519 99
1684 165
1783 99
1948 165
2047 99
2212 165
2311 99
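As a quick check of the pattern, the differences can be recomputed from the reported columns. This is only an arithmetic sketch over the numbers listed above; the variable names are illustrative.

```python
# Columns at which a circle was reported, copied from the list above.
columns = [100, 199, 364, 463, 628, 727, 892, 991, 1156, 1255,
           1420, 1519, 1684, 1783, 1948, 2047, 2212, 2311]

# Difference between each circle column and the previous one.
diffs = [b - a for a, b in zip(columns, columns[1:])]
print(sorted(set(diffs)))   # [99, 165]

# The gaps alternate, so the pattern repeats every 99 + 165 columns.
period = diffs[0] + diffs[1]
print(period)               # 264
print(period // len("หาข้อมูล"))  # 33 repetitions of the 8-character word
```

The gaps alternate strictly between 99 and 165, so the circles repeat with a period of 264 columns, which is exactly 33 copies of the 8-character test word.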

Tried GDI and DirectWrite (1, 2, 3). Tried the MS Sans Serif and Tahoma fonts, which are supposed to be compatible with Thai.

Perhaps @nyamatongwe might be able to drop in here for a comment; otherwise you can post an issue over at Scintilla.

@nyamatongwe

Scintilla breaks text into runs with a maximum length of 300 bytes. This was done because some platform APIs broke when given large pieces of text or took excessive time.

After a text segment exceeds 300 bytes it is divided into smaller segments of less than 100 bytes. This division occurs (in Document::SafeSegment), in order of priority, at (1) spaces or tabs; (2) punctuation; or (3) the last whole code point. It doesn't look at richer language attributes like combining characters, which appears to be the case here. It would be possible to implement the Unicode text segmentation algorithm or a subset.

Spaces or punctuation are commonly found in text, limiting the frequency of problems with the implementation.
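The splitting described above can be sketched roughly as follows. This is a simplified model, not Scintilla's actual code: `split_segments`, `orphaned_mark_columns` and `MAX_SEGMENT_BYTES` are hypothetical names, the space and punctuation priorities are left out because the test string contains neither, and the reported columns are assumed to be 1-based character columns. The `keep_marks_attached` option illustrates one very small subset of Unicode text segmentation (never start a run with a combining mark); it is an assumption about what a fix could look like, not a description of one.

```python
import unicodedata

MAX_SEGMENT_BYTES = 99  # "less than 100 bytes", per the description above


def split_segments(text, keep_marks_attached=False):
    """Split text into runs of at most MAX_SEGMENT_BYTES UTF-8 bytes.

    Breaks only between whole code points. With keep_marks_attached=True
    the break is moved back so that no run starts with a combining mark.
    """
    segments = []
    start = 0
    while start < len(text):
        end = start
        size = 0
        # Take as many whole code points as fit in the byte budget.
        while end < len(text):
            width = len(text[end].encode("utf-8"))
            if size + width > MAX_SEGMENT_BYTES:
                break
            size += width
            end += 1
        if keep_marks_attached:
            # Back up while the next run would begin with a combining mark.
            while (end < len(text) and end > start + 1
                   and unicodedata.category(text[end]).startswith("M")):
                end -= 1
        segments.append(text[start:end])
        start = end
    return segments


def orphaned_mark_columns(segments):
    """Return 1-based columns of runs that begin with a combining mark."""
    columns = []
    column = 1
    for segment in segments:
        if column > 1 and unicodedata.category(segment[0]).startswith("M"):
            columns.append(column)
        column += len(segment)
    return columns


line = "หาข้อมูล" * 100  # 800 code points, 3 bytes each in UTF-8

print(orphaned_mark_columns(split_segments(line)))
# [100, 199, 364, 463, 628, 727]

print(orphaned_mark_columns(split_segments(line, keep_marks_attached=True)))
# []
```

Under these assumptions the naive split cuts the line every 33 characters, and the runs that begin with " ้" or " ู" start at the same columns reported earlier in this thread, which is consistent with the explanation that the marks are separated from their base consonants at segment boundaries.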

@zev-zakaryan
Author

zev-zakaryan commented Mar 3, 2024

@nyamatongwe I would like to point out that I first found this in text that contains normal spaces. I used the repeated word only because I wanted easy reproduction steps that don't reveal my real data. Since it may be useful as a reference, here it is:

[Image: NotepadPP2]

The circle is at only the 33rd character after a space, which is normal in Thai (spaces are used to separate sentences, not words). By the way, I understand that this is related to Scintilla, so Notepad++ may not be able to do anything about it; just FYI.
