Exotic unicode characters show up as garbage if no other more common unicode characters are on the same line #5754

Closed
Lebon14 opened this issue Jun 7, 2019 · 10 comments

@Lebon14

Lebon14 commented Jun 7, 2019

Description of the Issue

Notepad++ 7.7 will not display exotic Unicode characters correctly in a UTF-8 BOM file unless another, more common Unicode character is added on the same line.

Steps to Reproduce the Issue

  1. Copy this line in Notepad++ as a UTF-8 text file:

⑨Till You Know

     It will show as a block.

  2. Add this character at the end of the same line:

Expected Behavior

The ⑨ character should show up as long as the document is in UTF-8, regardless of what else is on the line. I've also noticed this behavior with the ♡ and ⑥ characters.

Actual Behavior

When ⑨ or ♡ is inserted without another Unicode character on the line, it shows up as a garbage character ("□").

Debug Information

Notepad++ v7.7 (64-bit)
Build time : May 19 2019 - 13:05:35
Path : C:\Program Files\Notepad++\notepad++.exe
Admin mode : OFF
Local Conf mode : OFF
OS : Windows 7 (64-bit)
Plugins : DSpellCheck.dll mimeTools.dll NppConverter.dll PythonScript.dll

Comments

What's funny is that, despite the character showing up as "□", copying and pasting it gives you the actual character...? I seriously don't know the scope of this, but it seems to be limited to a small set of exotic characters in the Unicode space.
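The copy-paste observation is consistent with the text being stored correctly and only the drawing of it going wrong. A minimal Python sketch of that idea, assuming the file is plain UTF-8 and using the sample line from the report (the variable names are only for illustration):

```python
# The sample line from the report; only the first character is non-ASCII.
line = "\u2468Till You Know"  # "\u2468" is the circled digit nine

# Encode to UTF-8, as an editor would when saving the file.
encoded = line.encode("utf-8")

# The first character takes three bytes in UTF-8; every other character takes one.
first_char_bytes = encoded[:3]

# Decoding the bytes gives back the identical string, so no data is lost.
round_trip = encoded.decode("utf-8")

print(first_char_bytes.hex())
print(round_trip == line)
```

If the bytes survive the round trip like this, a "□" on screen points at glyph lookup in the font rather than at the file contents.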

I dunno if it's related to this: #5671

Here's a "GIF" of the issue:
https://gyazo.com/ba9402422dc6b32729b88c05bb9df161

@Lebon14 Lebon14 changed the title "Exotic unicode characters show up as garbage if no other exotic unicode characters is on the same line" to "Exotic unicode characters show up as garbage if no other more common unicode characters is on the same line" Jun 7, 2019
@Lebon14 Lebon14 changed the title "Exotic unicode characters show up as garbage if no other more common unicode characters is on the same line" to "Exotic unicode characters show up as garbage if no other more common unicode characters are on the same line" Jun 7, 2019
@rddim
Contributor

rddim commented Jun 7, 2019

It is caused by the font. Change the font under Settings > Style Configurator... and tick Enable global font.

[Screenshot: npp_exotic_unicode]

@MetaChuh

MetaChuh commented Jun 7, 2019

@Lebon14

thank you for your debug information.

displaying certain characters and character combinations within a line or string requires a font that contains all of those characters, in order to prevent windows from using fallback fonts.
this behaviour is also the same in older versions of notepad++.

here is a screenshot of your provided text, using arial unicode ms as an example:

you can either set a global font that contains all the characters you require and is available on your system, or, if you prefer your current font settings for everyday use, you can create a user defined language with a font that suits those files.
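As a rough illustration of why a font that covers every character on the line sidesteps fallback, here is a toy Python model of per-character font selection. It is only a sketch: the font names and coverage rules are hypothetical stand-ins, not real font tables, and it does not reproduce how Windows or Notepad++ actually choose fallback fonts.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical coverage rules, NOT real font data: each entry says whether
# a made-up font has a glyph for a given character.
coverage: Dict[str, Callable[[str], bool]] = {
    "ascii-only font": lambda ch: ord(ch) < 0x80,
    "symbol font": lambda ch: ch in "\u2468\u2465\u2661",
}

def pick_font(ch: str, preferred: str, fallbacks: List[str]) -> Optional[str]:
    """Return the first font, preferred one first, that covers ch, else None."""
    for name in [preferred, *fallbacks]:
        if coverage[name](ch):
            return name
    return None

sample = "\u2468Till"

# With a fallback available, the symbol is drawn from a different font.
with_fallback = [pick_font(ch, "ascii-only font", ["symbol font"]) for ch in sample]

# Without one, the symbol has no font at all, which is where a box would appear.
without_fallback = [pick_font(ch, "ascii-only font", []) for ch in sample]

print(with_fallback)
print(without_fallback)
```

In this model a single font covering the whole line means the fallback list is never consulted, which is the situation the global-font and user-defined-language suggestions aim for.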

many thanks and best regards.


general notes:

for general questions, or if you are not sure whether your issue is directly related to
the notepad++ source code, please visit us at the notepad++ community forum, and
search for related topics. you are welcome to post either in similar
topics, or to create a new topic at Help Wanted or General Discussion.
(no extra account is needed, just use your github account to sign in)

@MetaChuh MetaChuh closed this as completed Jun 7, 2019
@rddim
Contributor

rddim commented Jun 7, 2019

@MetaChuh
Hm, Default Style is also a good solution, I will play with it :)

@xylographe
Contributor

@Lebon14
For testing you can use the Unifont.
It has glyphs for every printable code point in the BMP (Unicode Basic Multilingual Plane).
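Whether a character falls inside the BMP is a simple range check on its code point. A minimal Python sketch (the helper name `in_bmp` is made up for this example, and the check only looks at the range, not at whether a code point is printable or present in any particular font):

```python
def in_bmp(ch: str) -> bool:
    """True if the single character ch lies in the Basic Multilingual Plane."""
    return ord(ch) <= 0xFFFF

# The three characters mentioned in the report: circled nine, circled six, white heart.
reported = ["\u2468", "\u2465", "\u2661"]
all_in_bmp = all(in_bmp(ch) for ch in reported)

# For contrast, an emoji from a supplementary plane is outside the BMP.
emoji_in_bmp = in_bmp("\U0001F600")

print(all_in_bmp, emoji_in_bmp)
```

All three reported characters are BMP code points, so a font with full BMP coverage should be able to draw them.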

@Lebon14
Author

Lebon14 commented Jun 7, 2019

Oh really? But why does it show up when there are other unicode chars and not on its own?

But I love Courier New and am so used to it :(

@xylographe
Contributor

But why does it show up when there are other unicode chars and not on its own?

Yes, this is very strange. All I can say is that and display fine when using Consolas. Obviously, since the glyph is missing, this character is shown as the Unicode replacement character (U+FFFD). And with Unifont all three are shown without glitches.

@MetaChuh

MetaChuh commented Jun 7, 2019

@Lebon14

yes, most of us do prefer courier new.
i for example use this font as default font for regular files, and have created a user defined language, with e.g. arial unicode ms for such cases.

you could create a user defined language specifically for e.g. m3u files.

@Lebon14
Author

Lebon14 commented Jun 7, 2019

Yes, this is very strange. All I can say is that and display fine when using Consolas. Obviously, since the glyph is missing, this character is shown as the Unicode replacement character (U+FFFD). And with Unifont all three are shown without glitches.

"Yes, this is very strange". This is why I opened this bug. The characters are INDEED within Courier New but why aren't they displaying correctly?

And now the bug has been closed because "Well, just change the font!", totally disregarding the fact that there might be a bug here, albeit a minor one.

I tried all the other ones mentioned but I liked none of them.

@MetaChuh

MetaChuh commented Jun 7, 2019

@Lebon14

The characters are INDEED within Courier New but why aren't they displaying correctly?

none of the characters ➈ u+2788, ⑥ u+2465, and ♡ u+2661 are part of courier new, and font fallback behaviour on windows is known to work erratically.

please run %windir%\system32\charmap.exe and have a look for yourself, which font contains which characters.
note: you can type in the unicode number if you open the advanced view, to search for specific characters, so you don't need to scroll through all of them.

many thanks and best regards.
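To get the number to type into charmap's advanced view, the code point and official name of a character can be looked up with Python's standard `unicodedata` module. A small sketch (the helper name `describe` is made up for this example); it also covers U+2468, since the sample line in the report appears to use ⑨ rather than the similar-looking ➈ listed above:

```python
import unicodedata

def describe(ch: str) -> str:
    """Format a single character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Escapes are used so the exact code points are unambiguous:
# the three listed in the comment above, plus U+2468.
for ch in ["\u2788", "\u2465", "\u2661", "\u2468"]:
    print(ch, describe(ch))
```

The two "nine" symbols are distinct code points in different Unicode blocks, so a font may contain one and not the other.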

@xylographe
Contributor

xylographe commented Jun 8, 2019

Thank you, @MetaChuh.
I wasn't aware that font fallback works in NPP at all. Well, sort of, apparently… ;-)
