[Feature Request] Make ZERO WIDTH SPACE have some visual component #8284

Closed
GhbSmwc opened this issue May 19, 2020 · 56 comments

@GhbSmwc

GhbSmwc commented May 19, 2020

Try sorting this:

https://pbs.twimg.com/media/
https://twitter.com/
​https://pbs.twimg.com/media/

Somehow, the last URL is positioned wrong.

Wait, there is a zero-width character on the third line (before the h in https), which is "​" (select to see that character). I think NP++'s show symbol should also show such characters.

@GhbSmwc GhbSmwc changed the title Sort lines lexicographically bug Sort lines lexicographically bug (solved) May 19, 2020
@xylographe
Contributor

The character is U+200B (Unicode name: ZERO WIDTH SPACE).
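
The misplaced line follows from code-point ordering. Here is a minimal Python sketch, assuming the sort compares lines code point by code point (the variable names are illustrative and not taken from Notepad++):

```python
# The three lines from the report; the third starts with U+200B.
lines = [
    "https://pbs.twimg.com/media/",
    "https://twitter.com/",
    "\u200bhttps://pbs.twimg.com/media/",
]

# Python compares strings code point by code point, so the line that
# starts with U+200B (8203) sorts after every line starting with "h" (104).
ordered = sorted(lines)

for line in ordered:
    # ascii() escapes non-ASCII characters, making the hidden one visible.
    print(ascii(line))
```

Printing with `ascii()` is a quick way to confirm that two lines that look identical on screen are not the same string.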

@sasumner
Contributor

I think NP++'s show symbol should also show such characters.

What's your opinion on what it should look like?

@GhbSmwc
Author

GhbSmwc commented May 19, 2020

A yellow line, much like when you make it show spaces and tabs.

@sasumner sasumner changed the title Sort lines lexicographically bug (solved) [Feature Request] Make ZERO WIDTH SPACE have some visual component May 19, 2020
@sasumner
Contributor

@GhbSmwc I took the liberty of renaming your issue; hopefully that is okay with you. :-)

@sasumner
Contributor

This is going to depend upon Scintilla.

@sasumner sasumner added the scintilla dependent Can't be considered for N++ implementation unless/until Scintilla changes label May 19, 2020
@sasumner
Contributor

This was "addressed" by the Scintilla project back in 2013: https://sourceforge.net/p/scintilla/feature-requests/980/

Their response was to add an API so that you can represent any character you want, "specially":
https://www.scintilla.org/ScintillaDoc.html#SCI_SETREPRESENTATION

The representation mechanism is the "white text on black box" variety, much like line-endings look when visually enabled in Notepad++:

image

Here's an example of what it would look like to set the representation to "" (an empty string), the narrowest possible thing, which is harmonious with a "zero-width" character:

image

I suppose it is "workable" but I'd say it isn't ideal.
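
As a rough illustration of how a host application might drive that API, here is a Python sketch. It assumes a UTF-8 document, so each character is identified by its UTF-8 bytes. The table, the function name `apply_representations`, and the labels (taken from the Unicode abbreviation aliases) are hypothetical and not what Notepad++ uses. The real editor call is replaced by a stand-in so the sketch runs on its own:

```python
# Hypothetical table: code point -> short label. The labels are Unicode
# abbreviation aliases, not necessarily what Notepad++ would choose.
REPRESENTATIONS = {
    0x00A0: "NBSP",
    0x200B: "ZWSP",
    0x200C: "ZWNJ",
    0x200D: "ZWJ",
    0x2060: "WJ",
}


def apply_representations(set_representation, table=REPRESENTATIONS):
    """Call set_representation(encoded_character, label) for each entry.

    The character is passed as its encoded bytes; UTF-8 is assumed here.
    """
    for code_point, label in sorted(table.items()):
        set_representation(chr(code_point).encode("utf-8"), label)


# Stand-in for the real editor call, so the sketch runs outside an editor.
calls = []
apply_representations(lambda encoded, label: calls.append((encoded, label)))
for encoded, label in calls:
    print(encoded, label)
```

Inside an editor, a wrapper around SCI_SETREPRESENTATION would be passed in place of the lambda; the wrapper's exact name and argument types depend on the host and are not shown here.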

@Daksol

Daksol commented Nov 20, 2020

This request has come up more than once if you look back through the Issues history. I encountered this problem when "screen scraping" data out of certain HTML pages.

If this is not taken up, then maybe add a caution to the NPP interface saying that the option "View-> Show Symbol -> Show All Characters" does NOT show characters such as ZeroWidthSpace and friends.

The only way to detect a Zero Width Space is to move the cursor along a line of text while watching the character position number in the status bar.
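
Outside the editor, the same check can be scripted. This is a sketch only, assuming that "invisible" means Unicode general category Cf (format characters, which includes ZERO WIDTH SPACE); the function name `find_format_chars` is made up for illustration:

```python
import unicodedata


def find_format_chars(text):
    """Return (line, column, code point label, name) for each Cf character.

    Lines and columns are 1-based. Lines are split on "\n" only, so that
    Unicode line separators are not treated as line breaks.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                name = unicodedata.name(ch, "<unnamed>")
                hits.append((line_no, col, "U+%04X" % ord(ch), name))
    return hits


sample = "https://twitter.com/\n\u200bhttps://pbs.twimg.com/media/"
for hit in find_format_chars(sample):
    print(hit)
```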

@Daksol

Daksol commented Nov 20, 2020

Other issues raised covering the same topic

@sasumner
Contributor

sasumner commented Nov 20, 2020

@Daksol

This request has come up more than once

And note that it remains an "open" request

"View-> Show Symbol -> Show All Characters" does NOT show characters such as ZeroWidthSpace and friends.

The "Show All Characters" verbiage is historical and comes from a time when "space", "tab" and "end-of-line" characters were the only things that weren't shown. Today that menu item might be "Show space, tab, CR, LF".

The author of N++ has declined my proposal for making such things as you mention visible as a core Notepad++ optional feature, but has kept the door somewhat open by this: #5062 (comment)

You can have these characters visible RIGHT NOW if you are willing to use a Pythonscript to control them.

@donho
Member

donho commented Nov 22, 2020

@sasumner

The author of N++ has declined my proposal for making such things as you mention visible as a core Notepad++ optional feature, but has kept the door somewhat open by this: #5062 (comment)

The issue #5062 talked about the BOM, which is visible, but removed by Notepad++, so I won't reopen it since it's not an issue. OTOH, I agree the invisible Unicode character is an issue.
So let's add Show Zero Width Non Breaking Space menu command between Show End Of Line & Show All Character, and make the Show All Character command also show ZWNBS chars.

@donho donho added the accepted label Nov 22, 2020
@sasumner
Contributor

@donho

Correct about BOM. BOM should never be visible, except I guess HexEditor plugin? (Not our problem!)

5062 was also used as a springboard for the general discussion of invisible characters.

let's add Show Zero Width Non Breaking Space menu command

But...there are many more invisible UTF-8 characters than that, that users could want.
Hmmm...
Let me come up with a proposal for how to best handle this?

@donho
Member

donho commented Nov 23, 2020

Correct about BOM. BOM should never be visible, except I guess HexEditor plugin? (Not our problem!)

Yes, HexEditor does some dirty hacking to show these 3 bytes.

But...there are many more invisible UTF-8 characters than that, that users could want.

Then let's add "Show all invisible character" menu command - I don't think users care about showing which kind of invisible characters.

Let me come up with a proposal for how to best handle this?

OK. Please keep it simple & stupid.

@Daksol

Daksol commented Nov 24, 2020

@sasumner. Thanks for getting this back on the program, so to speak.

Some thoughts on what would be best way of handling hidden characters. These are derived from the use cases I was encountering when I first discovered zero width space and friends.

My real-world interaction with hidden chars has often been when troubleshooting web pages, URLs et al. What I have often found myself doing is putting a string into (say) Excel and adding formulae to show the Unicode code point for each character. That then shows up the non-breaking spaces, zero-width spaces etc. Not very convenient, to say the least.

  • For the most common hidden, unprintable and whitespace characters
    • Examples: CR, LF, Tab, Space as at present, (suggest adding NonBreakingSpace here)
    • With Show-All-Chars:
      • show as descriptive block - exactly what happens in Notepad++ currently
  • For other unprintable chars
    • Examples: Zero width space
    • With Show-All-Chars:
      • either show as dark-shaded rectangular block, and with onmouseover show the hex value
      • or as a highlighted group of characters including the Unicode value, e.g. %200B
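
The second option can be mocked up outside the editor. This is a sketch under assumptions: the function name `reveal` and the default `<U+XXXX>` marker are invented for illustration, and only characters in general category Cf are marked, so NO-BREAK SPACE (category Zs) would have to be handled separately:

```python
import unicodedata


def reveal(text, template="<U+%04X>"):
    """Replace every format (Cf) character with a visible marker."""
    return "".join(
        template % ord(ch) if unicodedata.category(ch) == "Cf" else ch
        for ch in text
    )


# Default marker style.
print(reveal("\u200bhttps://pbs.twimg.com/media/"))

# The %200B style from the list above; "%%" is a literal percent sign.
print(reveal("\u200bhttps://pbs.twimg.com/media/", "%%%04X"))
```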

@sasumner
Contributor

@Daksol said

Some thoughts on what would be best way of handling hidden characters...

Notepad++ only has limited control over the display; the display is primarily controlled by Scintilla.
The only possible thing that N++ can do with these characters is what is shown with CR and LF in my previous posting HERE.
So any such characters would have to look like those.

@ArkadiuszMichalski
Contributor

@sasumner But must the indicator for these non-visible characters be the orange dot (same as for a standard space)? I would prefer to somehow distinguish a normal space from the other characters.

@sasumner
Contributor

sasumner commented Nov 24, 2020

@ArkadiuszMichalski said:

must be the orange dot

Scintilla controls the "orange dot" behavior, but only for the U+0020 character.
The best Notepad++ can do for making any new characters visible is like what it does for CR and LF.
In the default theme this means whiteish characters on a black background.

@sasumner
Contributor

So here's a sampling of what could be done, stolen from a forgotten discussion thread on the Community site:

image

@ArkadiuszMichalski
Contributor

OK thx, looks good, at least for me.

@sasumner
Contributor

Here's the complete list as of now, that I plan to implement:

image

For some of these, I just made up an abbreviation so that the representation, the white letters on the black background, doesn't become too big/wide. I don't know if there is a standard abbreviation...does anyone?

Comments are most welcome.

More "simple and stupid" spec'cing out of this feature to come...

@donho
Member

donho commented Nov 26, 2020

@sasumner
Are they (in the list above) all not displayed currently in Notepad++ ?

@sasumner
Contributor

Are they (in the list above) all not displayed currently in Notepad++ ?

Correct.
I do not know if the list is complete, but all of those currently in the above list are currently invisible in N++.
Unless someone says "You forgot about 'foo' !" we can start with the above list.

@sasumner
Contributor

@donho

Here's the file the above image was made from, if you want to "see" how N++ displays the invisible characters:

invisible_utf8_chars.txt

@sasumner
Contributor

BTW, here is what is displayed for the invisible_utf8_chars.txt in Windows' notepad.exe :

image

@donho
Member

donho commented Nov 26, 2020

@sasumner
For me FF, NEL, LS & PS are displayed:

image

My debug info:

Notepad++ v7.9.1   (32-bit)
Build time : Nov  2 2020 - 01:03:56
Path : C:\Program Files (x86)\Notepad++\notepad++.exe
Admin mode : OFF
Local Conf mode : OFF
OS Name : Windows 10 Enterprise (64-bit) 
OS Version : 2004
OS Build : 19041.630
Current ANSI codepage : 1252
Plugins : DSpellCheck.dll HexEditor.dll mimeTools.dll NppConverter.dll NppExport.dll NppXmlTreeviewPlugin.dll 

@ArkadiuszMichalski
Contributor

With BOM I got this:
image

@sasumner
Contributor

@ArkadiuszMichalski I see no changes in your two screenshots, except the lines that have been removed from the second file. This is not a problem, I'm just stating it for the record.

@sasumner
Contributor

@donho said:

I'm confused... now 2060, 2066, 2067, 2068 and 2069 in both files you provided don't display on my 1st laptop.

5 Unicode chars in both files you provided do display on my 2nd laptop.

That confuses me as well. :-(

@donho
Member

donho commented Nov 26, 2020

config.xml matters!

@sasumner
use this one to check:
config.zip

@sasumner
Contributor

sasumner commented Nov 27, 2020

@donho said:

config.xml matters!

I presume that writeTechnologyEngine="0" in that config.xml is what is important?


So this N++ preference setting is going to impact how these characters appear:

image

To hopefully be very clear, this is how the invisible_utf8_chars_2_w_bom.txt file appears for me with "direct write" unticked:

image

And here's what it looks like for me with "direct write" ticked:

image

I'm not at all sure what this means for the future of this feature.

@donho How long is the "direct write" preference setting going to remain? Forever?
I turned it on and it has remained on forever for me.
No downside, only advantages.
To get good data on users that it wouldn't work well for, it should have been made to be on by default, and users could turn it off.
The way it was done, nobody notices it to turn it on and try.
Thus we get no feedback about users that it causes a problem for.

@donho
Member

donho commented Nov 28, 2020

@sasumner

How long is the "direct write" preference setting going to remain? Forever?

I did read feedback that in some environments there's a performance issue when "direct write" is ON, so our experience cannot be representative of the whole community. Furthermore, I don't want to sacrifice an existing option (which could concern the performance issue) for a new feature which benefits fewer people.

The workaround for me is to just treat 2060, 2066, 2067, 2068 and 2069 as invisible chars so there's no ambiguity whether or not "direct write" is ON. What do you think?

@sasumner
Contributor

The workaround for me is to just treat 2060, 2066, 2067, 2068 and 2069 as invisible chars so there's no ambiguity whether or not "direct write" is ON

Yes, that can work.
Or, we detect when direct-write is "on" and then don't include those in the list of characters to set a new representation for.

@donho
Member

donho commented Nov 28, 2020

@sasumner

we detect when direct-write is "on" and then don't include those in the list of characters to set a new representation for.

Yes, it could work if Scintilla has a message for that. Notepad++'s "direct write" option in the Parameters class cannot be counted on because it's only applied to the next session.

@sasumner
Contributor

if Scintilla has a message for that.

SCI_GETTECHNOLOGY
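
A sketch of that filtering idea in Python, assuming the value returned by SCI_GETTECHNOLOGY is already in hand (0 is SC_TECHNOLOGY_DEFAULT; non-zero values select a DirectWrite variant). The function name `code_points_to_represent` and the candidate set are hypothetical; the five excluded code points are the ones named in this thread:

```python
# SC_TECHNOLOGY_DEFAULT as documented for SCI_GETTECHNOLOGY.
SC_TECHNOLOGY_DEFAULT = 0

# Code points named in the thread as rendering differently depending on
# the "direct write" setting.
TECHNOLOGY_DEPENDENT = {0x2060, 0x2066, 0x2067, 0x2068, 0x2069}


def code_points_to_represent(candidates, technology):
    """Drop the technology-dependent code points when DirectWrite is active."""
    if technology == SC_TECHNOLOGY_DEFAULT:
        return sorted(candidates)
    return sorted(set(candidates) - TECHNOLOGY_DEPENDENT)


candidates = {0x200B, 0x2060, 0x2066}
print([hex(cp) for cp in code_points_to_represent(candidates, 0)])
print([hex(cp) for cp in code_points_to_represent(candidates, 1)])
```

Querying the live Scintilla value avoids relying on the stored preference, which only takes effect in the next session.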

@sasumner
Contributor

sasumner commented Dec 4, 2020

@sasumner
Contributor

One more that this feature would probably help: https://community.notepad-plus-plus.org/topic/20490/invisible-spaces

@alankilborn
Contributor

@ArkadiuszMichalski I think the "scintilla dependent" label can be removed from this issue. Scintilla already supports everything needed to allow any normally-invisible character to be shown. I suppose it is left to the programs that embed Scintilla to decide how they want to do this; at least up to now Scintilla has not added a default visual component to more than the "low ASCII" range (NULL, SOH, STX, ..., GS, RS, US).

@ArkadiuszMichalski ArkadiuszMichalski removed the scintilla dependent Can't be considered for N++ implementation unless/until Scintilla changes label Aug 9, 2022
@Daksol

Daksol commented Aug 9, 2022

@ArkadiuszMichalski - Good to see some movement on this one. fyi There is another Open Issue #8530 which covers the same ground as #8284 which could perhaps also be updated.

Edit: 25Aug2022 - corrected typo in link

@alankilborn
Contributor

Good to see some movement on this one.

What "movement"??
:-)

@Daksol

Daksol commented Feb 8, 2023 via email
