
Fix: remove a ZERO WIDTH NO-BREAK SPACE in front of an inline literal #332

Merged (1 commit) on Oct 5, 2022

JulienPalard
Contributor

This is literally the smallest PR I've ever done.

It removes a zero width no-break space.

But this character was breaking the inline literal next to it. On this page, ``is_dark_font_color`` should have been interpreted by Sphinx and rendered in red:

Screenshot from 2022-10-05 22-27-06
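Why does an invisible character break the literal? A rough sketch, assuming the first inline markup recognition rule from the reStructuredText specification: a start-string such as `` must begin a text block or be immediately preceded by whitespace, one of the ASCII characters - : / ' " < ( [ {, or similar Unicode punctuation. U+FEFF is a format character (category Cf), so it qualifies as none of these. The helper can_start_inline_markup is hypothetical and checks only that one rule; docutils applies several more.

```python
import unicodedata

# ASCII characters allowed immediately before an inline markup start-string
# (rule 1 of the reST inline markup recognition rules, as assumed here).
ASCII_OPENERS = set("-:/'\"<([{")
# Unicode punctuation categories treated as "similar non-ASCII punctuation".
PUNCT_CATEGORIES = {"Pd", "Po", "Ps", "Pi", "Pf"}


def can_start_inline_markup(text: str, index: int) -> bool:
    """Simplified check of rule 1 only: may text[index] begin inline markup?"""
    if index == 0:
        return True  # start of a text block
    prev = text[index - 1]
    return (
        prev.isspace()
        or prev in ASCII_OPENERS
        or (ord(prev) > 127 and unicodedata.category(prev) in PUNCT_CATEGORIES)
    )


clean = "we import the ``is_dark_font_color``"
broken = "we import the \ufeff``is_dark_font_color``"

print(can_start_inline_markup(clean, clean.index("``")))    # True
print(can_start_inline_markup(broken, broken.index("``")))  # False: U+FEFF precedes ``
print(unicodedata.category("\ufeff"))                       # Cf
```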

The removed character is obviously not rendered in GitHub's "Files changed" interface. Nor in git diff, nor in git show --color-words. Not in your editor, and not in your terminal... The character is a space. And a space with no width!!!

If you really want to see it, git show | cat -A can be helpful; you'll see something like:

-in the ``mainwindow.py`` file we import the M-oM-;M-?``is_dark_font_color``
+in the ``mainwindow.py`` file we import the ``is_dark_font_color``

But the paragraph is way longer than that, so it's a bit hard to spot.
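Since the paragraph is long, a small script can report the exact position instead of eyeballing it. A minimal sketch; find_char is a hypothetical helper, and it reports 1-based columns counted in characters (code points), which need not match an editor's byte or display column.

```python
def find_char(text: str, char: str = "\ufeff"):
    """Yield (line_number, column) for each occurrence of char, both 1-based."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        col = line.find(char)
        while col != -1:
            yield lineno, col + 1
            col = line.find(char, col + 1)


sample = (
    "first line\n"
    "in the ``mainwindow.py`` file we import the \ufeff``is_dark_font_color``\n"
)
print(list(find_char(sample)))  # [(2, 45)]
```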

For the curious: the M-... notation denotes bytes in the range [128; 255]. The first 32 of this range are treated as if they were in the range [0; 31] and displayed using the ^ notation, so \x80 is M-^@. The other ones just have 128 subtracted, so \xa0 is M- followed by a space (yes, a space). The one exception is \xff, shown as M-^?, just as \x7f (DEL) is shown as ^?.
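The notation described above fits in a few lines. A minimal sketch, assuming GNU cat's behaviour for -v (which -A implies); cat_v is a hypothetical name, and it does not special-case the newline byte, which cat -A prints as "$" followed by a line break.

```python
def cat_v(byte: int) -> str:
    """Render one byte roughly the way `cat -v` (and so `cat -A`) shows it."""
    prefix = ""
    if byte >= 128:   # high bit set: print "M-" and strip that bit
        prefix = "M-"
        byte -= 128
    if byte < 32:     # control characters use caret notation: 0 -> ^@, 9 -> ^I
        return prefix + "^" + chr(byte + 64)
    if byte == 127:   # DEL is shown as ^?
        return prefix + "^?"
    return prefix + chr(byte)


line = "import the \ufeff``is".encode("utf-8")
print("".join(cat_v(b) for b in line))  # import the M-oM-;M-?``is
```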

So M-o is \x6f + 128 (\x6f is the value for o in the ASCII table) = \xef. Likewise, M-; is \xbb and M-? is \xbf. That gives us the sequence \xef\xbb\xbf.
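Going the other way, the same arithmetic recovers the raw bytes from the cat -A output. A sketch with a hypothetical helper, assuming the input contains only cat -v escapes and ordinary characters (a literal "M-" or "^" in the original text would be ambiguous).

```python
import re


def parse_meta(s: str) -> bytes:
    """Turn a cat -v style string such as "M-oM-;M-?" back into raw bytes."""
    out = bytearray()
    # Each token: an optional "M-", then either a caret escape "^X" or one plain char.
    for meta, caret, plain in re.findall(r"(M-)?(?:\^(.)|(.))", s, flags=re.DOTALL):
        if caret == "?":
            value = 127                # ^? is DEL
        elif caret:
            value = ord(caret) - 64    # ^@ -> 0, ^A -> 1, ...
        else:
            value = ord(plain)
        out.append(value + (128 if meta else 0))
    return bytes(out)


raw = parse_meta("M-oM-;M-?")
print(raw)                          # b'\xef\xbb\xbf'
print(hex(ord(raw.decode("utf-8"))))  # 0xfeff
```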

Still curious? The file is encoded in UTF-8, so to decode this UTF-8 sequence we need to extract the relevant bits from it. In binary it looks like:

11101111 10111011 10111111

The leading 1110 means "there are 3 bytes for this character" (count the ones: three ones → three bytes; the zero is just a delimiter). The two trailing bytes start with "10", meaning "we're trailing bytes".
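The "count the ones" rule can be written down directly. A small sketch; utf8_sequence_length is a hypothetical name, and because it only inspects the lead byte it does not reject the never-valid lead bytes 0xC0, 0xC1 and 0xF5 to 0xF7 the way a real decoder would.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 sequence starting with `lead` occupies."""
    if lead < 0b1000_0000:  # 0xxxxxxx: ASCII, a single byte
        return 1
    if lead < 0b1100_0000:  # 10xxxxxx: a trailing (continuation) byte
        raise ValueError(f"{lead:#04x} is a continuation byte, not a lead byte")
    # Count the leading 1 bits: 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4.
    count = 0
    mask = 0b1000_0000
    while lead & mask:
        count += 1
        mask >>= 1
    if count > 4:
        raise ValueError(f"{lead:#04x} is not a valid UTF-8 lead byte")
    return count


print(utf8_sequence_length(0xEF))  # 3
```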

If we drop those markers (1110 and 10 in front of the bytes) and keep the remaining bits, we're left with 1111111011111111, which evaluates to 65279, or 0xfeff in hexadecimal. Yes, you recognize it: it's a BOM. Because yes, a BOM is just a ZERO WIDTH NO-BREAK SPACE. Isn't it beautiful?
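The same bit surgery in code, as a sketch that handles only the 3-byte form; decode_three_byte is a hypothetical helper, and Python's built-in UTF-8 decoder is used to cross-check the result.

```python
def decode_three_byte(seq: bytes) -> int:
    """Decode a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) by hand."""
    b0, b1, b2 = seq
    if b0 >> 4 != 0b1110:
        raise ValueError("lead byte must start with 1110")
    if b1 >> 6 != 0b10 or b2 >> 6 != 0b10:
        raise ValueError("trailing bytes must start with 10")
    # Keep the 4 payload bits of the lead byte and 6 from each trailing byte.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


cp = decode_three_byte(b"\xef\xbb\xbf")
print(cp, hex(cp))                                 # 65279 0xfeff
print(b"\xef\xbb\xbf".decode("utf-8") == chr(cp))  # True
```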

Do we really have to do the bit manipulation to discover what this character was? Obviously not; just use Emacs' M-x describe-char on it:

             position: 4646 of 14699 (32%), column: 380
            character:  (displayed as ) (codepoint 65279, #o177377, #xfeff)
              charset: unicode (Unicode (ISO10646))
code point in charset: 0xFEFF
               script: arabic
               syntax: w 	which means: word
             to input: type "C-x 8 RET feff" or "C-x 8 RET ZERO WIDTH NO-BREAK SPACE"
          buffer code: #xEF #xBB #xBF
            file code: #xEF #xBB #xBF (encoded by coding system utf-8-unix)
              display: by this font (glyph code):
    ftcrhb:-GOOG-Noto Naskh Arabic UI-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x5D5)

Character code properties: customize what to show
  name: ZERO WIDTH NO-BREAK SPACE
  old-name: BYTE ORDER MARK
  general-category: Cf (Other, Format)
  decomposition: (65279) ('')
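Without Emacs, Python's standard unicodedata module gives a subset of the same information. A minimal sketch; the describe_char helper and its field names are made up to echo the Emacs output above, and it covers only the code point, name, general category and UTF-8 bytes.

```python
import unicodedata


def describe_char(ch: str) -> dict:
    """A tiny subset of what Emacs' describe-char reports, via unicodedata."""
    return {
        "codepoint": ord(ch),
        "hex": f"{ord(ch):#06x}",
        "octal": f"{ord(ch):#o}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "general-category": unicodedata.category(ch),
        "utf-8": ch.encode("utf-8").hex(" ").upper(),
    }


for key, value in describe_char("\ufeff").items():
    print(f"{key}: {value}")
```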

And this is literally the longest PR description I've written.


@LaurensDeV LaurensDeV left a comment


👍

@aprofeit

aprofeit commented Oct 5, 2022

Is this a one-off, or could there be others? Look how many PR words you had to write to fix this one!

vvilliamperez approved these changes Oct 5, 2022

@ad-m-ss ad-m-ss left a comment


LGTM :shipit:

@JulienPalard
Contributor Author

> Is this a one-off, or could there be others? Look how many PR words you had to write to fix this one!

I discovered it while working on a new test for sphinx-lint, and according to sphinx-lint this is the only one.

For good measure, I also tried git grep $'\xef\xbb\xbf' and didn't see any others.
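Outside a git checkout, roughly the same search can be done in Python. A sketch under assumptions: the *.rst default pattern is an assumption about the docs (change it as needed), and unlike git grep it ignores .gitignore and scans every matching file under the directory.

```python
from pathlib import Path

BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def find_boms(root: Path, pattern: str = "*.rst"):
    """Yield (path, byte_offset) for every UTF-8 BOM found under root."""
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        data = path.read_bytes()
        offset = data.find(BOM)
        while offset != -1:
            yield path, offset
            offset = data.find(BOM, offset + 1)


# Example (the "docs" directory name is hypothetical):
# for path, offset in find_boms(Path("docs")):
#     print(f"{path}: BOM at byte {offset}")
```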

@OJFord

OJFord commented Oct 5, 2022

> The removed character is obviously not rendered in github "files changed" interface.

For whatever idle curiosity it's worth, it is for me, in the GitHub Android app:

Screenshot_20221005-232106~2.png

i.e. obviously not as a source character or anything with width, but it does similarly affect GitHub's own rendering.

@MFogleman

Any idea what introduced the zero width whitespace?

@CAM-Gerlach
Member

CAM-Gerlach commented Oct 5, 2022

> Any idea what introduced the zero width whitespace?

It was introduced in commit 23a4f28 in PR #81. The text was copied from a Google Doc, but I've verified the character wasn't in the Google Doc script text, and there wasn't any weird formatting anywhere near that location. While I'm really not sure what caused this, I believe the most likely scenario involves the non-US keyboard layout I believe the author had, one in which ` is not on the keyboard and must instead be typed via a special key sequence. The character may have been accidentally mistyped when trying to type a ` (since the backticks were added after the text was copied from the Google Doc script).

Also, I couldn't find any other non-ASCII characters used throughout the docs, except for those that were intended, so this was indeed a one-off.
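A check like this (finding every non-ASCII character, then judging whether each one is intended) can be scripted so that nothing invisible slips through. A hedged sketch: non_ascii_report is hypothetical, it assumes the files are UTF-8, and it only lists the characters; deciding which ones are intended is still a manual step.

```python
import unicodedata
from collections import Counter
from pathlib import Path


def non_ascii_report(paths):
    """Count every non-ASCII character in the given UTF-8 text files."""
    counts = Counter()
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        counts.update(ch for ch in text if ord(ch) > 127)
    # Label each character with its code point and Unicode name.
    return {
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}": n
        for ch, n in counts.most_common()
    }


# Example (the glob is hypothetical):
# for label, n in non_ascii_report(Path("docs").rglob("*.rst")).items():
#     print(n, label)
```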

@CAM-Gerlach CAM-Gerlach changed the title from "Fix: remove a ZERO WIDTH NO-BREAK SPACE in front of an inline literal." to "Fix: remove a ZERO WIDTH NO-BREAK SPACE in front of an inline literal" Oct 5, 2022

@CAM-Gerlach CAM-Gerlach left a comment


This...this is a thing of beauty.

LGTM!

@CAM-Gerlach CAM-Gerlach merged commit ee6f392 into spyder-ide:master Oct 5, 2022
@krmbzds

krmbzds commented Oct 5, 2022

PSA for Mac Users

Option (⌥) + Space = non-breaking space

Mitigation

  1. Install Karabiner
  2. Import Disable alt+spacebar (nonbreaking space) rule

Long-term Solution

  • Switch to Linux
  • Use a keyboard that has a standard keyboard layout

P.S.

Excuse me for dropping this here. Any soul we can save might potentially benefit humanity (or prevent some catastrophe) in the future.

@JulienPalard
Contributor Author

Beware: a non-breaking space is not a zero width no-break space.

(And the non-breaking space is useful, in French at least, because we put one before ?, ! and so on: we want them to be spaced, but not to cause a line break. It's ugly to have the ? at the start of a line, far away from the last word of the question.)
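To make the distinction concrete, Python's unicodedata shows the two characters differ in more than width: the no-break space (U+00A0) is a space separator (category Zs), while U+FEFF is an invisible format character (Cf) that Python does not even treat as whitespace. U+202F NARROW NO-BREAK SPACE is added for comparison, since French typesetting often uses a narrow no-break space before ? and !. The space_info helper is hypothetical.

```python
import unicodedata


def space_info(ch: str):
    """Return (code point, name, category, isspace, UTF-8 bytes) for ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        ch.isspace(),
        ch.encode("utf-8").hex(" "),
    )


for ch in ("\u00a0", "\u202f", "\ufeff"):
    print(*space_info(ch))
# U+00A0 NO-BREAK SPACE Zs True c2 a0
# U+202F NARROW NO-BREAK SPACE Zs True e2 80 af
# U+FEFF ZERO WIDTH NO-BREAK SPACE Cf False ef bb bf
```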

@krmbzds

krmbzds commented Oct 6, 2022

> Beware, non-breaking space is not zero width non-breaking space.
>
> (And non-breaking space is useful, in French at least, because we put them before ?, ! and so on: we want them to be spaced, but not cause a newline, it's ugly to have the ? at the start of a line, far away from the last word of the question.)

@JulienPalard Thanks! I did not know what non-breaking space was useful for (and I initially confused ZWNBSP with NBSP).

@JulienPalard JulienPalard deleted the mdk-typo branch October 6, 2022 11:31
@Disembaudio

Hopefully this works here, but I have this little invisible guy. >>   <<

@CAM-Gerlach
Member

It's not quite invisible...especially in a monospace font :)

@Disembaudio

> It's not quite invisible...especially in a monospace font :)

It's imperceptible. Ha. I think it's part of a flag emoji. 🇪🇲 All I know is a few of these will render my Gboard invisible – [ ٹ  ̣   ̴̴ ]
