Fix: remove a ZERO WIDTH NO-BREAK SPACE in front of an inline literal #332
👍
Is this a one-off, or could there be others? Look how many PR words you had to write to fix this one!
LGTM
I discovered it while working on a new test for sphinx-lint, and according to sphinx-lint this is the only one. Just for good measure, I just tried
Any idea what introduced the zero width whitespace?
It was commit 23a4f28 in PR #81; it was copied from a Google Doc, but I've verified it wasn't in the Google Doc script text, and there wasn't any weird formatting anywhere near that location. While I'm really not sure what caused this, I believe the most likely cause is the non-US keyboard layout that I believe the author had, one in which
Also, I couldn't find any other non-ASCII characters used throughout the docs, except for those that were intended, so this was indeed a one-off.
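For anyone who wants to repeat that kind of check, here is a minimal sketch in Python. It is not the command that was run in this thread. The helper name `find_non_ascii` and the sample string are made up for illustration.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, code point, name) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits


# Hypothetical sample: an invisible U+FEFF sits right before the inline literal.
sample = "plain line\nthe \ufeff``is_dark_font_color`` literal\n"
for hit in find_non_ascii(sample):
    print(hit)
# prints: (2, 5, 'U+FEFF', 'ZERO WIDTH NO-BREAK SPACE')
```

Intended non-ASCII characters would show up too, so the output still needs a human eye.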
This...this is a thing of beauty.
LGTM!
PSA for Mac Users
⌥ + space =
Mitigation
Long-term Solution
P.S.
Excuse me for dropping this here. Any soul we can save might potentially benefit humanity (or prevent some catastrophe) in the future.
Beware, non-breaking space is not zero width non-breaking space. (And non-breaking space is useful, in French at least, because we put them before
@JulienPalard Thanks! I did not know what non-breaking space was useful for (and I initially confused ZWNBSP with NBSP).
Hopefully this works here, but I have this little invisible guy. >> <<
It's not quite invisible...especially in a monospace font :)
It's imperceptible. Ha. I think it's part of a flag emoji. 🇪🇲 All I know is a few of these will render my Gboard invisible – [ ٹ ̣ ̴̴ ]
This is literally the smallest PR I've ever done.
It removes a zero width no-break space.
But this character was breaking the inline literal next to it: in this page, the ``is_dark_font_color`` should have been interpreted by Sphinx and rendered in red.
The removed character is obviously not rendered in GitHub's "files changed" interface. Not in `git diff`, and not in `git show --color-words` either. Not in your editor, and not in your terminal, ... The character is a space. And a space with no width!!!
If you really want to see it, a `git show | cat -A` can be helpful, you'll see something like:
But the paragraph is way longer than that so it's a bit hard to spot.
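As a rough illustration of what that output looks like around the offending bytes, here is a small Python sketch that mimics the way `cat -A` displays non-printing bytes. The function name `cat_v` is made up for this example, and it is a simplification: it only handles a single line fragment and ignores the `$` that `cat -A` appends at each line end.

```python
def cat_v(data: bytes) -> str:
    """Approximate cat's -v style display of non-printing bytes (single line only)."""
    out = []
    for b in data:
        prefix = ""
        if b >= 128:
            # High bytes get an "M-" prefix and are then shown as (byte - 128).
            prefix = "M-"
            b -= 128
        if b < 32:
            # Control range: caret notation, e.g. 0 -> ^@, 9 -> ^I.
            out.append(prefix + "^" + chr(b + 64))
        elif b == 127:
            out.append(prefix + "^?")
        else:
            out.append(prefix + chr(b))
    return "".join(out)


line = "\ufeff``is_dark_font_color``".encode("utf-8")
print(cat_v(line))
# prints: M-oM-;M-?``is_dark_font_color``
```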
For the curious, the `M-...` notation denotes bytes in the range `[128;255]`. The first 32 of this range are then treated as if they were in the range `[0;32]` and displayed using the `^` notation, so `\x80` is `M-^@`, and the other ones just have 128 subtracted, so `\xa0` is `M- ` (yes, a space).
So `M-o` is `\x6f + 128` (`\x6f` is the value for `o` in the ASCII table) = `\xef`. `M-;` is `\xbb` and `M-?` is `\xbf`. This gives us the sequence `\xef\xbb\xbf`.
Still curious? The file is encoded using UTF-8, so to decode this UTF-8 sequence we need to extract the relevant bits from it. In binary it looks like:
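A small Python sketch that prints those three bytes in binary (`to_binary` is a helper name made up for this example):

```python
def to_binary(data: bytes) -> str:
    """Render each byte as an 8-bit binary string, separated by spaces."""
    return " ".join(f"{b:08b}" for b in data)


print(to_binary(b"\xef\xbb\xbf"))
# prints: 11101111 10111011 10111111
```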
The leading `1110` means "There are 3 bytes for this char" (count the ones: three ones → three bytes; the zero is just a delimiter). The two trailing bytes start with `10`, meaning "we're trailing bytes".
If we drop those markers (`1110` and `10` in front of bytes) and keep the remaining bits we're left with `1111111011111111`, which evaluates to 65279, which in hexadecimal is `0xfeff`. Yes, you recognize it, it's a BOM. Because yes, a BOM is just a `ZERO WIDTH NO-BREAK SPACE`, isn't it beautiful?
Do we really have to do the bit manipulation to discover what this character was? Obviously not, just use emacs' `M-x describe-char` on it.
And this is literally the longest PR description I've written.
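For readers without emacs at hand, the bit manipulation described above can be sketched in Python and cross-checked against the standard library. The function name `decode_three_byte_utf8` is made up for this example, and it deliberately handles only the three-byte case discussed here.

```python
import unicodedata


def decode_three_byte_utf8(data: bytes) -> int:
    """Decode one three-byte UTF-8 sequence by hand and return the code point."""
    b0, b1, b2 = data
    if b0 >> 4 != 0b1110 or b1 >> 6 != 0b10 or b2 >> 6 != 0b10:
        raise ValueError("not a three-byte UTF-8 sequence")
    # Drop the 1110 / 10 markers and concatenate the 4 + 6 + 6 payload bits.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


code_point = decode_three_byte_utf8(b"\xef\xbb\xbf")
print(code_point, hex(code_point), unicodedata.name(chr(code_point)))
# prints: 65279 0xfeff ZERO WIDTH NO-BREAK SPACE

# The built-in decoder agrees with the manual result.
print(b"\xef\xbb\xbf".decode("utf-8") == chr(code_point))
# prints: True
```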