This follows the word of the Unicode standard:
http://unicode.org/faq/unsup_char.html
"""
Q: Which characters should be displayed as a visible but blank space?
A: This is the easy one: all the characters that have the White_Space
property, also generically known as “whitespace characters”. This
set includes SPACE, of course, but also such characters as the tab
control character, NO-BREAK SPACE, LINE SEPARATOR, and so on. For
the full list, see the White_Space values in PropList.txt.
"""
However, I'm not sure we want to do it this way. Note that
White_Space, as of Unicode 7.0, includes:
$ grep '; White_Space' PropList.txt
0009..000D ; White_Space # Cc [5] <control-0009>..<control-000D>
0020 ; White_Space # Zs SPACE
0085 ; White_Space # Cc <control-0085>
00A0 ; White_Space # Zs NO-BREAK SPACE
1680 ; White_Space # Zs OGHAM SPACE MARK
2000..200A ; White_Space # Zs [11] EN QUAD..HAIR SPACE
2028 ; White_Space # Zl LINE SEPARATOR
2029 ; White_Space # Zp PARAGRAPH SEPARATOR
202F ; White_Space # Zs NARROW NO-BREAK SPACE
205F ; White_Space # Zs MEDIUM MATHEMATICAL SPACE
3000 ; White_Space # Zs IDEOGRAPHIC SPACE
That's in fact all of GC=Zs/Zp/Zl plus U+0009..000D and U+0085.
Of those, all the GC=Zs ones already have a compatibility
decomposition to space, so they were getting this treatment
anyway, with the benefit that clients could override that
fallback by overriding the decompose_compatibility() function,
and in fact LibreOffice already does that. If we commit this
change, clients wouldn't be able to override that anymore.
So this change is essentially about ASCII control chars 9..D and
U+0085 NEL as well as U+2028/U+2029 LINE/PARAGRAPH SEPARATOR.
Perhaps I should limit this change to just those?
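The set arithmetic above can be double-checked with a short script. This
is only a sketch, not part of the change: the ranges are copied by hand
from the grep output, and the General_Category values come from Python's
unicodedata module, which may track a newer Unicode version than 7.0, so
the result depends on the interpreter. The names WHITE_SPACE_RANGES,
same_set and not_zs are made up for this illustration.

```python
import unicodedata

# White_Space ranges copied from the PropList.txt listing (inclusive).
WHITE_SPACE_RANGES = [
    (0x0009, 0x000D), (0x0020, 0x0020), (0x0085, 0x0085), (0x00A0, 0x00A0),
    (0x1680, 0x1680), (0x2000, 0x200A), (0x2028, 0x2029), (0x202F, 0x202F),
    (0x205F, 0x205F), (0x3000, 0x3000),
]

white_space = {cp for lo, hi in WHITE_SPACE_RANGES for cp in range(lo, hi + 1)}


def category(cp):
    """General_Category of a code point, per this Python's Unicode data."""
    return unicodedata.category(chr(cp))


# Everything with GC=Zs/Zl/Zp, plus the control characters named above.
separators = {cp for cp in range(0x110000) if category(cp) in ("Zs", "Zl", "Zp")}
controls = set(range(0x0009, 0x000E)) | {0x0085}

# True if White_Space is exactly GC=Zs/Zl/Zp plus U+0009..000D and U+0085.
same_set = white_space == (separators | controls)

# The White_Space characters that are not GC=Zs, i.e. the ones this
# change is essentially about.
not_zs = sorted(cp for cp in white_space if category(cp) != "Zs")

print(same_set)
print(" ".join(f"U+{cp:04X}" for cp in not_zs))
```

The second line of output lists the candidates for a narrower change.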
My personal feeling is that those characters are better either
always rendered as space or never rendered as space. Relying on
whether the font supports them is only one particular reading of
the Unicode standard. Unicode says show space if "the rendering
system doesn't fully support them". We can also read this as "if
the client did indeed pass them to HarfBuzz". I think I like that
reading for the newline-like characters.