Noto Sans CJK or Source Han Sans cannot be opened properly #1534
Comments
|
@davelab6, can you send me a UFO of this typeface? |
|
No, since fontforge won't open it I can't make a ufo |
|
FontLab can't make a UFO? |
|
We've got #1581 about lifting TTF restrictions for this font (it violates the spec, I think that there was similar discussion with unifont in that other issue). What do other font editors do with it? |
|
I feel #1581 might not be the root cause here. The language-specific subsets of Noto Sans CJK are much smaller (around half the size) than the general Noto Sans CJK fonts, so they can't all contain exactly 65,535 glyphs. Yet they all have this problem. |
|
@ahyangyi, I agree. I think that there's some confusion regarding the CID maps. My understanding of these concepts is rather limited, but I'm taking a look. |
|
@davelab6, I bump my previous inquiry about a UFO from FontLab. I am able to build a UFO of Source Han Sans TWHK (from OpenType) in Glyphs, but it is missing fontinfo.plist for some reason, so it is difficult to test in FontForge. |
|
Argh, is this still the case? I am going through the issue list to see whether the mis-re-encoding (#3080) has been seen by somebody else. This looks like it. |
|
It would be nice if... What's happening is that the Source CJK / Noto CJK fonts have about 400 glyphs with two or more code points each; about a dozen have three. All but one code point per glyph is silently dropped by FontForge, so you end up with about 400 code points missing after re-encoding. The worst part of FontForge's behaviour is that it treats the last code point it sees as authoritative, and that is often in the CJK Compatibility range rather than the lower CJK Unified region. So the lower and more often used CJK Unified range ends up with about 400 glyphs missing. |
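To make the failure mode concrete, here is a minimal pure-Python sketch of a "last code point wins" re-encoding. The `cmap` dictionary, the glyph IDs, and the function name `reencode_last_wins` are all invented for illustration; this models the behaviour described above and is not FontForge's actual code.

```python
# Toy model: a cmap maps Unicode code points to glyph IDs.
# All values are made up; glyph 100 is shared by two code points.
cmap = {
    0x4E00: 10,   # one code point, one glyph
    0x8C48: 100,  # code point in the CJK Unified range
    0xF900: 100,  # code point in the CJK Compatibility range, same glyph
}

def reencode_last_wins(cmap):
    """Keep one code point per glyph: the last one seen in ascending order."""
    glyph_to_cp = {}
    for cp in sorted(cmap):
        glyph_to_cp[cmap[cp]] = cp  # a later code point overwrites an earlier one
    return {cp: gid for gid, cp in glyph_to_cp.items()}

new_cmap = reencode_last_wins(cmap)
missing = sorted(set(cmap) - set(new_cmap))
print("code points that lost their glyph:", [f"U+{cp:04X}" for cp in missing])
```

In this toy font the lower, more commonly used code point is the one that disappears, which matches the symptom of blanks in the CJK Unified range.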
|
My freetype-py script to fix fontforge's encoding problem is up at https://github.com/HinTak/freetype-py/blob/fontval-diag/examples/subfonts-script-generate.py |
|
I followed the link and installed freetype-py, but it still didn't fix the problem: I still see blanks for all glyphs after opening the font file in FontForge. Here is the installation output, which was successful. Should the problem be fixed once freetype-py is installed? ip@LINUXMINT182 ~ $ git clone https://github.com/rougier/freetype-py.git Installed /usr/local/lib/python2.7/dist-packages/freetype_py-1.2-py2.7.egg |
|
No - merely installing freetype-py does not fix the issue. My Python script (which depends on both FontForge's Python extension and freetype-py) writes a new, corrected version of the CJK Sans font when run. You will also need to adapt the output format of the corrected font slightly. I needed subsetted Type 1, but you probably want CFF. |
|
Thanks for responding.
Could you please provide step-by-step instructions for fixing this font with your script, so that we are not left guessing?
Really appreciate it.
Thanks,
Ip Smile
From: HinTak
Sent: Friday, October 20, 2017 3:46 PM
To: fontforge/fontforge
Cc: ipsmile ; Comment
Subject: Re: [fontforge/fontforge] Noto Sans CJK or Source Han Sans cannot be opened properly (#1534)
|
|
SourceHanSans-Regular-fixed.zip I split the 8 files into 3 groups to work around GitHub's 25MB upload limit. SourceHanSansCN-Regular.otf These are enhanced versions of the originals that work around FontForge's inability to cope with glyphs having multiple code points, by duplicating the glyphs to separate the code points. Hence they are larger than the originals; along the way the glyphs also gain names. |
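The duplication idea can be sketched in plain Python on a toy cmap. This is a hypothetical illustration only: `split_shared_glyphs`, the code points, and the glyph IDs are invented here, and the real cjk-multi-fix.py works on actual font files through FontForge and freetype-py rather than on dictionaries.

```python
def split_shared_glyphs(cmap, glyph_count):
    """Give every extra code point its own copy of a shared glyph.

    cmap maps code point -> glyph ID; glyph_count is the number of glyphs
    in the toy font. Returns (new_cmap, copies), where copies maps each
    newly created glyph ID to the glyph ID it was copied from.
    """
    new_cmap, copies, seen = {}, {}, set()
    next_gid = glyph_count
    for cp in sorted(cmap):
        gid = cmap[cp]
        if gid in seen:
            # This glyph already serves a lower code point: allocate a copy.
            copies[next_gid] = gid
            new_cmap[cp] = next_gid
            next_gid += 1
        else:
            seen.add(gid)
            new_cmap[cp] = gid
    return new_cmap, copies

# Invented data: glyph 5 is shared by two code points.
cmap = {0x002D: 5, 0x4E00: 5, 0x4E8C: 6}
new_cmap, copies = split_shared_glyphs(cmap, glyph_count=7)
print(new_cmap, copies)
```

Every code point keeps a glyph, at the cost of one extra glyph per shared code point, which is consistent with the fixed files being larger than the originals.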
|
cjk-multi-fix-all-8.sh.txt The extra .txt at the end is just to fool GitHub's upload (which only allows files of certain types). cjk-multi-fix.py is a tidied-up-for-this-purpose version of the script I mentioned above ( #1534 (comment) ). cjk-multi-fix-all-8.sh is a simple 8-line shell script containing the exact 8 command lines for converting the Fedora-shipped Source Han Sans fonts from their system location to a new version of the fonts in the current directory. |
|
HinTak, thanks a great deal. The revised fonts you provided are all visible in Fontforge now. As a user of Fontforge, I would still like to see that Fontforge can handle this problem gracefully. I also encountered similar problems in opening other otf fonts. |
|
The script I wrote is generic and should work on most fonts...
|
|
@HinTak I'm trying to use cjk-multi-fix.py on NotoSans CJK but run into the following error:
|
|
Which font did you try it on? Oh, you did say Noto Sans CJK. There are a few variants of them (super TTC, language-specific ones, locale-specific ones, etc.). I'd like to know which one you tried.
My script won't split TTCs, BTW. As shown above, I was using it on the language-split variants.
|
|
@HinTak thanks for the quick reply! I'm trying to use NotoSansCJKsc-Medium.otf and NotoSansCJKtc-Medium.otf separately. |
|
I have the same problem as @thehen for the NotoSans CJKjp Regular font. I think it's because the CJK Noto fonts just have too many glyphs for a flattened font. |
|
These two were generated on Ubuntu 19.04 with something similar to: (unlike Redhat Fedora, Ubuntu does not ship freetype-py... hence the PYTHONPATH...) |
|
many thanks ! |
It seems this version is not rendered correctly on some OSes. Screenshots taken from Win11 zh_CN. |
|
@ttimasdf those error messages don't look like they're from FontForge -- at least I can't find them with grep in the source base. What generated them? |
@skef these messages are from @HinTak's script, and the comment I made above is about the weird glyphs in the modified version of Noto CJK SC from #1534 (comment). My problem was solved by using the script posted in #1534 (comment) to convert the fonts myself, and changing the FontForge output format to TTF. |
|
I'm curious about this one as well. I can open NotoSansTC-Bold.otf in Fontforge then "CID > Flatten" and "Generate Fonts" as a TTF which seems to work although normal spaces get shown as tofu... and if I copy text and paste it elsewhere it's mostly gibberish... even if I go "Encoding > Reencode > Unicode Full". I'm just looking to get a TTF that can display Traditional Chinese and English. It seems so close and yet so far. |
|
Interestingly I used FontForge to convert the WenQuanYi font (https://packages.ubuntu.com/bionic/fonts-wqy-zenhei) from TTC to TTF and that worked all right for the particular strings I needed to print. No "tofu" space like with NotoSansTC-Bold... |
|
Well, the fonts shipped by the OSes (whichever it is) are quite adequate for general use. This issue is about a bug / design flaw in FontForge which can only be seen with some large fonts. In some large fonts (Noto in particular), one glyph shape can carry multiple Unicode values. It is easier to explain if you know Chinese, but let's make up a trivial example: a horizontal bar mid-way up, i.e. "-". This is just the dash/hyphen in English. But the font designer can also reuse the shape (if it is suitably wide) as the Chinese character for "1". And the Chinese character for "2" is two horizontal bars, so the font designer can decide to reuse that shape for the equals sign "=". While this is rare, it is useful for saving space when you have a lot of glyph shapes. FontForge cannot cope: when flattening, it assigns the higher / more rarely used Unicode value to the shape whenever such a sharing decision has been made. My script de-duplicates by assigning the lower / more common value to a glyph when flattening during conversion. You lose the use of the higher Unicode values, but they tend to be rarer, so it doesn't matter in most cases. WQY likely doesn't make this space-saving decision of giving identical shapes different meanings. |
|
The message from the script tells you that some glyph shapes (a few hundred out of 655xx) are de-associated from the higher Unicode range, 0x02914D to 0x02F9F4 in the example above. This is one of the higher CJK extension ranges for rare/historical Chinese. So unless you are writing a scholarly article on ancient Chinese text electronically, you probably won't miss those. |
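A "lowest code point wins" policy, together with a report of what gets de-associated, might look like the sketch below. It is a simplified model on a dictionary: the function name `keep_lowest`, the glyph IDs, and the pairing of code points are assumptions made up for the example, not taken from the actual script or from a real font.

```python
def keep_lowest(cmap):
    """Keep only the lowest code point for each glyph ID.

    Returns (new_cmap, dropped), where dropped is the sorted list of
    code points that no longer map to any glyph.
    """
    lowest = {}
    for cp, gid in cmap.items():
        if gid not in lowest or cp < lowest[gid]:
            lowest[gid] = cp
    new_cmap = {cp: gid for gid, cp in lowest.items()}
    dropped = sorted(cp for cp in cmap if cp not in new_cmap)
    return new_cmap, dropped

# Invented data: glyphs 100 and 200 are each shared by two code points.
cmap = {0x4E00: 10, 0x8C48: 100, 0xF900: 100, 0x5D33: 200, 0x2F9F4: 200}
new_cmap, dropped = keep_lowest(cmap)
if dropped:
    print(f"de-associated {len(dropped)} code points, "
          f"U+{dropped[0]:04X} to U+{dropped[-1]:04X}")
```

The trade-off is the one described above: the common, lower code points keep their glyphs and only the rarer, higher ones are lost.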
|
@HinTak Thanks, that's a useful explanation of the problem! I had always assumed that the Noto CJK issues relate to ... I honestly don't know how useful it is to be able to edit Noto CJK in FontForge because it's such a complex font. Why not use the AFDKO toolchain, which is how it's intended to be built? What am I missing here? |
|
Like normally when I want to contribute a change to an open source font, if the font was made in free software, I use whatever software the font was built in. For Noto CJK that's AFDKO. |
|
In my case I need to:
|
|
What changes need to be made to arbitrary fonts? And what do you mean by an SVG font: SVG-in-OT, or the deprecated SVG font format? |
|
I was first notified of this flawed interaction between fontforge and NotoCJK* in a TeX related mailing list I subscribe to. For general usage (with Linux/fontconfig etc) of course they work fine.
|
|
Basically, what I do is allow users to upload custom fonts, which I then convert to an SVG font using FontForge with Python (open the font, flatten it, do a couple of other operations, then export as "svg"), then parse that SVG and convert it to commands that JavaScript uses to draw the glyphs onto a canvas element. It's not really an ideal flow, but it predates webfonts, and at this point changing to webfonts would be a pretty massive investment, especially without breaking existing user content. And there are some advantages to it: for example, the text is rendered exactly the same across different browsers and OSes, and you can do interesting transformations (think word art) that aren't really possible using the canvas text drawing primitives. My use case is probably pretty unusual, but I figured I'd share it. |
|
Thanks for the explanation, @HinTak . I know Chinese reasonably well but I don't think I understand fonts and FontForge well enough to fully comprehend the concept of glyph shapes taking multiple glyph IDs. If I open NotoSansTC-Bold.otf (downloaded via Google web site rather than through a Linux package manager), go to "CID", and choose NotoSansTC-Bold-Proportional, I can see a hyphen... but yeah no real connection to Unicode it seems. Thanks for the high level explanation. Something for me to look into further... |
|
@minusdavid you mis-worded it a bit. "Glyph ID" is a technical term in the OpenType specification and you cannot hand-wave it... One shape is indeed one glyph ID (to be pedantic, some shapes may not [yet] have glyph IDs, and are somewhat dead/unused/work-in-progress within a font file). The problem is that the Noto fonts occasionally have one visual shape (one glyph ID) corresponding to multiple logical meanings (multiple Unicode points). A glyph ID is an index to visual shapes/drawings; a Unicode value is an index to logical/textual meanings. |
|
@HinTak I understand Unicode code points although I don't know fonts and FontForge well enough to see where that mapping happens. But if I understand you correctly, it sounds like when you have 1 glyph mapping to multiple Unicode code points, FontForge will flatten the font so that the glyph just maps to the highest Unicode code point and leaves the lower Unicode code points undefined? So then when I use FontForge to re-encode specifically as Unicode the table is incomplete at the lower end? I suppose that could make sense with a hyphen but I don't think it would explain why the space is missing? Except I suppose multiple "subfonts?" might have spaces and FontForge gets confused and just doesn't define a space at all as a result? In the end, I manually added a space and hyphen to round out the ASCII range, so it wasn't a big drama, but I am interested in understanding what's happening overall haha. |
|
For instance, if I open "NotoSansTC-Bold.otf" with FontForge and look at, say, the letter "A" (which is under NotoSansTC-Bold-Alphabetic), it says its Unicode Char is "A", which is right, but it says its Unicode value is "U+ff21" and its Glyph Name is "uniE6C7", whereas the real Unicode code point for "A" is U+0041, so... I'm confused haha. |
|
Maybe it's just too large a font for me to wrap my head around as a beginner. |
|
U+FF21 is full-width 'A'. See https://www.compart.com/en/unicode/U+FF21. The ordinary 'A' (U+0041) acts as its half-width counterpart. Full-width versus half-width is an East Asian typesetting distinction: a full-width character fills one kanji cell, and two half-width characters fit in the same space. In Arabic and perhaps other scripts, for example, there are variants of the "space" character, such as "non-breaking spaces" where it is preferred not to break a line. Perhaps that is why the space shape has multiple Unicode values. |
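For anyone who wants to check which variant a given character is, Python's standard `unicodedata` module reports the name and East Asian width of a code point. The short sketch below compares plain and full-width 'A' plus a few space variants; the particular selection of characters is just an illustrative choice.

```python
import unicodedata

# Plain 'A', full-width 'A', and three different space characters.
samples = ["\u0041", "\uFF21", "\u0020", "\u00A0", "\u3000"]

for ch in samples:
    print(f"U+{ord(ch):04X}",
          unicodedata.name(ch),
          "width class:", unicodedata.east_asian_width(ch))

# NFKC compatibility normalization folds the variants onto the basic forms.
folded_a = unicodedata.normalize("NFKC", "\uFF21")
folded_space = unicodedata.normalize("NFKC", "\u3000")
print(f"U+FF21 folds to U+{ord(folded_a):04X}, "
      f"U+3000 folds to U+{ord(folded_space):04X}")
```

This only inspects Unicode data; it says nothing about which of these code points a particular font actually maps to a glyph.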
|
@HinTak It prints the messages below: |
|
@chianjin that message does not come from my script - maybe one of the dependent components? Anyway, please post the exact place where you got the font file... |
|
@chianjin my script is updated (still at https://github.com/HinTak/freetype-py/blob/fontval-diag/examples/cjk-multi-fix.py ). Interesting issue: the glyph count is off by 2, because FontForge automatically inserts two glyphs, .null and CR, when they are missing. These two were mandatory up to OpenType 1.7, and are no longer with 1.8. So I think what happened is this: newer Source Han CJK fonts no longer have them, i.e. the script used to work for older Source Han CJK (which contains those); Source Han CJK got updated to the latest spec, and FontForge hasn't (or may not want to, for font usage on older systems). |
|
BTW, I am seeing piles and piles of "No glyph with unicode U+0XXXX in font" just from using FontForge to load one of the Source Han fonts (plain, before doing anything else and without any Python scripts). I am afraid it is a FontForge issue: the glyphs are duplicated in the CJK Compatibility Ideographs range, and FontForge cannot cope with "one glyph <-> multiple Unicode points". Glyphs in the "CJK Compatibility Ideographs" range are necessarily of that nature: one shape for multiple Unicode values. Read https://en.wikipedia.org/wiki/CJK_Compatibility_Ideographs
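The relationship between a compatibility ideograph and its unified counterpart can be seen in Unicode's own data, independently of any font. The sketch below uses Python's standard `unicodedata` module on U+F900, picked here simply as the first code point of the CJK Compatibility Ideographs block.

```python
import unicodedata

compat = "\uF900"  # first code point of the CJK Compatibility Ideographs block

# Compatibility ideographs like this one carry a canonical decomposition
# to a single unified ideograph, so normalization replaces them with it.
unified = unicodedata.normalize("NFC", compat)

print(unicodedata.name(compat))
print("decomposition field:", unicodedata.decomposition(compat))
print(f"U+{ord(compat):04X} normalizes to U+{ord(unified):04X}")
```

Since both code points denote the same character, it is natural for a font to point them at one glyph, which is the "one glyph, multiple code points" situation described above.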



Open NotoSansCJK-Regular.otf (https://code.google.com/p/noto/) or SourceHanSans-Regular.otf (http://sourceforge.net/projects/source-han-sans.adobe/files/ in SourceHanSansOTF-1.000.zip).
Only U+FF21..FF5A are displayed with glyphs.