Noto Sans CJK or Source Han Sans cannot be opened properly #1534

Open
moyogo opened this issue Jul 16, 2014 · 52 comments

Comments

@moyogo

moyogo commented Jul 16, 2014

Open NotoSansCJK-Regular.otf (https://code.google.com/p/noto/) or SourceHanSans-Regular.otf (http://sourceforge.net/projects/source-han-sans.adobe/files/ in SourceHanSansOTF-1.000.zip).
Only U+FF21..FF5A are displayed with glyphs.

@davelab6
Member

FontLab doesn't like the TTFs but the OTFs are OK:

screen shot 2014-07-16 at 13 11 33 1

screen shot 2014-07-16 at 13 12 09 1

@frank-trampe
Contributor

@davelab6, can you send me a U. F. O. of this typeface?

@davelab6
Member

No, since fontforge won't open it I can't make a ufo

@frank-trampe
Contributor

FontLab can't make a U. F. O.?

@adrientetar
Member

We've got #1581 about lifting TTF restrictions for this font (it violates the spec, I think that there was similar discussion with unifont in that other issue). What do other font editors do with it?

@ahyangyi
Contributor

I feel #1581 might not be the root cause here. The language-specific subsets of Noto Sans CJK are much smaller (around half the size) than the general Noto Sans CJK fonts, so they can't all contain exactly 65,535 glyphs. Yet they all have this problem.

@frank-trampe
Contributor

@ahyangyi, I agree. I think that there's some confusion regarding the CID maps. My understanding of these concepts is rather limited, but I'm taking a look.

@frank-trampe
Contributor

@davelab6, I bump my previous inquiry about a U. F. O. from FontLab.

I am able to build a U. F. O. of Source Han Sans TWHK (from OpenType) in Glyphs, but it is missing fontinfo.plist for some reason, so it is difficult to test in FontForge.

@davelab6 davelab6 removed their assignment Oct 27, 2014
@HinTak

HinTak commented Jun 7, 2017

Argh, is this still the case? I am going through the issue list to see whether the mis-re-encoding (#3080) has been seen by somebody else. This looks like it.

@HinTak

HinTak commented Jun 9, 2017

It would be nice if MultipleEncodingsToReferences() does the right thing - it currently does not.

What's happening is that the Source CJK fonts / Noto CJK fonts have about 400 glyphs with two or more code points; about a dozen of them have three. For each such glyph, all but one code point is silently dropped by fontforge. So you get about 400 code points missing when re-encoding.

The worst part of fontforge behaviour is that it uses the last one it sees as authoritative - that's often the CJK Compat variant range, rather than the lower CJK Unified region. So the lower and more often-used CJK Unified code range ended up having about 400 glyphs missing.
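
A minimal sketch of the behaviour described above, using a toy code-point-to-glyph table rather than a real font. The table contents and function names are invented for illustration, and "last one seen" is modelled as "highest code point", which is an assumption about the order in which the table is read. FontForge's actual loading code is not reproduced here.

```python
from collections import defaultdict

# Toy table: code point -> glyph name. Hypothetical data, not read from a font.
TOY_CMAP = {
    0x4E00: "glyph_a",
    0x8C48: "glyph_shared",  # CJK Unified Ideograph
    0xF900: "glyph_shared",  # CJK Compatibility Ideograph using the same shape
}


def glyphs_with_multiple_code_points(cmap):
    """Invert the table and keep only glyphs reached from two or more code points."""
    by_glyph = defaultdict(list)
    for code_point, glyph in cmap.items():
        by_glyph[glyph].append(code_point)
    return {g: sorted(cps) for g, cps in by_glyph.items() if len(cps) > 1}


def code_points_lost_if_last_wins(cmap):
    """Model 'the last code point seen wins': every other code point of a
    shared glyph ends up with no glyph at all."""
    lost = []
    for code_points in glyphs_with_multiple_code_points(cmap).values():
        lost.extend(code_points[:-1])  # everything except the highest one
    return sorted(lost)


if __name__ == "__main__":
    for cp in code_points_lost_if_last_wins(TOY_CMAP):
        print(f"U+{cp:04X} would end up without a glyph")
```

With the toy table, the code point left without a glyph is the one in the CJK Unified range (U+8C48), which mirrors the complaint that the more commonly used range is the one that loses out.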

@HinTak

HinTak commented Jun 9, 2017

My freetype-py script to fix fontforge's encoding problem is up at

https://github.com/HinTak/freetype-py/blob/fontval-diag/examples/subfonts-script-generate.py

@ipsmile

ipsmile commented Oct 20, 2017

I followed the link and installed freetype-py, but it still didn't fix the problem. I still see blanks for all glyphs after opening the font file in FontForge.

Here is the installation output, which was successful. Should the problem be fixed once freetype-py is installed?

ip@LINUXMINT182 ~ $ git clone https://github.com/rougier/freetype-py.git
Cloning into 'freetype-py'...
remote: Counting objects: 998, done.
remote: Total 998 (delta 0), reused 0 (delta 0), pack-reused 998
Receiving objects: 100% (998/998), 1.12 MiB | 791.00 KiB/s, done.
Resolving deltas: 100% (620/620), done.
Checking connectivity... done.
ip@LINUXMINT182 ~ $ cd freetype-py
ip@LINUXMINT182 ~/freetype-py $ python setup.py install
running install
running bdist_egg
running egg_info
creating freetype_py.egg-info
writing freetype_py.egg-info/PKG-INFO
writing top-level names to freetype_py.egg-info/top_level.txt
writing dependency_links to freetype_py.egg-info/dependency_links.txt
writing manifest file 'freetype_py.egg-info/SOURCES.txt'
reading manifest file 'freetype_py.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'freetype_py.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/freetype
copying freetype/ft_errors.py -> build/lib.linux-x86_64-2.7/freetype
copying freetype/raw.py -> build/lib.linux-x86_64-2.7/freetype
copying freetype/ft_types.py -> build/lib.linux-x86_64-2.7/freetype
copying freetype/__init__.py -> build/lib.linux-x86_64-2.7/freetype
copying freetype/ft_structs.py -> build/lib.linux-x86_64-2.7/freetype
creating build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_encodings.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_style_flags.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_load_targets.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_face_flags.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_kerning_modes.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_open_modes.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_fstypes.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/tt_platforms.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_glyph_formats.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/tt_name_ids.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_stroker_linejoins.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/tt_adobe_ids.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/tt_apple_ids.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_stroker_linecaps.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_render_modes.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_curve_tags.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_outline_flags.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_pixel_modes.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/tt_ms_ids.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/__init__.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_glyph_bbox_modes.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/tt_ms_langids.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/tt_mac_langids.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/tt_mac_ids.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_stroker_borders.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_lcd_filters.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
copying freetype/ft_enums/ft_load_flags.py -> build/lib.linux-x86_64-2.7/freetype/ft_enums
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/freetype
copying build/lib.linux-x86_64-2.7/freetype/ft_errors.py -> build/bdist.linux-x86_64/egg/freetype
copying build/lib.linux-x86_64-2.7/freetype/raw.py -> build/bdist.linux-x86_64/egg/freetype
creating build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_encodings.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_style_flags.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_load_targets.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_face_flags.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_kerning_modes.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_open_modes.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_fstypes.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/tt_platforms.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_glyph_formats.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/tt_name_ids.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_stroker_linejoins.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/tt_adobe_ids.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/tt_apple_ids.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_stroker_linecaps.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_render_modes.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_curve_tags.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_outline_flags.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_pixel_modes.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/tt_ms_ids.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/__init__.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_glyph_bbox_modes.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/tt_ms_langids.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/tt_mac_langids.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/tt_mac_ids.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_stroker_borders.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_lcd_filters.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_enums/ft_load_flags.py -> build/bdist.linux-x86_64/egg/freetype/ft_enums
copying build/lib.linux-x86_64-2.7/freetype/ft_types.py -> build/bdist.linux-x86_64/egg/freetype
copying build/lib.linux-x86_64-2.7/freetype/__init__.py -> build/bdist.linux-x86_64/egg/freetype
copying build/lib.linux-x86_64-2.7/freetype/ft_structs.py -> build/bdist.linux-x86_64/egg/freetype
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_errors.py to ft_errors.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/raw.py to raw.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_encodings.py to ft_encodings.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_style_flags.py to ft_style_flags.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_load_targets.py to ft_load_targets.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_face_flags.py to ft_face_flags.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_kerning_modes.py to ft_kerning_modes.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_open_modes.py to ft_open_modes.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_fstypes.py to ft_fstypes.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/tt_platforms.py to tt_platforms.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_glyph_formats.py to ft_glyph_formats.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/tt_name_ids.py to tt_name_ids.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_stroker_linejoins.py to ft_stroker_linejoins.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/tt_adobe_ids.py to tt_adobe_ids.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/tt_apple_ids.py to tt_apple_ids.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_stroker_linecaps.py to ft_stroker_linecaps.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_render_modes.py to ft_render_modes.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_curve_tags.py to ft_curve_tags.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_outline_flags.py to ft_outline_flags.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_pixel_modes.py to ft_pixel_modes.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/tt_ms_ids.py to tt_ms_ids.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_glyph_bbox_modes.py to ft_glyph_bbox_modes.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/tt_ms_langids.py to tt_ms_langids.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/tt_mac_langids.py to tt_mac_langids.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/tt_mac_ids.py to tt_mac_ids.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_stroker_borders.py to ft_stroker_borders.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_lcd_filters.py to ft_lcd_filters.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_enums/ft_load_flags.py to ft_load_flags.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_types.py to ft_types.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/freetype/ft_structs.py to ft_structs.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying freetype_py.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying freetype_py.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying freetype_py.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying freetype_py.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating dist
creating 'dist/freetype_py-1.2-py2.7.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing freetype_py-1.2-py2.7.egg
Copying freetype_py-1.2-py2.7.egg to /usr/local/lib/python2.7/dist-packages
Adding freetype-py 1.2 to easy-install.pth file

Installed /usr/local/lib/python2.7/dist-packages/freetype_py-1.2-py2.7.egg
Processing dependencies for freetype-py==1.2
Finished processing dependencies for freetype-py==1.2

@HinTak

HinTak commented Oct 20, 2017

No - merely installing freetype-py does not fix the issue. My Python script (which depends on both fontforge's Python extension and freetype-py) writes a new, corrected version of the CJK Sans font when run. You will also need to adapt slightly the format in which the corrected version is written. I needed subsetted Type 1, but you probably want CFF.

@ipsmile

ipsmile commented Oct 21, 2017 via email

@HinTak

HinTak commented Dec 4, 2017

SourceHanSans-Regular-fixed.zip
SourceHanSerifCJT-Regular-fixed.zip
SourceHanSerifKR-Regular-fixed.zip

I split the 8 files into 3 groups to work around github's 25MB upload limit.

SourceHanSansCN-Regular.otf
SourceHanSansJP-Regular.otf
SourceHanSansKR-Regular.otf
SourceHanSansTW-Regular.otf
SourceHanSerifCN-Regular.otf
SourceHanSerifJP-Regular.otf
SourceHanSerifKR-Regular.otf
SourceHanSerifTW-Regular.otf

These are enhanced versions of the originals that work around fontforge's inability to cope with glyphs having multiple code points, by duplicating the glyphs to separate the code points. Hence they are larger than the originals; along the way the glyphs also gain names.
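
The duplication idea can be sketched on a plain code-point-to-glyph table. This is only an illustration of the approach under stated assumptions: the function name and the `.dupN` naming scheme are hypothetical, and the actual script works on real font data through fontforge and freetype-py rather than on a dictionary.

```python
def split_shared_glyphs(cmap):
    """Return a new code point -> glyph name table in which no glyph is shared.

    The lowest code point keeps the original glyph; each further code point
    that used the same glyph is given its own copy, named '<glyph>.dupN'
    (a made-up naming scheme for this sketch).
    """
    times_seen = {}
    result = {}
    for code_point in sorted(cmap):
        glyph = cmap[code_point]
        n = times_seen.get(glyph, 0)
        result[code_point] = glyph if n == 0 else f"{glyph}.dup{n}"
        times_seen[glyph] = n + 1
    return result


if __name__ == "__main__":
    # Hypothetical input: one shape shared by two code points.
    shared = {0x4E00: "g2", 0x8C48: "g1", 0xF900: "g1"}
    for cp, name in split_shared_glyphs(shared).items():
        print(f"U+{cp:04X} -> {name}")
```

Every duplicate adds one glyph to the font, which is why the resulting files are larger than the originals.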

@HinTak

HinTak commented Dec 4, 2017

cjk-multi-fix-all-8.sh.txt
cjk-multi-fix.py.txt

The extra .txt at the end is just to fool github's upload (which only allows files of certain types).

cjk-multi-fix.py is a tidied-up-for-this-purpose version of the script I mentioned above ( #1534 (comment) ).

cjk-multi-fix-all-8.sh is a simple 8-line shell script: the precise 8 command lines for converting the Fedora-shipped Source Han fonts from their system location into new versions of the fonts in the current directory.

@ipsmile

ipsmile commented Mar 11, 2018

HinTak, thanks a great deal. The revised fonts you provided are all visible in Fontforge now.

As a user of Fontforge, I would still like to see that Fontforge can handle this problem gracefully. I also encountered similar problems in opening other otf fonts.

@HinTak

HinTak commented Mar 12, 2018 via email

@thehen

thehen commented May 16, 2018

@HinTak I'm trying to use cjk-multi-fix.py on NotoSans CJK but run into the following error:

The 'sfnt' format is currently limited to 65535 glyphs, and your font has 65966 of them.

@HinTak

HinTak commented May 16, 2018 via email

@thehen

thehen commented May 16, 2018

@HinTak thanks for the quick reply! I'm trying to use NotoSansCJKsc-Medium.otf and NotoSansCJKtc-Medium.otf separately.

@tmccombs
Contributor

tmccombs commented Oct 24, 2018

I have the same problem as @thehen for the NotoSans CJKjp Regular font.

I think it's because the CJK Noto fonts just have too many glyphs for a flattened font.

@HinTak

HinTak commented Nov 8, 2019

These two were generated on Ubuntu 19.04 with something similar to: (unlike Redhat Fedora, Ubuntu does not ship freetype-py... hence the PYTHONPATH...)

PYTHONPATH=freetype-py/build/lib.linux-x86_64-2.7/ fontforge -script  fontval-diag/examples/cjk-multi-fix.py NotoSansCJKtc-Bold.otf  NotoSansCJKtc-Bold-fixed.otf
PYTHONPATH=freetype-py/build/lib.linux-x86_64-2.7/ fontforge -script  fontval-diag/examples/cjk-multi-fix.py NotoSansCJKsc-Bold.otf  NotoSansCJKsc-Bold-fixed.otf

NotoSansCJKsc-Bold-fixed.zip
NotoSansCJKtc-Bold-fixed.zip

@brafxs

brafxs commented Nov 8, 2019

many thanks !

@ttimasdf

NotoSansCJKjp-Regular.zip

NotoSansCJKsc-Regular.zip

NotoSansCJKtc-Regular.zip

delete count and info from jp:

Duplicated 597 glyphs because some glyph map to multiple code points.
glyph count = 66132
***Unfortunately glyph count > 65535... we'll need to delete 597 glyphs.***
***Deleting 597 glyphs from unicode 0x028468u to 0x02F9F4u.***

delete count from tc:

Duplicated 390 glyphs because some glyph map to multiple code points.
glyph count = 65925
***Unfortunately glyph count > 65535... we'll need to delete 390 glyphs.***
***Deleting 390 glyphs from unicode 0x02914Du to 0x02F9F4u.***

delete count from sc:

Duplicated 431 glyphs because some glyph map to multiple code points.
glyph count = 65966
***Unfortunately glyph count > 65535... we'll need to delete 431 glyphs.***
***Deleting 431 glyphs from unicode 0x028D10u to 0x02F9F4u.***
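
The arithmetic behind these log lines can be checked with a short sketch. The 65535 limit is the one quoted in the FontForge error message earlier in this thread; the function names are hypothetical, and dropping the highest code points first is an assumption based on the ranges the log reports, not a reproduction of the script's selection logic.

```python
SFNT_GLYPH_LIMIT = 65535  # limit quoted in the FontForge error message


def excess_glyphs(glyph_count, limit=SFNT_GLYPH_LIMIT):
    """Number of glyphs that must go before the font fits under the limit."""
    return max(0, glyph_count - limit)


def code_points_to_drop(code_points, glyph_count, limit=SFNT_GLYPH_LIMIT):
    """Pick the highest code points, one per excess glyph (assumed policy)."""
    n = excess_glyphs(glyph_count, limit)
    # Guard against n == 0: a slice of [-0:] would return the whole list.
    return sorted(code_points)[-n:] if n else []


if __name__ == "__main__":
    for label, count in (("jp", 66132), ("tc", 65925), ("sc", 65966)):
        print(label, excess_glyphs(count))
```

For all three fonts the excess equals the number of duplicated glyphs, which suggests each font was sitting at exactly 65535 glyphs before duplication.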

It seems this version is not correctly rendered on some OS. Screenshots taken from Win11 zh_CN.

image

@skef
Contributor

skef commented Nov 27, 2021

@ttimasdf those error messages don't look like they're from FontForge -- at least I can't find them with grep in the source base. What generated them?

@ttimasdf

> @ttimasdf those error messages don't look like they're from FontForge -- at least I can't find them with grep in the source base. What generated them?

@skef these messages are from @HinTak 's script, and the comment I made above is about the weird glyphs in the modified version of Noto CJK SC from #1534 (comment)

My problem was solved by using the script posted in #1534 (comment) to convert the fonts myself, and changing the FontForge output format to TTF.

@minusdavid

minusdavid commented Jul 1, 2022

I'm curious about this one as well.

I can open NotoSansTC-Bold.otf in Fontforge then "CID > Flatten" and "Generate Fonts" as a TTF which seems to work although normal spaces get shown as tofu... and if I copy text and paste it elsewhere it's mostly gibberish... even if I go "Encoding > Reencode > Unicode Full".

I'm just looking to get a TTF that can display Traditional Chinese and English. It seems so close and yet so far.

@minusdavid

Interestingly I used FontForge to convert the WenQuanYi font (https://packages.ubuntu.com/bionic/fonts-wqy-zenhei) from TTC to TTF and that worked all right for the particular strings I needed to print. No "tofu" space like with NotoSansTC-Bold...

@HinTak

HinTak commented Jul 1, 2022

Well, the fonts shipped by the OSes (whichever it is) are quite adequate for general use. This issue is about a bug / design flaw in FontForge which can only be seen with some large fonts. Some large fonts (Noto in particular) can have a single glyph shape assigned to multiple Unicode values. It is easier to explain if you know Chinese, but let's make up a trivial example: a horizontal bar mid-way up, i.e. "-". This is just the dash/hyphen in English. But the font designer can also reuse the shape (if it is suitably wide) as the Chinese character "1". And the Chinese character "2" is two horizontal bars, so the font designer can decide to reuse that shape for the equals sign "=". While this is rare, it is useful for saving space when you have a lot of glyph shapes. FontForge cannot cope, and assigns the higher / more rarely used Unicode value to the shape when flattening, if such a sharing decision has been made.

My script de-duplicates by assigning the lower / more common value to a glyph when flattening during conversion. You lose the use of the higher Unicode values, but they tend to be rarer, so it doesn't matter in most cases.

WQY likely doesn't make this decision of saving space on identical shapes having different meanings.

@HinTak

HinTak commented Jul 1, 2022

The message from the script tells you that some glyph shapes (a few hundred out of 655xx) are de-associated from the higher Unicode range, for example 0x02914Du to 0x02F9F4u in the output above. This is one of the higher CJK extension ranges, for rare/historical Chinese. So unless you are writing a scholarly article on ancient Chinese text electronically, you probably won't miss those.

@ctrlcctrlv
Member

@HinTak Thanks, that's a useful explanation of the problem! I had always assumed that the Noto CJK issues relate to cmap (see also the section in the Community guidelines on CID maps, which is related but not the same).

I honestly don't know how useful it is to be able to edit Noto CJK in FontForge because it's such a complex font. Why not use the AFDKO toolchain which is how it's intended to be built? What am I missing here?

@ctrlcctrlv
Member

Like normally when I want to contribute a change to an open source font, if the font was made in free software, I use whatever software the font was built in. For Noto CJK that's AFDKO.

@tmccombs
Contributor

tmccombs commented Jul 2, 2022

In my case I need to:

  1. Convert the font to an svg font, which afaik you can't do with afdko
  2. Allow users to process arbitrary fonts, and Noto CJK, currently fails because of this.

@ctrlcctrlv
Member

What changes need to be made to arbitrary fonts?

and

What do you mean by an SVG font? SVG-in-OT, or the deprecated SVG font format?

@HinTak

HinTak commented Jul 2, 2022

I was first notified of this flawed interaction between fontforge and NotoCJK* in a TeX related mailing list I subscribe to. For general usage (with Linux/fontconfig etc) of course they work fine.

  • Some special usages, like TeX and @tmccombs's case (which tools do you use to convert to svg outlines?), need fonts in certain specific font formats.
  • Some older font formats have internal limitations around 65536 shapes. That's large for most westerners, but not for the Chinese who deal with historical and rare writings.

@tmccombs
Contributor

tmccombs commented Jul 2, 2022

Basically what I do is allow users to upload custom fonts, which I then convert to an svg font using FontForge with Python (open the font, flatten it, do a couple of other operations, then export as "svg"). I then parse that svg and convert it to commands that JavaScript uses to draw the glyphs onto a canvas element. It's not really an ideal flow, but it predates webfonts, and at this point changing to use webfonts would be a pretty massive investment, especially without breaking existing user content. And there are some advantages to it: for example, the text is rendered exactly the same across different browsers and OSes, and you can do interesting transformations (think word art) that aren't really possible using the canvas text drawing primitives.
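
A rough sketch of the open / flatten / export steps in that flow, assuming FontForge's Python extension is installed. The helper names are hypothetical, the "couple of other operations" are not shown, and the calls used (`fontforge.open`, `font.is_cid`, `font.cidFlatten()`, `font.generate()`) are written from my understanding of the FontForge Python API, so check them against the documentation for your installed version.

```python
import os


def svg_output_path(font_path):
    """Derive the output file name: same location, '.svg' extension."""
    base, _ext = os.path.splitext(font_path)
    return base + ".svg"


def convert_to_svg_font(font_path):
    """Open a font, flatten it if it is CID-keyed, and export an SVG font."""
    # Imported lazily: the module only exists where FontForge's Python
    # extension is installed (e.g. when run via `fontforge -script`).
    import fontforge

    font = fontforge.open(font_path)
    try:
        if font.is_cid:
            font.cidFlatten()
        out_path = svg_output_path(font_path)
        font.generate(out_path)  # output format is inferred from the extension
        return out_path
    finally:
        font.close()
```

For the Noto CJK fonts discussed in this issue, the flatten step is where the problems reported above would be expected to surface.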

My use case is probably pretty unusual, but I figured I'd share it.

@minusdavid

Thanks for the explanation, @HinTak .

I know Chinese reasonably well but I don't think I understand fonts and FontForge well enough to fully comprehend the concept of glyph shapes taking multiple glyph IDs.

If I open NotoSansTC-Bold.otf (downloaded via Google web site rather than through a Linux package manager), go to "CID", and choose NotoSansTC-Bold-Proportional, I can see a hyphen... but yeah no real connection to Unicode it seems.

Thanks for the high level explanation. Something for me to look into further...

@HinTak

HinTak commented Jul 4, 2022

@minusdavid you mis-worded it a bit. "Glyph id" is a technical term in the OpenType specification and you cannot hand-wave it... One shape is indeed one glyph id (to be pedantic, some shapes may not [yet] have glyph ids, and are somewhat dead/unused/work-in-progress within a font file). The problem is that the Noto fonts occasionally have one visual shape (one glyph id) corresponding to multiple logical meanings (multiple Unicode code points). A glyph id is an index into visual shapes/drawings; a Unicode value is an index into logical / textual meanings.

@minusdavid

@HinTak I understand Unicode code points although I don't know fonts and FontForge well enough to see where that mapping happens.

But if I understand you correctly, it sounds like when you have 1 glyph mapping to multiple Unicode code points, FontForge will flatten the font so that the glyph just maps to the highest Unicode code point and leaves the lower Unicode code points undefined? So then when I use FontForge to re-encode specifically as Unicode the table is incomplete at the lower end?

I suppose that could make sense with a hyphen but I don't think it would explain why the space is missing? Except I suppose multiple "subfonts?" might have spaces and FontForge gets confused and just doesn't define a space at all as a result?

In the end, I manually added a space and hyphen to round out the ASCII range, so it wasn't a big drama, but I am interested in understanding what's happening overall haha.

@minusdavid

For instance, if I open "NotoSansTC-Bold.otf" with FontForge and look at, say, the letter "A" (which is under NotoSansTC-Bold-Alphabetic), it says its Unicode Char is "A", which is right, but it says its Unicode value is "U+ff21" and its Glyph Name is "uniE6C7", whereas the real Unicode code point for "A" is U+0041, so... I'm confused haha.

@minusdavid

Maybe it's just too large a font for me to wrap my head around as a beginner.

@HinTak

HinTak commented Jul 4, 2022

U+FF21 is full-width 'A' . See https://www.compart.com/en/unicode/U+FF21 . There is also a half-width 'A' unicode value.

The half width character is a Japanese construct - to mean a "A" that is half the width so two of those will fit a kanji space.

In Arabic and perhaps other scripts, for example, there are variants of the "space" character, such as "non-breaking spaces" where it is preferred not to break a line. Perhaps the space is one of those cases, which would explain why the space shape has multiple Unicode values.
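
The relationship between these code points can be inspected with Python's standard `unicodedata` module. This is only a small illustration of the Unicode side; the `describe` helper is a hypothetical name, and nothing here inspects the font itself.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, NFKC-normalized form) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.normalize("NFKC", ch))


if __name__ == "__main__":
    # Ordinary 'A', full-width 'A', ordinary space, no-break space.
    for ch in ("A", "\uFF21", " ", "\u00A0"):
        print(describe(ch))
```

Compatibility normalization (NFKC) maps the full-width "A" to the ordinary "A" and the no-break space to the ordinary space, so each pair is a plausible candidate for sharing one glyph shape in a font.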

@chianjin

chianjin commented Jul 5, 2022

@HinTak
I use the cjk-multi-fix.py to fix SourceHanSerifTC-Regular.otf, but failed.

It prints the message below:
"The 'sfnt' format is currently limited to 65535 glyphs, and your font has 65992 of them."

@HinTak

HinTak commented Jul 5, 2022

@chianjin that message does not come from my script - maybe one of the dependent components? Anyway, please post the exact place where you got the font file...

@HinTak

HinTak commented Jul 6, 2022

@chianjin I have updated my script (still at https://github.com/HinTak/freetype-py/blob/fontval-diag/examples/cjk-multi-fix.py ). Interesting issue: the glyph count is off by 2, because fontforge automatically inserts two glyphs, .null and CR, when they are missing.

These two were mandatory up to OpenType 1.7, but no longer are as of 1.8. So I think what happened is this: newer Source Han CJK releases no longer have them. That is, the script used to work for older Source Han CJK releases (which contain them); Source Han CJK was then updated to the latest spec, while fontforge hasn't been (or may not want to be, for the sake of font usage on older systems).
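
The off-by-2 effect can be illustrated with a small counting sketch. The glyph names `.null` and `CR` are taken from the comment above and may differ from the names FontForge actually uses; the function name and the generated glyph names are hypothetical.

```python
SFNT_GLYPH_LIMIT = 65535
AUTO_INSERTED = (".null", "CR")  # names as given above; treat as assumptions


def glyph_count_after_auto_insert(glyph_names, auto_inserted=AUTO_INSERTED):
    """Glyph count once any missing auto-inserted glyphs have been added."""
    present = set(glyph_names)
    missing = [name for name in auto_inserted if name not in present]
    return len(glyph_names) + len(missing)


if __name__ == "__main__":
    # A font that looks safe at 65534 glyphs but lacks both extra glyphs.
    names = [f"g{i}" for i in range(65534)]
    final = glyph_count_after_auto_insert(names)
    print(final, "over limit" if final > SFNT_GLYPH_LIMIT else "fits")
```

A script that budgets glyphs against the limit therefore needs to count the glyphs that will be added on output, not only the ones present in the source font.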

@HinTak

HinTak commented Jul 6, 2022

BTW, I am seeing piles and piles of "No glyph with unicode U+0XXXX in font" just from using fontforge to load one of the Source Han fonts (this is before anything else, without any Python scripts, just starting plain fontforge with one of those fonts). I am afraid it is a fontforge issue: the glyphs are duplicated in the CJK Compatibility Ideographs range, and fontforge cannot cope with "one glyph <-> multiple unicode points". Glyphs in the "CJK Compatibility Ideographs" range are necessarily of that nature: one shape for multiple unicode values.
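
The "one shape for multiple unicode values" nature of that range is visible in the Unicode data itself, independent of any font. A minimal sketch using Python's standard `unicodedata` module; the helper name is hypothetical, and a font is of course free to decide separately whether the two code points share a glyph.

```python
import unicodedata


def unified_equivalent(ch):
    """Return the character that ch canonically normalizes to (NFC)."""
    return unicodedata.normalize("NFC", ch)


if __name__ == "__main__":
    compat = "\uF900"  # CJK COMPATIBILITY IDEOGRAPH-F900
    unified = unified_equivalent(compat)
    print(f"U+{ord(compat):04X} -> U+{ord(unified):04X}")
```

U+F900 normalizes to the unified ideograph U+8C48, so a font covering both is a natural place to find one glyph mapped from two code points.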

Read https://en.wikipedia.org/wiki/CJK_Compatibility_Ideographs
