cidmap parsing problem for Source Han Sans #1582
Comments
|
@medicalwei, I don't entirely understand. Is FontForge at fault, or is the input typeface at fault? |
|
The cidmap has a shorthand that maps a single character per line rather than a range of characters. I think FontForge is simply missing support for it. But I am not sure whether this is a standard that FontForge should follow, so I don't know which is at fault. |
|
So, to clarify, the upstream test case includes a map that lists individual characters as well as ranges, and FontForge fails to parse those lines because it expects an additional argument per line for ranges? |
|
I did some digging, and FontForge supports the begincodespacerange listing type but not the begincidchar listing type in CMap files. This seems to be the root of the problem; I'll take a look. |
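For readers unfamiliar with the two listing types, here is a minimal Python sketch of a parser that accepts both. It assumes the usual Adobe CMap layout, where a `begincidrange` entry is `<lo> <hi> cid` and a `begincidchar` entry is `<code> cid`. It is not FontForge's parser, the function name `parse_cmap` is hypothetical, and the CID numbers in the sample are made up for illustration.

```python
import re

# "<lo> <hi> cid" (range entry) and "<code> cid" (single-character entry).
RANGE_RE = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s+(\d+)")
CHAR_RE = re.compile(r"<([0-9A-Fa-f]+)>\s+(\d+)")

def parse_cmap(text):
    """Return a dict mapping character code -> CID (hypothetical helper)."""
    mapping = {}
    mode = None  # None, "range", or "char"
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("begincidrange"):
            mode = "range"
        elif line.endswith("begincidchar"):
            mode = "char"
        elif line in ("endcidrange", "endcidchar"):
            mode = None
        elif mode == "range":
            m = RANGE_RE.match(line)
            if m:
                lo, hi = int(m.group(1), 16), int(m.group(2), 16)
                cid = int(m.group(3))
                # Consecutive codes map to consecutive CIDs.
                for offset in range(hi - lo + 1):
                    mapping[lo + offset] = cid + offset
        elif mode == "char":
            m = CHAR_RE.match(line)
            if m:
                mapping[int(m.group(1), 16)] = int(m.group(2))
    return mapping

# Illustrative input only; the CID values are invented.
SAMPLE = """\
2 begincidrange
<00000020> <0000007e> 1
<00004e00> <00004e02> 100
endcidrange
1 begincidchar
<0000674e> 500
endcidchar
"""

if __name__ == "__main__":
    cmap = parse_cmap(SAMPLE)
    print(hex(0x674E), "->", cmap.get(0x674E))
```

A parser that only understands the range form would skip the `begincidchar` block entirely, which would leave codes such as U+674E unmapped.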
|
@medicalwei I added support for the cidchar syntax in branch |
|
@frank-trampe Do you happen to have that branch around still? I've run into the same issue and I can verify the fix. Otherwise I'll attempt it myself and submit a pull request. |
|
Any updates on this issue? |
|
The branch is back. See #2827. |
|
Hi, guys. We're going to be entering a feature embargo shortly in preparation for a new release. It would be nice to get some testing and feedback on that branch so that we can get it merged. (Paging @zerng07, @jeffska, @medicalwei.) |
|
@Toufukun, I did not quite understand your issue, and I don't have time to go digging right now. In what file do you find that limit enforced? Is that the only place? The limit is on Unicode value, not encoding size, right? Is there no obvious reason for enforcing the limit as it is? What would be an appropriate limit? |
|
With this branch, opening the source PS file (cidmap.ps.xx) and then running Flatten by CMap with a UTF-32 upstream cidmap (such as UniSourceHanSansTW-UTF32-H), 李 U+674E and 理 U+7406 are no longer missing. [1] [2] However, opening the source PS file and running Flatten by CMap with a UTF-16 upstream cidmap (such as UniSourceHanSansTW-UTF16-H) still triggers the "Encoding Too Large" problem described by Toufukun. [3] |
|
@zerng07, that's not a regression, though, right? |
|
No, it's not. The "Encoding Too Large" problem encountered when running Flatten by CMap with a UTF-16 upstream cidmap has existed for some time; it was not introduced by this branch. |
|
@frank-trampe I haven't worked on these things for quite a long time, so I may not remember very clearly. I also only have Windows, and I don't know how to try the latest version. I just remember none of these files Adobe provided in the Some Han characters are encoded repeatedly for compatibility reasons, like 理 is encoded at |
|
@Toufukun try this build: https://ci.appveyor.com/project/fontforge/fontforge/build/1.0.192/artifacts As far as I know, the cidmap file type is a FontForge-specific file type (I don't know how it's generated or made). The Adobe CMap files are used by the 'Flatten by CMap' option. Note that on Windows, I think you have to manually specify the cidmap file path, which should be under share/fontforge. |
|
I'm having a similar issue. I'm using Source Han Sans for a game. I can flatten using the UTF-32 CMap. Then I use a FontForge script to remove all the unused characters and save the font as a TTF. However, after that process some characters are missing and no longer display. The missing characters all seem to be cases where multiple Unicode code points share the same CID. |
|
I have a possibly related problem - #3080 - while the bulk of it seems to be that detached glyphs are trashing the encoding table, a few glyphs are missing from the conversion - that includes 李 U+674E being missing. |
|
(Sorry for repeating the cut-and-paste.) It would be nice if MultipleEncodingsToReferences() did the right thing; it currently does not. The Source CJK / Noto CJK fonts have about 400 glyphs with two or more code points; about a dozen have three. FontForge silently drops all but one code point per glyph, so about 400 code points go missing when re-encoding. Worse, FontForge treats the last one it sees as authoritative, and that is often in the CJK Compatibility range rather than the lower CJK Unified range. So the lower, more frequently used CJK Unified range ends up with about 400 glyphs missing. |
|
The glyph 李 for U+674E in the CJK Unified region is also encoded as U+F9E1 in the CJK Compat-Ideograph region; and unfortunately fontforge takes the latter as authoritative... |
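To make the "last one wins" behaviour concrete, here is a small Python sketch that contrasts it with keeping every code point per CID and preferring the non-compatibility one as primary. This is only an illustration of the idea and not FontForge's code or the logic of the freetype-py script. The helper names and the CID numbers are hypothetical, and the compatibility test simply checks the CJK Compatibility Ideographs block (U+F900 to U+FAFF) and its supplement (U+2F800 to U+2FA1F).

```python
from collections import defaultdict

def is_compat(cp):
    """True for CJK Compatibility Ideographs and the Supplement block."""
    return 0xF900 <= cp <= 0xFAFF or 0x2F800 <= cp <= 0x2FA1F

def last_wins(code_to_cid):
    """Naive inversion: a later code point overwrites an earlier one."""
    cid_to_code = {}
    for code, cid in code_to_cid.items():
        cid_to_code[cid] = code
    return cid_to_code

def keep_all(code_to_cid):
    """Keep every code point per CID, non-compatibility code points first."""
    cid_to_codes = defaultdict(list)
    for code, cid in code_to_cid.items():
        cid_to_codes[cid].append(code)
    return {
        cid: sorted(codes, key=lambda cp: (is_compat(cp), cp))
        for cid, codes in cid_to_codes.items()
    }

# CID numbers are invented; the U+674E / U+F9E1 pair is the one from this thread.
EXAMPLE = {0x4E00: 100, 0x674E: 500, 0xF9E1: 500}

if __name__ == "__main__":
    print({cid: hex(cp) for cid, cp in last_wins(EXAMPLE).items()})
    print({cid: [hex(cp) for cp in cps] for cid, cps in keep_all(EXAMPLE).items()})
```

With the naive inversion, CID 500 ends up owned by U+F9E1 and U+674E is lost. With the second approach, U+674E stays primary and U+F9E1 can be added as an alternate encoding or a reference.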
|
My freetype-py script to fix fontforge's encoding problem is up at https://github.com/HinTak/freetype-py/blob/fontval-diag/examples/subfonts-script-generate.py |
|
I found that the missing-glyph problem with Source Han Sans still exists right now. |
|
See if the "enhanced" version of Source Hans Sans/Serif Regular at |

This bug is separated from #1534 since opening the .otf directly isn't my case.
The cidmap parsing is incorrect for Source Han Sans. The links below point to the upstream cidmap and to a cidmap I converted manually (which is parsed correctly).
Procedures
FlattenByCMap with one of the test cases from the CIDMap Dropbox URLs below.
What I see
U+674E
0x2F9DF
CIDMaps
Upstream (test case):
https://www.dropbox.com/s/05uj3touegoylcf/UniSourceHanSansTWHK-UTF32-H-Upstream
Converted and working reference:
https://www.dropbox.com/s/cr049gzowd9nw2n/UniSourceHanSansTWHK-UTF32-H-ShouldBe
The upstream source is from https://github.com/adobe-fonts/source-han-sans