please delete Adobe-Identity cidmap, or treat it specially. #3084

Closed
HinTak opened this issue Jun 9, 2017 · 8 comments · Fixed by #4993

Comments

@HinTak HinTak commented Jun 9, 2017

I think that, in the past and perhaps even now, Adobe Acrobat Reader and the (x)dvipdfmx/XeTeX people have treated the Adobe-Identity cidmap as a direct CID/GID-to-Unicode mapping, for text extraction and the like.

The Adobe Source CJK fonts and Google Noto CJK fonts definitely do not: in their usage it is just a "because I said so" custom encoding that carries no meaning whatsoever in relation to Unicode.

Unfortunately, fontforge takes the former interpretation, and this corrupts the encoding vector during Re-encode ( #3080 ). Simply deleting the file stops fontforge from treating Adobe-Identity as a direct CID->Unicode mapping, forces it to use the cmap properly, and fixes the problem seen in #3080 .

So I suggest either simply deleting the cidmap, or at least providing a scripting API to selectively disable its use. Otherwise you will not be able to process Adobe Source CJK / Google Noto CJK properly.

@HinTak HinTak commented Jun 9, 2017

This is also the main cause of #3079 .

@HinTak HinTak commented Jun 10, 2017

I think only CID-keyed fonts embedded in PDFs should be treated as direct Unicode. Standalone OpenType fonts with CFF outlines should use the cmap, and only the cmap, for encoding purposes.

@HinTak HinTak commented Jun 12, 2017

One suggestion I might make would be to make it a user preference.

Anyway, my solution was to rename the file temporarily so that fontforge cannot find it.

I noticed this because Ubuntu ships the file separately as an extra, so its fontforge can convert the font somewhat more correctly than Fedora's, which ships fontforge complete as one package. See the entire May/June traffic ( http://lists.nongnu.org/archive/html/cjk-list/ ).
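
For anyone who wants to script the rename workaround described above, here is a minimal Python sketch. The helper name `hidden_file`, the `.disabled` suffix, the cidmap path, and the script name in the usage comment are all assumptions made for illustration; the actual location of the cidmap file depends on the distribution, and renaming it will usually need write permission on that directory.

```python
import contextlib
import os


@contextlib.contextmanager
def hidden_file(path, suffix=".disabled"):
    """Temporarily rename `path` so that other programs cannot find it.

    `hidden_file` and `suffix` are hypothetical names for this sketch,
    not part of fontforge.
    """
    hidden = path + suffix
    os.rename(path, hidden)
    try:
        yield hidden
    finally:
        # Always restore the original name, even if the body raises.
        os.rename(hidden, path)


# Hypothetical usage; the path and script name below are assumptions:
#
# import subprocess
# with hidden_file("/usr/share/fontforge/Adobe-Identity-0.cidmap"):
#     subprocess.run(["fontforge", "-script", "convert.pe", "font.otf"], check=True)
```

The `try`/`finally` is there so the file gets its name back even when the conversion fails partway through.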

@HinTak HinTak commented Jun 15, 2017

See also the comment from @kenlunde: http://typedrawers.com/discussion/comment/28483#Comment_28483

 The Identity-H encoding is used to refer to glyphs by their CIDs regardless of their ROS (Registry, Ordering, and Supplement). Per the PDF Language Reference Manual, it maps two-byte character codes ranging from 0 to 65,535 to the same two-byte CID value, interpreted high-order byte first. It has nothing to do with a mapping from Unicode. That mapping is handled via explicit Unicode mappings, or via a ToUnicode mapping table, which maps said Identity-H CIDs to meaningful Unicode values.
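
To make the quoted description concrete, here is a small Python sketch of what Identity-H does: each pair of bytes, high-order byte first, is the CID itself. The function names and the sample `to_unicode` table are made up for illustration; in a real PDF the second step would come from the font's ToUnicode table or other explicit Unicode mappings.

```python
def identity_h_to_cids(data):
    """Decode an Identity-H string: every 2 bytes (big-endian) is one CID."""
    if len(data) % 2:
        raise ValueError("Identity-H strings consist of two-byte codes")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def cids_to_text(cids, to_unicode):
    """Map CIDs to text through an explicit table (e.g. from ToUnicode).

    CIDs without an entry become U+FFFD rather than being guessed from the
    CID number, which is the point made in the quote above.
    """
    return "".join(to_unicode.get(cid, "\ufffd") for cid in cids)


cids = identity_h_to_cids(b"\x00\x01\x12\x34\xff\xff")
print(cids)  # [1, 4660, 65535]

# Hypothetical mapping, invented for this example only.
to_unicode = {1: "A", 4660: "\u4e00"}
print(cids_to_text(cids, to_unicode))
```

Note that nothing in `identity_h_to_cids` involves Unicode; the Unicode values only appear once a separate table is supplied.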

@kenlunde kenlunde commented Jun 15, 2017

When processing a font that uses the Adobe-Identity-0 ROS, such as the open source Source Han Sans and Source Han Serif families, along with Kazuraki, it is prudent not to assume anything about its glyph set, and instead depend on the mappings in the 'cmap' table to derive Unicode mappings. In other words, Adobe-Identity-0 ROS OpenType/CFF fonts should be treated like typical TrueType fonts with regard to how their glyphs correspond to Unicode code points or sequences.
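
As an illustration of "depend on the mappings in the 'cmap' table", here is a minimal Python sketch using the third-party fontTools library rather than FontForge itself. The helper names and the sample glyph names are assumptions for this example; only `TTFont` and `getBestCmap()` are fontTools calls.

```python
def unicode_to_glyph(font_path):
    """Read the Unicode -> glyph name mapping from the font's 'cmap' table."""
    # Third-party dependency (pip install fonttools), imported lazily so the
    # pure-Python helper below works without it.
    from fontTools.ttLib import TTFont

    font = TTFont(font_path)
    try:
        # getBestCmap() returns {codepoint: glyph_name}, or None if the font
        # has no Unicode cmap subtable.
        return dict(font.getBestCmap() or {})
    finally:
        font.close()


def glyph_to_unicodes(cmap):
    """Invert a {codepoint: glyph} mapping; one glyph may have several codepoints."""
    reverse = {}
    for codepoint, glyph in sorted(cmap.items()):
        reverse.setdefault(glyph, []).append(codepoint)
    return reverse


# Made-up sample data; real glyph names and mappings depend on the font.
sample = {0x4E00: "cid01200", 0x2F00: "cid01200", 0x41: "cid00034"}
print(glyph_to_unicodes(sample))
```

Glyphs that never appear in the inverted mapping simply have no Unicode value of their own, and the sketch does not try to infer one from the CID.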

@frank-trampe frank-trampe (Contributor) commented Nov 29, 2017

@HinTak, if we fix the multiple encoding problem correctly, it would fix this too, right?

@HinTak HinTak commented Dec 3, 2017

No - Fontforge seems to merge Adobe-Identity as an extra cmap.

Alternatively, I suppose the answer is yes: to fix the multiple encoding problem correctly in the practical sense (the common case of trying to edit Adobe Source CJK), Fontforge needs to be able to ignore the Adobe-Identity cidmap somehow. So deleting/ignoring the Adobe-Identity cidmap is one of the steps towards fixing the multiple encoding problem in Adobe Source CJK.

@kenlunde kenlunde commented Dec 4, 2017

It seems that I need to re-state what I wrote on June 14th:

In other words, Adobe-Identity-0 ROS OpenType/CFF fonts should be treated like typical TrueType fonts with regard to how their glyphs correspond to Unicode code points or sequences.

This means that the Unicode mappings should be derived only from the font's 'cmap' table.
