Enumerate through all glyphs in font #56

Open
carlossless opened this issue Nov 17, 2016 · 6 comments

Comments

@carlossless

carlossless commented Nov 17, 2016

Hi,

Great library. I was wondering whether it would be possible to get the full list of glyphs contained in a font, along with their Unicode representations.

As far as I understand, right now it's only possible to get single code points from the character set. It would be cool if you could get multiple code points like U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466 and then get the glyph for their combination: 👨‍👩‍👧‍👦.
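
For reference, the family emoji shown above (man, woman, girl, boy, joined by U+200D) can be built and inspected in plain JavaScript. This is a minimal sketch that uses only standard string methods and no font library; the variable names are illustrative.

```javascript
// Code points for the family emoji: man, ZWJ, woman, ZWJ, girl, ZWJ, boy.
const codePoints = [0x1F468, 0x200D, 0x1F469, 0x200D, 0x1F467, 0x200D, 0x1F466];

// Build the string from the code points.
const family = String.fromCodePoint(...codePoints);

// Array.from walks the string by code point rather than by UTF-16 unit.
const roundTrip = Array.from(family, (ch) => ch.codePointAt(0));

// Format each code point as U+XXXX for display.
const labels = roundTrip.map(
  (cp) => 'U+' + cp.toString(16).toUpperCase().padStart(4, '0')
);
console.log(labels.join(' '));
```

A font with the right substitution rules renders this whole string as one glyph, which is why a per-code-point lookup cannot find it.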

@devongovett
Member

This is a very hard problem. It would essentially involve doing glyph substitution in reverse: from glyphs to characters instead of from characters to glyphs. GSUB coverage tables may make this possible for OpenType, but it would certainly not be easy, especially when you consider the chaining substitution tables that are available. AAT uses state tables, so it would also be difficult, if not impossible, to go backward from matching states to the character sequences that might produce them. There may be an infinite number of character combinations that yield a single glyph.
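
To make the difficulty concrete, here is a toy sketch of the easy part only: inverting a flat list of ligature rules. The rule format and glyph ids are hypothetical and are not read from any real GSUB or AAT table.

```javascript
// Hypothetical toy rules, not taken from a real font: each rule replaces a
// sequence of input glyph ids with a single ligature glyph id.
const ligatureRules = [
  { input: [10, 11], output: 100 },
  { input: [10, 11, 12], output: 101 },
  { input: [100, 12], output: 101 }, // second route to 101, via the output of rule 1
];

// Group the rules by output glyph so each glyph lists the inputs that form it.
function invertLigatures(rules) {
  const byOutput = new Map();
  for (const { input, output } of rules) {
    if (!byOutput.has(output)) byOutput.set(output, []);
    byOutput.get(output).push(input);
  }
  return byOutput;
}

const inverted = invertLigatures(ligatureRules);
console.log(inverted.get(101)); // the two candidate input sequences for glyph 101
```

One candidate for glyph 101 itself contains a ligature glyph (100), so a complete answer would need recursive expansion. Contextual and chaining rules add conditions on neighbouring glyphs that this flat inversion cannot express at all.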

@carlossless
Author

Thank you for this illuminating answer. The main reason I wanted this feature was to extract all the SBIX glyph PNGs and assign each one its appropriate Unicode name. I guess there are better ways of parsing the font to get those resources.

@Pomax
Contributor

Pomax commented Nov 21, 2016

You could mine the cmap table data for that particular use case, since Unicode characters are unrelated to the visual changes that GSUB can effect. Find the coverage ranges and run through the supported glyphs, resolving their glyph outline(s) one entry at a time.
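
A rough sketch of that cmap walk using fontkit's public `characterSet` and `glyphForCodePoint`. The function name `listMappedGlyphs` and the font path in the usage comment are made up for illustration. Note that this only reaches glyphs mapped from a single code point; glyphs produced by substitution (such as ZWJ sequences) will not appear.

```javascript
// Sketch: list every code point the font's cmap covers, with the glyph it maps to.
// `font` is assumed to be a fontkit font object.
function listMappedGlyphs(font) {
  return font.characterSet.map((codePoint) => {
    const glyph = font.glyphForCodePoint(codePoint);
    return {
      codePoint,
      label: 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0'),
      glyphId: glyph.id,
      glyphName: glyph.name,
    };
  });
}

// Usage (assumes fontkit is installed; the path is hypothetical):
// const fontkit = require('fontkit');
// const font = fontkit.openSync('/path/to/font.ttf');
// console.log(listMappedGlyphs(font).slice(0, 5));
```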

@devongovett
Member

See http://github.com/devongovett/apple-color-emoji

@moyogo

moyogo commented Nov 21, 2016

hb_input does that (for GSUB): https://github.com/googlei18n/nototools/blob/master/nototools/hb_input.py

@devongovett
Member

Played around with this a bit this weekend. See #60. It works for AAT-based fonts like Apple Color Emoji for now. You can try it out on that branch like this (I'll add a public API at some point). It returns an array of possible strings that would produce the given glyph.

font._layoutEngine.getStringsForGlyph(1039);
// => ['\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}']

font._layoutEngine.getStringsForGlyph(730);
// => ['\u{1F3C3}', '\u{1F3C3}\u200D\u2642', '\u{1F3C3}\u200D\u2642\uFE0F']

Will probably update my apple-color-emoji package to use this at some point. Much easier to maintain.
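
To cover the original request (every glyph in the font), the call above could be looped over all glyph ids. This is only a sketch: it assumes a font object opened from the #60 branch, relies on the private `_layoutEngine.getStringsForGlyph` shown above (which may change or be missing), and the helper name `stringsForAllGlyphs` is invented here. `numGlyphs` is fontkit's public glyph count.

```javascript
// Sketch: collect the candidate strings for every glyph id in the font.
// Relies on the private, branch-only method quoted above; not a public API.
function stringsForAllGlyphs(font) {
  const engine = font._layoutEngine;
  if (!engine || typeof engine.getStringsForGlyph !== 'function') {
    throw new Error('getStringsForGlyph is not available on this font object');
  }

  const result = new Map();
  for (let glyphId = 0; glyphId < font.numGlyphs; glyphId++) {
    const strings = engine.getStringsForGlyph(glyphId);
    // Skip glyphs for which nothing was found (return shape is an assumption).
    if (strings && strings.length > 0) {
      result.set(glyphId, strings);
    }
  }
  return result;
}
```

Glyphs missing from the returned map are simply ones for which no producing string was found, not necessarily unreachable glyphs.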
