Provide support for all cmap table formats #105

Closed
fdb opened this issue Apr 3, 2015 · 20 comments · Fixed by #647
Labels
enhancement Spec Related to the implementation of the Opentype specification

Comments

@fdb
Contributor

fdb commented Apr 3, 2015

E.g. platformID = 1, encodingID = 0 as used in http://www.ivank.net/BRUSHSTP.ttf.

@fdb fdb mentioned this issue Apr 3, 2015
@Pomax
Contributor

Pomax commented Jun 25, 2015

I'd somewhat advocate not bothering with this - the format is so old that nothing makes these fonts anymore (the format 0 cmap is horrendously inadequate for anything but toy fonts =). Adding support for more complex or new formats like 13/14 would be worth doing, but format 0 would add support for something we shouldn't even be using anymore.
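
For readers unfamiliar with format 0, the limitation mentioned above is visible in its layout: a fixed array of 256 one-byte glyph indices. Below is a minimal sketch, assuming the subtable layout from the OpenType spec (uint16 format, uint16 length, uint16 language, then uint8 glyphIdArray[256]). The function names `parseCmapFormat0` and `glyphIndexFormat0` are hypothetical and are not opentype.js API.

```typescript
// Format 0 subtable layout (OpenType 'cmap' table):
//   uint16 format (= 0), uint16 length, uint16 language,
//   uint8  glyphIdArray[256]
// Hypothetical helper, not part of opentype.js.
function parseCmapFormat0(view: DataView, offset: number): Uint8Array {
  const format = view.getUint16(offset); // DataView reads big-endian by default
  if (format !== 0) {
    throw new Error(`expected cmap format 0, got ${format}`);
  }
  // Skip format, length and language (3 x uint16 = 6 bytes).
  const glyphIds = new Uint8Array(256);
  for (let code = 0; code < 256; code++) {
    glyphIds[code] = view.getUint8(offset + 6 + code);
  }
  return glyphIds;
}

// Hypothetical helper: character codes outside 0..255 cannot be mapped,
// so they fall back to glyph index 0 (the "missing glyph").
function glyphIndexFormat0(glyphIds: Uint8Array, charCode: number): number {
  if (charCode < 0 || charCode > 255) {
    return 0;
  }
  return glyphIds[charCode] ?? 0;
}
```

Only 256 character codes and 256 glyphs can be addressed this way, which is why the format cannot serve anything beyond a single-byte encoding.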

@bitinn

bitinn commented Jun 25, 2015

Looks like Apple just decided to use platformID = 0 for their default system font, see #139

@Jolg42
Member

Jolg42 commented Jul 29, 2016

cmap 12 read support was just added with PR #207 😉

@fdb
Contributor Author

fdb commented Jul 31, 2016

Any other important formats we should support?

@Jolg42
Member

Jolg42 commented Jul 31, 2016

@fdb 4 is limited to 16 bit (Unicode Plane 0, the Basic Multilingual Plane) & 12 covers 32 bit (all Unicode planes). They follow the same specification & it looks like they're the most common cmap tables.

I decompiled some fonts with FontTools & found that format 6 is also common.
So maybe the next step will be reading format 6 but if nobody is having a problem now, maybe we can wait before implementing it 😉
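
For context, format 6 is a "trimmed table": one contiguous run of 16-bit character codes, each with its own glyph index. Here is a minimal reading sketch, assuming the layout from the OpenType spec (uint16 format, length, language, firstCode, entryCount, then uint16 glyphIdArray[entryCount]). The names `Format6Subtable`, `parseCmapFormat6` and `glyphIndexFormat6` are hypothetical, not opentype.js API.

```typescript
// Hypothetical in-memory shape for a parsed format 6 subtable.
interface Format6Subtable {
  firstCode: number;   // first character code covered
  glyphIds: number[];  // one glyph index per consecutive character code
}

// Format 6 layout: uint16 format (= 6), length, language,
// firstCode, entryCount, then uint16 glyphIdArray[entryCount].
function parseCmapFormat6(view: DataView, offset: number): Format6Subtable {
  const format = view.getUint16(offset);
  if (format !== 6) {
    throw new Error(`expected cmap format 6, got ${format}`);
  }
  const firstCode = view.getUint16(offset + 6);
  const entryCount = view.getUint16(offset + 8);
  const glyphIds: number[] = [];
  for (let i = 0; i < entryCount; i++) {
    glyphIds.push(view.getUint16(offset + 10 + 2 * i));
  }
  return { firstCode, glyphIds };
}

// Codes outside [firstCode, firstCode + entryCount) map to glyph 0.
function glyphIndexFormat6(subtable: Format6Subtable, charCode: number): number {
  const index = charCode - subtable.firstCode;
  if (index < 0 || index >= subtable.glyphIds.length) {
    return 0;
  }
  return subtable.glyphIds[index] ?? 0;
}
```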

@Pomax
Contributor

Pomax commented Jul 31, 2016

For proper opentype support, I'd consider cmap 4, 12, 13 and 14 essential: cmap 4 and 12 for "proper plain old unicode" support—4 mapping to UCS2, and 12 mapping to UCS4—and the (recently introduced) cmap 13 and 14 because opentype needs them for properly supporting many-to-one mapping, and variation selection mapping, respectively.

Although that said, many of the other formats are almost trivial to implement compared to subtables 4 and 12, so... I'd honestly just say "implement them all". If effort is already going into proper cmap handling, handling all of them is a good target.

@Jolg42
Member

Jolg42 commented Aug 1, 2016

@Pomax Nice to know!
I think that CMAP 12 writing is the most important right now but one day maybe we will support every format ;)

But before that we will need to change how the cmap tables are handled, because right now if the cmap 12 is found the cmap 4 is not read (this is not a problem as 12 is a superset including 4) but we can't do that if we're adding more formats.

By the way, are 13 & 14 well implemented now?

@Pomax
Contributor

Pomax commented Aug 1, 2016

They're getting there.

I'm not sure why you'd skip 4 if 12 is found, but then I've not read the code in quite a while. Keeping the UCS-2 and UCS-4 sets separate is generally a good idea, sometimes even with a cmap 0 for the 256-character ANSI block, so the cmap parsing procedure is that you check which cmap subtables are available, then run through each of those to find your character index. The "does this character have an index according to this subtable" check is generally fast, so you might "waste some time" looking in tables, but it will be negligible compared to the time necessary to render the glyph outline.
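
The lookup procedure described here could be sketched roughly as follows. This is an illustration only: the `CmapSubtable` shape and the `findGlyphIndex` function are hypothetical, and the sketch assumes each subtable has already been parsed into a plain code point to glyph index map. It does not reflect how opentype.js stores its cmap data.

```typescript
// Hypothetical parsed form of one cmap subtable.
interface CmapSubtable {
  format: number;
  glyphIndexMap: Record<number, number>; // code point -> glyph index
}

// Try every available subtable in order and return the first
// non-zero glyph index. Glyph index 0 means "missing glyph",
// so a zero result from one subtable does not end the search.
function findGlyphIndex(subtables: CmapSubtable[], codePoint: number): number {
  for (const subtable of subtables) {
    const glyphIndex: number | undefined = subtable.glyphIndexMap[codePoint];
    if (glyphIndex !== undefined && glyphIndex !== 0) {
      return glyphIndex;
    }
  }
  return 0;
}
```

Each probe is a single map lookup, so checking several subtables costs little next to rendering the outline.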

Also note that cmap 13 uses the exact same data structures and information coding as 12, except that the "start glyph" for a character range as used in 12 is simply considered "the only glyph" in 13, so if you have an implementation for 12 already, adding support for 13 (barring needing a rewrite on how characters are mapped through multiple cmap subtables of course) is virtually no extra work.
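
To make the 12 versus 13 difference concrete, here is a minimal sketch. It assumes the groups have already been read into objects and are sorted by start code, as the spec requires. The `MapGroup` interface and `lookupGroups` function are hypothetical names, not opentype.js API.

```typescript
// One group record, shared by formats 12 and 13. In format 12 the glyph
// field is the glyph for startCharCode (the spec calls it startGlyphID);
// in format 13 it is the single glyph used for the whole range.
interface MapGroup {
  startCharCode: number;
  endCharCode: number; // inclusive
  glyphID: number;
}

// Binary search over groups sorted by startCharCode.
function lookupGroups(groups: MapGroup[], format: 12 | 13, codePoint: number): number {
  let lo = 0;
  let hi = groups.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const group = groups[mid];
    if (group === undefined) {
      break;
    }
    if (codePoint < group.startCharCode) {
      hi = mid - 1;
    } else if (codePoint > group.endCharCode) {
      lo = mid + 1;
    } else if (format === 12) {
      // Format 12: glyph indices run sequentially across the range.
      return group.glyphID + (codePoint - group.startCharCode);
    } else {
      // Format 13: every code point in the range maps to the same glyph.
      return group.glyphID;
    }
  }
  return 0; // not covered by any group
}
```

The only branch that differs between the two formats is the final one, which matches the point that a format 12 reader needs very little extra to handle 13.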

@Jolg42
Member

Jolg42 commented Aug 1, 2016

@Pomax The cmap 12 support was recently added by @Vildan & I think it was just easier to skip 4 if 12 was found. If not, it will need a rewrite. For now, it's easier & performance-wise faster, but not future-proof!

Thanks for the details though!
Personally, I'm already busy with a lot of other things so feel free to contribute if you need to 😉

@Pomax
Contributor

Pomax commented Aug 1, 2016

skipping 4 when 12 is found is a great way to not find characters that are definitely in the font, so filing an issue to make sure all sub tables are checked will be a good idea =)

as for contributing: I run an insane amount of projects already, so writing comments or just talking about how the opentype spec wants things done is a quick and easy job I am happy to do. Reviewing code for whether an approach is sound is a bit more work, but typically still doable with a few 15-minute slots here or there, but writing code is way more work than I have free time for at the moment =)

@fdb
Contributor Author

fdb commented Aug 1, 2016

Hey @Pomax thanks for clearing that up. It sounds like it'll be a good idea to keep all of them and do a lookup through them. Do you know if the spec says something about the order in which they should be looked up?

@Vildan
Contributor

Vildan commented Aug 1, 2016

Because there are only formats 4 and 12 now, and 12 is a superset of 4, there is no need to read format 4 if a font has format 12 in it. And because cmap tables are placed in ascending order, we can find format 12 before format 4. @Pomax, do you have an example where we skip characters if we read only format 12? I ran this test on 4000+ fonts and didn't find a single font where format 4 gives extra characters versus format 12.
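
A superset check of this kind could be sketched as below. This is not the script that was actually run on those fonts. It assumes both subtables have already been parsed into plain code point to glyph index maps, and the name `codePointsNotCoveredByFormat12` is hypothetical.

```typescript
// Code point -> glyph index, as produced by some cmap parser (assumed).
type GlyphIndexMap = Record<number, number>;

// Return every code point that format 4 maps but format 12 either
// lacks or maps to a different glyph. An empty result means the
// format 12 subtable is a superset of the format 4 subtable.
function codePointsNotCoveredByFormat12(
  format4: GlyphIndexMap,
  format12: GlyphIndexMap
): number[] {
  const notCovered: number[] = [];
  for (const key of Object.keys(format4)) {
    const codePoint = Number(key);
    if (format12[codePoint] !== format4[codePoint]) {
      notCovered.push(codePoint);
    }
  }
  return notCovered;
}
```

Running such a function over both maps of each font and collecting the non-empty results would surface any font where skipping format 4 loses characters.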

@Pomax
Contributor

Pomax commented Aug 1, 2016

Rereading the spec, you're right; it quite literally says "Please note, that the content of format 12 subtable, needs to be a super set of the content in the format 4 subtable. The format 4 subtable needs to be in the cmap table to enable backward compatibility needs.". I'm curious if the OpenType spec revisions will remove this need for a cmap_4 in the future, but it does indeed fully justify not bothering with reading the subtable 4 format if format 12 is present.

@brawer
Collaborator

brawer commented May 1, 2017

Here are some test cases for cmap subtables; see README for how to run the test suite.

@laoshu133

We created an online font subsetting demo that compares some of the differences between opentype.js and fonttools subsets; it may be helpful.

http://fonter.dancf.com/examples/subset/

@mooman219

mooman219 commented Aug 1, 2019

Technically, by supporting format 12, you get format 13 for free, right?

@jdimeo

jdimeo commented Jan 14, 2020

I have a TON of PDFs that use 14. Just throwing my vote in for this- I have no idea what it's all about :-)

@Connum
Contributor

Connum commented Nov 23, 2023

We now support format 14 (via #581) as well as format/encoding 0 for platform 1 (via #634), which the issue was originally about. The provided example BRUSHSTP.ttf will load fine with the current master.

If anyone could provide a font using format 13, that would be great.

@Connum Connum self-assigned this Nov 23, 2023
@Connum Connum added enhancement Spec Related to the implementation of the Opentype specification labels Nov 23, 2023
@brawer
Collaborator

brawer commented Nov 23, 2023

> If anyone could provide a font using format 13, that would be great.

Added a test case using this font.

@Connum
Contributor

Connum commented Nov 24, 2023

Format 13 will be supported via #647, which will close this issue. As discussed before, it's not worth the time to support obscure formats that will probably never be encountered in the wild. Anyone providing a real font with an unsupported format is still welcome to open a new issue for that, of course!

@yne yne closed this as completed in #647 Nov 24, 2023