Encode unencoded glyphs as F0000 + hex(GID) #388

Open
davelab6 opened this issue Aug 20, 2014 · 69 comments

Comments

@davelab6
Member

commented Aug 20, 2014

I was chatting with @twardoch today about how glyph names are the 'primary key' for fonts, because in any contemporary font you have so many unencoded glyphs, accessed with OpenType Layout logic... But unencoded glyphs are tricky to precisely call, because OTL logic is per-font. I mentioned that I might like to use the Unicode Private Use Area to encode otherwise-unencoded glyphs.

Adam kindly mentioned he had already thought about this, and had concluded that Private Use Plane A (Unicode Plane 15) is ideal for this, as it spans U+F0000..U+FFFFD, so you can use a value of F0000 + hex(GID) to cleanly, logically encode all unencoded glyphs.

Let's do it!

  • add a "3.10" cmap subtable
@davelab6 davelab6 added this to the 2014-Q3 Week 21 milestone Aug 20, 2014
@davelab6 davelab6 added the build label Aug 20, 2014
@twardoch

commented Aug 20, 2014

All it requires is a simple tool which adds a "3.10" cmap subtable which maps glyph ids to the PUP A (sequentially, by adding 0xF0000 to the glyph ID). Because the codepoints will be in the same order as the glyph IDs, you can use the space-saving cmap format 12 which only defines the start and end of the cmap mapping range. So the added size overhead is small.

@vitalyvolkov

Contributor

commented Sep 2, 2014

@behdad Is there any way to define that glyph is unencoded using fontTools?

@davelab6

Member Author

commented Sep 2, 2014

@hash3g you can find the glyph IDs in the GlyphOrder table, eg https://github.com/hash3g/yesevaone/blob/master/YesevaOne-Regular.ttf.GlyphOrder.ttx and you can find glyphs that are encoded in the cmap table. I guess you need to make a set of the glyph names and a set of the encoded glyph names and compare them to get the set of unencoded glyphs.
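The set comparison described above could be sketched roughly as follows. This is only an illustration on toy data: the function name `unencoded_glyphs` and the glyph names are made up, and with fontTools the two inputs would presumably come from `TTFont.getGlyphOrder()` and `font["cmap"].getBestCmap()`.

```python
def unencoded_glyphs(glyph_order, cmap):
    """Return the glyph names that no codepoint maps to, in glyph ID order.

    glyph_order: list of glyph names, indexed by glyph ID.
    cmap: dict mapping Unicode codepoint -> glyph name.
    """
    encoded = set(cmap.values())
    return [name for name in glyph_order if name not in encoded]


# Toy data standing in for a real font (hypothetical glyph names).
glyph_order = [".notdef", "A", "B", "A.sc", "f_i"]
cmap = {0x41: "A", 0x42: "B"}

print(unencoded_glyphs(glyph_order, cmap))  # → ['.notdef', 'A.sc', 'f_i']
```

Returning a list rather than a set keeps the glyph ID order, which is handy if the result is later paired with GIDs.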

@twardoch

commented Sep 2, 2014

I would advocate that it would be much simpler (and more storage-effective) if you just encode ALL glyphs as U+F0000 + GID.

  1. Check if there already is a cmap subtable with PID 3 EID 10 (3.10 for short). If there is, skip to step 4.
  2. Create a cmap subtable in format 12 and assign it to the cmap table as 3.10
  3. Copy all mapping from the PID 3 EID 1 (3.1 for short) cmap subtable to the 3.10 subtable, as the spec requires 3.10 to be a superset of 3.1.
  4. "Blindly" assign mapping to all glyphs from GlyphOrder from U+F0000 to U+F0000 + len(GlyphOrder) - 1.

This has the advantage that cmap subtable format 12 uses an efficient storage for continuous code-to-GID ranges. With my method, you'll only create one such range, so it'll only add a few bytes to the size, and will be very fast.
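To illustrate the storage argument, here is a rough sketch of how a codepoint-to-GID mapping collapses into format 12 style groups. It assumes the usual format 12 layout, where each SequentialMapGroup holds three uint32 fields (start code, end code, start glyph ID), i.e. 12 bytes per group. The helper name `sequential_groups` and the glyph count are hypothetical, and this is not how fontTools itself compiles the table.

```python
def sequential_groups(code_to_gid):
    """Collapse a codepoint -> glyph ID mapping into (start_code, end_code, start_gid) groups.

    Consecutive codepoints that map to consecutive glyph IDs share one group.
    """
    groups = []
    for code in sorted(code_to_gid):
        gid = code_to_gid[code]
        if groups:
            start, end, start_gid = groups[-1]
            # Extend the last group if both the code and the GID continue the run.
            if code == end + 1 and gid == start_gid + (code - start):
                groups[-1] = (start, code, start_gid)
                continue
        groups.append((code, code, gid))
    return groups


GROUP_SIZE = 12  # bytes per group: three uint32 fields

num_glyphs = 1000  # hypothetical glyph count
pua_mapping = {0xF0000 + gid: gid for gid in range(num_glyphs)}
groups = sequential_groups(pua_mapping)

print(len(groups), len(groups) * GROUP_SIZE)  # → 1 12
```

Whatever the glyph count, the blind F0000 + GID mapping stays a single group, whereas a scattered mapping needs one group per break in the run.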

This approach has an additional benefit:
As a user of such a font, I am not forced to address the properly (i.e. via Unicode) encoded glyphs using the F0000+ codes.

I can still use the proper Unicodes. But if I do so, the browser/app will always perform the Unicode processing and default OpenType Layout shaping for complex scripts. So I won't really have the guarantee that the glyph I'm seeing is actually the glyph assigned to the Unicode codepoint in the font's cmap. It will be for most Unicodes but for some codepoints, the "Unicode+OTL magic" will kick in.

But if I address even the "properly" encoded glyphs using the U+F0000+ codepoints, I will have a WYSIWYG guarantee. Even more: with harfbuzz.js, I can run a JS port of HarfBuzz in the browser, take the output GIDs, add 0xF0000 to them and have my own explicit custom OTL processing if I need to. So I'm completely in control and independent of any "browser magic".
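The "take the output GIDs and add the offset" step could look roughly like this. It is a sketch only: the glyph IDs below are hypothetical stand-ins for shaper output, and the function names are made up. Note that the private use range ends at U+FFFFD, so the two highest possible glyph IDs (0xFFFE and 0xFFFF) would land on noncharacters and are rejected here.

```python
SPUA_A_START = 0xF0000
SPUA_A_END = 0xFFFFD  # U+FFFFE and U+FFFFF are noncharacters, not private use


def gids_to_spua_text(gids):
    """Turn a sequence of glyph IDs into a string of Plane 15 PUA characters."""
    chars = []
    for gid in gids:
        code = SPUA_A_START + gid
        if gid < 0 or code > SPUA_A_END:
            raise ValueError("glyph ID %r does not fit in U+F0000..U+FFFFD" % gid)
        chars.append(chr(code))
    return "".join(chars)


def spua_text_to_gids(text):
    """Inverse of gids_to_spua_text: recover glyph IDs from PUA characters."""
    return [ord(ch) - SPUA_A_START for ch in text]


# Hypothetical glyph IDs, as a shaper might return them.
text = gids_to_spua_text([3, 0x761])
print([hex(ord(ch)) for ch in text])  # → ['0xf0003', '0xf0761']
```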

@twardoch

commented Sep 2, 2014

Here is my code that does exactly what I described above.

#! /usr/bin/python
# -*- coding: utf-8 -*-
# 
# pyftaddspuaabygids.py
# Map all glyphs to the Supplementary PUA-A plane (U+F0000..U+FFFFF) 
# by 0xF0000 + glyphID
#  
# Copyright (c) 2014 by Adam Twardoch
# 
# Licensed to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import fontTools.ttLib, sys, copy

def addSPUAByGlyphIDsMappingToCMAP(ttx):
    cmap = ttx["cmap"]
    # Check if an UCS-2 cmap exists
    for ucs2cmapid in ((3, 1), (0, 3), (3, 0)): 
        ucs2cmap = cmap.getcmap(ucs2cmapid[0], ucs2cmapid[1])
        if ucs2cmap: 
            break
    # Create UCS-4 cmap and copy the contents of UCS-2 cmap
    # unless UCS 4 cmap already exists
    ucs4cmap = cmap.getcmap(3, 10)
    if not ucs4cmap: 
        cmapModule = fontTools.ttLib.getTableModule('cmap')
        ucs4cmap = cmapModule.cmap_format_12(12)
        ucs4cmap.platformID = 3
        ucs4cmap.platEncID = 10
        ucs4cmap.language = 0
        if ucs2cmap: 
            ucs4cmap.cmap = copy.deepcopy(ucs2cmap.cmap)
        cmap.tables.append(ucs4cmap)
    # Map all glyphs to UCS-4 cmap Supplementary PUA-A codepoints 
    # by 0xF0000 + glyphID
    ucs4cmap = cmap.getcmap(3, 10)
    for glyphID, glyphName in enumerate(ttx.getGlyphOrder()):
        ucs4cmap.cmap[0xF0000 + glyphID] = glyphName

def usage():
    print("Map all glyphs to the Supplementary PUA-A plane (U+F0000..U+FFFFF) by 0xF0000 + glyphID")
    print("python %s inputfile[.otf|.ttf] outputfile[.otf|.ttf]" % sys.argv[0])

if len(sys.argv) == 3:
    inpath = sys.argv[1]
    outpath = sys.argv[2]
    ttx = fontTools.ttLib.TTFont(inpath, 0, verbose=0)
    addSPUAByGlyphIDsMappingToCMAP(ttx)
    ttx.save(outpath)
    ttx.close()
else:
    usage()
@behdad

commented Sep 3, 2014

I categorically reject this and think it's a bad idea. Nowhere in this report do I see any reasoning for why this is needed or is a good idea.

@twardoch

commented Sep 3, 2014

Ah, yes. We talked with Dave about this. Sorry it didn't become clear.

The idea is not to do this for production-ready fonts but for the purpose of development, to be used within the context of document-driven type design and similar such applications.

In a way, think of it as the "debug" mode of building fonts. Such debug mode might include other options that generate some redundant data (such as, well, glyph names! :) ) which is useful while designing but when building fonts in "release" mode, this stuff should not be included.

@behdad

commented Sep 3, 2014

Ok, sure. Yeah, that would be useful.

@davelab6

Member Author

commented Sep 3, 2014

@behdad could you explain more about why you think this is a bad idea..? You think that if all Google Fonts have this feature, that it will increase the use of PUA characters and documents tightly bound to particular fonts in general usage?

@vitalyvolkov vitalyvolkov assigned andriyko and unassigned vitalyvolkov Sep 4, 2014
@behdad

commented Sep 4, 2014

@davelab6 for the same reasons that non-Unicode encodings are bad. This is even worse, this is full custom encoding, meaning any text encoded in those is illegible to any text processing use.

@vitalyvolkov vitalyvolkov assigned davelab6 and unassigned andriyko Sep 4, 2014
@vitalyvolkov

Contributor

commented Sep 4, 2014

@davelab6

Member Author

commented Sep 4, 2014

@behdad I am skeptical that this would find any general usage.

It is a secondary method that is not for text processing, but debugging: it is supplementing, not replacing, the Unicode encodings and OTL tables.

Part of Document Driven Type Design is having good examples to refer to; specifically for the re-implementation of http://fuelproject.org/utrrs/index (which is the result of a 24-hour overnight sprint, but the concept is valid and needed.)

Since we don't have OTL processing in <canvas>, I figure this secondary encoding would be the best way to get that done. And the good examples will be in the production Fonts API.

@davelab6

Member Author

commented Sep 4, 2014

@hash3g for now, can you make this optional in the same way as fontcrunch is optional, via bakery.yml and set up page?

@davelab6

Member Author

commented Feb 5, 2016

Per TypeThursday's Laura Worthington article we should consider this, perhaps only for display fonts, if it's become important for casual users of desktop fonts.

@anthrotype

Member

commented Feb 5, 2016

What's this article you are referring to?

@davelab6

Member Author

commented Feb 5, 2016

@anthrotype

Member

commented Feb 5, 2016

ok. I hope you don't want to revive the idea of using PUA codes in released fonts...

@behdad

commented Feb 5, 2016

Oh its not out yet. Stay tuned.

lol. Ping us when it is. That said, people have had bad ideas forever; doesn't mean we should support them. I'm more willing to implement a HarfBuzz tool to render arbitrary glyphs than to add a hack in fonttools.

@davelab6

Member Author

commented Feb 10, 2016

They will resort to some glyph palette insertion or some fancy PUA codes only if they're really “desperate” i.e. when they really have no other choice.

Right, that is why Laura uses the BMP PUA, and I agree that this would be better for her (and generally.)

One remaining question for me is if this should be done for all fonts or only OT-intense display fonts.

@twardoch

commented Feb 10, 2016

I don’t have an opinion on that.
However, I want to add one thing: in early OpenType days, Adobe used a portion of the BMP PUA as a “corporate use area”, where they standardized certain codes for things like small caps, oldstyle numerals or certain ligatures. So, an oldstyle “3” or a smallcap “A” always had a certain code regardless of the Adobe font used. Now that was a bad idea because this practice created an illusion that these codes had some claim of universality, or longtime relevance. So they stopped using PUA after a few years.

But with purely font-specific encoding, I don’t see this as a problem. If you have a series of SPUA codepoints assigned to correspond to GIDs in a specific font, then everyone agrees that no machine “knows”, or is expected to know, what any of these codepoints “mean”. As long as all sides agree that no presumptions can be made, I think it’s fine.

In Adobe’s case U+F761 was semi-standardized as “smallcap A”, and all their early OTFs used U+F761 as small-cap A, so the danger was that some apps might start expecting that U+F761 just “means smallcap A”. But with the purely GID-oriented SPUA, U+F0761 will mean something else with every single font. So it really is “private”, and substituting fonts will yield unexpected results.

Which is fine because users will most likely not expect any stability of this encoding and will use it mostly as an input mechanism for specific glyphs in very specific situations, often with the goal being print, or laser cut, or automated engraving. Most of these laser cut or engraving apps have no OT features UI and never will.

So SPUA entry may be the only way for the user to get work done. If the user could find a better method, they’d already be using it.

@davelab6

Member Author

commented Feb 10, 2016

As long as all sides agree that no presumptions can be made, I think it’s fine.

I worry that glyphs/fontmake might create predictable map from common unicodes to GID ordering...

@felipesanches

Member

commented Feb 10, 2016

I risk saying something stupid here, but if a strict mapping should not be inferred to be normative, then can't these tools randomise mappings on purpose?

@davelab6

Member Author

commented Feb 10, 2016

randomise mappings on purpose?

That seems wise, to mitigate the practical problem with PUA text.
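A randomised mapping along these lines might be sketched as below. Everything here is an assumption for illustration: the function name, the per-font seed, and the choice to keep the codepoints inside the same U+F0000-based block. One trade-off worth noting is that a shuffled mapping no longer forms a single contiguous code-to-GID range, so it gives up the compact format 12 storage mentioned earlier in the thread.

```python
import random


def randomized_spua_mapping(glyph_order, seed):
    """Assign each glyph a Plane 15 PUA codepoint in a seed-dependent shuffled order.

    Returns a dict mapping codepoint -> glyph name. The same seed always
    produces the same mapping, so a given build stays reproducible.
    """
    codes = [0xF0000 + i for i in range(len(glyph_order))]
    random.Random(seed).shuffle(codes)
    return dict(zip(codes, glyph_order))


# Hypothetical glyph names and seed.
glyph_order = [".notdef", "A", "B", "A.sc", "f_i"]
mapping = randomized_spua_mapping(glyph_order, seed=388)

for code in sorted(mapping):
    print("U+%05X -> %s" % (code, mapping[code]))
```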

@kenlunde

commented Feb 11, 2016

PUA Text: Hope for best, expect the worst.

@davelab6

Member Author

commented Feb 11, 2016

@kenlunde I understand the categorical criticism of this, but I'm curious what your advice is for Laura Worthington. What should she do to serve her customers?

@kenlunde

commented Feb 11, 2016

@davelab6: I wrote the following at the bottom of page 162 of CJKV Information Processing, Second Edition: "The use of PUA code points should be avoided at all costs, because their interpretation, in terms of character properties, and their interaction with legacy character set standards (in other words, interoperability) cannot be guaranteed."

With that said, Laura's article suggests that PUA usage is a necessary evil in order to access glyphs that are not directly encoded. However, the main caveat is that the more an implementation depends on PUA code points, the more closed said implementation is.

@davelab6

Member Author

commented Feb 11, 2016

the more closed said implementation is

I'm not entirely sure what you mean by 'closed,' please could you clarify? :)

I am framing this as fallback to help users of implementations that are so poorly done that this is the only way to make the font useful. It isn't that implementations should depend on the PUA codepoints for providing large glyph sets to users, it is that they are oblivious to this need and PUA is a workaround for people who are held hostage by these implementations' incompetence.

@kenlunde

commented Feb 11, 2016

@davelab6: What I mean by closed is that the implementation is closely tied to the PUA mappings of a particular font, and changing the selected font to anything else is virtually guaranteed to result in illegible text.

Furthermore, in the context of font fallback, I would claim that PUA usage is at least an order of magnitude more dangerous than environments that do not employ it, because it is not clear from which font (or fonts) the glyphs are being displayed.

Also, for fonts with glyphs that are not encoded and require OpenType feature support to access them, there may also be metrics-related dependencies that would likely be overlooked by simpler apps, meaning that even if a user is somehow able to enter a glyph via a PUA code point, the resulting glyph may not behave as expected, due to limitations in the simpler authoring app.

@davelab6

Member Author

commented Feb 11, 2016

Hmm.

So, it seems that Laura should simply pyfeafreeze her font into a set of single-style font families, and I should set up a freezemyfont.com site to make it easy for regular people to freeze their libre OT fonts to use in such implementations, and maybe lobby Font Squirrel to add this to the Web Font Generator.

Would that be wiser than recommending SPUA-GID encoding?

@khaledhosny

commented Feb 11, 2016

Where is glyph positioning in all this (so 60s) PUA talk? It is not like OpenType is only about glyph replacements. OK, you are not building fonts that need mark positioning or all this fancy stuff, but what about kerning?

@kenlunde

commented Feb 11, 2016

@davelab6: I am not sure what pyfeafreeze is and its ramifications. I have no strong objections to someone who feels the need to use PUA code points, and my point is that those who decide to go down that path simply need to understand the consequences, which is that nothing is guaranteed, and any oddball behavior is likely to be related to the decision to use PUA code points.

@khaledhosny: That was sort of my point, specifically that there is more to glyphs than merely having them encoded. Perhaps such an approach works for Western fonts, but I can pretty much guarantee that it will crumble when non-Western fonts enter into the picture.

@twardoch

commented Feb 11, 2016

All OpenType substitutions and positioning happen on the glyph level. Principally, it doesn't matter which codepoints a glyph is invoked through.

Of course since the OT Layout model relies on per-script shaping engines, reordering happens or certain features are automatically applied in a specific order when glyphs are invoked via their true Unicode codepoints.

If the same glyphs are invoked via PUA, the shaper probably classifies them as "DFLT" script. Indeed if a PUA codepoint is inserted in the middle of "true" Arabic or Devanagari or even Latin or Cyrillic text, some OTL engines may interpret that single glyph inserted via PUA as a separate run, and then would not execute feature interactions with neighboring glyphs (kern, mark etc.). Which does indeed pose a problem.

Other engines may fold that glyph into the dominant run and it will work. Inconsistent run itemization and inability to perform positionings or contextual substitutions across run boundaries is indeed a very weak aspect of OTL.

But if the PUA-invoked glyphs end up to be in the same run, universally applied default features like kern or mark, and any explicitly user specified features, both GSUB and GPOS, will work.

The pyftfeatfreeze method isn't problematic. It only remaps the "cmap" and typically does so within the same languagesystem, so the new default glyphs (assigned to Unicode codepoints) remain within the same script run, and everything works as it should, unless your features do really weird circular stuff.

For example, if you have swash Arabic glyphs in the "swsh" feature inside the "arab" languagesystem, these glyphs would normally already partake in the init, medi, fina, isol, curs or mark features in the original font. If you freeze the swsh feature using pyftfeatfreeze, those swash glyphs get mapped as the default Arabic letters in the cmap table, but since they get classified as the Arabic run, are fed into the Arabic shaper and already partake in the "arab" features defined in the font, everything works as expected.

Freezing "swsh" is sometimes even a better method than applying a user defined feature to one character via, say a span with a local style="font-feature-settings: 'swsh'" property, because doing the latter may force a run break (or generally cause the span to be rendered in a separate step), which also stops the interaction with the neighbors -- unless the higher-level text engine is smart enough to detect and ignore certain span changes or somehow fuses the line together before passing it to the OTL engine.

Again, this is all shit. :)

@davelab6

Member Author

commented Feb 11, 2016

@davelab6

Member Author

commented Feb 11, 2016

On 11 February 2016 at 09:13, Adam Twardoch notifications@github.com
wrote:

If the same glyphs are invoked via PUA, the shaper

Hmm. Do we have concrete examples of implementations that have an OT
shaper, but no UI for users to set/unset non-default features? (I'll ask
Laura on TypeDrawers about this.)

@twardoch

commented Feb 11, 2016

@kenlunde pyftfeatfreeze is a tool I published which e.g. could turn the Source Han Sans superfont that uses the OT features to switch between SC, TC, J and K variants into a series of SC, TC, J and K fonts which all have the appropriate variants mapped in the cmap. In fact, after you published Source Han Sans, I realized that such an approach may no longer be as frowned upon as it used to be by some people, and I finished and published the tool (I had a simple working version of it for a long time now).

@twardoch

commented Feb 11, 2016

@davelab6 Start with Notepad. :) It supports calt (contextual alternates) but gives no way to override its results. Zapfino Extra LT Pro that I made in 2003 shows how this works -- you see substitutions happening as you type, but once you have finished typing, there is no way to pick another variant. And then virtually every Windows app that uses the standard GDI Windows text controls, any Photoshop clone, older versions of Word or Corel Draw, and tons of simple "add text to image" apps, or apps for vinyl sign cutting or CAD apps or motion graphics/video editing apps or apps that add subtitles or captions etc. All of that uses the standard text stack on Windows.

@davelab6

Member Author

commented Feb 11, 2016

Right. I see why this is a really big problem 😠

Looking back at Laura's post on TD, she says that feeding the new hobbyist retail market PUA fonts is increasing their use and appreciation of fonts, that puts pressure on app developers to provide better typography features.... but the example is a glyph picker, not a real OpenType UI. I wonder if any apps have added OT UIs, so I've updated my question to Laura to be about that :)

@khaledhosny

commented Feb 11, 2016

That swash example makes no sense at all, since you usually want to enable it for just part of the word (at least for Arabic), and you can’t do that with a “frozen” font. That is really just another hack suitable for certain fonts. That CSS font-features currently force run breaks is an implementation bug; Firefox, for example, applies many style changes across spans without breaking OpenType logic, and it should do that for font features too (since HarfBuzz already supports that).

@khaledhosny

commented Feb 11, 2016

@dave I don’t see how kern table can handle all kinds of kerning supported by GPOS pair positioning, nor how it is certain that the kern table will be always supported in these situations.

@twardoch

commented Feb 11, 2016

@khaledhosny Replace "swash" with any other, e.g. one variant letter accessible through ss02 that you want to appear consistently in your text, or a localized form. Though admittedly, pyftfeatfreeze does not currently allow freezing a feature for only some glyphs.
