Encode unencoded glyphs as F0000 + hex(GID) #388
All it requires is a simple tool which adds a "3.10" cmap subtable (platform 3, encoding 10) that maps Private Use Plane A codepoints to glyph IDs (sequentially, by adding 0xF0000 to the glyph ID). Because the codepoints will be in the same order as the glyph IDs, you can use the space-saving cmap format 12, which only defines the start and end of each contiguous mapping range. So the added size overhead is small. |
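To put a rough number on that overhead: a format 12 subtable has a 16-byte header followed by 12 bytes per sequential map group, so one group covering every glyph costs 28 bytes no matter how many glyphs the font has. The sketch below packs such a subtable by hand with the standard library only. The function name `build_format12_single_group` is made up for illustration; in practice fontTools would compile the subtable for you.

```python
import struct


def build_format12_single_group(num_glyphs, base=0xF0000):
    """Pack a cmap format 12 subtable with one group: base + GID -> GID."""
    # U+F0000..U+FFFFD holds 65534 codepoints, so that is the upper limit here.
    if not 0 < num_glyphs <= 0xFFFE:
        raise ValueError("glyph count does not fit into U+F0000..U+FFFFD")
    # One SequentialMapGroup: startCharCode, endCharCode, startGlyphID.
    group = struct.pack(">III", base, base + num_glyphs - 1, 0)
    # Header: format (12), reserved, total length, language, number of groups.
    length = 16 + len(group)
    header = struct.pack(">HHIII", 12, 0, length, 0, 1)
    return header + group


subtable = build_format12_single_group(1000)
print(len(subtable))
```

The font also gains one 8-byte encoding record in the cmap header for the new subtable, so the total cost stays in the tens of bytes.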
@behdad Is there any way to define that glyph is unencoded using fontTools? |
@hash3g you can find the glyph IDs in the |
I would advocate that it would be much simpler (and more storage-effective) if you just encode ALL glyphs as U+F0000 + GID.
This has the advantage that cmap subtable format 12 stores contiguous code-to-GID ranges efficiently. With my method, you'll only create one such range, so it'll only add a few bytes to the size, and will be very fast. This approach has an additional benefit: I can still use the proper Unicodes. But if I do so, the browser/app will always perform the Unicode processing and default OpenType Layout shaping for complex scripts. So I won't really have the guarantee that the glyph I'm seeing is actually the glyph assigned to the Unicode codepoint in the font's cmap. It will be for most Unicodes, but for some codepoints the "Unicode+OTL magic" will kick in. But if I address even the "properly" encoded glyphs using the U+F0000+ codepoints, I will have a WYSIWYG guarantee. Even more: with harfbuzz.js, I can run a JS port of HarfBuzz in the browser, take the output GIDs, add 0xF0000 to them and have my own explicit custom OTL processing if I need to. So I'm completely in control and independent of any "browser magic". |
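The "take the output GIDs and add 0xF0000" step could look roughly like this. It is a minimal Python 3 sketch, not part of any tool mentioned in this thread; the helper names `gids_to_text` and `text_to_gids` are hypothetical, and the list of GIDs is assumed to come from whatever shaper you run yourself.

```python
PUA_A_BASE = 0xF0000
PUA_A_LAST = 0xFFFFD  # last assignable codepoint in Plane 15


def gids_to_text(gids):
    """Turn a sequence of glyph IDs into a string of Plane 15 PUA characters."""
    chars = []
    for gid in gids:
        codepoint = PUA_A_BASE + gid
        if gid < 0 or codepoint > PUA_A_LAST:
            raise ValueError("glyph ID %r does not fit into Plane 15" % gid)
        chars.append(chr(codepoint))
    return "".join(chars)


def text_to_gids(text):
    """Recover glyph IDs from a string; characters outside the range are skipped."""
    return [ord(ch) - PUA_A_BASE for ch in text
            if PUA_A_BASE <= ord(ch) <= PUA_A_LAST]


print([hex(ord(ch)) for ch in gids_to_text([0, 1, 65])])
```

Text produced this way only makes sense with the exact font build it was generated for, since glyph IDs can change between builds.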
Here is my code that does exactly what I described above.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# pyftaddspuaabygids.py
# Map all glyphs to the Supplementary PUA-A plane (U+F0000..U+FFFFF)
# by 0xF0000 + glyphID
#
# Copyright (c) 2014 by Adam Twardoch
#
# Licensed to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import copy
import sys

import fontTools.ttLib


def addSPUAByGlyphIDsMappingToCMAP(ttx):
    cmap = ttx["cmap"]
    # Check if a UCS-2 cmap exists
    ucs2cmap = None
    for platformID, platEncID in ((3, 1), (0, 3), (3, 0)):
        ucs2cmap = cmap.getcmap(platformID, platEncID)
        if ucs2cmap:
            break
    # Create a UCS-4 cmap and copy the contents of the UCS-2 cmap,
    # unless a UCS-4 cmap already exists
    ucs4cmap = cmap.getcmap(3, 10)
    if not ucs4cmap:
        cmapModule = fontTools.ttLib.getTableModule('cmap')
        ucs4cmap = cmapModule.cmap_format_12(12)
        ucs4cmap.platformID = 3
        ucs4cmap.platEncID = 10
        ucs4cmap.language = 0
        # Start from an empty mapping so the subtable is usable
        # even when the font has no UCS-2 cmap to copy from
        ucs4cmap.cmap = {}
        if ucs2cmap:
            ucs4cmap.cmap = copy.deepcopy(ucs2cmap.cmap)
        cmap.tables.append(ucs4cmap)
    # Map all glyphs to UCS-4 cmap Supplementary PUA-A codepoints
    # by 0xF0000 + glyphID
    for glyphID, glyphName in enumerate(ttx.getGlyphOrder()):
        ucs4cmap.cmap[0xF0000 + glyphID] = glyphName


def usage():
    print("Map all glyphs to the Supplementary PUA-A plane (U+F0000..U+FFFFF) by 0xF0000 + glyphID")
    print("python %s inputfile[.otf|.ttf] outputfile[.otf|.ttf]" % sys.argv[0])


if __name__ == "__main__":
    if len(sys.argv) == 3:
        inpath = sys.argv[1]
        outpath = sys.argv[2]
        ttx = fontTools.ttLib.TTFont(inpath)
        addSPUAByGlyphIDsMappingToCMAP(ttx)
        ttx.save(outpath)
        ttx.close()
    else:
        usage()
|
I categorically reject this and think it's a bad idea. Nowhere in this report do I see any reasoning for why this is needed or is a good idea. |
Ah, yes. We talked with Dave about this. Sorry it didn't become clear. The idea is not to do this for production-ready fonts but for the purpose of development, to be used within the context of document-driven type design and similar such applications. In a way, think of it as the "debug" mode of building fonts. Such debug mode might include other options that generate some redundant data (such as, well, glyph names! :) ) which is useful while designing but when building fonts in "release" mode, this stuff should not be included. |
Ok, sure. Yeah, that would be useful. |
@behdad could you explain more about why you think this is a bad idea..? You think that if all Google Fonts have this feature, that it will increase the use of PUA characters and documents tightly bound to particular fonts in general usage? |
@davelab6 for the same reasons that non-Unicode encodings are bad. This is even worse, this is full custom encoding, meaning any text encoded in those is illegible to any text processing use. |
@davelab6 please check that test and fix is applied https://github.com/googlefonts/fontbakery-cli/commit/5acd915d47e9385ef529be646906790411bd731d |
@behdad I am skeptical that this would find any general usage. It is a secondary method that is not for text processing, but debugging: it is supplementing, not replacing, the Unicode encodings and OTL tables. Part of Document Driven Type Design is having good examples to refer to; specifically for the re-implementation of http://fuelproject.org/utrrs/index (which is the result of a 24-hour overnight sprint, but the concept is valid and needed.) Since we don't have OTL processing in |
@hash3g for now, can you make this optional in the same way as fontcrunch is optional, via |
Per TypeThursday's Laura Worthington article we should consider this, perhaps only for display fonts, if it's become important for casual users of desktop fonts. |
What's this article you are referring to? |
Oh, it's not out yet. Stay tuned.
|
ok. I hope you don't want to revive the idea of using PUA codes in released fonts... |
lol. Ping us when it is. That said, people have had bad ideas forever; doesn't mean we should support them. I'm more willing to implement a HarfBuzz tool to render arbitrary glyphs than to add a hack in fonttools. |
I risk saying something stupid here, but if a strict mapping should not be inferred to be normative, then can't these tools randomise mappings on purpose? |
That seems wise, to mitigate the practical problem with PUA text. |
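As a rough illustration of the randomisation idea, assuming a build tool that hands over the glyph order as a list of names: the function name `randomised_pua_mapping` and its parameters are hypothetical, and nothing in this thread says any tool works this way.

```python
import random


def randomised_pua_mapping(glyph_order, seed=None, first=0xF0000, last=0xFFFFD):
    """Assign every glyph a distinct, randomly chosen Plane 15 PUA codepoint."""
    rng = random.Random(seed)
    # sample() never repeats a value, so no two glyphs share a codepoint.
    codepoints = rng.sample(range(first, last + 1), len(glyph_order))
    return dict(zip(codepoints, glyph_order))


mapping = randomised_pua_mapping([".notdef", "A", "A.swash"], seed=2016)
for codepoint in sorted(mapping):
    print("U+%05X -> %s" % (codepoint, mapping[codepoint]))
```

The trade-off is that scattered codepoints no longer form one contiguous range, so a format 12 subtable would need roughly one group per glyph instead of a single group.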
PUA Text: Hope for best, expect the worst. |
@kenlunde I understand the categorical criticism of this, but I'm curious what your advice is for Laura Worthington. What should she do to serve her customers? |
@davelab6: I wrote the following at the bottom of page 162 of CJKV Information Processing, Second Edition: "The use of PUA code points should be avoided at all costs, because their interpretation, in terms of character properties, and their interaction with legacy character set standards (in other words, interoperability) cannot be guaranteed." With that said, Laura's article suggests that PUA usage is a necessary evil in order to access glyphs that are not directly encoded. However, the main caveat is that the more an implementation depends on PUA code points, the more closed said implementation is. |
I'm not entirely sure what you mean by 'closed,' please could you clarify? :) I am framing this as fallback to help users of implementations that are so poorly done that this is the only way to make the font useful. It isn't that implementations should depend on the PUA codepoints for providing large glyph sets to users, it is that they are oblivious to this need and PUA is a workaround for people who are held hostage by these implementations' incompetence. |
@davelab6: What I mean by closed is that the implementation is closely tied to the PUA mappings of a particular font, and changing the selected font to anything else is virtually guaranteed to result in illegible text. Furthermore, in the context of font fallback, I would claim that PUA usage is at least an order of magnitude more dangerous than environments that do not employ it, because it is not clear from which font (or fonts) the glyphs are being displayed. Also, for fonts with glyphs that are not encoded and require OpenType feature support to access them, there may also be metrics-related dependencies that would likely be overlooked by simpler apps, meaning that even if a user is somehow able to enter a glyph via a PUA code point, the resulting glyph may not behave as expected, due to limitations in the simpler authoring app. |
Hmm. So, it seems that Laura should simply Would that be wiser than recommending SPUA-GID encoding? |
Where is glyph positioning in all this (so 60s) PUA talk? It is not like OpenType is only about glyph replacements. OK, you are not building fonts that need mark positioning or all this fancy stuff, but what about kerning? |
@davelab6: I am not sure what pyfeafreeze is and its ramifications. I have no strong objections to someone who feels the need to use PUA code points, and my point is that those who decide to go down that path simply need to understand the consequences, which is that nothing is guaranteed, and any oddball behavior is likely to be related to the decision to use PUA code points. @khaledhosny: That was sort of my point, specifically that there is more to glyphs than merely having them encoded. Perhaps such an approach works for Western fonts, but I can pretty much guarantee that it will crumble when non-Western fonts enter into the picture. |
All OpenType substitutions and positioning happen on the glyph level. Principally, it doesn't matter which codepoints a glyph is invoked through. Of course since the OT Layout model relies on per-script shaping engines, reordering happens or certain features are automatically applied in a specific order when glyphs are invoked via their true Unicode codepoints. If the same glyphs are invoked via PUA, the shaper probably classifies them as "DFLT" script. Indeed if a PUA codepoint is inserted in the middle of "true" Arabic or Devanagari or even Latin or Cyrillic text, some OTL engines may interpret that single glyph inserted via PUA as a separate run, and then would not execute feature interactions with neighboring glyphs (kern, mark etc.). Which does indeed pose a problem. Other engines may fold that glyph into the dominant run and it will work. Inconsistent run itemization and inability to perform positionings or contextual substitutions across run boundaries is indeed a very weak aspect of OTL. But if the PUA-invoked glyphs end up in the same run, universally applied default features like kern or mark, and any explicitly user specified features, both GSUB and GPOS, will work. The pyftfeatfreeze method isn't problematic. It only remaps the "cmap" and typically does so within the same languagesystem, so the new default glyphs (assigned to Unicode codepoints) remain within the same script run, and everything works as it should, unless your features do really weird circular stuff. For example, if you have swash Arabic glyphs in the "swsh" feature inside the "arab" languagesystem, these glyphs would normally already partake in the init, medi, fina, isol, curs or mark features in the original font. 
If you freeze the swsh feature using pyftfeatfreeze, those swash glyphs get mapped as the default Arabic letters in the cmap table, but since they get classified as the Arabic run, are fed into the Arabic shaper and already partake in the "arab" features defined in the font, everything works as expected. Freezing "swsh" is sometimes even a better method than applying a user defined feature to one character via, say a span with a local style="font-feature-settings: 'swsh'" property, because doing the latter may force a run break (or generally cause the span to be rendered in a separate step), which also stops the interaction with the neighbors -- unless the higher-level text engine is smart enough to detect and ignore certain span changes or somehow fuses the line together before passing it to the OTL engine. Again, this is all shit. :) |
Khaled, non-OT implementations are TrueType implementations; thus they use
the KERN table. :)
Ken, please see
https://github.com/twardoch/fonttools-utils/tree/master/pyftfeatfreeze
|
On 11 February 2016 at 09:13, Adam Twardoch notifications@github.com
Hmm. Do we have concrete examples of implementations that have an OT |
@kenlunde pyftfeatfreeze is a tool I published which e.g. could turn the Source Han Sans superfont that uses the OT features to switch between SC, TC, J and K variants into a series of SC, TC, J and K fonts which all have the appropriate variants mapped in the cmap. In fact, after you published Source Han Sans, I realized that such an approach may no longer be as frowned upon as it used to be by some people, and I finished and published the tool (I had a simple working version of it for a long time now). |
@davelab Start with Notepad. :) It supports calt (contextual alternates) but gives no way to override its results. Zapfino Extra LT Pro that I made in 2003 shows how this works -- you see substitutions happening as you type, but once you have finished typing, there is no way to pick another variant. And then virtually every Windows app that uses the standard GDI Windows text controls, any Photoshop clone, older versions of Word or Corel Draw, and tons of simple "add text to image" apps, or apps for vinyl sign cutting or CAD apps or motion graphics/video editing apps or apps that add subtitles or captions etc. All that uses the standard text stack on Windows. |
Right. I see why this is a really big problem 😠 Looking back at Laura's post on TD, she says that feeding the new hobbyist retail market PUA fonts is increasing their use and appreciation of fonts, that puts pressure on app developers to provide better typography features.... but the example is a glyph picker, not a real OpenType UI. I wonder if any apps have added OT UIs, so I've updated my question to Laura to be about that :) |
That swash example makes no sense at all, since you usually want to enable it for just part of the word (at least for Arabic), and you can’t do that with a “frozen” font. That is really just another hack suitable for certain fonts. That CSS font-features currently force run breaks is an implementation bug; Firefox, for example, applies many style changes across spans without breaking OpenType logic and it should do that for font features too (since HarfBuzz already supports that). |
@dave I don’t see how kern table can handle all kinds of kerning supported by GPOS pair positioning, nor how it is certain that the kern table will be always supported in these situations. |
@khaledhosny Replace "swash" with any other, e.g. one variant letter accessible through ss02 that you want to appear consistently in your text, or a localized form. Though admittedly, pyftfeatfreeze does not currently allow freezing a feature for only some glyphs. |
I'm going to be dogmatic here and say that encoding stuff in the PUA for general-purpose fonts is a bad idea and if anything we should check that it's not happening. I understand that it's a necessary evil for old applications which don't read the glyph table, but fontbakery is about best practices, not bad practices. If anyone still strongly insists that it should happen, feel free to |
I was chatting with @twardoch today about how glyph names are the 'primary key' for fonts, because in any contemporary font you have so many unencoded glyphs, accessed with OpenType Layout logic... But unencoded glyphs are tricky to precisely call, because OTL logic is per-font. I mentioned that I might like to use the Unicode Private Use Area to encode otherwise-unencoded glyphs.
Adam kindly mentioned he already thought about this, and he concluded that the Private Use Plane A (Unicode Plane 15) is ideal for this, as it spans U+F0000..U+FFFFD, so you can use a value of F0000 + hex(GID) to cleanly, logically, encode all unencoded glyphs.
Let's do it!
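A minimal sketch of that proposal, restricted to glyphs that have no Unicode mapping yet. It works on plain Python data so it runs without a font file. The function name `pua_for_unencoded` is made up for illustration, and the assumption is that with fontTools the two inputs could come from `font.getGlyphOrder()` and `font["cmap"].getBestCmap()`.

```python
def pua_for_unencoded(glyph_order, cmap, base=0xF0000):
    """Return {codepoint: glyph name} for every glyph that cmap does not reach.

    glyph_order: list of glyph names, where the list index is the glyph ID.
    cmap: dict mapping Unicode codepoints to glyph names.
    """
    encoded = set(cmap.values())
    return {
        base + gid: name
        for gid, name in enumerate(glyph_order)
        if name not in encoded
    }


glyph_order = [".notdef", "A", "A.swash", "f_i"]
cmap = {0x41: "A"}
for codepoint, name in sorted(pua_for_unencoded(glyph_order, cmap).items()):
    print("U+%05X -> %s" % (codepoint, name))
```

Because encoded glyphs are skipped, the resulting codepoints have gaps, and the new mappings split into several contiguous ranges rather than one.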