Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build CFF2 Variable Font from sparse sources, and with more than one VarData table. #1547

Closed
wants to merge 8 commits into from

Conversation

readroberts
Copy link
Collaborator

in varLib/cffLib.py, add support for sparse sources, and sources with more than one model, and hence more than one VarData element in the VarStore.

Support blend ops in cffLib/specializer.py generalizeCommands() and specializeCommands().

@readroberts
Copy link
Collaborator Author

Note: this is a PR for a revision of an earlier branch. See #1475

…sindex. Fixes endless compile loop in some circumstances.

Fixed bug in mutator: need to remove vsindex from snapshotted charstrings, plus formatting clean up
…--retain-gid option is used. Needed to make subset_test.py::test_retain_gids_cff2 tests pass.
… more than one model, and hence more than one VarData element in the VarStore.

CFF2 source fonts with multiple FontDicts in the FDArray need some extra work. With sparse fonts, some of the source fonts may have a fewer FontDicts than the default font. The getfd_map function() builds a map from the FontDict indices in the default font to those in each region font. This is needed when building up the blend value lists in the master font FontDict PrivateDicts, in order to fetch PrivateDict values from the correct FontDict in each region font.

In specializer.py, add support for CFF2 CharStrings with blend operators. 1) In generalizeCommands, convert a blend op to a list of args that are blend lists for the following regular operator. A blend list as a default font value, followed by the delta tuple. 2) In specializeCommands(), convert these back to blend ops, combining as many successive blend lists as allowed by the stack limit.

Add test case for sparse CFF2 sources.
The test font has 55 glyphs. 2 glyphs use only 2 sources (weight = 0 and 100). The rest use 4 source fonts: the two end points of the weight axis, and two intermediate masters. The intermediate masters are only 1 design space unit apart, and are used to change glyph design at the point in design space. For the rest, at most 2 glyphs use the same set of source fonts. There are 12 source fonts.

Add test case for specializer programToCommands() and commandsToProgram by converting each CharString.program in the font to a command list, and back again, and comparing original and final versions.
@anthrotype
Copy link
Member

anthrotype commented Mar 19, 2019

I cherry-picked the HVAR commit into master since I am about about to cut a new release, and don't feel ready to merge this one just yet (would like to take another look and also give @behdad another chance to review).
6355376

@readroberts
Copy link
Collaborator Author

Of course, there is certainly enough rework to need a new review. Let's keep the momentum going!

…mmandsToProgram(), mask arg must be appended following the operator.
Copy link
Member

@behdad behdad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Thanks for this and sorry for delay. Looks a lot better.

Based on the overall changes, I like to propose this new place to apply the changes:

  • In programToCommands, first thing before any other processing, convert each blend to ONE tuple. This means, each tuple might represent multiple values at this point,

  • In generalizeCommands, first before anything else, break such tuples into smaller ones, each representing one value. Move the width-extraction code from programToCommands here. Since we processed blends already, that code will work without modification.

  • Move combining of blend lists into longer blend lists to the end phase of specializeCommands, with stack size tracking and all.

  • Convert blend lists to operations in commandsToProgram.

Make sure we support recursive blends. The stack-size calculations get a bit complicated but not hugely so. Use recursive functions for many processes (like adding two values).

@@ -4,6 +4,7 @@

from __future__ import print_function, division, absolute_import
from fontTools.misc.py23 import *
from fontTools.cffLib import maxStackLimit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we should reuse the stack limit passed to the functions. Currently that one defaults to 48. I don't know how to accommodate what you want (default to CFF2's for blends).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern here is that the default value for the specializeCommands() maxstack argument is reasonable for CFF fonts, but not for CFF2 fonts with a blend operator. How about a check to set maxstack to cffLib.maxStackLimit if a) it is the default value of 48, and numRegions is not None?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I like that. If numRegions is not None, then assume CFF2.

if isinstance(arg1, list):
new_args = [[a1 + a2 for a1, a2 in zip(arg0, arg1)]]
else:
new_args = [[a1 + arg1 for a1 in arg0]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. You should just add arg1 to the first entry, not all. No?

if isinstance(arg1, list):
new_args = [[arg0 + a1 for a1 in arg1]]
else:
new_args = [arg0 + arg1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest the above block should be moved to a function (and handle symmetry by calling itself with args reversed.)

else:
program.extend(args)
if op:
program.append(op)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need this change? The code handles hintmask/cntrmask just fine.

'hmoveto', 'vmoveto', 'rmoveto',
'endchar'}:
# We skip this when seen_blend == True because a blend operator
# can leave an odd number of arguments on the stack.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that we only support blend in CFF2, and CFF2 doesn't have width. But I prefer if you implement this to work with both.

I'll sketch in my overall review how that can be done instead.

blend_args.append(blendList)
tuplei = next_ti
argi += 1
return blend_args
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should handle errors. Ie. if there's insufficient arguments, just encode them to roundtrip/ignore like other code in generalizeCommands does.

blendList = [op_args[argi]] + op_args[tuplei:next_ti]
blend_args.append(blendList)
tuplei = next_ti
argi += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this block correct? I thought there's all default values first, then all deltas for first region, then all deltas for second region, etc. No?

At any rate, I think this while loop should be converted to something using itertools or otherwise for loops. There's nothing "while" about this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this block correct? I thought there's all default values first, then all deltas for first region, then all deltas for second region, etc. No?

Apparently no.

assert numRegions is not None, (
"Cannot process charstring without numRegions argument")
blendArgs = _convertBlendOpToArgs(args, numRegions)
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block will be unnecessary in the order I like things to be done, as sketched in main comment.

for arg in args:
if isinstance(arg, list):
return True
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function can be simply inlined as any(isinstance(arg, list) for arg in args).

blend_args = []
stack_use = prev_stack_use + num_blends

return blend_cmds, blend_args
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know spec doesn't allow it, but we should support blends where the args are also blended themselves. Or at least be able to pass those through without error.

…'s comments in PR 1547 on April 10, 2019.

Fix some bugs in handling hinting.
Copy link
Member

@behdad behdad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't done a full review. But a few points that need fixing already.

Also, it occurred to me: we are passing down numRegions. Whereas the program might have a vsindex operator. Shouldn't we pass a map/function that when passed the vsindex, returns numRegion?

@@ -26,13 +27,15 @@ def programToString(program):
return ' '.join(str(x) for x in program)


def programToCommands(program):
def programToCommands(program, numRegions=None, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why all the kwargs args?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra args are there because of specializeProgram(program, **kwargs) and generalizeProgram(program, **kwargs). **kwargs needs to hold both numRegions and generalizeFirst. If these args are both in **kwargs, then programToCommnd and specializeCommand complain about the unused arg unless you provide the additional **kwargs to swallow the unused arg. I don't really like what I did, but it is better than the alternatives I thought of.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just spell out the arguments that are needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I saw that the prior code used **kwargs rather than the arguments actually needed. I assumed this was a deliberate choice. I do see that it allows for adding future new arguments to the called functions, without having to change specializeProgram() and generalizeProgram(). That said, I am perfectly happy to follow your suggestion. I would then change **kwargs in specializeProgram() and generalizeProgram to the two args, and pass only the needed args to the callees. Just confirm that this is what you would prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That definitely explains your choice. Thank you.

Generally I like using kwargs for methods that just pass along arguments, but not in methods that consume them. What I like to see here, I think, is to keep things as they are, just add numRegion anywhere it's needed. You can add numRegions=None just before **kwargs in the methods that take that. This will separate that one argument from the rest, then you decide which methods you pass the numRegions to and which one the **kwargs. I think that works but up to you whichever works. Thanks.

elif isinstance(arg1, list):
new_args = _combineLineArgs(arg1, arg0)
else:
new_args = [arg0 + arg1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block should simply become `new_args = [_addArgs(args[0], other_args[0])]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see below

def _combineLineArgs(listArg, valArg):
listArg[0] += valArg
newArgList = listArg
return newArgList
Copy link
Member

@behdad behdad Apr 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems incomplete. The whole if hierarchy below should come here, and it should recurse somethings. This should become:

def _addArgs(a, b):
  if isinstance(b, list):
    if isinstance(a, list):
      return [_addArgs(va, vb) for va,vb in zip(a, b)]
    else:
      a, b = b, a
  if isinstance(a, list):
    return [_addArgs(a[0], b)] + a[1:]]
  return a + b

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that is better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@behdad Just pushed the changes above.

simplify the logic to combine arguments for successive single-argument hlineto or vlineto operators
For generalizeProgram() and specializeProgram(), pass the numRegions arguments explicitly rather than in a **kwargs argument.
@readroberts
Copy link
Collaborator Author

@behdad About your question: "Shouldn't we pass a map/function that when passed the vsindex, returns numRegion?". I think this will add complexity to the logic. 'vsindex' may not ( and indeed is usually not) present in the program, so you would have to pass in both the mapping function and the default vsindex, which may be set only in the private dict, or not at all. You can see how numRegions is derived at line 945 in specializer_test.py, and line varLib/cff.py: whether or not there is a 'vsindex' in the charstring is already handled by the T2Charstring.vsindex property.

@behdad
Copy link
Member

behdad commented Apr 19, 2019

@behdad About your question: "Shouldn't we pass a map/function that when passed the vsindex, returns numRegion?". I think this will add complexity to the logic. 'vsindex' may not ( and indeed is usually not) present in the program, so you would have to pass in both the mapping function and the default vsindex, which may be set only in the private dict, or not at all. You can see how numRegions is derived at line 945 in specializer_test.py, and line varLib/cff.py: whether or not there is a 'vsindex' in the charstring is already handled by the T2Charstring.vsindex property.

I still think the specializer should do the right thing / be generic.

I suggest this: take numRegions as is. When we need to use it:

  • if it's None, err,
  • if it's an integer, use it,
  • otherwise call it with one argument. The argument is the last vsindex seen so far (I know spec says vsindex can only occur once and at the beginning, but there's no reason NOT to write the code more generally here). If we have not seen vsindex so far, pass None as parameter. Use the return value.

I suggest you do this in specializer and document it in the docstring. Whether you use it in varLib/cff is another issue and up to you.

Thanks. Or feel free to ignore, and I do it after you land.

@readroberts
Copy link
Collaborator Author

@behdad That creates an extra problem case: what if the numRegions passed in as an integer, and the program contains a vsindex operator? How about a variation on your original suggestion: the numRegions arg is always None or a function. If the function is not passed an arg, it returns the default numRegions for the charstring, else it takes vsindex as an argument, and returns the numRegion implied by the vsindex.

@behdad
Copy link
Member

behdad commented Apr 22, 2019

@behdad That creates an extra problem case: what if the numRegions passed in as an integer, and the program contains a vsindex operator? How about a variation on your original suggestion: the numRegions arg is always None or a function. If the function is not passed an arg, it returns the default numRegions for the charstring, else it takes vsindex as an argument, and returns the numRegion implied by the vsindex.

Sure, that's what I originally had in mind. I thought about not passing an argument, vs passing None. The latter is easier since you can initialize a variable to None, and update it if you see vsindex op, and pass that variable to function, as opposed to conditionalize the call to the function.

@readroberts
Copy link
Collaborator Author

Sounds good. I propose adding a numRegions property in psCharstrings.py::T2CharString. This not only provides the needed function, but can reset the T2CharString current vsindex value whenever T2CharString.numRegions() is called with a not None value. I think I will need to update T2CharString anyway - it now assumes that any vsindex op occurs only at the start of the charstring.

@behdad
Copy link
Member

behdad commented Apr 22, 2019

Sounds good. I propose adding a numRegions property in psCharstrings.py::T2CharString. This not only provides the needed function, but can reset the T2CharString current vsindex value whenever T2CharString.numRegions() is called with a not None value. I think I will need to update T2CharString anyway - it now assumes that any vsindex op occurs only at the start of the charstring.

I'm not sure how that works, but sure, if you think so.

- programToCommands now takes a function argument, numRegions
- 'vsindex' is now allowed to occur more than once in the charstring

Since vsindex may now occur more than once in a charstring, changed misc/psCharString.py::T2Charstring accordingly:
- removed vsindex property, since this is no longer a static item, and now depends on current location in the charstring
- add a numRegions function to get the num regions in use according to the current charstring vsindex.

Updated specializer_test.py and varLib/mutator.py to match
@readroberts
Copy link
Collaborator Author

@behdad Updated specializer with change in handling of numRegions and vsindex; please review. Other files updated to match -see commit message.

Copy link
Member

@behdad behdad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Read. Looks great!

program (¯\_(ツ)_/¯).
'numRegions' may be None, or a function that returns the number
of regions. If the function is not passed a vsindex argument, it returns
the default number of regions for the charstring, else it returns the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the function is not passed a vsindex argument is inaccurate. We pass a None to it in that case.

Also, instead of it returns, use it must return?

# replace the blend op args on the stack with a single list
# containing all the blend op args.
numBlendOps = stack[-1]*(numRegions+1) + 1
numBlendOps = stack[-1] * numSourceFonts + 1
# replace first blend op by a list of the blend ops.
stack[-numBlendOps] = stack[-numBlendOps:]
del stack[-numBlendOps + 1:]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the above two lines can be written as:
stack[-numBlendOps:] = [stack[-numBlendOps:]]

elif (not seen_width_op) and token in {'hstem', 'hstemhm', 'vstem', 'vstemhm',

elif token == 'vsindex':
vsIndex = stack[-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we assert it's integer, or doesn't matter?

@@ -944,6 +944,16 @@ def __init__(self, bytecode=None, program=None, private=None, globalSubrs=None):
self.program = program
self.private = private
self.globalSubrs = globalSubrs if globalSubrs is not None else []
self._cur_vsindex = None

def numRegions(self, vsindex=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be renamed getNumRegions?

"""Takes a T2CharString program list and returns list of commands.
Each command is a two-tuple of commandname,arg-list. The commandname might
be empty string if no commandname shall be emitted (used for glyph width,
hintmask/cntrmask argument, as well as stray arguments at the end of the
program (¯\_(ツ)_/¯)."""
program (¯\_(ツ)_/¯).
'numRegions' may be None, or a function that returns the number
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of function say callable?

def _addArgs(a, b):
if isinstance(b, list):
if isinstance(a, list):
return [_addArgs(va, vb) for va,vb in zip(a, b)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check that the lengths match?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@behdad Sure. This can't happen if the command list is build by programToCommands, but a developer could build one independently and do it incorrectly. I would raise an error if the lengths don't match, catch it at line 585 and continue without changing the command. Sound OK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Thanks.

In specializer.py:programToCommands():
 - edit comments about getNumRegions arg
- at line 58, use more compact stack array editing syntax
- assert that vsindex arg is an int at line 76

In specializer.py:specializeCommands():
- When combining successive [vh]lineto's, assert that when both args are lists, that they are the same length, and continue if not.

In fontTools/misc/psCharStrings.py::T2CharString:
rename numRegions method to getNumRegions. This is parallel to the same function name in PrivateDict(), and avoids confusion with the self.numRegions field in SimpleT2Decompiler(). Applied same name change to argument for specializer.py:programToCommands().
@readroberts
Copy link
Collaborator Author

@behdad Thanks, the changes you suggested yesterday were all useful. Please take a look at the latest commit, which I think implements all of them

Copy link
Member

@behdad behdad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Read. Looks great.

@readroberts
Copy link
Collaborator Author

Change submitted in #1591: built new branch and PR to allow clean commit

@readroberts readroberts deleted the cff2vf-sparse-sources branch April 26, 2019 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants