Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for subsetting OT-SVG table #2452

Merged
merged 22 commits into from
Nov 16, 2021
Merged

Add support for subsetting OT-SVG table #2452

merged 22 commits into from
Nov 16, 2021

Conversation

anthrotype
Copy link
Member

Fixes #534

…Subsetter instance

so that SVG tables' subset_glyphs method can use it to get glyph names from GISs and to remap from old to new GIDs
this drops svg document records when they no longer intersect the subset. It keeps them in their entirety (for now) when they still intersect the subset, only renaming all the id='glyphXXX' to point to the new glyph indices after subsetting. Unused, unreferenced elements are not pruned yet.
support for namespaces and xpath is insufficient in built-in ElementTree; supporting both lxml and ElementTree is too complicated, let's simply require lxml to be able to subset SVG for now
With this I can finally follow xlink:href and url(#...) sort of
references within the SVG doc and subset the elements accordingly so
that only those that are reachable from the initial set of glyph
elements are kept.
new_id = f"{new_id}.{next(n)}"

id_map[el_id] = new_id
el.attrib["id"] = new_id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it's doing two jobs and might be simpler if it was broken into handling each explicitly:

  1. Assign new gids to things we're keeping
  2. Rename things we are dropping to avoid clashes
    • Can we just destroy them? Why do we have to keep the things we aren't keeping and rename them to avoid id clashes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't destroy them, because they may contain other elements with id="glyph{digit}" to keep.
I think renaming all the id="glyph{digit}" attributes can logically be done in one go within the same method, whether we are keeping the glyph elements because their id is in our font subset (in which case we want to remap them from old to new glyph indices), or whether we are keeping them indirectly either because they are x-referenced from elements that we are keeping, or because they contain elements that we are keeping.
If we didn't do it in one go, there could be an intermediate state where clashes may still occur (we are updating the tree in-place).
We also return an id_map that has all the final old:new glyph id strings, with the values renamed to match the subset font's new glyph indices or renamed to avoid element id clashes. We can use this in the following update_glyph_href_links method where we search for all the xlink:href="#glyph references and update those as well.

when decompiled from binary, the SVG.docList contains (unicode) strings, decoded as UTF-8. lxml fromstring accepts either bytes or str, but when given str with the xml header declaring an explicit encoding, it rejects them (since the header is lying). So we encode to bytes before calling fromstring in case the SVG contains an explicit encoding (UTF-8 is the only one allowed anyway). When serializing to XML with tostring, we similarly decode to str as UTF-8. Not only to match SVG decompile (which gives us str), but if we didn't do that, then attempting to dump to XML would fail, because XMLWriter.writecdata expects str, not bytes.
Tests/subset/svg_test.py Outdated Show resolved Hide resolved
Tests/subset/svg_test.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Subset SVG table with pyftsubset
2 participants