-
Notifications
You must be signed in to change notification settings - Fork 448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for subsetting OT-SVG table #2452
Conversation
simply pass-through for now
…Subsetter instance so that SVG tables' subset_glyphs method can use it to get glyph names from GISs and to remap from old to new GIDs
this drops svg document records when they no longer intersect the subset. It keeps them in their entirety (for now) when they still intersect the subset, only renaming all the id='glyphXXX' to point to the new glyph indices after subsetting. Unused, unreferenced elements are not pruned yet.
support for namespaces and xpath is insufficient in built-in ElementTree; supporting both lxml and ElementTree is too complicated, let's simply require lxml to be able to subset SVG for now
False (more compact) by default
With this I can finally follow xlink:href and url(#...) sort of references within the SVG doc and subset the elements accordingly so that only those that are reachable from the initial set of glyph elements are kept.
new_id = f"{new_id}.{next(n)}" | ||
|
||
id_map[el_id] = new_id | ||
el.attrib["id"] = new_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it's doing two jobs and might be simpler if it was broken into handling each explicitly:
- Assign new gids to things we're keeping
- Rename things we are dropping to avoid clashes
- Can we just destroy them? Why do we have to keep the things we aren't keeping and rename them to avoid id clashes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't destroy them, because they may contain other elements with id="glyph{digit}" to keep.
I think renaming all the id="glyph{digit}" attributes can logically be done in one go within the same method, whether we are keeping the glyph elements because their id is in our font subset (in which case we want to remap them from old to new glyph indices), or whether we are keeping them indirectly either because they are x-referenced from elements that we are keeping, or because they contain elements that we are keeping.
If we didn't do it in one go, there could be an intermediate state where clashes may still occur (we are updating the tree in-place).
We also return an id_map that has all the final old:new glyph id strings, with the values renamed to match the subset font's new glyph indices or renamed to avoid element id clashes. We can use this in the following update_glyph_href_links
method where we search for all the xlink:href="#glyph
references and update those as well.
when decompiled from binary, the SVG.docList contains (unicode) strings, decoded as UTF-8. lxml fromstring accepts either bytes or str, but when given str with the xml header declaring an explicit encoding, it rejects them (since the header is lying). So we encode to bytes before calling fromstring in case the SVG contains an explicit encoding (UTF-8 is the only one allowed anyway). When serializing to XML with tostring, we similarly decode to str as UTF-8. Not only to match SVG decompile (which gives us str), but if we didn't do that, then attempting to dump to XML would fail, because XMLWriter.writecdata expects str, not bytes.
40cf50f
to
f1e9241
Compare
501fe8e
to
247fa84
Compare
Fixes #534