Type42 subsetting in PS/PDF #20391

aitikgupta · 2021-06-08T14:16:19Z

PR Summary

This PR is a fresh rebase from #18143.

Adds a dependency, fonttools, to handle font subsetting for us
(we already have an external ttconv dependency, but it does not handle subsetting)
Adds a getSubset utility that returns file-like objects containing the subsetted font data

Possibly fixes #11303 (large file sizes)
Fixes #18191.
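
For readers unfamiliar with fontTools, here is a minimal sketch of what a subsetting helper of this kind might look like. It is not the code in this PR: the name subset_font, the option choices, and the bytes return type are assumptions made for illustration. Only the public fontTools.subset API is used.

```python
import io


def subset_font(font_path, characters):
    """Return the bytes of a copy of the font that keeps only `characters`.

    Hypothetical helper for illustration; not Matplotlib's implementation.
    """
    # Imported lazily so this module can be imported without fontTools.
    from fontTools import subset

    # Keep glyph names, plus a few recommended glyphs besides the requested ones.
    options = subset.Options(glyph_names=True, recommended_glyphs=True)

    with subset.load_font(font_path, options) as font:
        subsetter = subset.Subsetter(options=options)
        subsetter.populate(text=characters)  # select glyphs by the text they render
        subsetter.subset(font)               # drop everything else, in place
        buf = io.BytesIO()
        font.save(buf, reorderTables=False)
    return buf.getvalue()
```

With the DejaVu Sans file that Matplotlib ships, a call such as subset_font(path, "hello") would be expected to return far fewer bytes than the original file.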

PR Checklist

Has pytest style unit tests (and pytest passes).
Is Flake 8 compliant (run flake8 on changed files to check).
New features are documented, with examples if plot related.
Documentation is sphinx and numpydoc compliant (the docs should build without error).
Conforms to Matplotlib style conventions (install flake8-docstrings and run flake8 --docstring-convention=all).
New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).

lumberbot-app · 2021-06-10T19:25:33Z

I'm Mr. Meeseek, @aitikgupta, Look at meee !

aitikgupta · 2021-06-13T09:14:41Z

The last commit subsets and embeds Type 42 fonts in PS/EPS output. Here's the size difference in the outputs:

nosub-dejavu.ps                ----> 1.2 MB
sub-dejavu.ps                  ----> 8.5 kB

(To test it out on your machine, apply this patch for nosub-dejavu.ps:)

diff --git a/lib/matplotlib/backends/backend_ps.py b/lib/matplotlib/backends/backend_ps.py
index 2a3ab64a1c..ed0af4ea85 100644
--- a/lib/matplotlib/backends/backend_ps.py
+++ b/lib/matplotlib/backends/backend_ps.py
@@ -997,12 +997,8 @@ class FigureCanvasPS(FigureCanvasBase):
                             ) as tmp:
                                 tmp.write(fontdata)
                                 tmp.seek(0, 0)
-                                font = FT2Font(tmp.name)
-                                glyph_ids = [
-                                    font.get_char_index(c) for c in chars
-                                ]
                                 convert_ttf_to_ps(
-                                    os.fsencode(tmp.name),
+                                    os.fsencode(font_path),
                                     fh,
                                     fonttype,
                                     glyph_ids,

The script:

import matplotlib.pyplot as plt

plt.rcParams["ps.fonttype"] = 42
plt.figtext(0.5, 0.5, "hello")
# plt.savefig('sub-dejavu.ps')                    # before applying the patch
# plt.savefig('nosub-dejavu.ps')                  # after applying the patch

aitikgupta · 2021-06-18T21:38:20Z

#20391 (comment): This appears to require testing.

I think it's difficult to test get_glyphs_subset, since it really depends on font files, and if we chose certain characters to test with a fixed font (let's say), there's no guarantee that the subset will be the same for different versions of the library.

Is there any other way to test this?

edit: @jklymak this isn't a draft anymore, just needed to address review comments 😄

jklymak · 2021-06-18T21:48:49Z

You can mark as ready-to-review at any point. I just kick things to Draft so they don't show up in the queue if the PR requires action on the part of the author. (this one is still not passing the tests).

aitikgupta · 2021-06-18T21:54:03Z

(this one is still not passing the tests).

Yeah, CI isn't installing fonttools, which is why ModuleNotFoundError is all over the place (even after adding it to minver.txt: #20391 (comment))

QuLogic · 2021-06-19T00:26:18Z

Install is done with --no-deps, so hard dependencies also need to be listed here: https://github.com/matplotlib/matplotlib/blob/master/.github/workflows/tests.yml#L148
minver.txt is only a constraint on versions.

QuLogic · 2021-06-19T00:29:58Z

I think it's difficult to test get_glyphs_subset, since it really depends on font files, and if we chose certain characters to test with a fixed font (let's say), there's no guarantee that the subset will be the same for different versions of the library.

We ship DejaVu ourselves, so that should always be available for testing.

Is there any other way to test this?

A file with just 'A' embedded should be smaller than one with the full alphabet, say?
Also, a subsetted PDF should appear the same as one without subsetting (assuming Ghostscript doesn't happen to substitute the same glyphs.)
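
The first suggestion could be sketched roughly as follows. This is an illustration, not the test added in this PR: render_pdf and subset_shrinks are hypothetical names, and the renderer is passed in as a parameter so that the size comparison itself stays independent of Matplotlib.

```python
import io
import string


def render_pdf(text):
    """Render `text` to an in-memory PDF using Type 42 fonts; return the bytes."""
    # Imported lazily so subset_shrinks() below can be used without Matplotlib.
    import matplotlib
    from matplotlib.figure import Figure

    with matplotlib.rc_context({"pdf.fonttype": 42}):
        fig = Figure()
        fig.text(0.5, 0.5, text)
        buf = io.BytesIO()
        fig.savefig(buf, format="pdf")
    return buf.getvalue()


def subset_shrinks(render, small="A", large=string.ascii_letters):
    """Return True if rendering `small` yields fewer bytes than `large`."""
    return len(render(small)) < len(render(large))
```

A test would then assert subset_shrinks(render_pdf). The check only compares sizes, so it does not depend on the exact number of glyphs a given fontTools version keeps.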

aitikgupta · 2021-06-19T19:07:18Z

A file with just 'A' embedded should be smaller than one with the full alphabet, say?

Oh, if we only test that the subset is 'smaller', then yes, we can do it, but not against the exact number of glyphs, since that will definitely vary (we also set recommended_glyphs=True).

But even so, isn't that the 'full-time job' of the fonttools library itself? Or, in other words, wouldn't testing the get_glyphs_subset function be the same as testing the library itself (that it does reduce the number of glyphs)?

jkseppan · 2021-06-23T14:50:38Z

Oh, and one more thing (sorry for not remembering this earlier): the PDF specification has some extra requirements for font subsets in section 9.6.4. The PostScript name of subsetted fonts needs to have a prepended tag of six random uppercase letters and a plus sign, e.g. EOODIA+Poetica if the original font name is Poetica. This is relevant in the case that multiple Matplotlib plots are combined in the same document, e.g. in a LaTeX paper with several figures. The random tags prevent collisions between the different versions of the same font.
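
To make the naming rule concrete, here is a small sketch of the tag format described above. The helper names are hypothetical and chosen for illustration only; the rule itself (six uppercase letters, a plus sign, then the original name) is the one quoted from the specification.

```python
import random
import re
import string

# Six uppercase letters, a plus sign, then the original PostScript name.
_SUBSET_NAME = re.compile(r"([A-Z]{6})\+(.+)")


def random_tag(rng=random):
    """Return six random uppercase ASCII letters."""
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(6))


def add_subset_tag(ps_name, tag):
    """Prepend a subset tag, e.g. ('Poetica', 'EOODIA') -> 'EOODIA+Poetica'."""
    if not re.fullmatch(r"[A-Z]{6}", tag):
        raise ValueError("tag must be exactly six uppercase letters")
    return f"{tag}+{ps_name}"


def split_subset_tag(name):
    """Split 'EOODIA+Poetica' into ('EOODIA', 'Poetica'); untagged names give (None, name)."""
    match = _SUBSET_NAME.fullmatch(name)
    return (match.group(1), match.group(2)) if match else (None, name)
```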

aitikgupta · 2021-06-24T00:38:55Z

extra requirements for font subsets ... PostScript name of subsetted fonts needs to have a prepended tag of six random uppercase letters and a plus sign

I think we don't do this even for Type 3 subsetting?
Since we always try to subset those, I can probably just add a function that modifies the name for both Type 3 and Type 42.

aitikgupta · 2021-06-24T01:51:28Z

This breaks the test_determinism_check:

matplotlib/lib/matplotlib/tests/test_determinism.py

Lines 85 to 88 in 593de35

    
    def test_determinism_check(objects, fmt, usetex):
        """
        Output three times the same graphs and checks that the outputs are exactly
        the same.

This is expected, since the font's PostScript name now contains a random string (as required by the specification).
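
A toy illustration of why a random tag defeats this kind of check. It is not the actual test, which compares whole output files: render_font_name is a hypothetical stand-in for a backend writing the tagged font name into its output.

```python
import random
import string


def random_tag():
    """Six random uppercase letters, as a naive subset tag."""
    return "".join(random.choices(string.ascii_uppercase, k=6))


def render_font_name(ps_name, make_tag):
    """Stand-in for a backend emitting a tagged font name into the output."""
    return f"/FontName /{make_tag()}+{ps_name}\n".encode("ascii")


def is_deterministic(produce, runs=3):
    """Call `produce` several times and report whether all outputs are identical."""
    first = produce()
    return all(produce() == first for _ in range(runs - 1))
```

With a fixed tag, is_deterministic(lambda: render_font_name("DejaVuSans", lambda: "ABCDEF")) holds. With random_tag as the tag source, the outputs differ between runs, so the check fails.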

jkseppan · 2021-06-24T02:55:29Z

Then the string shouldn't be random, but perhaps something derived from the subset of glyphs? Take hash(frozenset(glyphs)), convert it to base 26 and take the first six letters, or something like that.

The PDF specification only requires the tag for Type 1 and TrueType fonts. It probably doesn't hurt with Type 3 fonts either.
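
One way this suggestion might be sketched is shown below. The function names are hypothetical. This version also swaps in hashlib.sha256 over a canonical encoding of the glyph set in place of the built-in hash(), because Python randomizes str hashes per process, which would reintroduce the nondeterminism if the keys were strings.

```python
import hashlib
import string


def subset_tag(glyph_ids, length=6):
    """Derive an uppercase tag from the set of glyph ids (order-independent)."""
    # Canonical form: sorted, de-duplicated ids, so equal sets give equal tags.
    canonical = ",".join(str(g) for g in sorted(set(glyph_ids))).encode("ascii")
    number = int.from_bytes(hashlib.sha256(canonical).digest(), "big")

    # Base-26 conversion: each remainder picks one letter A-Z.
    letters = []
    for _ in range(length):
        number, remainder = divmod(number, 26)
        letters.append(string.ascii_uppercase[remainder])
    return "".join(letters)


def tagged_name(ps_name, glyph_ids):
    """Return e.g. 'XXXXXX+DejaVuSans', where XXXXXX depends only on the glyph set."""
    return f"{subset_tag(glyph_ids)}+{ps_name}"
```

Two figures that use the same glyphs of the same font get the same tag, while different subsets will almost always get different tags. That is what the collision-avoidance rule needs, and the output stays reproducible.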

aitikgupta · 2021-06-24T03:03:55Z

Then the string shouldn't be random, but perhaps something derived from the subset of glyphs?

That totally makes sense! Let me try this out 👍🏼

jklymak · 2021-07-22T13:43:34Z

@sauerburger can you suggest a test for that, or is it already in #20633?

aitikgupta · 2021-07-22T13:49:46Z

can you suggest a test for that, or is it already in #20633?

The test is already in, and that is exactly what is breaking the CI for this PR.
I'll rebase and push a commit fixing the error.

jkseppan · 2021-07-26T12:59:21Z

It seems to me that the requested changes have been made.

tacaswell mentioned this pull request Jun 8, 2021

Proof of concept: Type42 subsetting in pdf #18143

Closed

jklymak marked this pull request as draft June 8, 2021 21:31

aitikgupta changed the title from "Type42 subsetting in PDF" to "Type42 subsetting in PS/PDF" Jun 13, 2021

anntzer reviewed Jun 15, 2021

lib/matplotlib/backends/backend_pdf.py (outdated)

anntzer reviewed Jun 15, 2021

lib/matplotlib/backends/backend_pdf.py (outdated)

anntzer reviewed Jun 15, 2021

lib/matplotlib/backends/backend_ps.py (outdated)

aitikgupta marked this pull request as ready for review June 15, 2021 16:57

QuLogic reviewed Jun 17, 2021

jklymak added this to the v3.5.0 milestone Jun 17, 2021

jklymak added status: needs revision topic: text/fonts labels Jun 17, 2021

jklymak marked this pull request as draft June 17, 2021 14:40

aitikgupta marked this pull request as ready for review June 19, 2021 19:13

anntzer reviewed Jun 22, 2021

lib/matplotlib/backends/backend_pdf.py (outdated)

jkseppan reviewed Jun 22, 2021

lib/matplotlib/backends/_backend_pdf_ps.py

anntzer reviewed Jun 22, 2021

lib/matplotlib/backends/backend_ps.py (outdated)

aitikgupta force-pushed the pyftsubset-fonttools branch from 4974261 to 2702656 Compare June 24, 2021 01:01

aitikgupta added 19 commits July 22, 2021 19:12

Log the correct way

0d75117

Add fonttools min version for testing

f5eebbb

Add fonttools in test workflow

91417cd

Use ASCII characters for logging

aca3bb5

Add unit test for get_glyphs_subset

265a563

Remove seek()

2193caa

Add prefix to subsetted font names according to PDF spec

5661f0d

Use charmap for prefix

d0d766f

Update fonttools requirements

5ea7f1b

Drop PfEd table

17873f3

flush before reading the contents back from tmp file

9837733

Fix testing for subsetting

f509731

Add whatsnew entry for Type42 subsetting

a362601

Fix subset tests

57267a3

Add PS test for multiple fonttypes

7571055

Use TemporaryDirectory instead of NamedTemporaryFile

1630ad9

Add fontTools in dependencies.rst

fa197d2

Add API changenote for new dependency

fe583dd

Rebase tests.yml for packaging

a95f2b6

jklymak marked this pull request as draft July 22, 2021 13:42

Keep a reference to non-subsetted font for XObjects

85f4377

aitikgupta force-pushed the pyftsubset-fonttools branch from 42f57bb to 85f4377 Compare July 22, 2021 13:52

aitikgupta marked this pull request as ready for review July 22, 2021 16:34

jkseppan removed the status: needs revision label Jul 26, 2021

jkseppan merged commit e13f0bd into matplotlib:master Jul 26, 2021

anntzer mentioned this pull request Feb 20, 2023

Use pybind11 in ttconv module #25253

Merged

andrzejnovak mentioned this pull request Feb 23, 2024

FIX: Investigate replacing Tex Gyre Heros Type 3 fonts with Type 1 scikit-hep/mplhep#462

Closed
