Speedup pdftex.map parsing. #19538

anntzer · 2021-02-18T22:56:26Z

1st commit:

As a reminder, pdftex.map is a file that maps tex font names ("cmr10") to
filesystem font names ("cmr10.pfb"), together with additional metadata
(font encoding, postscript special commands). When using pdf output
with usetex, we parse usetex-generated dvi files and then need to locate
and load these fonts for embedding into the pdf file, and hence need to
parse pdftex.map.

On some systems (likely with large texlive installs), pdftex.map can be
really large (>10^4 entries), and parsing it is quite slow (>500ms on
the matplotlib macos).

This patch implements a new (simpler?) parser, which is ~25% faster
(so it can cut hundreds of ms on systems with large maps). The patch
also correctly handles entries of the form `foo <bar.pfb`
(i.e., with no postscript font name; in that case the docs say that
the postscript font name is the same as the tfm name). On the other
hand, the patch drops support for quotes around anything but the
postscript specials, in accordance with the psfonts.map docs and the
actual pdftex implementation in `src/texk/web2c/pdftexdir/mapfile.c`,
where `case '"': /* opening quote */` only handles postscript specials. See
also the changes to test.map for the changes in supported syntax.
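
To illustrate the entry syntax described above, here is a minimal sketch of a parser for a single map entry. It is not the code from this patch: `parse_map_entry` and the returned dict keys are hypothetical names, and the sketch assumes that the `<`, `<<` or `<[` marker is attached directly to the file name and that a `.enc` suffix identifies an encoding file.

```python
import re


def parse_map_entry(line):
    """Parse one psfonts.map-style entry into a dict (illustrative sketch)."""
    # A token is either a double-quoted string (the postscript specials)
    # or a run of non-whitespace characters.
    tokens = re.findall(r'"[^"]*"|\S+', line)
    texname = tokens[0]  # the tex (tfm) font name is always the first word
    psname = encoding = filename = None
    specials = ""
    for tok in tokens[1:]:
        if tok.startswith('"'):
            # Quotes are only supported around the postscript specials.
            specials = tok[1:-1]
        elif tok.startswith("<"):
            name = tok.lstrip("<[")
            # Assumption of this sketch: "<[" or a ".enc" suffix marks an
            # encoding file; anything else is the font file itself.
            if tok.startswith("<[") or name.endswith(".enc"):
                encoding = name
            else:
                filename = name
        elif psname is None:
            psname = tok
    return {
        "texname": texname,
        # With no explicit postscript name, fall back to the tfm name.
        "psname": psname if psname is not None else texname,
        "specials": specials,
        "encoding": encoding,
        "filename": filename,
    }
```

With this sketch, `parse_map_entry("foo <bar.pfb")` yields a postscript name of `foo`, matching the fallback rule mentioned above.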

2nd commit:

See the previous commit for a description of pdftex.map. The vast majority
of entries (tens of thousands) in pdftex.map actually end up being
unused, and parsing them is wasted work. This patch takes advantage of
the fact that we can quickly recover the tex font name from a pdftex.map
entry (it's just the first word), so we can very quickly build a
mapping of tex font names to unparsed pdftex.map entries, and then
parse only the few entries that we need, on demand. This speeds up e.g.

python -c 'from pylab import *; rcParams["text.usetex"] = True; plot(); savefig("/tmp/test.pdf")'

by ~700ms (~20%) on the matplotlib macos.
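
The on-demand idea can be sketched roughly as follows. This is an illustration rather than the code in this patch: `LazyFontMap` and its `parse_entry` argument are hypothetical names, `str.split` stands in for a real entry parser, only `%` comment lines are skipped, and a later duplicate entry simply overwrites an earlier one.

```python
class LazyFontMap:
    """Index map-file lines by tex font name; parse an entry only when asked."""

    def __init__(self, lines, parse_entry=str.split):
        # parse_entry is a placeholder for the real (expensive) entry parser.
        self._parse_entry = parse_entry
        self._unparsed = {}  # tex font name -> raw line, not yet parsed
        self._parsed = {}    # tex font name -> parsed entry (cache)
        for line in lines:
            line = line.strip()
            if not line or line.startswith("%"):
                continue  # skip blank lines and comments
            # Cheap step: the tex font name is just the first word.
            texname = line.split(None, 1)[0]
            self._unparsed[texname] = line

    def __getitem__(self, texname):
        if texname not in self._parsed:
            # Expensive step, done once per requested font.
            # Raises KeyError if the font is not in the map at all.
            line = self._unparsed.pop(texname)
            self._parsed[texname] = self._parse_entry(line)
        return self._parsed[texname]


fontmap = LazyFontMap([
    "% a comment line",
    "cmr10 CMR10 <cmr10.pfb",
    "foo <bar.pfb",
])
entry = fontmap["foo"]  # only this entry gets parsed
```

Building the index costs one `split` per line, while full parsing is paid only for the handful of fonts a given dvi file actually references.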

PR Summary

PR Checklist

Has pytest style unit tests (and pytest passes).
Is Flake 8 compliant (run flake8 on changed files to check).
New features are documented, with examples if plot related.
Documentation is sphinx and numpydoc compliant (the docs should build without error).
Conforms to Matplotlib style conventions (install flake8-docstrings and run flake8 --docstring-convention=all).
New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).

tacaswell · 2021-02-18T23:54:30Z

xref to the c code: http://tug.org/svn/pdftex/trunk/source/src/texk/web2c/pdftexdir/mapfile.c?view=markup#l450

lib/matplotlib/dviread.py

lib/matplotlib/tests/baseline_images/dviread/test.map


jkseppan

Looks like an improvement, thanks!

anntzer added topic: text/usetex Performance labels Feb 18, 2021

tacaswell reviewed Feb 19, 2021


lib/matplotlib/dviread.py (resolved)

anntzer force-pushed the psfontsmap branch 2 times, most recently from ba7f9fd to 650802f Compare February 22, 2021 21:42

anntzer force-pushed the psfontsmap branch 2 times, most recently from f57cb14 to 40ddab7 Compare March 25, 2021 12:21

QuLogic reviewed Apr 2, 2021


lib/matplotlib/tests/baseline_images/dviread/test.map (resolved)

lib/matplotlib/dviread.py (outdated, resolved)

anntzer added 2 commits April 6, 2021 09:51

anntzer force-pushed the psfontsmap branch from 40ddab7 to 04d28e9 Compare April 6, 2021 07:51

QuLogic approved these changes Apr 6, 2021


QuLogic added this to the v3.5.0 milestone Apr 6, 2021

jkseppan self-requested a review May 1, 2021 11:54

jkseppan approved these changes May 1, 2021


jkseppan merged commit b85e958 into matplotlib:master May 1, 2021

anntzer deleted the psfontsmap branch May 1, 2021 12:41

anntzer mentioned this pull request May 26, 2021

SVG savefig + LaTeX extremely slow on macOS #9653

Closed
