Speedup pdftex.map parsing. #19538

Merged
merged 2 commits into from May 1, 2021

anntzer
Contributor

@anntzer anntzer commented Feb 18, 2021

1st commit:

As a reminder, pdftex.map is a file that maps tex font names ("cmr10") to
filesystem font names ("cmr10.pfb"), together with additional metadata
(font encoding, postscript special commands). When using pdf output
with usetex, we parse usetex-generated dvi files and then need to locate
and load these fonts for embedding into the pdf file, hence we need to
parse pdftex.map.

On some systems (likely with large texlive installs), pdftex.map can be
really large (>10^4 entries), and parsing it is quite slow (>500ms on
the matplotlib macos).

This patch implements a new (simpler?) parser, which is ~25% faster
(so it can cut hundreds of ms on systems with large maps). The patch
additionally correctly handles entries of the form `foo <bar.pfb`
(i.e., with no postscript font name -- in that case the docs say that
the postscript font name is the same as the tfm name). On the other
hand, the patch also drops support for quotes around anything but the
postscript specials (in accordance with the psfonts.map docs, and the
actual pdftex implementation in `src/texk/web2c/pdftexdir/mapfile.c`:
`case '"': /* opening quote */` only handles postscript specials). See
also changes to test.map for the changes in supported syntax.
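
To illustrate the syntax rules described above, here is a minimal sketch of an entry parser. It is not the code in this PR: the function name `parse_entry`, the returned dict layout, and the rule "a file ending in `.enc` is an encoding, anything else is the font file" are assumptions made for the example. It only shows the two behaviors discussed: the postscript name defaults to the tex name, and quotes are recognized only around the specials.

```python
import re

# One token is either a double-quoted special, a "<"-prefixed file name,
# or a bare word.  (Hypothetical tokenizer, for illustration only.)
_TOKEN = re.compile(r'"([^"]*)"|(<<?\[?)\s*(\S+)|(\S+)')


def parse_entry(line):
    """Parse a single map entry into a dict (illustrative sketch)."""
    matches = list(_TOKEN.finditer(line))
    if not matches or matches[0].group(4) is None:
        raise ValueError(f"entry has no tex font name: {line!r}")
    texname = matches[0].group(4)
    entry = {
        "texname": texname,
        "psname": texname,  # default when no postscript name is given
        "specials": "",
        "encoding": None,
        "filename": None,
    }
    seen_psname = False
    for m in matches[1:]:
        special, prefix, fname, word = m.groups()
        if special is not None:
            # Quotes are only understood around postscript specials.
            entry["specials"] = special
        elif prefix is not None:
            # Assumption: the file extension tells encodings from fonts.
            key = "encoding" if fname.endswith(".enc") else "filename"
            entry[key] = fname
        elif not seen_psname:
            # The first bare word after the tex name is the postscript name.
            entry["psname"] = word
            seen_psname = True
    return entry


short = parse_entry("foo <bar.pfb")
full = parse_entry('ptmbo8r Times-Bold ".167 SlantFont" <8r.enc <utmb8a.pfb')
```

Here `short["psname"]` falls back to `"foo"` because the entry has no postscript name, while `full` carries an explicit name, specials, encoding file and font file.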

2nd commit:

See previous commit for a description of pdftex.map. The vast majority
of entries (tens of thousands) in pdftex.map actually end up being
unused, and parsing them is just wasted work. This patch takes advantage
of the fact that we can quickly recover the tex font name from pdftex.map
entries (it's just the first word), so we can very quickly build a
mapping of tex font names to unparsed pdftex.map entries, and then parse
on demand only the few entries that we actually need. This speeds up e.g.

python -c 'from pylab import *; rcParams["text.usetex"] = True; plot(); savefig("/tmp/test.pdf")'

by ~700ms (~20%) on the matplotlib macos.
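
The on-demand approach can be sketched as follows. This is an illustration under stated assumptions rather than the implementation in this PR: the class name `LazyFontMap`, the `toy_parse` stand-in parser, skipping only `%` comment lines, and keeping the first of any duplicate entries are all choices made for the example.

```python
class LazyFontMap:
    """Map tex font names to entries, parsing each entry on first access."""

    def __init__(self, lines, parse_entry):
        self._parse_entry = parse_entry
        self._unparsed = {}  # tex font name -> raw, unparsed line
        self._parsed = {}    # tex font name -> parsed entry
        for line in lines:
            line = line.strip()
            # Assumption: blank lines and "%" comment lines are skipped.
            if not line or line.startswith("%"):
                continue
            # The tex font name is just the first word of the entry.
            texname = line.split(None, 1)[0]
            # Assumption: the first of any duplicate entries is kept.
            self._unparsed.setdefault(texname, line)

    def __getitem__(self, texname):
        if texname not in self._parsed:
            raw = self._unparsed.pop(texname)  # KeyError for unknown fonts
            self._parsed[texname] = self._parse_entry(raw)
        return self._parsed[texname]


calls = []  # records which lines were actually parsed


def toy_parse(line):
    # Stand-in for a real (expensive) entry parser.
    calls.append(line)
    return line.split()


fontmap = LazyFontMap(
    ["% a comment", "cmr10 CMR10 <cmr10.pfb", "cmmi10 CMMI10 <cmmi10.pfb"],
    toy_parse,
)
entry = fontmap["cmr10"]
again = fontmap["cmr10"]  # served from the cache, no second parse
```

Building the index costs one `split` per line, and the full parser runs only for fonts that are actually looked up; in this toy run `cmmi10` is never parsed.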

PR Summary

PR Checklist

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (run flake8 on changed files to check).
  • New features are documented, with examples if plot related.
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).
  • Conforms to Matplotlib style conventions (install flake8-docstrings and run flake8 --docstring-convention=all).
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).

@tacaswell
Member

@anntzer anntzer force-pushed the psfontsmap branch 2 times, most recently from ba7f9fd to 650802f Compare February 22, 2021 21:42
@anntzer anntzer force-pushed the psfontsmap branch 2 times, most recently from f57cb14 to 40ddab7 Compare March 25, 2021 12:21
@QuLogic QuLogic added this to the v3.5.0 milestone Apr 6, 2021
@jkseppan jkseppan self-requested a review May 1, 2021 11:54
Member

@jkseppan jkseppan left a comment

Looks like an improvement, thanks!

@jkseppan jkseppan merged commit b85e958 into matplotlib:master May 1, 2021
@anntzer anntzer deleted the psfontsmap branch May 1, 2021 12:41