Speedup pdftex.map parsing. #19538
Merged
tacaswell reviewed on Feb 19, 2021
anntzer force-pushed the psfontsmap branch 2 times, most recently from ba7f9fd to 650802f, on February 22, 2021 21:42
anntzer force-pushed the psfontsmap branch 2 times, most recently from f57cb14 to 40ddab7, on March 25, 2021 12:21
QuLogic reviewed on Apr 2, 2021
1st commit:

As a reminder, pdftex.map is a file that maps TeX font names ("cmr10") to filesystem font names ("cmr10.pfb"), together with additional metadata (font encoding, PostScript special commands). When using PDF output with usetex, we parse the usetex-generated DVI files and then need to locate and load these fonts for embedding into the PDF file, so we need to parse pdftex.map.

On some systems (likely those with large texlive installs), pdftex.map can be really large (>10^4 entries), and parsing it is quite slow (>500ms on the matplotlib macos).

This patch implements a new (simpler?) parser, which is ~25% faster (so it can cut hundreds of ms on systems with large maps). The patch additionally correctly handles entries of the form `foo <bar.pfb` (i.e., with no PostScript font name; in that case the docs say that the PostScript font name is the same as the tfm name). On the other hand, the patch also drops support for quotes around anything but the PostScript specials (in accordance with the psfonts.map docs and the actual pdftex implementation in `src/texk/web2c/pdftexdir/mapfile.c`, where `case '"': /* opening quote */` only handles PostScript specials). See also the changes to test.map for the changes in supported syntax.
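To make the entry syntax described above concrete, here is a minimal sketch of a parser for a single map entry. It is not the code in this patch: `parse_map_entry`, the returned dict layout, and the sample font names are all hypothetical, and it only covers the cases mentioned here (first word is the TeX name, optional PostScript name defaulting to the TeX name, quotes only around specials, `<`-prefixed files). Treating `<[` or a `.enc` suffix as an encoding file follows my reading of the psfonts.map docs and is an assumption.

```python
import re

# One token is either a double-quoted special, a "<"-prefixed file reference
# (optionally "<<" or "<[", possibly followed by whitespace), or a bare word.
_TOKEN = re.compile(r'"([^"]*)"|(<[<\[]?)\s*(\S+)|(\S+)')


def parse_map_entry(line):
    """Parse one map entry into a dict (hypothetical helper for illustration)."""
    words, specials, encoding, fontfile = [], [], None, None
    for match in _TOKEN.finditer(line):
        quoted, prefix, filename, word = match.groups()
        if quoted is not None:
            # Quotes are only understood around PostScript specials.
            specials.append(quoted)
        elif prefix is not None:
            # Assumption: "<[" or a ".enc" suffix marks an encoding file.
            if prefix == "<[" or filename.endswith(".enc"):
                encoding = filename
            else:
                fontfile = filename
        else:
            words.append(word)
    if not words:
        raise ValueError(f"no TeX font name in entry: {line!r}")
    texname = words[0]
    # With no PostScript name given, it defaults to the TeX (tfm) name.
    psname = words[1] if len(words) > 1 else texname
    return {
        "texname": texname,
        "psname": psname,
        "specials": specials,
        "encoding": encoding,
        "fontfile": fontfile,
    }


entry = parse_map_entry("foo <bar.pfb")
print(entry["psname"], entry["fontfile"])
```

The single regex keeps the tokenizer to one pass over the line; comment lines and malformed entries are deliberately left out of the sketch.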
2nd commit:

See the previous commit for a description of pdftex.map. The vast majority of entries (tens of thousands) in pdftex.map actually end up being unused, and parsing them is just wasted work. This patch takes advantage of the fact that we can quickly recover the TeX font name from a pdftex.map entry (it's just the first word), so we can very quickly build a mapping of TeX font names to unparsed pdftex.map entries, and then parse only the few entries that we need, on demand. This speeds up e.g.

```
python -c 'from pylab import *; rcParams["text.usetex"] = True; plot(); savefig("/tmp/test.pdf")'
```

by ~700ms (~20%) on the matplotlib macos.
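The on-demand idea can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: `LazyFontMap` and its `parse_entry` argument are hypothetical names, only `%` comment lines are skipped, and when a TeX name appears more than once this sketch simply keeps the last entry (real tools may behave differently).

```python
class LazyFontMap:
    """Index map entries by TeX font name; parse an entry on first lookup.

    Hypothetical sketch for illustration, not matplotlib's implementation.
    """

    def __init__(self, lines, parse_entry):
        self._parse_entry = parse_entry  # callable: raw line -> parsed entry
        self._parsed = {}                # TeX name -> parsed entry (cache)
        self._unparsed = {}              # TeX name -> raw, unparsed line
        for line in lines:
            line = line.strip()
            # Simplification: only blank lines and "%" comments are skipped.
            if not line or line.startswith("%"):
                continue
            # The TeX font name is just the first word of the entry.
            texname = line.split(None, 1)[0]
            self._unparsed[texname] = line

    def __getitem__(self, texname):
        if texname not in self._parsed:
            # Raises KeyError for names that are not in the map at all.
            raw = self._unparsed.pop(texname)
            self._parsed[texname] = self._parse_entry(raw)
        return self._parsed[texname]


# Usage: nothing is parsed until a font is actually requested.
fontmap = LazyFontMap(
    ["% a comment", "foo <bar.pfb", "baz BazPS <baz.pfb"],
    parse_entry=str.split,  # stand-in for a real entry parser
)
print(fontmap["foo"])
```

Building the index costs one `split` per line, while the full per-entry parse runs only for the fonts a document actually uses.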
QuLogic approved these changes on Apr 6, 2021
jkseppan approved these changes on May 1, 2021
Looks like an improvement, thanks!