Speedup pdftex.map parsing. #19538

Merged
merged 2 commits into from
May 1, 2021

Commits on Apr 6, 2021

  1. Speedup pdftex.map parsing.

    As a reminder, pdftex.map is a file that maps tex font names ("cmr10") to
    filesystem font names ("cmr10.pfb"), together with additional metadata
    (font encoding, postscript special commands).  When using pdf output
    with usetex, we parse usetex-generated dvi files and then need to locate
    and load these fonts for embedding into the pdf file, hence we need to
    parse pdftex.map.
    
    On some systems (likely with large texlive installs), pdftex.map can be
    really large (>10^4 entries), and parsing it is quite slow (>500ms on
    the matplotlib macos).
    
    This patch implements a new (simpler?) parser, which is ~25% faster
    (so it can cut hundreds of ms on systems with large maps).  The patch
    additionally correctly handles entries of the form `foo <bar.pfb`
    (i.e., with no postscript font name -- in that case the docs say that
    the postscript font name is the same as the tfm name).  On the other
    hand, the patch also drops support for quotes around anything but the
    postscript specials (in accordance with the psfonts.map docs, and the
    actual pdftex implementation in `src/texk/web2c/pdftexdir/mapfile.c`:
    `case '"': /* opening quote */` only handles postscript specials).  See
    also changes to test.map for the changes in supported syntax.
    anntzer committed Apr 6, 2021
    Commit 2d5883f
  2. Parse PsfontMap entries on-demand.

    See the previous commit for a description of pdftex.map.  The vast majority
    of entries (tens of thousands) in pdftex.map actually end up being
    unused, and their parsing is just wasted.  This patch takes advantage of
    the fact that we can quickly recover the tex font name from pdftex.map
    entries (it's just the first word), so we can very quickly build a
    mapping of tex font names to unparsed pdftex.map entries, and then only
    parse the few entries that we'll need on-demand.  This speeds up e.g.
    ```
    python -c 'from pylab import *; rcParams["text.usetex"] = True; plot(); savefig("/tmp/test.pdf")'
    ```
    by ~700ms (~20%) on the matplotlib macos.
    anntzer committed Apr 6, 2021
    Commit 04d28e9
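
For readers skimming the two commits, here is a minimal sketch of the idea they describe: index each map line by its first word, and parse an entry only when it is looked up. This is not the code from the PR. `parse_entry` and `LazyFontMap` are hypothetical names, the comment-prefix characters and the ".enc means encoding file" rule are simplifying assumptions, and only the syntax mentioned in the commit messages is handled (optional postscript name, quoted postscript specials, `<file` references).

```python
import re

# One token of the part after the tex name: a quoted special, a "<file"
# reference (optionally "<<" or "<["), or a bare word (the postscript name).
_TOKEN = re.compile(r'"([^"]*)"|(<[<\[]?)\s*(\S+)|(\S+)')


def parse_entry(line):
    """Parse one map line into a dict (simplified, illustrative only)."""
    parts = line.split(None, 1)
    texname = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    entry = {
        "texname": texname,
        "psname": texname,  # default when no postscript name is given
        "specials": "",
        "encoding": None,
        "filename": None,
    }
    for match in _TOKEN.finditer(rest):
        quoted, prefix, path, word = match.groups()
        if quoted is not None:
            entry["specials"] = quoted
        elif prefix is not None:
            # Assumption: "<[" or a ".enc" suffix marks an encoding file.
            if prefix == "<[" or path.endswith(".enc"):
                entry["encoding"] = path
            else:
                entry["filename"] = path
        else:
            entry["psname"] = word
    return entry


class LazyFontMap:
    """Map tex font names to entries, parsing each entry on first access."""

    def __init__(self, lines):
        self._unparsed = {}
        self._parsed = {}
        for line in lines:
            line = line.strip()
            # Assumption: these leading characters mark comment lines.
            if not line or line[0] in "%*;#":
                continue
            # Cheap step: the tex name is just the first word.
            # Later duplicates overwrite earlier ones in this sketch.
            self._unparsed[line.split(None, 1)[0]] = line

    def __getitem__(self, texname):
        if texname not in self._parsed:
            # Raises KeyError for unknown fonts, like a plain dict.
            self._parsed[texname] = parse_entry(self._unparsed.pop(texname))
        return self._parsed[texname]


sample = [
    "% a comment line",
    "cmr10 CMR10 <cmr10.pfb",
    'ptmr8r Times-Roman "TeXBase1Encoding ReEncodeFont" <8r.enc <utmr8a.pfb',
    "foo <bar.pfb",
]
fonts = LazyFontMap(sample)
foo = fonts["foo"]  # only this entry has been parsed so far
```

With a real pdftex.map, building the index costs one `split` per line, while the regex-based parsing runs only for the handful of fonts a document actually uses.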