Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster character mapping #5299

Merged
merged 4 commits into from Oct 30, 2015
Merged

Conversation

mdboom
Copy link
Member

@mdboom mdboom commented Oct 22, 2015

This is a follow-on to #5295.

Don't cache the charmap and inverse charmap

mathtext creates Python dictionaries for the charmap and inverse charmap
for each font.  This turns out to be unnecessary:

1) freetype has an API to do a charmap lookup that is faster than a
Python dictionary

2) The inverse charmap isn't really necessary if we convert the
latex_to_bakoma to use unicode character points rather than glyph
indices.

This should have a large impact when #5241 is merged with larger fonts.

@mdboom mdboom added this to the next major release (2.0) milestone Oct 22, 2015
from matplotlib.font_manager import findfont
from matplotlib.ft2font import FT2Font, LOAD_FORCE_AUTOHINT, LOAD_NO_HINTING, \
from matplotlib.font_manager import findfont, get_font
from matplotlib.ft2font import LOAD_FORCE_AUTOHINT, LOAD_NO_HINTING, \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as you are touching this can you get rid of the \

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@tacaswell
Copy link
Member

Did you mean to include the script M commits in this PR?

@mdboom
Copy link
Member Author

mdboom commented Oct 26, 2015

Yes -- I meant to include the M script commit here. It turns out that these changes causes that bug to manifest itself in a different way. So to avoid that failure, I just included the fix here. Now that the fix is merged though, I can take it back out after rebasing.

@mdboom mdboom force-pushed the faster-character-mapping branch 2 times, most recently from 81ff794 to 4b2306d Compare October 26, 2015 16:48
@mdboom
Copy link
Member Author

mdboom commented Oct 27, 2015

Status update: This is ready to go once we turn off Python 2.6 (#5215) and include the file-handle fix (#5295)

@mdboom mdboom force-pushed the faster-character-mapping branch 2 times, most recently from c4af76a to 4f11fb6 Compare October 27, 2015 19:16
This should hopefully address the long-reported "Too many open files"
error message (Fix matplotlib#3315).

To reproduce: On a Mac or Windows box with starvation for file
handles (Linux has a much higher file handle limit by default), build
the docs, then immediately build again.  This will trigger the caching
bug.

The font cache in the mathtext renderer was broken.  It was caching a
font file once for every *combination* of font properties, including
things like size.  Therefore, in a complex math expression containing
many different sizes of the same font, the font file was opened once for
each of those sizes.

Font files are opened and kept open (rather than opened, read,
and closed) so that FreeType only needs to load the actual glyphs that
are used, rather than the entire font.  In an era of cheap memory and
fast disk, it probably doesn't matter for our current fonts, but once
 matplotlib#5214 is merged, we will have larger font files with many more glyphs
and this loading time will matter more.

The solution here is to do all font file loading in one place and to use
`lru_cache` (available since Python 3.2) to do the caching, and to use
only the file name and hinting parameters as a cache key.  For earlier
versions of Python, the functools32 backport package is required.  (Or
we can discuss whether we want to vendor it).
mathtext creates Python dictionaries for the charmap and inverse charmap
for each font.  This turns out to be unnecessary:

1) freetype has an API to do a charmap lookup that is faster than a
Python dictionary

2) The inverse charmap isn't really necessary if we convert the
latex_to_bakoma to use unicode character points rather than glyph
indices.

This should have a large impact when matplotlib#5241 is merged with larger fonts.
@mdboom
Copy link
Member Author

mdboom commented Oct 29, 2015

This is ready for a final review and merge now.

@mdboom
Copy link
Member Author

mdboom commented Oct 30, 2015

I guess the big question here is whether to require or vendor functools32 (required only for Python 2.7).

@tacaswell
Copy link
Member

I vote for require.

latex_to_bakoma = {
r'\oint' : ('cmex10', 45),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume these are the same numbers, just written in octal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, actually.

Here's the gist of this change. TrueType fonts have the concept of charcodes (which loosely corresponds to a Unicode codepoint if it's a Unicode font which most are these days) and glyph indices (which is just an array index into the location of the glyph within a file and mostly arbitrary). Font files contain a very fast N-1 mapping from charcode to gind, and FreeType has an API for this. However, the reverse mapping is not directly available.

Prior to this change, Python dictionaries were created for the forward and reverse mapping, consuming a lot of memory for large fonts and being entirely redundant in the case of ccode to gind.

The latex_to_bakoma mapping used to map from LaTeX name to gind (for reasons that are lost in the sands of time, and are different from every other table in this file). Since there's now no gind to ccode mapping, there was no longer a way to get both gind and ccode from the LaTeX name. This table has now been changed to map LaTeX names to ccode instead (and this was done automatically by a script, so I'm reasonably confident everything is correct).

Sorry for the long explanation. It's all bizarro.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, your last commit message makes sense now

@jenshnielsen
Copy link
Member

I support require too.

tacaswell added a commit that referenced this pull request Oct 30, 2015
@tacaswell tacaswell merged commit aaa34e0 into matplotlib:master Oct 30, 2015
tacaswell added a commit that referenced this pull request Oct 30, 2015
@tacaswell
Copy link
Member

backported to v2.0.x as 454c330

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants