Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop using pickle - deprecating the font caching mechanism + .pkl font files definitions parsing #345

Closed
Lucas-C opened this issue Mar 1, 2022 · 5 comments · Fixed by #347

Comments

@Lucas-C
Copy link
Member

Lucas-C commented Mar 1, 2022

Intent
The pickle module is currently used in fpdf.py to implement a font caching mechanism.
However this library is notoriously dangerous: https://intoli.com/blog/dangerous-pickles/
bandit warned us about it: .banditrc.yml

Solution

  1. Find out if how useful is this font caching mechanism: figure the speed improvement it provides:
    a. in a single Python script execution, with in-memory caching (no impact expected)
    b. when several consecutive calls to a Python script are made (there the cache should have some use)
  2. If it is useful to keep a caching mechanism, implement another one
@Spenhouet
Copy link

I guess this is referring to the font caching file.
Some quick notes on that since it bothered me and I wanted to deactivate it.

The docs here state:

Such metrics are stored using the Python Pickle format (.pkl extension), by default in the font directory

but the docs here state:

Cache files are created in the current folder by default.

where only the second one is correct.

The caching can be disabled via the font_cache_dir param of the FPDF class as explained here.
But the docs state that one can disable it by setting None and that the default value is True.
Setting None results in a type error (only linter, not runtime).
Setting False results in a runtime error.
Feels like this could be improved (or just removed in general as suggested here).

@Lucas-C
Copy link
Member Author

Lucas-C commented Mar 4, 2022

I opened PR #347 to deprecate both the font caching mechanism and prepare deprecation for .pkl font files definitions parsing.

This won't be enough to get rid right now of the pickle, but we should be able to do so in the next non-patch fpdf2 release.

Regarding performances, I did a quick comparison using this test code:

issue_345.py
import fpdf, warnings
from fpdf.ttfonts import TTFontFile


class MyTTFontFile(TTFontFile):
    def getCMAP4(self, unicode_cmap_offset, glyphToChar, charToGlyph):
        TTFontFile.getCMAP4(self, unicode_cmap_offset, glyphToChar, charToGlyph)
        self.saveChar = charToGlyph
    def getCMAP12(self, unicode_cmap_offset, glyphToChar, charToGlyph):
        TTFontFile.getCMAP12(self, unicode_cmap_offset, glyphToChar, charToGlyph)
        self.saveChar = charToGlyph

def main():
    warnings.simplefilter('ignore', UserWarning)
    for font_filename in ("DejaVuSans.ttf", "DroidSansFallback.ttf", "Roboto-Regular.ttf", "cmss12.ttf"):
        font_path = "test/fonts/" + font_filename
        font_name = font_filename.split(".")[0]
        pdf = fpdf.FPDF()
        pdf.add_page()
        pdf.add_font(font_name, fname=font_path, uni=True)
        pdf.set_font(font_name, size=10)
        ttf = MyTTFontFile()
        ttf.getMetrics(font_path)
        # Create a PDF with the first 999 charters defined in the font:
        for counter, character in enumerate(ttf.saveChar, 0):
            pdf.write(8, f"{counter:03}) {character:03x} - {character:c}")
            pdf.ln()
            if counter >= 999:
                break
        pdf.output(f"charmap_first_999_chars-{font_name}.pdf")

The result of running python -mtimeit -n10 --setup 'from issue_345 import main; main()' 'main()' on my machine are then:

  • initial version, with pickle-based cache: 10 loops, best of 5: 866 msec per loop
  • new version, without pickle-based cache: 10 loops, best of 5: 1.17 sec per loop

Which means that without this cache, if we build many different PDF documents from different fpdf.FPDF instances, we loose approximatively 40ms per font file.

This seems fine with me.

@Lucas-C Lucas-C self-assigned this Mar 4, 2022
@Lucas-C Lucas-C changed the title Stop using pickle Stop using pickle - deprecating the font caching mechanism + .pkl font files definitions parsing Mar 4, 2022
@Lucas-C
Copy link
Member Author

Lucas-C commented Mar 4, 2022

@Spenhouet would you kindly review the PR please ? 😊

@Lucas-C
Copy link
Member Author

Lucas-C commented Mar 7, 2022

Deprecation warnings introduced in PR #347 have been released in new patch version v2.5.1

Next step will be to fully remove the dependency on the pickle module in the next new minor version

Lucas-C added a commit that referenced this issue Apr 23, 2022
@Lucas-C
Copy link
Member Author

Lucas-C commented Apr 23, 2022

I opened #402 to finally get rid of the dependency on pickle in the next release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants