Attempt to further reduce re-parsing for globally cached images (PR 11912, 16108 follow-up) #17428

Merged 3 commits on Dec 22, 2023

Commits on Dec 21, 2023

  1. Add iteration support in the PDFObjects class

    This (obviously) only includes "resolved" data, and will be used in an upcoming patch.
    Snuffleupagus committed Dec 21, 2023
    b09f238
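To illustrate what iterating over only "resolved" data could look like, here is a minimal sketch. The `ObjectStore` class, its `reserve`/`resolve` methods, and the object ids are hypothetical stand-ins for illustration; they are not taken from the actual `PDFObjects` implementation.

```javascript
// Hypothetical stand-in for an object store; NOT the actual PDFObjects class.
class ObjectStore {
  #objs = new Map();

  // Register an id that has no data yet (an "unresolved" entry).
  reserve(objId) {
    if (!this.#objs.has(objId)) {
      this.#objs.set(objId, { data: undefined, resolved: false });
    }
  }

  // Attach data to an id and mark the entry as resolved.
  resolve(objId, data) {
    this.#objs.set(objId, { data, resolved: true });
  }

  // Yield [objId, data] pairs, skipping entries that are still unresolved.
  *[Symbol.iterator]() {
    for (const [objId, { data, resolved }] of this.#objs) {
      if (resolved) {
        yield [objId, data];
      }
    }
  }
}

const store = new ObjectStore();
store.reserve("img_p0_1"); // Requested, but no data has arrived yet.
store.resolve("img_p0_2", { width: 10, height: 20 });

// Only the resolved entry is visited by the iterator.
const resolvedIds = [...store].map(([objId]) => objId);
```

Skipping unresolved entries means a consumer never has to check whether an entry's data is actually available before using it.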
  2. Compute the length of the final image-bitmap/data on the worker-thread

    Currently this is done in the API, but moving it into the worker-thread will simplify upcoming changes.
    Snuffleupagus committed Dec 21, 2023
    e547b19
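As a rough sketch of what computing the final length next to the image data might involve: the helper name `computeDataLen`, the shape of `imgData`, and the assumption of 4 bytes per pixel (RGBA) for bitmaps are all illustrative and not confirmed by the commit message.

```javascript
// Hypothetical helper; the field names and the RGBA (4 bytes per pixel)
// assumption for bitmaps are illustrative only.
function computeDataLen(imgData) {
  if (imgData.bitmap) {
    // A decoded bitmap exposes no byte array, so estimate from its dimensions.
    return imgData.width * imgData.height * 4;
  }
  // Otherwise use the length of the raw pixel data, if there is any.
  return imgData.data ? imgData.data.length : 0;
}

const fromBitmap = computeDataLen({ bitmap: {}, width: 100, height: 50 });
const fromBytes = computeDataLen({
  data: new Uint8ClampedArray(300),
  width: 10,
  height: 10,
});
```

Storing the result alongside the image when it is produced means every later consumer can read the same value instead of recomputing it.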
  3. Attempt to further reduce re-parsing for globally cached images (PR 11912, 16108 follow-up)
    
    In PR 11912 we started globally caching images that occur on multiple pages, which significantly improved performance in many PDF documents.
    However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.
    
    Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.
    
    For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
     - With the `master`-branch it takes >600 ms to render.
     - With this patch that goes down to ~50 ms, which is roughly one order of magnitude faster.
    
    (Note that all other pages are, as expected, completely unaffected by these changes.)
    
    This new main-thread copying is limited to "large" global images, since:
     - Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
     - With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
     - The copying forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
    Snuffleupagus committed Dec 21, 2023
    9f02cc3
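The main-thread copy described in the third commit can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patch itself: `copyLocalImage`, the `ref` and `dataLen` fields, the use of plain `Map` caches, and the `MIN_COPY_BYTES` threshold value are all hypothetical.

```javascript
// All names and the threshold value are hypothetical, for illustration only.
const MIN_COPY_BYTES = 1_000_000;

// Look for `imageRef` in the page-level caches. If a sufficiently large copy
// is found, place it in the global cache and return its byte length.
// Return null when the caller should fall back to re-parsing the image.
function copyLocalImage(pageCaches, globalCache, imageRef, globalId) {
  for (const pageCache of pageCaches) {
    for (const [, data] of pageCache) {
      if (data?.ref !== imageRef) {
        continue;
      }
      if (!data.dataLen || data.dataLen < MIN_COPY_BYTES) {
        return null; // Small image: re-parsing is assumed to be cheap enough.
      }
      globalCache.set(globalId, { ...data }); // Shallow copy of the entry.
      return data.dataLen;
    }
  }
  return null; // Not cached at the page level, e.g. cleanup has already run.
}

const page1 = new Map([["img_p0_1", { ref: "12R", dataLen: 4_000_000 }]]);
const page2 = new Map([["img_p1_1", { ref: "15R", dataLen: 2_000 }]]);
const globalCache = new Map();

const copiedLarge = copyLocalImage([page1, page2], globalCache, "12R", "g_img_1");
const copiedSmall = copyLocalImage([page1, page2], globalCache, "15R", "g_img_2");
const copiedMissing = copyLocalImage([page1, page2], globalCache, "99R", "g_img_3");
```

In this sketch only the large image ends up in `globalCache`. The small image and the missing image both return `null`, which corresponds to the cases where re-parsing remains the fallback.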