Allow to disable the images cache to reduce the memory footprint #1153

Open
ggrossetie opened this issue Mar 19, 2020 · 3 comments

@ggrossetie

Currently, Prawn is using an image registry to "cache" images:

prawn/lib/prawn/images.rb

Lines 84 to 107 in d980247

def build_image_object(file)
  image_content = verify_and_read_image(file)
  image_sha1 = Digest::SHA1.hexdigest(image_content)

  # if this image has already been embedded, just reuse it
  if image_registry[image_sha1]
    info = image_registry[image_sha1][:info]
    image_obj = image_registry[image_sha1][:obj]
  else
    # Build the image object
    info = Prawn.image_handler.find(image_content).new(image_content)

    # Bump PDF version if the image requires it
    if info.respond_to?(:min_pdf_version)
      renderer.min_version(info.min_pdf_version)
    end

    # Add the image to the PDF and register it in case we see it again.
    image_obj = info.build_pdf_object(self)
    image_registry[image_sha1] = { obj: image_obj, info: info }
  end

  [image_obj, info]
end
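
To make the mechanism concrete, here is a minimal standalone sketch of a content-addressed registry like the one above. `ToyRegistry`, `build` and the placeholder values are hypothetical names for illustration only, not Prawn's code, and the expensive PNG decoding is replaced by a stand-in. The thing to notice is that the decoded `info` stays referenced by the registry for as long as the registry lives, even for an image that is only used once.

```ruby
require 'digest'

# Hypothetical stand-in for the document's image registry (not Prawn's API).
class ToyRegistry
  attr_reader :entries, :builds

  def initialize
    @entries = {} # SHA1 hex digest => { obj:, info: }
    @builds = 0   # number of times the expensive path ran
  end

  def build(image_content)
    sha1 = Digest::SHA1.hexdigest(image_content)
    entry = @entries[sha1]
    # Same bytes seen before: reuse the stored object and info.
    return [entry[:obj], entry[:info]] if entry

    @builds += 1
    info = { decoded: image_content.bytes } # placeholder for decoded image data
    obj = "pdf-object-#{@builds}"           # placeholder for the PDF object
    @entries[sha1] = { obj: obj, info: info }
    [obj, info]
  end
end

registry = ToyRegistry.new
first_obj, = registry.build('fake png bytes')
second_obj, = registry.build('fake png bytes')  # served from the registry
other_obj, = registry.build('different bytes')  # built and retained as well
```

After these three calls the registry holds two entries, and both keep their `info` alive.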

Here's the memory consumption for a single 321.2 kB PNG image (2144x784 pixels), measured with the following Ruby code:

require 'memory_profiler'
require 'prawn'

report = MemoryProfiler.report(top: 5) do
  Prawn::Document.generate('test.pdf') do
    image('/path/to/image.png')
  end
end

report.pretty_print
With cache
Total allocated: 160712565 bytes (3367976 objects)
Total retained:  16736057 bytes (158 objects)

allocated memory by gem
-----------------------------------
 150102211  prawn-2.2.2
  10610274  pdf-core-0.7.0
        80  other

allocated memory by file
-----------------------------------
 150080338  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb
   9662518  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/stream.rb
    408056  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/reference.rb
    281890  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/pdf_object.rb
    237170  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/filters.rb

allocated memory by location
-----------------------------------
  67235840  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:292
  67235840  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:293
   9437600  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/stream.rb:42
   7776841  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:93
   5043761  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:275

allocated memory by class
-----------------------------------
 160578221  String
     96656  Array
     21576  Hash
      8656  File
      2304  Thread

allocated objects by gem
-----------------------------------
   3363709  prawn-2.2.2
      4265  pdf-core-0.7.0
         2  other

allocated objects by file
-----------------------------------
   3363548  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb
      3846  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/pdf_object.rb
       116  /path/to/gems/prawn-2.2.2/lib/prawn/document.rb
       114  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/page.rb
       113  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/reference.rb

allocated objects by location
-----------------------------------
   1680896  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:292
   1680896  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:293
      2037  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/pdf_object.rb:77
       784  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:288
       784  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:289

allocated objects by class
-----------------------------------
   3366001  String
      1827  Array
        75  Hash
        11  PDF::Core::FilterList
        11  PDF::Core::Reference

retained memory by gem
-----------------------------------
   9685706  pdf-core-0.7.0
   7050351  prawn-2.2.2

retained memory by file
-----------------------------------
   9438400  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/stream.rb
   7047750  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb
    235554  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/filters.rb
      6040  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/page.rb
      2512  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/object_store.rb

retained memory by location
-----------------------------------
   9437520  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/stream.rb:42
   5043513  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:275
   1681721  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:279
    321276  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:37
    235554  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/filters.rb:16

retained memory by class
-----------------------------------
  16721641  String
      8584  Hash
      2400  Array
       880  PDF::Core::Reference
       456  Class

retained objects by gem
-----------------------------------
       131  pdf-core-0.7.0
        27  prawn-2.2.2

retained objects by file
-----------------------------------
        34  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/page.rb
        28  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/stream.rb
        16  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/object_store.rb
        16  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/pdf_object.rb
        13  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/filter_list.rb

retained objects by location
-----------------------------------
        11  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/filter_list.rb:5
        11  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/object_store.rb:59
        11  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/reference.rb:18
        11  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/stream.rb:15
         8  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/pdf_object.rb:77

retained objects by class
-----------------------------------
        44  Array
        43  String
        25  Hash
        11  PDF::Core::FilterList
        11  PDF::Core::Reference
Without cache
Total allocated: 160711900 bytes (3367970 objects)
Total retained:  10019166 bytes (154 objects)

allocated memory by gem
-----------------------------------
 150101546  prawn-2.2.2
  10610274  pdf-core-0.7.0
        80  other

allocated memory by file
-----------------------------------
 150080338  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb
   9662518  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/stream.rb
    408056  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/reference.rb
    281890  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/pdf_object.rb
    237170  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/filters.rb

allocated memory by location
-----------------------------------
  67235840  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:292
  67235840  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:293
   9437600  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/stream.rb:42
   7776841  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:93
   5043761  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:275

allocated memory by class
-----------------------------------
 160578060  String
     96656  Array
     21112  Hash
      8656  File
      2304  Thread

allocated objects by gem
-----------------------------------
   3363703  prawn-2.2.2
      4265  pdf-core-0.7.0
         2  other

allocated objects by file
-----------------------------------
   3363548  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb
      3846  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/pdf_object.rb
       116  /path/to/gems/prawn-2.2.2/lib/prawn/document.rb
       114  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/page.rb
       113  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/reference.rb

allocated objects by location
-----------------------------------
   1680896  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:292
   1680896  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:293
      2037  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/pdf_object.rb:77
       784  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:288
       784  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:289

allocated objects by class
-----------------------------------
   3365998  String
      1827  Array
        73  Hash
        11  PDF::Core::FilterList
        11  PDF::Core::Reference

retained memory by gem
-----------------------------------
   9685738  pdf-core-0.7.0
    333428  prawn-2.2.2

retained memory by file
-----------------------------------
   9438400  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/stream.rb
    323052  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb
    235554  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/filters.rb
     10256  /path/to/gems/prawn-2.2.2/lib/prawn/document.rb
      6040  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/page.rb

retained memory by location
-----------------------------------
   9437520  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/stream.rb:42
    321276  /path/to/gems/prawn-2.2.2/lib/prawn/images/png.rb:37
    235554  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/filters.rb:16
      8464  /path/to/gems/prawn-2.2.2/lib/prawn/document.rb:386
      2784  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/page.rb:43

retained memory by class
-----------------------------------
   9996286  String
      8424  File
      8120  Hash
      2400  Array
       880  PDF::Core::Reference

retained objects by gem
-----------------------------------
       131  pdf-core-0.7.0
        23  prawn-2.2.2

retained objects by file
-----------------------------------
        34  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/page.rb
        28  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/stream.rb
        16  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/object_store.rb
        15  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/pdf_object.rb
        13  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/filter_list.rb

retained objects by location
-----------------------------------
        11  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/filter_list.rb:5
        11  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/object_store.rb:59
        11  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/reference.rb:18
        11  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/stream.rb:15
         8  /path/to/gems/pdf-core-0.7.0/lib/pdf/core/pdf_object.rb:77

retained objects by class
-----------------------------------
        44  Array
        39  String
        23  Hash
        11  PDF::Core::FilterList
        11  PDF::Core::Reference

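As you can see, the retained memory goes from 16736057 bytes (about 16.7 MB) with the cache to 10019166 bytes (about 10 MB) without it. The total allocated memory is essentially unchanged at about 160 MB.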
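
For reference, the saving can be worked out directly from the two "Total retained" lines in the reports above. A small calculation, with variable names that are only illustrative:

```ruby
# Figures copied from the "Total retained" lines of the two reports.
retained_with_cache = 16_736_057
retained_without_cache = 10_019_166

saved_bytes = retained_with_cache - retained_without_cache
saved_percent = (saved_bytes.to_f / retained_with_cache * 100).round(1)

puts "saved: #{saved_bytes} bytes (#{saved_percent}%)"
# prints "saved: 6716891 bytes (40.1%)"
```

Almost all of that difference comes from png.rb, whose retained memory drops from 7047750 to 323052 bytes in the "retained memory by file" tables.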

On a large PDF document (300 pages with 60 images), the memory consumption goes from more than 8 GB to 5 GB.
Surprisingly, the process is also faster! I don't know the exact reason, but it might be because we do less work (i.e. we do not compute a hexdigest for every image) and because there is less memory pressure.

I guess it makes sense to use a cache when an image is used on every page (for instance, a logo in the header or footer), but in practice it can do more harm than good when most images are used only once.

Maybe we should add an option to decide if we want to cache the image or not? And/or a global flag to enable/disable the cache?
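
As a rough sketch of what such an option could look like: the `cache:` keyword and the `cache_images` flag below are hypothetical, as is the `SketchDocument` class itself. None of them exist in Prawn, and the image building is a placeholder. A per-call value overrides the document-wide default, and the uncached path skips the digest entirely.

```ruby
require 'digest'

# Hypothetical sketch only: `cache:` and `cache_images` are not Prawn options.
class SketchDocument
  attr_accessor :cache_images
  attr_reader :image_registry

  def initialize(cache_images: true)
    @cache_images = cache_images # document-wide default
    @image_registry = {}
  end

  # cache: nil means "use the document-wide default".
  def build_image_object(image_content, cache: nil)
    use_cache = cache.nil? ? cache_images : cache
    # Uncached path: no digest is computed and nothing is stored.
    return build_uncached(image_content) unless use_cache

    sha1 = Digest::SHA1.hexdigest(image_content)
    image_registry[sha1] ||= build_uncached(image_content)
  end

  private

  def build_uncached(image_content)
    info = { bytes: image_content.bytesize } # placeholder for decoded image data
    obj = Object.new                         # placeholder for the PDF object
    [obj, info]
  end
end

doc = SketchDocument.new
logo_a, = doc.build_image_object('logo bytes')               # cached
logo_b, = doc.build_image_object('logo bytes')               # reused
photo, = doc.build_image_object('photo bytes', cache: false) # never stored

no_cache_doc = SketchDocument.new(cache_images: false)
no_cache_doc.build_image_object('logo bytes')
```

With this shape, a logo repeated on every page can keep the current behaviour while one-off images opt out.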

@ggrossetie

ggrossetie commented Mar 26, 2020

Maybe we should add an option to decide if we want to cache the image or not? And/or a global flag to enable/disable the cache?

We could also make the maximum cache size configurable, and/or only store an image in the cache when it is smaller than a threshold of X bytes.
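
A minimal sketch of both ideas combined, assuming a simple "evict the oldest entry" policy (the policy, the class name `BoundedImageCache` and its parameters are assumptions made for illustration, and `max_entries` is assumed to be at least 1):

```ruby
require 'digest'

# Hypothetical bounded cache: limits the number of entries and skips images
# larger than a byte threshold. Not part of Prawn.
class BoundedImageCache
  def initialize(max_entries:, max_image_bytes:)
    @max_entries = max_entries
    @max_image_bytes = max_image_bytes
    @entries = {} # Ruby hashes keep insertion order, so the first key is the oldest
  end

  # Returns the cached value for this content, or runs the block to build it.
  def fetch(image_content)
    # Images above the threshold are never cached.
    return yield if image_content.bytesize > @max_image_bytes

    key = Digest::SHA1.hexdigest(image_content)
    return @entries[key] if @entries.key?(key)

    value = yield
    @entries.shift if @entries.size >= @max_entries # evict the oldest entry
    @entries[key] = value
  end

  def size
    @entries.size
  end
end

cache = BoundedImageCache.new(max_entries: 2, max_image_bytes: 1_000_000)
builds = 0
2.times { cache.fetch('small image') { builds += 1 } } # built once, then reused
cache.fetch('x' * 2_000_000) { builds += 1 }           # too large, not stored
```

One caveat with eviction: dropping an entry only releases what the cache holds. An image that was already embedded stays in the PDF, so a later repeat of an evicted image would presumably be embedded a second time.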

Let me know what you think.

@Fustrate

I ran into a somewhat related problem, where adding the same image multiple times was reading that file from disk every time in order to hash its content and determine if it was already added. My way of solving the problem was to add an explicit cached_image method, and since I saw this issue, I also added an uncached_image method that skips any caching logic.

https://gist.github.com/Fustrate/cf1b3ce8b227e385287963c23edb8c72#file-prawn-images-rb

My use case is Rails-related, so I put this in an initializer, but it's framework agnostic. Note that this won't work out of the box with images inside groups from the prawn-grouping gem - that gem pretty much needs to be rewritten to work this way, so there's an included prawn-grouping.rb file that I use.

If you still have this problem, and you're only interested in skipping the cache, you only need the uncached_image and cacheable_image_data methods. They're just the original methods with any caching logic stripped out.
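
The path-keyed idea described in this comment can be sketched roughly as follows. This is not the code from the gist: `PathKeyedImages` and `load_image` are hypothetical, and only the method names `cached_image` and `uncached_image` are borrowed from the description above. The sketch assumes that keying by path is acceptable because the file does not change while the document is being generated.

```ruby
require 'tempfile'

# Hypothetical sketch: cache by file path so a repeated image is read once.
class PathKeyedImages
  attr_reader :reads

  def initialize
    @by_path = {}
    @reads = 0 # counts how often the file is actually read from disk
  end

  # Reads the file only the first time a given path is seen.
  def cached_image(path)
    @by_path[path] ||= load_image(path)
  end

  # Always reads from disk and stores nothing.
  def uncached_image(path)
    load_image(path)
  end

  private

  def load_image(path)
    @reads += 1
    File.binread(path)
  end
end

file = Tempfile.new(['logo', '.png'])
file.binmode
file.write('fake image bytes')
file.flush

images = PathKeyedImages.new
3.times { images.cached_image(file.path) }
reads_after_cached = images.reads   # one disk read for three uses
images.uncached_image(file.path)
reads_after_uncached = images.reads # the uncached call reads again
file.close!
```

Compared with hashing the content, this avoids both the disk read and the digest on repeats, at the cost of not detecting two different paths that point to identical bytes.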

@ggrossetie

Thanks for sharing ☺️
That was my idea but I want to hear from the maintainer(s) before implementing this feature.
I don't really want to "monkey patch" the methods, I would prefer an option to enable/disable this mechanism (or some sort of configuration).
