Reduce memory allocations and enhance performance #90
Conversation
Thank you for your contribution. I'm not quite sure that object allocations are a good metric to optimize. How are performance and memory usage affected?
All created objects must eventually be collected by the garbage collector. The more short-lived objects are created, the more work the garbage collector has to do. So reducing object allocations leads to better performance, although the gains may not be as large as for other performance-related changes. In this case we are reducing a huge number of allocations, which will be performance relevant:
As you can see, the TTF version of the benchmark is still much slower than the one using built-in PDF fonts. But in comparison to prawn 2.1 - 2.4 we are much faster when applying this patch. Max memory usage is not affected because the garbage collector can and does remove the short-lived objects quickly.
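To make the "short-lived objects" point concrete, here is a minimal sketch of how allocations of a snippet can be counted with GC.stat, in the spirit of the measurements discussed in this PR. The helper name allocated_objects is illustrative, not part of ttfunk or the benchmark code.

```ruby
# Rough allocation counter for a snippet (helper name is hypothetical).
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Each iteration creates a short-lived String that the garbage collector
# must later collect, even though peak memory usage barely changes.
n = allocated_objects { 10_000.times { |i| i.to_s } }
```

Since every `i.to_s` allocates a fresh String, `n` comes out at 10,000 or more, which the GC then has to clean up.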
@@ -17,12 +17,16 @@ def [](subset)
    def use(characters)
      characters.each do |char|
        covered = false
        @subsets.each_with_index do |subset, _i|
Why are we even using each_with_index if we never use the index anywhere? The while loop is fine, but can we get the same allocation wins with a plain ol' each?
As far as I know, the reason for the allocation is the block itself. So if we used each instead of each_with_index there would still be an allocation. Furthermore, there is a non-negligible performance hit when using a block instead of a plain while loop, due to the block invocation. So apart from saving the allocation, the while loop is also faster, which is especially visible in such an often-called inner loop.
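The rewrite pattern being discussed can be sketched as follows. This is a simplified stand-in, not the actual SubsetCollection#use code: it only shows the shape of replacing block-based iteration with a manually indexed while loop.

```ruby
subsets = ["a", "b", "c"]  # stand-in for @subsets

# Block form, as in the original code (index unused, hence _i):
found_block = nil
subsets.each_with_index do |subset, _i|
  found_block = subset if subset == "b"
end

# while form, as in the patch: index manually, so there is no block
# to allocate and no per-element block invocation in the hot loop.
found_while = nil
i = 0
while i < subsets.length
  subset = subsets[i]
  found_while = subset if subset == "b"
  i += 1
end
```

Both variants find the same element; the while form simply trades a little verbosity for fewer allocations and less call overhead per iteration.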
That's super interesting; I wonder why blocks cause an allocation. I do vaguely remember hearing something about that at a RailsConf one time, though.
Yes, there was a talk some time ago that went through (nearly) all the Enumerable methods to show their time/space complexity - I also don't remember its title. However, there is a great talk by Jeremy Evans about optimizations in Roda/Sequel: http://code.jeremyevans.net/presentations/rubykaigi2019/index.html#83 (this links directly to the "avoid proc allocation" slide).
I ran a benchmark using Prawn together with a TrueType font handled by ttfunk which allocated a staggering 107 million objects (compared to 1.6 million objects for the same benchmark using HexaPDF). A 5min investigation revealed two spots that, when optimized, reduced the number to about 5 million objects.
* TTFunk::Subset::CodePage#from_unicode: Don't create an array and use #pack to convert a codepoint to a character; just add the codepoint directly, saving an array allocation. Furthermore, the created string can be modified using #encode! since it is thrown away anyway. Last, since the mapping is static, use an internal cache for the mapping.
* TTFunk::SubsetCollection#use: The inner loop is called many times. By using a while loop instead of an iterator with a block we avoid allocating and calling the block.
a82a91c to caebe64 (force-push)
@camertron @pointlessone Together with the other two pull requests and after applying the caching fix as per @camertron, total allocations are down to around 4.4 million objects. The reduction of 96% of the allocated objects led to a runtime decrease for the benchmark in question (HexaPDF raw_text benchmark 10x with TrueType font) of 64% (from 19.8 seconds down to 7.2 seconds).
I ran a benchmark using Prawn together with a TrueType font handled by ttfunk which allocated a staggering 107 million objects (compared to 1.6 million objects for the same benchmark using HexaPDF). A 5min investigation revealed two spots that, when optimized, reduced the number to about 32 million objects.
* TTFunk::Subset::CodePage#from_unicode: Don't create an array and use #pack to convert a codepoint to a character; just add the codepoint directly, saving an array allocation. Furthermore, the created string can be modified using #encode! since it is thrown away anyway.
* TTFunk::SubsetCollection#use: The inner loop is called many times. By using a while loop instead of an iterator with a block we avoid allocating and calling the block.
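The from_unicode change described above can be illustrated with a minimal sketch. This is not the actual ttfunk code; it assumes Windows-1252 as the code page and just contrasts the array-plus-#pack conversion with a direct Integer#chr conversion followed by an in-place #encode!.

```ruby
codepoint = 0x20AC  # U+20AC, the euro sign

# Before: wrap the codepoint in a one-element array and #pack it,
# allocating an Array plus an intermediate String before encoding.
before = [codepoint].pack('U*').encode('Windows-1252')

# After (sketch): convert the codepoint directly and mutate the
# throwaway string in place with #encode!, saving the array allocation.
after = codepoint.chr(Encoding::UTF_8).encode!('Windows-1252')
```

Both paths produce the same Windows-1252 byte, so the cheaper variant is a drop-in replacement on this code path.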