
Reduce memory allocations and enhance performance #90

Merged
merged 1 commit on Jan 21, 2021

Conversation

gettalong
Member

I ran a benchmark using Prawn together with a TrueType font handled by
ttfunk which allocated a staggering 107 million objects (compared to 1.6
million objects for the same benchmark using HexaPDF).

A five-minute investigation revealed two spots that, when optimized, reduced the
number to about 32 million objects.

  • TTFunk::Subset::CodePage#from_unicode:

    Don't create an array and use #pack to convert a codepoint to a
    character, just add the codepoint directly, saving an array allocation.

    Furthermore, the created string can be modified using #encode! since
    it is thrown away anyway.

  • TTFunk::SubsetCollection#use:

    The inner loop is called many times. By using a while loop instead of
    an iterator with a block we avoid allocating and calling the block.
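The first change can be sketched roughly like this (a hedged illustration based on the description above, not the actual ttfunk source; method names and the ISO-8859-1 target encoding are assumptions):

```ruby
# Illustrative sketch of the CodePage#from_unicode change described
# above. Method names and the ISO-8859-1 target are assumptions for
# demonstration, not the real ttfunk code.

# Before: wraps the codepoint in an Array just to call #pack, then
# allocates another String via the non-destructive #encode.
def from_unicode_via_pack(codepoint)
  [codepoint].pack('U*').encode('ISO-8859-1')
end

# After: append the codepoint to a string directly (String#<< accepts
# an Integer codepoint), and convert in place with #encode! since the
# intermediate string is thrown away anyway.
def from_unicode_direct(codepoint)
  (+'' << codepoint).encode!('ISO-8859-1')
end
```

Both variants produce the same bytes; the second simply skips the intermediate Array and the extra String.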

@pointlessone
Member

Thank you for your contribution.

I'm not quite sure object allocations are a good metric to optimize for. How are performance and memory usage affected?

@gettalong
Member Author

All created objects must eventually be collected by the garbage collector. The more short-lived objects are created, the more work the garbage collector has to do. So reducing object allocations leads to better performance, although the gains may not be as large as for other performance-related changes.

In this case we are reducing a huge amount of allocations which will be performance relevant:

|-------------|-----------|----------|------------|-------------|
| Version     | Benchmark |     Time |     Memory |   File size |
|-------------|-----------|----------|------------|-------------|
| prawn 2.1.0 | 10x       |  2.824ms |  71.804KiB |   5.861.065 |
| prawn 2.2.0 | 10x       |  2.935ms |  76.452KiB |   6.170.089 |
| prawn 2.3.0 | 10x       |  3.658ms |  74.956KiB |   6.170.089 |
| prawn       | 10x       |  4.758ms |  74.928KiB |   6.170.089 |
| prawn-dev   | 10x       |  4.804ms |  74.924KiB |   6.170.089 |
|-------------|-----------|----------|------------|-------------|
| prawn 2.1.0 | 10x ttf   |  8.129ms |  74.512KiB |   5.868.049 |
| prawn 2.2.0 | 10x ttf   | 17.205ms |  76.976KiB |   6.177.034 |
| prawn 2.3.0 | 10x ttf   | 18.104ms |  77.664KiB |   6.177.032 |
| prawn       | 10x ttf   | 19.017ms |  77.864KiB |   6.177.032 |
| prawn-dev   | 10x ttf   | 14.207ms |  77.712KiB |   6.177.032 |
|-------------|-----------|----------|------------|-------------|

As you can see, the TTF version of the benchmark is still much slower than the one using built-in PDF fonts. But in comparison to prawn 2.1 - 2.4, we are much faster when applying this patch.

Max memory usage is not affected because the garbage collector can and does remove the short-lived objects quickly.
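Claims like this are easy to check directly: `GC.stat(:total_allocated_objects)` is cumulative, so the difference around a block counts the objects that block allocated. A minimal sketch (the iteration count and codepoint are chosen arbitrarily for illustration):

```ruby
# Count objects allocated while running a block. GC is disabled during
# measurement so counts are not perturbed by a collection mid-run.
def allocations
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

# Array + #pack allocates at least two objects per iteration (the
# Array literal and the packed String); appending the codepoint to a
# fresh String allocates only the String.
via_pack = allocations { 10_000.times { [0xE9].pack('U*') } }
direct   = allocations { 10_000.times { String.new << 0xE9 } }

puts "pack: #{via_pack}, direct: #{direct}"
```

The exact numbers vary by Ruby version, but the pack variant should consistently allocate more objects than the direct append.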

lib/ttfunk/subset/code_page.rb (review comments resolved)
@@ -17,12 +17,16 @@ def [](subset)
def use(characters)
characters.each do |char|
covered = false
@subsets.each_with_index do |subset, _i|
Member

Why are we even using each_with_index if we never use the index anywhere? The while loop is fine, but can we get the same allocation wins with a plain ol' each?

Member Author

As far as I know, the reason for the allocation is the block itself. So if we used each instead of each_with_index, there would still be an allocation.

Furthermore, there is a non-negligible performance hit when using a block instead of a plain while loop, due to the block invocation. So apart from saving the allocation, the while loop is also faster, which is especially visible in such an often-called inner loop.
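The shape of the rewrite is roughly the following (a hedged sketch, not the actual SubsetCollection#use code; the real method also tracks which subset covered the character, and the names here are illustrative):

```ruby
# Iterator version: the block is entered once per subset for every
# character, which adds up in a hot inner loop.
def covered_by_any_with_each?(subsets, char)
  covered = false
  subsets.each do |subset|
    covered = true if subset.include?(char)
  end
  covered
end

# while version: same logic with manual indexing, no block invocation
# per element (and an early exit as a bonus).
def covered_by_any_with_while?(subsets, char)
  i = 0
  while i < subsets.length
    return true if subsets[i].include?(char)
    i += 1
  end
  false
end
```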

Member

That's super interesting, I wonder why blocks cause an allocation? I do vaguely remember hearing something about that at a RailsConf one time though.

Member Author

Yes, there was a talk some time ago that went through (nearly) all the Enumerable methods to show their time/space complexity - I also don't remember its title. However, there is a great talk by Jeremy Evans about optimizations in Roda/Sequel: http://code.jeremyevans.net/presentations/rubykaigi2019/index.html#83 (this links directly to the "avoid proc allocation" slide).

I ran a benchmark using Prawn together with a TrueType font handled by
ttfunk which allocated a staggering 107 million objects (compared to 1.6
million objects for the same benchmark using HexaPDF).

A five-minute investigation revealed two spots that, when optimized, reduced the
number to about 5 million objects.

* TTFunk::Subset::CodePage#from_unicode:

  Don't create an array and use #pack to convert a codepoint to a
  character, just add the codepoint directly, saving an array allocation.

  Furthermore, the created string can be modified using #encode! since
  it is thrown away anyway.

  Last, since the mapping is static use an internal cache for the
  mapping.

* TTFunk::SubsetCollection#use:

  The inner loop is called many times. By using a while loop instead of
  an iterator with a block we avoid allocating and calling the block.
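The caching point from the commit message might look something like this in practice (purely illustrative; the actual cache in ttfunk may be structured differently, and the module and method names here are assumptions):

```ruby
# Since the Unicode -> code page mapping is static for a given
# encoding, each conversion can be computed once and reused. This is
# an assumed sketch, not the real ttfunk implementation.
module CodePageMappingCache
  CACHE = {} # encoding name => { codepoint => encoded String or nil }

  def self.from_unicode(encoding, codepoint)
    mapping = (CACHE[encoding] ||= {})
    return mapping[codepoint] if mapping.key?(codepoint)

    mapping[codepoint] =
      begin
        (+'' << codepoint).encode(encoding)
      rescue Encoding::UndefinedConversionError
        nil # codepoint has no representation in this code page
      end
  end
end
```

Repeated lookups for the same codepoint then return the cached string instead of allocating a new one each time.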
@gettalong
Member Author

@camertron @pointlessone Together with the other two pull requests, and after applying the caching fix as per @camertron, total allocations are down to around 4.4 million objects. This 96% reduction in allocated objects led to a 64% runtime decrease for the benchmark in question (HexaPDF raw_text benchmark, 10x with TrueType font), from 19.8 seconds down to 7.2 seconds.
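For reference, the quoted percentages follow directly from the raw figures above:

```ruby
# Sanity-checking the quoted figures: 107M -> 4.4M allocations,
# 19.8s -> 7.2s runtime.
alloc_reduction  = (1 - 4.4 / 107.0) * 100 # ~95.9%, i.e. the "96%"
runtime_decrease = (1 - 7.2 / 19.8) * 100  # ~63.6%, i.e. the "64%"
```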
