Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce regexps computed for parsing kramdown lists #596

Closed
wants to merge 5 commits into from

Conversation

ashmaroli
Copy link
Contributor

Rationale

Interpolated regular expressions within a frequently called method allocates memory for each generated expression. When possible, interpolation can be limited to just once by using the /o flag. But that is not possible in this situation.

While profiling the build for Jekyll's documentation site (via memory_profiler), I came across this hot path where the regex /^( {0,1}[+*-])([\t| ].*?\n)/ was allocated a significant amount of times closely followed by /^( {0,2}[+*-])([\t| ].*?\n)/ among a few others.

If it is of any relevance, the summary from the profiling is:

Before

Total allocated: 158.09 MB (1168556 objects)
Total retained:  10.3 MB (92547 objects)

After

Total allocated: 157.29 MB (1163537 objects)
Total retained:  10.3 MB (92556 objects)

@gettalong
Copy link
Owner

@ashmaroli Could you profile/benchmakr this change again? The change is quite large for a ~0,4% win on allocated objects.

@ashmaroli
Copy link
Contributor Author

Using the following script to profile the changes:

# frozen_string_literal: true

require 'memory_profiler'
require 'bundler/setup'
require 'kramdown'

CONTENT = (<<~TEXT * 500)
  Once upon a time in a far away land

  * hello
  * world

  And this other time it was

    * hello again
    * worldwide

TEXT

class Profiler
  def self.profile(label, &block)
    report = MemoryProfiler.report(&block)
    total_allocated_output = report.scale_bytes(report.total_allocated_memsize)
    total_retained_output  = report.scale_bytes(report.total_retained_memsize)
    puts "Total allocated: #{total_allocated_output} (#{report.total_allocated} objects)"
    puts "Total retained:  #{total_retained_output} (#{report.total_retained} objects)"
    report.pretty_print(to_file: "#{label}-memprof.tmp", scale_bytes: true )
  end
end

Profiler.profile(ARGV[0]) { Kramdown::Document.new(CONTENT).to_html }
ruby profile.rb master

Total allocated: 29.11 MB (220240 objects)
Total retained:  840.45 kB (2971 objects)

and on this branch (with updates on master pulled)

ruby profile.rb pull-request

Total allocated: 26.07 MB (202381 objects)
Total retained:  843.01 kB (2988 objects)

@gettalong
Copy link
Owner

@ashmaroli Thanks for the profiling information!

So it seems that for very heavy usage of lists (nearly only lists), there can be a win of about 10% on allocated objects but there is no difference in execution as far as I could measure.

I also profiled some "standard" kramdown documents and there the win is also only 0,4% , like with your original profiling stats.

Nonetheless, I have merged your changes! Thanks again!

@gettalong gettalong closed this Apr 21, 2020
@ashmaroli ashmaroli deleted the reduce-computed-regexps branch April 22, 2020 05:58
@ashmaroli
Copy link
Contributor Author

Thank you for merging :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants