Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce regexps computed for parsing kramdown lists #596

wants to merge 5 commits into from


Copy link

@ashmaroli ashmaroli commented May 6, 2019


Interpolated regular expressions within a frequently called method allocates memory for each generated expression. When possible, interpolation can be limited to just once by using the /o flag. But that is not possible in this situation.

While profiling the build for Jekyll's documentation site (via memory_profiler), I came across this hot path where the regex /^( {0,1}[+*-])([\t| ].*?\n)/ was allocated a significant amount of times closely followed by /^( {0,2}[+*-])([\t| ].*?\n)/ among a few others.

If it is of any relevance, the summary from the profiling is:


Total allocated: 158.09 MB (1168556 objects)
Total retained:  10.3 MB (92547 objects)


Total allocated: 157.29 MB (1163537 objects)
Total retained:  10.3 MB (92556 objects)

Copy link

@gettalong gettalong commented Apr 17, 2020

@ashmaroli Could you profile/benchmakr this change again? The change is quite large for a ~0,4% win on allocated objects.

Copy link
Contributor Author

@ashmaroli ashmaroli commented Apr 20, 2020

Using the following script to profile the changes:

# frozen_string_literal: true

require 'memory_profiler'
require 'bundler/setup'
require 'kramdown'

CONTENT = (<<~TEXT * 500)
  Once upon a time in a far away land

  * hello
  * world

  And this other time it was

    * hello again
    * worldwide


class Profiler
  def self.profile(label, &block)
    report =
    total_allocated_output = report.scale_bytes(report.total_allocated_memsize)
    total_retained_output  = report.scale_bytes(report.total_retained_memsize)
    puts "Total allocated: #{total_allocated_output} (#{report.total_allocated} objects)"
    puts "Total retained:  #{total_retained_output} (#{report.total_retained} objects)"
    report.pretty_print(to_file: "#{label}-memprof.tmp", scale_bytes: true )

Profiler.profile(ARGV[0]) { }
ruby profile.rb master

Total allocated: 29.11 MB (220240 objects)
Total retained:  840.45 kB (2971 objects)

and on this branch (with updates on master pulled)

ruby profile.rb pull-request

Total allocated: 26.07 MB (202381 objects)
Total retained:  843.01 kB (2988 objects)

Copy link

@gettalong gettalong commented Apr 21, 2020

@ashmaroli Thanks for the profiling information!

So it seems that for very heavy usage of lists (nearly only lists), there can be a win of about 10% on allocated objects but there is no difference in execution as far as I could measure.

I also profiled some "standard" kramdown documents and there the win is also only 0,4% , like with your original profiling stats.

Nonetheless, I have merged your changes! Thanks again!

@gettalong gettalong closed this Apr 21, 2020
@ashmaroli ashmaroli deleted the reduce-computed-regexps branch Apr 22, 2020
Copy link
Contributor Author

@ashmaroli ashmaroli commented Apr 22, 2020

Thank you for merging :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
None yet

Successfully merging this pull request may close these issues.

None yet

2 participants