Right size regular expression compile buffers #2696

Merged: 1 commit, Dec 10, 2020

methodmissing
Contributor

@methodmissing methodmissing commented Nov 26, 2019

Continuing the type-specific resize-on-freeze work for String and Array, I looked into the Regexp type and found the following memory access pattern for regular expression literals:

==22079== -------------------- 12 of 500 --------------------
==22079== max-live:    1,946,560 in 4,345 blocks
==22079== tot-alloc:   1,946,560 in 4,345 blocks (avg size 448.00)
==22079== deaths:      none (none of these blocks were freed)
==22079== acc-ratios:  1.36 rd, 0.98 wr  (2,651,994 b-read, 1,908,158 b-written)
==22079==    at 0x4C2DECF: malloc (in /usr/lib/valgrind/vgpreload_exp-dhat-amd64-linux.so)
==22079==    by 0x24C496: onig_new_with_source (re.c:844)
==22079==    by 0x24C496: make_regexp (re.c:874)
==22079==    by 0x24C496: rb_reg_initialize (re.c:2858)
==22079==    by 0x24C496: rb_reg_initialize_str (re.c:2892)
==22079==    by 0x24C496: rb_reg_compile (re.c:2982)
==22079==    by 0x12EB84: rb_parser_reg_compile (parse.y:12185)
==22079==    by 0x12EB84: parser_reg_compile (parse.y:12179)
==22079==    by 0x12EB84: reg_compile (parse.y:12195)
==22079==    by 0x2147E3: new_regexp (parse.y:10101)
==22079==    by 0x2147E3: ruby_yyparse (parse.y:4419)
==22079==    by 0x2161F7: yycompile0 (parse.y:5942)
==22079==    by 0x3241FF: rb_suppress_tracing (vm_trace.c:427)
==22079==    by 0x1FDBF6: yycompile (parse.y:5991)
==22079==    by 0x1FDBF6: rb_parser_compile_file_path (parse.y:6130)
==22079==    by 0x27AC96: load_file_internal (ruby.c:2034)
==22079==    by 0x137730: rb_ensure (eval.c:1129)
==22079==    by 0x27CEEA: load_file (ruby.c:2153)
==22079==    by 0x27CEEA: rb_parser_load_file (ruby.c:2175)
==22079==    by 0x1954CE: load_iseq_eval (load.c:587)
==22079==    by 0x1954CE: rb_load_internal (load.c:651)
==22079==    by 0x1954CE: rb_f_load (load.c:709)
==22079==    by 0x2FB957: vm_call_cfunc_with_frame (vm_insnhelper.c:2468)
==22079==    by 0x2FB957: vm_call_cfunc (vm_insnhelper.c:2493)

Digging a little further, and recalling context from a previous oniguruma memory investigation, I remembered that the pattern buffer struct has a compile buffer with a simple watermark for tracking used space. This changeset implements reg_resize (static, like ary_resize), which attempts to right-size the compile buffer when it is over-allocated, at the following sites:

  • After compiling a literal regular expression.
  • In an explicit, type-specific rb_reg_freeze, with Regexp#freeze pointing to it.
  • I also follow the chain member, which points to another regex_t on the struct if present, but I have not been able to find references to it in the source tree other than for freeing a regex or inspecting its memory footprint.

I introduced two new debug counters, which yield the following results when booting Redmine on Rails 5:

[RUBY_DEBUG_COUNTER]    obj_regexp_lit_extracapa                    6319
[RUBY_DEBUG_COUNTER]    obj_regexp_lit_extracapa_bytes            301685

About 300 KB reclaimed across 6,319 oversized instances.

An example of Regexp#freeze:

irb(main):007:0> r = Regexp.compile("(?!%\h\h|[!$-&(-;=?-_a-~]).")
irb(main):008:0> ObjectSpace.memsize_of(r)
=> 588
irb(main):009:0> r.freeze
=> /(?!%hh|[!$-&(-;=?-_a-~])./
irb(main):010:0> ObjectSpace.memsize_of(r)
=> 543

There are likely more layers that can be peeled back here, but I'm keeping it simple and concise for review.

@shyouhei @byroot thoughts?

@shyouhei
Member

No strong opinion (== very well written).

@byroot
Member

byroot commented Nov 27, 2019

Nice writeup, a few semi-educated reactions:

  • AFAIK there's no way to mutate the regexp itself (only set instance variables on the wrapping object), so the right sizing could also happen on Regexp.new.
  • .freeze is almost never called on regexes, since they are mostly immutable already, so that specialized freeze is unlikely to yield much benefit.

@methodmissing
Contributor Author

ACK, I'll look into Regexp.new, thx Jean

@methodmissing
Contributor Author

@byroot pushed 3cb5bf0 as per your guidance on regular expressions being immutable anyway, and dropped the specialized Regexp#freeze.

Booting redmine:

[RUBY_DEBUG_COUNTER]	obj_regexp_lit_extracapa      	          7249
[RUBY_DEBUG_COUNTER]	obj_regexp_lit_extracapa_bytes	        481204

Roughly a 60% improvement in bytes trimmed over the previous numbers (481,204 vs. 301,685): about 470 KiB of excess compile buffer trimmed.

@byroot
Member

byroot commented Nov 27, 2019

nice !

@methodmissing methodmissing changed the title from "Right size regular expression compile buffers for literal regexes and on Regexp#freeze" to "Right size regular expression compile buffers" Nov 28, 2019
re.c Outdated
@@ -2954,7 +2954,7 @@ static void
 reg_resize(regex_t *reg)
 {
     if (reg->alloc > reg->used) {
-        unsigned char *new_ptr = xrealloc(reg->p, reg->used);
+        unsigned char *new_ptr = ruby_sized_xrealloc(reg->p, reg->used, reg->capa);
Member

Do not use ruby_sized_xrealloc in this file.
reg->p is allocated in onig_bbuf_init where xmalloc is not ruby_xmalloc.

@nobu
Member

nobu commented Dec 26, 2019

reg_resize looks like a function that should be in regcomp.c, not re.c.

@methodmissing
Contributor Author

methodmissing commented Dec 26, 2019

In e07d9b6 I migrated the resize function to regcomp.c as onig_reg_resize, which nicely cleaned up the references in re.c, leaving a single call site on compilation.

Debug counters on a recent Redmine install:

[RUBY_DEBUG_COUNTER]	obj_regexp_ptr                	           367
[RUBY_DEBUG_COUNTER]	obj_regexp_lit_extracapa      	          7405
[RUBY_DEBUG_COUNTER]	obj_regexp_lit_extracapa_bytes	        497266

Thank you for the always constructive feedback. Is there any value in an upstream oniguruma patch, given that this change isn't Ruby-specific? I don't know how that workflow works, but I can investigate if there's value beyond this patch.

@nobu
Member

nobu commented Dec 27, 2019

It feels too invasive to access Ruby's debug counters inside onigmo, for me.

@methodmissing
Contributor Author

Totally makes sense. It felt like jumping through too many hoops to get in there as well. Decoupled and removed in 03f35de.

regcomp.c Outdated
reg->p = new_ptr;
}
}
if (reg->chain) onig_reg_resize(reg->chain);
Member

Do you expect all compilers to optimize tail-calls?

Contributor Author

Good point. I'd expect gcc and clang to do so at optimization level 3, but we cannot rely on that, or on the behavior of other compilers. I replaced the recursive call with a goto in 1ff3033.

@methodmissing
Copy link
Contributor Author

@nobu anything else left to do here? I squashed to 1 commit and addressed the suggestion about not assuming tail call optimization.

@nobu nobu merged commit 9a17437 into ruby:master Dec 10, 2020