Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.Sign up
Dynamic "once" regexps are not as atomic as in MRI #2798
The test_once_multithread test in MRI's ruby/test_regexp.rb tries to guarantee that the contents of a dynamic "once" regexp (e.g.
My fix just puts atomic guarantees around the update of the cache.
I have filed https://bugs.ruby-lang.org/issues/11026 to clarify with ruby-core how atomic a dynamic "once" regexp should actually be.
After some discussion with @enebo, we figured out that
And once this has run once, MRI additionally flushes all the iseqs for the regexp and just leaves the resulting object in the iseq.
So, unless that changes in MRI, we need to make the same guarantee.
We also discovered that IR currently will evaluate all the pieces of the dregexp every time it is encountered, which is at best a performance problem and at worst a severe semantic difference with MRI. So this needs to be fixed.
Marking for 126.96.36.199. Hopefully we can get it in.
* Ensure only one regexp is ever cached for //o This is done using an AtomicReference in the compiled method for JVM6 and a field + atomic updater in the indy call site. In both cases, we may end up evaluating the operands twice, and the code that produced them may still run after caching (a bug, #2798), but we will at least guarantee to return exactly one regexp. * Add non-boxed paths to construct dregexp with up to 5 elements. * Add a ThreadContext-local Encoding to use for encoding negotiation when preprocessing the dregexp elements. * If, at JIT time, a once-dregexp has already been encountered and cached in the instr, just emit that regexp directly into the bytecode. This new logic is faster than what we had before, likely because the locking I put in place for JVM6 was preventing the JVM from jitting (punted out with "COMPILE SKIPPED: invalid parsing" due to a flaw in my code). This new logic is lighter-weight and JITs fine. Given the benchmark from #3735: 9.0.5: 3.87s 9.1: 0.70s 1.7.24: 0.72s
I pushed 256e753, which removes the locking I had in the JVM6 jit and replaces it with an atomic reference. The evaluation of the dregexp/o operands may run in two threads at the same time, but that can't be fixed right now until IR wraps the entire dregexp/o in a locking mechanism as discussed above. So for now, we guarantee the dregexp/o will only ever return one value.
I'm actually going to call this fixed. It's a grey area, to be sure, and the hassle to guarantee once-only evaluation of the
If someone runs into this being a problem, they can file a bug and we can debate it with them and MRI.