Very slow regex #2013
takes 6 seconds
I didn't really know what to expect, but something faster than this at least :)
When I tried to see if I could figure out the cause myself, I got a segfault:
Reproduced both the slowness and the profiler segfault on
FWIW, I'd expect that construct to be much slower than a regular regex: you're executing a piece of code at runtime that returns a value to be treated as a regex, which then compiles a regex that matches an alternation with 255 alternatives. Also, you're doing that 4 times (the compiler doesn't know what sort of code you're executing, so it caches neither the code nor the result of converting it to a regex).
This version is a lot faster:
Some gdb output.
It succeeds with MVM_SPESH_INLINE_DISABLE=1, but not with MVM_SPESH_OSR_DISABLE=1, nor with MVM_JIT_EXPR_DISABLE=1.
To make matters worse, the
And we repeat this 4 times. I think we do cache the compilation somewhat for the next 3 times, but it clearly doesn't help a lot.
Things we could do are:
We might also detect the case when we have values that would just compile into a literal. If that's all we have, then sort them by length descending and pick the first one at the current offset that we match. This avoids the whole NFA and regex evaluation machinery and would be a significant speedup for this case.
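To make the literal-only fast path concrete, here is a minimal sketch in Python rather than in the compiler's own code. It only illustrates the idea of sorting the literals by length descending and taking the first one that matches at the current offset. The names `make_literal_matcher` and `alternatives` are hypothetical, and the list of numeric strings is just a stand-in for the 255 alternatives discussed above.

```python
def make_literal_matcher(literals):
    """Build a matcher for an alternation consisting only of literal strings.

    The literals are sorted once, longest first, so the first hit found
    at a given offset is also the longest possible match there.
    """
    ordered = sorted(literals, key=len, reverse=True)

    def match_at(text, pos=0):
        # Plain prefix comparisons: no NFA, no regex compilation.
        for lit in ordered:
            if text.startswith(lit, pos):
                return lit
        return None

    return match_at


# Hypothetical stand-in for an alternation with 255 literal alternatives.
alternatives = [str(n) for n in range(1, 256)]
match_at = make_literal_matcher(alternatives)

print(match_at("255.1"))     # longest literal at offset 0
print(match_at("1.2.3", 2))  # literal starting at offset 2
```

The sort happens once when the matcher is built, so repeated matches pay only for the prefix comparisons. A trie or a length-bucketed set lookup would scale better for large alternations, but the linear scan keeps the sketch close to the description above.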
The thing is, that's what writing