What version of regex are you using?
v1.12.2
Describe the bug at a high level.
Compiling r"^[^/]+/foo/[^/]+$" takes roughly 2.5x as long with Unicode enabled as with it disabled.
What are the steps to reproduce the behavior?
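fn main() {
    // Rebuild the same regex many times so that compile time dominates the run.
    // Switch .unicode(true) to .unicode(false) for the comparison run.
    for _ in 0..100_000 {
        let re = regex::bytes::RegexBuilder::new(r"^[^/]+/foo/[^/]+$")
            .unicode(true)
            .build()
            .unwrap();
        assert!(re.is_match(b"bar/foo/baz"));
    }
}
Here's a profile with unicode(false) (931 ms):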
https://share.firefox.dev/47GnkMp
and a profile with unicode(true) (2.4 s):
https://share.firefox.dev/4i6f5wO
The difference that stands out most is in regex_automata::nfa::thompson::compiler::Utf8Compiler::new, where I believe we're initializing a 10,000-element Vec.
This difference was discovered when investigating https://bugzilla.mozilla.org/show_bug.cgi?id=1983632