Compiling some unicode regexes is 2.5x slower #1316

@jrmuizel

What version of regex are you using?

v1.12.2

Describe the bug at a high level.

Compiling r"^[^/]+/foo/[^/]+$" takes 2.5x as long with unicode turned on as with it turned off.

What are the steps to reproduce the behavior?

fn main() {
    // Rebuild the regex on every iteration so that compile time dominates.
    for _ in 0..100000 {
        let re = regex::bytes::RegexBuilder::new(r"^[^/]+/foo/[^/]+$")
            .unicode(true)
            .build().unwrap();
        assert!(re.is_match(b"bar/foo/baz"));
    }
}

Here's a profile with unicode(false), which takes 931ms:
https://share.firefox.dev/47GnkMp
and a profile with unicode(true), which takes 2.4s:
https://share.firefox.dev/4i6f5wO

The difference that stands out most is in regex_automata::nfa::thompson::compiler::Utf8Compiler::new,
where I believe we're initializing a 10,000-element Vec.
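
To illustrate why a per-compile allocation of that size could show up in a profile, here is a rough sketch that compares building a fresh fixed-size table on every iteration against reusing one table and invalidating it with a version counter. This is not regex-automata's actual code: `Entry`, `BoundedCache`, and `compare_timings` are hypothetical names made up for this example, and the entry layout is an assumption.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cache slot. Not regex-automata's actual type.
#[derive(Clone, Default)]
struct Entry {
    version: u16,
    key: u64,
    value: u32,
}

/// Hypothetical fixed-capacity cache that can be reused across builds.
struct BoundedCache {
    version: u16,
    slots: Vec<Entry>,
}

impl BoundedCache {
    /// Allocates and initializes `capacity` slots. This is the expensive step.
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        // Slots start at version 0, the cache at version 1, so all slots read as empty.
        BoundedCache { version: 1, slots: vec![Entry::default(); capacity] }
    }

    /// Invalidates every slot by bumping the version.
    /// The slots are only rewritten when the counter wraps around.
    fn clear(&mut self) {
        self.version = self.version.wrapping_add(1);
        if self.version == 0 {
            for slot in self.slots.iter_mut() {
                *slot = Entry::default();
            }
            self.version = 1;
        }
    }

    fn get(&self, key: u64) -> Option<u32> {
        let slot = &self.slots[(key as usize) % self.slots.len()];
        if slot.version == self.version && slot.key == key {
            Some(slot.value)
        } else {
            None
        }
    }

    fn insert(&mut self, key: u64, value: u32) {
        let idx = (key as usize) % self.slots.len();
        self.slots[idx] = Entry { version: self.version, key, value };
    }
}

/// Returns (time with a fresh table per iteration, time with one reused table).
fn compare_timings(iterations: usize, capacity: usize) -> (Duration, Duration) {
    let start = Instant::now();
    for i in 0..iterations {
        let mut cache = BoundedCache::new(capacity);
        cache.insert(i as u64, 1);
        std::hint::black_box(&cache);
    }
    let fresh = start.elapsed();

    let mut cache = BoundedCache::new(capacity);
    let start = Instant::now();
    for i in 0..iterations {
        cache.clear();
        cache.insert(i as u64, 1);
        std::hint::black_box(&cache);
    }
    let reused = start.elapsed();

    (fresh, reused)
}
```

Calling `compare_timings(100_000, 10_000)` from a `main` and printing both durations should give a rough idea of the gap on a given machine. The numbers will vary, and this says nothing definite about where the time in the profiles above actually goes.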

This difference was discovered when investigating https://bugzilla.mozilla.org/show_bug.cgi?id=1983632
