
Performance regression in case-insensitive regexes #572

@RReverser


This sample benchmark shows a ~24% regression between versions 1.1.2 and 1.1.3 on my machine:

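```rust
// Nightly-only: #[bench] and the `test` crate are unstable.
#![feature(test)]
extern crate test;

use regex::Regex;
use test::{black_box, Bencher};

#[bench]
fn bench(b: &mut Bencher) {
    // Compile the case-insensitive pattern once, outside the timed loop.
    lazy_static::lazy_static! {
        static ref RE: Regex = Regex::new(r#"(?i)googlebot/\d+\.\d+"#).unwrap();
    }

    // black_box keeps the optimizer from constant-folding the regex or the input.
    let re = black_box(&*RE);
    let input = black_box("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36");

    // Only the search itself is timed; the input never matches.
    b.iter(move || {
        black_box(re.is_match(input));
    });
}
```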

According to the changelog, this release corresponds to the aho-corasick upgrade in #566, which might be the cause.

Note that I've also tried removing the `\d+\.\d+` part, keeping just `(?i)googlebot`: the regression stays the same.

However, removing `(?i)` makes both versions roughly equally fast, so I suspect the problem is specific to case-insensitive handling.
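One plausible reason `(?i)` matters here, offered as an assumption rather than a diagnosis: a case-insensitive literal such as `googlebot` is equivalent to a whole set of case variants, and a set of literals is exactly the kind of input a multi-pattern matcher like aho-corasick is used for, while the case-sensitive pattern reduces to a single literal. The sketch below uses a hypothetical helper, `case_variants`, that is not part of the regex crate; it only considers ASCII case (the regex crate's `(?i)` uses Unicode case folding) and simply enumerates the set to show how fast it grows.

```rust
/// Hypothetical helper for illustration only (not a regex-crate API):
/// enumerate every ASCII case variant of `lit`.
fn case_variants(lit: &str) -> Vec<String> {
    // Start from one empty prefix and extend it one character at a time.
    let mut variants = vec![String::new()];
    for ch in lit.chars() {
        let lower = ch.to_ascii_lowercase();
        let upper = ch.to_ascii_uppercase();
        let mut next = Vec::with_capacity(variants.len() * 2);
        for prefix in &variants {
            // Every prefix is extended with the lowercase form...
            let mut a = prefix.clone();
            a.push(lower);
            next.push(a);
            // ...and with the uppercase form, if it differs (letters only).
            if upper != lower {
                let mut b = prefix.clone();
                b.push(upper);
                next.push(b);
            }
        }
        variants = next;
    }
    variants
}

fn main() {
    let lits = case_variants("googlebot");
    // 9 letters with two ASCII cases each: 2^9 = 512 literals.
    println!(
        "{} literals, from {:?} to {:?}",
        lits.len(),
        lits[0],
        lits[lits.len() - 1]
    );
}
```

Running it prints `512 literals, from "googlebot" to "GOOGLEBOT"`. The real crate bounds how many literals it extracts, so the set it actually hands to its prefilter may be smaller or different; the point is only that the case-insensitive pattern can route through a multi-literal search path that the case-sensitive one does not need.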
