
regex does not handle quoted right bracket correctly within character set #18034

Closed
kanaka opened this issue Oct 14, 2014 · 1 comment · Fixed by #18282


kanaka commented Oct 14, 2014

$ rustc --version
rustc 0.13.0-nightly (1c3ddd297 2014-10-13 23:27:46 +0000)

Here is my test code:

#![feature(phase)]
#[phase(plugin)]
extern crate regex_macros;
extern crate regex;

fn main() {
    let re = regex!(r#"([AB])"#);
    println!("\n1 {}", re);
    for cap in re.captures_iter("AABBAB") {
        println!("0: {}, 1: {}", cap.at(0), cap.at(1));
    }

    let re = regex!(r#"([\x{5b}])"#);
    println!("\n2 {}", re);
    for cap in re.captures_iter("[[]][]") {
        println!("0: {}, 1: {}", cap.at(0), cap.at(1));
    }

    let re = regex!(r#"([\[])"#);
    println!("\n3 {}", re);
    for cap in re.captures_iter("[[]][]") {
        println!("0: {}, 1: {}", cap.at(0), cap.at(1));
    }

    let re = regex!(r#"([\x{5d}])"#);
    println!("\n4 {}", re);
    for cap in re.captures_iter("[[]][]") {
        println!("0: {}, 1: {}", cap.at(0), cap.at(1));
    }

    let re = regex!(r#"([\]])"#);
    println!("\n5 {}", re);
    for cap in re.captures_iter("[[]][]") {
        println!("0: {}, 1: {}", cap.at(0), cap.at(1));
    }

    let re = regex!(r#"([\x{5b}\x{5d}])"#);
    println!("\n6 {}", re);
    for cap in re.captures_iter("[[]][]") {
        println!("0: {}, 1: {}", cap.at(0), cap.at(1));
    }

    let re = regex!(r#"([\x{5d}\x{5b}])"#);
    println!("\n7 {}", re);
    for cap in re.captures_iter("[[]][]") {
        println!("0: {}, 1: {}", cap.at(0), cap.at(1));
    }

    let re = regex!(r#"([\[\]])"#);
    println!("\n8 {}", re);
    for cap in re.captures_iter("[[]][]") {
        println!("0: {}, 1: {}", cap.at(0), cap.at(1));
    }

    let re = regex!(r#"([\]\[])"#);
    println!("\n9 {}", re);
    for cap in re.captures_iter("[[]][]") {
        println!("0: {}, 1: {}", cap.at(0), cap.at(1));
    }

    let re = regex!(r#"\[|\]"#);
    println!("\n10 {}", re);
    for cap in re.captures_iter("[[]][]") {
        println!("0: {}, 1: {}", cap.at(0), cap.at(1));
    }
}
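The results below are easier to interpret against the rule a class parser would be expected to follow. Inside [...], a backslash makes the next character a literal member (including [ and ]), \x{HEX} denotes a code point, and only an unescaped ] closes the class. The following std-only sketch encodes that rule for comparison. The parse_class helper is hypothetical and is not the regex crate's actual parser. Ranges, negation, and other escape forms are deliberately left out.

```rust
/// Hypothetical helper (NOT the regex crate's parser): parse a bracketed
/// character class that starts with '['. Returns the literal members and the
/// number of chars consumed, or None if the class is never closed.
/// Assumption: only plain chars, `\c` escapes and `\x{HEX}` are supported;
/// no ranges, negation or class escapes like `\d`.
fn parse_class(pattern: &str) -> Option<(Vec<char>, usize)> {
    let chars: Vec<char> = pattern.chars().collect();
    if chars.first() != Some(&'[') {
        return None;
    }
    let mut members = Vec::new();
    let mut i = 1;
    while i < chars.len() {
        match chars[i] {
            // Only an unescaped ']' closes the class.
            ']' => return Some((members, i + 1)),
            '\\' => {
                let next = *chars.get(i + 1)?;
                if next == 'x' && chars.get(i + 2) == Some(&'{') {
                    // \x{HEX}: read hex digits up to the closing brace.
                    let close = chars[i + 3..].iter().position(|&c| c == '}')? + i + 3;
                    let hex: String = chars[i + 3..close].iter().collect();
                    let code = u32::from_str_radix(&hex, 16).ok()?;
                    members.push(char::from_u32(code)?);
                    i = close + 1;
                } else {
                    // Any other escaped char, including '[' and ']', is a literal member.
                    members.push(next);
                    i += 2;
                }
            }
            c => {
                members.push(c);
                i += 1;
            }
        }
    }
    None // input ended before an unescaped ']'
}

fn main() {
    // The classes used in cases 6 to 9 of the report.
    for pat in [r"[\x{5b}\x{5d}]", r"[\x{5d}\x{5b}]", r"[\[\]]", r"[\]\[]"].iter() {
        let (members, used) = parse_class(pat).expect("well-formed class");
        println!("{} -> {:?} (consumed {} of {} chars)", pat, members, used, pat.chars().count());
    }
}
```

Under these assumptions, each of the four classes parses to a two-member set and consumes the whole pattern. The class should therefore match a single [ or ] character, not the two-character string "[]".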

Results:

1 ([AB])
0: A, 1: A
0: A, 1: A
0: B, 1: B
0: B, 1: B
0: A, 1: A
0: B, 1: B

2 ([\x{5b}])
0: [, 1: [
0: [, 1: [
0: [, 1: [

3 ([\[])
0: [, 1: [
0: [, 1: [
0: [, 1: [

4 ([\x{5d}])
0: ], 1: ]
0: ], 1: ]
0: ], 1: ]

5 ([\]])
0: ], 1: ]
0: ], 1: ]
0: ], 1: ]

6 ([\x{5b}\x{5d}])
0: [], 1: []
0: [], 1: []

7 ([\x{5d}\x{5b}])
0: [], 1: []
0: [], 1: []

8 ([\[\]])
0: [], 1: []
0: [], 1: []

9 ([\]\[])
0: [], 1: []
0: [], 1: []

10 \[|\]
0: [, 1: 
0: [, 1: 
0: ], 1: 
0: ], 1: 
0: [, 1: 
0: ], 1: 

I expect cases 6, 7, 8, and 9 to behave like cases 1 and 10. Something about having both an escaped left bracket and an escaped right bracket in one character set seems to make the regex engine treat the set as the literal string "[]". It matches only the two occurrences of "[]" in the input instead of each of the six bracket characters individually.
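The expected behavior can be stated concretely. A one-character class such as ([\[\]]) should produce one match per member character found in the input, just as case 10 does. The std-only sketch below prints the output cases 6 to 9 would be expected to produce. The class_matches helper is hypothetical and is not part of the regex crate, and the class members are hard-coded rather than parsed.

```rust
/// Hypothetical illustration (NOT the regex crate): each match of a
/// single-character class is exactly one character, so scanning the input
/// yields one (byte offset, char) pair per member character found.
fn class_matches(members: &[char], haystack: &str) -> Vec<(usize, char)> {
    haystack
        .char_indices()
        .filter(|(_, c)| members.contains(c))
        .collect()
}

fn main() {
    // Members of the class in cases 6 to 9, hard-coded for illustration.
    let members = ['[', ']'];
    for (_, c) in class_matches(&members, "[[]][]") {
        // The capture group wraps the whole class, so group 1 equals group 0.
        println!("0: {}, 1: {}", c, c);
    }
}
```

This prints six lines (two for each [ at the start, then ], ], [, ]), in the same order as case 10. The only difference is that case 10 has no capture group, so its group 1 is empty.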

The same behavior occurs with Regex::new instead of the regex! macro.

alexcrichton (Member) commented:
cc @BurntSushi
