Fix panics parsing regex with whitespace in extended mode #349
robinst wants to merge 1 commit into rust-lang:master
The added tests fail without the fix like this:
```
---- parser::tests::ignore_space_escape_hex2 stdout ----
thread 'parser::tests::ignore_space_escape_hex2' panicked at 'called `Result::unwrap()` on an `Err` value: Error { pos: 10, surround: "x 5 3", kind: InvalidBase16(" 5 3") }', src/libcore/result.rs:860
---- parser::tests::ignore_space_escape_hex stdout ----
thread 'parser::tests::ignore_space_escape_hex' panicked at 'called `Result::unwrap()` on an `Err` value: Error { pos: 12, surround: "{ 5 3 }", kind: InvalidBase16(" 5 3") }', src/libcore/result.rs:860
---- parser::tests::ignore_space_ascii_classes stdout ----
thread 'parser::tests::ignore_space_ascii_classes' panicked at 'called `Result::unwrap()` on an `Err` value: Error { pos: 5, surround: "(?x)[ [ : ", kind: UnsupportedClassChar('[') }', src/libcore/result.rs:860
note: Run with `RUST_BACKTRACE=1` for a backtrace.
---- parser::tests::ignore_space_escape_octal stdout ----
thread 'parser::tests::ignore_space_escape_octal' panicked at 'valid octal number', src/libcore/option.rs:785
---- parser::tests::ignore_space_escape_unicode_name stdout ----
thread 'parser::tests::ignore_space_escape_unicode_name' panicked at 'called `Result::unwrap()` on an `Err` value: Error { pos: 15, surround: "Y i }", kind: UnrecognizedUnicodeClass(" Y i") }', src/libcore/result.rs:860
---- parser::tests::ignore_space_repeat_counted stdout ----
thread 'parser::tests::ignore_space_repeat_counted' panicked at 'called `Result::unwrap()` on an `Err` value: Error { pos: 15, surround: ", 1 0 }", kind: InvalidBase10("1 0") }', src/libcore/result.rs:860
```
The panics happen because `bump_get` skips whitespace while walking the characters, but then keeps that whitespace in the returned `String`.
Found using cargo-fuzz.
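To make the failure mode concrete, here is a minimal Rust sketch. The `Parser` struct, its fields, and the two `bump_get_*` functions are hypothetical simplifications, not the crate's actual code; they only model the behaviour described in the PR: whitespace is skipped while the predicate is tested, but the buggy variant returns the raw slice with the spaces still in it, so a later numeric parse fails in the same way as the `InvalidBase10("1 0")` error in the log. The fixed variant, in the spirit of this PR, collects only the characters that were actually checked.

```rust
/// Hypothetical, simplified model of a regex parser cursor.
/// Only the whitespace handling matters for this illustration.
struct Parser {
    chars: Vec<char>,
    pos: usize,
    ignore_space: bool, // would be set by the `x` flag, e.g. `(?x)`
}

impl Parser {
    fn new(s: &str, ignore_space: bool) -> Parser {
        Parser { chars: s.chars().collect(), pos: 0, ignore_space }
    }

    /// Buggy behaviour: whitespace is skipped while testing `pred`,
    /// but the returned String is the raw slice, spaces included.
    fn bump_get_buggy<F: Fn(char) -> bool>(&mut self, pred: F) -> String {
        let start = self.pos;
        while self.pos < self.chars.len() {
            let c = self.chars[self.pos];
            if self.ignore_space && c.is_whitespace() {
                self.pos += 1;
                continue;
            }
            if !pred(c) {
                break;
            }
            self.pos += 1;
        }
        self.chars[start..self.pos].iter().collect()
    }

    /// Fixed variant: only characters that were checked against `pred`
    /// end up in the result, so callers never see the skipped spaces.
    fn bump_get_fixed<F: Fn(char) -> bool>(&mut self, pred: F) -> String {
        let mut out = String::new();
        while self.pos < self.chars.len() {
            let c = self.chars[self.pos];
            if self.ignore_space && c.is_whitespace() {
                self.pos += 1;
                continue;
            }
            if !pred(c) {
                break;
            }
            out.push(c);
            self.pos += 1;
        }
        out
    }
}

fn main() {
    // A non-capturing closure is `Copy`, so it can be passed twice.
    let digit = |c: char| c.is_ascii_digit();

    // Body of a counted repetition such as `{ 1 0 }` in extended mode.
    let raw = Parser::new(" 1 0 }", true).bump_get_buggy(digit);
    assert_eq!(raw, " 1 0 ");
    // The spaces make the numeric parse fail; unwrapping it would panic.
    assert!(raw.parse::<u32>().is_err());

    let clean = Parser::new(" 1 0 }", true).bump_get_fixed(digit);
    assert_eq!(clean, "10");
    assert_eq!(clean.parse::<u32>(), Ok(10));
}
```

Note that the fixed variant still accepts `1 0` as the number 10, which is the "spaces between digits" behaviour questioned in the review below.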
The fuzz script is here (not sure whether you would want to merge it): master...robinst:add-cargo-fuzz-script. You can run it using … The artifact that it returned was this: …
```rust
}

#[test]
fn ignore_space_escape_octal() {
```
Seems a bit weird that it's allowed to add space between digits of a number, but that seems to be the closest to the current behavior.
@robinst Thanks for finding this! Sorry it slipped out of my queue, but your blog post caught my attention. :-) Nice work! I'm not sure the fix is right either. Does this also apply to things like …
Yes, and things like … Maybe whitespace should only be allowed between logical groups of characters; for example, it should not be allowed within a number or within a text identifier. Here's what other engines do: Oniguruma: … Perl behaves the same way, checked with … So at least for …
Thinking about this a bit more, it feels like we shouldn't allow arbitrary whitespace in arbitrary syntax. Maybe things like …
Agreed. I've prepared a different pull request here: #354
…e-strict, r=BurntSushi

Fix panics with whitespace in extended mode by being more strict

Instead of ignoring space in all the bump/peek methods (as proposed in pull request #349), have an explicit `ignore_space` method that can be used in places where space/comments should be allowed. This makes parsing a bit stricter than before as well.
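A rough sketch of the stricter approach, assuming a simplified cursor: `peek` and `bump` never skip whitespace, and an explicit `ignore_space` method is called only at points where space or `#` comments are allowed. All names here (`Parser`, `parse_decimal`, `parse_counted`) are hypothetical and do not mirror the real parser; the counted repetition is just a convenient example. In this sketch, whitespace around numbers is accepted in extended mode, while whitespace inside a number produces an error instead of a panic.

```rust
/// Hypothetical cursor in which `peek`/`bump` never skip whitespace.
struct Parser {
    chars: Vec<char>,
    pos: usize,
    ignore_space: bool, // extended (`x`) mode
}

impl Parser {
    fn new(s: &str, ignore_space: bool) -> Parser {
        Parser { chars: s.chars().collect(), pos: 0, ignore_space }
    }

    fn peek(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }

    fn bump(&mut self) -> Option<char> {
        let c = self.peek();
        if c.is_some() {
            self.pos += 1;
        }
        c
    }

    /// Skip whitespace and `#` comments, but only in extended mode.
    /// Callers invoke this explicitly where the grammar permits it.
    fn ignore_space(&mut self) {
        if !self.ignore_space {
            return;
        }
        while let Some(c) = self.peek() {
            if c.is_whitespace() {
                self.pos += 1;
            } else if c == '#' {
                // A comment runs to the end of the line.
                while let Some(c) = self.bump() {
                    if c == '\n' {
                        break;
                    }
                }
            } else {
                break;
            }
        }
    }

    /// Space is allowed before the number, but the digits themselves
    /// are read without skipping, so `1 0` is not one number.
    fn parse_decimal(&mut self) -> Result<u32, String> {
        self.ignore_space();
        let start = self.pos;
        while matches!(self.peek(), Some(c) if c.is_ascii_digit()) {
            self.pos += 1;
        }
        let digits: String = self.chars[start..self.pos].iter().collect();
        digits
            .parse::<u32>()
            .map_err(|_| format!("invalid decimal at {}", start))
    }

    /// Parse `{m}` or `{m,n}`, returning an error rather than panicking.
    fn parse_counted(&mut self) -> Result<(u32, Option<u32>), String> {
        if self.bump() != Some('{') {
            return Err("expected '{'".to_string());
        }
        let min = self.parse_decimal()?;
        self.ignore_space();
        let max = if self.peek() == Some(',') {
            self.bump();
            Some(self.parse_decimal()?)
        } else {
            None
        };
        self.ignore_space();
        if self.bump() != Some('}') {
            return Err(format!("expected '}}' at {}", self.pos));
        }
        Ok((min, max))
    }
}

fn main() {
    // Space around the numbers is accepted in extended mode...
    assert_eq!(Parser::new("{ 1 , 10 }", true).parse_counted(), Ok((1, Some(10))));
    // ...but not inside a number.
    assert!(Parser::new("{ 1 0 }", true).parse_counted().is_err());
    // Without extended mode, no whitespace is skipped at all.
    assert!(Parser::new("{ 1 }", false).parse_counted().is_err());
}
```

The design trade-off is that the grammar, not the cursor, decides where whitespace is legal, which matches the "only between logical groups of characters" idea from the discussion; the cost is that every call site must remember to call `ignore_space`.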
I decided to go with #354 over this one. Thanks so much!