Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pairs can be made with mismatched input str and Vec<QueueableToken> using pest::state #993

Closed
djkoloski opened this issue Mar 13, 2024 · 1 comment · Fixed by #998
Closed

Comments

@djkoloski
Copy link
Contributor

In pest::state, after parsing the input using the provided closure, the result is used to construct a Pairs with the original input string, but the returned Vec<QueueableToken>. This can be misused to create a Position which does not correspond to a valid unicode codepoint boundary in the input string it holds a reference to.

Note that this does not currently allow for undefined behavior, but only because the safe boundary-checked str indexing is used. The code has a number of unsafe functions and unsafe blocks which claim that this invariant is never violated, which could lead to an exploit if it is relied on.

One way to fix this would be to drop the invariant that the token positions correspond to valid codepoint boundaries in the input string.

To Reproduce

use pest::ParserState;

fn main() {
    let input_1 = "❤️";
    let input_2 = "hello world";

    let pairs = pest::state::<(), _>(
        input_1,
        |s| {
            drop(s);
            let state = ParserState::new(input_2);
            state.rule((), |state| state.match_char_by(|c| c.is_ascii()))
        },
    ).unwrap();

    println!("{:?}", pairs);
}
$ cargo run --release
thread 'main' panicked at $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pest-2.7.8/src/span.rs:212:20:
byte index 1 is not a char boundary; it is inside '❤' (bytes 0..3) of `❤️`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This panics because the state we returned had a different input string than the one we called pest::state with.

Expected behavior
It should not be possible to create a Pairs with mismatched input string and QueueableTokens.

Additional context
Discovered during Fuchsia unsafe review at https://fxrev.dev/1003433.

@djkoloski djkoloski added the bug label Mar 13, 2024
@tomtau
Copy link
Contributor

tomtau commented Mar 13, 2024

@djkoloski thanks for the report! Generally, "state" is only called by the proc macro-generated code, so most library users wouldn't be able to make this mistake and the correctness of code generation is checked by a fairly extensive test suite and fuzz tests.
If your suggested fix is non-breaking (in terms of semver and existing behaviour), feel free to open a PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants