This issue was moved to a discussion.


can't match ANY. #500

Closed
shi-yan opened this issue Mar 16, 2021 · 3 comments

shi-yan commented Mar 16, 2021

Trying to create a parser for PlantUML, but ANY doesn't match alphabetic characters:

https://plantuml.com/sequence-diagram

Sample code:
test.pest

WHITESPACE = _{ " " | "\t" }
startuml_sequence = {"@startuml" ~ "sequence" ~ NEWLINE}
enduml = {"@enduml"}
sequence_content = {ANY+ ~ NEWLINE}
sequence = {SOI ~ startuml_sequence ~ sequence_content ~ enduml ~ EOI}

main.rs

extern crate pest;
#[macro_use]
extern crate pest_derive;

use pest::Parser;

#[derive(Parser)]
#[grammar = "test.pest"]
struct IdentParser;

fn main() {
    let pairs = IdentParser::parse(Rule::sequence, "@startuml sequence
    test
    @enduml").unwrap_or_else(|e| panic!("{}", e));

    for pair in pairs {
        println!("Rule:    {:?}", pair.as_rule());
        println!("Span:    {:?}", pair.as_span());
        println!("Text:    {}", pair.as_str());

        for inner_pair in pair.into_inner() {
            println!("Rule:    {:?}", inner_pair.as_rule());
        }
    }
}

output

thread 'main' panicked at ' --> 2:5
  |
2 |     test␊
  |     ^---
  |
  = expected sequence_content', src/main.rs:14:34

If I change ANY to ASCII_ALPHA, it works. I don't understand why; it looks like a bug to me.

CAD97 (Contributor) commented Mar 16, 2021

ANY+ eats the entire rest of the input, so it cannot be followed by NEWLINE (or by anything other than EOI).

PEG parsing is greedy: each repetition eats as much input as it can, and there is no backtracking within a rule to give some of it back.

You likely want to match {everything other than NEWLINE} rather than truly ANY as the head of sequence_content.
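
To make the greedy behaviour concrete, here is a minimal, stdlib-only Rust sketch. It is not pest: `any`, `newline`, `one_or_more` and `greedy_content` are hypothetical hand-written stand-ins that assume pest-like PEG semantics (a repetition consumes as much as it can and never gives input back). It also ignores pest's implicit WHITESPACE, which makes no difference here because ANY matches spaces anyway.

```rust
// Hand-written emulation of a few PEG operators (hypothetical, not pest's API),
// assuming pest-like semantics: repetitions are greedy and never give input back.

/// `ANY`: consume exactly one character, failing only at end of input.
fn any(input: &str, pos: usize) -> Option<usize> {
    input[pos..].chars().next().map(|c| pos + c.len_utf8())
}

/// `NEWLINE`: match "\n", "\r\n" or "\r".
fn newline(input: &str, pos: usize) -> Option<usize> {
    let rest = &input[pos..];
    if rest.starts_with("\r\n") {
        Some(pos + 2)
    } else if rest.starts_with('\n') || rest.starts_with('\r') {
        Some(pos + 1)
    } else {
        None
    }
}

/// `e+`: match `e` once, then as many more times as possible.
/// There is no way to "retry with fewer repetitions" afterwards.
fn one_or_more(input: &str, pos: usize, e: impl Fn(&str, usize) -> Option<usize>) -> Option<usize> {
    let mut cur = e(input, pos)?;
    while let Some(next) = e(input, cur) {
        cur = next;
    }
    Some(cur)
}

/// `ANY+ ~ NEWLINE`: the repetition runs to end of input, so NEWLINE always fails.
fn greedy_content(input: &str, pos: usize) -> Option<usize> {
    let after_any = one_or_more(input, pos, any)?;
    newline(input, after_any)
}

fn main() {
    let input = "test\n@enduml";
    // ANY+ alone swallows everything, including the newline and "@enduml".
    assert_eq!(one_or_more(input, 0, any), Some(input.len()));
    // ...so the NEWLINE that follows has nothing left to match.
    assert_eq!(greedy_content(input, 0), None);
    println!("ANY+ ~ NEWLINE failed, as in the issue");
}
```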

shi-yan (Author) commented Mar 16, 2021

Thank you, I figured that out too. But then I tried

(!NEWLINE)+
            |
          6 | sequence_content = { (!NEWLINE)+ }␊
            |                      ^---------^
            |
            = expression inside repetition is non-progressing and will repeat infinitely

and !NEWLINE+

thread 'main' panicked at ' --> 1:1
  |
1 | @startuml sequence␊
  | ^---
  |
  = expected sequence', src/main.rs:14:34

birkenfeld (Contributor) commented:
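See the predicates section of the pest book (https://pest.rs/book/grammars/syntax.html#predicates). !NEWLINE does not consume anything: like a negative lookahead assertion in regexes, it only checks that NEWLINE does not match at the current position. So you need (!NEWLINE ~ ANY)+. That is also why (!NEWLINE)+ is rejected as non-progressing: its body never consumes any input.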
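
A stdlib-only Rust sketch of that fix, again a hand-written stand-in for pest rather than pest itself: `not_newline`, `non_newline_char` and `one_or_more` are hypothetical names, not pest API. It shows that `!NEWLINE` succeeds without moving the position, while `!NEWLINE ~ ANY` advances one character per iteration and stops right before the newline. The `next == cur` guard in `one_or_more` is an illustrative addition; pest instead rejects a zero-width repetition body when it validates the grammar, which is the "non-progressing" error quoted earlier in the thread.

```rust
// Hand-written emulation of `(!NEWLINE ~ ANY)+` (hypothetical, not pest's API),
// assuming pest-like PEG semantics.

/// `ANY`: consume exactly one character, failing only at end of input.
fn any(input: &str, pos: usize) -> Option<usize> {
    input[pos..].chars().next().map(|c| pos + c.len_utf8())
}

/// `NEWLINE`: match "\n", "\r\n" or "\r".
fn newline(input: &str, pos: usize) -> Option<usize> {
    let rest = &input[pos..];
    if rest.starts_with("\r\n") {
        Some(pos + 2)
    } else if rest.starts_with('\n') || rest.starts_with('\r') {
        Some(pos + 1)
    } else {
        None
    }
}

/// `!NEWLINE`: succeed WITHOUT consuming input when NEWLINE does not match here.
fn not_newline(input: &str, pos: usize) -> Option<usize> {
    match newline(input, pos) {
        Some(_) => None,
        None => Some(pos), // zero-width: the position is unchanged
    }
}

/// `!NEWLINE ~ ANY`: check the lookahead, then consume one character.
fn non_newline_char(input: &str, pos: usize) -> Option<usize> {
    let pos = not_newline(input, pos)?;
    any(input, pos)
}

/// `e+` with a progress guard. A body that succeeds without advancing, such as
/// `!NEWLINE` on its own, would otherwise loop forever.
fn one_or_more(input: &str, pos: usize, e: impl Fn(&str, usize) -> Option<usize>) -> Option<usize> {
    let mut cur = e(input, pos)?;
    while let Some(next) = e(input, cur) {
        if next == cur {
            break; // zero-width match: stop instead of spinning
        }
        cur = next;
    }
    Some(cur)
}

fn main() {
    let input = "test\n@enduml";
    // The repetition stops right before the newline...
    let end = one_or_more(input, 0, non_newline_char).unwrap();
    assert_eq!(&input[..end], "test");
    // ...so a following NEWLINE can now match.
    assert_eq!(newline(input, end), Some(5));
    // `!NEWLINE` alone never moves the position.
    assert_eq!(not_newline(input, 0), Some(0));
    println!("matched {:?}", &input[..end]);
}
```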

CAD97 added the question label Mar 16, 2021
tomtau converted this issue into discussion #630 Jul 10, 2022
