Grammar ambiguity between {action} and {count}. #74

Closed
kevinmehall opened this issue Apr 7, 2015 · 7 comments

Comments

@kevinmehall
Owner

Actions that consist of returning a single integer are ambiguous with the repetition count syntax introduced by #20.

number -> u32
  = "one" {1}
  / "two" {2}
  / "three" {3}

Those should be actions, but are parsed as repeat counts.

One minimal fix is to drop x {5} in favor of x {5, 5}. A bare comma is not valid Rust, so the parser cannot confuse the remaining { , } forms with actions, but humans may still find the two hard to tell apart.
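
To make the ambiguity concrete, here is a minimal sketch of the decision a grammar parser faces when it meets a braced block after an expression. The `Braces` enum and the `classify_braces` function are hypothetical names invented for this illustration; this is not how rust-peg itself is implemented.

```rust
/// What a `{ ... }` block after an expression could mean.
/// Hypothetical type for illustration; not part of rust-peg.
#[derive(Debug, PartialEq)]
enum Braces {
    /// A bare integer: readable as a repeat count or as an action.
    Ambiguous,
    /// A comma form such as `5, 5`, `2,` or `,3`: only a repeat count.
    RepeatOnly,
    /// Anything else is treated as action code.
    ActionOnly,
}

/// True if `s` is a non-empty run of ASCII digits.
fn is_uint(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// Classify the text found between `{` and `}`.
fn classify_braces(body: &str) -> Braces {
    let body = body.trim();
    if is_uint(body) {
        // `{1}` is both a valid Rust block and a valid repeat count.
        return Braces::Ambiguous;
    }
    if let Some((lo, hi)) = body.split_once(',') {
        let (lo, hi) = (lo.trim(), hi.trim());
        let lo_ok = lo.is_empty() || is_uint(lo);
        let hi_ok = hi.is_empty() || is_uint(hi);
        if lo_ok && hi_ok {
            // A bare comma between integers is not a valid Rust block.
            return Braces::RepeatOnly;
        }
    }
    Braces::ActionOnly
}
```

Under these assumptions, `classify_braces("1")` is `Ambiguous` while `classify_braces("5, 5")` is `RepeatOnly`, which is why requiring the comma form removes the ambiguity for the parser.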

Does anyone have ideas for a new syntax for the bounded-repeat functionality?

@kevinmehall
Owner Author

An alternate syntax could, but doesn't have to, address #47.

@buster

buster commented May 19, 2015

Hi Kevin,

I have another related issue:

For an IMAP protocol parser, there is the notion of a literal.
A literal is the string {X} followed by CRLF followed by X bytes.
So I would like to use X in a repetition like so:

literal -> &'input str
    = "{" number "}" CRLF (CHAR8{number} { match_str })

Is there a way to do that?

@jonas-schievink
Contributor

This is not supported, since you can't do variable repetition. But I'm sure there is a way to do this using a conditional capture (which is apparently still undocumented). I'll try to write a grammar that does this when I find time.

@jonas-schievink
Contributor

Okay, here you go:

#![feature(plugin, str_char, collections)]
#![plugin(peg_syntax_ext)]

peg! grammar(r#"

number -> usize
    = [0-9]+ { match_str.parse().unwrap() }

pos -> usize
    = &. { start_pos }

#[pub]
literal -> &'input str
    = "{" n:number "}\r\n" start:pos ( . {?
        println!("{}", start_pos - start);
        if start_pos - start < n { Ok(()) } else { Err("") }
    } )* {? if match_str.len() - start == n { Ok(match_str) } else { Err("literal") } }

"#);

fn main() {
    println!("{:?}", grammar::literal("{8}\r\n12345678"));
}

@buster

buster commented May 20, 2015

Thanks, that's great!
I had problems with the {} in the middle of a string to be parsed, and I also didn't want the "{8}\r\n" in the output, so I came up with this based on your example:

literalnumber -> usize
    = [0-9]+ { match_str.parse().unwrap() }

pos -> usize
    = &. { start_pos }

literal -> &'input str
    = "{" n:literalnumber "}\r\n" start:pos ( . {?
        println!("{}", start_pos - start);
        if start_pos - start < n { Ok(()) } else { Err("") }
    } )* {?
    let head = format!("{{{}}}\r\n", n).len();
    if match_str.len() - head == n { Ok(&match_str[head..]) } else { Err("literal") } }

This successfully parses my test strings and also removes the {}\r\n part:
"{10}\r\n1234567890"
"{1}\r\n1"
"{0}\r\n"
"{2}\r\n"""
"{4}\r\n 1"

Thanks! 👍
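
For comparison, the same literal format (`{N}`, CRLF, then exactly N bytes) can be checked with a small hand-written function. This is only a sketch in plain Rust with no parser generator; `parse_literal` is a hypothetical helper, and returning the unparsed remainder alongside the payload is a choice made here, not something the grammars above do.

```rust
/// Parse an IMAP-style literal: `{N}` + CRLF + exactly N bytes.
/// Returns the payload and whatever input follows it.
/// Hypothetical helper for illustration only.
fn parse_literal(input: &str) -> Result<(&str, &str), &'static str> {
    let rest = input.strip_prefix('{').ok_or("expected '{'")?;
    let close = rest.find('}').ok_or("expected '}'")?;

    // The text between the braces must be a plain decimal byte count.
    let digits = &rest[..close];
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return Err("expected a decimal byte count");
    }
    let n: usize = digits.parse().map_err(|_| "byte count out of range")?;

    // Skip the '}' (one byte) and require CRLF before the payload.
    let body = rest[close + 1..]
        .strip_prefix("\r\n")
        .ok_or("expected CRLF")?;

    if body.len() < n {
        return Err("literal shorter than declared length");
    }
    // The count is in bytes, so make sure it does not cut a UTF-8 character.
    if !body.is_char_boundary(n) {
        return Err("declared length splits a UTF-8 character");
    }
    Ok(body.split_at(n))
}
```

With this sketch, `parse_literal("{10}\r\n1234567890")` yields the payload `"1234567890"` with an empty remainder, and the `{N}\r\n` header never appears in the result.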

@Mingun

Mingun commented Nov 12, 2016

As I understand it, this project was inspired by pegjs. For that project I proposed an unambiguous, intuitive syntax for specifying the number of repetitions. The syntax also supports a separator between elements and lets the repetition count be taken from data parsed earlier (making it possible to parse grammars like number_with_quantity_of_elements; element element element...). I hope it helps you solve this problem.

PS. I also noticed that in the grammar the type of the returned value is attached to rules, although it would be more logical to attach it to actions, since they are what produce the typed result. If you are interested in making the grammar more logical, you can look at how I did it for the pegjs project (the repository currently holds a preliminary version, which I will lick into shape at some point, but I think the idea should be clear: for code generation, the types of all rules are inferred from the types of actions, and the types of actions must be specified explicitly).

@kevinmehall
Owner Author

I'm changing the syntax to foo*<n,m> in 0.4 to avoid the ambiguity, while reserving foo<x,y> for OMeta-like template arguments.
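
As a rough illustration of what a bounded repeat such as `foo*<n,m>` means, the sketch below applies an item parser greedily, stops after `max` matches, and fails if fewer than `min` matched. `repeat_bounded` and `digit` are hypothetical helpers written for this example in plain Rust; they are not the code rust-peg generates.

```rust
/// Apply `item` repeatedly, requiring between `min` and `max` matches.
/// `item` takes the remaining input and returns the rest on success.
/// Returns the number of matches and the unconsumed input.
/// Hypothetical helper for illustration only.
fn repeat_bounded<'a, F>(
    input: &'a str,
    min: usize,
    max: usize,
    item: F,
) -> Option<(usize, &'a str)>
where
    F: Fn(&'a str) -> Option<&'a str>,
{
    let mut rest = input;
    let mut count = 0;
    // Greedy: keep matching until the item fails or `max` is reached.
    while count < max {
        match item(rest) {
            Some(next) => {
                rest = next;
                count += 1;
            }
            None => break,
        }
    }
    if count >= min {
        Some((count, rest))
    } else {
        None
    }
}

/// Example item parser: consume one ASCII digit.
fn digit(input: &str) -> Option<&str> {
    match input.as_bytes().first() {
        Some(b) if b.is_ascii_digit() => Some(&input[1..]),
        _ => None,
    }
}
```

For instance, `repeat_bounded("12345", 2, 3, digit)` consumes three digits and leaves `"45"`, while `repeat_bounded("1a", 2, 3, digit)` fails because only one digit matches.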

I like @buster and @Mingun's suggestion of variable repetition counts from a previously captured variable, and while it is not in the "context free" language class, I don't think it poses any problems for PEG. Opened #143.
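
A repetition count taken from an earlier capture is easy to express in a recursive-descent style, which is one way to see why it fits PEG even though it is outside the context-free class. The sketch below parses input shaped like `N;e1 e2 ... eN`, loosely following the "count; element element element" example mentioned earlier in the thread. `parse_counted` is a hypothetical function, and the choice of `;` and whitespace as delimiters is an assumption made for this illustration.

```rust
/// Parse `N;e1 e2 ... eN`: a decimal count, a semicolon, then exactly N
/// whitespace-separated words. The count read first drives the loop.
/// Returns the words and the unconsumed input.
/// Hypothetical helper for illustration only.
fn parse_counted(input: &str) -> Option<(Vec<&str>, &str)> {
    let (count, mut rest) = input.split_once(';')?;
    let n: usize = count.trim().parse().ok()?;

    let mut items = Vec::new();
    for _ in 0..n {
        rest = rest.trim_start();
        // A word runs up to the next whitespace or the end of input.
        let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
        if end == 0 {
            // Ran out of elements before reaching the declared count.
            return None;
        }
        items.push(&rest[..end]);
        rest = &rest[end..];
    }
    Some((items, rest))
}
```

Here `parse_counted("3;a b c")` returns the three words with nothing left over, `parse_counted("2;a b c")` stops after two words and leaves `" c"` unconsumed, and `parse_counted("3;a b")` fails.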
