Skip to content

seanyoung/lrpeg

main
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
doc
 
 
 
 
 
 
 
 
 
 
 
 
 
 

crates.io CI license

DEPRECATED

The peg crate now supports left recursion using the #[cache_left_rec] attribute. This crate is much faster, has more features and is just more polished than this crate. So I recommend using that crate instead.

Left Recursive Parsing Expression Grammar (PEG)

lrpeg allows left recursive rules, and uses ratpack parsing for speed. I wrote a blog post to introduce the ideas of lrpeg.

The existing PEG parser generators for rust do not allow left recursion, which makes it very awkward to write grammars. It is possible to write a PEG parser generator which allows for left recursion, just as python now uses.

See IRP Grammar for a complete lrpeg grammar and handling for IRP.

How to use lrpeg

Add lrpeg to your Cargo.toml in build-dependencies:

[build-dependencies]
lrpeg = "0.4"

[dependencies]
regex = "1"
unicode-xid = "0.2"

Now add a build.rs to the root of your project, containing:

use std::env;
use std::path::PathBuf;

fn main() {
    let out_dir = env::var("OUT_DIR").unwrap();
    lrpeg::process_files(&PathBuf::from("src"), &PathBuf::from(out_dir));
}

Write your peg grammar, and put it in a file which ends with .peg, for example src/calculator.peg:

calculator <- expr EOI;

expr <- expr ("+" / "-") WHITESPACE term
    / term;

term <- term ("*" / "/" / "%") WHITESPACE  factor
    / factor;

factor <- "(" WHITESPACE  expr ")" WHITESPACE
    / num;

num <- re#[0-9]+# WHITESPACE;

When your run cargo build, calculator.rs will be generated into your target/... directory. You need to include this module in your project, and then you can instantiate the PEG like so:

include!(concat!(env!("OUT_DIR"), "/calculator.rs"));

use calculator::{Node, Rule};

fn main() {
    let mut parser = calculator::PEG::new();

    let args: Vec<String> = std::env::args().collect();

    if args.len() != 2 {
        eprintln!("Usage {} EXPRESSION", &args[0]);
        std::process::exit(2);
    }

    let input = &args[1];

    println!("parsing: {}", input);

    match parser.parse(input) {
        Ok(node) => {
            fn walk(node: &Node, input: &str) -> u64 {
                match node.rule {
                    Rule::num => u64::from_str_radix(node.children[0].as_str(input), 10).unwrap(),
                    Rule::expr => {
                        if node.alternative == Some(0) {
                            let left = walk(&node.children[0], input);
                            let right = walk(&node.children[3], input);

                            match node.children[1].as_str(input) {
                                "+" => left + right,
                                "-" => left - right,
                                _ => unreachable!(),
                            }
                        } else {
                            walk(&node.children[0], input)
                        }
                    }
                    Rule::term => {
                        if node.alternative == Some(0) {
                            let left = walk(&node.children[0], input);
                            let right = walk(&node.children[3], input);

                            match node.children[1].as_str(input) {
                                "*" => left * right,
                                "/" => left / right,
                                "%" => left % right,
                                _ => unreachable!(),
                            }
                        } else {
                            walk(&node.children[0], input)
                        }
                    }
                    Rule::factor => {
                        if node.alternative == Some(0) {
                            walk(&node.children[2], input)
                        } else {
                            walk(&node.children[0], input)
                        }
                    }
                    Rule::calculator => walk(&node.children[0], input),
                    _ => {
                        unreachable!()
                    }
                }
            }

            println!("result: {}", walk(&node, input));
        }
        Err((line_no, col_no)) => {
            eprintln!("parser error at {}:{}", line_no, col_no);
        }
    }
}

This example is available in lrpeg-example.

How to write grammar

PEG grammars are a set of rules. In lrpeg, each rule must end with a ";". Parsing starts at the top rule.

  • Each rule must start with an identifier, followed by <- and then the terms and terminated by an optional ;
  • A term can be repeated with *, + or optional ?.
  • The list of terms cannot be empty (but a term can be optional)
  • A text literal can be encoded in single or double quotes ("foo" or 'bar')
  • A regex must be written as re#[0-9]+#.
  • Alternatives are denoted with /. For example foo <- "a" / "b";
  • There are special terms WHITESPACE, EOI, and XID_IDENTIFIER (for unicode identifiers)

How lrpeg is bootstrapped

The file src/peg.peg contains the grammar for the peg itself. To generate the parser, run cargo run src/peg.peg > src/peg.rs.new and then mv src/peg.rs.new src/peg.rs.

TODO

  • More tests
  • Better parse error information (now only the line and column is returned)
  • Better documentation
  • Detect unreachable alternatives

About

Left Recursive PEG for rust

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages