Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An alternative to the Pair API #416

Open
Nadrieril opened this issue Sep 20, 2019 · 9 comments
Open

An alternative to the Pair API #416

Nadrieril opened this issue Sep 20, 2019 · 9 comments

Comments

@Nadrieril
Copy link
Contributor

Nadrieril commented Sep 20, 2019

I've been working on a project with a fairly large grammar for a while now, and in the process I've had to write and maintain code to convert the output of pest parsing to the actual AST I need in my project. I started out with a bunch of ad-hoc macros to help with dealing with the Pair API, but after a lot of iteration I got to a nice design that I thought could be of use for others.
(I did have a look at pest-ast, but I couldn't find basic documentation, and it was too rigid for my use-case)

So I made it into a crate, called pest_consume. It uses procedural macros and works on stable Rust. It is flexible, and makes code that is IMHO easy to write, read, and update when the grammar changes.

I thought I'd advertise it here because I feel it neatly complements pest to make writing parsers a great experience. Also, since there has been discussion around changing the pest API for pest3, I propose this design as something that has worked for me on a large project.

I'd be excited to hear what you guys think about it, if the docs are clear enough and if you think you may want to use it in your project ! If the pest devs like it, maybe this crate could be mentioned in the pest docs ?

@lwandrebeck
Copy link
Contributor

Hi,

I was just beginning to rewrite the parsing loop of my pet project to make it less awful when you announced pest_consume, which looks really sweet.
I will give it a try ASAP, and probably ask you a couple questions ;)

Thanks for it, doc is really awesome ! And I feel I’ll have a cleaner code than by using Pair API (no offense, @dragostis ;)

@lwandrebeck
Copy link
Contributor

Hi,
Seeing #418 , have you done any bench vs Pair api ?

PS1: I have begun to use pair_consume, but I’m stuck for a couple days until 1) I have some spare time to work again on it 2) I can get the beast to compile.
PS2: Do you plan to get on pest gitter ? I’d be nice to interact :)

@Nadrieril
Copy link
Contributor Author

I haven't done any benchmarks, mostly because I have no idea how to do them correctly. I'm not expecting significant differences in performance however since in most cases the produced code is essentially what you would have written by hand with the Pair API

@Timmmm
Copy link

Timmmm commented Nov 10, 2019

This is sorely needed! I don't really get why Rules are all untyped when Pest has all the type information available and is a code generator! As far as I can tell Tree Sitter works in the same way as Pest - it just gives you an untyped DOM-style tree of nodes. But Nom actually gives you the final Rust output. Once you have written your parser and told Nom to parse it there's no extra "oh, you still have to parse it yourself though" step.

Consider the CSV example:

field = { (ASCII_DIGIT | "." | "-")+ }
record = { field ~ ("," ~ field)* }
file = { SOI ~ (record ~ ("\r\n" | "\n"))* ~ EOI }

Pest gives you something like this.

Rule ("file")
  Rule ("record")
    Rule ("field")
    Rule ("field")
  Rule ("record")
    Rule ("field")
    Rule ("field")
...

The rule variants are actually native rust enums but in practice it makes zero difference to my coding (unfortunately code completion doesn't work) because they are all the same!

enum Rule {
  field,
  file,
  record,
}

impl Rule {
   fn into_inner(...) -> Rule {}
}

Why can't it do this instead?

struct Field {
  span: Span,
}
struct Record {
   span: Span,
   field_0: Field,
   field_1: Vec<Field>,
}
struct File {
   span: Span,
   record: Vec<Record>,
}

Another example, INI:

char = { ASCII_ALPHANUMERIC | "." | "_" | "/" }
name = { char+ }
value = { char* }
section = { "[" ~ name ~ "]" }
property = { name ~ "=" ~ value }
file = {
    SOI ~
    ((section | property)? ~ NEWLINE)* ~
    EOI
}

Would become

struct Char {
  span: Span,
}
struct Name {
  span: Span,
  chars: Vec<Char>,
}
struct Value {
  span: Span,
  chars: Vec<Char>,
}
struct Section {
  span: Span,
  name: Name,
}
struct Property {
  span: Span,
  name: Name,
  value: Value,
}
struct File {
  span: Span,
  section_or_property: Vec<SectionOrProperty>,
}
enum SectionOrProperty {
  Section(Section),
  Property(Property),
}

I suppose you may want to avoid creating a Vec<Char> somehow. Perhaps it could be done lazily, or is there some way of making a rule "silent"? In Tree Sitter you prepend _ to the name.

@birkenfeld
Copy link
Contributor

It's not only the Vec<char>, you're building up Vecs and nested structs everywhere in your solution. This is probably fine if you intend to build up an AST-like node tree anyway, but for other parsing jobs any such allocation is unneeded and wasteful.

I'm not saying that Pair is the best possible API for this kind of parsing, but this isn't it.

@Timmmm
Copy link

Timmmm commented Nov 10, 2019

This is probably fine if you intend to build up an AST-like node tree anyway

Exactly! I think this should be optional because sometimes you want extreme speed and you don't to store a full AST. But in the vast majority of cases you do. In the examples in the book, the INI, JSON and J language all do this. Only the CSV parser avoids it.

@Nadrieril
Copy link
Contributor Author

What you propose could use a custom "container" type like Pairs, but typed. It would construct the children on-demand when you iterate it instead of pre-allocating it. Then I think the performance and memory overhead should be comparable to the Pair API

@Timmmm
Copy link

Timmmm commented Nov 11, 2019

Sounds good! By the way I found someone has articulated this issue much better than me.

@CAD97
Copy link
Contributor

CAD97 commented Nov 11, 2019

pest-ast is an experimental library to add a typed conversion on top of pest proper. I'll also note that tree libraries like Rowan are pursuing an untyped tree with a typed API provided over it.

Ultimately, pestv3 is planned to provide a listener API as well as integrate something like pest-ast as a first class citizen to produce typed trees from that listener API. Unfortunately, pestv3 progress is stalled as dragostis hasn't had enough time to work on it recently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants