Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse Tree #19

Closed
awalterschulze opened this issue Apr 1, 2014 · 3 comments
Closed

Parse Tree #19

awalterschulze opened this issue Apr 1, 2014 · 3 comments

Comments

@awalterschulze
Copy link

Hi
We are trying to write a parser for a grammar that needs more lookahead than what LR(1) in http://code.google.com/p/gocc/ provides. I am pretty sure we will be able to Pegify the grammar, but I was wondering if we are able to access the parse tree in some way or is it easy to build an AST in a bottom up way in this PEG implementation?

We have really easy to use SDT rules in gocc to build up an AST in a bottom up way.

I have not used PEG before, but I have read an article and I am really amped :)

Please help, Thank you
Walter Schulze

@pointlander
Copy link
Owner

By default PEG stores an AST in a "flat array tree" format. Each array element is a token16 or token32:
type token16 struct { /* a matched rule, {action}, <capture> */ Rule /* tree depth is being stored in next, I should probably rename this... */ begin, end, next int16 }
as the input grows the tree is converted to a token32 array:
type token32 struct { Rule begin, end, next int32 }
This is probably over optimization. An example flat array tree:
{Child, Child, Parent, Child, Child, Parent, Grand_Parent}
The grammar that produced this tree could look like:
Grand_Parent <- Parent* !. Parent <- Child* Child <- .*
Your best bet would be to convert the flat array tree into a real tree. This could probably be done by looking at the begin, end, and depth (next) as you iterate across the flat array tree. (Take a look at Tokens() in the TokenTree interface defined in bootstrap.peg.go or peg.go) A sketch of the algorithm:

  1. grab a token
  2. allocate a node for the token
  3. if the node at the top of the stack is between begin and end add that node as a child and pop it off the stack (continue adding nodes and popping the stack until the node is not between being and end)
  4. push the allocated node onto the stack
  5. after all the tokens have been iterated through the root AST node is the only thing left on the stack

@pointlander
Copy link
Owner

An AST() function has been added.

@awalterschulze
Copy link
Author

Really cool.
Thank you very much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants