JSON example. Some questions and thoughts #14

Closed
reclosedev opened this issue Mar 22, 2013 · 3 comments

Comments

@reclosedev

I was interested in Parsimonious because of its readable grammars and separation of concerns, but I didn't find any examples (except rule_syntax), so I tried to write a simple JSON parser with a demo and a benchmark: https://gist.github.com/reclosedev/5222560

I'm not sure I've expressed the grammar correctly, for example the comma-separated values and members. This grammar allows a comma after the last member/value (JSON doesn't). How should it be written?

Can we mark some terms or rules as excluded from the tree, e.g. whitespace, braces, commas? That would allow reuse and simplify some visit_* methods.

Suggestion: NodeVisitor.lift_child could be more useful if it accepted rules with more than one child, e.g.:

values = value ws? ","? ws?
def lift_child(self, node, visited_children):
    """Lift the sole child of ``node`` up to replace the node."""
    return visited_children[0]

Or it could be a separate method.

I think it would be great to have more real grammar examples with benchmarks in Parsimonious.

@erikrose
Owner

> I was interested in Parsimonious because of its readable grammars and separation of concerns, but I didn't find any examples (except rule_syntax),

The best examples are probably the tests, especially in test_grammar.py.

> so I tried to write a simple JSON parser with a demo and a benchmark: https://gist.github.com/reclosedev/5222560

Cool! I may even have to steal this, once you finish it, for benchmark trials. There's a ton of speed work to come.

> I'm not sure I've expressed the grammar correctly, for example the comma-separated values and members. This grammar allows a comma after the last member/value (JSON doesn't). How should it be written?

Here's (off the top of my head, approximately, untestedly, etc., etc.) how I'd structure it:

object = "{" ws? members ws? "}"
members = (member ws? "," ws?)* member
member = string ws? ":" ws? value

You might want to break it up even further, depending on your aesthetic and the details of your tree-walker implementation:

object = "{" members ws? "}"
members = member_and_comma* member
member_and_comma = member comma
comma = ws? ","
member = ws? string ws? ":" ws? value

I like that better. I moved some of the ws into member so you don't have to say "ws" so much in the higher-level rules. (I personally like to name my whitespace rule "_" because it sinks into the background visually.) In this version, the lower-level rules tend to eat leading whitespace but not trailing. Any final trailing whitespace is eaten near the end of object. This might have bugs in it, but you get the idea.
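
To make the trailing-comma point concrete, here is a small sketch of the two rule shapes. It is only a stand-in: it uses Python's `re` module rather than Parsimonious, and a "member" is simplified to a run of digits (both are assumptions made for illustration). The `strict` pattern mirrors `members = (member ws? "," ws?)* member`, while `permissive` mirrors a rule where every member may carry an optional comma.

```python
import re

# Simplified stand-ins (assumptions): a "member" is just a run of digits.
MEMBER = r"\d+"
WS = r"\s*"

# Permissive shape: each member may be followed by an optional comma,
# so a comma after the last member is accepted.
permissive = re.compile(rf"(?:{MEMBER}{WS},?{WS})*")

# Strict shape, mirroring:  members = (member ws? "," ws?)* member
# Every comma must be followed by another member.
strict = re.compile(rf"(?:{MEMBER}{WS},{WS})*{MEMBER}")


def accepts(pattern, text):
    """Return True if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None
```

With these definitions, `accepts(strict, "1, 2, 3")` is true and `accepts(strict, "1, 2,")` is false, whereas `accepts(permissive, "1, 2,")` is true. Note that the strict shape also rejects an empty list, so an empty object would need its own alternative.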

> Can we mark some terms or rules as excluded from the tree, e.g. whitespace, braces, commas? That would allow reuse and simplify some visit_* methods.

My intention is that unused nodes get ignored by the visitor methods, like you do in visit_object(). Feel free to say…

(_, brace, members, _, brace)

…etc.; that is, you can reuse variable names in the argument list if you don't care about capturing.

> Suggestion: NodeVisitor.lift_child could be more useful if it accepted rules with more than one child, e.g.:
>
> values = value ws? ","? ws?
>
> def lift_child(self, node, visited_children):
>     """Lift the sole child of ``node`` up to replace the node."""
>     return visited_children[0]
>
> Or it could be a separate method.

Once you get your grammar refactored and working, let me know if this pattern still holds. I expect we'll discover lots of patterns to factor up as we write more grammars. I haven't ruled out adding a small number of tree transformations someplace. I especially like the ones PyPy chose: http://doc.pypy.org/en/latest/rlib.html#tree-transformations.

> I think it would be great to have more real grammar examples with benchmarks in Parsimonious.

You'll be happy to hear I'm writing benchmarks as we speak. :-) Then I get to do fun speed work, which is what I've been really looking forward to!

Thanks so much for trying Parsimonious!

@erikrose
Owner

Feel free to re-open or just keep chatting.

@reclosedev
Author

Thank you for the detailed response!

> I may even have to steal this, once you finish it, for benchmark trials

Feel free to use this code; I can also make a pull request once it is fixed.

I've modified the example according to your suggestions. I like the second variant too.

> My intention is that unused nodes get ignored by the visitor methods

> I haven't ruled out adding a small number of tree transformations someplace. I especially like the ones PyPy chose: http://doc.pypy.org/en/latest/rlib.html#tree-transformations.

That's what I meant by "mark some term or rule as excluded from tree". The PyPy way looks very nice.

member = [ws?] string [ws?] [":"] value
def visit_member(self, node, (name, value)):  # no more noise in visitors

Also, with these transformations, the need for lift_first_child() disappears.
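
A rough sketch of what such a transformation could do, written as a post-processing step over an already built tree. Everything here is hypothetical: the `Node` class is a minimal stand-in rather than Parsimonious's own node type, and `prune` is an illustrative helper, not an existing API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node (not Parsimonious's Node)."""
    rule: str
    text: str = ""
    children: list = field(default_factory=list)


def prune(node, ignored):
    """Return a copy of node without descendants whose rule name is in ignored."""
    kept = [prune(child, ignored)
            for child in node.children
            if child.rule not in ignored]
    return Node(node.rule, node.text, kept)


# A hand-built tree for the text:  "a" : 1
member = Node("member", children=[
    Node("ws", " "),
    Node("string", '"a"'),
    Node("ws", " "),
    Node("colon", ":"),
    Node("value", "1"),
])

# After pruning, only the two interesting children remain.
name, value = prune(member, {"ws", "colon"}).children
```

Here `name` is the `string` node and `value` is the `value` node, so a visitor for `member` would only see those two children. The original tree is left unchanged.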

> You'll be happy to hear I'm writing benchmarks as we speak. :-) Then I get to do fun speed work, which is what I've been really looking forward to!

👍


One more thing (maybe it should be a separate issue).
Python 3 has no tuple parameter unpacking (PEP 3113). A solution is to pass the visited children as variable-length arguments by unpacking them in visit(). It also saves typing extra parentheses in visitors.
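
A minimal sketch of that idea, to show the shape of it. The `Node`, `StarArgsVisitor`, and `MemberVisitor` classes are hypothetical and written only for this illustration; they are not Parsimonious's actual `NodeVisitor`.

```python
class Node:
    """Hypothetical minimal parse-tree node, used only for this sketch."""

    def __init__(self, rule, text="", children=()):
        self.rule = rule
        self.text = text
        self.children = list(children)


class StarArgsVisitor:
    """Dispatches to visit_<rule>(node, *visited_children)."""

    def visit(self, node):
        # Visit children first, then unpack the results into the method call.
        visited = [self.visit(child) for child in node.children]
        method = getattr(self, "visit_" + node.rule, self.generic_visit)
        return method(node, *visited)

    def generic_visit(self, node, *visited_children):
        # Fallback for rules without a dedicated method: return the raw text.
        return node.text


class MemberVisitor(StarArgsVisitor):
    def visit_member(self, node, name, _colon, value):
        # Children arrive as ordinary positional parameters.
        return (name, value)


tree = Node("member", children=[
    Node("string", "a"),
    Node("colon", ":"),
    Node("value", "1"),
])
result = MemberVisitor().visit(tree)
```

One caveat with this approach: ordinary parameters must have distinct names in Python 3, so several ignored children need names like `_ws1` and `_ws2` rather than a repeated `_`.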
