JSON decoder example #23

Open
wants to merge 1 commit into
base: master

keleshev
Contributor

@keleshev keleshev commented Apr 7, 2013

This is an implementation of a JSON decoder using parsimonious.

It's based on the JSON spec, with one exception: escape sequences are not supported. I will most likely add them later.

@erikrose
Owner

erikrose commented Apr 8, 2013

Nice job. It's impressive how short it is. I may have to borrow this as a canonical benchmark!

@keleshev
Contributor Author

keleshev commented Apr 8, 2013

The grammar ended up very similar to the spec (unlike parsley). At the same time, writing the decoder was a bit tricky, since I had to keep switching back to the grammar. I actually ended up developing it with the rules inside each method's docstring, and moved them out when it was finished :-)

Also recently I presented parsimonious at a local Python meetup, showing how to write a simple interpreter. You might be interested to see it:

https://gist.github.com/halst/4531a03bcddab550992a

Too bad the camera's battery died while we were recording my talk, but I plan to make a screencast out of it.

@erikrose
Owner

erikrose commented Apr 8, 2013

A couple of ideas:

  1. It would be almost trivial to create a sort of NodeVisitor class where a Grammar is built out of the concatenation of [parts of] the methods' docstrings. You'd lose the decoupling between grammar and visitor, but sometimes you don't need it.

  2. I'm probably going to add a few select tree transforms (based on http://doc.pypy.org/en/latest/rlib.html#tree-transformations) to the grammar so at least you can extract a child or ignore a child. For instance…

    pair = first >", "< second  # The comma gets ignored, and the visitor sees just first and second.
    
    subexpression = "(" <important_bit> ")"  # The visitor sees only important_bit in place of subexpression.
    

    Spelling is still extremely up for grabs.

    Would the tree transforms help make your need to look back and forth between the visitor and grammar go away? I definitely want to solve this problem—without killing the option of decoupling the two.
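
To make the two proposed transforms concrete, here is a minimal sketch on a toy tree. It is not parsimonious code: the `(name, payload)` node shape and the helper names `ignore` and `extract` are assumptions made up for illustration, and the real spelling was still undecided at this point in the thread.

```python
def ignore(children, unwanted):
    """Return `children` without the nodes whose rule name is in `unwanted`.

    Each node is a hypothetical (name, payload) tuple.
    """
    return [child for child in children if child[0] not in unwanted]


def extract(node, wanted):
    """Replace `node` with its single child named `wanted`."""
    _name, children = node
    matches = [child for child in children if child[0] == wanted]
    if len(matches) != 1:
        raise ValueError('expected exactly one child named %r' % wanted)
    return matches[0]


# pair = first >", "< second: the visitor would see only first and second.
pair = ('pair', [('first', 'a'), ('comma', ', '), ('second', 'b')])
visible = ignore(pair[1], {'comma'})

# subexpression = "(" <important_bit> ")": only important_bit survives.
sub = ('subexpression',
       [('lparen', '('), ('important_bit', 'x'), ('rparen', ')')])
lifted = extract(sub, 'important_bit')
```

With transforms like these applied before visiting, a visitor method would no longer need to know where the punctuation sits in the rule.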

@erikrose
Owner

erikrose commented Apr 8, 2013

Say…it occurs to me that we can perfectly well extract a grammar from visitor docstrings without having to actually use that visitor to visit. :-) So there's no real disadvantage to doing that, except that you can't see the whole grammar at once without a little work. Hmm!
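
A rough sketch of that extraction, using only the standard library. The `grammar_source` helper and the rule-detection convention (a docstring that starts with `<method name> =`) are assumptions for illustration, not an existing parsimonious feature; the `Mini` class here is a stripped-down stand-in with empty method bodies.

```python
import inspect


class Mini(object):
    """Hypothetical visitor whose method docstrings each hold one rule."""

    def ifelse(self, node):
        r""" ifelse = ~"if\s*" expr ~"\s*then\s*" expr ~"\s*else\s*" expr """

    def infix(self, node):
        r""" infix = ~"\(\s*" expr ~"\s*" operator ~"\s*" expr ~"\s*\)\s*" """

    def helper(self):
        """Not a rule, so it is skipped."""


def grammar_source(visitor_class):
    """Join the rule-looking docstrings of a class into one grammar string."""
    rules = []
    # getmembers returns the methods sorted by name.
    for name, member in inspect.getmembers(visitor_class, inspect.isfunction):
        doc = inspect.getdoc(member)
        # Keep only docstrings of the form "<method name> = ...".
        if doc and doc.split('=', 1)[0].strip() == name:
            rules.append(doc.strip())
    return '\n'.join(rules)


source = grammar_source(Mini)
```

Nothing is visited here: the class is only inspected, so the grammar text can be assembled (and printed in one piece) without instantiating the visitor.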

@erikrose
Owner

erikrose commented Apr 8, 2013

If you get around to making a screencast, I'd love to see it or even help publicize it.

@keleshev
Contributor Author

keleshev commented Apr 8, 2013

From what I can see, this problem is usually handled by assigning names to children, as parsley does:

object = ws '{' members:m ws '}' ws  # do something with `m`
pair = string:k ':' value:v  # do something with `k` and `v`

With grabbing syntax this would look like

object = ws '{' <members> ws '}' ws  # do something with `members`
pair = string >':'< value  # do something with `string` and `value`

In these cases I like grabbing better, because you don't need to come up with silly short names.

But I can imagine a problem that "naming" could handle that "grabbing" couldn't (probably?):

members = (pair:first (ws ',' pair)*:rest -> [first] + rest) | -> []

Although I'm not sure how parsley gets rest without such garbage as ws and ','.
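
For comparison, this is roughly what a visitor would have to do by hand to get `[first] + rest` without the separators. It is a sketch, not a description of how parsley works internally; it assumes the repetition arrives as a list of hypothetical `(ws, comma, pair)` triples.

```python
def members(first, rest):
    """Combine the children of `pair (ws ',' pair)*` into a flat list.

    `first` is the already-visited first pair; `rest` holds one
    (ws, comma, pair) triple per repetition.
    """
    # Keep only the pair from each triple, dropping ws and ','.
    return [first] + [pair for _ws, _comma, pair in rest]


decoded = members(('a', 1), [(' ', ',', ('b', 2)), ('', ',', ('c', 3))])
```

The filtering step is exactly the part that either a naming syntax or a tree transform would need to take over.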

@keleshev
Contributor Author

keleshev commented Apr 8, 2013

Well, in an ideal parallel universe where Python has real lambdas, I would love to change this code:

class Mini(object):
    ...
    def ifelse(self, node):
        """ ifelse = ~"if\s*" expr ~"\s*then\s*" expr ~"\s*else\s*" expr """
        _, cond, _, cons, _, alt = node
        return self.eval(cons) if self.eval(cond) else self.eval(alt)

    def infix(self, node, children):
        """ infix = ~"\(\s*" expr ~"\s*" operator ~"\s*" expr ~"\s*\)\s*" """
        _, left, _, operator, _, right, _ = children
        operators = {'+': op.add, '-': op.sub, '*': op.mul, '/': op.div}
        return operators[operator](left, right)

into something like this (in CoffeeScript syntax):

mini = Grammar({
    ...
    'ifelse': rule '~"if\s*" expr ~"\s*then\s*" expr ~"\s*else\s*" expr', -> 
        mini.eval(@expr2) if mini.eval(@expr1) else mini.eval(@expr3)

    'infix': rule '~"\(\s*" expr ~"\s*" operator ~"\s*" expr ~"\s*\)\s*"', ->
        operators = {'+': op.add, '-': op.sub, '*': op.mul, '/': op.div}
        operators[@operator](@expr1, @expr2)
})

@keleshev
Contributor Author

keleshev commented Apr 8, 2013

I.e. somehow avoid method signatures and tuple unpacking
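
One way to approximate the `@expr1` / `@operator` idea in plain Python is to bind children to attributes by rule name. This is a sketch under assumptions: the `name_children` helper and the `(rule_name, value)` child shape are hypothetical, not something parsimonious provides, and `op.truediv` stands in for the Python 2 `op.div`.

```python
import operator as op
from collections import Counter
from types import SimpleNamespace


def name_children(children):
    """Map (rule_name, value) pairs to attributes, numbering repeated names.

    A name that occurs once keeps its name (`operator`); a name that occurs
    several times gets a 1-based suffix (`expr1`, `expr2`, ...). Children
    whose name is None (anonymous regexes, literals) are dropped.
    """
    totals = Counter(name for name, _ in children if name is not None)
    seen = Counter()
    named = {}
    for name, value in children:
        if name is None:
            continue
        if totals[name] > 1:
            seen[name] += 1
            named[name + str(seen[name])] = value
        else:
            named[name] = value
    return SimpleNamespace(**named)


OPERATORS = {'+': op.add, '-': op.sub, '*': op.mul, '/': op.truediv}


def infix(c):
    """Evaluate an infix expression from its named children."""
    return OPERATORS[c.operator](c.expr1, c.expr2)


children = [(None, '('), ('expr', 6), (None, ' '), ('operator', '*'),
            (None, ' '), ('expr', 7), (None, ')')]
result = infix(name_children(children))  # 6 * 7
```

The rule body no longer depends on the position of the whitespace and parentheses, only on the names used in the rule.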

@erikrose
Owner

erikrose commented Apr 8, 2013

> But I can imagine a problem that "naming" could handle that "grabbing" couldn't (probably?):

This is, I suspect, where the third type of tree transformation comes in: http://doc.pypy.org/en/latest/rlib.html#nonterminal-1-nonterminal-2-nonterminal-n. (Yes, I changed the syntax in my example; don't let it confuse you.)

@erikrose
Owner

erikrose commented Apr 8, 2013

I would nudge it in this direction:

class Mini(object):
    ...
    def ifelse(self, (_, cond, _, cons, _, alt)):
        """ ~"if\s*" expr ~"\s*then\s*" expr ~"\s*else\s*" expr """
        return self.eval(cons) if self.eval(cond) else self.eval(alt)

    def infix(self, node, (_, left, _, operator, _, right, _)):
        """ ~"\(\s*" expr ~"\s*" operator ~"\s*" expr ~"\s*\)\s*" """
        operators = {'+': op.add, '-': op.sub, '*': op.mul, '/': op.div}
        return operators[operator](left, right)

That is, removing the duplicated rule names and doing tuple unpacking in the formal parameter list (though that's going away in Python 3 and will therefore require a rethink).

This takes us pretty close to a PEG version of PLY, which is not entirely a bad thing, since (1) it's optional and (2) it doesn't strictly hurt our decoupling.
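
For reference, tuple parameters were removed in Python 3 by PEP 3113, so under Python 3 the unpacking has to move into the method body. A minimal sketch of the `infix` method in that form follows; it assumes the children arrive already evaluated, and it substitutes `op.truediv` for `op.div`, which no longer exists in Python 3.

```python
import operator as op


class Mini(object):
    OPERATORS = {'+': op.add, '-': op.sub, '*': op.mul, '/': op.truediv}

    def infix(self, node, children):
        r""" ~"\(\s*" expr ~"\s*" operator ~"\s*" expr ~"\s*\)\s*" """
        # Python 3 has no tuple parameters (PEP 3113), so unpack here.
        _, left, _, operator, _, right, _ = children
        return self.OPERATORS[operator](left, right)


quotient = Mini().infix(None, ['(', 6, ' ', '/', ' ', 3, ')'])
```

The docstring still omits the rule name, which would be taken from the method name under the scheme proposed above.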

@keleshev
Contributor Author

keleshev commented Apr 8, 2013

Yeah, I will miss tuples in signatures. It's the complete opposite of the direction I want Python to go in :-). (E.g. CoffeeScript even allows unpacking objects in signatures, like func = (arg1, {foo: [arg2, arg3], arg4}) -> ...).

I guess what I did with Mini is the way to go in this case 😟

@erikrose
Owner

I just finished multi-line support (#19) and am now turning my attention to benchmarking and optimizing, using your JSON decoder as a starting point. I got a real kick out of you naming the entrypoint loads. :-)

@keleshev
Contributor Author

😀
