Generalizing and augmenting PEGs #47
Comments
Wow, that's 3 wishes, 3 tickets. :-)
|
    grammar = Grammar("""
        foo = "bar" / "baz" / quux
        quux = "(" spam* ")"
        """,
        spam=Rule(lambda source: ...),
    )

And in case of a visitor:

    class MyLanguage(Language):

        @rule('"bar" / "baz" / quux')
        def foo(self, node, children):
            ...

        @rule(lambda source: ...)
        def spam(self, node, children):
            ...
|
I like the Now that I think about it, Parsimonious already has a logical extension point for custom expression code: just make something that implements the Expression object's interface. We could even promote simple lambdas into Expressions transparently, though the lambdas would need a few more params than in your sketch:

    grammar = Grammar("""
        foo = "bar" / "baz" / quux
        quux = "(" spam* ")"
        """,
        spam=lambda text, pos, cache, error: Node(...))

We'd probably also want to pass the Grammar itself into the lambda so it could call out to other rules if it liked. A natural way to do this might be to make the lambda appear to be an instance method, receiving the Grammar as an initial
|
👍 |
Here's a little teaser of tonight's work:

    from inspect import getargspec

    # Expression and Node are Parsimonious's own classes
    # (parsimonious.expressions and parsimonious.nodes).


    def expression(callable, rule_name, grammar):
        """Turn a plain callable into an Expression.

        The callable can be of this simple form::

            def foo(text, pos):
                # end_pos = the offset in `text` where I stop matching
                # children = child nodes, if any
                if the expression matched:
                    return end_pos, children

        If there are no children, it can return just an int::

            return end_pos

        If the expression doesn't match at the given ``pos`` at all... ::

            return None

        If your callable needs to make sub-calls to other rules in the grammar, it
        can take this form, adding additional arguments::

            def foo(text, pos, cache, error, grammar):
                # Call out to other rules:
                matched = grammar['another_rule']._match(text, pos, cache, error)
                ...
                # Return values as above.

        The return value of the callable, if an int or a tuple, will be
        automatically transmuted into a Node. If it returns a Node-like class
        directly, it will be passed through unchanged.

        """
        # Decide between the simple and the full calling convention by arity.
        num_args = len(getargspec(callable).args)
        if num_args == 2:
            is_simple = True
        elif num_args == 5:
            is_simple = False
        else:
            raise RuntimeError("Custom rule functions must take either 2 or 5 "
                               "arguments, not %s." % num_args)

        class AdHocExpression(Expression):
            def _uncached_match(self, text, pos, cache, error):
                if is_simple:
                    result = callable(text, pos)
                else:
                    result = callable(text, pos, cache, error, grammar)

                # isinstance() takes the object first, then the type(s).
                # `long` exists on Python 2 only.
                if isinstance(result, (long, int)):
                    end, children = result, None
                elif isinstance(result, tuple):
                    end, children = result
                else:
                    # Node or None
                    return result
                return Node(self.name, text, pos, end, children=children)

            def _as_rhs(self):
                return '{custom function "%s"}' % callable.__name__

        return AdHocExpression(name=rule_name)

Want to know something disgusting? You could actually use this to add state to your parser, not just code. Since the 5-arg version of a custom rule gets passed the
|
Example:

    grammar = Grammar("""
        bracketed_digit = start digit end
        start = '['
        end = ']'""",
        digit=lambda text, pos:
            (pos + 1) if text[pos].isdigit() else None)
|
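To make the return-value convention concrete without depending on Parsimonious itself, here is a minimal standalone sketch. `SketchNode` and `to_node` are hypothetical names invented for this illustration; they are not part of the library. The sketch only mimics the conversion described in the `expression` docstring (int, tuple, node, or None), and its `digit` adds a bounds check that the one-line lambda does not have.

```python
from collections import namedtuple

# Hypothetical stand-in for a parse-tree node; NOT Parsimonious's Node class.
SketchNode = namedtuple("SketchNode", "name text start end children")


def digit(text, pos):
    """Two-argument custom rule: match one digit, return the end offset or None."""
    # The bounds check avoids an IndexError when pos is at the end of the text.
    if pos < len(text) and text[pos].isdigit():
        return pos + 1
    return None


def to_node(name, text, pos, result):
    """Turn a custom rule's return value into a node, or None for no match."""
    if result is None:
        return None
    if isinstance(result, int):
        # A bare int means "matched up to this offset, no children".
        end, children = result, None
    elif isinstance(result, tuple):
        end, children = result
    else:
        # Assume the rule already built a node-like object; pass it through.
        return result
    return SketchNode(name, text, pos, end, list(children or []))


node = to_node("digit", "[7]", 1, digit("[7]", 1))
print(node.start, node.end)                         # 1 2
print(to_node("digit", "[x]", 1, digit("[x]", 1)))  # None
```

Keeping the conversion in one place is what lets the simple rule stay a one-liner: the rule reports only where the match ended, and the wrapper fills in everything else.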
@halst, take a look at the new stuff, and let me know what you think! The tests provide pretty good examples: 029cb2f#diff-23015ea821765daa0cb85fb8f8bd0665R301. I added some |
Still planning the |
The two-argument version is nice and readable. The 5-argument version is a bit awkward. But, maybe, it's just part of the essential complexity of the problem. |
I think it's unavoidable. |
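For readers weighing the two calling conventions, the arity dispatch can be sketched on its own. This is an illustration, not the library's code: `adapt` is a hypothetical helper, a plain dict stands in for the Grammar, and `inspect.signature` is used here in place of the `getargspec` call from the teaser.

```python
import inspect


def adapt(rule, grammar):
    """Wrap a 2- or 5-argument rule behind one uniform 4-argument matcher."""
    num_args = len(inspect.signature(rule).parameters)
    if num_args == 2:
        def matcher(text, pos, cache, error):
            # Simple form: the rule sees only the text and the position.
            return rule(text, pos)
    elif num_args == 5:
        def matcher(text, pos, cache, error):
            # Full form: the rule also receives the cache, the error object,
            # and the grammar, so it can call out to other rules.
            return rule(text, pos, cache, error, grammar)
    else:
        raise TypeError("Custom rule functions must take either 2 or 5 "
                        "arguments, not %d." % num_args)
    return matcher


def simple(text, pos):
    return pos + 1 if text.startswith("a", pos) else None


def full(text, pos, cache, error, grammar):
    # A real 5-argument rule could look up other rules through `grammar`.
    return grammar["offset"] + pos


toy_grammar = {"offset": 10}  # plain dict standing in for a Grammar
print(adapt(simple, toy_grammar)("abc", 0, {}, None))  # 1
print(adapt(full, toy_grammar)("abc", 2, {}, None))    # 12
```

The awkwardness of the 5-argument form comes from exposing the matcher's internals (cache and error tracking); the wrapper confines that cost to rules that actually need to call other rules.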
Btw, the unicode vs. binary ticket is #31. |
I wish to parse (1) binary data using PEGs, or (2) token streams (like #40). Also, I wish to parse (3) more grammars than PEGs allow by somehow augmenting the grammar with some Python code and maybe some state, because I don't want to throw away my grammar and rewrite the parser manually as soon as I want to add a feature that is not parseable by PEGs, like, say, whitespace-sensitivity.
What do you think?
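As an illustration of wish (3), here is a small sketch of a construct that grammar rules alone cannot express but a few lines of Python can: a length-prefixed field, where a decimal count says how many characters follow. The example is hypothetical and not taken from the thread; `length_prefixed` is an invented name, written as a function of `(text, pos)` that returns the end offset of the match or `None`.

```python
def length_prefixed(text, pos):
    """Match '<decimal length>:<payload>' and return the end offset, else None."""
    colon = text.find(":", pos)
    if colon == -1:
        return None
    digits = text[pos:colon]
    # An empty or non-numeric prefix is not a valid length.
    if not (digits.isascii() and digits.isdigit()):
        return None
    end = colon + 1 + int(digits)
    # Fail if the declared length runs past the end of the input.
    return end if end <= len(text) else None


print(length_prefixed("5:hello,rest", 0))  # 7
print(length_prefixed("9:short", 0))       # None
```

The amount of input consumed depends on a value read earlier in the same input, which is exactly the kind of context that a PEG rule has no way to carry.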