New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement a spike solution for is parsed
#485
Comments
|
Missed a biggie.
Not sure how to address this one. We're not getting rid of the recursion, since that's part of the language. In a way it's good that 001 is already complex enough to have recursion; otherwise we'd just have hit it later and been thrown off by it then. I have two thoughts. One is that the parser handover will need to factor into all this somehow. Not saying those two things will interfere, but they'll need to work together. Secondly, maybe we'll end up using the state graph not for all parsing, but just for LTM parsing. Something — a hope, maybe — tells me recursion doesn't become a problem if we limit the states to handle declarative prefixes. The rest is behind a "procedural event horizon", and weird things like recursion are allowed to happen there, outside of the pure NFA world. |
|
001 requires a semicolon after a non-last blockish statement. As the grammar is specified, that I think I'm fine with that. Technically, 001 is still a strict subset of 007 (in the sense of "it allows fewer productions, but not more"). Being allowed to skip that |
Coming back to this; this shouldn't have been a big reveal, actually. If NFAs were as powerful as grammars, then grammars wouldn't have to make the declarative/procedural distinction in the first place. Specifically, NFAs (and regular expressions) famously don't parse arbitrarily nested languages, of which even the stupidly simple 001 is one, and certainly 007 and Perl 6. And basically any language with the concepts of blocks in it. But, keeping our eyes on the ball: the winning condition here isn't "NFAs parse the entire language", it's "NFAs do dispatch into (procedural) rules" plus "NFAs can be extended during the parse". Having said that, I still don't feel I have a handle on exactly how that dispatch works, and how the interaction between parsers and the rules that want to extend them should work. |
|
As I was thinking about this while falling asleep yesterday in a jet-lagged haze: there are two parsers, really. The declarative one and the procedural one.
|
|
Forgot to mention: work is now underway in a branch. Current status: almost at the point of hitting the limits of the declarative parser. I'm toying with the idea of maybe being able to create the NFA for the declarative parts from the s-expressions of the procedural parts. Execution paths of the world, unite! |
|
Here, I tried to capture the thing with the NFA dispatch and the entry points in a set of diagrams. They are messy and probably mostly wrong, but here goes. Preliminary conclusions:
The investigation continues. |
|
Trying to figure out how to compute those entry points. Here's me doing it manually:
Ok, so I think that the rule is this: (a) something else is calling the rule (another rule, or in the case of |
|
There are also three "variants" of the
I don't feel I'm very good at describing what's going on here, so I content myself with feeling parts of the elephant one at a time. |
|
It feels to me one should be able to build a "continuation graph" from a grammar; for each rule, which rules can it dispatch into. Note that the distinction of "calling down" vs "returning up" has been erased from this graph — it's something that the procedural/stateful part of the parser cares about, but not the declarative/dispatching part. Here's my attempt to compute this manually for 001:
Mentally, the algorithm feels a bit like this:
Yes, I think this can be mechanized. One thing I'm not sure of is whether it would also make sense (or be necessary, even) to split the rules up... (getting off the train, thinking of it some more) ...yes, we need to do that. It feels quite similar to CPS-transforming a bit of |
|
I think it'd be possible to put together a small proof-of-concept of this algorithm. Like a small mini-spike within the spike. |
|
There's some very neat strange consistency going on between multi-method dispatch (figuring out which method to call based on the arguments sent in) and LTM (figuring out which subrule to end up in based on a part of the text currently being parsed). Also, we could lean on this consistency by (for example) handling the |

The purpose here is to build an MVP which can demonstrate the act of
is parsedswitching parsers. It still feels like #293 or even #177 are really big steps to take — this is meant to be a smaller (if not small) step.Parser<T>with the sole methodparse(Str): T. The grammar is (informally) a specification of what the parser accepts, and what result it builds.I suggest we name this spike "Trifle", because for once, this spike is to be trifled with.
Sublanguage
Let's think up a sublanguage of 007; call it 001. It has (non-negative) integers, identifiers,
mydeclarations, infix+, infix=, expression statements, block statements,ifstatements, andwhilestatements (but no block parameters). Just large enough to be interesting to extend.Here's a (Perl 6) grammar describing 001's syntax:
(Cute, huh? Perl 6 grammars really shine here. I haven't tested this, so it's probably correct modulo whitespace.)
Although not shown here, it's implied that the parse also gives back a Qtree; same one 007 would. Error handling is not our focus here; a parse that fails simply throws a very boring exception.
Where it gets interesting
Oh, and import statements are also supported in 001. (To 007's slight annoyance.) The syntax allowed is
import * from x.y.z;— if thex.y.zissyntax.stmt.until, then anuntilstatement is injected. (If it's anything else, it's a syntax error.)Importing
syntax.stmt.untilinstalls a new statement typestatement:until, analogously tostatement:while. This is where a parser handover happens, to a new parser with this new grammar rule. The new parser stays in control until block end, at which point it hands control back to the original parser.(If this works out well, could also try the same, in different orders, with
syntax.stmt.unless.)NFAs
This next section assumes familiarity with the ideas in the regexp1 article; specifically NFAs.
The big job is to have a parser running in 007 that will (a) parse 001 and (b) be extensible. I'd rather generate this 007 code than write it by hand — and besides, there needs to be something that's dynamic enough in the code itself for (b).
So we need to generate an NFA. Does the grammar above easily translate to an NFA? Let's go through things feature by feature:
+and*). Also covered by the article. These are just some arrows back.But then there are some Perl 6-specific tricks that we need to think about separately:
+%and*%). Not so bad. For example,<statement> *% ';'is just sugar for[<statement>? [';' <statement>]*]?. Similarly for+%. (Edit: Actually, it's<statement> *%% ';', allowing for an extra;after the last statement. The desugar for that would be[<statement>? [';' <statement>]* ';'?]?.)»). You simply can't know if you have anifstatement if you've only seenifin your input; the next character can be a 0x20 space (ifstatement) or anotherf(identifier). But as soon as the NFA sees that next character, it knows to pick betweenifstatement and identifier. So we're good.::discards other alternative rules that could have matched. I'll be honest and say I'll need to implement this in order to get a feel for it. In the best case, we'll just nuke all the other "current states" except for the present one.'{' ~ '}' <statementlist>is just syntactic sugar for'{' <statementlist> '}', at least when we disregard better error messages and such luxuries.myregisters a new variable, (b) when we enter and exit blocks, and (c) when we importsyntax.stmt.unless.So it looks like we're fine expressing all this as an NFA. The side effects are bending the rules a bit; specifically, it requires that we're no longer in the declarative prefix (and so it's a kind of implicit
::to have a side effect).The text was updated successfully, but these errors were encountered: