
Custom Rules for Variable length Arguments

R. Bernstein edited this page Jan 21, 2017 · 1 revision

The Earley grammar parser is very general and can tolerate all kinds of ambiguous grammars. But the aim since very early on has been not to feed it highly ambiguous grammars, because ambiguity makes parsing take a lot more time. This is where the custom rules for variable-length arguments (varargs) come into play.

We can create a general rule like:

map_expr ::= expr+ BUILD_CONST_KEY_MAP

and I think this will work. However, doing this means the parser has to keep all the intermediate states around while parsing expr+, for every kind of expr that could appear before BUILD_CONST_KEY_MAP. So if we see

expr expr expr BUILD_CONST_KEY_MAP

this could be parsed as any of:

 (expr expr expr BUILD_CONST_KEY_MAP)
 expr (expr expr BUILD_CONST_KEY_MAP)
 expr expr (expr BUILD_CONST_KEY_MAP)
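
To make the three readings concrete, here is a minimal Python sketch that lists every trailing span the general expr+ rule could reduce together with the final BUILD_CONST_KEY_MAP token. It is only an illustration, not how an Earley parser tracks its items internally: tokens are plain strings, and `candidate_reductions` is a hypothetical helper written for this page.

```python
from typing import List, Tuple


def candidate_reductions(tokens: List[str]) -> List[Tuple[str, ...]]:
    """Return every span that `map_expr ::= expr+ BUILD_CONST_KEY_MAP`
    could reduce at the end of `tokens` (hypothetical helper)."""
    if not tokens or tokens[-1] != "BUILD_CONST_KEY_MAP":
        return []
    candidates = []
    # Walk backwards over the contiguous run of expr tokens; every
    # non-empty suffix of that run satisfies expr+.
    i = len(tokens) - 2
    while i >= 0 and tokens[i] == "expr":
        candidates.append(tuple(tokens[i:]))
        i -= 1
    # Longest span first, in the same order as the list above.
    return candidates[::-1]


if __name__ == "__main__":
    seq = ["expr", "expr", "expr", "BUILD_CONST_KEY_MAP"]
    for span in candidate_reductions(seq):
        prefix = seq[: len(seq) - len(span)]
        print(" ".join(prefix + ["(" + " ".join(span) + ")"]))
```

Running it prints the same three bracketings, one per line.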

To reduce the ambiguity here, what is done for this kind of variable-length argument is to change BUILD_CONST_KEY_MAP to BUILD_CONST_KEY_MAP_3 (the suffix is the number of items) in the token ingestion phase, and add a custom rule:

map_expr ::= expr expr expr BUILD_CONST_KEY_MAP_3
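
A minimal sketch of that ingestion step in Python. The `Token` class (with `kind` for the opcode name and `attr` for its argument) and the `customize` function are hypothetical stand-ins, not the decompiler's actual API. The sketch follows the simplified rule shown here and ignores the tuple of keys that the real BUILD_CONST_KEY_MAP opcode also pops from the stack.

```python
from dataclasses import dataclass, replace
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Token:
    kind: str      # opcode name, e.g. "BUILD_CONST_KEY_MAP" (hypothetical field)
    attr: int = 0  # opcode argument: here, the number of map items


def customize(tokens: List[Token]) -> Tuple[List[Token], Set[str]]:
    """Suffix each BUILD_CONST_KEY_MAP with its item count and collect one
    fixed-length grammar rule per count actually seen (hypothetical helper)."""
    new_tokens: List[Token] = []
    rules: Set[str] = set()
    for tok in tokens:
        if tok.kind == "BUILD_CONST_KEY_MAP":
            kind = f"{tok.kind}_{tok.attr}"
            new_tokens.append(replace(tok, kind=kind))
            # e.g. "map_expr ::= expr expr expr BUILD_CONST_KEY_MAP_3"
            rules.add("map_expr ::= " + "expr " * tok.attr + kind)
        else:
            new_tokens.append(tok)
    return new_tokens, rules


if __name__ == "__main__":
    toks, rules = customize([Token("LOAD_CONST"), Token("BUILD_CONST_KEY_MAP", 3)])
    print([t.kind for t in toks])
    print(sorted(rules))
```

In this sketch a rule is generated only for item counts that actually occur in the token stream, so the grammar grows only with the map sizes the code really uses.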

With this, note that the parser won't consider either of these

 expr (expr expr BUILD_CONST_KEY_MAP_3)
 expr expr (expr BUILD_CONST_KEY_MAP_3)
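
Using the same string-token simplification, the hypothetical helper below suggests why only one reading survives: once the item count is part of the token name, the rule can close a span of exactly that many exprs and no shorter one. `reduction_for` is illustrative only and not part of any real parser API.

```python
import re
from typing import List, Optional, Tuple

# Matches suffixed opcodes such as "BUILD_CONST_KEY_MAP_3".
_SUFFIXED = re.compile(r"BUILD_CONST_KEY_MAP_(\d+)$")


def reduction_for(tokens: List[str]) -> Optional[Tuple[str, ...]]:
    """Return the single map_expr span ending at the last token, or None
    (hypothetical helper)."""
    if not tokens:
        return None
    m = _SUFFIXED.match(tokens[-1])
    if m is None:
        return None
    n = int(m.group(1))
    span = tokens[-(n + 1):]
    # map_expr ::= expr{n} BUILD_CONST_KEY_MAP_n applies only when exactly
    # n tokens precede the opcode in the span and all of them are exprs.
    if len(span) == n + 1 and all(t == "expr" for t in span[:-1]):
        return tuple(span)
    return None
```

Unlike the expr+ version, the result is at most one span rather than a list of alternatives.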

The custom rule that was in there before was, a priori, not as bad as using

map_expr ::= expr+ BUILD_CONST_KEY_MAP

unless the code happens to contain a one-item, a two-item, and a three-item key map. In that case it would be just as bad.
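
One way to see the last point, assuming (our reading; the page does not say so explicitly) that the earlier per-arity rules all ended in the plain BUILD_CONST_KEY_MAP token: with rules for 1-, 2-, and 3-item maps in the grammar, a three-expr sequence again has three candidate reductions, exactly as with expr+. The helpers `count_reductions` and `rule` are hypothetical and exist only for this illustration.

```python
from typing import Iterable, List


def rule(n: int, opcode: str) -> List[str]:
    """Right-hand side of a fixed-length map_expr rule (hypothetical)."""
    return ["expr"] * n + [opcode]


def count_reductions(tokens: List[str], rules: Iterable[List[str]]) -> int:
    """Count rules whose right-hand side matches the end of `tokens`."""
    return sum(
        1
        for rhs in rules
        if 0 < len(rhs) <= len(tokens) and tokens[-len(rhs):] == rhs
    )


if __name__ == "__main__":
    seq = ["expr", "expr", "expr", "BUILD_CONST_KEY_MAP"]
    # Assumed earlier style: per-arity rules sharing one unsuffixed token.
    unsuffixed = [rule(n, "BUILD_CONST_KEY_MAP") for n in (1, 2, 3)]
    print(count_reductions(seq, unsuffixed))
    # Suffixed style: each arity gets its own token, so only one rule fits.
    suffixed = [rule(n, f"BUILD_CONST_KEY_MAP_{n}") for n in (1, 2, 3)]
    print(count_reductions(seq[:-1] + ["BUILD_CONST_KEY_MAP_3"], suffixed))
```

Under that assumption the first count is 3 and the second is 1, which is the difference the suffixed token names are meant to buy.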

I'll note that I haven't done extensive tests here, so I could be wrong. I am just following the pattern that was there before, and this is my reading of why it is the way it is.