Skip to content

Writing Semantic action rules

rocky edited this page Oct 2, 2017 · 4 revisions

The Earley Algorithm parser builds an Abstract Syntax Tree either after it fully parses the grammar or when directed by custom reduction check rules that may be specified. This is done in instance method reduce_ast().

Later in semantic analysis this tree is traversed to accomplish whatever action is needed. For example in uncompyle6 the action is to reproduce Python source code from a parse of (modified) CPython bytecode instructions. In our examples expr and expr2, the example programs, semantic analysis evaluates arithmetic expressions.

The semantic actions class is usually derived from the GenericASTTraversal class. That class provides methods for pre- and post-order tree traversal. Most of the time pre-order traversal is used. Since pre-order traversal also allows for hooks to be run after all nodes complete, it can do a more than a strict pre-order traversal.

The way to indicate a semantic action to take on an GenericASTTraversal nonterminal is to create a method in the subclass of GenericaASTTraversal and preface the nonterminal name with n_. For example to have a semantic action for nonterminal stmt, you would define method n_stmt.

After a while, you'll probably find many routines which do similar sorts of things, and soon you'll find you want a short notation to describe rules and not have to create methods at all.

For example in the Python 2 grammar, you could write for the pass statement rule:

def n_pass(self, n):
  self.write(self.indent)  # start a new line
  self.writeln('pass') # write "pass" and finish line

and do the corresponding thing for continue and break.

To reduce tedium, what's generally done instead is to put rules into a table and have C-style sprintf format specifiers to indicate these common kinds of actions to take. Doing this in the Python2 grammar we instead have rules:

TABLE_DIRECT = {
    'break_stmt':       ( '%|break\n', ),
    'continue_stmt':    ( '%|continue\n', ),
    'pass_stmt':        ( '%|pass\n', ),
    ...
    }

The %| format specifier indicates starting on a new line.

The kinds of specifiers is very much tied to how one write the grammar and what kind of activity is needed. For example in uncompyle6 I often need to copy attributes from one node over to one of its siblings when I need be able to give a deparse by program offset. That kind of operation however is not needed when deparsing the entire program, so no offset information needs to be stored.

The method template_engine in the semantic rules is where we define such table-driven rules.

See also Table-driven Semantic Actions from the uncompyle6 documentation for a more description full-blown system of table-directed semantic actions.

Clone this wiki locally