Tree Construction

Erez Shinan edited this page Apr 21, 2018 · 7 revisions

Lark builds a tree automatically based on the structure of the grammar, according to the following rules:

  • Each rule is a branch (node) in the tree, and its children are its matches, in the order of matching.

  • Rules can be expanded (inlined). See "Shaping the tree" below.

  • Inside rules, using item+ or item* will result in a list of items.

  • Terminals (tokens) are always values in the tree, never branches.

  • Terminals that won't appear in the tree are:

    • Unnamed literals (like "keyword" or "+")
    • Terminals whose name starts with an underscore (like _DIGIT)
  • Terminals that will appear in the tree are:

    • Unnamed regular expressions (like /[0-9]/)
    • Named terminals whose name starts with a letter (like DIGIT)

The resulting parse-tree (when unshaped) is a direct equivalent of a classical parse-tree. Applying a Transformer to it is equivalent to providing a callback to the parser.

Example:

    expr: "(" expr ")"
        | NAME+

    NAME: /\w+/

    %ignore " "

Lark will parse "((hello world))" as:

expr
    expr
        expr
            "hello"
            "world"

The brackets do not appear in the tree by design. The words appear because they are matched by a named terminal.

However, a rule can keep all of its tokens if it is prefixed with !:

    !expr: "(" expr ")"
         | NAME+
    NAME: /\w+/
    %ignore " "

Lark will parse "((hello world))" as:

expr
    "("
    expr
        "("
        expr
            "hello"
            "world"
        ")"
    ")"

Shaping the tree

  1. Rules whose name begins with an underscore will be inlined into their containing rule.

Example:

    start: "(" _greet ")"
    _greet: /\w+/ /\w+/
    %ignore " "

Lark will parse "(hello world)" as:

start
    "hello"
    "world"

  2. Rules that receive a question mark (?) at the beginning of their definition will be inlined if they have a single child.

Example:

    start: greet greet
    ?greet: "(" /\w+/ ")"
          | /\w+/ /\w+/
    %ignore " "

Lark will parse "hello world (planet)" as:

start
    greet
        "hello"
        "world"
    "planet"

  3. Rules that begin with an exclamation mark will keep all their terminals (they won't get filtered).

  4. Aliases: each option (alternative) of a rule can receive an alias, which is then used as the branch name for that option.

Example:

    start: greet greet
    greet: "hello" -> hello
         | "world"
    %ignore " "

Lark will parse "hello world" as:

start
    hello
    greet