---

# 4. Syntax-Directed Translation
**[Emil Sekerinski](http://www.cas.mcmaster.ca/~emil/), McMaster University, January 2019**

---

So far we were only concerned with accepting or rejecting the input. The goal of parsing is, of course, to eventually produce output; in the case of a compiler, this means generating machine code.

Attribute grammars attach computation to a parse tree; they were suggested by Knuth for assigning semantics to context-free languages <cite data-cite="1997494/K7M86FYQ"></cite>. An attribute grammar extends a context-free grammar by
- associating a set of named attributes with each symbol and
- augmenting productions with attribute evaluation rules.

To every symbol `X` a computation is associated that returns a tuple of values, the attributes of `X`. Productions are of the form

    X s₁ s₂ → … Y t₁ t₂ … Z u₁ u₂ …

where `s₁, s₂, t₁, t₂, u₁, u₂` are the attributes associated with the symbols `X, Y, Z`. The computation, in its simplest form, is a function that computes the attributes on the left-hand side of a production from the attributes on the right-hand side:

    (s₁, s₂) = f(t₁, t₂, u₁, u₂)

With the implementation of a parser in mind, we allow not only mathematical functions but programming language statements to express the computation. If a symbol appears multiple times, the attributes are given unique names, as the attributes are identified by position.

Consider a grammar for binary numbers with productions:

    binary → binary digit
    binary → digit
    digit → 0
    digit → 1

For computing the value of a binary number, one integer attribute is associated with `digit` and one integer attribute with `binary`. An attribute grammar computing the value is:

| production                     | attribute rule   |
|:-------------------------------|:-----------------|
| `binary v → binary w  digit d` | `v := 2 × w + d` |
| `binary v → digit d`           | `v := d`         |
| `digit d → '0'`                | `d := 0`         |
| `digit d → '1'`                | `d := 1`         |

In the parse tree, the attributes are evaluated bottom-up; the meaning of a sentence is given by the attributes of the start symbol from which it is derived.

Draw the parse tree of `101` and annotate each node of the tree with the attribute values!
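Bottom-up evaluation can also be sketched in code. The following is a minimal illustration, not part of the original notes: the encoding of parse tree nodes as nested tuples and the names `value` and `tree_101` are assumptions made for this example. Each branch applies the attribute rule of the corresponding production to the attributes of the children.

```python
# Hypothetical encoding of parse tree nodes as nested tuples:
#   ("digit", c)                          for digit → '0' and digit → '1'
#   ("binary", digit_node)                for binary → digit
#   ("binary", binary_node, digit_node)   for binary → binary digit

def value(node):
    """Return the attribute of node, computed from the attributes of its children."""
    if node[0] == "digit":
        return 0 if node[1] == '0' else 1          # d := 0 or d := 1
    if len(node) == 2:                             # binary → digit
        return value(node[1])                      # v := d
    return 2 * value(node[1]) + value(node[2])     # v := 2 × w + d

# Parse tree of 101; the left recursion nests the tree to the left
tree_101 = ("binary",
            ("binary",
             ("binary", ("digit", '1')),
             ("digit", '0')),
            ("digit", '1'))

print(value(tree_101))  # 5
```

The innermost `binary` node has value 1, the middle one 2 × 1 + 0 = 2, and the root 2 × 2 + 1 = 5.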

**Question.** Given the plain grammar for binary numbers above, what is an attribute grammar for computing the number of zeros and ones in a sequence of digits? Draw the parse tree for `1011` and annotate each node with the attribute values! _Hint:_ Use two attributes, one for the number of zeros and one for the number of ones.

| production                                | attribute rule               |
|:------------------------------------------|:-----------------------------|
| `binary z₀ o₀ → binary z₁ o₁ digit z₂ o₂` | `z₀, o₀ := z₁ + z₂, o₁ + o₂` |
| `binary z₀ o₀ → digit z₁ o₁`              | `z₀, o₀ := z₁, o₁`           |
| `digit z o → '0'`                         | `z, o := 1, 0`               |
| `digit z o → '1'`                         | `z, o := 0, 1`               |
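As a rough check of these rules, the pair of attributes can be computed directly over a string of digits. This is only a sketch under assumptions: the function name `count_digits` is hypothetical, and the loop stands in for repeated application of `binary → binary digit` from left to right rather than for an actual parser. A digit `'0'` contributes the pair `(1, 0)` and a digit `'1'` the pair `(0, 1)`.

```python
def count_digits(s):
    """Return (zeros, ones) for a nonempty string of binary digits."""
    def digit(c):
        if c == '0': return 1, 0      # one zero, no ones
        if c == '1': return 0, 1      # no zeros, one one
        raise ValueError("invalid character " + repr(c))

    z, o = digit(s[0])                # binary → digit
    for c in s[1:]:                   # binary → binary digit, left to right
        z2, o2 = digit(c)
        z, o = z + z2, o + o2         # componentwise sum of the two pairs
    return z, o

print(count_digits("1011"))  # (1, 3)
```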

The grammar for binary numbers above is left-recursive, so it is unsuitable for recursive descent parsing. An equivalent grammar in EBNF is:

    binary → digit { digit }
    digit → '0' | '1'

In this form, the attribute rules are placed "inside" the productions to express that the attributes are to be evaluated after a nonterminal is recognized, as would be the case with the plain grammar. The attribute rules are delimited by `«` and `»`:

    binary v → digit d « v := d » { digit d « v := 2 × v + d » }
    digit d → '0' « d := 0 » | '1' « d := 1 »

In the construction of a recursive descent parser, the attributes become result parameters of the parsing procedures. The rules for constructing `pr(E)` are extended to include attribute evaluation rules:

| `E`             | `pr(E)` |
|:----------------|:--------|
| `«stat»`        | `stat`  |

As Python does not have result parameters but uses return values, local variables for the attributes are introduced and returned at the end of each parsing procedure. Here is the parser for the above grammar without attributes:

In [7]:
def nxt():
    global pos, sym
    if pos < len(src): sym, pos = src[pos], pos+1
    else: sym = chr(0) # end of input symbol

def binary():
    digit()
    while sym in '01': digit()

def digit():
    if sym == '0': nxt()
    elif sym == '1': nxt()
    else: raise Exception("invalid character " + str(pos))

def parse(s):
    global src, pos
    src, pos = s, 0; nxt(); binary()
    if sym != chr(0): raise Exception("unexpected characters at " + str(pos))

parse("101")

Here is the parser with attribute rules added:

In [8]:
def nxt():
    global pos, sym
    if pos < len(src): sym, pos = src[pos], pos+1
    else: sym = chr(0) # end of input symbol

def binary():
    d = digit(); v = d
    while sym in '01': d = digit(); v = v * 2 + d
    return v

def digit():
    if sym == '0': nxt(); d = 0
    elif sym == '1': nxt(); d = 1
    else: raise Exception("invalid character " + str(pos))
    return d

def parse(s):
    global src, pos
    src, pos = s, 0; nxt(); v = binary()
    if sym != chr(0): raise Exception("unexpected characters at " + str(pos))
    return v

parse("101")

5

Consider the following grammar for arithmetic expressions over constant integers. The symbols are characters, and white space (`ws`) is allowed around operators and integers:

    expression → ws term { '+' ws term }
    term → factor { '*' ws factor }
    factor → integer | '(' ws expression ')' ws
    integer → digit { digit } ws
    digit → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
    ws → { ' ' }

Attribute rules are added for evaluating an expression:

    expression e → ws term t « e := t » { '+' ws term t « e := e + t » }
    term t → factor f « t := f » { '*' ws factor f « t := t * f » }
    factor f → integer i « f := i » | '(' ws expression e « f := e » ')' ws
    integer i → digit d « i := d » { digit d « i := 10 * i + d » } ws
    digit d → '0' « d := 0 » | '1' « d := 1 » | … | '9' « d := 9 »
    ws → { ' ' }

The implementation below makes two simplifications:
- the test `sym ∈ {'0', '1', … '9'}` is implemented by `'0' <= sym <= '9'`,
- if `sym` is a digit, it is converted to an integer by `ord(sym) - ord('0')`.
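Both simplifications rely on the digits `'0'` to `'9'` having consecutive character codes. A small sketch of the two steps in isolation follows; the helper names `is_digit` and `digit_value` are hypothetical and do not appear in the parser. Note that the end-of-input symbol `chr(0)` and the space character fall outside the range, so neither is mistaken for a digit.

```python
def is_digit(sym):
    """Test sym ∈ {'0', '1', … '9'} for a single character sym."""
    return '0' <= sym <= '9'

def digit_value(sym):
    """Convert a digit character to its integer value."""
    return ord(sym) - ord('0')

print([digit_value(c) for c in "0123456789" if is_digit(c)])
print(is_digit(chr(0)), is_digit(' '), is_digit('+'))
```

The range test compares strings, so it is only meaningful when `sym` is a single character, as it always is in the parser.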

In [9]:
def nxt():
    global pos, sym
    if pos < len(src): sym, pos = src[pos], pos+1
    else: sym = chr(0) # end of input symbol

def expression() -> int:
    ws(); e = term()
    while sym == '+': nxt(); ws(); t = term(); e = e + t
    return e

def term() -> int:
    t = factor()
    while sym == '*': nxt(); ws(); f = factor(); t = t * f
    return t

def factor() -> int:
    if '0' <= sym <= '9': f = integer()
    elif sym == '(':
        nxt(); ws(); f = expression()
        if sym == ')': nxt(); ws()
        else: raise Exception("')' expected at " + str(pos))
    else: raise Exception("invalid character " + str(pos))
    return f

def integer() -> int:
    i = digit()
    # digit() already advances to the next symbol, so no extra nxt() here
    while '0' <= sym <= '9': i = 10 * i + digit()
    ws()
    return i

def digit() -> int:
    # precondition: '0' <= sym <= '9'
    d = ord(sym) - ord('0'); nxt()
    return d

def ws():
    while sym == ' ': nxt()

def parse(s) -> int:
    global src, pos
    src, pos = s, 0; nxt(); v = expression()
    if sym != chr(0): raise Exception("unexpected characters at " + str(pos))
    return v

parse("(2 + 3)* 2")

10

As a second application, consider the translation of expressions from infix to postfix notation:

| infix notation      | postfix notation |
|:--------------------|:-----------------|
| `2 + 3`             | `2 3 +`          |
| `2 * 3 + 4`         | `2 3 * 4 +`      |
| `2 + 3 * 4`         | `2 3 4 * +`      |
| `(5 - 4) * (3 + 2)` | `5 4 - 3 2 + *`  |

**Question.** Where is postfix notation used?
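One property worth seeing in code is that postfix notation needs no parentheses and can be evaluated with a single stack, which is why it suits stack-based machines and calculators. The sketch below is an illustration only: the function name `eval_postfix` is hypothetical, tokens are assumed to be separated by spaces, and only `+` and `*` are supported, matching the operators of the expression grammar.

```python
def eval_postfix(s):
    """Evaluate a space-separated postfix expression over integers, + and *."""
    stack = []
    for tok in s.split():
        if tok.isdigit():
            stack.append(int(tok))               # operand: push
        elif tok in ('+', '*'):
            right = stack.pop()                  # operator: pop two operands
            left = stack.pop()
            stack.append(left + right if tok == '+' else left * right)
        else:
            raise ValueError("unexpected token " + repr(tok))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

print(eval_postfix("2 3 * 4 +"))  # 10
print(eval_postfix("2 3 4 * +"))  # 14
```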

For the translation from infix to postfix, all attributes are strings and `+` in the attribute rules denotes concatenation. Complete the attribute rules!

    expression e → ws term t « e := t » { '+' ws term t « e := e + t + '+' » }
    term → factor { '*' ws factor }
    factor → integer | '(' ws expression ')' ws
    integer → digit { digit } ws
    ws → { ' ' }

Modify the implementation below accordingly!

In [10]:
def nxt():
    global pos, sym
    if pos < len(src): sym, pos = src[pos], pos+1
    else: sym = chr(0) # end of input symbol

def expression() -> int:
    ws(); e = term()
    while sym == '+': nxt(); ws(); t = term(); e = e + t
    return e

def term() -> int:
    t = factor()
    while sym == '*': nxt(); ws(); f = factor(); t = t * f
    return t

def factor() -> int:
    if '0' <= sym <= '9': f = integer()
    elif sym == '(':
        nxt(); ws(); f = expression()
        if sym == ')': nxt(); ws()
        else: raise Exception("')' expected at " + str(pos))
    else: raise Exception("invalid character " + str(pos))
    return f

def integer() -> int:
    i = digit()
    # digit() already advances to the next symbol, so no extra nxt() here
    while '0' <= sym <= '9': i = 10 * i + digit()
    ws()
    return i

def digit() -> int:
    # precondition: '0' <= sym <= '9'
    d = ord(sym) - ord('0'); nxt()
    return d

def ws():
    while sym == ' ': nxt()

def parse(s) -> int:
    global src, pos
    src, pos = s, 0; nxt(); v = expression()
    if sym != chr(0): raise Exception("unexpected characters at " + str(pos))
    return v

parse("(2 + 3)* 2")

10
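For comparison after attempting the exercise, here is one possible completion, offered as a sketch rather than as the intended solution. Two choices are assumptions of this example: the parsing procedures are nested inside a hypothetical function `to_postfix` with `nonlocal` variables instead of the global variables used in these notes, so that the block is self-contained; and a space is inserted between operands and operators, which the rule `e := e + t + '+'` does not do, so that multi-digit integers stay distinguishable in the output.

```python
def to_postfix(src: str) -> str:
    """Translate an infix expression over integers, + and * to postfix."""
    pos, sym = 0, chr(0)

    def nxt():
        nonlocal pos, sym
        if pos < len(src): sym, pos = src[pos], pos + 1
        else: sym = chr(0)  # end of input symbol

    def expression() -> str:
        ws(); e = term()
        while sym == '+': nxt(); ws(); t = term(); e = e + ' ' + t + ' +'
        return e

    def term() -> str:
        t = factor()
        while sym == '*': nxt(); ws(); f = factor(); t = t + ' ' + f + ' *'
        return t

    def factor() -> str:
        if '0' <= sym <= '9': f = integer()
        elif sym == '(':
            nxt(); ws(); f = expression()
            if sym == ')': nxt(); ws()
            else: raise Exception("')' expected at " + str(pos))
        else: raise Exception("invalid character at " + str(pos))
        return f

    def integer() -> str:
        i = digit()
        while '0' <= sym <= '9': i = i + digit()  # concatenate digit characters
        ws()
        return i

    def digit() -> str:
        # precondition: '0' <= sym <= '9'
        d = sym; nxt()
        return d

    def ws():
        while sym == ' ': nxt()

    nxt(); e = expression()
    if sym != chr(0): raise Exception("unexpected characters at " + str(pos))
    return e

print(to_postfix("(2 + 3)* 2"))  # 2 3 + 2 *
print(to_postfix("2 + 3 * 4"))   # 2 3 4 * +
```

The structure of the parser is unchanged from the evaluator; only the attribute type (strings instead of integers) and the attribute rules differ.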

### Bibliography

<div class="cite2c-biblio"></div>