# Interpreters

We are going to build an interpreter. Interpreters are required to do a lot of error handling. What separates a useful programming language from one that's frustrating to use is its ability to communicate errors effectively to its users.

What do interpreters do?

## Reading Scheme Lists

They read in the input program as text and interpret it as a hierarchical structure.

To build a Scheme interpreter, we need to read many parentheses, `+`s, numerals, and so on, and understand them as Scheme lists, that is, recursive lists.

A Scheme list is written as elements in parentheses:

In [None]:
(<element_0> <element_1> ... <element_n>)

Each `<element>` can be a combination or primitive. `<element_0>` can be a `+`, while `<element_1>` can be a second Scheme list with more parentheses.

Here is a particularly complicated Scheme list, which also happens to represent a Scheme expression.

In [1]:
(+ (* 3 (+ (* 2 4) (+ 3 5))) (+ (- 10 7) 6))

57

Above, the first element is a `+`, and the second element is a whole list: `(* 3 (+ (* 2 4) (+ 3 5)))`.
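To make the nesting concrete, the same expression can be mirrored as nested Python lists. This is only an illustration of the hierarchy, not how the reader in these notes stores Scheme lists; the names `expr`, `first_element`, `second_element`, and `value` are made up for the example.

```python
# The Scheme expression mirrored as nested Python lists (illustration only).
expr = ['+',
        ['*', 3, ['+', ['*', 2, 4], ['+', 3, 5]]],
        ['+', ['-', 10, 7], 6]]

first_element = expr[0]    # the operator '+'
second_element = expr[1]   # a whole nested list

# The same arithmetic written in Python's infix notation.
value = (3 * ((2 * 4) + (3 + 5))) + ((10 - 7) + 6)
print(first_element, second_element, value)
```

The infix version evaluates to 57, matching the result shown for the Scheme expression.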

In any well-formed Scheme list, the number of left parentheses is the same as the number of right parentheses. Part of reading Scheme lists is matching up those parentheses. The other part is figuring out whether all the individual numbers are well-formed.
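The parenthesis-matching idea can be sketched with a simple depth counter. This is a minimal sketch, not the approach used by the reader discussed later; `parens_balanced` is a hypothetical helper written for this example.

```python
def parens_balanced(text):
    """Return True if every ')' closes an earlier '(' and none are left open."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:      # a ')' appeared before its matching '('
                return False
    return depth == 0          # a leftover '(' means some list never closed

print(parens_balanced('(+ (* 3 4) 5)'))
print(parens_balanced('(+ 1 (+ 23 4)) ('))
print(parens_balanced(')('))
```

The last call shows that equal counts alone are not sufficient: `)(` has one of each parenthesis, but the closing one comes first, so the check fails.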

The task of parsing a programming language involves coercing a string representation of an expression in that language to an object that is the expression itself. This means validating that there are no errors and creating a nested hierarchical structure out of something that starts out as a bunch of parentheses and symbols.

Parsers must validate that expressions are well-formed. 

For the next few lectures, we're going to analyze a program called `scalc`, which is not as powerful as Scheme itself. It only supports 4 operations: `+, -, *, /`. This is going to be a fully functioning calculator for those 4 operations, but we'll use Scheme-style syntax.

Below, assume that we're running the program `scheme_reader`. If we type in an expression, it is printed out in 2 different ways.

In [None]:
> 1
1
1

In [None]:
> (1 2)
(1 2) ; Scheme representation of the list
Pair(1, Pair(2, nil)) ; Underlying Python representation, which explicitly states that this is a recursive list

In [None]:
> (1 2 3)
(1 2 3)
Pair(1, Pair(2, Pair(3, nil)))

Note that this Scheme reader is not doing any arithmetic. 

In [None]:
> (+ 1 2 3)
(+ 1 2 3)
Pair('+', Pair(1, Pair(2, Pair(3, nil))))

It should still work even with irregular indentation:

In [None]:
> (+
    1 2
            3)
(+ 1 2 3)
Pair('+', Pair(1, Pair(2, Pair(3, nil))))
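One way to see why indentation does not matter is to split the input into tokens before looking at its structure. The sketch below is a simplified stand-in, not the tokenizer shipped with the course code; `simple_tokenize` is a hypothetical name, and it keeps every token as a string rather than converting numerals.

```python
def simple_tokenize(lines):
    """Split lines into tokens: '(' and ')' stand alone, the rest split on whitespace."""
    tokens = []
    for line in lines:
        # Pad parentheses with spaces so str.split() separates them.
        spaced = line.replace('(', ' ( ').replace(')', ' ) ')
        tokens.extend(spaced.split())
    return tokens

one_line = simple_tokenize(['(+ 1 2 3)'])
many_lines = simple_tokenize(['(+', '    1 2', '            3)'])
print(one_line)
print(many_lines == one_line)
```

Because whitespace and line breaks are discarded during tokenizing, both inputs produce the same token sequence, so any later step sees them as identical.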

Now let's take a look at the program.

# Reader Note:

The [course textbook](http://composingprograms.com/) has a section titled [Interpreters for Languages with Combination](http://composingprograms.com/pages/34-interpreters-for-languages-with-combination.html). That page has a [link](http://composingprograms.com/examples/scalc/scalc.html) to the code for the `calculator` program, including `scheme_reader`, `scalc`, etc.

Note that `scheme_read` can handle `nil` and parentheses `()`, but it doesn't handle quotes `'` or dots `.`, which we'll add in the project.

### `scheme_reader.py`

#### `Pair` Class

The `Pair` class is just like a 2-element tuple. It has a `first` and a `second` element.

In [None]:
def __init__(self, first, second):
    self.first = first
    self.second = second

However, it's special because when we print it out, we obtain a Scheme-style representation, as 2 different elements in parentheses.

In [None]:
>>> s = Pair(1, Pair(2, nil))
>>> s
Pair(1, Pair(2, nil))
# Scheme style representation below
>>> print(s)
(1 2)
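The two representations can come from defining both `__repr__` and `__str__`. What follows is a minimal sketch under assumptions: the `Nil` class is a simplified stand-in for however `nil` is defined in `scheme_reader.py`, the method bodies are written for this example rather than copied from the source, and `__str__` assumes a well-formed list that ends in `nil`.

```python
class Nil:
    """Stand-in for the empty list; the real module defines its own nil."""
    def __repr__(self):
        return 'nil'

    def __str__(self):
        return '()'

nil = Nil()

class Pair:
    def __init__(self, first, second):
        self.first = first
        self.second = second

    def __repr__(self):
        # Python-style: shows the recursive structure explicitly.
        return 'Pair({0}, {1})'.format(repr(self.first), repr(self.second))

    def __str__(self):
        # Scheme-style: walk the chain of pairs and join the elements.
        parts = []
        rest = self
        while isinstance(rest, Pair):
            parts.append(str(rest.first))
            rest = rest.second
        return '(' + ' '.join(parts) + ')'

s = Pair(1, Pair(2, nil))
print(repr(s))
print(s)
```

Typing `s` at the interactive prompt uses `__repr__`, while `print(s)` uses `__str__`, which is why the same object shows up in two forms.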

#### `scheme_read`

The `scheme_read` function does all the work. We start with a bunch of lines (regardless of spacing)...

In [None]:
>>> lines = ['(+ 1 ', '(+ 23 4)) (']

Or lines like this,

In [None]:
> (+
   1 2
           3)

...which are broken into individual tokens,

In [None]:
>>> src = Buffer(tokenize_lines(lines))

and then `scheme_read` is called on the result,

In [None]:
>>> print(scheme_read(src))
(+ 1 (+ 23 4))

As we can see above, `scheme_read` figures out that `(+ 1 (+ 23 4))` is the first well-formed Scheme expression in `lines`.

Note that there's an open parenthesis at the end of `lines`:

In [None]:
>>> lines = ['(+ 1 ', '(+ 23 4)) (']

In [None]:
(']

This is not consumed by the first call to `scheme_read`. We'll have to call `scheme_read` again for that.

This is a recursive procedure in which all the recursion happens in `read_tail`.

In [None]:
elif val == "(":
    return read_tail(src)

After finding an open parenthesis `(`, `read_tail` reads everything up to the matching closing parenthesis. Some examples of how it works are listed in the doctests of `read_tail`.

In [None]:
>>> read_tail(Buffer(tokenize_lines([')'])))
nil
>>> read_tail(Buffer(tokenize_lines(['2 3)'])))
Pair(2, Pair(3, nil))
>>> read_tail(Buffer(tokenize_lines(['2 (3 4))'])))
Pair(2, Pair(Pair(3, Pair(4, nil)), nil))
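The recursive shape behind these doctests can be sketched with two functions that call each other. This is a simplified sketch, not the code in `scheme_reader.py`: it works on a plain Python list of tokens instead of a `Buffer`, the names `read_expr` and `read_rest` are hypothetical, and `Pair` and `nil` are replaced by a `namedtuple` and `None` to keep the example self-contained.

```python
from collections import namedtuple

Pair = namedtuple('Pair', ['first', 'second'])  # stand-in for the Pair class
nil = None                                       # stand-in for nil

def read_expr(tokens):
    """Read one expression from the front of a token list, consuming its tokens."""
    if not tokens:
        raise SyntaxError('unexpected end of input')
    token = tokens.pop(0)
    if token == '(':
        return read_rest(tokens)
    if token == ')':
        raise SyntaxError('unexpected token: )')
    return token

def read_rest(tokens):
    """Read the remainder of a list, up to and including its closing ')'."""
    if not tokens:
        raise SyntaxError('unexpected end of input')
    if tokens[0] == ')':
        tokens.pop(0)
        return nil
    first = read_expr(tokens)   # may recurse into a nested list
    rest = read_rest(tokens)    # then read whatever follows in this list
    return Pair(first, rest)

tokens = ['(', '+', 1, '(', '+', 23, 4, ')', ')', '(']
result = read_expr(tokens)
print(result)
print(tokens)
```

After the call, only the trailing `'('` remains in `tokens`, which mirrors how the first read stops at the end of the first complete expression and leaves the rest for a later call.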

This program does error handling. For example, if we run the following,

In [None]:
> )
SyntaxError: unexpected token: )

This happens because `scheme_read` raises a `SyntaxError` in the following lines,

In [None]:
else:
    raise SyntaxError("unexpected token: {0}".format(val))

### `read_print_loop()`

The whole program tries to read expressions and print them out,

In [None]:
while src.more_on_line:
    expression = scheme_read(src)
    print(repr(expression))

When it finds a `SyntaxError`, it prints the error too.

In [None]:
except (SyntaxError, ValueError) as err:
    print(type(err).__name__ + ':', err)

Note that all of this code is inside a `while` statement that runs forever,

In [None]:
def read_print_loop():
    while True:
        try:
            src = buffer_input()
            while src.more_on_line:
                expression = scheme_read(src)
                print(repr(expression))
        except (SyntaxError, ValueError) as err:
            print(type(err).__name__ + ':', err)
        except (KeyboardInterrupt, EOFError):  # <Control>-D, etc.
            return

which means even after we have a `SyntaxError`, we're still in the same program we were in earlier.
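The effect of catching the error inside the loop can be shown without any interactive input. The sketch below is an assumption-laden stand-in: `toy_read` is a hypothetical placeholder for a real reader that only rejects a lone `)`, and `run_loop` processes a fixed list of inputs instead of prompting the user.

```python
def toy_read(text):
    """Hypothetical stand-in for a reader: rejects a lone ')' and echoes the rest."""
    if text.strip() == ')':
        raise SyntaxError('unexpected token: )')
    return text.strip()

def run_loop(inputs):
    """Process each input in turn, reporting errors without stopping the loop."""
    outputs = []
    for text in inputs:
        try:
            outputs.append(toy_read(text))
        except (SyntaxError, ValueError) as err:
            # Same message format as the loop above: error name, colon, message.
            outputs.append(type(err).__name__ + ': ' + str(err))
    return outputs

print(run_loop(['(1 2)', ')', '(+ 1 2 3)']))
```

The third input is still processed after the second one fails, because the `try` statement sits inside the loop rather than around it.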

The program covers other errors as well,

In [None]:
> 2.3.4
ValueError: invalid numeral: 2.3.4
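A check like this can be sketched by trying Python's own number conversions and re-raising with a clearer message. This is a minimal sketch, not the tokenizer's actual logic; `parse_numeral` is a hypothetical helper, and Python's `int` and `float` accept some strings (such as `'inf'`) that a real Scheme reader would likely reject.

```python
def parse_numeral(token):
    """Convert a token to int or float, or raise ValueError if it is malformed."""
    try:
        return int(token)
    except ValueError:
        try:
            return float(token)
        except ValueError:
            # Report the offending token in the same style as the message above.
            raise ValueError('invalid numeral: {0}'.format(token)) from None

print(parse_numeral('23'))
print(parse_numeral('2.5'))
try:
    parse_numeral('2.3.4')
except ValueError as err:
    print(type(err).__name__ + ':', err)
```

Raising a `ValueError` here fits the read-print loop shown earlier, which catches `ValueError` alongside `SyntaxError` and keeps running.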