# Variables, expressions, and operator precedence

## Introduction

In the previous chapter we finished by building a fake programming language, _Turtle_, with a very simple state which always contains exactly three pieces of information: `x`, `y`, and `pen`.

When solving bigger problems, it stands to reason that the state of _Turtle_ would fall dramatically short: how could we store information such as the year of birth of more than two people? Even if we accepted the misuse of variables `x` and `y` as years of birth, `pen` can only be `on`/`off`, and is therefore inadequate to represent a year.

In this chapter we will define a new simplistic beginner's programming language, which this time will be capable of more explicitly managing the state by also defining new names to store information in the state.

## Variables

When we define a new _name_, such as `year_of_birth`, which we subsequently use to bind and retrieve values in the state multiple times while running a program, we have implicitly defined what is almost universally called a _variable_. The term _variable_ comes from the fact that, by consistently using the same name, we are effectively defining a permanent meta-entity that will accompany the evaluation of the whole program. While the program is evaluated, the value bound to the name _changes_ (or _varies_), hence the name.

We now define a new, simple language, _Var_, which only allows us to perform variable manipulation. The syntax of the statements of _Var_ is:
- `V := N`, where `V` is a variable name and `N` is an arbitrary integer number (this is called _variable assignment_);
- `I;J`, where `I` and `J` are arbitrary statements (right associative: `A;B;C` means `A;(B;C)`);
- `done`.

The semantics of `Var` simply translate variable assignment to state binding, and follow the semicolons through:
- `eval(<V := N>, S)` $\rightarrow$ `<done>, S[V := N]`;
- `eval(<I; J>, S)` $\rightarrow$ `<I';J>, S'` given that `eval(<I>,S)` $\rightarrow$ `<I'>, S'` and `I` $\neq$ `done`;
- `eval(<done; I>, S)` $\rightarrow$ `<I>, S`;
- `eval(<done>, S)` $\rightarrow$ `<done>, S`.
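
The four rules above can be read as a function from a (statement, state) pair to a new pair. What follows is a minimal Python sketch of that reading, not a definitive implementation. The tuple encoding of statements, the dictionary encoding of states, and the names `DONE` and `eval_step` are all assumptions made for this illustration.

```python
# Assumed encoding (not from the text): Var statements as Python values.
#   ("assign", V, N)  stands for  V := N
#   ("seq", I, J)     stands for  I; J
#   DONE              stands for  done
DONE = "done"


def eval_step(stmt, state):
    """Apply exactly one rule of eval; return the new (statement, state) pair."""
    if stmt == DONE:
        # eval(<done>, S) -> <done>, S
        return DONE, state
    if stmt[0] == "assign":
        # eval(<V := N>, S) -> <done>, S[V := N]
        _, name, number = stmt
        return DONE, {**state, name: number}
    if stmt[0] == "seq":
        _, first, rest = stmt
        if first == DONE:
            # eval(<done; I>, S) -> <I>, S
            return rest, state
        # eval(<I; J>, S) -> <I'; J>, S'
        first_after, state_after = eval_step(first, state)
        return ("seq", first_after, rest), state_after
    raise ValueError(f"not a Var statement: {stmt!r}")


# x := 1; y := 2; done  (right associative, so the second ";" is nested)
program = ("seq", ("assign", "x", 1), ("seq", ("assign", "y", 2), DONE))

stmt, state = program, {}
while stmt != DONE:
    stmt, state = eval_step(stmt, state)
    print(stmt, state)
```

Each call performs a single step, so the loop prints every intermediate pair, including the `done; ...` forms that the third rule removes.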

Notice that variable assignment is just a way to _embed_ the binding operation, which operates on states, in the language itself. All instructions, no matter how advanced, are ways to embed the existing capabilities of state manipulation into the language: the language cannot be more powerful than the underlying mechanisms upon which it is based.

Let us now consider an example program we could write in `Var`:

```day := 2; month := 3; year := 1985; done```

The first state is, as usual, the empty state: 

$$\{ \}$$

The evaluation will first run the composite `;` statement: `day := 2; ...`. This means that we begin by evaluating `day := 2` on an empty state. The semantic rule for assignment turns this statement into `done` and binds `day` in the state; the rule for `done; I` then discards the leftover `done`. This leads us to a new pair of program and state:

```month := 3; year := 1985; done```

and

$\{ day := 2 \}$

Notice that variable assignment just allows us to write some statements that will directly, and without further translation, become bindings in the state. It is a transformation that reflects the structure of bindings in the program code itself.

## Expressions

The _Var_ language so far is not very useful by itself. We can store information in the state, but storage without the possibility of retrieval serves little purpose. Fortunately the state already supports reading of bindings, therefore we only need to embed an operation that is equivalent to state lookups in _Var_ and we are done.

In order to perform something useful, we will not just look up variables directly. We will also perform some simple arithmetic operations while performing the lookup, therefore augmenting lookup with some basic transformations at the same time.

The assignment statement will therefore become:
- `V := E`, where `V` is a variable name and `E` is an _integer expression_;
- an integer expression can now be:
    - `N`, where `N` is an integer number;
    - `E`${_1}$ ` + E`${_2}$, where both `E`$_1$ and `E`$_2$ are expressions;
    - `E`${_1}$ ` - E`${_2}$, where both `E`$_1$ and `E`$_2$ are expressions;
    - `V`, where `V` is a variable name.

Examples of allowed integer expressions according to the definition above could therefore be:
- `3 + 4`
- `age + 1`
- `x + x`
- `new_score + old_score`

Of course, an expression cannot be assigned to a variable directly. Our state is built up from bindings of names to values, so our state is not powerful or expressive enough to store an expression. Moreover, storing an expression directly would not be so interesting: we would rather like to evaluate the expression, and then store the resulting value in the state.

Fortunately, in our language, all expressions can be mapped to integer values by means of the _expression evaluation_ function, which we will call `eval_expr`. Evaluation of an expression requires the ability to read from the state, since an expression might contain variable names that can only be evaluated by a state lookup. A step of expression evaluation yields either a new expression to evaluate further, or the final value, but never a new state: expressions do not produce a new state when evaluating. This means that `eval_expr(<E>,S)` $\rightarrow$ `<E'> | C`, where `E'` is a new expression, `C` is a value, and `|` means "either of".

The semantics of expressions can therefore be stated as:
- `eval_expr(<N>, S)` $\rightarrow$ `N` (`N` is a number);
- `eval_expr(<L + R>, S)` $\rightarrow$ `<L'+R>`  where `L`, `R` are both expressions and `eval_expr(<L>, S)` $\rightarrow$ `L'`;
- `eval_expr(<N + E>, S)` $\rightarrow$ `<N+E'>`  where `N` is a number, `E` is an expression, and `eval_expr(<E>, S)` $\rightarrow$ `E'`;
- `eval_expr(<N + M>, S)` $\rightarrow$ `N+M`  where `N`, `M` are both numbers;
- `eval_expr(<V>, S)` $\rightarrow$ `C`, where `V` is a variable name, and `S[V]` $\rightarrow$ `C`;

Notice that we are evaluating expressions left to right, and we only perform the underlying sum operation when all sides to be added are constants. Subtraction is evaluated by the same rules, with `-` in place of `+`. This suggests that the `+` operator has two distinct meanings: in one case, it is a placeholder inside the program that encodes the "wish" to perform a sum; in the other case, it is the actual arithmetic operation of adding two numbers together. The context always determines unambiguously which of the two interpretations is applicable: inside angle brackets `<`  and `>`, we manipulate statements and expressions (or, for short, just "code"). Outside angle brackets we manipulate state and arithmetic operations (the "concrete computer").

Let us now show some examples of expression evaluation.

`eval_expr(<3>, {})` evaluates a single integer constant. This results in the number three: `3`.

`eval_expr(<3+2>, {})` evaluates the sum of two integer constants. This results in their sum: `5`.

`eval_expr(<10-x>, { x := 5})` must first evaluate the right-hand side, since the left-hand side is already a number. This leads us to `eval_expr(<x>, { x := 5})` $\rightarrow$ `5`. This value is then injected in place of the variable `x`, resulting in a new expression `<10 - 5>` which will be evaluated at the next step.
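
The expression rules and the three examples above can be replayed with a short Python sketch. It is only an illustration under stated assumptions: the encoding of expressions as integers, strings, and tuples is not part of the text, and the text lists rules for `+` only, so handling `-` in the same way is an assumption as well.

```python
# Assumed encoding (not from the text): an expression is
#   an int              for a number N
#   a str               for a variable name V
#   (op, left, right)   for `left op right`, with op being "+" or "-"
# The text only lists rules for "+"; treating "-" the same way is an assumption.


def eval_expr(expr, state):
    """Apply one rule of eval_expr; return a new expression or a final number."""
    if isinstance(expr, int):
        return expr                                    # <N> -> N
    if isinstance(expr, str):
        return state[expr]                             # <V> -> C, where S[V] -> C
    op, left, right = expr
    if not isinstance(left, int):
        return (op, eval_expr(left, state), right)     # <L + R> -> <L' + R>
    if not isinstance(right, int):
        return (op, left, eval_expr(right, state))     # <N + E> -> <N + E'>
    return left + right if op == "+" else left - right  # <N + M> -> N+M


print(eval_expr(3, {}))                     # 3
print(eval_expr(("+", 3, 2), {}))           # 5
print(eval_expr(("-", 10, "x"), {"x": 5}))  # ('-', 10, 5)
print(eval_expr(("-", 10, 5), {"x": 5}))    # 5
```

Because a Python integer stands for both the expression `<N>` and the value `N`, this sketch cannot tell the two apart; a stricter encoding would wrap them separately.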

We can now reformulate the semantics of assignment in order to include the evaluation of expressions when needed. This means that when evaluating an assignment, we will be able to perform a binding in the state right away only when the right-hand side of assignment is a constant number. If this is not the case, we will evaluate the expression before proceeding with the assignment:
- `eval(<V := N>, S)` $\rightarrow$ `<done>, S[V := N]` (`N` is a number);
- `eval(<V := E>, S)` $\rightarrow$ `<V := eval_expr(<E>, S)>, S` (`E` is an expression).

Let us now see an example of expression evaluation and instructions mixed together. We will begin with the very simple program `day := 2; tomorrow := day + 1; done`, evaluated from the usual empty state $\{ \}$.

The first step simply adds `day := 2` to the state, since it is the assignment of a constant. This leads us to:

`tomorrow := day + 1; done`, $\{ day := 2 \}$

Evaluating `tomorrow := day + 1` requires more than one step, since we must first evaluate the expression `day + 1`, which in turn evaluates its left-hand side `day`, which is simply the lookup $\{ day := 2 \}[day]$. This leads us to:

`tomorrow := 2 + 1; done`, $\{ day := 2 \}$

At this point the right-hand side of the first statement can be evaluated to a number, since both its operands are constants:

`tomorrow := 3; done`, $\{ day := 2 \}$

Assignment of a single constant leads to a binding, therefore we reach the final step:

`done`, $\{ day := 2, tomorrow := 3 \}$
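
The whole trace above can be reproduced mechanically. The following Python sketch combines statement and expression evaluation; the tuple encoding and the names `eval_step`, `eval_expr`, and `trace` are assumptions made for this illustration, not part of _Var_. Unlike the prose, the sketch also records the intermediate `done; ...` pairs.

```python
DONE = "done"  # assumed encoding: "done", ("assign", V, E), ("seq", I, J)


def eval_expr(expr, state):
    """One step of expression evaluation (ints are numbers, strs are variables)."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return state[expr]
    op, left, right = expr
    if not isinstance(left, int):
        return (op, eval_expr(left, state), right)
    if not isinstance(right, int):
        return (op, left, eval_expr(right, state))
    return left + right if op == "+" else left - right


def eval_step(stmt, state):
    """One step of statement evaluation, using the reformulated assignment rules."""
    if stmt == DONE:
        return DONE, state
    if stmt[0] == "assign":
        _, name, expr = stmt
        if isinstance(expr, int):
            return DONE, {**state, name: expr}  # <V := N>: bind right away
        return ("assign", name, eval_expr(expr, state)), state  # <V := E>
    _, first, rest = stmt  # otherwise the statement is ("seq", I, J)
    if first == DONE:
        return rest, state
    first_after, state_after = eval_step(first, state)
    return ("seq", first_after, rest), state_after


def trace(stmt, state):
    """Collect every (statement, state) pair until the program is `done`."""
    steps = [(stmt, state)]
    while stmt != DONE:
        stmt, state = eval_step(stmt, state)
        steps.append((stmt, state))
    return steps


# day := 2; tomorrow := day + 1; done
program = ("seq", ("assign", "day", 2),
           ("seq", ("assign", "tomorrow", ("+", "day", 1)), DONE))

for stmt, state in trace(program, {}):
    print(stmt, state)
```

The last printed pair is `done` together with a state that binds `day` to 2 and `tomorrow` to 3, matching the final step of the walkthrough.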

## Operator precedence
- we can also extend our language to other operators such as `*` (multiplication) and `/` (division);
- we now get ambiguous expressions though: how do we interpret `3 + 10 * 2`?
    - is it $26$ or is it $23$?
    - we introduce operator precedence, from highest to lowest:
        - parenthesized expressions
        - multiplication and division
        - addition and subtraction
    - this yields the following evaluation sequence:
        - `<3 + 10 * 2>` $\rightarrow$ `<3 + 20>` $\rightarrow$ `<23>` (state omitted as there are no variables)
- how do we turn this into semantics?
    - there exists a _parser_, a small program that reads the code before we run it
    - the parser knows about operator precedence and adds the parentheses automatically
    - so if a programmer writes `3 + 10 * 2`, the parser automatically adds parentheses as if we had written `3 + (10 * 2)`
    - this means that the ambiguity is resolved at the language level
- let us now see some examples of programs featuring complex expressions, remembering that assignment statements are rewritten until their right-hand side is a constant number:
    - `x := 2 + 3 * 2; done`, $\{ \}$ $\rightarrow$
      `x := 2 + 6; done`, $\{ \}$ $\rightarrow$
      `x := 8; done`, $\{ \}$ $\rightarrow$
      `done`, $\{ x := 8 \}$
    - `bonus := 1; malus := 3; multiplier := 4; score := 2 * multiplier + bonus - malus; done`, which ends in `done`, $\{ bonus := 1, malus := 3, multiplier := 4, score := 6 \}$
    - `x := 1; y := x + x * x; z := (x + 1) * (y + 2); done`, which ends in `done`, $\{ x := 1, y := 2, z := 8 \}$
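
The parser mentioned above can be sketched as a small recursive-descent parser with one function per precedence level. This is only one possible construction: the text does not prescribe how the parser is built, and the names `tokenize`, `parse`, and `show`, as well as the tuple representation of parsed expressions, are assumptions made for this illustration.

```python
import re

# A token is a number, a variable name, an operator, or a parenthesis.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    """Split the source text of an expression into a list of tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Parse an expression into nested (op, left, right) tuples."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def atom():  # highest precedence: parentheses, numbers, variables
        token = advance()
        if token == "(":
            inner = additive()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if token.isdigit():
            return int(token)
        if token[0].isalpha() or token[0] == "_":
            return token
        raise SyntaxError(f"unexpected token {token!r}")

    def multiplicative():  # middle precedence: * and /
        left = atom()
        while peek() in ("*", "/"):
            op = advance()
            left = (op, left, atom())
        return left

    def additive():  # lowest precedence: + and -
        left = multiplicative()
        while peek() in ("+", "-"):
            op = advance()
            left = (op, left, multiplicative())
        return left

    tree = additive()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def show(tree):
    """Print a parsed expression with every pair of parentheses made explicit."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return f"({show(left)} {op} {show(right)})"
    return str(tree)


print(show(parse("3 + 10 * 2")))                      # (3 + (10 * 2))
print(show(parse("2 * multiplier + bonus - malus")))  # (((2 * multiplier) + bonus) - malus)
```

Each precedence level calls the next higher one for its operands, which is what forces `10 * 2` to be grouped before the addition is built around it.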

## Expressive power of _Var_ vs _Turtle_
- is the graph of operations of `Turtle` smaller than, bigger than, or equal to the graph of operations of `Var`?
    - draw a series of possible states of `Turtle`, linked by instructions;
    - draw a series of possible states of `Var`, linked by instructions;
    - let us show that operations such as `up N` can be encoded from `Turtle` to `Var` as `y := y + N`;
    - but there are operations such as `z := 10` that cannot be encoded from `Var` to `Turtle`;
    - this means that `Turtle` is _less expressive_ than `Var`.
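
The encoding direction from `Turtle` to `Var` can be sketched as a small translation function. Only `up N` and its encoding `y := y + N` come from the text; the other three move names (`down`, `left`, `right`) and their encodings are hypothetical and included purely for illustration, as is the function name `turtle_to_var`.

```python
# Only "up" is taken from the text; "down", "left" and "right" are
# hypothetical Turtle moves, assumed here for the sake of the example.
MOVES = {
    "up": ("y", "+"),
    "down": ("y", "-"),
    "left": ("x", "-"),
    "right": ("x", "+"),
}


def turtle_to_var(instruction):
    """Translate a Turtle move such as 'up 3' into the text of a Var assignment."""
    name, amount = instruction.split()
    variable, op = MOVES[name]
    return f"{variable} := {variable} {op} {int(amount)}"


def translate_program(instructions):
    """Translate a list of Turtle moves into a single Var program."""
    return "; ".join([turtle_to_var(i) for i in instructions] + ["done"])


print(turtle_to_var("up 3"))                      # y := y + 3
print(translate_program(["up 3", "right 2"]))     # y := y + 3; x := x + 2; done
```

The translated program only makes sense from a state in which `x` and `y` are already bound, which mirrors the fact that the `Turtle` state always contains them. No translation in the opposite direction is attempted, since a statement such as `z := 10` has no counterpart in the three-entry `Turtle` state.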
    
    
## Small vs big step semantics
It should be clear by now that our evaluation strategy only takes a very small step in the evaluation, that is, it is not capable of solving large problems all at once. Instead, slowly but steadily, it will transform a single piece of the program into a slightly simpler, equivalent version of itself that is one step closer to the desired target. This way, applying evaluation multiple times (potentially a very large number of times) will evaluate the whole program.

We call `eval` _small step semantics_, whereas repeated application of `eval` is called _big step semantics_. Such big step semantics could be (roughly, because we are ignoring `done`) defined as:

`Eval` $\rightarrow$ `Eval` $\circ$ `eval`

where $\circ$ denotes function composition. 
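
The relation between the two can be sketched in Python, this time without ignoring `done`: the big step stops once the program is `done` and otherwise applies itself to the result of one small step. The tuple encoding and the names `eval_step` and `big_eval` are assumptions made for this illustration, and the sketch covers assignments of constants only.

```python
DONE = "done"  # assumed encoding: "done", ("assign", V, N), ("seq", I, J)


def eval_step(stmt, state):
    """Small step: apply a single evaluation rule."""
    if stmt == DONE:
        return DONE, state
    if stmt[0] == "assign":
        _, name, number = stmt
        return DONE, {**state, name: number}
    _, first, rest = stmt  # otherwise the statement is ("seq", I, J)
    if first == DONE:
        return rest, state
    first_after, state_after = eval_step(first, state)
    return ("seq", first_after, rest), state_after


def big_eval(stmt, state):
    """Big step: Eval applied after eval, stopping once the program is `done`."""
    if stmt == DONE:
        return state
    return big_eval(*eval_step(stmt, state))


# day := 2; month := 3; year := 1985; done
program = ("seq", ("assign", "day", 2),
           ("seq", ("assign", "month", 3),
            ("seq", ("assign", "year", 1985), DONE)))

print(big_eval(program, {}))  # {'day': 2, 'month': 3, 'year': 1985}
```

The recursive form mirrors the composition in the definition; for long programs an equivalent `while` loop would avoid Python's recursion limit.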