# Variable scoping

Armed with the knowledge we have just acquired, let us consider the following program:

```
k := 0
x := 0
incr_x := <k => x := x + k>
incr_x(1)
incr_x(3)
...
```

[[SEMANTIC STEPS]]

If we follow the semantics we have given in the previous lectures, then when we invoke `incr_x` we do not just *declare* a new variable `k` in the state: we *overwrite* it. Overwriting a variable is a dangerous business. (Over)writing a variable is referred to as a *destructive update*, because the value that was contained in the state is destroyed (we cannot read it anymore) in favour of the new value.

Calling a function `incr_x` not-so-subtly implies that the function performs one task and one task only: incrementing variable `x`. Unfortunately, the function also has an unseen side-effect: it overwrites the value of `k` with whatever we increment `x` by. To obviate this we might rename the function `incr_x_and_reset_k`, but this would not really be a proper solution: the name is cumbersome, and the side-effect still reduces the usefulness of our function.

An abstraction, such as a function, is considered to be *leaky* when it produces changes in constructs that are not directly related to its main task, or when it forces us to consider its implementation details (such as the fact that `k` is reset). If you remember, the reason why we introduced functions in the first place was to be able to focus only on what matters (such as incrementing `x`) without having to keep in mind all the details needed to _make it happen_ (such as what happens to local data storage like variable `k`), so this is actually quite a disappointing failure.
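The leak can be reproduced with a few lines of Python. The sketch below is only a model of the scope-less semantics, under the assumption that the state is one flat dictionary and that invoking a function simply assigns the argument to the parameter name in that dictionary; `invoke_flat` and `incr_x_body` are hypothetical names:

```python
# Model of the *old* semantics (no scopes): the state is one flat dictionary,
# and invoking a function assigns its parameter directly in it.
# `invoke_flat` and `incr_x_body` are illustrative names, not from the lecture.

def invoke_flat(state, param, arg, body):
    """Bind `param` to `arg` in the single global state, then run `body`."""
    state[param] = arg  # destructive update: the old value of `param` is lost
    body(state)

def incr_x_body(s):
    # body of incr_x: x := x + k
    s["x"] = s["x"] + s["k"]

state = {"k": 0, "x": 0}
invoke_flat(state, "k", 1, incr_x_body)  # incr_x(1)
invoke_flat(state, "k", 3, incr_x_body)  # incr_x(3)
print(state)  # {'k': 3, 'x': 4}
```

After the two calls `x` is $4$, as intended, but the global `k` has silently become $3$.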

## Scope
To obviate this issue, we might observe that even though the code in the sample above does manipulate variable `k`, this variable acquires three very separate meanings during execution of the program: 
- in the beginning, the variable is just a global variable;
- during the first call to `incr_x`, `k` stores value $1$ (how much we want to increase `x`);
- during the second call to `incr_x`, `k` stores value $3$ (how much we want to increase `x`).

The reason why we consider these meanings distinct is that the second value of `k`, $1$, is only useful and meaningful **inside the lambda**: it should not interfere with (nor destroy) the previous value of `k`, which should remain (or return to) $0$ once the lambda is done computing. We might even say that, although the name of the variable is the same, each use belongs to a different *context*, so that the name together with the context defines **three separate variables**: a global one with value $0$, one local to the first call to `incr_x` with value $1$, and one local to the second call to `incr_x` with value $3$.

This notion of the context inside which a variable lives is called _scope_. Variables can live either in the *global* scope (the one we have used so far) or in the *local* scope defined by a function, in which case they are erased when the code of the function is done. If a local variable has the same name as a global variable, the local one takes precedence (it is said to *shadow* the global one). This means that the function `incr_x` would use the local `k`, whereas the global `k` would be left untouched.
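Many mainstream languages scope parameters in exactly this way. As an analogy only (this is Python, not the language of these lectures), the parameter `k` below shadows the global `k` instead of overwriting it:

```python
# Python functions get their own local scope, so the parameter `k` below
# shadows the global `k` instead of overwriting it.
k = 0
x = 0

def incr_x(k):   # this `k` lives in the function's local scope
    global x     # `x`, on the other hand, refers to the global variable
    x = x + k

incr_x(1)
incr_x(3)
print(k, x)  # 0 4
```

Python needs the `global x` declaration to assign to the global `x`: without it, the assignment would create a new local variable instead. This is a design choice of Python, not necessarily the strategy we adopt for our own language.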

In order to implement scope, we need to adjust our semantics for function invocation and variable lookup. When invoking a function, we want to record that its code runs in a new and separate scope, and we want the state to keep the different scopes apart.

Scopes in the state are stored in the so-called _call stack_: a stack which contains the local variables of all functions that are active at the current time (yes, there may be more than one: more on this later). When a function is invoked, its parameters are instantiated with the given arguments in a new scope at the top of the stack. When the function is done, we remove the whole top of the stack in bulk, thereby erasing the scope of the function, which is no longer needed.

In order to track whether or not the instructions we are running are part of a function call, we define a new instruction: `call`. `call` will be used to delimit the invocation of a function in the running program. Instructions stored inside a `call` are part of an active function call. 

In order to store the stack, the state is also augmented with a new entry, called `stack`, onto which the local variables are pushed upon function invocation. The stack is either empty or non-empty:
- an empty stack $\{ \}$ indicates that no function is active at the present time;
- a non-empty stack stores the local variables of the currently active function in its head $h$, while its tail $t$ stores the rest of the stack.

For example $\{ stack := \{ h := \{ x := 1 \}, t := \{ \} \}, x := 1 \}$ would be a state where variable `x` is defined both at the top of the stack and as a global variable.
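To see how such a state can be represented concretely, here is a Python sketch in which the state is a dictionary and the stack is a linked list of scopes: `{}` for the empty stack and `{"h": ..., "t": ...}` otherwise. The helpers `push_scope` and `pop_scope` are hypothetical names; they return new states rather than modifying the old one, in the spirit of the $S[\dots := \dots]$ notation:

```python
# The stack as a linked list of scopes: {} is the empty stack, and a
# non-empty stack is {"h": <topmost scope>, "t": <rest of the stack>}.
# push_scope and pop_scope are illustrative helpers, not lecture definitions.

def push_scope(state, scope):
    """Return a new state whose stack has `scope` on top of the old stack."""
    return {**state, "stack": {"h": scope, "t": state["stack"]}}

def pop_scope(state):
    """Return a new state with the topmost scope removed, i.e. S[stack][t]."""
    return {**state, "stack": state["stack"]["t"]}

# The example state: `x` is defined both at the top of the stack and globally.
s = push_scope({"stack": {}, "x": 1}, {"x": 1})
print(s)             # {'stack': {'h': {'x': 1}, 't': {}}, 'x': 1}
print(pop_scope(s))  # {'stack': {}, 'x': 1}
```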

The new semantics of function invocation must therefore set up the instructions of the function in a way that subsequent evaluations can recognize, that is, inside a `call` container instruction, and at the same time extend the stack in the state:
    
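$\text{eval}(<V(v_1, v_2, \dots)>, S) \rightarrow <call(a_1 := v_1; a_2 := v_2; \dots; L)>, S[\text{stack} := \{ h := \{ a_1 := \text{null}, a_2 := \text{null}, \dots \}, t := S[\text{stack}] \}]$ where $S[V] \rightarrow <(a_1, a_2, \dots) \Rightarrow L>$ and $v_1, v_2, \dots$ are values

Note that the arguments must already have been evaluated to values $v_1, v_2, \dots$, in the caller's scope, before this rule is applied. If we copied the argument expressions into the `call` unevaluated, they would only be evaluated after the new scope has been pushed: in a call such as `incr_x(k)`, the argument `k` would then be looked up in the callee's fresh scope, where the parameter `k` is still $\text{null}$.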
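The rule can also be read as a function from an instruction and a state to a new instruction and a new state. In the Python sketch below, instructions are encoded as tuples and a function value as `("lambda", params, body)`; this encoding, `step_invoke`, and the use of `None` for $\text{null}$ are all assumptions made for illustration:

```python
# Sketch of the invocation rule, with instructions encoded as tuples:
#   ("invoke", V, [arg1, arg2, ...])  is  V(arg1, arg2, ...)
#   ("assign", a, e)                  is  a := e
#   ("seq", [i1, i2, ...])            is  i1; i2; ...
#   ("call", I)                       is  call(I)
# A function value is ("lambda", [a1, a2, ...], L). The encoding is an assumption.

def step_invoke(instr, state):
    _, name, args = instr
    kind, params, body = state[name]          # S[V] -> <(a1, a2, ...) => L>
    assert kind == "lambda" and len(params) == len(args)
    # call(a1 := arg1; a2 := arg2; ...; L): arguments are placed unchanged
    bindings = [("assign", a, arg) for a, arg in zip(params, args)]
    new_instr = ("call", ("seq", bindings + [body]))
    # push a scope in which every parameter exists but is still null (None)
    frame = {a: None for a in params}
    new_state = {**state, "stack": {"h": frame, "t": state["stack"]}}
    return new_instr, new_state

state = {"stack": {}, "k": 0, "x": 0,
         "incr_x": ("lambda", ["k"], ("assign", "x", ("+", "x", "k")))}
instr, state2 = step_invoke(("invoke", "incr_x", [1]), state)
print(instr)
# ('call', ('seq', [('assign', 'k', 1), ('assign', 'x', ('+', 'x', 'k'))]))
print(state2["stack"])  # {'h': {'k': None}, 't': {}}
```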

Let us now consider the very simple program `incr_x := <k => x := x + k>; incr_x(1)`, where the initial state is $\{ k := 0, x := 0 \}$. Inside the body of the function we now expect `k` to refer to $S[\text{stack}][h][k]$ instead of $S[k]$: local variables must take precedence, and we do not want a change that only makes sense inside the function to accidentally erase the information stored in the global variable.

Implementing this new lookup strategy therefore requires two rules: first we look at the head of the stack and then, if nothing is found there, at the globals:
- first we look in the stack: $\text{eval}(<V>, S) \rightarrow <S[\text{stack}][h][V]>, S$, where `V` is a variable name and $\exists S[\text{stack}][h][V]$;
- otherwise we look in the globals, with the rule we already defined in one of the early chapters: $\text{eval}(<V>, S) \rightarrow <S[V]>, S$, where `V` is a variable name and $\not\exists S[\text{stack}][h][V]$.
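In the same hypothetical dictionary encoding, the two lookup rules become a single function that tries the head of the stack before falling back on the globals (`None` stands for $\text{null}$, so a parameter that has not been assigned yet still counts as existing):

```python
# The two lookup rules: first the head of the stack, then the globals.
# The state encoding ({"stack": ..., <globals>...}) is an assumption.

def lookup(name, state):
    stack = state["stack"]
    if stack and name in stack["h"]:   # rule 1: S[stack][h][V] exists
        return stack["h"][name]
    return state[name]                 # rule 2: fall back to S[V]

s = {"stack": {"h": {"k": 1}, "t": {}}, "k": 0, "x": 0}
print(lookup("k", s))  # 1  (the local k shadows the global one)
print(lookup("x", s))  # 0  (no local x, so the global is used)
```

Only the head of the stack is consulted: the local variables of the callers, deeper in the stack, are invisible to the running function.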

Assigning to variables can be done according to multiple strategies. One strategy could be that new variables are always created inside the local scope, instead of globally. Another could be that new variables are always created inside the global scope, even when they are assigned from a local scope.

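In our semantics, *creating* a new variable should be local: a variable first assigned inside a function is created in the local scope (or in the global scope when no function is active). *Writing* to an existing variable, instead, should be contextual to the scope: it updates the variable that lookup would find, that is, the local one if it exists at the top of the stack, and the global one otherwise.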
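A minimal sketch of an assignment strategy in which creating a new variable inside a function makes it local, while writing to an existing variable updates it where lookup would find it (locally first, then globally). At the top level, with an empty stack, everything is written to the globals. The encoding and the `assign` helper are assumptions for illustration:

```python
# Hypothetical assignment strategy: existing variables are written where
# lookup finds them; new variables are created locally while a function
# is active, and globally otherwise.

def assign(name, value, state):
    stack = state["stack"]
    is_local = bool(stack) and (name in stack["h"] or name not in state)
    if is_local:
        head = {**stack["h"], name: value}
        return {**state, "stack": {"h": head, "t": stack["t"]}}
    return {**state, name: value}

s = {"stack": {"h": {"k": None}, "t": {}}, "k": 0, "x": 0}
s = assign("k", 1, s)    # parameter: written in the local scope
s = assign("x", 1, s)    # existing global: written in the globals
s = assign("tmp", 5, s)  # new variable inside a function: created locally
print(s)  # {'stack': {'h': {'k': 1, 'tmp': 5}, 't': {}}, 'k': 0, 'x': 1}
```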

<div class="alert alert-block alert-info">
Notice that we have implicitly removed $\text{eval-expr}$, and are now using $\text{eval}$ for both statements and expressions. An important consequence is that evaluating an expression can now alter the state, which we will need once function invocations are allowed inside expressions. For the moment, though, take note that all the rules that made up $\text{eval-expr}$ are lifted into $\text{eval}$ and return an unchanged state.
</div>

Running the `call` instruction simply runs the internal instruction stored inside the `call`, as long as it is not `done`:

$eval(<call(I)>, S) \rightarrow <call(I')>, S'$, where $I$ is an instruction, $I \neq \texttt{<done>}$ and $eval(<I>, S) \rightarrow <I'>,S'$

When a function terminates, its body becomes `done` and we reach the combination `call(done)`: we must then _pop_ the stack, that is, remove the topmost set of local variables, so that the function is properly cleaned up:

$eval(<call(done)>, S) \rightarrow <done>, S[stack := S[stack][t]]$
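The two `call` rules can be sketched in Python as follows. To keep the sketch independent of the rest of $\text{eval}$, the step function for the inner instruction is passed in as the parameter `step`; `step_call`, `toy_step`, and the tuple encoding are hypothetical:

```python
# The two `call` rules, with instructions as tuples and "done" as a marker.
DONE = "done"

def step_call(instr, state, step):
    _, inner = instr
    if inner == DONE:
        # eval(<call(done)>, S) -> <done>, S[stack := S[stack][t]]
        return DONE, {**state, "stack": state["stack"]["t"]}
    # eval(<call(I)>, S) -> <call(I')>, S'  where eval(<I>, S) -> <I'>, S'
    inner2, state2 = step(inner, state)
    return ("call", inner2), state2

def toy_step(instr, state):
    # hypothetical inner step: performs one local assignment and finishes
    _, name, value = instr
    head = {**state["stack"]["h"], name: value}
    return DONE, {**state, "stack": {"h": head, "t": state["stack"]["t"]}}

s = {"stack": {"h": {"k": None}, "t": {}}, "k": 0}
i = ("call", ("assign", "k", 1))
i, s = step_call(i, s, toy_step)
print(i, s["stack"])  # ('call', 'done') {'h': {'k': 1}, 't': {}}
i, s = step_call(i, s, toy_step)
print(i, s)           # done {'stack': {}, 'k': 0}
```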

The original example now behaves in a significantly different, and more logical, way: the global variable `k` remains undisturbed throughout the calls to `incr_x`:

```
k := 0
x := 0
incr_x := <k => x := x + k>
incr_x(1)
incr_x(3)
...
```

[[STATE TRACE]]
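The states reached by this program can be computed with a compact interpreter for the rules of this section. This is a big-step sketch rather than the small-step $\text{eval}$ of the lectures: each instruction is run to completion, and an invocation pushes a scope with the parameters set to `None` ($\text{null}$), assigns the argument values, runs the body, and pops the scope, playing the role of `call(done)`. It evaluates arguments in the caller's scope before pushing the new one; the tuple encoding and all helper names are assumptions:

```python
import operator

# Binary operators supported by this toy interpreter (an assumption).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def lookup(name, s):
    """Rule 1: head of the stack if the name is there; rule 2: the globals."""
    st = s["stack"]
    return st["h"][name] if st and name in st["h"] else s[name]

def assign(name, value, s):
    """Existing variables are written where lookup finds them; new ones locally."""
    st = s["stack"]
    if st and (name in st["h"] or name not in s):
        return {**s, "stack": {"h": {**st["h"], name: value}, "t": st["t"]}}
    return {**s, name: value}

def ev(e, s):
    """Expressions: ("lit", v), ("var", name) or (op, left, right)."""
    if e[0] == "lit":
        return e[1]
    if e[0] == "var":
        return lookup(e[1], s)
    return OPS[e[0]](ev(e[1], s), ev(e[2], s))

def run(i, s):
    """Run an instruction to completion and return the resulting state."""
    if i[0] == "seq":
        for j in i[1]:
            s = run(j, s)
        return s
    if i[0] == "assign":
        return assign(i[1], ev(i[2], s), s)
    if i[0] == "invoke":
        _, params, body = lookup(i[1], s)
        values = [ev(e, s) for e in i[2]]                 # in the caller's scope
        s = {**s, "stack": {"h": {a: None for a in params}, "t": s["stack"]}}
        for a, v in zip(params, values):
            s = assign(a, v, s)                           # a1 := v1; ... (local)
        s = run(body, s)
        return {**s, "stack": s["stack"]["t"]}            # call(done): pop
    raise ValueError(f"unknown instruction: {i[0]}")

incr_x = ("lambda", ["k"], ("assign", "x", ("+", ("var", "x"), ("var", "k"))))
program = [
    ("assign", "k", ("lit", 0)),
    ("assign", "x", ("lit", 0)),
    ("assign", "incr_x", ("lit", incr_x)),
    ("invoke", "incr_x", [("lit", 1)]),
    ("invoke", "incr_x", [("lit", 3)]),
]

s = {"stack": {}}
for stmt in program:
    s = run(stmt, s)
    # print the state without the (bulky) function values
    print({name: val for name, val in s.items() if not isinstance(val, tuple)})
# Output:
# {'stack': {}, 'k': 0}
# {'stack': {}, 'k': 0, 'x': 0}
# {'stack': {}, 'k': 0, 'x': 0}
# {'stack': {}, 'k': 0, 'x': 1}
# {'stack': {}, 'k': 0, 'x': 4}
```

The global `k` is $0$ after every statement, while `x` ends at $4$.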

Let us consider a few more examples with name clashes between variables found in different scopes:

```
title := "Mr."
name := "Strange"
add_title := <title => name := title + " " + name>
add_title("Dr.")
add_title(title)
```

[[STATE TRACE]]
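The second call, `add_title(title)`, is the interesting one: its argument mentions the same name as the parameter. The Python sketch below (hypothetical helpers, same dictionary encoding for the state) evaluates the argument in the caller's scope before pushing the scope of `add_title`, and then shows what lookup would return if the argument were evaluated after the push instead:

```python
def lookup(name, s):
    """Head of the stack first, then the globals."""
    st = s["stack"]
    return st["h"][name] if st and name in st["h"] else s[name]

def add_title(s, arg):
    """add_title := <title => name := title + " " + name>, invoked with `arg`.

    `arg` is a function computing the argument from a state, so that we can
    choose in which scope it is evaluated."""
    value = arg(s)                                               # caller's scope
    s = {**s, "stack": {"h": {"title": None}, "t": s["stack"]}}  # push
    s["stack"]["h"]["title"] = value                             # title := value
    new_name = lookup("title", s) + " " + lookup("name", s)
    s = {**s, "name": new_name}   # `name` is not local, so the global is written
    return {**s, "stack": s["stack"]["t"]}                       # pop

s = {"stack": {}, "title": "Mr.", "name": "Strange"}
s = add_title(s, lambda st: "Dr.")                 # add_title("Dr.")
s = add_title(s, lambda st: lookup("title", st))   # add_title(title)
print(s)  # {'stack': {}, 'title': 'Mr.', 'name': 'Mr. Dr. Strange'}

# Looking `title` up *after* pushing the new scope would find the fresh
# local parameter, which is still None (null), instead of "Mr.":
pushed = {**s, "stack": {"h": {"title": None}, "t": s["stack"]}}
print(lookup("title", pushed))  # None
```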

```
a := 0
b := 0
x := 0
incr_x := <a => x := x + a>
decr_x := <a => x := x - a>
incr_x(10)
decr_x(3)
...
```

[[STATE TRACE]]

## Nesting function calls
Thanks to our notion of scoping, each function call is isolated from the others. This extends to nested function calls: a function may call other functions, which produces a stack with more than one scope in the state, and multiple nested `call` instructions in the program. For example, consider the following program, in which the function `quadruple` invokes another function, `double`. While we are executing `double`, there are two `call` instructions inside each other (the outer one for `quadruple`, and the inner one for `double`), and the state contains a stack with two levels:

```
k := 0
x := 0
double := <() => x := x + x>
quadruple := <() => double(); double()>
```

[[STATE TRACE GOES HERE]]
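The listing above only defines the two functions, so the sketch below assumes a subsequent call `quadruple()`. It models parameterless functions as Python callables that push an empty scope, and records the depth of the stack at each invocation; `invoke`, `depth`, `trace`, and `FUNCS` are hypothetical names:

```python
# Parameterless functions only: each invocation pushes an empty scope.
trace = []  # (function name, stack depth right after the push)

def depth(stack):
    """Count the scopes in a {"h": ..., "t": ...} linked-list stack."""
    n = 0
    while stack:
        stack, n = stack["t"], n + 1
    return n

def invoke(s, fname):
    s = {**s, "stack": {"h": {}, "t": s["stack"]}}  # push an empty scope
    trace.append((fname, depth(s["stack"])))
    s = FUNCS[fname](s)                             # run the body
    return {**s, "stack": s["stack"]["t"]}          # pop the scope

def double(s):        # double := <() => x := x + x>  (x is global)
    return {**s, "x": s["x"] + s["x"]}

def quadruple(s):     # quadruple := <() => double(); double()>
    s = invoke(s, "double")
    return invoke(s, "double")

FUNCS = {"double": double, "quadruple": quadruple}

s = invoke({"stack": {}, "k": 0, "x": 0}, "quadruple")
print(trace)  # [('quadruple', 1), ('double', 2), ('double', 2)]
print(s)      # {'stack': {}, 'k': 0, 'x': 0}
print(invoke({"stack": {}, "k": 0, "x": 1}, "quadruple")["x"])  # 4
```

Since `x` starts at $0$ the program leaves it at $0$; with a hypothetical initial value of $1$, the last line shows the expected quadrupling.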

Let us consider another example. Drawing a complex figure can itself be decomposed into drawing smaller figures in sequence, so we could draw a $3 \times 3$ square as:

```
s := ""
star := <() => s := s + "*">
blank := <() => s := s + " ">
newline := <() => s := s + "\n">
line := <() => star(); star(); star(); newline() >
square := <() => line(); line(); line() >
square()
```

[[STATE TRACE]]

Moreover, even if nested functions use the same parameter names, we still do not get clashes, because the variables of each function call belong to their own scope:

```
k := 0
x := 0
incr_x := <k => x := x + k>
mult_x := <k => x := x * k>
mult_incr_x := <(k,c) => mult_x(k); incr_x(c)>
mult_incr_x(2,3)
...
```

[[STATE TRACE]]
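For `mult_incr_x(2,3)` we expect `x` to become $0 \cdot 2 + 3 = 3$ while the global `k` stays $0$. The sketch below (same hypothetical encoding, function bodies as Python callables, arguments evaluated in the caller's scope) also records the stack while `mult_x` is running, where two scopes each contain their own `k`:

```python
snapshots = []  # stack contents observed inside mult_x

def lookup(name, s):
    st = s["stack"]
    return st["h"][name] if st and name in st["h"] else s[name]

def assign(name, value, s):
    st = s["stack"]
    if st and (name in st["h"] or name not in s):
        return {**s, "stack": {"h": {**st["h"], name: value}, "t": st["t"]}}
    return {**s, name: value}

def invoke(s, params, values, body):
    """Push a scope, bind the (already evaluated) values, run body, pop."""
    s = {**s, "stack": {"h": {a: None for a in params}, "t": s["stack"]}}
    for a, v in zip(params, values):
        s = assign(a, v, s)
    s = body(s)
    return {**s, "stack": s["stack"]["t"]}

def incr_x(s, k):        # incr_x := <k => x := x + k>
    return invoke(s, ["k"], [k],
                  lambda s: assign("x", lookup("x", s) + lookup("k", s), s))

def mult_x(s, k):        # mult_x := <k => x := x * k>
    def body(s):
        snapshots.append(s["stack"])
        return assign("x", lookup("x", s) * lookup("k", s), s)
    return invoke(s, ["k"], [k], body)

def mult_incr_x(s, k, c):  # mult_incr_x := <(k,c) => mult_x(k); incr_x(c)>
    def body(s):
        s = mult_x(s, lookup("k", s))   # argument read in mult_incr_x's scope
        return incr_x(s, lookup("c", s))
    return invoke(s, ["k", "c"], [k, c], body)

s = mult_incr_x({"stack": {}, "k": 0, "x": 0}, 2, 3)
print(snapshots[0])  # {'h': {'k': 2}, 't': {'h': {'k': 2, 'c': 3}, 't': {}}}
print(s)             # {'stack': {}, 'k': 0, 'x': 3}
```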

<div class="alert alert-block alert-info">
As a closing note, it is worth mentioning that it is not a stroke of luck that the very same semantic rules for $\text{eval}$ suddenly support arbitrarily nested function calls.

When we observed the problem of name clashes in the presence of a single function call, we might have made unsafe assumptions. The simplest such assumption would have been that only one function is active at any given time. This would have been a natural generalization of the example, but an incorrect one: the body of a function is just code, so inside a function we can easily invoke other functions!

There is no universal recipe to make sure that every time we define a system we cover all important cases, but it is important to always be on the lookout for such circumstances: always assume that the *most complex possible* combination of factors will present itself, and reason accordingly. Large scale, concurrent access, and constructs nested inside each other are issues with which a programmer contends daily!
</div>