## The structure of languages

Today we will look in detail at the code for a Lispy calculator we'll call `stupidlang`, focusing on environment frames rather than parsing. The parsing code is provided here; we shall talk about it later.

This interpreter essentially reproduces Peter Norvig's lis.py, stripped down and simplified. Go read his posts. Our reasons for doing this are to expose you to some features of Python, to show the use of closures to develop objects with state, and to see some basics of how a language works.

We'll then switch to a bird's-eye view of how Python works.

### Environment Implementation

Our initial implementation of this language will be rough and tumble, but we'll separate it into different files, separate interface from implementation, and package it up nicely as the semester goes...

Indeed, we will replace some of our closure-based implementations here with classes.

#### Nested Environment Frames

We implement an environment as nested frames, with the outer (enclosing) environment captured in a closure.


In [1]:
def function_env(outerenv = None):
    bindings={}
    def lookup(variable):
        try:
            found = bindings[variable]
            env = dispatch
        except KeyError:#not found inside, so go to the outer
            if outerenv is not None:
                found, env = outerenv('lookup', variable)
            else:
                raise NameError("{} <<>> not found in Environment".format(variable))
        return found, env
    def extend(variable, value):
        bindings[variable] = value       
    def extend_many(envtuples):#update can use a list of k,v tuples
        bindings.update(envtuples)
    def printit():#leak it read-only for debugging
        return bindings
    #The dispatch function is what is returned
    def dispatch(message, variable=None, value=None):
        if message == 'lookup':
            return lookup(variable)
        elif message == 'extend':
            return extend(variable, value)
        elif message == 'extend_many':
            return extend_many(value)
        elif message == 'printit':
            return printit()
    return dispatch
    

In [2]:
tryenv=function_env()
tryenv("extend", 'a', 5)
tups=[('b', 1), ('c', 2)]
tryenv("extend_many", None, tups)
tryenv('printit')

{'a': 5, 'b': 1, 'c': 2}

In [3]:
try2env=function_env(tryenv)
tryenv("extend", 'd', 55)
try2env("lookup",'d')[0], try2env("lookup",'a')[0]

(55, 5)
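
Since some of these closure-based pieces are slated to become classes, here is a minimal sketch of what a class-based frame could look like. The name `Env` and its methods are hypothetical and are not part of the course code; the sketch only mirrors the `lookup` and `extend` behavior of `function_env`, including shadowing of an outer binding by an inner one.

```python
class Env:
    """Hypothetical class-based counterpart of function_env (illustrative only)."""

    def __init__(self, outer=None):
        self.bindings = {}
        self.outer = outer  # enclosing frame, or None at the top level

    def extend(self, variable, value):
        self.bindings[variable] = value

    def lookup(self, variable):
        # Walk outward until some frame binds the name.
        env = self
        while env is not None:
            if variable in env.bindings:
                return env.bindings[variable], env
            env = env.outer
        raise NameError("{} not found in Environment".format(variable))


outer = Env()
outer.extend("a", 5)
inner = Env(outer)
inner.extend("a", 99)   # shadows the outer binding
outer.extend("d", 55)   # added after inner was created

print(inner.lookup("a")[0])  # 99, found in the inner frame
print(inner.lookup("d")[0])  # 55, found by walking outward
print(outer.lookup("a")[0])  # 5, the outer frame is unaffected
```

As in the closure version, `lookup` returns both the value and the frame in which it was found, which is what a `store`-like operation needs in order to rebind a name where it lives.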

### Parser

We won't say much about the parser here, except to remark that `lex` splits the program code into tokens and converts them to appropriate types using `typer`, and then `syn` converts these tokens into a nested list structure which reflects the structure of our language:

```python
program = """
(def rad 5)
rad
(def radiusfunc (func (radius) (* pi (* radius radius))))
(radiusfunc rad)
(def myvar 0)
(if (== myvar 1) (store rad 6) (store rad 7))
(radiusfunc rad)
(== 1 1)
"""
```

Line by line parse output:

```
[]
['def', 'rad', 5]
rad
['def', 'radiusfunc', ['func', ['radius'], ['*', 'pi', ['*', 'radius', 'radius']]]]
['radiusfunc', 'rad']
['def', 'myvar', 0]
['if', ['==', 'myvar', 1], ['store', 'rad', 6], ['store', 'rad', 7]]
['radiusfunc', 'rad']
['==', 1, 1]
[]
```

Note that the parsing process, and the code for it, has nothing to do with the execution environment. We'll see this later with how Python is processed as well.
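
The same separation can be observed in Python itself. The sketch below uses the standard `ast` module; the variable name `undefined_radius` is made up for illustration. Parsing succeeds although the name is bound nowhere, and the failure only shows up once the parsed tree is evaluated in an environment.

```python
import ast

source = "area = undefined_radius * 2"

# Parsing only checks syntax; no name is looked up at this stage.
tree = ast.parse(source)
print(type(tree.body[0]).__name__)  # Assign

# Evaluation is where the environment matters.
try:
    exec(compile(tree, "<string>", "exec"), {})
    failed = False
except NameError as err:
    failed = True
    print("runtime failure:", err)
```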

In [4]:
Symbol = str

def typer(token):
    if token == 'true':
        return True
    elif token == 'false':
        return False
    try:
        t = int(token)
        return t
    except ValueError:
        try:
            t = float(token)
            return t
        except ValueError:
            return Symbol(token)
        
def lex(loc):
    tokenlist =  loc.replace('(', ' ( ').replace(')', ' ) ').split()
    return [typer(t) for t in tokenlist]

def syn(tokens):
    if len(tokens) == 0:
        return []
    token = tokens.pop(0)
    if token == '(':
        L = []
        while tokens[0] != ')':
            L.append(syn(tokens))
        tokens.pop(0) # pop off ')'
        return L
    else:
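        if token==')':
            # NOTE: `assert 1` always passes, so a stray ')' falls through
            # and is returned as an ordinary symbol.
            assert 1, "should not have got here"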
        return token
    
def parse(loc):
    return syn(lex(loc))

### Evaluator

Now let's talk about program evaluation. Our evaluator uses the Python environment. While it does not define a Python frame stack, it recurses through nested environment functions in Python. All functionality, such as operators and built-in functions, is outsourced to Python.

This makes our language a **DSL**, or **Domain Specific Language**. Writing a "self-hosting" language is beyond the scope of this course, but little DSLs that use this kind of hosted structure, or, even simpler, directly use the syntax of the host language, are very common. Examples are jQuery in JavaScript, Rails in Ruby, etc.

We first define the top level:

#### Global Environment

In this environment we put everything from the math library, and define operators in terms of Python's built-in operators and functions, so that we have a reasonably functional calculator...

In [5]:
import math, operator as op
def global_env():
    "An environment with some Scheme standard procedures."
    env = function_env()
    env("extend_many", None, vars(math))
    env("extend_many", None, {
        '+':op.add, '-':op.sub, '*':op.mul, '/':op.truediv, 
        'abs':     abs,
        'max':     max,
        'min':     min,
        'round':   round,
        '>':op.gt, '<':op.lt, '>=':op.ge, '<=':op.le, '==':op.eq,
        'not':     op.not_
    })
    return env

In [6]:
globenv = global_env()

Notice that we have polluted our namespace. You will be cleaning it in a later homework...

In [7]:
vars(math)

{'__doc__': 'This module is always available.  It provides access to the\nmathematical functions defined by the C standard.',
 '__file__': '//anaconda/envs/py35/lib/python3.5/lib-dynload/math.so',
 '__loader__': <_frozen_importlib_external.ExtensionFileLoader at 0x101989a58>,
 '__name__': 'math',
 '__package__': '',
 '__spec__': ModuleSpec(name='math', loader=<_frozen_importlib_external.ExtensionFileLoader object at 0x101989a58>, origin='//anaconda/envs/py35/lib/python3.5/lib-dynload/math.so'),
 'acos': <function math.acos>,
 'acosh': <function math.acosh>,
 'asin': <function math.asin>,
 'asinh': <function math.asinh>,
 'atan': <function math.atan>,
 'atan2': <function math.atan2>,
 'atanh': <function math.atanh>,
 'ceil': <function math.ceil>,
 'copysign': <function math.copysign>,
 'cos': <function math.cos>,
 'cosh': <function math.cosh>,
 'degrees': <function math.degrees>,
 'e': 2.718281828459045,
 'erf': <function math.erf>,
 'erfc': <function math.erfc>,
 'exp': <function math.
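
The double-underscore entries in the output above are the pollution in question. One possible way to leave them out, sketched here as an illustration rather than as the intended homework solution, is to filter the module's namespace before extending the environment. The name `public_math` is hypothetical.

```python
import math

# Keep only public names; the double-underscore entries are module metadata.
public_math = {name: value for name, value in vars(math).items()
               if not name.startswith("_")}

print("__name__" in vars(math), "__name__" in public_math)  # True False
print(public_math["pi"] == math.pi)                         # True
```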

#### The workhorse

The workhorse of our language is the `eval_ptree` function below. It goes one by one through the kinds of expressions that can arise and evaluates each in the context of our environment and its enclosing environments. It needs to work recursively, evaluating subexpressions in constructs like `if` statements and stupidlang function bodies.

In [8]:
def eval_ptree(x, env):
    truthy_map={'#t':True, '#f':False, 'nil':None}
    if x in ('#t', '#f', 'nil'):#handle boolean tokens first
        return truthy_map[x]        
    elif isinstance(x, Symbol): #else do a lookup
        # variable, op lookup
        return env("lookup", x)[0]
    elif not isinstance(x, list):  # if still not a list, we are a constant
        return x
    elif len(x)==0: #noop for an empty list
        return None
    elif x[0]=='if':#an if statement
        (_, predicate, truexpr, falseexpr) = x
        if eval_ptree(predicate, env):
            expression = truexpr
        else:
            expression = falseexpr
        return eval_ptree(expression, env)
    elif x[0] == 'def':         # variable or function definition, local
        #print('in def x is',x)
        (_, var, expression) = x
        #postorder traversal by nested eval is needed below
        env('extend', var, eval_ptree(expression, env))
    elif x[0] == 'store':           # (store var exp), like set!
        (_, var, exp) = x
        env_found_in = env("lookup", var)[1]#can be found in an outer env
        env_found_in("extend", var, eval_ptree(exp, env))
    elif x[0] == 'func': #this is the function definition, not the execution
        #print("in func x is",x)
        (_, parameters, parsedbody) = x
        return func(parameters, parsedbody, env)
    else:                          # operators, funcs calling
        #print("x", x, "env", env("printit"))
        op = eval_ptree(x[0], env)
        #postorder traversal to get subexpressions before running the op
        args = tuple([eval_ptree(arg, env) for arg in x[1:]])
        #print('argies', args, op.__name__)
        #Function execution
        if op.__name__=='dispatch':#need to handle our defined funcs
            return op('call', *args)
        else:#regular ops and funcs we added to the environment
            return op(*args)

#### Defining a function in our language

Here is where we define a function. The function object, represented by the returned `dispatch` function, uses a closure to hold the parameters, the environment in which the function was defined, and the code body.

The actual execution happens when `call` is invoked, which `eval_ptree` (defined above) does whenever it evaluates a function call.

In [11]:
def func(params, parsedbody, env):
    def call(argstuple):
        print("in call", params, parsedbody, argstuple)
        funcenv = function_env(outerenv=env)
        funcenv('extend_many', None, zip(params, argstuple))
        #print(funcenv('printit'))
        return eval_ptree(parsedbody, funcenv)
    def dispatch(message, *args):
        print("in dispatch args are", message, args)
        if message=='call':
            return call(args)
    #print("at define time", params, parsedbody, env('printit'))
    return dispatch
            

In [12]:
dafunc=func(['radius', 'area'],[], globenv)
dafunc('call', 5, 7)

in dispatch args are call (5, 7)
in call ['radius', 'area'] [] (5, 7)
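
The closure mechanism that `func` relies on can be seen in isolation in the sketch below. The names `make_counter` and `counter` are hypothetical; the point is only that the returned `dispatch` function keeps its captured state alive, and that Python records the captured names on the code object and the captured values on the function object.

```python
def make_counter(start=0):
    count = start                      # state captured by the closure
    def dispatch(message):
        nonlocal count
        if message == "increment":
            count += 1
            return count
        elif message == "peek":
            return count
        raise ValueError("unknown message: {}".format(message))
    return dispatch

counter = make_counter(10)
counter("increment")
counter("increment")
print(counter("peek"))                        # 12
print(counter.__code__.co_freevars)           # ('count',)
print(counter.__closure__[0].cell_contents)   # 12
```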


### Driver to run the code

We provide some driver functions:

In [448]:
def parse_program(program):
    "parse program line by line"
    output=[]
    program = [e.strip() for e in program.split('\n')]
    for l in program:
        output.append(parse(l))
    return output

In [449]:
def run_program(program, env):
    """
    run program line by line, accumulating python
    output in an array
    """
    output=[]
    program = [e.strip() for e in program.split('\n')]
    for l in program:
        parsed = parse(l)
        print(">", parsed)
        output.append(eval_ptree(parsed, env))
    return output

The `backtolang` function below converts the Python output of stupidlang code snippets back into stupidlang, so that appropriate outputs can be printed. The game is simply to stringify numbers, and to convert bools and lists appropriately.

We also provide a `repl`, which allows us to run code line by line.

In [450]:
def backtolang(exp):
    boolmap={True:'#t', False:'#f'}
    if  isinstance(exp, list):
        return '(' + ' '.join(map(backtolang, exp)) + ')' 
    elif isinstance(exp, bool):
        return boolmap[exp]
    elif exp is None:
        return 'nil'
    else:
        return str(exp)
    
def repl(env, prompt='calc> '):
    while True:
        try:
            val = eval_ptree(parse(input(prompt)), env)
        except (KeyboardInterrupt, EOFError):
            break
        if val is not None: 
            print(backtolang(val))
            


### Try it out

In [452]:
program = """
(def rad 5)
rad
(def radiusfunc (func (radius) (* pi (* radius radius))))
(radiusfunc rad)
(def myvar 0)
(if (== myvar 1) (store rad 6) (store rad 7))
(radiusfunc rad)
(== 1 1)
"""

In [453]:
for s in parse_program(program):
    print(s)

[]
['def', 'rad', 5]
rad
['def', 'radiusfunc', ['func', ['radius'], ['*', 'pi', ['*', 'radius', 'radius']]]]
['radiusfunc', 'rad']
['def', 'myvar', 0]
['if', ['==', 'myvar', 1], ['store', 'rad', 6], ['store', 'rad', 7]]
['radiusfunc', 'rad']
['==', 1, 1]
[]


In [454]:
for result in run_program(program, globenv):
    print(backtolang(result))

> []
> ['def', 'rad', 5]
> rad
> ['def', 'radiusfunc', ['func', ['radius'], ['*', 'pi', ['*', 'radius', 'radius']]]]
> ['radiusfunc', 'rad']
> ['def', 'myvar', 0]
> ['if', ['==', 'myvar', 1], ['store', 'rad', 6], ['store', 'rad', 7]]
> ['radiusfunc', 'rad']
> ['==', 1, 1]
> []
nil
nil
5
nil
78.53981633974483
nil
nil
153.93804002589985
#t
nil


In [392]:
repl(globenv)# to get out of the repl in the notebook just cause an exception like below

calc> )


NameError: ) <<>> not found in Environment

## How is Python implemented?

### The virtual machine

Python runs on a virtual machine.

A virtual machine is a software implementation of a real machine. As such, it will implement registers and stacks and other such constructs, along with an "assembly" language to program it.

We are interested here in a "process virtual machine".

Wikipedia:
>A process VM, sometimes called an application virtual machine, or Managed Runtime Environment (MRE), runs as a normal application inside a host OS and supports a single process. It is created when that process is started and destroyed when it exits. Its purpose is to provide a platform-independent programming environment that abstracts away details of the underlying hardware or operating system, and allows a program to execute in the same way on any platform.

Perhaps the best-known example of a process virtual machine is the JVM. But Python has one too, and indeed, Python is compiled to an assembly-like bytecode. This bytecode is then "interpreted" by the machine: you can think of it as the machine code for the Python Virtual Machine.

The compiler also emits some other fields, which are needed to run the interpreter. The bytecode and this additional information are stored in a code object.
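
As a small illustration of this pipeline, the built-in `compile` turns source text into a code object, and `exec` hands that object to the virtual machine. The source string and the `namespace` dictionary are made up for the example, and the exact bytecode produced depends on the Python version.

```python
import types

source = "total = 2 + 3"
code = compile(source, "<string>", "exec")   # text -> code object

print(isinstance(code, types.CodeType))  # True
print(type(code.co_code))                # <class 'bytes'>
print(code.co_names)                     # names referenced by the bytecode

namespace = {}
exec(code, namespace)                    # the VM interprets the bytecode
print(namespace["total"])                # 5
```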

### Code Objects

Code objects are created whenever a block of Python code is compiled. Here a block is defined as (from the manual):

>a piece of Python program text that is executed as a unit. The following are blocks: a module, a function body, and a class definition.

Also treated as blocks are:

- every line in a repl
- strings passed to python -c

The text is transformed into an AST, and then PyAST_Compile is called on it, to produce code objects.

Code objects are immutable.
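
Both points, one code object per block and immutability, can be checked with a short sketch. The module source and the function name `g` are made up for illustration. The assumption that a nested function's code object sits among the constants of the enclosing code object holds for CPython, which is what these notes describe.

```python
import types

module_source = """
def g(x):
    return x + 1
"""
module_code = compile(module_source, "<string>", "exec")

# The function body is its own block, so it gets its own code object,
# stored among the constants of the enclosing module's code object.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
print([c.co_name for c in nested])  # ['g']

# Attributes of a code object cannot be reassigned.
try:
    nested[0].co_name = "renamed"
    mutated = True
except (AttributeError, TypeError):
    mutated = False
print(mutated)  # False
```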

In [455]:
def f(x):
    a=1
    y = a+x
    return y

In [456]:
fcode = f.__code__
type(fcode)

code

In [457]:
def print_code(c):
    for x in dir(c):
        if x.startswith('co'):
            print(x, '=', getattr(c, x))

In [458]:
print_code(fcode)

co_argcount = 1
co_cellvars = ()
co_code = b'd\x01\x00}\x01\x00|\x01\x00|\x00\x00\x17}\x02\x00|\x02\x00S'
co_consts = (None, 1)
co_filename = <ipython-input-455-587cb7262809>
co_firstlineno = 1
co_flags = 67
co_freevars = ()
co_kwonlyargcount = 0
co_lnotab = b'\x00\x01\x06\x01\n\x01'
co_name = f
co_names = ()
co_nlocals = 3
co_stacksize = 2
co_varnames = ('x', 'a', 'y')


In [459]:
list(fcode.co_code)

[100, 1, 0, 125, 1, 0, 124, 1, 0, 124, 0, 0, 23, 125, 2, 0, 124, 2, 0, 83]

In [460]:
import dis
dis.opname[100], dis.opname[125], dis.opname[124] #see dis docs for all

('LOAD_CONST', 'STORE_FAST', 'LOAD_FAST')

In [461]:
dis.dis(f)

  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               1 (a)

  3           6 LOAD_FAST                1 (a)
              9 LOAD_FAST                0 (x)
             12 BINARY_ADD
             13 STORE_FAST               2 (y)

  4          16 LOAD_FAST                2 (y)
             19 RETURN_VALUE


Columns:

1. line number
2. index into the bytecode string
3. instruction
4. argument to the instruction
5. what the argument means
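
The same columns are available programmatically through `dis.get_instructions`, which yields one `Instruction` per row. This is only a sketch: the opcode names and offsets it prints depend on the Python version, so they will generally differ from the Python 3.5 listing above.

```python
import dis

def f(x):
    a = 1
    y = a + x
    return y

instructions = list(dis.get_instructions(f))

# Each Instruction carries the information shown in one row of dis.dis output.
for ins in instructions:
    print(ins.offset, ins.opname, ins.arg, ins.argrepr)
```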

In [462]:
dis.show_code(f)

Name:              f
Filename:          <ipython-input-455-587cb7262809>
Argument count:    1
Kw-only arguments: 0
Number of locals:  3
Stack size:        2
Flags:             OPTIMIZED, NEWLOCALS, NOFREE
Constants:
   0: None
   1: 1
Variable names:
   0: x
   1: a
   2: y


In [463]:
#from https://bitbucket.org/yaniv_aknin/pynards/src/c4b61c7a1798766affb49bfba86e485012af6d16/common/blog.py?at=default&fileviewer=file-view-default
import dis
import types

def get_code_object(obj, compilation_mode="exec"):
    if isinstance(obj, types.CodeType):
        return obj
    elif isinstance(obj, types.FrameType):
        return obj.f_code
    elif isinstance(obj, types.FunctionType):
        return obj.__code__
    elif isinstance(obj, str):
        try:
            return compile(obj, "<string>", compilation_mode)
        except SyntaxError as error:
            raise ValueError("syntax error in passed string") from error
    else:
        raise TypeError("get_code_object() can not handle '%s' objects" %
                        (type(obj).__name__,))

def diss(obj, mode="exec", recurse=False):
    _visit(obj, dis.dis, mode, recurse)

def ssc(obj, mode="exec", recurse=False):
    _visit(obj, dis.show_code, mode, recurse)

def _visit(obj, visitor, mode="exec", recurse=False):
    obj = get_code_object(obj, mode)
    visitor(obj)
    if recurse:
        for constant in obj.co_consts:
            if type(constant) is type(obj):
                print()
                print('recursing into %r:' % (constant,))
                _visit(constant, visitor, mode, recurse)

#### Globals, Locals, and other

In [464]:
def fglb():
    global xxx
    global aaa
    y = 2
    xxx=3
    aaa += 1
diss(fglb)


  4           0 LOAD_CONST               1 (2)
              3 STORE_FAST               0 (y)

  5           6 LOAD_CONST               2 (3)
              9 STORE_GLOBAL             0 (xxx)

  6          12 LOAD_GLOBAL              1 (aaa)
             15 LOAD_CONST               3 (1)
             18 INPLACE_ADD
             19 STORE_GLOBAL             1 (aaa)
             22 LOAD_CONST               0 (None)
             25 RETURN_VALUE


`STORE_GLOBAL` performs a binding or re-binding in the global namespace while `LOAD_GLOBAL` is generated when the compiler realizes that the variable is referenced in the function's body but never bound there. 

Here `aaa` may not be defined at module level, which would lead to a **runtime** error when the function is called, but this is perfectly legal code from the perspective of the function.
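
A minimal sketch of that situation follows. The names `bump` and `undefined_counter` are hypothetical, and `undefined_counter` is deliberately never assigned. Defining the function succeeds, and the error only appears when the function runs.

```python
def bump():
    global undefined_counter      # declared global, but never bound anywhere
    undefined_counter += 1

# Defining (and compiling) bump succeeded; the failure only appears on call.
try:
    bump()
    failed_at_call = False
except NameError as err:
    failed_at_call = True
    print("NameError at call time:", err)

print(failed_at_call)  # True
```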

The `*_FAST` opcodes are used when the compiler can infer that the variables are defined in the local namespace. These are optimized versions of the `*_NAME` opcodes shown below.

In [465]:
diss('cc = dd -1')

  1           0 LOAD_NAME                0 (dd)
              3 LOAD_CONST               0 (1)
              6 BINARY_SUBTRACT
              7 STORE_NAME               1 (cc)
             10 LOAD_CONST               1 (None)
             13 RETURN_VALUE
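
To compare the three opcode families side by side, one can collect the opcode names a piece of code uses. In the sketch below, `opnames`, `uses_local_and_global`, and `some_global` are hypothetical names. The exact opcode names vary between Python versions, so the checks look for the family (`FAST`, `GLOBAL`, `NAME`) rather than for a specific listing.

```python
import dis

def opnames(obj):
    """Return the set of opcode names used by a function or code object."""
    return {ins.opname for ins in dis.get_instructions(obj)}

def uses_local_and_global():
    local_value = 1                      # bound in the function: local access
    return local_value + some_global     # never bound here: global lookup

module_level = compile("cc = dd - 1", "<string>", "exec")

function_ops = opnames(uses_local_and_global)
module_ops = opnames(module_level)

print(sorted(function_ops))
print(sorted(module_ops))
print(any("FAST" in name for name in function_ops))    # True
print(any("GLOBAL" in name for name in function_ops))  # True
print("LOAD_NAME" in module_ops, "STORE_NAME" in module_ops)  # True True
```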


Notice that we have not said anything about frames yet. That's because, so far, we have only defined functions. Let's see what happens when we execute them.

#### Execution and Binding lookup

Frame creation occurs when a code object needs to be evaluated:

- when a function is called
- when a module is imported (top-level code is executed)
- when a class is defined
- every command in the repl
- when eval or exec are used
- when the -c switch is used
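
Two of these cases can be observed from within Python using `inspect.currentframe`. This is a sketch; the names `namespace`, `info`, and `called` are made up. Code run through `exec` gets a frame whose code object is named `<module>`, and a function call gets a frame that runs the function's own code object.

```python
import inspect

namespace = {"inspect": inspect}
exec(
    "frame = inspect.currentframe()\n"
    "info = (frame.f_code.co_name, frame.f_code.co_filename)\n"
    "del frame",
    namespace,
)
print(namespace["info"])  # ('<module>', '<string>')

def called():
    # A call gets its own frame, and that frame runs the function's code object.
    return inspect.currentframe().f_code is called.__code__

print(called())  # True
```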

```C
typedef struct _frame {
   PyObject_VAR_HEAD
   struct _frame *f_back;   /* previous frame, or NULL */
   PyCodeObject *f_code;    /* code segment */
   PyObject *f_builtins;    /* builtin symbol table */
   PyObject *f_globals;     /* global symbol table */
   PyObject *f_locals;      /* local symbol table */
   PyObject **f_valuestack; /* points after the last local */
   PyObject **f_stacktop;   /* current top of valuestack */
   PyObject *f_trace;       /* trace function */

   /* used for swapping generator exceptions */
   PyObject *f_exc_type, *f_exc_value, *f_exc_traceback;

   PyThreadState *f_tstate; /* call stack's thread state */
   int f_lasti;             /* last instruction if called */
   int f_lineno;            /* current line # (if tracing) */
   int f_iblock;            /* index in f_blockstack */

   /* for try and loop blocks */
   PyTryBlock f_blockstack[CO_MAXBLOCKS];

   /* dynamically: locals, free vars, cells and valuestack */
   PyObject *f_localsplus[1]; /* dynamic portion */
} PyFrameObject;
```

- `f_code` points to precisely one code object per frame. So a call stack of frames corresponds to a call stack of code objects.
- Python code is evaluated in three namespaces, corresponding to three symbol tables: `f_builtins`, `f_globals`, and `f_locals`. A name is first resolved in the local scope, then in the global scope, and then in the builtin scope. For nested scopes, as in closures, the local scopes of the enclosing functions are searched after the local scope and before the global and builtin scopes.
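
These fields are exposed on Python frame objects under the same names, so both points can be checked from a running function. The functions `outer` and `inner` and the `report` dictionary below are hypothetical, chosen only to exercise a local, an enclosing, and a builtin lookup in one place.

```python
import inspect

def outer():
    enclosed = "from outer"            # lives in outer's local scope
    def inner():
        local = "from inner"
        frame = inspect.currentframe()
        return {
            "code": frame.f_code.co_name,           # this frame's code object
            "caller": frame.f_back.f_code.co_name,  # previous frame on the call stack
            "has_local": "local" in frame.f_locals,
            "has_len": "len" in frame.f_builtins,
            "values": (local, enclosed, len("abc")),  # local, enclosing, builtin lookups
        }
    return inner()

report = outer()
print(report["code"], report["caller"])       # inner outer
print(report["has_local"], report["has_len"]) # True True
print(report["values"])                       # ('from inner', 'from outer', 3)
```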

![](http://aosabook.org/en/500L/interpreter-images/interpreter-callstack.png)

There are three stacks alive during the running of a python program. Since we run on a virtual machine, the call stack and stack frames are dependent on the virtual machine, rather than the real machine your code runs on. This is the critical difference between stupidlang and what we are doing now.

- The first is the call stack. This is the stack of environments you are familiar with. Often it's not explicitly represented as a stack, but as a recursive lookup of environments or, as in the C case, as offsets into memory.
- The second is the data stack, or value stack. There is one of these per environment frame, and it is used to run code in the context of that environment. This is where data-manipulating opcodes like `BINARY_ADD` run, in conjunction with namespace-related opcodes such as `STORE_FAST` and `LOAD_FAST`, seen above.
- The third is the block stack, which handles compound statements: statements that contain other statements.
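
To make the role of the value stack concrete, here is a toy interpreter for a handful of instructions. It is a deliberately simplified sketch and not how CPython is implemented: `run` is a hypothetical function, instructions are plain `(opname, argument)` tuples, and a dictionary stands in for the frame's local namespace. The program is written by hand to mirror the disassembly of `f` shown earlier, with `x` set to 10.

```python
def run(instructions):
    """Execute a list of (opname, argument) pairs on a toy value stack."""
    stack = []        # the value (data) stack for this single "frame"
    local_vars = {}   # stands in for the frame's local namespace
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "STORE_FAST":
            local_vars[arg] = stack.pop()
        elif opname == "LOAD_FAST":
            stack.append(local_vars[arg])
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError("unknown instruction: {}".format(opname))

# Hand-written program mirroring the disassembly of f(x), with x = 10.
program = [
    ("LOAD_CONST", 10), ("STORE_FAST", "x"),
    ("LOAD_CONST", 1), ("STORE_FAST", "a"),
    ("LOAD_FAST", "a"), ("LOAD_FAST", "x"), ("BINARY_ADD", None),
    ("STORE_FAST", "y"),
    ("LOAD_FAST", "y"), ("RETURN_VALUE", None),
]
print(run(program))  # 11
```

The stack never holds more than two values at a time here, which matches the `co_stacksize = 2` reported for `f` above.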

![](https://niltowrite.files.wordpress.com/2010/05/states4.png)