## **Concepts of programming**
---
If you want to be a better programmer, it is important to understand certain concepts of programming languages. I will focus on Python, but to make a point some Haskell may pop up as a comparison. I am going to keep this pretty terse, as it might be a tad boring.
Many a programmer has a less than solid idea of what a variable is. They confuse the concepts of value, binding, and variable. One of those misapprehensions is that `identifier` in the following code is a variable.


In [1]:
identifier = [n for n in range(1,11)]
identifier

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

#### **Formal definitions**

`identifier` is an identifier: it binds a human-readable name (hopefully a bit more meaningful than this one) to the variable.

The list on the right side of the assignment is also not the variable; it is the value. But what then is the variable? A variable in Python is a container for the value. The container is a location in memory where the value is stored, and it can be inspected at will by using the binding. In languages like Python the value in the container can be changed via assignment.

This leads us to the following formal definitions:

 * Variable; a variable is a container; the container is a memory cell (or multiple cells) with an address. The container contains a value.
 * Value; a value is an entity that can be manipulated by a program. Values can be evaluated, stored, passed as arguments, and returned as function results. Python supports primitive values like int, float, and bool, and it supports composite values like lists and objects.
 * Binding; a binding is made up of an identifier that is bound to a bindable entity. In Python these bindables are values, attributes, methods, objects, packages etc.
 * Identifier; a human-readable name that is used as part of a binding. You refer to a bindable via the identifier.


In [13]:
def plus2(n:int)->int:
    return n + 2

x = plus2 # x gets bound to the function plus2 

print(f'The value of x(4) = {x(4)}')

The value of x(4) = 6


Let me prove that my definition of a variable is correct.

In [8]:
one = [n for n in range(1,11)]
one

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [9]:
two = one 
two

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [10]:
two.append(11)
two

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]


What will be the value of `one`? If I am incorrect, then the variable is actually `one = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]` and nothing should have happened; after all, I didn't change anything in `one`. However, if I am correct and a variable is a container (memory cell) that holds a value and that can be accessed by an identifier, then `two` identifies the same memory cell and has access to the value stored therein, and by changing that value I also changed it for `one`.

In [14]:
one

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

#### **More on variables**
 * Mathematical variables; there are more kinds of variables in programming languages. Pure functional languages such as Haskell do not have variables as containers; they have mathematical variables, meaning a fixed but unknown value, e.g. x in $x^2+2x$.
 * We can also distinguish between primitive variables, which are contained in one memory cell, and compound variables, which are contained in multiple memory cells.

As a side note, when I talk about a memory cell I mean a machine word in the computer's memory, which is where Python objects are stored. On a modern 64-bit architecture a word is 64 bits big (hence the "64" in names such as Intel 64, which you sometimes see).

Python built-in types such as float, int, and complex are all primitive values. An object is an example of a compound variable and exhibits the same strange behaviour as the list from above.


In [17]:
import dataclasses # both dataclasses and datetime are identifiers, each bound to a module
import datetime

@dataclasses.dataclass
class Human:
    firstname: str
    surname: str
    date_of_birth: datetime.datetime
    sex: str
    

In [18]:
laurens = Human(firstname='Laurens', surname='Sandt', date_of_birth=datetime.datetime(1971,7,19), sex='male')

In [19]:
laurens

Human(firstname='Laurens', surname='Sandt', date_of_birth=datetime.datetime(1971, 7, 19, 0, 0), sex='male')

In [20]:
alias = laurens

In [21]:
alias.sex='female'
laurens.sex

'female'

#### **Aliasing**
Fairly sure I am a male :-). The above problem is due to something called aliasing. Aliasing occurs when a variable can be accessed using two or more different names.

Alias: an alias is multiple identifiers (names) bound to the same variable.

As a variable is no more than a container for a value, we can access that value via multiple names and assign the container a new value, changing the value for all identifiers bound to that variable. Aliasing is a large source of bugs, ones that are difficult to debug (especially in large systems) and exceedingly difficult to find via testing. This is a recurring phenomenon in several notebooks.

Python has two options to determine if there is an alias:
 1. The `is`-operator; the `is`-operator compares object identities, which in CPython are memory addresses.
 2. The `id()` function returns the identity of an object, which in CPython is its memory address.


In [33]:
laurens is alias

True

In [23]:
(id(laurens), id(alias))

(2954562266832, 2954562266832)

In [24]:
laurens == alias

True

Despite the above result, you can't use the `==` operator to see if something is an alias; it compares values, not variables.

In [34]:
f = [n for n in range(1,5)]
g = [n for n in range(1,5)]
f == g

True

In [35]:
id(f),id(g)

(2954567800512, 2954567802944)

As you can see, the memory addresses are different; they are not aliases, something confirmed by the `is`-operator.

In [36]:
f is g 

False
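
The difference between `==` and `is` also holds for your own classes. A minimal sketch, assuming a hypothetical `Point` dataclass that is not part of this notebook; by default `@dataclass` generates an `__eq__` that compares the fields, while `is` still compares identity.

```python
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical example class, used only for illustration
    x: int
    y: int


p = Point(1, 2)
q = Point(1, 2)   # a second object with equal field values
r = p             # a second name for the same object

print(p == q)  # compares field values
print(p is q)  # compares identity
print(p is r)  # r is an alias of p
```

Only the last comparison tells you something about aliasing: `q` is equal to `p` but lives in its own variable.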

#### **`copy()` & `deepcopy()`**
You can solve some of the problems with aliasing by using `copy()` instead of direct assignment.

In [37]:
one = [n for n in range(1,11)]
two = one.copy()
one

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [38]:
two

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [39]:
one==two

True

In [40]:
two.pop(0)
two

[2, 3, 4, 5, 6, 7, 8, 9, 10]

In [41]:
one

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

By using `copy()` I copy the contents of the list to a new variable with identifier `two`. If I use `two` to access that variable, and use `pop(0)` to change its value, `one` does not change.
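
`copy()` is not the only way to get such a copy. The sketch below shows a few equivalent spellings that I am aware of for lists; the names `original` and `copies` are just for illustration.

```python
import copy

original = [1, 2, 3]

# Four common ways to make a shallow copy of a list
copies = [
    original.copy(),
    list(original),
    original[:],
    copy.copy(original),
]

for c in copies:
    # Each copy has equal contents but is a distinct object
    print(c == original, c is original)
```

All four copy only the top level of the list, so they share the limitation of `copy()`.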

#### **Storables**
Unfortunately, this doesn't solve all problems. To understand why, I'll have to introduce the concepts of pointer and storable:
 * Pointer: A pointer is a value that is either null (`None` in Python) or refers to a variable. It points to that variable's memory location.
 * Storable: A value is storable if it can be stored in a single memory cell.

Python knows only two types of storables: primitives and, you guessed it, pointers. The `copy()` method copies the elements of, for instance, a list. If an element is a primitive it copies that; if it is a pointer, it copies that too.


In [63]:
three = [1,2,[3,4],5]
four= three.copy()

In [64]:
three

[1, 2, [3, 4], 5]

In [65]:
four

[1, 2, [3, 4], 5]

`three` and `four` are not aliases; I can prove that.

In [66]:
three is four

False

In [67]:
four[2][1] = 666
four

[1, 2, [3, 666], 5]

In [68]:
three

[1, 2, [3, 666], 5]

Houston, we have a problem. What just happened? `three` and `four` are not aliases as we have proven!

Correct; unfortunately, however, the third elements of the two lists are aliases. `copy` copies the elements, and the element in this case is a pointer, a pointer to a value in a memory location, creating a de facto alias. You can prevent this problem by using `deepcopy`, which doesn't copy just the pointer but keeps following the pointer to a memory location with a value and copies that.

`deepcopy` is part of the `copy` module and is not a method of list; it therefore needs to be imported.


In [48]:
from copy import deepcopy

five = [1,2,[3,4],5]
six = deepcopy(five)

In [49]:
five

[1, 2, [3, 4], 5]

In [50]:
six

[1, 2, [3, 4], 5]

In [51]:
six[2][1] = 666
six

[1, 2, [3, 666], 5]

In [52]:
five

[1, 2, [3, 4], 5]

You could implement a `copy()` method in your own class and prevent any mishaps.

In [53]:
from dataclasses import dataclass, field
from copy import deepcopy

@dataclass
class A:
    l:list=field(default_factory=list)
    
    def copy(self):
        return deepcopy(self)

In [54]:
a = A(l=[1,2,[3,4],5])
a

A(l=[1, 2, [3, 4], 5])

In [55]:
b = a
deep = a.copy()

In [56]:
a

A(l=[1, 2, [3, 4], 5])

In [57]:
deep

A(l=[1, 2, [3, 4], 5])

In [58]:
b.l[2][1]=666
b.l

[1, 2, [3, 666], 5]

In [59]:
a.l

[1, 2, [3, 666], 5]

In [60]:
deep.l

[1, 2, [3, 4], 5]


By implementing `copy()` within our class we create an easy and safe way to create objects by copying while avoiding aliases. A question that might pop up in your mind is how to know whether all your values are primitives, especially if you have a million values or so. You can use the `hash` function: the built-in mutable containers (list, dict, set) are unhashable, so hashing a tuple that contains one of them fails.

* hash function: A hash function is any function that can be used to map data of arbitrary size to fixed-size values.


In [61]:
hashable = (1,2,3,4,5)
hash(hashable)

-5659871693760987716

In [62]:
unhashable = (1,2,[3,4],5)
hash(unhashable)

TypeError: unhashable type: 'list'

As you can see, we get a TypeError: you cannot hash a list.
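
If you want to run this check on many values without crashing, you can wrap it. A minimal sketch, assuming a hypothetical helper `is_hashable` (my own name, not a built-in). Note that it is only a heuristic: instances of your own classes are hashable by default even when they are mutable.

```python
def is_hashable(value) -> bool:
    """Return True if hash(value) succeeds (hypothetical helper)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable((1, 2, 3, 4, 5)))      # tuple of ints
print(is_hashable((1, 2, [3, 4], 5)))    # tuple containing a list
print(is_hashable(frozenset({1, 2})))    # immutable set
print(is_hashable({1, 2}))               # mutable set
```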

#### **Type**
Type: a type is a set of values equipped with one or more operations that can be applied uniformly to all these values. If two objects contain the same set of values and have the same operations, then they are of the same type. 

Programming languages in general have different types: 
 * primitive types
 * composite types
 * recursive types.
 
Python also has the function `type`, which returns the class an object belongs to. It can also be used as a class factory.

In [69]:
type(5)

int

In [70]:
type(laurens)

__main__.Human
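
The class factory use of `type` is not shown above, so here is a small sketch. With three arguments, `type(name, bases, namespace)` builds a new class at runtime. The class name `PlayingCard` and the function `describe` are hypothetical and exist only for this illustration.

```python
def describe(self) -> str:
    return f"{self.rank} of {self.suit}"


# type(name, bases, namespace) builds a new class at runtime
PlayingCard = type(
    "PlayingCard",            # hypothetical class name
    (object,),                # base classes
    {"rank": 7, "suit": "clubs", "describe": describe},  # class attributes
)

card = PlayingCard()
print(type(card).__name__)
print(card.describe())
```

This is roughly what a `class` statement does behind the scenes; in day-to-day code the `class` statement is the more readable choice.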

#### **Primitive types**
Built-in atomic types, in Python:
 * int
 * float
 * complex

See https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str.

Languages other than Python (for instance Java and Haskell) often have more primitive types:
 * Double
 * Char
 * Long

In [76]:
type('a')

str

In Python there is no char type. A char in Python is a string of length one, and a string is a composite type. A Python string is an immutable object.

In [77]:
x = 'George'
id(x)

2954539346544

In [78]:
hash(x)

-526477365775056394

In [79]:
y = ' is a rhino!'
x = x + y
x

'George is a rhino!'

In [80]:
id(x)

2954567642176
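
The different `id` shows that `x` is now bound to a new variable. Immutability can also be seen directly: a sketch, with the names `name` and `renamed` chosen only for illustration.

```python
name = 'George'

try:
    name[0] = 'J'          # strings do not support item assignment
except TypeError as err:
    print(f'TypeError: {err}')

# Building a changed string always yields a new object
renamed = 'J' + name[1:]
print(renamed, name)
```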

#### **Garbage collection**
The first x (`x = 'George'`) has now become an orphan value; there is no way in Python to access the value that is stored in the original variable. So, what happens now to that variable? After all, it takes up expensive real estate. It will be cleaned up by the built-in garbage collector. The collector frees the memory of unreachable objects so that it can be reused. Garbage collection is a complex issue, which I will endeavour to shed some light on at the end of this notebook.


#### **Composite types**
Most Python types are composite types. We can further subdivide composite types into:
 1. Cartesian product $S \times T = \{(x,y) \mid x \in S, y \in T\}$. A tuple is an example of a Cartesian product; so is a database record.
 2. Mappings $S \rightarrow T = \{f \mid x \in S \Rightarrow f(x) \in T\}$. The obvious example of a mapping in Python is a dictionary, where you map keys to values, but of course a function is a mapping too, from domain to codomain.
 3. Disjoint union $S + T = \{left \space x \mid x \in S\} \cup \{right \space y \mid y \in T\}$. We can understand an object as a disjoint union; simplistically, we could understand it as a tuple of components. However, in OO, objects need to have identity, so they need a unique tag to identify their class. An object is a tagged tuple, or in Python lingo a named tuple.


In [81]:
from collections import namedtuple
# the collections module provides namedtuple, which lets you create simple classes without writing methods

Card = namedtuple('Card',['rank','suit'])
seven_clubs = Card(7, 'clubs')
type(seven_clubs)

__main__.Card

In [82]:
seven_clubs

Card(rank=7, suit='clubs')
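
The set definitions of the Cartesian product and the mapping can also be made concrete. A small sketch, assuming two toy sets `S` and `T` that I made up for the occasion:

```python
from itertools import product

S = {1, 2}
T = {'a', 'b'}

# Cartesian product S x T: every pair (x, y) with x in S and y in T
pairs = set(product(S, T))
print(sorted(pairs))

# A mapping S -> T: each element of S is sent to some element of T
mapping = {1: 'a', 2: 'b'}
print(all(x in S and y in T for x, y in mapping.items()))
```

The product has `len(S) * len(T)` elements, which is where the name comes from.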

#### **Recursive Types**
A recursive type is one that is defined in terms of itself. Python has no built-in recursive types; languages that use recursion as their main form of repetition, such as Haskell, do. The Haskell definition of a list is a recursive type:

`data List a = Empty | Cons a (List a)`  

A list is either empty, or it is an element `a` consed onto a list of `a`: a definition in terms of itself. I can do something similar in Python.

In [87]:
from dataclasses import dataclass

class IntList:
    first=None
    
    def __init__(self, first):
        self.first=first

        
                
class IntNode:
    elem:int
    succ=None
    
    def __init__(self, elem:int, succ):
        self.elem = elem
        self.succ = succ

primes = IntList(IntNode(2, IntNode(3, IntNode(5, IntNode(7, None)))))  
primes

<__main__.IntList at 0x2afe9a0fad0>
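
A recursive type invites recursive functions. The sketch below redefines a minimal `IntNode` so that it runs on its own; the functions `total` and `to_list` are hypothetical helpers of mine. `None` plays the role of Haskell's `Empty`.

```python
from typing import Optional


class IntNode:
    def __init__(self, elem: int, succ: Optional["IntNode"]):
        self.elem = elem
        self.succ = succ


def total(node: Optional[IntNode]) -> int:
    """Sum a linked list recursively; None plays the role of Empty."""
    if node is None:
        return 0
    return node.elem + total(node.succ)


def to_list(node: Optional[IntNode]) -> list:
    """Walk the same structure iteratively, avoiding deep recursion."""
    items = []
    while node is not None:
        items.append(node.elem)
        node = node.succ
    return items


primes = IntNode(2, IntNode(3, IntNode(5, IntNode(7, None))))
print(total(primes))
print(to_list(primes))
```

The recursive version mirrors the type definition; the loop is the safer choice for long lists in Python.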

However, recursive types are not a natural fit for Python.

Python in general has a problem with recursion. Even with recursive functions, Python has a limited recursion depth. 

In [88]:
import sys
print(sys.getrecursionlimit())

3000
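
What happens when you exceed that limit? A minimal sketch, with two hypothetical functions `count_down` and `count_down_loop` that do the same trivial job, once recursively and once with a loop:

```python
import sys


def count_down(n: int) -> int:
    """Recursive version: one stack frame per call."""
    return 0 if n == 0 else count_down(n - 1)


def count_down_loop(n: int) -> int:
    """Iterative version: constant stack depth."""
    while n > 0:
        n -= 1
    return n


too_deep = sys.getrecursionlimit() + 100

try:
    count_down(too_deep)
except RecursionError as err:
    print(f'RecursionError: {err}')

print(count_down_loop(too_deep))
```

You can raise the limit with `sys.setrecursionlimit`, but rewriting the recursion as a loop is usually the more robust fix.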


#### **Interpreter/compiler**
Python is an interpreted language. Python executes instructions written in the programming language without first compiling them to machine code. The standard interpreter does this in two steps:
 1. Translate the source code into bytecode.
 2. Immediately execute that bytecode on a virtual machine.
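
The two steps can be observed from within Python itself. In CPython the translated form is called bytecode, and the standard library module `dis` can list it. This is a sketch only; the exact instructions differ between Python versions.

```python
import dis


def plus2(n: int) -> int:
    return n + 2


# Step 1: the function body has already been compiled to a code object
code = plus2.__code__
print(type(code).__name__)

# Human-readable listing of the bytecode (exact opcodes vary by Python version)
dis.dis(plus2)

# Step 2: source text can also be compiled and executed explicitly
compiled = compile('result = 40 + 2', '<example>', 'exec')
namespace = {}
exec(compiled, namespace)
print(namespace['result'])
```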

I will not go too deep into this, but there are several Python implementations. I know of CPython, Jython (formerly JPython), and PyPy, and there are probably more. Anaconda, which is often mentioned alongside them, is a distribution that ships CPython rather than a separate interpreter. The standard Python interpreter is CPython, which uses the above two steps. PyPy, on the other hand, adds a just-in-time compiler that translates frequently executed code to machine code at run time. PyPy is often faster than CPython, but that does not really matter: if you do not have a specific use case, stick to CPython.

CPython is written in C, while PyPy is written in RPython, a restricted subset of Python. CPython does not optimize tail calls, so every recursive call adds a new stack frame. Never mind exactly what that means; the consequence is that Python has a limited recursion depth.


#### **Call by sharing** 
Python uses the call-by-sharing mechanism for its arguments. What does that mean, and why do I need to know? It means that every function receives a copy of the references to the arguments it is called with. Thus, the parameters are in fact aliases and come with alias problems.


In [124]:
def f(a,b):
    a += b # augmented assignment: may change the outside object that a refers to
    return a

In [125]:
x = 1 
y = 2
f(x,y)

3

In [126]:
(x,y) 

(1, 2)

This all goes as expected; now we try with composite types.

In [127]:
x = [1,2]
y = [3,4]
f(x,y)
(x,y)

([1, 2, 3, 4], [3, 4])

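I have inadvertently changed `x`. The cause: again, Python knows two storables, primitives and pointers. For an int, `a += b` creates a new value and rebinds the local name `a`; for a list, `a += b` extends the existing list in place.

In the above code `a` is a reference (pointer) to the same memory location as `x`, so when I return, I have changed `x`: it is the same memory location.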
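
There are two simple ways around this. A sketch with two hypothetical functions, `extend_in_place` and `extend_rebinding`: either build a new object inside the function, or hand the function a copy.

```python
def extend_in_place(a, b):
    a += b          # for a list this mutates the caller's object
    return a


def extend_rebinding(a, b):
    a = a + b       # builds a new object; the caller's object is untouched
    return a


x = [1, 2]
y = [3, 4]

print(extend_rebinding(x, y), x)
print(extend_in_place(x.copy(), y), x)   # pass a copy to protect x
print(extend_in_place(x, y), x)          # x itself is changed
```

Only the last call changes `x`; in the first two, `x` is still `[1, 2]` afterwards.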


#### **Garbage collection**
Just a quick word on garbage collection. Python objects don't get explicitly destroyed; when you delete a name, only the reference to the object is removed. Orphan values need to be collected by the garbage collector.

In [128]:
a = [1,2]
b = a
a,b

([1, 2], [1, 2])

We delete `a` explicitly and then evaluate `b`.

In [129]:
del a 
b

[1, 2]

No problem, but now we rebind `b`.

In [130]:
b = 'george is rhino'
b

'george is rhino'


At this moment in time there is still a memory cell that contains the value `[1,2]`. However, there are no longer any bindings; no name links to that location in memory any more. It is an orphaned variable, unreachable.

In [131]:
import weakref

s1 = {1,2,3}
s2 = s1 
def bye():
    print('Give me a snacky!')   

In [132]:
ender = weakref.finalize(s1, bye) # call bye() once no binding to the object {1,2,3} remains
ender.alive

True

In [133]:
del s1 
ender.alive

True

In [134]:
s2 = 'Croc is peckish!' # At this moment the previous binding is relinquished

Give me a snacky!


In [135]:
ender.alive

False
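
Things get more complex when objects refer to each other: then removing all names is not enough, because each object still holds a reference to the other. A sketch of how I understand CPython to behave, using a hypothetical `Node` class and the standard `gc` module to trigger a collection by hand:

```python
import gc
import weakref


class Node:  # hypothetical class used only for this illustration
    def __init__(self):
        self.other = None


a = Node()
b = Node()
a.other, b.other = b, a      # a and b now reference each other

probe = weakref.ref(a)       # lets us observe a without keeping it alive
del a, b                     # no names left, but the cycle remains

gc.collect()                 # ask the cycle collector to run now
print(probe() is None)
```

Normally you do not call `gc.collect()` yourself; the collector runs periodically on its own.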

#### **Python quirks!**
Python has many strange things; I just thought I would point this one out. Python creates aliases where you don't expect them, for instance if you create a tuple from a tuple.


In [136]:
t1 = (1,2,3)
t2 = tuple(t1)
t1 is t2

True

You don't have this behaviour if you use only part of the tuple.

In [137]:
t3 = tuple(n for n in range(1,11))
t4 = tuple(n for n in t3 if n % 2 == 1)
t3 is t4

False

It gets more curious: consider a small string and a larger string.

In [122]:
x = 'ABC'
y = 'ABC'
x is y

True

In [123]:
x = 'George is a rhino!'
y = 'George is a rhino!'
x is y

False

The small string is an alias, the larger isn't...

This is due to a Python optimization technique called interning. If you meet this behaviour, hopefully you will be less surprised than I was. Enough about concepts in programming; most oddities in Python should be clear now.
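
You can also ask for interning explicitly with `sys.intern`. A small sketch; the strings are built at runtime with `join` so that the compiler cannot quietly share one constant between them.

```python
import sys

# Build the strings at runtime so the compiler cannot share a constant
parts = ['George', ' is a rhino!']
x = ''.join(parts)
y = ''.join(parts)

print(x == y)                          # equal values
print(sys.intern(x) is sys.intern(y))  # interned copies are one object
```

Whether `x is y` holds without `sys.intern` is an implementation detail, so never rely on `is` to compare strings.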

---
## **The End**