## Structured programming (control flow) 

---
With structured programming, a developer uses no more than three basic control structures to structure the code. Large blocks of code are avoided by the use of these abstractions in conjunction with a hierarchy of procedures. This might sound a bit opaque, but it is the way we all program. No matter whether we programme in Python, C, or Haskell, we use only three basic control structures:

1. Sequencing: statements are executed one after the other in a specific order. In Python, statements are separated by a new line or a semicolon, and indentation groups them into blocks.
2. Selection (choice): if B, then S1; otherwise, S2. Here B is a boolean expression and S1 and S2 are statements (single executable instructions).
3. Repetition, either through iteration, where the condition is checked in advance (while B do S) or afterwards (do S until B), or through recursion: repeated calling of a procedure until some base case is met.
 
One of the great results of programming language theory, the Böhm-Jacopini theorem, proves that we need no more than these three to compute any computable function. For anyone creating programmes larger than a few lines of code, it is imperative to have a good understanding of the control structures. In this notebook, I will expound on these three structures and on other tools Python gives you to increase the structure of your programming, for instance closures, context managers, and the `else` keyword. Some of these subjects I want to investigate in depth; some I pass over quite quickly as they are not well suited to Python, e.g., recursion.

In [1]:
def a():
    x = 3
    return x


def b():
    x = 4
    return x

#### Sequencing & Scope
I am sure you know most about sequencing, so I will focus this discussion on scope. Let me start off with the definition of scope: 
- The scope of a declaration is the part of a programme where the bindings produced by the declaration are available.

Just think of bindings as variables; they are not really the same, but that is a topic for another notebook, one on programming concepts.

In [2]:
a()

3

In [3]:
b()

4

We have two variables named `x`; the scope of each is the function, `a` or `b`, that encloses it. The scope of a name binding (an association of a name with an entity, such as a variable) is the part of a programme where the name binding is valid; that is, where the name can be used to refer to the entity. For example, I can never get 4 if I call `a()`.

In [4]:
x = 7


def a(arg: int):
    x = 3
    return x


def b():
    x = 4
    return x

In [5]:
a(x)

3

#### Inner scope has preference
The `x` declared outside `a()` and `b()` is never used inside them; the inner scope takes precedence over the outer scope. Only outside the scope of `a()` and `b()` does the name `x` refer to the outer variable.

In [6]:
print(x)

7


#### **Dynamic vs static scoping**
There are two forms of scoping: dynamic and static. In a dynamically scoped language, the assignment `s = 3` inside `q` in the code below would be visible to `f` when it is called from `q`, and the outcome of `q(4)` would be 12.

Python is a static-scoped language. Static scoping is more efficient than dynamic scoping and allows for [information hiding](https://en.wikipedia.org/wiki/Information_hiding).

In [7]:
s = 2


def f(x: int) -> int:
    return s * x


def p(y: int) -> None:
    print(f(y))


def q(z: int) -> None:
    s = 3
    print(f(z))


p(4)
q(4)

8
8
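The lookup rule can be made visible with a small sketch. The names `scale`, `times`, `shadow`, and `rebind` are hypothetical stand-ins for `s`, `f`, and `q`; the only language feature assumed is the `global` statement, which is Python's explicit way of rebinding a module-level name from inside a function.

```python
scale = 2  # module-level (global) binding


def times(x: int) -> int:
    # `scale` is looked up where `times` was defined, not where it is called
    return scale * x


def shadow(x: int) -> int:
    scale = 3  # a new local binding; the global `scale` is untouched
    return times(x)


def rebind(x: int) -> int:
    global scale  # explicitly opt in to rebinding the global name
    scale = 3
    return times(x)


print(shadow(4))  # 8: the local `scale` is invisible to `times`
print(rebind(4))  # 12: the global itself was changed
```

Even in `rebind`, the scoping is still static: `times` reads the global `scale`, and the result changes only because that global was reassigned.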


#### Scoping and namespace
Scoping and namespaces are related subjects. A namespace is a mapping of names to objects; in Python, these namespaces take the form of a dictionary. 
There are multiple namespaces in Python:

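- the built-in namespace, which contains all the built-in functions and exceptions of Python.

- the global namespace of a module, containing all functions, classes, and variables defined at the top level of that module.

- the local namespace of a function call, containing the names bound inside that function.

- the object namespace containing all methods and attributes of the object.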

The important thing to know about namespaces is that there is absolutely no relationship between names in different namespaces. We can thus comfortably reuse the same name, as long as each use belongs to a different namespace, for example `A.name` and `B.name`. This is also the reason you really have no need for awfully long names; namespaces will rarely get big enough to have conflicting names.

The namespace of an object is the scope of all that is named in the class.
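That namespaces behave like dictionaries can be checked directly. The sketch below uses a hypothetical class `Point` and variable `point`; `vars`, `globals`, and the `builtins` module are standard Python.

```python
import builtins


class Point:
    dimension = 2  # lives in the class namespace

    def __init__(self, x: int, y: int) -> None:
        self.x = x  # lives in the instance namespace
        self.y = y


point = Point(1, 2)

print(vars(point))                 # {'x': 1, 'y': 2}
print("dimension" in vars(Point))  # True: class namespace
print("point" in globals())        # True: module (global) namespace
print("len" in vars(builtins))     # True: built-in namespace
```

The attribute `point.dimension` still resolves, because attribute lookup falls back from the instance namespace to the class namespace.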

In [8]:
class A:
    s = 2

    def f(self, x: int) -> int:
        return self.s * x

    def p(self, y: int) -> None:
        print(self.f(y))

    def q(self, z: int) -> None:
        self.s = 3
        print(f"{type(self).__name__}`s s variable is {self.s}")

We can call the built-in function `dir` to get the names in the scope of an object, or, if we do not give an argument, the names in the global scope of this notebook.

Calling `dir` on an object will probably give you more information than you wanted, as every class inherits many methods from `builtins.object`.

In [9]:
dir()

['A',
 'In',
 'Out',
 '_',
 '_2',
 '_3',
 '_5',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__session__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_i7',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'a',
 'b',
 'exit',
 'f',
 'get_ipython',
 'p',
 'q',
 'quit',
 's',
 'x']

In [10]:
a = A()
a.p(4)
a.q(4)  # this is another form of sequencing

8
A`s s variable is 3


#### Closures
Closures are used quite often in Python. A closure lets a function keep access to variables from the scope in which it was defined. So, what is a closure? A general definition of a closure is a nested function that references one or more variables from its enclosing scope. This definition probably makes more sense with an example.

In [11]:
def say():
    greeting = "Hello"

    def display():
        print(greeting)

    return display()

In [12]:
say()

Hello


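Here `display` is the closure; it is nested within `say`, and it uses `greeting`, which is defined in `say`, the enclosing scope. Note that `say` calls `display` straight away; a closure becomes more useful when the inner function itself is returned.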

With the use of closures, it is possible to give a sub-procedure one or more private variables that remain in existence between procedure calls. You actually see closures quite often in Python, as you use a lot of decorators. A decorator is a higher-order function: it takes the decorated function as its argument and usually returns a wrapper function. That wrapper is a closure, because it keeps access to the decorated function and to any other variables of the decorator.
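A minimal sketch of such a private variable, assuming a hypothetical `make_counter` function. Unlike `say`, it returns the inner function instead of calling it, so the captured variable outlives the call that created it.

```python
from typing import Callable


def make_counter() -> Callable[[], int]:
    count = 0  # private to the closure; not reachable by name from outside

    def increment() -> int:
        nonlocal count  # rebind the variable in the enclosing scope
        count += 1
        return count

    return increment  # return the function itself, not its result


counter = make_counter()
print(counter())  # 1
print(counter())  # 2
print(counter.__closure__[0].cell_contents)  # 2: the captured variable
```

Each call to `make_counter` creates a fresh `count`, so two counters never share state.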

Of course, now we want to know what a higher-order function is. In mathematics and computer science, a higher-order function is a function that does at least one of the following:
1. It takes one or more functions as arguments.
2. It returns a function as its result.

As we know from mathematics, function composition is an example of a higher-order function: $(f \circ g)(x) = f(g(x))$.
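Composition can be sketched in a few lines. The helper `compose` and the functions `double` and `increment` are hypothetical names chosen for this illustration; `compose` both takes functions as arguments and returns a function, so it meets both criteria above.

```python
from typing import Callable


def compose(f: Callable, g: Callable) -> Callable:
    """Return the function x -> f(g(x)), i.e. f after g."""
    return lambda x: f(g(x))


def double(x: int) -> int:
    return 2 * x


def increment(x: int) -> int:
    return x + 1


print(compose(double, increment)(3))  # double(increment(3)) = 8
print(compose(increment, double)(3))  # increment(double(3)) = 7
```

The two results differ, which shows that composition is not commutative.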

A curried function is a function that takes its arguments one at a time, returning a new function after each argument except the last. In mathematics and computer science, currying is the technique of translating the evaluation of a function that takes multiple arguments into evaluating a sequence of functions, each with a single argument: $f(x, y) = h \rightarrow f(x) = g,\ g(y) = h$. Don't worry, Python doesn't curry functions automatically, although `functools.partial` offers partial application. Python does do pattern matching, which you often meet alongside currying in functional languages and which will be discussed in this notebook.
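For the curious, here is a sketch of what currying looks like when written by hand in Python, next to partial application with the standard `functools.partial`. The functions `add`, `curried_add`, and `add_five` are hypothetical examples.

```python
from functools import partial
from typing import Callable


def add(x: int, y: int) -> int:
    return x + y


def curried_add(x: int) -> Callable[[int], int]:
    # hand-written curried form: one argument at a time
    def inner(y: int) -> int:
        return x + y

    return inner


add_five = partial(add, 5)  # partial application: fix x = 5

print(curried_add(2)(3))  # 5
print(add_five(10))       # 15
```

Note that `curried_add` is again a closure: `inner` remembers `x` from the enclosing call.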

Let's look quickly at a function decorator; consult the notebook on decorators and dataclasses for more information.

In [13]:
ok = 10
waisting_time = 40
are_you_crazy = 1000


def fibonacci(n: int) -> int:
    """this fibonacci method returns the n-th fibonacci number"""
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)


fibonacci(ok)

55

This code works, but is not very efficient, as you can see below.

In [14]:
%time fibonacci(waisting_time)

CPU times: user 38.6 s, sys: 0 ns, total: 38.6 s
Wall time: 38.7 s


102334155

#### **dynamic programming**
The inefficiency lies in the fact that we don't store intermediate results. After all, we have already computed `fibonacci(9)` (the 34 in 34 + 55 = 89), so why compute it again? It is much faster to look it up.

With the `@functools.cache` decorator (available since Python 3.9), we enrich this function to store intermediate results. Caching intermediate results is an example of dynamic programming, and in particular of a technique called memoization. This is an advanced programming technique which I won't discuss in this notebook, but in a separate notebook on efficiency.

In [15]:
import functools


@functools.cache
def fibonacci(n: int) -> int:
    """this fibonacci method returns the n-th fibonacci number"""
    return 1 if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)


%time fibonacci(are_you_crazy)

CPU times: user 1.37 ms, sys: 0 ns, total: 1.37 ms
Wall time: 1.38 ms


70330367711422815821835254877183549770181269836358732742604905087154537118196933579742249494562611733487750449241765991088186363265450223647106012053374121273867339111198139373125598767690091902245245323403501

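As you can see, with the decorator the call finished in about a millisecond, where the undecorated version needed almost 39 seconds for a much smaller argument. Note that this second definition returns 1 for both `n = 0` and `n = 1`, so its sequence is shifted by one position relative to the first definition: the number shown is what the first version would call `fibonacci(1001)`.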
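To connect this back to closures, here is a hand-rolled sketch of a memoizing decorator. The names `memoize` and `fib` are hypothetical, and this is not how `functools.cache` is implemented internally; it only illustrates the idea that the wrapper closes over a dictionary of earlier results.

```python
import functools
from typing import Callable


def memoize(func: Callable[[int], int]) -> Callable[[int], int]:
    cache: dict[int, int] = {}  # captured by the closure below

    @functools.wraps(func)  # keep the name and docstring of `func`
    def wrapper(n: int) -> int:
        if n not in cache:
            cache[n] = func(n)  # compute once, then reuse
        return cache[n]

    return wrapper


@memoize
def fib(n: int) -> int:
    """return the n-th Fibonacci number, with fib(0) == 0"""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(40))  # 102334155
```

Because the recursive calls inside `fib` go through the decorated name, every intermediate result is cached as well.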

You can even use multiple decorators, but you have to think about the order: is it $f \circ g \circ h$ or is it $f \circ h \circ g$?
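A small sketch makes the order tangible. The decorators `shout` and `add_world` are hypothetical; the decorator closest to the function is applied first, and the one on top is applied last.

```python
from typing import Callable


def shout(func: Callable[[], str]) -> Callable[[], str]:
    def wrapper() -> str:
        return func().upper()

    return wrapper


def add_world(func: Callable[[], str]) -> Callable[[], str]:
    def wrapper() -> str:
        return func() + " world"

    return wrapper


@shout      # applied last: shout(add_world(greet))
@add_world  # applied first
def greet() -> str:
    return "hello"


@add_world  # applied last: add_world(shout(greet_again))
@shout      # applied first
def greet_again() -> str:
    return "hello"


print(greet())        # HELLO WORLD
print(greet_again())  # HELLO world
```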

Function composition is an important subject that you should master, but I will not discuss it any further than this cell in this notebook. Say you have a large file of numbers (as a data scientist, you will) and you only want a selection of it. With function composition, you can do this on the fly. Let’s say you want all numbers that are multiples of both 3 and 5, bigger than 300 and smaller than 500, from a sequence of a thousand entries. The following single expression will do.

In [16]:
tuple(
    filter(
        lambda x: 300 < x < 500,
        filter(lambda x: x % 3 == 0 and x % 5 == 0, tuple(n for n in range(1, 1001))),
    )
)

(315, 330, 345, 360, 375, 390, 405, 420, 435, 450, 465, 480, 495)

#### **Selection**
![afbeelding.png](attachment:c92c9b0f-df94-4285-9c96-69169b16ee48.png)

The pattern of selection is illustrated by the above diagram. All selection patterns are based on this model. You can have a nested `if... then... else...`  and make as complicated a pattern as you want, but the base is always the same.

In Python, selection is achieved by one of three means:
1. Conditional statement
2. Conditional expression
3. Structural pattern matching

In [17]:
def choice(b: bool) -> str:
    return "statement A" if b else "statement B"


choice(True)

'statement A'

In [18]:
choice(False)

'statement B'

#### Conditional expression
I cheat a little bit here because what I use is a conditional expression. A conditional expression consists of a condition (`if b` in our case) and two or more subexpressions. The value of the condition (in our case, `True` or `False`) determines which of the subexpressions is evaluated; the others are skipped. An expression is a syntactic entity in a programming language that may be evaluated to determine its value.

Conditional expressions were proposed in 2003 in [PEP 308](https://peps.python.org/pep-0308/) and have been part of Python since version 2.5. You could use a conditional statement just as well; use whichever you find easier to read.
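That only one branch of a conditional expression is evaluated can be shown with a small sketch; the function `safe_ratio` is a hypothetical example.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    # the division is only evaluated when the condition holds,
    # so a zero denominator never raises ZeroDivisionError here
    return numerator / denominator if denominator != 0 else 0.0


print(safe_ratio(6, 3))  # 2.0
print(safe_ratio(6, 0))  # 0.0
```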

In [19]:
def choice(b: bool) -> str:
    if b:
        return "statement A"
    return "statement B"


choice(True)

'statement A'

#### Structural pattern matching
As of Python 3.10, you don't have to make a choice with a chain of `if ... elif ... else` statements anymore. You can use structural pattern matching; see [PEP 635](https://peps.python.org/pep-0635/).

Pattern matching is a technique used by most functional programming languages, probably ever since their conception. For instance, if I wanted to write a function that returns the first element of a tuple in Haskell, I could define it as: 

```

fst :: (a,b) -> a
fst (x,_) = x

``` 

The realisation is that I really don't need the second element; the pattern of `fst` is always the first element. I can do very much the same in Python.


In [20]:
from typing import Any


def fst(t: tuple) -> Any:
    a, _ = t
    return a

In [21]:
tup = ("x", "y")
fst(tup)

'x'

The complementary pattern is, of course, the second element.

In [22]:
def snd(t: tuple) -> Any:
    _, b = t
    return b

In [23]:
snd(tup)

'y'

Consider the code below: a common `if ... elif ... else` chain of the kind you will quite often see in Python code.

In [24]:
def nested(arg):
    if arg == "George":
        return "George is a rhino"
    elif arg == "Croc":
        return "Croc is peckish!"
    elif arg == "Rhino":
        return "I like to run around while snorting loudly"
    elif arg == "Ente":
        return "Ich brauch ein Taxi"
    else:
        return "there is nothing else"

In [25]:
nested("George")

'George is a rhino'

In [26]:
nested("James Bond")

'there is nothing else'

A match/case statement captures the underlying structure much more clearly.

In [27]:
def match(arg):
    match arg:
        case "George":
            return "George is a rhino"
        case "Croc":
            return "Croc is peckish!"
        case "Rhino":
            return "I like to run around while snorting loudly"
        case "Ente":
            return "ich brauch ein Taxi"
        case _:
            return "there is nothing else"

In [28]:
match("Rhino")

'I like to run around while snorting loudly'

In [29]:
match("Croc")

'Croc is peckish!'

#### Wildcard
The underscore `_` is a wildcard here; it tells Python that I don't care about this value. I could have used a capture pattern instead (a bare name such as `other` also matches anything), but here we are matching string literals and never use the remaining value, so I find the underscore clearer. For this function, it is of no consequence which one you use.

This type of structural pattern matching is the next big thing in Python; see https://peps.python.org/pep-0636/. You should read this excellent tutorial because pattern matching cleans up your code.

Here, I will limit myself to a few subjects that I feel will show you the advantages of structural pattern matching. As you have seen, or have probably guessed, there are several kinds of patterns you can match:

- Capture patterns (stand-alone names like direction, action, and objects). We never discussed these separately but used them as part of other patterns.
- Literal patterns (string literals, number literals, True, False, and None)
- The wildcard pattern

You can also pattern match on sequences, e.g., if you want the head of a list.

In [30]:
from typing import Any


def head(l: list) -> Any:
    [x, *xs] = l
    return x

In [31]:
l = [n for n in range(1, 11)]
head(l)

1

The counterpart of `head` is the `tails` function, which is just as easily written with pattern matching on a sequence.

In [32]:
def tails(l: list) -> Any:
    [x, *xs] = l
    return xs


tails(l)

[2, 3, 4, 5, 6, 7, 8, 9, 10]

As you've guessed, you can extend this quite easily.

In [33]:
def fst3(l: list) -> list:
    [x, y, z, *xs] = l
    return [x, y, z]

In [34]:
fst3(l)

[1, 2, 3]

We can capture richer patterns, for instance alternatives combined with a logical or `|`.

In [35]:
def crocs_snackies(arg: str) -> str:
    match arg:
        case "Blue Heron" | "Wallaby" | "Cote du boeuf":
            return "Delicious snacky"
        case "Tuna" | "Salmon" | "Dolphin":
            return "OK I will eat this but honestly this is Gator food"
        case _:
            return "Are you trying to poison me with a veggie?"

In [36]:
crocs_snackies("Wallaby")

'Delicious snacky'

In [37]:
crocs_snackies("Dolphin")

'OK I will eat this but honestly this is Gator food'

In [38]:
crocs_snackies("carrot")

'Are you trying to poison me with a veggie?'

For more on pattern matching, I do advise you to check out this [tutorial](https://peps.python.org/pep-0636/). 

I just want to show you one last one, and maybe you will realise that you can do much more with patterns in Python than just a match or case. I personally like to use it to write clear and concise functions.

In [39]:
def swap(t: tuple) -> tuple:
    x, y = t
    return y, x

In [40]:
t = ("a", "b")
swap(t)

('b', 'a')

#### **Iteration**
Iteration is the main subject of this notebook. I will show you what iteration means in detail, what a generator is in Python, and how you can create your own iterators or let Python do it for you. We even take a look at laziness; with laziness, we can even have an infinite number of repetitions. Finally, we look at the functional programming tools in the itertools and functools modules that allow you to use iteration concepts as building blocks to build increasingly more powerful algorithms.

Iteration is the technique of marking out a block of statements within a computer program for a defined number of repetitions. 

![afbeelding.png](attachment:0d92bc4e-33b6-468b-938c-3c10b3c86226.png)

What we see is, in fact, a loop. We do something while a condition is true.

In [41]:
x = 1

while x < 6:
    print(x)
    x += 1

1
2
3
4
5


The for-loop with index that we know in Python follows the same pattern. 
It is syntactic sugar for the while loop; logically, they are the same.

In [42]:
for n in range(1, 6):
    print(n)

1
2
3
4
5


In Python, the for-each loop is more commonly used than the for-loop with index. The for-each loop traverses items in a collection. The main difference between the for-each loop and the while statement is that a `while True` loops forever by itself, whereas a for-each loop over a finite collection always terminates.

In [43]:
l = [1, "a", True, 4, 5]

for item in l:
    print(item)

1
a
True
4
5


My presumption is that this is all familiar to you, just as I presume that you have seen generator expressions before.

In [44]:
t = tuple(n for n in range(1, 21) if n % 2 == 0)
t

(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)

#### **Iteration topics**
The topics I want to discuss concerning iteration are:
1. How do you make your classes iterable?
2. What is Duck Typing?
3. What is the difference between an iterable and an iterator?
4. What is a generator, and how is that different from an iterator?
5. What is laziness?
6. The itertools module.

#### 1. Making your classes iterable
Consider the code below:

In [45]:
import re  # https://docs.python.org/3/library/re.html
import reprlib
from dataclasses import dataclass, field


@dataclass
class Sentence:
    """an iterable class"""

    text: str
    words: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        RE_WORD = re.compile(r"\w+")
        self.words = RE_WORD.findall(self.text)

    def __getitem__(self, index: int) -> str:
        return self.words[index]

    def __len__(self) -> int:
        return len(self.words)

    def __repr__(self) -> str:
        return f"Sentence({reprlib.repr(self.text)})"

The class `Sentence` has two attributes: `text`, which is the whole sentence, and `words`, which holds the words that make up the sentence.

We created a few methods. These are all special methods. By now, you should have realised the importance of special methods. The `__repr__` method uses the function `reprlib.repr` to abbreviate long sentences.

In [46]:
s = Sentence('"George is a rhino", Croc said')
s

Sentence('"George is a...o", Croc said')

In [47]:
s.words

['George', 'is', 'a', 'rhino', 'Croc', 'said']

In [48]:
s.text

'"George is a rhino", Croc said'

In [49]:
for word in s:
    print(word)

George
is
a
rhino
Croc
said


#### Duck Typing
How did I make this class iterable? By implementing the special methods `__getitem__` and `__len__`, I have implemented the core of the [sequence interface](https://docs.python.org/3/library/collections.abc.html#module-collections.abc). The sequence interface is a subtype of the iterable interface. Python allows you to do this form of super simple interface implementation; it is called Duck Typing. By implementing the required methods of a sequence, I have implemented the interface. Duck Typing is one of Python's best features, yet also one of its worst, depending on who you talk to. I am on the fence here. I do appreciate the flexibility Duck Typing offers, but I am also well aware of the runtime errors that come with it. See the standard Python example of Duck Typing:

In [50]:
class Duck:
    def swim(self):
        print("Duck swimming")

    def fly(self):
        print("Duck flying")


class Whale:
    def swim(self):
        print("Whale swimming")


for animal in [Duck(), Whale()]:
    animal.swim()
    animal.fly()

Duck swimming
Duck flying
Whale swimming


AttributeError: 'Whale' object has no attribute 'fly'
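One way to keep the flexibility of Duck Typing while avoiding the error above is to check for the required behaviour first. The sketch below uses `typing.Protocol` with `runtime_checkable`; the classes `Flyer`, `Goose`, and `Orca` and the function `try_to_fly` are hypothetical names for this illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Flyer(Protocol):
    def fly(self) -> str: ...


class Goose:
    def fly(self) -> str:
        return "Goose flying"


class Orca:
    def swim(self) -> str:
        return "Orca swimming"


def try_to_fly(animal: object) -> str:
    # isinstance() on a runtime-checkable Protocol only checks that the
    # method exists; it does not verify its signature
    if isinstance(animal, Flyer):
        return animal.fly()
    return f"{type(animal).__name__} cannot fly"


print(try_to_fly(Goose()))  # Goose flying
print(try_to_fly(Orca()))   # Orca cannot fly
```

Neither class inherits from `Flyer`; a class satisfies the protocol simply by having the method.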

Sometimes you see Duck Typing referred to as polymorphism. What is usually meant is that Duck Typing is a form of subtyping polymorphism, but it is not. It is closer to [row polymorphism](https://en.wikipedia.org/wiki/Row_polymorphism); subtyping in Python, as in most OO languages, is done by inheritance.

Subtyping polymorphism would be:

In [51]:
class Animal:

    def swim(self):
        print(f"{self.__class__.__name__} swimming")


class Duck(Animal):

    def swim(self):
        super().swim()

    def fly(self):
        print("Duck flying")


class Whale(Animal):

    def swim(self):
        super().swim()


d = Duck()
w = Whale()
d.swim()

Duck swimming


In [52]:
issubclass(Duck, Animal)

True

I will expound on inheritance, typing, why subclassing is not the same as subtyping in Python, etc. in some detail in the notebook on object-oriented programming in Python. For now, we need to return to iteration. Given that the class `Sentence` implements `__getitem__`, we can index (and slice) the sentence `s`.

In [53]:
s[4]

'Croc'

Or I can just list them all:

In [54]:
tuple(s)

('George', 'is', 'a', 'rhino', 'Croc', 'said')

#### The iter function
Whenever Python needs to iterate over an object, it calls the `iter` function. This built-in function checks whether: 
1. The object implements `__iter__`; if so, it calls that to obtain an iterator.
2. If `__iter__` is not implemented, it checks for `__getitem__`. If this is the case, `iter()` returns an iterator that fetches items by index starting from 0. This is what we have done in the above code.

If both fail, then Python raises `TypeError: 'C' object is not iterable`, where `C` is the class of the object.
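Both the fallback and the failure case can be reproduced with two tiny classes. `Countdown` and `Opaque` are hypothetical names for this sketch.

```python
class Countdown:
    """Hypothetical class with only __getitem__, no __iter__."""

    def __getitem__(self, index: int) -> int:
        if index >= 3:
            raise IndexError(index)  # tells iter() the sequence has ended
        return 3 - index


class Opaque:
    """Hypothetical class with neither __iter__ nor __getitem__."""


print(list(Countdown()))  # [3, 2, 1]

try:
    iter(Opaque())
except TypeError as error:
    print(error)  # 'Opaque' object is not iterable
```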

You can also use `iter()` with two arguments: a callable (functions, methods, and classes are callable) and a sentinel. Every next element in our sequence is now the result of a call to the callable. In the example below, I use a function that keeps rolling a die until a 3 comes up. As 3 is the [sentinel value](https://en.wikipedia.org/wiki/Sentinel_value), the output will never show a 3. You need to run the cell a few times to see the difference.

In [59]:
from random import randint


def die():
    return randint(1, 6)


die_iter = iter(die, 3)
[roll for roll in die_iter]

[2, 2, 4, 6, 6, 4, 5, 5, 5, 6, 1, 6, 4, 2]

#### **Iterator**
An iterator is a software design pattern (see the design patterns notebook) that abstracts the process of scanning through a sequence of elements one element at a time. All programming languages that implement iterators (not all do; [Clojure](https://clojure.org/), for instance, does not) have two questions in common:
 1. How do we know whether there is a next element? For instance, you can have some sort of `hasNext` function (like Java's `java.util.Iterator` interface) that returns a Boolean, or you can catch an error when there is no next element. Python uses an exception to signal that there are no more items.
 2. How do we go to the next element in a sequence? With a `next` function that returns the next element in the sequence.

Python's `iter` function returns an iterator object: the one built by `__iter__` if the iterable defines it, or, as here with the two-argument form, a callable iterator.


In [60]:
die_iter

<callable_iterator at 0x7fd117791570>

#### **Iterable**
Informally, think of an iterable as a sequence: an enumerated collection of objects in which repetitions are allowed and order matters. Any sequence is a priori iterable.

The very specific definition of an iterable in Python is: any object from which the built-in `iter` function can obtain an iterator.

We can show that a Python iterator object actually implements `__next__`, which the built-in `next()` calls.

In [61]:
t = "ABC"
it = iter(t)  # create the iterator

while True:  # (1)
    try:
        print(next(it))  # (2)
    except StopIteration:
        del it  # (3)
        break

A
B
C


#### Code comment
1. We have, in principle, an endless loop.
2. The built-in function `next` is the part of the iterator pattern that fetches the next element.
3. `it` is a string iterator that is now exhausted. `del it` removes the name so the object can be discarded; this is optional, because Python's garbage collection reclaims the iterator as soon as nothing refers to it any more.

This code is, of course, much easier to write with a for-each loop. 

In [62]:
for char in t:
    print(char)

A
B
C


#### Python's iterator model

![image.png](attachment:53d0bbc4-5db8-4bf5-ab5d-e1c9d5c881a4.png)

There are two things you should notice in this diagram.
1. As we can see, Python hasn't implemented a `hasNext` function; instead, `__next__` raises a `StopIteration` exception. Every iterator also implements `__iter__()`, which allows iterators to be used where an iterable is expected.
2. The iterable builds the iterator. 

Let's investigate these points.

We can iterate over our sentence and will end up with an error.

In [71]:
it = iter(s)
next(it)

'George'

In [72]:
next(it)

'is'

In [73]:
next(it)

'a'

In [74]:
next(it)

'rhino'

In [75]:
next(it)

'Croc'

In [76]:
next(it)

'said'

In [77]:
next(it)

StopIteration: 

We can rewrite our `Sentence` class so that it implements `__iter__` explicitly and builds its own iterator.

In [78]:
import re  # https://docs.python.org/3/library/re.html
import reprlib
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    words: list[str] = field(default_factory=list)

    def __post_init__(self):
        RE_WORD = re.compile(r"\w+")
        self.words = RE_WORD.findall(self.text)

    def __repr__(self) -> str:
        return f"Sentence({reprlib.repr(self.text)})"

    def __iter__(self):
        return SentenceIterator(self.words)


@dataclass
class SentenceIterator:
    words: list[str] = field(default_factory=list)
    index: int = 0

    def __next__(self) -> str:
        try:
            word = self.words[self.index]
            self.index += 1
        except IndexError:
            raise StopIteration()
        return word

    def __iter__(self):
        # returning self allows a SentenceIterator to be used as an iterable too
        return self

In [79]:
s2 = Sentence("Croc is peckish as always")

In [80]:
it = iter(s2)
next(it)

'Croc'

In [81]:
next(it)

'is'

In [82]:
for word in s2:
    print(word)

Croc
is
peckish
as
always


#### Subscriptable
However, having implemented an iterator and an iterable, I can't index or slice, because I have not implemented `__getitem__`. The difference between `__iter__` and `__getitem__` is that the former hands out the elements of a collection one after the other, while the latter looks up an element by its index, which the fallback iteration protocol assumes starts at 0. I need an index to slice, to be subscriptable.

In [83]:
s2[3]

TypeError: 'Sentence' object is not subscriptable
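Nothing stops a class from offering both. The sketch below, with the hypothetical name `IndexableSentence`, delegates iteration and indexing to the underlying list of words; it is one possible design, not the only one.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class IndexableSentence:
    text: str
    words: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.words = re.findall(r"\w+", self.text)

    def __iter__(self) -> Iterator[str]:
        return iter(self.words)  # delegate to the list's own iterator

    def __getitem__(self, index):
        return self.words[index]  # accepts an int or a slice

    def __len__(self) -> int:
        return len(self.words)


sentence = IndexableSentence("Croc is peckish as always")
print(sentence[3])    # as
print(sentence[1:3])  # ['is', 'peckish']
```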

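Instead of raising an exception as Python does, we could create a `has_next` function. Be aware that the version below is for illustration only: because `__next__` never raises `StopIteration`, a `for` loop over this iterator would never end.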

In [89]:
import re  # https://docs.python.org/3/library/re.html
import reprlib
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    words: list[str] = field(default_factory=list)

    def __post_init__(self):
        RE_WORD = re.compile(r"\w+")
        self.words = RE_WORD.findall(self.text)

    def __repr__(self) -> str:
        return f"Sentence({reprlib.repr(self.text)})"

    def __iter__(self):
        return SentenceIteratorTwo(self.words)


@dataclass
class SentenceIteratorTwo:
    words: list[str] = field(default_factory=list)
    index: int = 0

    def __next__(self) -> str:
        if self.has_next():
            word = self.words[self.index]
            self.index += 1
            return word
        else:
            return "There is no next item"

    def has_next(self) -> bool:
        return self.index < len(self.words)

    def __iter__(self):
        return self

In [90]:
s3 = Sentence("short sentence")
it = iter(s3)

In [91]:
next(it)

'short'

In [92]:
next(it)

'sentence'

In [93]:
next(it)

'There is no next item'

**Don't confuse iterables and iterators!**
An iterable builds an iterator but should never implement the iterator interface itself. If it did, you could only traverse that iterable once. You should consider this point carefully; it is quite important. Otherwise, you might not understand why your code is not working properly.

The goal is, of course, to allow multiple traversals of the same object, so the same object can't be both the iterator and the iterable.
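The difference is easy to observe with a built-in list and its iterator; the variable names below are only for illustration.

```python
words = ["Croc", "is", "peckish"]

iterator = iter(words)  # an iterator: one-shot
print(list(iterator))   # ['Croc', 'is', 'peckish']
print(list(iterator))   # []: already exhausted

# an iterable hands out a fresh iterator on every call to iter()
print(list(words))      # ['Croc', 'is', 'peckish']
print(list(words))      # ['Croc', 'is', 'peckish']

print(iter(iterator) is iterator)  # True: an iterator returns itself
print(iter(words) is words)        # False: a list builds a new iterator
```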

#### Generator
A generator is a function that can control the iterating behaviour of a loop. Generators first appeared in the CLU language, designed by Barbara Liskov, who is well known for her substitution principle.

In Python, you can create a generator by simply using the keyword `yield`. You should now be thinking that we are Duck Typing; using `yield` is sufficient to implement the [generator interface](https://peps.python.org/pep-0342/). Generators are also the basis on which Python built its coroutines. Coroutines are components of a programme that allow execution, for instance of a method, to be suspended and resumed. The use of coroutines is ubiquitous in Web APIs and other forms of distributed computing.

In [95]:
@dataclass
class Sentence:
    text: str
    words: list[str] = field(default_factory=list)

    def __post_init__(self):
        RE_WORD = re.compile(r"\w+")
        self.words = RE_WORD.findall(self.text)

    def __repr__(self) -> str:
        return f"Sentence({reprlib.repr(self.text)})"

    def __iter__(self):
        for word in self.words:
            yield word

As you can see, the code just got a lot easier to read. We removed the whole iterator object.

The yield expression is used when defining a generator function and thus can only be used in the body of a function definition. Using a yield expression in a function’s body causes that function to be a generator function that returns a generator object. A generator object has a next function and a StopIteration error; indeed, it is an iterator.

It is important to remember that a yield statement is different from a return statement.
* A return statement leaves the current function call with the value of its expression list (several values are returned as one tuple) or `None` as the return value.
* A yield statement hands one value to the caller and suspends the function; execution resumes after the `yield` when the next value is requested.

A `return` in a generator function ends the iteration immediately: the generator raises `StopIteration`, and control is handed back to the caller.
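A short sketch of the difference, using a hypothetical generator function `first_words` that yields words until a limit is reached and then returns.

```python
from typing import Iterator


def first_words(words: list[str], limit: int) -> Iterator[str]:
    for position, word in enumerate(words):
        if position == limit:
            return  # ends the generator; the caller sees StopIteration
        yield word  # hands over one word and pauses here


gen = first_words(["Croc", "is", "peckish", "as", "always"], 2)
print(next(gen))  # Croc
print(next(gen))  # is

try:
    next(gen)
except StopIteration:
    print("generator finished")
```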

The `Sentence` class above now generates a stream of words.

In [96]:
s4 = Sentence(
    "Running around in circles naked while snorting loudly is every rhinos hobby"
)
it = iter(s4)
next(it)

'Running'

In [97]:
next(it)

'around'

Now you might think we have made our iterable an iterator, but we have not.

We used a generator function to write `__iter__`; there is no hand-written iterator class here. There is, however, a generator object, which is itself an iterator, and every call to `__iter__` builds a new one.

In [99]:
s4.__iter__()

<generator object Sentence.__iter__ at 0x7fd114fd0c80>

To show that our class is an iterable and not an iterator, let's obtain several independent iterators from it.

In [100]:
it2 = iter(s4)
next(it2)

'Running'

In [101]:
next(it)

'in'

In [102]:
next(it2)

'around'

In [103]:
it3 = iter(s4)
next(it3)

'Running'

In [104]:
next(it)

'circles'

#### Evaluation strategy
Imagine you have a sentence that is billions of words long. This sentence is only reachable over a network connection, and we ask it for words one at a time. Do we want all the words in memory on our side of the connection? Of course not.

Python evaluates its arguments eagerly; the moment it encounters an argument, it evaluates it straight away. This evaluation strategy is known as strict evaluation. There are some advantages to using strict evaluation, to name two:
1. a smaller call stack.
2. easier to debug.

There is a serious disadvantage to strict evaluation: you put a lot of things in memory when you might not need them at this moment in time. This is why other programming languages, for instance Haskell, opt for lazy (non-strict) evaluation, only evaluating an argument when it is needed. This is the reason why I can write an infinite list `[1..]` in Haskell, whereas a Python list always has to be finite.
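A generator gets Python close to the Haskell idiom. The sketch below uses a hypothetical generator function `naturals` together with the standard `itertools.islice` to take a finite prefix of an endless stream.

```python
from itertools import islice
from typing import Iterator


def naturals(start: int = 1) -> Iterator[int]:
    """Hypothetical counterpart of Haskell's [1..]."""
    n = start
    while True:  # never terminates on its own
        yield n  # but only computes a value when asked for one
        n += 1


print(list(islice(naturals(), 5)))  # [1, 2, 3, 4, 5]
```

The standard library already ships this building block as `itertools.count`; the hand-written version only shows that nothing more than `yield` is needed.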

Lazy evaluation is not limited to other programming languages; Python libraries such as PySpark and the regular expression [`re` library](https://docs.python.org/3/library/re.html) use it too. With the `re` library, I can make our sentence class lazy.

In [108]:
import re
import reprlib
from dataclasses import dataclass


@dataclass
class LazySentence:
    text: str

    def __repr__(self) -> str:
        return f"LazySentence({reprlib.repr(self.text)})"

    def __iter__(self):
        for match in re.compile(r"\w+").finditer(self.text):
            yield match.group()

In [109]:
s5 = LazySentence(
    "Running around in circles naked while snorting loudly is every rhinos hobby"
)
it = iter(s5)
next(it)

'Running'

In [110]:
next(it)

'around'
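The laziness comes from `re.finditer`, which hands out one match object at a time, whereas `re.findall` builds the complete list up front. A small comparison, using a shortened example sentence:

```python
import re
from collections.abc import Iterator

text = "Running around in circles"

eager = re.findall(r"\w+", text)  # the whole list exists immediately
lazy = re.finditer(r"\w+", text)  # an iterator; matches are found on demand

print(eager)  # ['Running', 'around', 'in', 'circles']
print(isinstance(lazy, Iterator))  # True
print(next(lazy).group())  # Running
print(next(lazy).group())  # around
```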

#### Generator expression
With the lazy version, the word list `['Running','around','in','circles','naked','while','snorting','loudly','is','every','rhinos','hobby']` is never built in full. It is evaluated one element at a time, meaning that only one element is put in memory and then gets processed.

If we have an endless stream of words to iterate over, we now can. We don't have to put the entire stream in memory, or buffer parts of the stream, before evaluating it.

Instead of using `yield`, we could also simply return a generator object created by a generator expression.

We could rewrite LazySentence using a generator expression as:

In [114]:
import re  # https://docs.python.org/3/library/re.html
import reprlib
from dataclasses import dataclass


@dataclass
class LazySentence:
    text: str

    def __iter__(self):
        return (match.group() for match in re.compile(r"\w+").finditer(self.text))

In [115]:
s6 = LazySentence(
    "Running around in circles naked while snorting loudly is every rhinos hobby"
)
it = iter(s6)
next(it)

'Running'

In [116]:
next(it)

'around'

In [None]:
s6
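To get a feel for the difference in memory use, compare a list comprehension with the equivalent generator expression. The exact byte counts reported by `sys.getsizeof` vary per Python version and platform, so no numbers are shown here; the point is only that the generator object stays small no matter how long the sequence is.

```python
import sys

squares_list = [n * n for n in range(100_000)]  # every element exists at once
squares_gen = (n * n for n in range(100_000))  # elements are produced on demand

print(sys.getsizeof(squares_list))  # grows with the number of elements
print(sys.getsizeof(squares_gen))  # small, independent of the length

# Both produce the same values when consumed.
print(sum(squares_gen) == sum(squares_list))  # True
```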

#### **Generator vs Iterator**
Using the built-in strengths of Python, we have managed to bring our code down to one method, and `LazySentence` can still be iterated over. I just want to recap quickly what we have learned.

The big difference between a generator and an iterator is that you create an iterator yourself by implementing a `__next__` method:

    def __next__(self) -> str:
        if self.has_next():
            word = self.words[self.index]
            self.index += 1
            return word
        else:
            raise StopIteration()

A generator, on the other hand, is an iterator that Python creates for you. You can ask for one in two different ways. You can use the `yield` keyword,

    def __iter__(self):
        for word in self.words:
            yield word

or you can use a generator expression.

    def __iter__(self):
        return (match.group() for match in re.compile(r"\w+").finditer(self.text))

In use, a generator and an iterator behave the same. As a rule, assume the interpreter knows better: have Python create the iterator for you (via a generator) instead of writing your own!
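To make the recap concrete, here is the same countdown written twice: once as a hand-written iterator class and once as a generator function. Both names are invented for this illustration.

```python
class CountdownIterator:
    """Hand-written iterator: we manage the state and StopIteration ourselves."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self):
        return self  # an iterator returns itself

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start: int):
    """Generator function: Python builds the iterator for us."""
    while start > 0:
        yield start
        start -= 1


print(list(CountdownIterator(3)))  # [3, 2, 1]
print(list(countdown(3)))  # [3, 2, 1]
```

The generator version needs no explicit state, no `__next__`, and no `StopIteration`; all three are supplied by Python.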


In [121]:
import itertools

tuple(itertools.permutations("ABCD", 2))

(('A', 'B'),
 ('A', 'C'),
 ('A', 'D'),
 ('B', 'A'),
 ('B', 'C'),
 ('B', 'D'),
 ('C', 'A'),
 ('C', 'B'),
 ('C', 'D'),
 ('D', 'A'),
 ('D', 'B'),
 ('D', 'C'))

#### Parametric Polymorphism
We have seen an example of inclusion polymorphism via subclassing. In a language like Haskell or SML, you have a concept called parametric polymorphism, which refers to code that is written without knowledge of the actual type of the arguments; the code is parametric in the type of the parameter. We have already seen an example of this in Python with, amongst others, the `swap` function, but let me start with some code in Haskell and then show you the advantage in Python by redefining the standard `range` class.

In Haskell, we can write a range function as such:
```
range :: (Num a, Enum a) => a -> a -> a -> [a]
range start stop step = [start, step + start..stop]
```
This function is parametric in type; `a` is a generic type. `a` is constrained by two type classes: `Enum`, which defines operations on sequentially ordered types, and `Num`, which defines operations on basic number types. It does not matter what you feed this function as long as these constraints are met. The function is polymorphic in the parameters it takes.

Now let us create a similar polymorphic range function in Python using a generator.

In [5]:
from dataclasses import dataclass
from decimal import Decimal
from fractions import Fraction
from numbers import Number
from typing import Any


@dataclass(frozen=True)
class PolyRange:
    begin: Number
    step: Number
    end: Number

    def __iter__(self) -> Any:
        result_type = type(self.begin + self.step)  # 1
        result = result_type(self.begin)  # 1
        forever = self.end is None
        index = 0
        while forever or result < self.end:  # 2
            yield result
            index += 1
            result = self.begin + self.step * index  # 3

#### Code comment 
1. The result type of our function, which is the broadest type of `begin` and `step`; the first result is `begin` converted to that type.
2. The loop ends once the result reaches `end`, or never ends if `forever` is true.
3. We increase our result by the next step.

In [14]:
poly_range = PolyRange(0, 1, 5)
list(poly_range)

[0, 1, 2, 3, 4]

In [15]:
def from_ten_to_twenty():
    poly_range = PolyRange(10, 1, 21)
    return list(poly_range)

In [16]:
from_ten_to_twenty()

[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

Of course, Python has its own `range` too, but where in Haskell I can use any numeric type, Python's `range` only accepts integers. With anything else I will get a `TypeError`.

In [17]:
tuple(range(Decimal("0"), Decimal(".1"), Decimal(".3")))

TypeError: 'decimal.Decimal' object cannot be interpreted as an integer

In [18]:
poly_range = PolyRange(0, Decimal(".1"), 0.3)
tuple(poly_range)

(Decimal('0'), Decimal('0.1'), Decimal('0.2'))

Our `PolyRange` class is much more generic than Python's `range`. We can use a decimal, a fraction, or any other numeric type.

In [19]:
poly_range = PolyRange(0, Fraction(1, 3), 2)
list(poly_range)

[Fraction(0, 1),
 Fraction(1, 3),
 Fraction(2, 3),
 Fraction(1, 1),
 Fraction(4, 3),
 Fraction(5, 3)]
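The `forever` branch in `__iter__` is never exercised above, since every example passes an `end` value. The sketch below is a function-based variant, not the class itself; the name `poly_count` and the `None` default for `end` are assumptions made for this example. Combined with `itertools.islice`, it gives an open-ended range for any numeric type.

```python
import itertools
from fractions import Fraction


def poly_count(begin, step, end=None):
    """Yield begin, begin + step, ... up to end, or forever when end is None."""
    index = 0
    result = begin + step * index
    while end is None or result < end:
        yield result
        index += 1
        result = begin + step * index  # recomputed from begin, as in PolyRange


halves = poly_count(0, Fraction(1, 2))  # no end: an endless stream
print(list(itertools.islice(halves, 4)))
# [Fraction(0, 1), Fraction(1, 2), Fraction(1, 1), Fraction(3, 2)]
```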

#### Itertools and functools
The itertools and functools modules are full of extremely handy functions on iterators, functions that you can combine virtually endlessly into ever more complex algorithms. For instance, you can easily chain iterables.

In [33]:
import functools
import itertools

list(itertools.chain("George ", "is ", "a ", "rhino"))

['G',
 'e',
 'o',
 'r',
 'g',
 'e',
 ' ',
 'i',
 's',
 ' ',
 'a',
 ' ',
 'r',
 'h',
 'i',
 'n',
 'o']

This might not seem so handy, but combined with `accumulate` it becomes more powerful. In the next cell I concatenate four strings.

In [22]:
import operator

list(
    reversed(
        list(
            itertools.accumulate(
                list(itertools.chain("George ", "is ", "a ", "rhino")), operator.add
            )
        )
    )
)[0]

'George is a rhino'

That is a lot of brackets :-). I could just use `functools.reduce` and be done with it, which is so much easier.

In [23]:
import operator
from functools import reduce

str(reduce(operator.add, itertools.chain("George ", "is ", "a ", "rhino"), ""))

'George is a rhino'
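What `accumulate` and `reduce` have in common is the repeated application of one binary operation: `accumulate` yields every intermediate result, and `reduce` keeps only the last one. For strings specifically, `str.join` is the more usual idiom. A short comparison on whole words instead of single characters:

```python
import itertools
import operator
from functools import reduce

parts = ["George ", "is ", "a ", "rhino"]

running = list(itertools.accumulate(parts, operator.add))
print(running)
# ['George ', 'George is ', 'George is a ', 'George is a rhino']

print(reduce(operator.add, parts))  # George is a rhino
print("".join(parts))  # George is a rhino
```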

In [24]:
list(itertools.product([1, 2, 3, 4, 5], "xy"))

[(1, 'x'),
 (1, 'y'),
 (2, 'x'),
 (2, 'y'),
 (3, 'x'),
 (3, 'y'),
 (4, 'x'),
 (4, 'y'),
 (5, 'x'),
 (5, 'y')]

If we want to apply the same operation to all elements of a list, there is another option besides iteration. We can simply repeat the same question over and over on an ever smaller input. See the code below.

In [34]:
def factorial(n):
    """function that returns the factorial of a number"""
    return 1 if n <= 1 else n * factorial(n - 1)  # n <= 1 also stops factorial(0)


factorial(5)

120

#### Recursion

Repeating the question over and over is called recursion. Recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such problems by using functions that call themselves from within their own code. Recursion is the twin of induction, a mathematical technique for constructing proofs. Induction proves that a statement $P(n)$ is true for every natural number $n$, that is, that the infinitely many cases $P(0), P(1), P(2), P(3), \dots$ all hold.

A proof by induction consists of two cases. The first, the base case, proves the statement for $n = 0$ without assuming any knowledge of other cases. The second case, the induction step, proves that if the statement holds for any given case $n = k$, then it must also hold for the next case $n = k + 1$. I started this notebook with the claim that you only need three basic control structures to compute anything that is computable. This was proven by Böhm and Jacopini; for their proof, they used induction.

In computer programming, proof by [structural induction](https://en.wikipedia.org/wiki/Structural_induction) is often used. We find the same ideas in recursion: you define one or more base cases and make recursive calls until you hit a base case, after which the result is computed as the calls on the call stack return.

The base case in the `factorial` function is simply 1. Once we have reached the base case, we end the recursion. We can also have more base cases; the famous `fibonacci` function has two base cases, 0 and 1.

In [40]:
@functools.lru_cache
def fibonacci(n: int) -> int:
    '''optimised Fibonacci function using recursion, returns the n-th Fibonacci number'''
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)

In [39]:
fibonacci(567)

14012222329320380360477957577902026327196843291708621628988463816224289805272203709520415969453367214046785150335122978
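The `lru_cache` decorator is what makes this recursion practical: each Fibonacci number is computed once and then looked up. The sketch below counts the calls made by a naive and a cached version; the function names and the `calls` counter are made up for this comparison.

```python
import functools

calls = {"naive": 0, "cached": 0}


def fib_naive(n: int) -> int:
    """Plain recursion: the same subproblems are solved again and again."""
    calls["naive"] += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


@functools.lru_cache(maxsize=None)
def fib_cached(n: int) -> int:
    """Cached recursion: the body runs only once per distinct argument."""
    calls["cached"] += 1
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


print(fib_naive(20), calls["naive"])  # 6765 21891
print(fib_cached(20), calls["cached"])  # 6765 21
```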


What happens with factorial is that it puts the intermediate function calls on the [call stack](https://en.wikipedia.org/wiki/Call_stack).
It looks something like this:
 * stack `[factorial(5)]` <- first call
 * stack `[5 * factorial(4)]`
 * stack `[5 * (4 * factorial(3))]`
 * stack `[5 * (4 * (3 * factorial(2)))]`
 * stack `[5 * (4 * (3 * (2 * factorial(1))))]` <- base case reached
 * stack `[5 * (4 * (3 * (2 * 1)))]`
 * stack `[5 * (4 * (3 * 2))]`
 * stack `[5 * (4 * 6)]`
 * stack `[5 * 24]`
 * stack `[120]`

As you can see, the stack gets built up with all the function calls, then with the intermediate results, and finally the result.
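One way to watch the stack grow and shrink is to print each call with an indentation that matches its depth. The `depth` parameter and the name `traced_factorial` are additions for this illustration only.

```python
def traced_factorial(n: int, depth: int = 0) -> int:
    """Factorial that prints when each call starts and when it returns."""
    indent = "  " * depth
    print(f"{indent}factorial({n}) called")
    result = 1 if n <= 1 else n * traced_factorial(n - 1, depth + 1)
    print(f"{indent}factorial({n}) returns {result}")
    return result


traced_factorial(3)
# factorial(3) called
#   factorial(2) called
#     factorial(1) called
#     factorial(1) returns 1
#   factorial(2) returns 2
# factorial(3) returns 6
```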

Python limits the recursion depth. In the environment this notebook ran in, the limit is 3000, while a plain CPython interpreter defaults to 1000. Your call stack cannot grow beyond that limit: a recursion that goes deeper raises a `RecursionError`. You could technically raise the limit with `sys.setrecursionlimit(limit)`. The highest limit that is still safe is platform-dependent, meaning dependent on CPU and/or OS.

In [41]:
import sys

sys.getrecursionlimit()

3000
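Exceeding the limit does not bring down the interpreter; Python raises a `RecursionError`, which can be caught like any other exception. A small sketch with a deliberately deep, hypothetical `depth` function:

```python
import sys


def depth(n: int) -> int:
    """Recurse n levels deep and report how deep we went."""
    return 0 if n == 0 else 1 + depth(n - 1)


limit = sys.getrecursionlimit()
print(depth(100))  # 100: well within the limit

try:
    depth(limit + 100)  # deeper than the interpreter allows
    outcome = "finished"
except RecursionError as error:
    outcome = f"RecursionError: {error}"

print(outcome)
```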

In [42]:
def product(ns: list[int], acc) -> int:
    '''product function using recursion'''
    if len(ns) == 0:
        return acc
    else:
        acc = ns.pop(0) * acc
        return product(ns, acc)

In [43]:
product([1, 2, 3, 4, 5], 1)

120

In [45]:
longlist = [n for n in range(1, 10_000)]
product(longlist, 1)

RecursionError: maximum recursion depth exceeded while calling a Python object

The itertools module has a `product` function, but it computes a Cartesian product, not an arithmetic product. If you want an arithmetic product function, use `reduce` from functools (or a generator). This function is polymorphic too, as long as we use a type for which the multiplication operation is defined.

In [49]:
def product(lst:list[Number]) -> Number: 
    return functools.reduce(operator.mul, lst, 1)

In [51]:
product([*range(1,6)])

120

In [53]:
product([Decimal("3.0"), Decimal("4.0"), Decimal("2.0")])

Decimal('24.000')
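Since Python 3.8, the standard library also ships `math.prod`, which multiplies the elements of an iterable and accepts the same mix of numeric types:

```python
import math
from decimal import Decimal
from fractions import Fraction

print(math.prod(range(1, 6)))  # 120
print(math.prod([Decimal("3.0"), Decimal("4.0"), Decimal("2.0")]))  # 24.000
print(math.prod([Fraction(1, 2), Fraction(2, 3)]))  # 1/3
print(math.prod([]))  # 1, the default start value
```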

#### Runtime Context
What is runtime context? Runtime context is a block of code that, at runtime, is treated as a whole where certain procedures are executed in a specific order or with a specific meaning. A runtime context captures a certain pattern that is not applicable outside of the context. 

The most famous runtime context of this type is the `try... except... finally...` block, which is used to manage resources such as files, network connections, database connections, and software locks. Managing resources requires an explicit set-up phase and an explicit teardown phase. Unfortunately, the latter is often forgotten: only the error is caught, and the actual resource is kept in use.


In [40]:
# set-up phase
file = open("hello.txt", "w")

try:
    file.write("George is a rhino")
except Exception as e:
    print(f"unfortunately {e} has occurred")

#### Expecting exceptions
You will see this kind of code quite often, for instance when you open a database connection, acquire a lock, or call a network service. With these operations, the chance of an error occurring increases considerably. We want to catch the error and continue running our program instead of crashing.

![image.png](attachment:215f26d0-454d-4a1c-af0c-a3b9e7809c3d.png)

Running this code with an external resource is problematic; catching the error does not release the resource. An unreleased resource can lead to very specific, hard-to-debug problems. You can block a thread that needs a lock by never releasing that lock, you can slow down database and network services by keeping unused connections open (a resource leak), and you can prevent file access by keeping the file open. If there is a policy that only one process can write to a file at a time, which there should be, you can block the operation of an entire program.

For file, network, database, and concurrent operations, you need to use the keyword `finally` and explicitly tear down the resource you used.
![afbeelding.png](attachment:a17efca4-e739-40d0-b844-9c6618b6ad42.png)

In [54]:
# set-up phase
file = open("hello.txt", "w")

try:
    file.write("George is a rhino")
except Exception as e:  # you might be specific about the exception
    print(f"unfortunately {e} has occurred")
finally:  # teardown phase
    file.close()

Reading is a different CRUD operation than writing and should have its own context.

In [55]:
# set-up phase
file = open("hello.txt", "r")

try:
    print(file.readline())
except Exception as e:  # you might be specific about the exception
    print(f"unfortunately {e} has occurred")
finally:  # teardown phase
    file.close()

George is a rhino


#### **with**
Python introduced the `with` statement to factor out standard use cases of `try... except... finally`, and replace them with `with expression as target:` 

The `with` statement evaluates the expression after the `with` keyword to obtain a context manager object. That object must implement the context manager interface, which consists of two special methods.
1. `__enter__` is called when you enter the runtime context; this is your explicit set-up phase.
2. `__exit__` is called when the runtime context is exited; this is the teardown phase, in which the resource is closed properly.
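The order in which these two methods run can be made visible with a tiny context manager that records what happens. The class `Tracked` and its `events` list are invented for this sketch.

```python
class Tracked:
    """Minimal context manager that logs when it is entered and exited."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def __enter__(self):
        self.events.append("enter")  # set-up phase
        return self  # this value is bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")  # teardown phase, also runs after an error
        return False  # do not swallow exceptions


with Tracked() as tracked:
    tracked.events.append("body")

print(tracked.events)  # ['enter', 'body', 'exit']
```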

In [56]:
with open("hello.txt", "w") as file:
    file.writelines(
        ["Purple is a great colour for a rhino", ", ", " George is a rhino"]
    )

We can verify that the file was opened and closed by the context manager.

In [58]:
file.readline()

ValueError: I/O operation on closed file.

We need to reopen the file to read the content.

In [59]:
with open("hello.txt", "r") as file:
    print(file.readlines())

['Purple is a great colour for a rhino,  George is a rhino']


You can use multiple expressions after a with statement `with A() as a, B() as b:`.

In [60]:
with open("hello.txt", "w") as f, open("hello2.txt", "w") as g:
    f.writelines(["Croc", " is", " Peckish!"])
    g.write("snackies!!!")

In [61]:
with open("hello.txt", "r") as f, open("hello2.txt", "r") as g:
    print(f.readlines()[0])
    print(g.read())

Croc is Peckish!
snackies!!!


This idea of using a context manager to define a runtime context is immensely powerful.

You can do much more in such a context than grouping a set of operations together. I can create abnormal behaviour in such a context and, once leaving that context, return to the default behaviour. See the code below.

In [65]:
import sys


class ReversePrint:
    '''context manager class that reverses a print'''

    def __enter__(self):  # 1
        self.original_write = sys.stdout.write  # 2
        sys.stdout.write = self.reverse_write  # 3
        return "and George"  # 4

    def reverse_write(self, text):  # 5
        self.original_write(text[::-1])  

    def __exit__(self, exception_type=None, exception_value=None, traceback=None):  # 6
        sys.stdout.write = self.original_write  # 7
        if exception_type is OSError:  # 8
            print(f"an OSError has occurred with as value {exception_value}")
            return True  # 9

In [66]:
with ReversePrint() as wtf:
    print("Ente, Rhino, Croc")
    print(wtf)

corC ,onihR ,etnE
egroeG dna


In [67]:
wtf

'and George'

#### Code comment
This code is a bit hard to understand. We have taken a regular print call and reversed the printed text, but only within the code block that starts with the `with` keyword. Let's look at it point by point:
1. Python invokes the special method `__enter__` when the `with` block is entered.
2. We capture the actual `sys.stdout.write` (a method that takes the text as its argument) and save it so we can restore it later.
3. We [monkey patch](https://en.wikipedia.org/wiki/Monkey_patch) `sys.stdout.write` with our own method. See also https://stackoverflow.com/questions/5626193/what-is-monkey-patching
4. We have named a target variable `wtf` with the `as` keyword; the return value of `__enter__` gives `wtf` its value, namely 'and George'.
5. A simple method that reverses the text it is given and passes it on to the original `write`.
6. Python calls the special method `__exit__` when the `with` block ends. If the block finished without an exception, the three arguments are None, None, None.
7. We restore the original value of `sys.stdout.write`.
8. If the block raised an `OSError`, it gets handled now.
9. We return True, so the interpreter knows the error was handled and does not propagate it. In every other case `__exit__` returns None, and an exception, if there was one, propagates.

While this context is active, every call to print('text') will print that text reversed. Once the `with` block ends, `__exit__` restores the default behaviour.

The creation of a context where I can determine the suite of actions that execute is indeed an enormously powerful idea.
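The return value of `__exit__` deserves a closer look, because it decides whether an exception leaves the `with` block. The sketch below uses a hypothetical class, `SwallowOSError`, to show this, next to `contextlib.suppress`, which offers the same behaviour out of the box.

```python
from contextlib import suppress


class SwallowOSError:
    """Context manager that suppresses OSError and lets everything else through."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Returning True tells Python the exception has been handled.
        return exc_type is not None and issubclass(exc_type, OSError)


with SwallowOSError():
    raise FileNotFoundError("no such rhino")  # a subclass of OSError
print("still running")

with suppress(OSError):
    raise FileNotFoundError("no such rhino")
print("still running")

propagated = None
try:
    with SwallowOSError():
        raise ValueError("not an OSError")
except ValueError as error:
    propagated = str(error)

print(f"propagated: {propagated}")  # propagated: not an OSError
```

Note that this sketch checks with `issubclass`, so subclasses such as `FileNotFoundError` are covered, whereas the test `exception_type is OSError` in `ReversePrint` matches only `OSError` itself.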


#### **contextlib**
Python has a library with all kinds of useful utility functions for [context managers](https://docs.python.org/3/library/contextlib.html).

One of them is the decorator `@contextmanager`. We can use the decorator to rewrite ReversePrint, this time not as an expensive class but as a cheap function (why is a function cheap and a class not?).

In [68]:
import sys
from contextlib import contextmanager


@contextmanager
def reverse_printer():
    original_write = sys.stdout.write

    def reverse_write(text):
        original_write(text[::-1])

    sys.stdout.write = reverse_write
    yield "and George"
    sys.stdout.write = original_write

In [69]:
with reverse_printer() as wtf:
    print("Ente, Rhino, Croc")
    print(wtf)

corC ,onihR ,etnE
egroeG dna


Using a generator has a specific effect when used with `@contextmanager`:
* Everything before the `yield` will be executed at the beginning of the block, when the interpreter calls `__enter__`.
* Everything after the `yield` will run when `__exit__` is called.

However, there is a flaw in this code. I don't expect you to see it, but if an exception is raised inside the `with` block, the Python interpreter re-raises it at the `yield` expression. The line after the `yield` then never runs, and `sys.stdout.write` stays patched. We should fix this.

In [72]:
import sys
from contextlib import contextmanager


@contextmanager
def reverse_printer():
    original_write = sys.stdout.write

    def reverse_write(text):
        original_write(text[::-1])

    sys.stdout.write = reverse_write
    msg = ""
    try:
        yield "and George"
    except Exception as e:
        msg = f"unfortunately {e} has occurred"
    finally:
        sys.stdout.write = original_write
        if msg:
            print(msg)

In [73]:
with reverse_printer() as wtf:
    print("Ente, Rhino, Croc")
    print(wtf)

corC ,onihR ,etnE
egroeG dna


In [74]:
with reverse_printer() as wtf:
    print(zz)
    print(wtf)

unfortunately name 'zz' is not defined has occurred
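The effect of a missing `try ... finally` around the `yield` is easiest to see in a stripped-down setting. In the sketch below, a dictionary flag stands in for the patched `sys.stdout.write`; the names `fragile`, `robust` and `state` are invented for the illustration.

```python
from contextlib import contextmanager

state = {"patched": False}


@contextmanager
def fragile():
    state["patched"] = True
    yield
    state["patched"] = False  # skipped when the with block raises


@contextmanager
def robust():
    state["patched"] = True
    try:
        yield
    finally:
        state["patched"] = False  # always runs


try:
    with fragile():
        raise RuntimeError("boom")
except RuntimeError:
    pass
after_fragile = state["patched"]
print(after_fragile)  # True: the teardown never happened

state["patched"] = False  # reset by hand
try:
    with robust():
        raise RuntimeError("boom")
except RuntimeError:
    pass
after_robust = state["patched"]
print(after_robust)  # False: the teardown ran despite the error
```

Unlike `reverse_printer`, `robust` does not catch the exception; it only guarantees the teardown and lets the error propagate to the caller.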


#### Context Manager use case
You could use a context manager to create a context where you deviate from existing behaviour. Say you want a terribly exact calculation, where usually the standard precision will suffice. For exact calculations, you can use the [decimal library](https://docs.python.org/3/library/decimal.html) with `localcontext` as a context manager.

In [75]:
from decimal import Decimal, localcontext

with localcontext() as precise:
    precise.prec = 42
    print(Decimal("1") / Decimal("3"))

0.333333333333333333333333333333333333333333


Or maybe you need less precision in a certain context, as Decimal's default precision is already 28 significant digits.

In [76]:
from decimal import Decimal, localcontext

with localcontext() as approx:
    approx.prec = 3
    print(Decimal("1") / Decimal("3"))

0.333
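What makes `localcontext` a runtime context in the sense used here is that the changed precision is undone as soon as the block ends. A quick check:

```python
from decimal import Decimal, getcontext, localcontext

before = getcontext().prec  # 28 unless it was changed earlier

with localcontext() as approx:
    approx.prec = 3
    inside = Decimal("1") / Decimal("3")

outside = Decimal("1") / Decimal("3")

print(inside)  # 0.333
print(getcontext().prec == before)  # True: the old precision is back
print(len(str(outside)) - 2 == before)  # True: one digit per unit of precision
```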


## Afterword
---
This was a very long notebook that introduced you to the tools Python offers to bring order to your code. It is long because I took the time to introduce complicated concepts such as evaluation strategies, parametric polymorphism, and structural pattern matching, so you are better equipped to understand the Python constructs I have discussed. These concepts are not easy; you might need to read this notebook several times and follow the links interspersed throughout. 