# Lecture 4: comprehensions, scope, custom types, program design

The first half of this lecture covers a few assorted topics about the language:

- Comprehensions are a cool and useful way to create lists, sets and dictionaries
- Scope is about when variables are discarded from memory, and when they are accessible
- Custom types are about using the `class` construct to make new types, the building block of "Object Oriented" design.

In the second half I will give some pointers about how to design and write programs more systematically and reliably.

From the next lecture onward, I will introduce a bunch of *modules*: extensions to the language that you can use to easily do all kinds of scientific computing, and statistical and machine learning tasks.

## Great big overview of types and their properties

| type       | iterable | mutable |
| ---------- |:--------:|:-------:|
| `int`      |          |         |
| `float`    |          |         |
| `bool`     |          |         |
| `NoneType` |          |         |
| `str`      |    +     |         |
| `range`    |    +     |         |
| `tuple`    |    +     |         |
| `list`     |    +     |    +    |
| `set`      |    +     |    +    |
| `frozenset`|    +     |         |
| `dict`     |    +     |    +    |
| `generator`|    +     |         |


- Iterable types can be used in `for` loops and many type constructors like `tuple`, `list`, `set`.
- Mutable types can be changed, which can be tricky but which can also be very handy

- Mutable types cannot be stored in sets or frozensets, or as keys in a dictionary.
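To see the last rule in action, here is a small sketch (the helper name `can_be_set_element` is made up for this illustration). Python raises a `TypeError` when you try to put a value of a mutable built-in type into a set:

```python
# Try to put values of different types into a set.
# The helper name can_be_set_element is only for this illustration.
def can_be_set_element(value):
    try:
        {value}          # build a one-element set
        return True
    except TypeError:    # raised for mutable built-in values
        return False

print(can_be_set_element((1, 2)))             # tuple: immutable
print(can_be_set_element([1, 2]))             # list: mutable
print(can_be_set_element(frozenset({1, 2})))  # frozenset: immutable
print(can_be_set_element({1, 2}))             # set: mutable
```

This prints `True`, `False`, `True`, `False`, matching the "mutable" column of the table.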


## Python awesomeness: Comprehensions and generators

Python offers a particularly convenient and powerful way of constructing values of its major data types, which may be the main reason for Python's popularity: *comprehensions*.

Comprehensions can be used to create lists, sets and dictionaries, but strangely, *not tuples*. If you try to create a tuple using a comprehension you will end up with something else instead; more about that below.

Comprehensions look like this:


| comprehension                         | result      |
| ------------------------------------- | ----------- |
| `[<expr>       <for and if clauses>]` | `list`      |
| `{<expr>       <for and if clauses>}` | `set`       |
| `{<expr:expr>  <for and if clauses>}` | `dict`      |
| `(<expr>       <for and if clauses>)` | `generator` |

So it looks very much like the normal way of creating these values, except that a comprehension includes at least one `for` clause and potentially some `if` clauses as well. To see how this works, let's try out list comprehensions first:

**List comprehensions:**

In [1]:
[ "hello" for i in range(3) ]

['hello', 'hello', 'hello']

The expression is evaluated for each value of `i` specified in the `for`-clause. In fact, the expression may also involve the variables of the `for` clause:

In [2]:
# Exercise: what list is created here?

[ i*i for i in range(8) ]

[0, 1, 4, 9, 16, 25, 36, 49]

And finally, `if` clauses can be used to filter the resulting list, excluding some results. An `if` clause can involve any variable defined in a `for` clause to its left, so this works:

In [3]:
# Exercise: what will this do? Why?

[ i for i in range(20) if i%10<5 ]

[0, 1, 2, 3, 4, 10, 11, 12, 13, 14]

In [4]:
# Exercise: what will this do? Why?

[ i if i%10<5 for i in range(20) ]

SyntaxError: invalid syntax (<ipython-input-4-97d713137223>, line 3)

Note: comprehensions are a special syntax. **`for` and `if` used in a comprehension are not the same as `for` and `if` used normally.** For example `break` and `continue` cannot be used here.
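As a rough mental model (a sketch, not a description of how Python implements comprehensions), the filtering comprehension above builds the same list as the loop below. The second part shows what the failing example may have been reaching for: an `if` placed before the `for` clause has to be a complete conditional expression with an `else` part. The replacement value `-1` is an arbitrary choice for this illustration.

```python
# Loop that builds the same list as [ i for i in range(20) if i%10<5 ]
result = []
for i in range(20):
    if i % 10 < 5:
        result.append(i)

print(result == [i for i in range(20) if i % 10 < 5])

# A conditional expression (with else) is allowed in front of the for clause.
# It does not filter: every i produces a value. The -1 is an arbitrary placeholder.
replaced = [i if i % 10 < 5 else -1 for i in range(10)]
print(replaced)
```

The first `print` shows `True`; the second shows `[0, 1, 2, 3, 4, -1, -1, -1, -1, -1]`.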

**Comprehensions of sets and dictionaries**

These work exactly like list comprehensions, here are some examples:

In [5]:
# Example: find the set of numbers below 100 that can *not* be
#          represented as 3x+5y for non-negative integers x,y

everything = set(range(100))

# build using set comprehension
canMake = { 3*x+5*y for x in range(100//3) for y in range(100//5) }

cannotMake = everything - canMake # set minus!

cannotMake

{1, 2, 4, 7}

In [6]:
# Example: make a dictionary with the square roots of the first 10 squares.

roots = { i*i : i for i in range(1,11) }

print("root of 36 is", roots[36])
roots

root of 36 is 6


{1: 1, 4: 2, 9: 3, 16: 4, 25: 5, 36: 6, 49: 7, 64: 8, 81: 9, 100: 10}

In [7]:
"steven"

'steven'

In [8]:
[ "steven"[i:] for i in range(6) ]

['steven', 'teven', 'even', 'ven', 'en', 'n']

**Comprehension that makes a generator**

This is the odd one out: it looks as if it would create a tuple, but instead it makes an iterable value of a type called `generator`. It is similar to a range in that you can use it in a `for` loop and anything else that can work with an iterator.

The advantage of a generator is that it does not create the entire sequence of values immediately. Rather they are produced one by one, and so they don't all have to take up memory space simultaneously, which can be handy.

The Python designers thought that this would be useful to have and kicked out tuple comprehensions in favour of generators.

You can combine it with a tuple type constructor if you want to build the actual tuple.

In [9]:
g = ( i*i for i in range(10) ) # make some squares

g

<generator object <genexpr> at 0x10d783c50>

In [10]:
# Create the corresponding tuple
tuple(g)

(0, 1, 4, 9, 16, 25, 36, 49, 64, 81)

In [11]:
# The generator generates all its values only once!
tuple(g)

()

In [12]:
# You can also use it in a for loop:

# make a list with a bunch of numbers
a = [213,12,313,4]

# use a generator comprehension in a for-loop
# to iterate over the squares of the numbers in a
for sq in ( n*n for n in a ):
    print("Here is a square:", sq)

Here is a square: 45369
Here is a square: 144
Here is a square: 97969
Here is a square: 16


In the code above, notice that the list `a` is stored in memory, but even if that list is very long, the generator comprehension does not require a lot of additional memory.
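To get a rough feel for the difference, you can compare the size of a list with the size of a generator that would produce the same values. This is only a sketch: `sys.getsizeof` reports the size of the container object itself (not of the numbers stored in it), and the exact byte counts depend on your Python version, so none are shown here.

```python
import sys

n = 100_000

squares_list = [i*i for i in range(n)]   # all values exist in memory at once
squares_gen  = (i*i for i in range(n))   # values are produced on demand

print("list size in bytes:     ", sys.getsizeof(squares_list))
print("generator size in bytes:", sys.getsizeof(squares_gen))

# Both give the same sum, but the generator is used up afterwards
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)
print(sum(squares_gen))   # the generator is exhausted, so this sums nothing
```

The last two lines print `True` and `0`.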

## Scope and lifetime

In all programming languages, all variables have a *scope* and a *lifetime*. These concepts are related but not the same:

- The *scope* of a variable is the part of the program from which it can be accessed.
- The *lifetime* of a variable is the time interval during which the variable is defined.

One might guess that a variable would remain in existence indefinitely, and that it would be accessible from anywhere. That is in fact the case for global variables.

In [13]:
test1 = "I exist" # global variable.
test1

'I exist'

The variable `test1` is global: its value will exist until the Python kernel is killed or reset, or until the variable is explicitly reassigned or deleted using `del`. Until that time, its value can be accessed from anywhere in the program.

However, variables that are defined *inside a function definition* can only be accessed from inside that function definition, and they are destroyed once the function execution is done:

In [14]:
def f():
    test2 = "hello, world!"
    print(test2)

f()
test2

hello, world!


NameError: name 'test2' is not defined

So, the variable `test2` was *local* to the function `f`. Its scope is restricted to the body of the function, and its lifetime ends when the function returns.

Keeping variables local helps with the readability of your program! The fewer variables you have to keep in your head at the same time, the better.

If you have a global and a local variable of the same name, then the local variable takes precedence inside the function:

In [15]:
test3 = "global"

def f():
    test3 = "local"
    print("Value inside the function:", test3)

f()
print("Value outside the function:", test3)

Value inside the function: local
Value outside the function: global


If you want to assign a value to a global variable from within a function, you can do so using the keyword `global`:

In [16]:
def f():
    global a
    a = 3
    
f()
a

3
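A common stumbling block, sketched below with made-up names (`counter`, `increment_wrong`, `increment_right`): as soon as a function assigns to a name anywhere in its body, Python treats that name as local throughout the function, unless it is declared `global`.

```python
counter = 0   # global variable

def increment_wrong():
    # The assignment makes 'counter' local to this function, so reading it
    # on the right-hand side fails: the local variable has no value yet.
    counter = counter + 1

def increment_right():
    global counter        # refer to the global variable instead
    counter = counter + 1

try:
    increment_wrong()
except UnboundLocalError as error:
    print("Error:", type(error).__name__)

increment_right()
print(counter)
```

This prints `Error: UnboundLocalError`, followed by `1`.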

## Creating your own types with `class`.

Python allows you to create your own type. This is a key component in so-called "Object Oriented" program design, which is a programming philosophy that has dominated software engineering for the last 25 years or so.

Object oriented design can be useful, but it can also be confusing. It is also a very large topic that cannot be done justice in an initial programming course.

Most of Python's extension libraries, however, export new types that have been created using the `class` keyword. So it will be useful to have a brief look at how these new types are created, and more importantly, how they are used.

In [19]:
# Define a new custom made type, called "test".

class test:
    pass

We can now use the name of the class, `test`, as a function to create values of the new type. (Incidentally, this works for the built-in types too; that's why you can construct strings with `str`, tuples with `tuple`, and so on.)

In [20]:
t1 = test() # create object t1 of type test
t2 = test() # create object t2 of type test

print("Equal according to ==?", t1 == t2)
print("Equal according to is?", t1 is t2)
print("Type:", type(t1))

Equal according to ==? False
Equal according to is? False
Type: <class '__main__.test'>


You can add data fields to objects using `.` as follows:

```<object>.<field_name> = <value>```

So an object is a little bit like a dictionary: it can remember names with associated values. The difference is that the `field_name` is always spelled out literally in your program, while a key in a dictionary may come from user input.

Examples:

In [21]:
t1.aardvark = 13
t2.aardvark = "boo"

print(t1.aardvark)
print(t2.aardvark)

print(t1)

13
boo
<__main__.test object at 0x10d897128>
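The similarity with dictionaries is more than skin deep. As a sketch (the class `Animal` and its field names are invented for this example), the built-in function `vars` returns the dictionary in which an ordinary object keeps its data fields, and `getattr` looks up a field whose name is given as a string:

```python
class Animal:
    pass

a = Animal()
a.name = "aardvark"
a.legs = 4

print(vars(a))             # the data fields, as a dictionary
print(getattr(a, "legs"))  # same as a.legs, but the name is a string
```

This prints `{'name': 'aardvark', 'legs': 4}` and `4`.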


So, objects allow you to bundle some information into a nice little package. The power of classes is that you can also define *methods*: those are functions that are supposed to do something with objects of that class.

Methods are defined in the class definition. Let's change the class specification to add a method. The first input of a method is always the object for which it's invoked. It's customary to call that input `self`.

If you have an object, you can call its methods using the same syntax with `.` that's also used to access data fields.

In [22]:
# redefine the class 'test'; equip it with a method.

class test:
    
    def message(self, arg):
        print("The method 'message' got called with two inputs:")
        print("- self (", self, "), the object on which the method was called", sep="")
        print("- arg  (", arg, "), the second argument", sep="")
        
t1 = test()         # create object t1 of type test
t1.message("zebra") # call method on t1

The method 'message' got called with two inputs:
- self (<__main__.test object at 0x10d897470>), the object on which the method was called
- arg  (zebra), the second argument
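To make the role of `self` concrete: calling a method on an object is shorthand for calling the function stored in the class, with the object passed as the first input. A small sketch with a made-up class `Greeter`:

```python
class Greeter:
    def describe(self, arg):
        return "called on a " + type(self).__name__ + " with " + arg

g = Greeter()

# These two calls do the same thing:
print(g.describe("zebra"))
print(Greeter.describe(g, "zebra"))
```

Both lines print `called on a Greeter with zebra`.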


Often, methods will *do something with the data fields of the object*, which they can do using the `self` input, like so:

In [23]:
class test:
    
    def setname(self, name):
        self.name = name # copy the name into a data field of the object
    
    def greet(self):
        print("Why hello there, ", self.name, "!", sep="")

        
t1 = test()           # create object t1 of type test
t1.setname("Steven")  # call the method "setname" on t1
t1.greet()            # call the method "greet" on t1

Why hello there, Steven!


Sometimes, it's convenient to initialise the data fields of the object the moment that the object is created, instead of later. To do so, you can add a special method `__init__` (four underscores), which is known as the *constructor*. The *constructor* always gets called when the object is created, and is given any inputs that you supply when you construct it.

The code below does the same as above, but instead of a `setname` method, the name now gets set in a constructor method.

In [24]:
class test:
    
    def __init__(self, name, age):
        self.name = name # copy the name into a data field of the object
        self.age = age   # as well as the age
    
    def greet(self):
        print("Why hello there, ", self.name, "! My, ", self.age,"? That's ancient!", sep="")

        
t1 = test("Steven", 123) # create object t1, with two extra inputs to the constructor
t1.greet()               # call the method "greet" on t1.

Why hello there, Steven! My, 123? That's ancient!


## Program design

**Before any code is typed in**
1. How would you solve the problem by hand? First give a high level description. Type these as comments of your program.
2. Refine your description. Is every step precise?
3. What inputs should be allowed? Are there boundary cases?
4. What kinds of data are you dealing with when you solve it by hand? Write down comments detailing these "kinds".
5. What do you *do* with the data when you solve it by hand? What operations do you perform?
6. Choose appropriate ways to represent data (low level)
7. Write the specifications of the functions that you will need (high and low level)
   - Documentation
   - Naming of functions
   - Spaghetti code vs Separation of concerns
   - Bottom up approach: for each kind of data, specify what it can do
   - Top down approach: start with a description of the complete problem. Break it down in chunks. Write a function for each chunk.
  
**Writing the actual program**  
1. Write tests to check appropriate behaviour of each function, also for boundary cases
2. Implement the functions, use assertions
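
As an illustration of these two steps, here is a sketch for a made-up function `mean` (it is not part of the lecture's programs). The tests are written from the specification and include boundary cases; the assertion inside the function documents and checks an assumption about its input.

```python
# "mean" function:
# in:  a non-empty list of numbers
# out: their average
def mean(numbers):
    assert len(numbers) > 0, "mean needs at least one number"
    return sum(numbers) / len(numbers)

def test_mean():
    assert mean([1, 2, 3]) == 2
    assert mean([5]) == 5          # boundary case: a single number
    assert mean([-1, 1]) == 0      # boundary case: negative numbers
    try:
        mean([])                   # boundary case: empty input is not allowed
    except AssertionError:
        pass
    else:
        raise AssertionError("mean([]) should have been rejected")
    print("All tests passed")

test_mean()
```

Running this prints `All tests passed`. If a test fails, the `assert` statement stops the program and points at the line that went wrong.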

**Reality check**

Do programmers actually work this way? To a large extent yes, they do. However:

1. There are fads: test driven development, extreme programming, object oriented design, agile software development. These contain useful ideas but are often exaggerated and followed religiously.
2. Programmers develop personal styles and that's okay, as long as it's clear for other people.
3. Documentation is important but if you do it too early, it can be a lot of wasted effort, so there is an art to how and when you document.
4. Refactoring (rewriting code to make it better) is important. But again, if you do it too early, it can be a lot of wasted effort. Do it when making changes starts to be hindered by the quirks of code you wrote earlier.
5. Good idea: write incrementally: keep working towards a program that is fully functional and can be tested.


# Demo: Conway's Game of Life

John Conway invented this game as a model of population dynamics, in 1970. Idea: represent the world as an infinite grid. Every grid cell can be "alive" (contain a creature) or "dead". Once an initial configuration has been determined, the grid proceeds to evolve according to fixed rules:

* Every cell that is currently alive will survive into the next generation if exactly two or three of its neighbours are alive. If it has fewer live neighbours, it dies of underpopulation; if it has more, it dies of overcrowding.

* Every cell that is currently dead will come alive if exactly three of its neighbours are alive.

The game is interesting because with these simple rules, incredibly complex behaviours can be observed. To practice designing programs, we will implement Conway's game of life in Python.

First, let's try and work out how the game evolves by hand. Consider this starting configuration.

$$\begin{array}{cccc}
-&O&-&-\\
-&-&O&-\\
O&O&O&-\\
-&-&-&-\\
\end{array}$$

Since the rules depend on the number of neighbouring cells that are alive, let's count those:

$$\begin{array}{cccc}
1&1&2&1\\
3&5&3&2\\
1&3&2&2\\
2&3&2&1\\
\end{array}$$

Cells that were not alive but that have three neighbours will come alive (+); live cells that have fewer than 2 or more than 3 neighbours die out (X).

$$\begin{array}{cccc}
-&X&-&-\\
+&-&O&-\\
X&O&O&-\\
-&+&-&-\\
\end{array}$$

New situation:

$$\begin{array}{cccc}
-&-&-&-\\
O&-&O&-\\
-&O&O&-\\
-&O&-&-\\
\end{array}$$


So, some observations:
- The most important step was counting the live neighbours of every cell, both the cells that were alive and the dead cells next to them.
- Once we've counted the neighbours, it's not hard to determine the new configuration.

Let's turn this into a tentative algorithm:

In [None]:
# Game of Life

# in : a bunch of cells that are alive
# out: those cells that will be alive in the next generation.

# method:
# - for all live cells:
#   - count how many live neighbours it has
#   - if it has less than two or more than three live neighbours, kill it
#   - otherwise, leave it alive
# - for cells that are currently not alive:
#   - count how many live neighbours it has
#   - if it has exactly three live neighbours, then bring it alive

First question:
- This plan wants us to do something "for all cells that are not alive". There are infinitely many of those! How could we get around that?
- Are we going to place limitations on the size of the grid? Simply do work only for a limited region of the infinite grid?
- Can we get around this?

In [None]:
# Game of Life (mark 2)
#
# "cell" data definition:
# (x,y) integer tuple
#
# "configuration" data definition:
# a set of cells
#
# "step" function:
# in : a configuration
# out: the configuration of the next generation
# method:
# - for all live cells:
#   - count how many live neighbours it has
#   - if it has less than two or more than three live neighbours, kill it
#   - otherwise, leave it alive
# - for cells that are currently not alive but that have at least one
#     live neighbour:
#   - count how many live neighbours it has
#   - if it has exactly three live neighbours, then bring it alive

Some aspects to the algorithm are still a bit vague: how do we know what the neighbours of the live cells are? How do we count the number of neighbours of a cell?

In order to avoid spaghetti code, we will separate those subproblems out into separate functions as much as possible:

In [None]:
# "count_neighbours" function:
# in:   a cell, and a configuration
# out:  the number of neighbours that are in the configuration.
# note: the cell itself does NOT count!


# "neighbours" function:
# in:  a configuration
# out: a set of cells that are not in the configuration, but that are
#      neighbours of a cell in the configuration


# "step" function:
# in : a configuration
# out: the configuration of the next generation
# - for cells in the configuration (live cells):
#   - count how many live neighbours it has using count_neighbours
#   - if it has less than two or more than three, kill it
#   - otherwise leave it alive
# - for dead cells next to the configuration (obtained using neighbours):
#   - count how many live neighbours it has using count_neighbours
#   - if it has exactly three, then bring it alive


This is a pretty detailed description of a program. Before we start writing those functions, we will implement a test. In fact, depending on your testing philosophy and patience, we should test all three functions. But we will only test the main function "step", by checking if it works correctly on the input grid from the start:

In [32]:
# "test_step" test function:
# verify that the next generation of the pattern described above
# is computed correctly by the "step" function.

GLIDER = { (1,0), (2,1), (2,2), (1,2), (0,2) }

def test_step():
    correct_output = { (0,1), (2,1), (1,2), (2,2), (1,3) }
    actual_output = step(GLIDER)
    if correct_output == actual_output:
        print("Test succeeded! Hurray!")
    else:
        print("Test failed: should be", correct_output, "got:", actual_output)


We were lazy about testing the other functions. However we will write tests for the other functions if we run into trouble later: that way we will get a more robust program *and* find out more about what goes wrong!

Okay. Now it's time to put everything together:

In [38]:
# "count_neighbours" function:
# in:   a cell, and a configuration
# out:  the number of neighbours that are in the configuration.
# note: the cell itself does NOT count!
def count_neighbours(cell, config):
    around = {(cell[0]+x, cell[1]+y) for x in (-1,0,1) for y in (-1,0,1) if not(x==0 and y==0)}
    alive = config & around   # set intersection: the neighbours that are alive
    return len(alive)

# "neighbours" function:
# in:  a configuration
# out: a set of cells that are not in the configuration, but that are
#      neighbours of a cell in the configuration
def neighbours(config):
    around = {(cell[0]+x, cell[1]+y) for cell in config for x in (-1,0,1) for y in (-1,0,1)}
    dead_neighbours = around - config
    return dead_neighbours

# "step" function:
# in : a configuration
# out: the configuration of the next generation
# - for cells in the configuration (live cells):
#   - count how many live neighbours it has using count_neighbours
#   - if it has less than two or more than three, kill it
#   - otherwise leave it alive
# - for dead cells next to the configuration (obtained using neighbours):
#   - count how many live neighbours it has using count_neighbours
#   - if it has exactly three, then bring it alive
def step(config):

    stay_alive = set()
    for live_cell in config:
        if count_neighbours(live_cell, config) in (2,3):
            stay_alive.add(live_cell)

    come_alive = set()
    for dead_cell in neighbours(config):
        if count_neighbours(dead_cell, config) == 3:
            come_alive.add(dead_cell)

    return stay_alive | come_alive

test_step()

Test succeeded! Hurray!
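
As an independent cross-check of `step`, here is a compact alternative written with `collections.Counter`. This is a different technique from the one used above, and the names `step_counter` and `shifted` are made up for this sketch. It relies on a known property of the glider: after four generations the same shape reappears, moved one cell diagonally.

```python
from collections import Counter

GLIDER = { (1,0), (2,1), (2,2), (1,2), (0,2) }

# "step_counter" function:
# in : a configuration
# out: the configuration of the next generation
def step_counter(config):
    # For every cell next to a live cell, count its live neighbours
    counts = Counter((x+dx, y+dy)
                     for (x, y) in config
                     for dx in (-1, 0, 1)
                     for dy in (-1, 0, 1)
                     if not (dx == 0 and dy == 0))
    # Alive next: exactly three live neighbours, or two if already alive
    return { cell for cell, n in counts.items()
             if n == 3 or (n == 2 and cell in config) }

# The first generation should match the one we worked out by hand
print(step_counter(GLIDER) == { (0,1), (2,1), (1,2), (2,2), (1,3) })

# After four generations the glider has moved by (1,1)
config = GLIDER
for generation in range(4):
    config = step_counter(config)

shifted = { (x+1, y+1) for (x, y) in GLIDER }
print(config == shifted)
```

Both checks print `True`. Having two implementations that were written differently agree with each other is a cheap way to gain confidence in both.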

Let's visualise the results using matplotlib. We're skipping how this works for now; more about matplotlib next week.

In [None]:
# Visualisation code. Feel free to not look at this.

# Import the libraries required for animation
import numpy as np
import matplotlib.pyplot as plt 
import matplotlib.animation as animation

# Define the size of the viewport; ALIVE will hold the current configuration
DIM = 100,100
ALIVE = None

# Render the live cells into a DIM-sized array of colour values
# that can be displayed by matshow.
def render(alive):
    m = np.zeros(DIM)
    for x,y in alive:
        cell_x = x + DIM[0]//2
        cell_y = y + DIM[1]//2
        if 0 <= cell_x < DIM[0] and 0 <= cell_y < DIM[1]:
            m[cell_y, cell_x] = 255
    return m

# Every frame, the ALIVE set is updated according to the rules
# of Game of Life. It is then rendered in the matrix that is being
# displayed.
def update(it, mat):
    global ALIVE
    ALIVE = step(ALIVE)
    mat.set_data(render(ALIVE))

# Some library calls to set up the animation. Don't worry about it.
def start_animation():
    fig,ax = plt.subplots()
    mat = ax.matshow(render(ALIVE), cmap=plt.cm.gray)
    ani = animation.FuncAnimation(fig, update, interval=150,
                                  save_count=150, fargs=[mat])
    plt.show()


In [None]:
ALIVE = GLIDER
start_animation()

It works! Now let's plug in one of the more interesting patterns in Game of Life. To define it easily, I use a multi-line string, which I haven't told you about yet, but you can look up how it works. (Even better would be to read such a pattern from a text file.)

In [34]:
GLIDERGUN_PATTERN = """
........................*...........
......................*.*...........
............**......**............**
...........*...*....**............**
**........*.....*...**..............
**........*...*.**....*.*...........
..........*.....*.......*...........
...........*...*....................
............**......................
"""

print(GLIDERGUN_PATTERN)


........................*...........
......................*.*...........
............**......**............**
...........*...*....**............**
**........*.....*...**..............
**........*...*.**....*.*...........
..........*.....*.......*...........
...........*...*....................
............**......................



We need to convert the pattern to a configuration. Let's write the function. Again, we should break it down like before: how do we implement it by hand?

In [None]:
# method:
# - Find out the width and height of the pattern
# - Iterate over all coordinates in the rectangle
# - If at those coordinates, the pattern contains a '*', add it to the set

Now, how do we find out the width and height of the pattern?

In [None]:
# "convert" function:
# in:  a string describing a game of life pattern (each line is a row)
# out: the configuration, with (0,0) being the top left of the pattern
# method:
# - Break the pattern into a list of its constituent lines. Drop empty lines.
# - The length of this list is the height of the pattern.
# - The length of the string at index 0 is the width of the pattern
# - (we could verify that all rows have the same length if we wanted to)
# - (let's add this as a requirement to the specification for now)
# - Iterate over all coordinates in the rectangle
# - If at those coordinates, the pattern contains a '*', add it to the set

def convert(grid):
    lines = [ line for line in grid.split("\n") if len(line)>0 ] 
    (width, height) = (len(lines[0]), len(lines))
    return { (x,y) for x in range(width) for y in range(height) if lines[y][x]=='*' }

convert(GLIDERGUN_PATTERN)
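
The specification mentions that we could verify that all rows have the same length. Here is a sketch of what that might look like, under the made-up name `convert_checked`. It uses assertions, which is one of several reasonable ways to report the problem:

```python
# "convert_checked" function: like convert, but rejects ragged patterns
def convert_checked(grid):
    lines = [ line for line in grid.split("\n") if len(line)>0 ]
    assert len(lines) > 0, "the pattern is empty"
    width = len(lines[0])
    assert all(len(line) == width for line in lines), "rows differ in length"
    return { (x,y) for y, line in enumerate(lines)
                   for x, char in enumerate(line) if char == '*' }

print(convert_checked("""
.*.
..*
***
""") == { (1,0), (2,1), (0,2), (1,2), (2,2) })
```

This prints `True`. A pattern whose rows differ in length now stops with an `AssertionError` instead of an `IndexError` somewhere inside the comprehension.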

Okay! So how will we check that this function actually works?

In [30]:
GLIDER_PATTERN = """
.*.
..*
***
"""

def test_glider():
    glider_config = convert(GLIDER_PATTERN)
    if glider_config == GLIDER:
        print("Test succeeded! Yessss!")
    else:
        print("Test failed :-( I got", glider_config, ", while it should be", GLIDER)

In [31]:
ALIVE = convert(GLIDERGUN_PATTERN)
start_animation()


In [27]:
OSCILLATOR_PATTERN = """
..*....*..
**.****.**
..*....*..
"""

ALIVE = convert(OSCILLATOR_PATTERN)
start_animation()


In [28]:
SPACESHIP_PATTERN = """
...*.
....*
*...*
.****
"""

ALIVE = convert(SPACESHIP_PATTERN)
start_animation()


In [29]:
import random
ALIVE = set()
for i in range(1000):
    ALIVE.add((random.randrange(-50,50), random.randrange(-50,50)))

start_animation()
