# Notes about this lecture
Because of the coronavirus outbreak, this lecture will not be held in the classroom but online only. Furthermore, the lecture will be available only in this written form. To support the students, we will use the GitLab issue tracker as a question & answer forum: https://git.ee.ethz.ch/python-for-engineers/class-fs20-forum.

## Software

### Necessary software
Please install the following tools:
* python3 (https://www.python.org/downloads/); version 3.8.2 is fine. Python is a prerequisite for jupyter.
* jupyter-notebook (https://jupyter.org/install.html)
* **Hint for Windows and OSX**: Try to install conda or miniconda (https://docs.conda.io/en/latest/miniconda.html) first. This will install Python and jupyter-notebook automatically.

### Optional (but highly recommended) software
* git (https://git-scm.com/download/). Git is harder to install but not strictly necessary. **Hint**: On Windows Git will automatically install a Linux compatible shell which can then be found as 'Git BASH'.
* If git is not available, solutions shall be uploaded on https://polybox.ethz.ch instead and the folder shall be shared with the lecturers. 

## Support
**For any issues please use the forum** at: https://git.ee.ethz.ch/python-for-engineers/class-fs20-forum and follow the instructions therein. If needed, we will open a room on https://jitsi.riot.im/ and share audio, video or the screen, so make sure you have a working microphone and speakers.

This service is offered only **during the normal lecture hours**.

# Obtaining the material for this lecture
### If git is available on your system (preferred option)
Pull the new material from the upstream repository:

```bash
cd class-fs20
git pull upstream master
```

Then launch the jupyter-notebook and open the Lecture_04 file:

```bash
anaconda # Only on ETH computers to load the Python environment.
jupyter-notebook &
```

### If git is **not** available on your system
Download the latest material from:
https://git.ee.ethz.ch/python-for-engineers/class-fs20/-/archive/master/class-fs20-master.zip
and unpack it on your computer.

# Summary of lecture 3 (Python basics 2)
Before starting with new material, let's refresh the content of the last lecture.

* Functions
* Modules
* Error Handling

## Advanced function syntax
The following shows a simple example of a function definition in Python:

In [None]:
# Define a function with two arguments.
def my_function_name(some_argument1, some_argument2):
    some_computation_result = some_argument1 + some_argument2
    
    # Pass back a value to the caller.
    return some_computation_result

Functions are called by their name, with the arguments passed in parentheses:

In [None]:
# Call the function.
result = my_function_name(1, 2)
print('result =', result)

### Default argument values

Function arguments can have default values that will be used if the argument is omitted in the function call.
Note that arguments with default values must be defined after arguments without default values.

In [None]:
def function_with_default_values(a, b = 1, c = 2):
    print('a =', a)
    print('b =', b)
    print('c =', c)
    print()

# The arguments with default values can be fully omitted.
function_with_default_values('test 1')

# The arguments with default values can be partially omitted.
function_with_default_values('test 2', 7) # c defaults to 2.

# The arguments with default values can be overwritten.
function_with_default_values('test 3', 7, 77)
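
The ordering rule for default values can be checked without interrupting the notebook. The following sketch passes a deliberately invalid definition to the built-in `compile()` function and catches the resulting `SyntaxError`. The function name `f` and the variable names are only illustrative.

```python
# A definition that places an argument with a default value
# *before* an argument without one is rejected by Python.
invalid_source = "def f(a=1, b):\n    pass\n"

try:
    compile(invalid_source, "<example>", "exec")
    definition_is_valid = True
except SyntaxError as err:
    definition_is_valid = False
    print("SyntaxError:", err.msg)

print("Definition accepted:", definition_is_valid)
```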

### Positional arguments, keyword arguments and mixed arguments

* Arguments passed to the function *without* their name are called **positional arguments** because their position determines the assignment.
* Arguments passed to the function *with* their name are called **keyword arguments**.

Keyword arguments allow arguments to be passed in a *different order* than in the function definition.

In [None]:
# Passing keyword arguments with a different 
# order than in the function definition:
function_with_default_values(c = 3, a = 1, b = 2)

It is possible to **mix** *positional* and *keyword* arguments. However, all positional arguments must be listed *before* any keyword arguments:

In [None]:
# Mix one positional and two keyword arguments.
function_with_default_values(1, c = 3, b = 2) # The first argument is mapped to `a`.

### Local and global variables
When a function is defined in Python, a *local* namespace is created. All variables defined inside the body of the function, as well as the arguments of the function, belong to this *local* namespace. These variables are therefore not visible from outside the function, as shown below:

In [None]:
def example_of_local_namespace():
    my_local_variable = 33
    print('Function executed correctly.')
    
# Main program:
example_of_local_namespace()
try:
    print(my_local_variable)
except NameError:
    print('Error.')

When a function is executed and a variable is referenced, the variable is first searched for in the *local* namespace of the function. If it is not found there, it is searched for in the *global* namespace, which consists of the variables defined outside of the function. The lookup happens when the function is called, not when it is defined, as the following example illustrates:

In [None]:
def my_function():
    try:
        print(something)
    except NameError:
        print('Error.')
        
# main program:
my_function()
something = 37
my_function()
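
The lookup order also means that a local variable hides a global variable of the same name. A minimal sketch, using the illustrative names `value`, `read_global` and `shadow_global`:

```python
value = 'global'

def read_global():
    # No local variable named `value` exists here,
    # so the lookup falls back to the global namespace.
    return value

def shadow_global():
    # This assignment creates a new *local* variable.
    # The global variable of the same name is not modified.
    value = 'local'
    return value

print(read_global())
print(shadow_global())
print(value)  # The global variable is unchanged.
```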

### The `global` keyword
To define a global variable from within a function, the keyword `global` can be used. Using global variables, however, is often considered bad programming practice and should therefore be avoided when possible.

Example:

In [None]:
def input_function():
    global x
    x = input('Please insert an integer: ')
        
        
def print_function():
    print(x)
    
# Main program:
input_function()
print(x)  # this shows that the variable 'x' is available inside the main program
print_function()  # this shows that the variable 'x' is available inside other functions too
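
One common way to avoid `global` is to pass data through arguments and return values. The following sketch uses the illustrative function names `parse_integer` and `describe`, and a fixed string stands in for the user input so that the cell runs without interaction:

```python
def parse_integer(text):
    # Convert the text and hand the result back to the caller.
    return int(text)

def describe(number):
    # The value arrives as an argument instead of a global variable.
    return 'The number is {:d}.'.format(number)

# Main program:
number = parse_integer('42')  # A fixed string replaces `input(...)` here.
message = describe(number)
print(message)
```

Each function now depends only on its arguments, which makes it easier to test and reuse.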

### Python's default return value: `None`
If the code in a function terminates without reaching a `return` statement, then `None` is returned. `None` is an object of type `NoneType`. Its main purpose is to indicate the absence of a value in a well-defined way. (Java, C or C++ programmers might be familiar with the `null` pointer or `NULL`, which is used for a similar purpose in those languages.)

In [None]:
def greet(name):
    print("Hello, {:s}!".format(name))
    
value = greet("students")

print('returned value: ', value)

To check if a variable equals `None` the `is` operator should be used instead of `==`.

This is because a class can re-define the `==` operator and therefore `x == None` can in principle lead to arbitrary results. The `is` operator, instead, cannot be re-defined.

In [None]:
if value is None:
    print('value is none...')
else:
    print('value is NOT none...')
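
The difference between `==` and `is` can be made visible with a class that re-defines `==`. The class `AlwaysEqual` below is a hypothetical example written only for this illustration:

```python
class AlwaysEqual:
    """Hypothetical class whose `==` operator claims equality with everything."""

    def __eq__(self, other):
        return True

obj = AlwaysEqual()

# `==` calls `AlwaysEqual.__eq__` and therefore reports equality.
print('obj == None:', obj == None)

# `is` compares object identity and cannot be re-defined.
print('obj is None:', obj is None)
```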

### Returning more than one value
The preferred way to return more than one value is to return a tuple.

In [None]:
def return_multiple_things():
    return 'X', 1, 1.0  # Return a tuple.


# The returned tuple is unpacked
# into the variables on the left.
retval_1, retval_2, retval_3 = return_multiple_things()  
print(retval_1, 'is a', type(retval_1))
print(retval_2, 'is a', type(retval_2))
print(retval_3, 'is a', type(retval_3))

## Modules

[Documentation Link](https://docs.python.org/3/tutorial/modules.html)

**Modules** consist of separate Python files. One module can contain multiple functions. Module files can be ordered into folders, whose structure is reflected by the `import` statement.

Modules are meant to structure code and improve its reusability.

Modules can be imported in four different ways:
1. `from my_module import *` 
1. `from my_module import my_function_1, my_function_2`  (This method imports only a subset of the functions available in the module).
1. `import my_module`   (This method requires the module name to be written every time a function is called).
1. `import my_module as my_favourite_name`  (Like the previous, but renames the module).
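
As a small sketch, the import variants can be tried with the standard `math` module. The wildcard form is shown only as a comment, because `import *` is allowed only at module level and can silently overwrite existing names:

```python
# Variant 1 (wildcard), shown as a comment only:
# from math import *

# Variant 2: import selected names only.
from math import floor, ceil
print(floor(2.5), ceil(2.5))

# Variant 3: import the module; names need the module prefix.
import math
print(math.sqrt(16.0))

# Variant 4: import the module under a different name.
import math as m
print(m.cos(0.0))
```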

To resolve an `import` statement, Python searches a defined list of directories for a module with the requested name. These directories make up the Python *search path*, which is stored in the `sys.path` list.

In [None]:
import sys
sys.path

The official Python documentation has [a comprehensive guide](https://docs.python.org/3/library/index.html) to all built-in modules, which are ready to be used with any standard Python installation.

## Error and exception handling
In Python, the terms *error* and *exception* are often used interchangeably. In the following we treat the two terms as synonyms.

Errors that are not handled lead to an abrupt termination of a Python program. Errors can be caused, for example, by wrong syntax, division by zero or access to an undefined variable.

Because errors abort code execution, they are essentially another way of controlling the program flow. To prevent an error from terminating the whole program, Python allows errors to be 'caught' using `try` and `except` blocks. The code within the `try` block is executed first. If it raises an error, program execution jumps into the `except` block. This makes it possible to react in a meaningful way to the occurrence of an error.

Additionally, multiple `except` blocks can be used to handle different types of errors separately.

More information on errors can be found [here](https://docs.python.org/3/tutorial/errors.html).

A short example is given in the following:

In [None]:
try:
    x = int(input("Please enter a number: ")) # Here a `ValueError` can be raised if the input is not convertible into an int.
    y = 1/x # Here a `ZeroDivisionError` could be raised if `x` is 0.
    print('All went fine: 1/x =', y)
except ZeroDivisionError:
    print("Cannot divide by zero.")
except ValueError as err:  # Note the `as` statement.
    # The raised error object can be accessed by the `err` variable.
    print(type(err))
    print("Could not convert data to an integer.", err)
except Exception as e:
    # It is hard to trigger this exception in this example.
    print("Unexpected error:", e)

### Raising errors

Errors can be raised with the `raise` keyword. This is particularly useful for handling exceptional cases which do not allow the code to continue as intended. Raising an error enables the caller of the function to decide how to react.

In [1]:
def greet(name):
    if type(name) is not str:
        raise ValueError('Can only greet strings!')
    print('Hello, {:s}!'.format(name))
    
greet('students')
greet(1.1)

Hello, students!


ValueError: Can only greet strings!
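
How a caller can react to a raised error is sketched below. The function `greet_checked` is an illustrative variant that returns the greeting instead of printing it, and the fallback text is an arbitrary choice:

```python
def greet_checked(name):
    if type(name) is not str:
        raise ValueError('Can only greet strings!')
    return 'Hello, {:s}!'.format(name)

greetings = []
for candidate in ['students', 1.1]:
    try:
        greetings.append(greet_checked(candidate))
    except ValueError as err:
        # The caller decides to fall back to a generic greeting
        # instead of aborting the program.
        greetings.append('Hello, whoever you are! ({})'.format(err))

print(greetings)
```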

### Assertions

The above pattern of checking something and conditionally raising an error is very common. The `assert` statement can be used in such cases.

The `assert` statement has the following syntax:

```python
assert condition, 'Some error message.'
```

Only if the condition is *not* met (`False`) the assertion will trigger, and an `AssertionError` will be raised.

In [None]:
def greet(name):
    assert type(name) is str, 'Can only greet strings!'
    print('Hello, {:s}!'.format(name))
    
greet('students')
greet(1.1)

### How to use errors and assertions

As a rule-of-thumb errors can be divided into two classes.

The **first class** consists of implementation flaws or errors in the code, such as a function being called with wrong argument types. This type of error can be attributed to the programmers. In these cases correct error handling is difficult because the code itself is not correct. Often, the only reasonable way to deal with this is to abort the program and fix the code. `assert` statements are a good way to detect this type of error. As in the example above, it often makes sense to check the types of function arguments or to check that some value is not `None`.

The **second class** of errors does not stem from the code itself but from the environment. This includes, for example, wrong user inputs, wrong file formats, interrupted network connections or a file that does not exist.
These cases should be properly handled with `try` and `except` blocks.
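
A minimal sketch of handling an error of the second class is reading a file that may not exist. The function name `read_text_or_default` and the file name are assumptions made for this illustration:

```python
def read_text_or_default(path, default=''):
    try:
        with open(path, 'r') as file_stream:
            return file_stream.read()
    except FileNotFoundError:
        # The environment is at fault, not the code,
        # so the program reacts instead of aborting.
        print('File not found:', path)
        return default

content = read_text_or_default('this_file_does_not_exist.txt', default='(empty)')
print(content)
```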

### ✏️ $\mu$-exercise

At this point, please switch to your Exercise_04 notebook and complete $\mu$-exercises __1 and 2__.

# Python basics 3 - sets, list comprehensions & file input/output (IO)

## Sets

* [Official documentation](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset)

So far, this course introduced three container data types: `list`, `dict` and `tuple`. *Container* or *collection* data types hold data in a structured way. Each of the container types has its advantages and disadvantages. A list, for instance, provides a memory-efficient way to store many elements and offers constant-time access to elements when addressing them by their offset in the list. Membership tests, however, are very inefficient for lists: To test whether a certain element is in the list, in general the whole list has to be searched.

There is another built-in container data type in Python: `set`. A set in Python behaves very similarly to a set in the mathematical sense. In contrast to a list, a set neither defines an ordering of its elements nor allows an element to be in the set more than once.

**Observation**: internally, Python stores the elements of a set in a hash table, so they are kept in an order determined by their hash values. This ordering is an implementation detail and is not exposed to the user.

A set behaves very similarly to a dictionary with *keys* but without *values*. Moreover, the syntax to create sets is very similar to the one for dictionaries:

In [None]:
# The `{x, ...}` syntax defines a short-hand 
# notation for creating sets.
a = {1, 3, 2}
b = {3, 2, 1}

# Sets can be created from sequences using the `set()` function.
c = set([1, 3, 2])

# Sets allow an element only to appear once at most.
d = {1, 1, 1, 2, 3, 2, 3, 2, 3}

# Notice the order when printing the set! It might be 
# different from the order above.
# Sets are UNORDERED.
print(a)
print(b)
print(c)
print(d)

In [None]:
# An empty set has to be created with 
# the `set()` function because `{}` is already 
# used to define an empty dictionary.
empty_set = set()
print(empty_set)

print('Note: {} defines not a set but a', type({}))

In [None]:
# Sets are mutable which means elements can be added and removed.
# On average, adding an element or removing a given element takes
# constant time for a set. Removing a given element from a list
# requires searching the list first.
s = {1, 2, 3}
print(s)
s.add(4)
print(s)
s.remove(1)
print(s)

# Note that the printed sets look sorted. This should not be relied upon.
# Background: For the case of integers the reason 
# is that the hash function is trivially ordered: `hash(i) = i`.

In [None]:
# For floating point values this ordering disappears most likely:
floats = {1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8}
print('Note the ordering of:', floats)

In [None]:
# Sets allow membership to be checked very efficiently.
# Let's compare this experimentally: First we create 
# a huge list and a huge set with the same elements.
# Then we compare the time it takes to confirm that -1 
# is not in the list and not in the set, respectively.

# Import a module for measuring time.
import time
from random import seed
from random import randint

# Feel free to do multiple experiments with 
# different orders of magnitude for `num_elements`. 
# What do you observe?
num_elements = 2_000_000

seed(1) # Seed random number generator.
rand = [randint(0, 2**64-1) for i in range(num_elements)]

start = time.time()
l = [i for i in rand]
time_to_create_list = time.time() - start

seed(1)
start = time.time()
s = {i for i in rand} # Populate a set with the contents of the list.
time_to_create_set = time.time() - start

# Observation: The time it takes to copy the list of random 
# integers is short compared to the time it takes 
# to build a set from that list:

print('Creating the list is {:.4f} times faster than creating the set.'.format(time_to_create_set / time_to_create_list)) # It is bigger than 1.

In [None]:
# Check if -1 is in the list. -1 is not in the list.
# Therefore the whole list has to be 
# checked to be sure it is not there.

# Also measure the time it takes to search the list.
start = time.time()
result = -1 in l
end = time.time()

duration_list = end - start

print('Duration (list): {}s'.format(duration_list))

In [None]:
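# Check if -1 is in the set. In contrast
# to a list, this is much more efficient 
# and takes O(1) time on average (hash lookup).

# Also measure the time it takes to search the set.
# `time.perf_counter()` has a higher resolution than `time.time()`,
# which matters because this lookup is extremely fast.
start = time.perf_counter()
result = -1 in s
end = time.perf_counter()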

duration_set = end - start

print('Duration (set): {}s'.format(duration_set))

print('Searching in the set is {:.2f} times faster than in the list!'.format(duration_list/duration_set))

**Observation**: Membership tests are faster with sets than with lists, but creating a set takes more time.

## Accessing set elements
Like lists, sets are iterable which implies that a set can be looped over in a for loop.

In [None]:
# Iterate over elements in a set.
# Note that the order of iteration can be other than the order in the code.
s = {1.1, 1.2, 1.3, 1.4, 1.5, 1.6}
for e in s:
    print(e)

In [None]:
# The `pop` method removes and returns one element
# at a time from the set (undefined order!).
s = {1.1, 1.2, 1.3, 1.4, 1.5, 1.6}
print(s.pop())
print(s)
print(s.pop())
print(s)
print(s.pop())
print(s)

### Set operations
The main benefit of sets comes with boolean set operations: Unions, intersections, differences.

The set operations come in non-modifying and modifying versions. The non-modifying versions create a new set holding the result of the operation while leaving the original sets unchanged. The modifying operations change a set in-place.

The following code examples introduce the boolean set operations.

#### Non-modifying set operations

In [None]:
# Define two sets used for the following illustrations.
a = {1, 2}
b = {2, 3}

In [None]:
# Union
# Union finds the elements that are in `a` OR in `b` (or in both).
a_union_b = a.union(b)

# Union can be written using the `|` operator.
a_union_b = a | b

# By definition the union of `a` and `b` should now be a superset of both `a` and `b`.
assert a_union_b.issuperset(a)
assert a_union_b.issuperset(b)

print('Union: a | b = b | a = ', a_union_b)

In [None]:
# Intersection
# Intersection finds the elements that are in `a` AND `b`.
a_intersection_b = a.intersection(b)

# Intersection can be written using the `&` operator.
a_intersection_b = a & b

# By definition the intersection of `a` and `b` should be a subset of both `a` and `b`.
assert a_intersection_b.issubset(a)
assert a_intersection_b.issubset(b)

print('Intersection: a & b = b & a =', a_intersection_b)

In [None]:
# Difference
# Compared to the previous set operations this is not commutative!
a_difference_b = a.difference(b)

# Difference can be written using the `-` operator.
a_difference_b = a - b
b_difference_a = b - a

print('Difference: a - b = ', a_difference_b)
print('Difference: b - a = ', b_difference_a, ' != a - b')

In [None]:
# Symmetric Difference (Exclusive Or)
# Symmetric Difference finds the elements that are in EITHER `a` OR `b` (but not in both).
# a ^ b = (a | b) - (a & b)
a_symdiff_b = a.symmetric_difference(b)
# Symmetric Difference can be written using the `^` operator.
a_symdiff_b = a ^ b
print('Symmetric Difference: a ^ b = ', a_symdiff_b)

#### Modifying set operations

The set operations introduced before do not modify the input sets and create a new output set. These set operations also come in variants that update one of the input sets to hold the result.

In [None]:
# Union
a = {1, 2}
b = {2, 3}

# There are two variants of writing the modifying set union.
# One can also think of this as adding all elements in `b` to `a`.
# a |= b # Equivalent form of the following.
a.update(b)

print(a)

In [None]:
# Intersection
a = {1, 2}
b = {2, 3}

# There are two variants of writing the modifying set intersection.
# One can also think of removing all elements from `a` that are not in `b`.
# a &= b # Equivalent form of the following.
a.intersection_update(b)

print(a)

In [None]:
# Difference
a = {1, 2}
b = {2, 3}

# There are two variants of writing the modifying set difference.
# One can also think of removing all elements from `a` that are in `b`.
# a -= b # Equivalent form of the following.
a.difference_update(b)

print(a)

In [None]:
# Symmetric Difference
a = {1, 2}
b = {2, 3}

# There are two variants of writing the modifying symmetric difference.
# a ^= b # Equivalent form of the following.
a.symmetric_difference_update(b)

print(a)

### Modifying vs. non-modifying (optional)

Choosing a modifying set operation over a non-modifying one can be beneficial for example in the following cases:

* One of the sets is huge and only a small change will be made (such as removing or adding a few elements). In this case a non-modifying set operation has to make a copy of most of the elements, while the modifying operation only changes a few elements.
* A set should be passed to a function and should be modified in the function such that the modification is visible from outside of the function.
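
The second case can be sketched as follows. The function names `remove_negative_in_place` and `remove_negative_copy` are illustrative only:

```python
def remove_negative_in_place(numbers):
    # `difference_update` changes the set object that was passed in,
    # so the caller sees the modification.
    numbers.difference_update({n for n in numbers if n < 0})

def remove_negative_copy(numbers):
    # The `-` operator builds a new set; the caller's set stays unchanged.
    return numbers - {n for n in numbers if n < 0}

data = {-2, -1, 0, 1, 2}

result = remove_negative_copy(data)
print(len(data), len(result))  # `data` still holds all five elements.

remove_negative_in_place(data)
print(data == result)  # Now `data` itself has been modified.
```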

The following experiment investigates advantages of the different variants in terms of speed:

In [None]:
import time

# Run this experiment for different orders of magnitude of `num_elements`
# and observe the ratio of the two durations.
# Also try different set operations!
# Can you find a set operation where the non-modifying variant is much slower than the modifying variant?
# Caution: If `num_elements` is too large this could slow down your computer.
num_elements = 1_000_000
huge_set = set(range(num_elements))
tiny_set = {1, 2, 3}


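# `time.perf_counter()` is used because the modifying operation can be
# faster than the resolution of `time.time()` on some systems.
start = time.perf_counter()
result_nonmod = huge_set.difference(tiny_set)
end = time.perf_counter()
duration_nonmod = end - start

start = time.perf_counter()
huge_set.difference_update(tiny_set)
end = time.perf_counter()
duration_mod = end - start

# Both variants must produce the same set.
assert result_nonmod == huge_set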

print("Modifying operation was {:.4f} times faster than the non-modifying operation.".format(duration_nonmod/duration_mod))

### Background (advanced & optional): `__hash__` and `__eq__`
**Reminder (hash function):** A [hash function](https://en.wikipedia.org/wiki/Hash_function) is any function that can be used to map data of arbitrary size to fixed-size values. For example:

In [None]:
import hashlib
#help(hashlib)
h1 = hashlib.sha1(b'short text') # The `b` prefix creates a bytes object, which `sha1` requires.
h2 = hashlib.sha1(b'long long long long long long text') 
print(h1.hexdigest())
print(h2.hexdigest())

**Observation**: The hash functions used for sets and dicts are much simpler than `sha1`.

The default sets in Python are so-called 'hash-sets'. The name refers to the algorithms used to implement the efficient set data structure. For *dictionaries* the situation is very similar.

In order to be put into a set or a dict a data type must be 'hashable' (`__hash__` function must be implemented). Also the `==` operator must be implemented (`__eq__` function must be implemented).

If a type implements `__hash__` then the `hash()` function can be called on this type.
If a type implements `__eq__` then the `==` operator can be used for this type.

Having the equivalence operation defined is necessary for sets. Otherwise it is not possible to perform any set operations in a meaningful way. The hash function, however, is specific to the algorithms used to implement efficient set operations. More information on the implementation of hash sets and 'hash tables' (= dicts) can be found online (https://en.wikipedia.org/wiki/Hash_table).


In [None]:
# Mutable collections such as list, dict and set do NOT implement __hash__.
# Therefore this FAILS with a `TypeError` (would work for a tuple, though):
my_list = [1, 2, 3]
s = {my_list}
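
For comparison, the following sketch shows a user-defined type that implements both `__hash__` and `__eq__` and can therefore be stored in a set. The class `Point` is a hypothetical example written for this illustration:

```python
class Point:
    """Hypothetical example class that can be stored in sets and dicts."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal points must produce equal hashes, so the hash is
        # computed from the same data that `__eq__` compares.
        return hash((self.x, self.y))

# The two equal points are stored only once.
points = {Point(1, 2), Point(1, 2), Point(3, 4)}
print(len(points))

# A tuple is immutable and hashable, so it can be a set element.
tuples = {(1, 2, 3)}
print((1, 2, 3) in tuples)
```

Objects stored in a set should not be changed afterwards in a way that alters their hash value, otherwise they can no longer be found in the set.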

## Comprehensions

* [Official documentation](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions)


Performing element-wise operations on a list or other sequence is a very common pattern. The following example shows element-wise squaring of a range implemented with a for loop:

In [None]:
squares = [] # Create empty output list.
for x in range(1, 11): # Loop over all elements in a sequence.
    squares.append(x ** 2) # Perform an element-wise operation and append the result to the output list.
    
print(squares)

A much simpler and more readable way is to use so-called list comprehensions. The following code is functionally equivalent to the one above. One can think of it as a shorter way to write this type of for-loop.

In [None]:
#         square brackets
#         |operation
#         ||          loop variable
#         ||          |    sequence for looping over
#         ||          |    |
squares = [x ** 2 for x in range(1, 11)]

print(squares)

A list comprehension consists of brackets (denoting the generation of a list) containing an expression followed by a `for` clause, then zero or more further `for` or `if` clauses. This makes it possible to implement nested loops together with filtering as follows:

In [None]:
pairs = []
for x in [1, 2, 3]:
    for y in [3, 1, 4]:
        if x != y:
            pairs.append((x, y))
print(pairs)

The above nested loop can be translated into a list comprehension with multiple `for` clauses. Note that the `for` and `if` clauses appear in the same order as in the nested loop.

In [None]:
pairs = [(x, y) for x in [1, 2, 3] for y in [3, 1, 4] if x != y]
print(pairs)

## Applying functions and filtering elements
The syntax shown here makes it possible to achieve many different goals with concise notation.

For example, a nested list can be flattened by appending all elements of the sub-lists into one single list:

In [None]:
# List-in-list structure could be used to define a class room.
rows_of_students = [['Alphonse', 'Betty', 'Charles'], ['Dora', 'Elias', 'Fabienne'], ['Gustav', 'Harry', 'Ian']]
# Get a list of all students.
class_room = [student for row in rows_of_students for student in row]
print(class_room)

A function can be evaluated for each element.

In [None]:
[x.upper() for x in class_room]

The `if` clause filters a list: only the elements that satisfy the condition are taken into the output.

In [None]:
[x for x in class_room if len(x) <= 4]

### ✏️ $\mu$-exercise

At this point, please switch to your Exercise_04 notebook and complete $\mu$-exercises __3 and 4__.

## The `zip()` function

Sometimes it is desired to loop over two or more lists "simultaneously" incrementing the index of all lists at the same time. The `zip` function can be used for such cases:

In [None]:
chocolates = ['dark', 'brown', 'white']
calories = [5, 10, 15]

list(zip(chocolates, calories))

Again, *unpacking* can be used when looping:

In [None]:
for name, cal in zip(chocolates, calories):
    print('{:s} chocolate has {:d} calories'.format(name, cal))
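
One detail worth knowing is that `zip` stops as soon as the shortest input is exhausted. A small sketch with illustrative lists of unequal length:

```python
names = ['dark', 'brown', 'white']
prices = [1.5, 2.0]  # One entry fewer than `names`.

# The third name has no partner and is silently dropped.
name_price_pairs = list(zip(names, prices))
print(name_price_pairs)
```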

## Nested list comprehensions

In a list comprehension, the expression that generates the elements of the new list can itself be another list comprehension.

For example, we can define the identity 3x3 matrix as follows:

In [2]:
# List comprehensions can be written on multiple lines to improve readability.
[
    [1 if col_idx == row_idx else 0 for col_idx in range(3)] 
    for row_idx in range(3)
]

[[1, 0, 0], [0, 1, 0], [0, 0, 1]]

## Dict comprehensions
A syntax similar to list comprehensions also exists for dictionary comprehensions, set comprehensions and generators. Generators are treated briefly in the optional last section of this lecture.

Dict comprehensions create a dictionary entry for each element in a sequence.

The basic syntax is as follows:
```python
new_dict = {key_function(e): value_function(e) # Create a key and a value for the dictionary entry.
            for e in some_sequence
            if condition(e) # Condition is optional.
           }

```
In contrast to list comprehensions not only a new list element is created but a key-value pair. `key_function(e)` is a placeholder for any expression that generates a key for the entry and `value_function(e)` is a placeholder for any expression that generates the value of the dictionary entry. Key and value can depend on `e` but don't have to. As with list comprehensions it is possible to conditionally create a dictionary entry. The condition can depend on `e`.

The above code could as well be written in a for loop:

```python
new_dict = dict()
for e in some_sequence:
    if condition(e):
        new_dict[key_function(e)] = value_function(e)
```

The following shows a simple example of generating a look-up table for square numbers.

In [3]:
# Example: Compute look-up table for square numbers using a dict comprehension.
squares = {i: i**2 for i in range(10)}
print(squares)

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}


Like list comprehensions, dict comprehensions can be used for many different applications. The following example shows how a dict comprehension can be used to swap keys with values of a dictionary. This essentially creates a reverse look-up table.

In [None]:
# Example: Exchange keys and values and create an inverse look-up table.
roots = {value: key for key, value in squares.items()}
print(roots)
print("The square root of 64 is {}.".format(roots[64]))

## Set comprehensions
Set comprehensions are very similar to list or dictionary comprehensions with the only difference that they create a set instead of a list or dict. The syntax is exactly the same as for list comprehension with the only difference that curly brackets `{}` are used instead of square brackets `[]`.

In [None]:
# Create a set using a comprehension.

# Take a list with strings and maybe some data types that we don't want.
faulty_name_list = ['alice', 'Alice', 'ALICE', 'bob', 'boB', 'BOB', 123, 1.234]

# Now create a set of names after converting them to lower case.
# Filter out everything that is not a string.
unique_names = {name.lower() for name in faulty_name_list if type(name) == str}

print(unique_names)

### ✏️ $\mu$-exercise

At this point, please switch to your Exercise_04 notebook and complete $\mu$-exercises __5 and 6__.

# Navigating folders and file input/output (IO)

This section explains how Python can be used to interact with the file system. This includes navigating in the directory tree, creating folders and reading and writing files.

## The `os` module
[Documentation Link](https://docs.python.org/3/library/os.html)

With the `os` module Python offers a way to interact with operating-system dependent functionality in a portable way. In the following `os` will be used mainly to interact with the file system.

When working with file paths, the `os` module is to some extent independent of the operating system. For instance, path components are separated with `/` under Linux and with `\` under Windows. The functions in `os.path` build and split paths using the separator of the system the program runs on.
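
A minimal sketch of portable path handling uses `os.path.join` and `os.path.split`. The folder and file names are placeholders and do not need to exist:

```python
import os

# Build a path with the separator of the current operating system.
path = os.path.join('some_folder', 'some_other_folder', 'some_file.xyz')
print(path)
print('Separator on this system:', repr(os.sep))

# Split the path into the directory part and the file name.
directory, filename = os.path.split(path)
print(directory)
print(filename)
```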


The most fundamental commands to interact with the file system are shown in the following examples.

In [None]:
import os

# Get the 'current working directory'. This works like the `pwd` command in the Linux shell.
os.getcwd()

In [None]:
# Get a list of the content of the current working directory.
os.listdir()

In [None]:
# Get a list of the content of some other directory.
# For linux:
os.listdir('/tmp')
# For windows:
# os.listdir('C:/')

In [None]:
# Create a new directory.
os.mkdir('a_new_directory')

In [None]:
# Get an updated list of directories.
os.listdir()

In [None]:
# Check if the new directory is actually there.
os.path.exists('a_new_directory')

In [None]:
# Clean-up, remove the new directory.
os.rmdir('a_new_directory')
os.path.exists('a_new_directory')

### File paths

File locations are specified with a 'path' like `./some_folder/some_other_folder/some_file.xyz`. A path defines a sequence of folders that one must go through in order to find a file.

There are two classes of paths, namely absolute and relative paths.

* Absolute paths start with a slash `/` which stands for the root of the file system tree (Linux and OSX). An absolute path is not dependent on the current working directory.
* A relative path does *not* start with a slash (root). Relative paths start in the current working directory.

In every directory there are two special directory entries (for Linux and OSX): '`.`' and '`..`'.
The single dot '`.`' is an alias for the directory itself, while the double dot '`..`' is an alias for the parent directory. '`..`' can therefore be used to move towards the root of the file system tree.
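The effect of the two special entries can be sketched with `os.path.normpath`, which simplifies a path purely as text. The path below is hypothetical and is never looked up on disk.

```python
import os

# Hypothetical path containing the special entries '.' and '..'.
messy_path = os.path.join('some_folder', '.', 'sub_folder', '..', 'some_file.xyz')

# `normpath` resolves '.' and '..' as text, without touching the file system.
clean_path = os.path.normpath(messy_path)
print(clean_path)

# The parent of the current working directory, as an absolute path.
parent_dir = os.path.abspath('..')
print(parent_dir)
```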

In [None]:
# Check if a path is absolute (Linux and OSX).
os.path.isabs('/etc/')

In [None]:
# This path is not absolute.
os.path.isabs('../Lecture_04/')

It gets interesting when we convert paths, especially to get the absolute path from a relative one:

In [None]:
# Convert a relative path into an absolute path starting from the current working directory.
os.path.abspath('../Lecture_04/')

The `shutil` module provides many functions for manipulating files and directories. Together with `os`, it offers functionality similar to Linux *sh*ell commands like `cp`, `mv`, `rm` and `rmdir`.

In [None]:
import shutil
shutil.copy('Lecture_04.ipynb', 'Lecture_04.ipynb.backup')  # Copy a file.
os.mkdir('a_new_directory') # Create a new directory.
os.rename('a_new_directory', 'a_newer_directory')  # Rename a directory or a file.
os.remove('Lecture_04.ipynb.backup')  # Remove a file.
os.rmdir('a_newer_directory')  # Remove an empty folder.
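`shutil` also works on whole directory trees. The following sketch runs inside a temporary directory (an assumption made here so that no real files are touched) and uses the standard functions `shutil.copytree`, `shutil.move` and `shutil.rmtree`. The file and folder names are hypothetical.

```python
import os
import shutil
import tempfile

# Work inside a temporary directory so that no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    source_dir = os.path.join(tmp_dir, 'source')
    os.mkdir(source_dir)
    with open(os.path.join(source_dir, 'notes.txt'), 'w') as f:
        f.write('hello')

    # Copy a whole directory tree (similar to `cp -r`).
    copy_dir = os.path.join(tmp_dir, 'copy')
    shutil.copytree(source_dir, copy_dir)

    # Move (rename) a directory (similar to `mv`).
    moved_dir = os.path.join(tmp_dir, 'moved')
    shutil.move(copy_dir, moved_dir)

    # Remove a non-empty directory (similar to `rm -r`). `os.rmdir` would fail here.
    shutil.rmtree(source_dir)

    remaining = sorted(os.listdir(tmp_dir))

print(remaining)
```

Note that `shutil.rmtree` deletes everything below the given folder without asking, so it should be used with care.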

### ✏️ $\mu$-exercise

At this point, please switch to your Exercise_04 notebook and complete $\mu$-exercise __7__.

## Reading and writing files
[Documentation Link](https://docs.python.org/3/library/io.html)

The next logical step is to interact with files. In Python, we use the built-in `open()` function to get a file-stream object:

In [None]:
with open('example-people.csv', 'r') as file_stream:
    content = file_stream.read()

content

Whenever a file is opened, it should be closed after use. This releases system resources and, in case of writing, makes sure that changes are actually saved on the hard disk.

The `with` statement is a control-flow structure that works together with a **context manager**, here the file object returned by `open()`. Among other purposes, it can be used to *automatically* close a file after it has been used. The `with` statement guarantees that the file is closed, even if an error is raised inside the block.
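This guarantee can be checked with a small experiment. The sketch below writes a throw-away file into a temporary directory (hypothetical name and content), raises an error on purpose inside the `with` block and then inspects the `closed` attribute of the file object.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a small throw-away file to read from.
    path = os.path.join(tmp_dir, 'demo.txt')
    with open(path, 'w') as f:
        f.write('some text')

    try:
        with open(path, 'r') as file_stream:
            content = file_stream.read()
            raise ValueError('something went wrong')  # Simulate an error inside the block.
    except ValueError:
        pass

    # The file was closed when the `with` block was left, despite the error.
    closed_after_error = file_stream.closed

print(closed_after_error)
```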

A more manual approach for opening and closing the file could be:

In [None]:
# NOT RECOMMENDED way of opening files.
file_stream = open('example-people.csv', 'r') # Open the file for reading ('r').
content = file_stream.read()  # Reads the entire file.
file_stream.close()  # Manually close the file.

content

The `open` function is usually called with two arguments: i) the filename and ii) the mode. The mode is a short string describing the desired file access:

- 'r' = read access
- 'w' = write access (any written data will overwrite an existing file!)
- 'a' = append access (any written data will be appended to the end of an existing file.)

These regular modes open files which contain (encoded) text, such as text files, source code, LaTeX files, etc. For other files (like JPG, PDF, ...), append a 'b' to the mode to open them in *binary* mode. Example: `open('photo.jpg', 'rb')`.
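The difference between the modes can be sketched as follows. The file name is hypothetical and the file lives in a temporary directory, so nothing in the lecture folder is changed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes-demo.txt')  # Hypothetical file name.

    with open(path, 'w') as f:   # 'w' creates the file or overwrites an existing one.
        f.write('first\n')

    with open(path, 'w') as f:   # Opening with 'w' again discards the previous content.
        f.write('second\n')

    with open(path, 'a') as f:   # 'a' keeps the content and appends at the end.
        f.write('third\n')

    with open(path, 'r') as f:   # Text mode returns a `str`.
        text = f.read()

    with open(path, 'rb') as f:  # Binary mode returns `bytes`.
        raw = f.read()

print(repr(text))
print(type(raw))
```

The line written first is gone because the second `open(path, 'w')` started from an empty file.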

Besides `file_stream.read()`, which reads the whole file content into a string in memory, text files can also be read line-by-line. Going through files line-by-line can be advantageous from a memory perspective because not all of the file has to be loaded into memory at the same time. This makes it possible to process files larger than the available memory.

In [None]:
with open('example-people.csv', 'r') as file_stream:
    something = ""
    for line_string in file_stream:  # Access the file line-by-line, i.e. only save one line in memory at a time.
        # Do example operation on the line_string.
        something += line_string[1:4]
    
something

### Writing to files

For writing, access is very similar: either use `write` to write a single string or `writelines` to write whole lists of strings. You find many more write functions in the Python documentation.

In [None]:
with open('writing-to-file.csv', 'w') as f:
    f.write(content)  # `content` is a single string, so `write` is the matching method.
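The difference between `write` and `writelines` is sketched below with hypothetical example data. Note that `writelines` does not add line breaks, so every string has to end with `\n` itself.

```python
import os
import tempfile

lines = ['name,age\n', 'Alice,30\n', 'Bob,25\n']  # Hypothetical example data.

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'write-demo.csv')

    with open(path, 'w') as f:
        f.write(lines[0])        # `write` takes a single string.
        f.writelines(lines[1:])  # `writelines` takes a list of strings.

    with open(path, 'r') as f:
        written = f.read()

print(written)
```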

### Structured data files: CSV and JSON

In many cases we encounter formatted data files, like CSV [(Comma Separated Values)](https://en.wikipedia.org/wiki/Comma-separated_values) for table-style data or JSON [(JavaScript Object Notation)](https://en.wikipedia.org/wiki/JSON) for other structures. Both are human readable but also designed such that programs can easily interpret them.

Python has built-in support for these two standards.
- for CSV, use the `csv` module [(docs)](https://docs.python.org/3/library/csv.html)
- for JSON, use the `json` module [(docs)](https://docs.python.org/3/library/json.html)
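A minimal sketch of both modules is shown below. The table data is hypothetical, and `io.StringIO` is used here as a stand-in for a real file opened with `open()`, so the cell runs without any file on disk.

```python
import csv
import io
import json

# Hypothetical table data; `io.StringIO` stands in for a real file object.
csv_text = 'name,age\nAlice,30\nBob,25\n'

# `csv.DictReader` uses the first row as column names and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]['name'], rows[0]['age'])  # All CSV values are read as strings.

# Convert the ages to integers and serialise the data as JSON text.
people = [{'name': row['name'], 'age': int(row['age'])} for row in rows]
json_text = json.dumps(people)
print(json_text)

# Parsing the JSON text gives back the same Python data structure.
restored = json.loads(json_text)
print(restored == people)
```

With real files, `json.dump(people, f)` and `json.load(f)` do the same while writing to or reading from an open file `f`.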

# Bonus: Generators (optional & advanced)

* [Official documentation](https://docs.python.org/3/howto/functional.html#generators)

For many types of operations on sequences it is not necessary, and possibly not even possible, to have the full sequence in memory at once. This is illustrated in the following example.

In [None]:
# This will probably crash on an average personal computer because it would consume multiple exabytes of memory.
# sequence = [x**2 for x in range(1_000_000_000_000_000_000)]

# An alternative is to create a 'generator' with a syntax very similar to list comprehensions.
# In contrast to a list or other container, generators are evaluated lazily.
# (`range`, `zip` and `enumerate` are also evaluated lazily.)
# This means that elements are only computed when they are requested.

huge_sequence = range(1_000_000_000_000_000_000)

# A generator expression is written like a list comprehension,
# but with round brackets instead of square brackets.
generator = (x**2 for x in huge_sequence if x%2 == 0)

print('Type of generator:', type(generator))

print(next(generator)) # Compute first element.
print(next(generator)) # ... now the second.
print(next(generator)) # ... now the third.

In [None]:
# To highlight the lazy evaluation a line shall be printed as soon as an element is requested.

def f(x):
    print('evaluating f({})'.format(x))
    return x**2

huge_sequence = range(1_000_000_000_000_000_000)
generator = (f(x) for x in huge_sequence if x%2 == 0)

print("Until now, f has NOT yet been evaluated at all!")

# Note that the line is only printed when an element is accessed.
print(next(generator)) # Here the first element is computed.
print(next(generator)) # ... now the second.
print(next(generator)) # ... now the third.

In [None]:
# If we use a list comprehension, things look different.
list_instead_generator = [f(x) for x in range(10) if x%2 == 0]

In [None]:
# When passed to functions which take an iterable sequence as argument, the round parentheses can be omitted.
# Note that for computing this sum there is no need to have all summands in memory at the same time.
sum(x**2 for x in range(1_000_000) if x%2 == 0)
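The memory argument can be made visible with `sys.getsizeof`. This is only a rough sketch: `getsizeof` reports the size of the container object itself (not of the numbers it refers to), and the exact values depend on the Python version.

```python
import sys

n = 100_000

squares_list = [x**2 for x in range(n)]       # All elements exist in memory at once.
squares_generator = (x**2 for x in range(n))  # Elements are computed on demand.

# Size of the container objects in bytes.
list_size = sys.getsizeof(squares_list)
generator_size = sys.getsizeof(squares_generator)
print(list_size, generator_size)

# Both give the same sum.
same_result = sum(squares_list) == sum(squares_generator)
print(same_result)
```

The size of the generator object does not grow with `n`, while the list needs room for one reference per element.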

### Create generator functions with the `yield` keyword

There is another way to define generators or 'generator functions' using the `yield` keyword.

In [4]:

def infinite_range_function(start: int = 0, step: int = 1):
    """
    This function looks like it will never return!
    """
    i = start
    while True: # Infinite loop.
        print('before `yield`: i =', i)
        yield i # <-- Note the `yield` keyword.
        print('after `yield`: i =', i)
        i += step

# Calling a function which contains `yield` keywords will create a new generator object
# instead of evaluating the function as one might expect for normal functions.
# Multiple independent generator objects could be created here.
infinite = infinite_range_function(0, 2)

print('Return type of the function: ', type(infinite))

# Print the first few elements of the infinite sequence.
for _ in range(4):
    print(next(infinite))
    

Return type of the function:  <class 'generator'>
before `yield`: i = 0
0
after `yield`: i = 0
before `yield`: i = 2
2
after `yield`: i = 2
before `yield`: i = 4
4
after `yield`: i = 4
before `yield`: i = 6
6


One can think of a generator as a function whose execution can be paused and continued at `yield` statements. When the execution is interrupted the internal function state is preserved. Therefore the function evaluation can be continued at any time later.
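The preserved state can be sketched with a small generator function that keeps a running sum in a local variable. The function `running_total` is a hypothetical helper written for this illustration.

```python
def running_total(values):
    """Yield the running sum of `values` (hypothetical helper for illustration)."""
    total = 0  # Local state that survives between `next` calls.
    for value in values:
        total += value
        yield total

totals = running_total([1, 2, 3])

first = next(totals)   # Runs until the first `yield`.
second = next(totals)  # Continues after the `yield`; `total` still holds 1.
third = next(totals)

print(first, second, third)
```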

In [None]:
# Generator functions don't have to be infinite. The generator sequence will end once the function returns.

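def infinite_range_function(start: int = 0, step: int = 1, stop: int = None):
    """
    Note that this is very similar to the `range` function.
    Without `stop` the sequence is infinite. A positive `step` is assumed.
    """
    i = start
    while True: # Infinite loop.
        
        if stop is not None and i >= stop:
            # End of the sequence. Using `>=` instead of `==` also ends the
            # sequence when `stop` is not hit exactly.
            return
        
        yield i
        i += step

# This will create a finite list.
list(infinite_range_function(start=0, step=2, stop=20))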

In [None]:
# Generators can also be combined with other generators.

squares = (x**2 for x in infinite_range_function())

for _ in range(20):
    print(next(squares))

### Optional further reading

Generators fit nicely into the 'functional' programming paradigms. More on this can be found in the official documentation: https://docs.python.org/3/howto/functional.html

# Exercise time!
Now it's your turn! Solve the rest of the exercises.

# Uploading solutions
Before the end of the class at about 16:00, please "push" your solutions. 

Please do so even if you have not solved all problems: additional
uploads can be made in the following days. Instructions are below.

### If git is available on your system (preferred option)
Add, commit and push your changes to the remote server:

`git add -A`

`git commit -m 'My solutions to Lecture 04'`

`git push origin master`

### If git is **not** available on your system
This is **not** the preferred option and it should be avoided whenever possible.

Upload your Lecture_04 folder (containing the Exercise file) to the polybox at https://polybox.ethz.ch and share the folder with luca.alloatti@ief.ee.ethz.ch, thomas.kramer@ief.ee.ethz.ch, and raphael.schwanninger@ief.ee.ethz.ch. To share the folder, go to https://polybox.ethz.ch and click the share icon to the right of the folder (a graph with one vertex connected to two other vertices), then enter the three email addresses.