<a href="https://colab.research.google.com/github/edoardochiarotti/class_datascience/blob/main/2024/00_Python-Basics/00_Python-Basics_3_Iteration-Comprehension.ipynb" target="_blank" rel="noopener"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python Basics: iteration

<img src='https://www.agent-x.com.au/wp-content/uploads/2011/06/Perfect-Programmer-dfe194b-e8d3b11-b960bd5.jpg' width="400">

Source: [Agent-X Comics - Perfect Programming](https://www.agent-x.com.au/comic/perfect-programming/)

## Contents

In our previous notebook, we manipulated collections of elements, i.e., lists, tuples, and dictionaries. We often performed operations on single elements of our object, e.g., adding a new key to our dictionary or replacing a value. What if we want to repeat the same operation? One way would be to repeat the same lines of code over and over again... That is very inefficient: imagine you have a list containing one thousand elements and you want to perform the same operation on each of them. That would be one thousand lines of code! Fortunately, there is a better way, called *iteration*!

- [Iteration](#Iteration)
  - [For loop](#for-loop)
      - [Iterate over dictionary](#dico-iterate)
  - [While loop](#while-loop)
  - [Break, continue, else](#break-continue-else)
- [Comprehensions](#Comprehensions)

## Iteration <a name="Iteration"></a>

### For loop <a name="for-loop"></a>

A `for` loop is used for iterating over a sequence, such as a list or a tuple.

Let's start with a simple example. We will print all the elements of a list:

In [1]:
canton_romand = ['Jura', 'Neuchâtel', 'Vaud', 'Genève', 'Berne', 'Fribourg', 'Valais']

for c in canton_romand:
    print(c)

Jura
Neuchâtel
Vaud
Genève
Berne
Fribourg
Valais


We printed a list of the Romandy cantons. 

Let's review what we did. For every item in our list `canton_romand`, we printed this item (canton). More generally, a `for` loop will do something for every item `in` a sequence. 

Note that a `str` is also a sequence: just like a list or tuple, a string is an ordered collection of characters. Thus, we can use a `for` loop on a string:

In [2]:
for c in 'circular economy':
    print(c)

c
i
r
c
u
l
a
r
 
e
c
o
n
o
m
y


Let's go back to our illustration on cooperation in a social dilemma. Remember the condition for cooperation that we previously discovered: individuals cooperate if their social benefit weighted by their degree of morality is greater than or equal to their individual cost weighted by their degree of selfishness. Suppose we have several individuals with various degrees of morality stored in a list. For each of them, we wish to know whether they cooperate or not. Let's try!

In [3]:
cost = 1            # individual cost 
benefit = 3         # social benefit

kappa_list = [0, 0.2, 0.3, 0.5, 0.7, 1]     # list of degrees of morality

for i in range(len(kappa_list)):
    if benefit*kappa_list[i] >= cost*(1-kappa_list[i]):
        print('Individual '+str(i+1)+' cooperates.')
    else:
        print('Individual '+str(i+1)+' does not cooperate.')

Individual 1 does not cooperate.
Individual 2 does not cooperate.
Individual 3 cooperates.
Individual 4 cooperates.
Individual 5 cooperates.
Individual 6 cooperates.


Wow, we are starting to do really cool stuff! Ok, what actually happened? 

First, we used `len(kappa_list)` to know the length of our list `kappa_list`:

In [4]:
len(kappa_list)

6

Then, we created a **range** with the `range()` function. This function gives an iterable that enables counting: 

In [5]:
for i in range(6):
    print(i, end='  ')

0  1  2  3  4  5  

We see that `range(6)` gives us six numbers, from `0` to `5`. As with indexing, `range()` inclusively starts at zero by default, and the ending is exclusive. It turns out that the arguments of the `range()` function work much like indexing. If you have a single argument, you get that many integers, starting at 0 and incrementing by one. If you give two arguments, you start inclusively at the first and increment by one ending exclusively at the second argument. Finally, you can specify a stride with the third argument.
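
To make the one-, two-, and three-argument forms concrete, here is a minimal sketch. The variable names (`one_arg`, `two_args`, `with_stride`, `countdown`) are chosen only for this illustration, and `list()` is used simply to display the integers that each range produces.

```python
# Wrap each range in list() to see the integers it produces.
one_arg = list(range(4))                # stop only: starts at 0
two_args = list(range(2, 6))            # start (inclusive), stop (exclusive)
with_stride = list(range(0, 10, 3))     # start, stop, stride
countdown = list(range(5, 0, -1))       # a negative stride counts down

print(one_arg)      # [0, 1, 2, 3]
print(two_args)     # [2, 3, 4, 5]
print(with_stride)  # [0, 3, 6, 9]
print(countdown)    # [5, 4, 3, 2, 1]
```

In every case the second argument is excluded, which is why `range(5, 0, -1)` stops at `1`.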

Going back to our loop. We iterated over our range, and we combined a `for` loop with an `if` statement. For every element of our range (`0`, `1`, `2`, `3`, `4`, `5`), we assessed whether the condition `benefit*kappa_list[i] >= cost*(1-kappa_list[i])` was `True`. As we have seen before, `kappa_list[i]` extracts the element of the list `kappa_list` located at index `i`. Because we combined `range` and `len`, we iterate over all the elements of the list, checking our condition for each degree of morality.

Finally, when the condition is `True`, we `print()` the string `'Individual '+str(i+1)+' cooperates.'`. This string joins three strings: `'Individual '`, `str(i+1)` where we convert the `int` value `i+1` into a string using the `str()` function, and `' cooperates.'`. Similarly, when the condition is `False`, we print `'Individual '+str(i+1)+' does not cooperate.'`.

Neat!

Well, technically, there was an even better way to obtain the same result, using the `enumerate()` function. This function gives an iterator that provides both the index and the item of a sequence. Again, this is best demonstrated in practice:

In [6]:
cost = 1            # individual cost 
benefit = 3         # social benefit

kappa_list = [0, 0.2, 0.3, 0.5, 0.7, 1]     # list of degrees of morality

for i, kappa in enumerate(kappa_list):
    if benefit*kappa >= cost*(1-kappa):
        print('Individual '+str(i+1)+' cooperates.')
    else:
        print('Individual '+str(i+1)+' does not cooperate.')

Individual 1 does not cooperate.
Individual 2 does not cooperate.
Individual 3 cooperates.
Individual 4 cooperates.
Individual 5 cooperates.
Individual 6 cooperates.


The `enumerate()` function allowed us to use an index and a degree of morality `kappa` at the same time. Let's visualize this by printing the index and degree of morality for each individual:

In [7]:
for i, kappa in enumerate(kappa_list):
    print(i, kappa)

0 0
1 0.2
2 0.3
3 0.5
4 0.7
5 1


The `enumerate()` function is really useful and should be preferred over plain indexing. It is indeed more generic: the `range(len())` construct will break on an object without support for `len()`. 
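
As an illustration of that last point, the sketch below uses an iterator created with the built-in `iter()` function, which is one example of an object that can be looped over but has no length. Neither `iter()` nor the `try`/`except` construct is covered in this notebook; they are used here only to show the failure safely, and the names `kappa_iterator`, `supports_len`, and `numbered` are hypothetical. The optional `start` argument of `enumerate()` sets the first value of the counter.

```python
kappa_list = [0, 0.2, 0.3, 0.5, 0.7, 1]     # list of degrees of morality

# iter() returns an iterator: it hands out items one by one and has no length.
kappa_iterator = iter(kappa_list)

try:
    len(kappa_iterator)
    supports_len = True
except TypeError:
    supports_len = False    # so range(len(...)) would fail on this object

# enumerate() only needs to be able to loop over the object.
# start=1 makes the counter begin at 1 instead of 0.
numbered = []
for i, kappa in enumerate(kappa_iterator, start=1):
    numbered.append((i, kappa))

print(supports_len)     # False
print(numbered)
```

With `start=1`, the counter can be printed directly as the individual's number, without writing `i+1`.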

Note that you can use the underscore, `_`, as a throwaway name for a variable that you do not intend to use. This is a convention rather than a rule of the language, but it is widely accepted and signals to the reader that the variable is deliberately ignored.
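
Here is a short sketch of the underscore convention in two typical situations: a loop counter that is never read, and one half of a pair that is ignored. The list `players` and the names `greetings` and `names_only` are made up for this example.

```python
# Repeat an action three times; the loop counter itself is never used.
greetings = []
for _ in range(3):
    greetings.append('Hello!')

# Keep only the names from (name, number) pairs and ignore the numbers.
players = [('Raymond', 23), ('Seider', 53), ('Larkin', 71)]
names_only = []
for name, _ in players:
    names_only.append(name)

print(greetings)    # ['Hello!', 'Hello!', 'Hello!']
print(names_only)   # ['Raymond', 'Seider', 'Larkin']
```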

Here are two other useful iterator functions. First, the `zip()` function enables us to iterate over several iterables at once. In the example below, we iterate over the jersey numbers and names of ice hockey players playing for the Detroit Red Wings (what do you mean you do not know them??):

In [8]:
names = ('Raymond', 'Seider', 'Larkin')
numbers = (23, 53, 71)

for num, name in zip(numbers, names):
    print(num, name)

23 Raymond
53 Seider
71 Larkin
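
One detail worth knowing: when the iterables have different lengths, `zip()` stops as soon as the shortest one is exhausted and silently ignores the remaining items. The sketch below shortens `numbers` on purpose to show this; the name `pairs` is chosen for the illustration.

```python
names = ('Raymond', 'Seider', 'Larkin')
numbers = (23, 53)      # one number fewer than names, on purpose

# list() collects the pairs so that we can look at all of them at once.
pairs = list(zip(numbers, names))

print(pairs)    # [(23, 'Raymond'), (53, 'Seider')]
```

It is therefore a good idea to check that the sequences you zip together have the length you expect.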


Second, the `reversed()` function is useful for giving an iterator that goes in the reverse direction. Imagine we are NASA counting down:

In [9]:
count_up = ('ignition', 1, 2, 3, 4, 5, 6, 7, 8 ,9, 10)

for count in reversed(count_up):
    print(count)

10
9
8
7
6
5
4
3
2
1
ignition


#### Iterate over dictionary <a name="dico-iterate"></a>

Using the `items()` method is the most convenient way to **iterate over a dictionary** when you need both the keys and the values:

In [10]:
dictionary_morality = {'Florence': 0.3, 'Jordane': 0.2, 'Julia': 0.5, 'Ale':0.4}

for key, value in dictionary_morality.items():
    print(key, ':', value)

Florence : 0.3
Jordane : 0.2
Julia : 0.5
Ale : 0.4


Note that when using `items()`, the loop variable `value` is only a name that refers to each value in turn. This implies that if you assign something new to `value` within the `for` loop, you will not change the entries in the dictionary: 

In [11]:
for key, value in dictionary_morality.items():
    value = 'this string will not be in dictionary.'

dictionary_morality

{'Florence': 0.3, 'Jordane': 0.2, 'Julia': 0.5, 'Ale': 0.4}

You do change the entries, however, if you assign to them through their keys:

In [12]:
d = dictionary_morality.copy()          # creates copy of our dictionary (to avoid losing our data...)

for key, _ in d.items():
    d[key] = 'this string will be in dictionary.'

print(dictionary_morality)
print(d)

{'Florence': 0.3, 'Jordane': 0.2, 'Julia': 0.5, 'Ale': 0.4}
{'Florence': 'this string will be in dictionary.', 'Jordane': 'this string will be in dictionary.', 'Julia': 'this string will be in dictionary.', 'Ale': 'this string will be in dictionary.'}
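
As a slightly more meaningful variation, the sketch below replaces each degree of morality with the outcome of the cooperation condition seen earlier. It assumes the same `cost` and `benefit` values as in the previous examples; the name `cooperates` is hypothetical.

```python
cost = 1            # individual cost
benefit = 3         # social benefit

dictionary_morality = {'Florence': 0.3, 'Jordane': 0.2, 'Julia': 0.5, 'Ale': 0.4}
cooperates = dictionary_morality.copy()     # work on a copy to keep the original data

# Assigning through the key changes the entry; rebinding `kappa` alone would not.
for name, kappa in cooperates.items():
    cooperates[name] = benefit*kappa >= cost*(1-kappa)

print(cooperates)
# {'Florence': True, 'Jordane': False, 'Julia': True, 'Ale': True}
```

Replacing the value of an existing key while looping is fine. Adding or removing keys during the loop is not, because Python raises an error when a dictionary changes size during iteration.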


### While loop <a name="while-loop"></a>

A `while` loop allows iteration until a conditional expression evaluates to `False`. 

Let's go back to our example on cooperation in a social dilemma, seen in our first "Python-Basics" notebook. Suppose we would like to find the threshold degree of morality allowing cooperation. We can use a `while` loop:

In [13]:
cost = 1            # individual cost 
benefit = 3         # social benefit

# Initialize sequence index
k = 0               # degree of morality

# condition for not cooperating: 
# social benefit times k is strictly lower than individual cost times (1-k)
while benefit*k < cost*(1-k):
    k+=0.01

print(k)

0.25000000000000006


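Thus, in this illustration, all individuals with a degree of morality of at least 0.25 will cooperate. The printed value is not exactly 0.25 because repeatedly adding `0.01` accumulates small floating-point rounding errors.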

Let's take a minute to understand what is happening in a `while` loop. The value of `k` is changing with each iteration, being incremented by `0.01`. Each time we consider doing another iteration, the condition is checked: is the social benefit weighted by the degree of morality strictly lower than the individual cost weighted by the degree of selfishness? If yes, i.e., the condition is evaluated to `True`, then the iteration continues. In other words, iteration continues in a `while` loop until the condition returns `False`.
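
The value printed above, `0.25000000000000006`, differs slightly from 0.25 because floating-point rounding errors build up each time `0.01` is added. One possible way to avoid this, sketched below, is to count the increments with an integer and recompute `k` from that counter at every iteration. The name `step` is hypothetical, and the sketch assumes the same `cost` and `benefit` as before.

```python
cost = 1            # individual cost
benefit = 3         # social benefit

step = 0            # integer counter: number of increments of 0.01
k = 0.0             # degree of morality, recomputed from step each time

# Same condition as before: keep going while the individual does not cooperate.
while benefit*k < cost*(1-k):
    step += 1
    k = step/100    # one division instead of many accumulated additions

print(step, k)      # 25 0.25
```

The result agrees with what the condition gives by hand: `benefit*k >= cost*(1-k)` is equivalent to `k >= cost/(benefit+cost)`, which equals 0.25 here.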

<img src='https://www.freecodecamp.org/news/content/images/size/w1000/2020/11/image-24.png' width="600">

Image : Estefania Cassingena Navone - [Python While Loop Tutorial](https://www.freecodecamp.org/news/python-while-loop-tutorial/)

We have to be extra cautious when using a `while` loop. If the condition is always `True`, it can never return `False`. We are then stuck in an **infinite loop** and the code runs forever! If this happens, you can interrupt the kernel to stop the endless calculation. 

Better still, avoid getting stuck in an infinite loop in the first place! For example, you can check the condition with a few values before writing the `while` loop to make sure it can return `False`. Also check outside of the loop that your increment works as expected. You can also add a second condition, one that is guaranteed to return `False` after a given number of iterations. For instance:

```python
cost = 1            # individual cost 
benefit = 3         # social benefit

# Initialize the loop variables
k = 0               # degree of morality
i = 0               # iteration counter
max_it = 1000       # maximum number of iterations

# condition for not cooperating: 
# social benefit times k is strictly lower than individual cost times (1-k)
while benefit*k < cost*(1-k) and i < max_it:
    k+=0.01
    i+=1
```

In the above code, `i` keeps track of the number of iterations and `max_it` defines a maximum number of iterations (1000). We added the condition `i < max_it`, which is guaranteed to return `False` once we reach 1000 iterations.

Finally, another way to avoid an infinite loop is to use the `break` statement that we will discover below.

Now you may wonder when to use a `for` loop and when to use a `while` loop. In most cases, you could use one or the other. Here is a general rule:
- If you know how many times you have to do something (or if your program knows), use a `for` loop. 
- If you don't know how many times the loop needs to run until you run it, use a `while` loop. 

### Break, continue, else <a name="break-continue-else"></a>

The `break` statement can stop a `for` or `while` loop before it has looped through all the items. 

For example, going back to our example on Romandy cantons, imagine we wish to stop the loop once we have reached `'Vaud'`:

In [14]:
canton_romand = ['Jura', 'Neuchâtel', 'Vaud', 'Genève', 'Berne', 'Fribourg', 'Valais']

for c in canton_romand:
    print(c)
    if c == 'Vaud':
        break

Jura
Neuchâtel
Vaud
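
The `break` statement also works inside a `while` loop, which gives another safeguard against infinite loops. The sketch below reuses the cooperation threshold search with the same `cost`, `benefit`, and `max_it` values as earlier, but moves the iteration limit out of the loop condition and into an `if` test followed by `break`. This is one possible arrangement, not the only one.

```python
cost = 1            # individual cost
benefit = 3         # social benefit

k = 0               # degree of morality
i = 0               # iteration counter
max_it = 1000       # maximum number of iterations

while benefit*k < cost*(1-k):
    if i >= max_it:
        print('Stopped: maximum number of iterations reached.')
        break       # leave the loop even though the condition is still True
    k += 0.01
    i += 1

print(i, k)
```

Here the limit is never reached: the loop ends on its own after 25 iterations, so the message is not printed.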


The `continue` statement can stop the current iteration of the loop, and continue with the next.

For example, let's print all the cantons except `'Vaud'`:

In [15]:
canton_romand = ['Jura', 'Neuchâtel', 'Vaud', 'Genève', 'Berne', 'Fribourg', 'Valais']

for c in canton_romand:
    if c == 'Vaud':
        continue
    print(c)

Jura
Neuchâtel
Genève
Berne
Fribourg
Valais


The `else` keyword specifies a block of code to be executed when the loop is finished.

In [16]:
canton_romand = ['Jura', 'Neuchâtel', 'Vaud', 'Genève', 'Berne', 'Fribourg', 'Valais']

for c in canton_romand:
    print(c)
else:
    print('In Romandy cantons, French is an official language. Unfortunately, Python is not.')

Jura
Neuchâtel
Vaud
Genève
Berne
Fribourg
Valais
In Romandy cantons, French is an official language. Unfortunately, Python is not.


Note that the `else` block will NOT be executed if the loop is stopped by a `break` statement:

In [17]:
canton_romand = ['Jura', 'Neuchâtel', 'Vaud', 'Genève', 'Berne', 'Fribourg', 'Valais']

for c in canton_romand:
    print(c)
    if c == 'Vaud':
        break
else:
    print('In Romandy cantons, French is an official language. Unfortunately, Python is not.')

Jura
Neuchâtel
Vaud
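
A common use of this behavior is a search: the loop stops with `break` as soon as the item is found, and the `else` block handles the case where nothing was found. The sketch below illustrates the pattern; the names `target` and `message` are made up for this example.

```python
canton_romand = ['Jura', 'Neuchâtel', 'Vaud', 'Genève', 'Berne', 'Fribourg', 'Valais']
target = 'Zürich'       # canton we are looking for (not in the list)

for c in canton_romand:
    if c == target:
        message = target + ' is in the list.'
        break
else:
    # Reached only when the loop finished without hitting break.
    message = target + ' is not in the list.'

print(message)  # Zürich is not in the list.
```

If you set `target = 'Vaud'` instead, the loop stops at the third item and the `else` block is skipped.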


Alright, now it is your turn! In the notebook "Practice-with-Python", you can practice `for` and `while` loops. In exercises 5 and 6, you will study the influence of financial incentives on cooperation. Good luck!

## Comprehensions <a name="Comprehensions"></a>

<img src='https://i.imgflip.com/8mk1op.jpg' width="450">

We have previously built lists, tuples, and dictionaries by constructing them directly. What if we want to create a new list based on the values of an existing list? As always, we will take a simple example. Suppose we already defined the list `[1,2,3]` and we would like the list `[3,6,9]`. Unfortunately, multiplying by three using the `*` operator would not do the trick. Remember that the `*` operator on lists actually replicates and concatenates a list. We could write a `for` loop:

In [18]:
my_lis = [1,2,3]
new_lis = []

for i in my_lis:
    new_lis.append(3*i)
    
new_lis

[3, 6, 9]

Ok, it works, but it feels like quite a lot of code for a very simple operation! Well, luckily for us, a **comprehension** allows us to do the same as above, but in one line of code:

In [19]:
[3*i for i in [1,2,3]]

[3, 6, 9]

We can even add a conditional test inside. For example, say we only want the odd numbers:

In [20]:
[3*i for i in [1,2,3] if 3*i % 2 != 0]

[3, 9]

More generally, the structure of a list comprehension is: 
```python
newlist = [expression_to_put_in_list for item in iterable if condition]
```

The condition acts as a filter that only keeps the items for which it evaluates to `True`.
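
To connect this structure with the earlier loops, here is a sketch that applies it to the cooperation example. It assumes the same `cost`, `benefit`, and `kappa_list` as before; the names `cooperators` and `decisions` are hypothetical.

```python
cost = 1            # individual cost
benefit = 3         # social benefit
kappa_list = [0, 0.2, 0.3, 0.5, 0.7, 1]     # list of degrees of morality

# expression: kappa; iterable: kappa_list; condition: the cooperation test
cooperators = [kappa for kappa in kappa_list if benefit*kappa >= cost*(1-kappa)]

# Without a condition, every item is kept and only the expression is applied.
decisions = [benefit*kappa >= cost*(1-kappa) for kappa in kappa_list]

print(cooperators)  # [0.3, 0.5, 0.7, 1]
print(decisions)    # [False, False, True, True, True, True]
```

The first list filters the degrees of morality, while the second keeps one entry per individual, in the same order as `kappa_list`.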

Ok, let's look at a more complex example, taken from Justin Bois's great course [Introduction to Programming in the Biological Sciences Bootcamp](http://justinbois.github.io/bootcamp/2022_epfl/index.html). We want to build a list containing information about the 2018 Nobel laureates. We have, in three separate tuples, their names, nationalities, and prize categories.

In [21]:
names = (
    "Frances Arnold",
    "George Smith",
    "Gregory Winter",
    "postponed",
    "Denis Mukwege",
    "Nadia Murad",
    "Arthur Ashkin",
    "Gérard Mourou",
    "Donna Strickland",
    "James Allison",
    "Tasuku Honjo",
    "William Nordhaus",
    "Paul Romer",
)

nationalities = (
    "USA",
    "USA",
    "UK",
    "---",
    "DRC",
    "Iraq",
    "USA",
    "France",
    "Canada",
    "USA",
    "Japan",
    "USA",
    "USA",
)

categories = (
    "Chemistry",
    "Chemistry",
    "Chemistry",
    "Literature",
    "Peace",
    "Peace",
    "Physics",
    "Physics",
    "Physics",
    "Physiology or Medicine",
    "Physiology or Medicine",
    "Economics",
    "Economics",
)

Remember the `zip()` function: it allows us to iterate over several iterables at the same time:

In [22]:
[(cat, name, nat) for name, nat, cat in zip(names, nationalities, categories)]

[('Chemistry', 'Frances Arnold', 'USA'),
 ('Chemistry', 'George Smith', 'USA'),
 ('Chemistry', 'Gregory Winter', 'UK'),
 ('Literature', 'postponed', '---'),
 ('Peace', 'Denis Mukwege', 'DRC'),
 ('Peace', 'Nadia Murad', 'Iraq'),
 ('Physics', 'Arthur Ashkin', 'USA'),
 ('Physics', 'Gérard Mourou', 'France'),
 ('Physics', 'Donna Strickland', 'Canada'),
 ('Physiology or Medicine', 'James Allison', 'USA'),
 ('Physiology or Medicine', 'Tasuku Honjo', 'Japan'),
 ('Economics', 'William Nordhaus', 'USA'),
 ('Economics', 'Paul Romer', 'USA')]

What if we are only interested in Economics winners? We can add an `if` clause:

In [23]:
[
    (cat, name, nat)
    for name, nat, cat in zip(names, nationalities, categories)
    if cat == "Economics"
]

[('Economics', 'William Nordhaus', 'USA'), ('Economics', 'Paul Romer', 'USA')]

Pretty cool, no? In case you are wondering, [William Nordhaus](https://en.wikipedia.org/wiki/William_Nordhaus) received the prize for integrating climate change into macroeconomic models, being a pioneer in the design of [Integrated Assessment Models](https://en.wikipedia.org/wiki/Integrated_assessment_modelling).