# Day 2: Loops/Comprehensions
Kieran Didi

Loops are essential for processing large amounts of data. With loops you can search through strings, lists, sets, dictionaries and many other collections. The most commonly used loop is the `for` loop.

In general, loops fall into two categories: definite iteration (`for`) and indefinite iteration (`while`).

`for` loops can be classified further by how they iterate:
- numeric range loop
- three-expression loop
- collection-based/iterator-based loop 

For more information see [this post](https://realpython.com/python-for-loop/#a-survey-of-definite-iteration-in-programming). For now, the important point is that Python only implements the iterator-based loop, but can emulate the other two with this construct.
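
As a minimal sketch of that emulation (the variable names are illustrative and not taken from the linked post), `range()` supplies the values that a numeric range loop or a three-expression loop would generate, while the loop itself stays iterator-based:

```python
# Numeric range loop ("for i = 1 to 3" in other languages)
numeric_values = []
for i in range(1, 4):          # the stop value 4 is excluded
    numeric_values.append(i)

# Three-expression loop, e.g. `for (i = 0; i < 6; i += 2)` in C:
# start, stop and step become the three arguments of range()
three_expr_values = []
for i in range(0, 6, 2):
    three_expr_values.append(i)

print(numeric_values)      # [1, 2, 3]
print(three_expr_values)   # [0, 2, 4]
```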

A `for` loop has two parts: a **header**, which specifies the iteration, and a **body**, which is executed once per iteration. The loop visits the items of a sequence in the order they appear. For a finite collection such as a string or a list, the number of iterations is fixed by the number of items rather than decided inside the loop. In every iteration the loop variable takes the next value.

In [1]:
for amino_acid in "TEWQIPFV":       # this is the header
    print(amino_acid)               # this is the body 

T
E
W
Q
I
P
F
V


Behind the scenes, Python creates an iterator from that string. It implicitly calls the built-in `iter()` function, which returns an iterator; `next()` then retrieves one item at a time:

In [None]:
iterator = iter('TEWQIPFV')                             # String
next(iterator)
next(iterator)
next(iterator)

iter(['TEWQIPFV', 'WQRTSG', 'MGPWL'])        # List

iter(('TEWQIPFV', 'WQRTSG', 'MGPWL'))                # Tuple

iter({'TEWQIPFV', 'WQRTSG', 'MGPWL'})                # Set

iter({'TEWQIPFV': 20, 'WQRTSG': 50, 'MGPWL': 10})       # Dict

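# iter() fails for the following objects: each call raises a TypeError
# because the object is not iterable (uncomment one line to see the error)
# iter(42)                                 # Integer

# iter(3.1)                                # Float

# iter(len)                                # Built-in function

# collect the values from an iterator in a list
iterator = iter('TEWQIPFV')
residues = list(iterator)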

Here is a short overview, taken from the article linked above, of the inner workings of the `for` loop in Python:

| Term      | Meaning                                                                             |  
|-----------|-------------------------------------------------------------------------------------|
| Iteration | The process of looping through the objects or items in a collection               |
| Iterable  | An object (or the adjective used to describe an object) that can be iterated over |
| Iterator  | The object that produces successive items or values from its associated iterable  |
| iter()    | The built-in function used to obtain an iterator from an iterable                 | 
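
The terms in the table can be tied together by writing out by hand roughly what a `for` loop does. This is only a sketch of the mechanism, with illustrative variable names: get an iterator with `iter()`, call `next()` repeatedly, and stop when `StopIteration` is raised.

```python
sequence = "TEWQ"              # the iterable
iterator = iter(sequence)      # the iterator obtained from it

collected = []
while True:
    try:
        residue = next(iterator)   # fetch the next item
    except StopIteration:          # raised once the iterator is exhausted
        break
    collected.append(residue)      # this line plays the role of the loop body

print(collected)   # ['T', 'E', 'W', 'Q']
```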



<img src="https://files.realpython.com/media/t.ba63222d63f5.png" alt="drawing" width="400"/>

The loop stops when it has processed every character in the protein sequence above.
With the `range()` function you can define how often a command is executed:

In [5]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


In [3]:
for i in range(10):
    print("Hello World!")

Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!


The `range()` function also uses the start, stop, step logic known from slicing:

In [4]:
print(list(range(5,10)))
print(list(range(0,10,3)))
print(list(range(-10,-100,-30)))

[5, 6, 7, 8, 9]
[0, 3, 6, 9]
[-10, -40, -70]
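
To make the parallel with slicing concrete, here is a small sketch that picks every third residue of a sequence twice, once with a slice and once with `range()` indices. The `peptide` string is a made-up example.

```python
peptide = "TEWQIPFVKL"   # hypothetical 10-residue sequence

# Slice with start, stop, step
by_slice = peptide[0:10:3]

# The same positions, generated by range(start, stop, step)
by_range = ""
for i in range(0, 10, 3):
    by_range += peptide[i]

print(by_slice)   # TQFL
print(by_range)   # TQFL
```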


When we combine loops with conditionals, we can create very useful applications. In the example below we want to count the number of thymines present in the genetic sequence:

In [2]:
genetic_sequence = 'TTAATCTTGTGATACGATATGAGA'
T_count = 0

for nucleotide in genetic_sequence:
    if nucleotide == 'T':
        T_count += 1
        
print(T_count)

9


In [None]:
#continue/break/else
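
A short sketch of the three statements named above, using made-up sequences: `continue` skips the rest of the body for the current item, `break` leaves the loop early, and an `else` block attached to the loop runs only if the loop finished without hitting `break`.

```python
sequence = "ATGNCGTAA"   # hypothetical sequence; 'N' marks an unknown base

valid_bases = []
for base in sequence:
    if base == "N":
        continue             # skip this base and go on with the next one
    valid_bases.append(base)

found_g = False
for base in "AATTC":
    if base == "G":
        found_g = True
        break                # stop searching after the first hit
else:
    # runs because the loop ended without break
    print("no G found")

print(valid_bases)   # ['A', 'T', 'G', 'C', 'G', 'T', 'A', 'A']
print(found_g)       # False
```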

`for` loops over a finite collection cannot run forever, because the end point is fixed by the length of a list, a string or a `range()` object. \
`while` loops are different: they keep running as long as a condition chosen by you is `True` and stop only once it becomes `False` (think of a repeating `if` statement). The iteration is **indefinite**.

A `while` loop also has a **header**, which specifies the condition, and a **body**, which is executed once per iteration. The header expression is evaluated in a Boolean context. If it is `True`, Python executes the body. The header is then checked again before every further iteration until the expression becomes `False`.

In [3]:
number = 0

while number < 10:          # header
    print(number)           # body
    number = number + 1     # body

0
1
2
3
4
5
6
7
8
9


If you want to count thymine nucleotides, you can actually rewrite the program above with a `while` loop:

Sidenote: the expression `i += 1` is equivalent to `i = i + 1`

In [4]:
T_count_while = 0 
i = 0 

while i < len(genetic_sequence):
    if genetic_sequence[i] == "T":
        T_count_while += 1 
    i += 1

print(T_count_while)
#Are the T_counts equal for both loops?
print(T_count_while, T_count)

9
9 9


It is also possible to write a `while` loop whose header never ends it, using `while True`. The expression always evaluates to `True`, so we use the `break` statement to stop the loop. (We use the `time` module to slow down the execution.)

In [15]:
import time

i = 0 

while True:
    i += 1
    print(i)
    time.sleep(0.5)
    if i == 21:
        break

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21


If we ran the program without the `if` statement, it would keep running until you interrupt it manually (for example by interrupting the kernel or pressing Ctrl+C). Therefore: be careful with your `while` loops.


In [None]:
#while loops: continue/break/else statements
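
The same three statements work in `while` loops. The following sketch (names are illustrative) collects odd numbers: `continue` jumps back to the header, `break` ends the loop, and the `else` block is skipped because the loop was left via `break`.

```python
n = 0
odd_numbers = []
finished_normally = False

while n < 10:
    n += 1
    if n % 2 == 0:
        continue                 # even number: jump back to the header
    if n > 7:
        break                    # leave the loop entirely
    odd_numbers.append(n)
else:
    finished_normally = True     # only reached if the header became False

print(odd_numbers)         # [1, 3, 5, 7]
print(finished_normally)   # False
```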


<img src="https://files.realpython.com/media/t.899f357dd948.png" alt="drawing" width="300"/>


Filling up lists with loops is easy. To create a list of the numbers from 0 to 9, just use:

In [22]:
int_list = []
for i in range(10):
    int_list.append(i)
print(int_list)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [None]:
# one-line while loop (only for simple cases; a compound statement such as `if` cannot follow on the same line)
n = 10
while n > 0: n -= 1; print(n)

You often use loops to search through lists or strings. If you have a list of strings, you can even search through each individual string in a second loop. 

In [16]:
list_of_primers = ["CGCAAATGGGCGGTAGGCGTG","GACTATCATATGCTTACCGT","CAGGAAACAGCTATGAC"]
list_of_t_counts = []

for sequence in list_of_primers:
    t_count = 0
    for nucleotide in sequence:
        if nucleotide == 'T':
            t_count += 1
    list_of_t_counts.append(t_count)

print(list_of_t_counts) 

[3, 7, 2]


It would be better if we could directly combine the primers with their respective T counts. For that, we use the `zip()` function. `zip()` returns a lazy iterator, so we wrap its output in `list()` to display it. This combines both lists into one list of tuples.

In [19]:
list(zip(list_of_primers, list_of_t_counts))

[('CGCAAATGGGCGGTAGGCGTG', 3),
 ('GACTATCATATGCTTACCGT', 7),
 ('CAGGAAACAGCTATGAC', 2)]

This is useful if you want to iterate through two lists in parallel:

In [1]:
codons = ['GCA','AGA','GAT','AAT','TGT']

amino_acid = ['A','R','D','N','C']

for codon, aa in zip(codons,amino_acid):
    print(codon,aa)

GCA A
AGA R
GAT D
AAT N
TGT C
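
Two further properties of `zip()` are worth a quick sketch: its pairs can be passed to `dict()` to build a lookup table, and it stops as soon as the shortest input is exhausted. The names `codon_table` and `short_pairs` are illustrative.

```python
codons = ['GCA', 'AGA', 'GAT', 'AAT', 'TGT']
amino_acid = ['A', 'R', 'D', 'N', 'C']

# Build a codon -> amino acid lookup from the zipped pairs
codon_table = dict(zip(codons, amino_acid))
print(codon_table['GAT'])   # D

# zip() stops at the end of the shortest input
short_pairs = list(zip(codons, ['A', 'R']))
print(short_pairs)          # [('GCA', 'A'), ('AGA', 'R')]
```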


The `enumerate()` function gives you even more flexibility by providing a running index while iterating:

In [3]:
#enumerate function
for pos, (codon, aa) in enumerate(zip(codons,amino_acid)):
    print(pos,codon,aa)

0 GCA A
1 AGA R
2 GAT D
3 AAT N
4 TGT C
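
By default the index starts at 0. If positions should be counted from 1, as is common for sequence positions, `enumerate()` accepts a `start` argument. A small sketch:

```python
codons = ['GCA', 'AGA', 'GAT']

numbered = []
for pos, codon in enumerate(codons, start=1):   # count from 1 instead of 0
    numbered.append((pos, codon))

print(numbered)   # [(1, 'GCA'), (2, 'AGA'), (3, 'GAT')]
```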


We can combine what we learned about lists, loops and conditionals to form list comprehensions, a short-hand and efficient way to create lists.

In [None]:
# list comprehension: builds the whole list immediately
squares = [i * i for i in range(10)]

# generator expression: produces the values lazily, one at a time
squares_gen = (i * i for i in range(10))
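
The practical difference between the two forms can be seen in a short sketch (variable names are illustrative): a list can be read as often as you like, whereas a generator is exhausted after one pass.

```python
squares_list = [i * i for i in range(5)]    # list: stored in memory
squares_lazy = (i * i for i in range(5))    # generator: computed on demand

first_pass = list(squares_lazy)     # consumes the generator
second_pass = list(squares_lazy)    # nothing left to produce

print(squares_list)   # [0, 1, 4, 9, 16]
print(first_pass)     # [0, 1, 4, 9, 16]
print(second_pass)    # []
```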


Combining this syntax with conditionals allows us to do complex jobs in a single line while still remaining readable:

In [1]:
sentence = 'Speaking words of wisdom: let it be'
vowels = [i for i in sentence if i in 'aeiou']
vowels

['e', 'a', 'i', 'o', 'o', 'i', 'o', 'e', 'i', 'e']
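
As a closing sketch, the nested loop that counted thymines per primer earlier can be condensed with the same syntax. The second variant uses the string method `str.count()`, which is not covered above and is shown only as an alternative.

```python
list_of_primers = ["CGCAAATGGGCGGTAGGCGTG", "GACTATCATATGCTTACCGT", "CAGGAAACAGCTATGAC"]

# Comprehension with a conditional: one T count per primer
t_counts = [
    sum(1 for nucleotide in primer if nucleotide == 'T')
    for primer in list_of_primers
]

# Alternative using the built-in string method count()
t_counts_builtin = [primer.count('T') for primer in list_of_primers]

print(t_counts)           # [3, 7, 2]
print(t_counts_builtin)   # [3, 7, 2]
```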