# Day 1: Loops

Loops are essential for processing large amounts of data. With loops you can iterate over strings, lists, sets, dictionaries and many other collections. The most commonly used loop is the `for` loop.

In [1]:
for amino_acid in "TEWQIPFV":
    print(amino_acid)

T
E
W
Q
I
P
F
V


The loop stops once it has processed every character in the protein sequence above.
With the `range()` function you can define how often a command is executed:

In [5]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


In [3]:
for i in range(10):
    print("Hello World!")

Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
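`range()` also accepts a start value and a step size. The sketch below is only an illustration (the name `codon_starts` is made up for this example): it collects every third index, which is how you could find the start position of each codon in a sequence of length 12.

```python
# range(start, stop, step): start is included, stop is excluded
codon_starts = []  # hypothetical name, not used elsewhere in this notebook

for i in range(0, 12, 3):
    codon_starts.append(i)

print(codon_starts)
```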


When we combine loops with conditionals, we can create very useful applications. In the example below we want to count the number of thymines present in the genetic sequence:

In [4]:
genetic_sequence = 'TTAATCTTGTGATACGATATGAGA'
T_count = 0

for nucleotide in genetic_sequence:
    if nucleotide == 'T':
        T_count += 1
        
print(T_count)

9
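For this particular task, Python strings also provide the built-in `count()` method. The sketch below repeats the loop from above and compares both results; `T_count_builtin` is a name chosen only for this example.

```python
genetic_sequence = 'TTAATCTTGTGATACGATATGAGA'

# Manual counting with a loop and a conditional
T_count = 0
for nucleotide in genetic_sequence:
    if nucleotide == 'T':
        T_count += 1

# Same result with the built-in string method
T_count_builtin = genetic_sequence.count('T')

print(T_count, T_count_builtin)
```

The loop is still worth knowing, because it can be extended to conditions that `count()` cannot express.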


A `for` loop over a list, string or `range()` cannot run forever, because the end point is defined by the length of that object.
`while` loops are different: they keep running as long as a condition chosen by you is `True` and stop only when it becomes `False`.

In [6]:
number = 0

while number < 10:
    print(number)
    number = number + 1 

0
1
2
3
4
5
6
7
8
9


If you want to count thymine nucleotides, you can actually rewrite the program above with a `while` loop:

Sidenote: the expression `i += 1` is equivalent to `i = i + 1`

In [9]:
T_count_while = 0 
i = 0 

while i < len(genetic_sequence):
    if genetic_sequence[i] == "T":
        T_count_while += 1 
    i += 1

print(T_count_while)
#Are the T_counts equal for both loops?
print(T_count_while, T_count)
print(i, len(genetic_sequence))

9
9 9
24 24
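If you need the position as well as the nucleotide, a `for` loop combined with `enumerate()` avoids the manual counter `i`. This is a minimal sketch; the list name `t_positions` is an assumption made for this example.

```python
genetic_sequence = 'TTAATCTTGTGATACGATATGAGA'
t_positions = []  # hypothetical name: zero-based positions of every thymine

# enumerate() yields (index, element) pairs, so no manual counter is needed
for i, nucleotide in enumerate(genetic_sequence):
    if nucleotide == 'T':
        t_positions.append(i)

print(t_positions)
print(len(t_positions))
```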


It is also possible to write infinite `while` loops with `while True:`. We use the `break` statement to stop the loop execution (the `time` module is only used here to slow down the execution).

In [15]:
import time

i = 0 

while True:
    i += 1
    print(i)
    time.sleep(0.5)
    if i == 21:
        break

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21


If we ran the program without the `if` statement, it would keep running until you stop it yourself, for example by interrupting the kernel. Therefore: be careful with your `while` loops.
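One way to protect yourself against an accidental endless loop is a safety limit on the number of iterations. The following is a sketch under assumed names (`max_steps`, `steps` and `value` are invented for this example), not a required pattern.

```python
max_steps = 1000  # hypothetical safety limit
steps = 0
value = 1

while value < 500:
    value = value * 2
    steps += 1
    # Leave the loop if it runs much longer than expected
    if steps >= max_steps:
        print("Stopped by the safety limit")
        break

print(value, steps)
```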

Filling up lists with loops is easy. To create a list of numbers from 0-9 just use:

In [22]:
int_list = []
for i in range(10):
    int_list.append(i)
print(int_list)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
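For simple cases like this, Python offers shorter alternatives to the explicit loop. The sketch below shows `list(range(10))` and a list comprehension; the names `int_list_short` and `squares` are chosen only for illustration.

```python
# Convert the range directly into a list
int_list_short = list(range(10))
print(int_list_short)

# A list comprehension is a compact loop that builds a list
squares = [i * i for i in range(10)]
print(squares)
```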


You often use loops to search through lists or strings. If you have a list of strings, you can even search through each individual string in a second loop. 

In [16]:
list_of_primers = ["CGCAAATGGGCGGTAGGCGTG","GACTATCATATGCTTACCGT","CAGGAAACAGCTATGAC"]
list_of_t_counts = []

for sequence in list_of_primers:
    t_count = 0
    for nucleotide in sequence:
        if nucleotide == 'T':
            t_count += 1
    list_of_t_counts.append(t_count)

print(list_of_t_counts) 

[3, 7, 2]


It would be better if we could directly combine the primers with their respective T counts. For that, we use the `zip()` function. `zip()` returns an iterator, so we wrap its output in `list()` before we can display it. This combines both lists into one list of tuples.

In [19]:
list(zip(list_of_primers, list_of_t_counts))

[('CGCAAATGGGCGGTAGGCGTG', 3),
 ('GACTATCATATGCTTACCGT', 7),
 ('CAGGAAACAGCTATGAC', 2)]
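`zip()` can also be used directly in a `for` loop, which lets you walk through both lists in parallel without building a new list first. A minimal sketch, with the T counts from above written out by hand and `report_lines` as a made-up helper list:

```python
list_of_primers = ["CGCAAATGGGCGGTAGGCGTG", "GACTATCATATGCTTACCGT", "CAGGAAACAGCTATGAC"]
list_of_t_counts = [3, 7, 2]  # values copied from the output above
report_lines = []  # hypothetical name, used only in this example

# Each pass of the loop unpacks one (primer, count) pair
for primer, t_count in zip(list_of_primers, list_of_t_counts):
    line = f"{primer}: {t_count} T"
    report_lines.append(line)
    print(line)
```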

We could also create a `dictionary` with keys and values:

In [20]:
dict(zip(list_of_primers, list_of_t_counts))

{'CGCAAATGGGCGGTAGGCGTG': 3, 'GACTATCATATGCTTACCGT': 7, 'CAGGAAACAGCTATGAC': 2}
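A dictionary can be looped over as well. The sketch below uses `.items()` to get each key together with its value and searches for the primer with the most thymines; the names `primer_t_counts`, `best_primer` and `best_count` are assumptions for this example.

```python
primer_t_counts = {
    'CGCAAATGGGCGGTAGGCGTG': 3,
    'GACTATCATATGCTTACCGT': 7,
    'CAGGAAACAGCTATGAC': 2,
}

best_primer = None
best_count = -1

# .items() yields (key, value) pairs
for primer, t_count in primer_t_counts.items():
    if t_count > best_count:
        best_primer = primer
        best_count = t_count

print(best_primer, best_count)
```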

Sometimes it is necessary to combine the elements of a list into one string. In the example below we use the `.join()` method to create an RNA sequence out of the RNA characters in a list:

In [23]:
rna_letters = ['U', 'C', 'G', 'U', 'G', 'U', 'C', 'A', 'G', 'U', 'G', 'A', 'G', 'A', 'C']
rna_sequence = "".join(rna_letters)

print(rna_sequence)


UCGUGUCAGUGAGAC


We can control what is placed between the characters by choosing the string on which `.join()` is called.

In [24]:
rna_sequence = "____".join(rna_letters)
print(rna_sequence)

U____C____G____U____G____U____C____A____G____U____G____A____G____A____C
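Note that `.join()` only accepts strings. If a list contains numbers, each element has to be converted with `str()` first, for example in a loop. A short sketch with made-up names (`t_counts`, `count_strings`, `joined_counts`):

```python
t_counts = [3, 7, 2]
count_strings = []

# Convert every number to a string before joining
for count in t_counts:
    count_strings.append(str(count))

joined_counts = ", ".join(count_strings)
print(joined_counts)
```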
