# Loops

The power of computing lies in the ability to perform lots of repetitive tasks very fast - we're learning to program so that we can process large biological datasets, perhaps from genomics, transcriptomics or proteomics data. 

If we were to write out the code in full for each of these tasks, the program would be as repetitive and tedious as the task we're trying to do. 

Let's say we have a list of DNA (oligonucleotide) sequences and we want to print out each item in the list. Writing out each item of our list of oligonucleotide sequences as shown below would be tedious, especially with a much longer list. In addition, if the number of elements in our list increased or decreased, we would need to edit our code, and we couldn't write code this way if we didn't know the number of elements that were going to be in the list.

In [1]:
oligos_list = ["AGTCGT", "GCTAGC", "TAGTAG", "AGAATC"]
#print all the oligos in the list, index 0 to 3
print(oligos_list[0])
print(oligos_list[1])
print(oligos_list[2])
print(oligos_list[3])

AGTCGT
GCTAGC
TAGTAG
AGAATC


Instead, programmers make use of **loops** to repeatedly process data without having to write out the code in full. The first loop we will encounter is the `for` loop.

`for` loops repeat a block of indented code for each item in a sequence. For example, the code could be repeated for each element (item) in a list, for each key in a dictionary or for each character in a string.

```python
for iterating_variable in sequence:
    # repeat this indented block
```

<div class = "alert alert-info">
**Indents - use 4 spaces instead of tabs** <br>
Indents in Python code can be made with tabs or spaces, but if your code mixes the two inconsistently, Python 3 will raise an error (`TabError`) when the code is run. To help prevent this and keep to the recommended multiples of 4 spaces for indentation, most Python text editors and development environments, including Jupyter, insert 4 spaces when you press the tab key.
</div>

The `for` loop defines an iterating variable - this will be set to the first item in the sequence, and the block is then executed. If there is another item in the sequence, the iterating variable will then be set to the value of that item and the code block will be executed again. This is repeated until the sequence is exhausted. 

The process of repeating a block of code for each element of a list (or of another data type that contains a sequence of items, such as a string or dictionary) is called iteration.
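
Iterating over a dictionary works the same way, with the iterating variable set to each key in turn. The following is a minimal sketch; the dictionary `codon_table` and its three entries are made up for illustration and are not used elsewhere in this section.

```python
# Hypothetical example data: a few codons and the amino acids they encode
codon_table = {"ATG": "Met", "TGG": "Trp", "GGC": "Gly"}

for codon in codon_table:
    # codon is set to each key in turn; use it to look up the matching value
    print(codon, codon_table[codon])
```

The keys are visited in the order in which they were added to the dictionary (this ordering is guaranteed from Python 3.7 onwards).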

The following code repeats the indented code block for each element in the list. 

In [4]:
oligos_list = ["AGTCGT", "GCTAGC", "TAGTAG", "AGAATC"]
for oligo in oligos_list:
    print(oligo)

AGTCGT
GCTAGC
TAGTAG
AGAATC


This code produces exactly the same output as the previous example, but is shorter and more readable. Unlike the previous example, it is also flexible: you don't need to know how many elements there are in the list, because the same block is simply repeated for each list element (oligo sequence).

<div class="alert alert-info">Note: we could give the iterating variable almost any name, as long as we use the same name within the block when we want to access it (see 'A very common misunderstanding' later in this section for an example). As for all variables, it is wise to use a mnemonic name, and you must avoid reserved words. You should also avoid names such as `string`, `list` and `dict`, as these could be confusing and could lead to unexpected errors or results.
</div>

Loops can also be used to repeat a block of code a set number of times, by using `range()` to specify a set of numbers to iterate over. In the example below, `range(1, 6)` produces the numbers 1 to 5: the stop value itself is not included.


In [4]:
for number in range(1, 6):
    print("This block has been executed {} times".format(number))

This block has been executed 1 times
This block has been executed 2 times
This block has been executed 3 times
This block has been executed 4 times
This block has been executed 5 times
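
`range()` can be called with one, two or three arguments. The short sketch below converts each range to a list only so that the numbers it produces can be inspected; the variable names are illustrative.

```python
# range(stop): counts from 0 up to, but not including, stop
first_five = list(range(5))
print(first_five)

# range(start, stop): counts from start up to, but not including, stop
one_to_five = list(range(1, 6))
print(one_to_five)

# range(start, stop, step): counts in steps of the given size,
# e.g. the start position of each codon in a 12-base sequence
codon_starts = list(range(0, 12, 3))
print(codon_starts)
```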


### Looping through a list

As we saw in the example above, looping through a list is very simple. Building on the oligos example, instead of printing each oligo to the screen, we'll calculate the total number of nucleotides in the whole list by calculating the length of each oligo and adding them together. Note that the final `print()` call must not be indented, so that it runs once after the loop has finished - experiment to see what would happen if it were part of the loop (i.e. indented).

In [7]:
oligos_list = ["AGTCGT", "GCTAGC", "TAGTAG", "AGAATC"]
nucleotide_count = 0 #First we need to set a variable to store our count - note this is done BEFORE starting the loop

for oligo in oligos_list:
    #Now we use the increment operator (+=) to add the length of the oligo to the variable nucleotide_count
    nucleotide_count += len(oligo)
    
#After the loop is finished, print the value stored in nucleotide_count
print("Total number of nucleotides in all oligos is {}".format(nucleotide_count))

Total number of nucleotides in all oligos is 24


<div class="alert alert-info">
**Indentation matters**<br>
When we write a loop, the code within the indented block is run repeatedly, and the next unindented statement is run after the loop has finished. Experiment to see what happens if the line
`print("Total number of nucleotides in all oligos is {}".format(nucleotide_count))`
is indented so that it becomes part of the loop.
</div>

### Looping over the characters of a string

Since strings also contain sequences of multiple items (characters), we can iterate through them using a `for` loop in the same way as we would for a list.


In [3]:
dna_sequence = "ATGCATAGTAA"
t_base_count = 0 #set variable to be used to count Ts

for base in dna_sequence:
    if base=="T":
        t_base_count += 1 #add another T to count
        
print("There are {0} thymidine residues in {1}".format(t_base_count, dna_sequence))

There are 3 thymidine residues in ATGCATAGTAA


### A very common misunderstanding

Many newcomers to Python incorrectly believe that the name of the iterating variable influences what the code does. Iterating variable names are usually chosen as mnemonics to indicate what the variable will contain, but the following two loops are equivalent and do exactly the same thing.

Experiment with the examples below and you will see that any valid, non-reserved name can be used as an iterating variable - as long as you change the variable within the loop to match.

In [1]:
DNA_sequence = "AGCTAGCGGCATC"
for nucleotide in DNA_sequence:
    print(nucleotide)

A
G
C
T
A
G
C
G
G
C
A
T
C


In [2]:
DNA_sequence = "AGCTAGCGGCATC"
for goat in DNA_sequence:
    print(goat)

A
G
C
T
A
G
C
G
G
C
A
T
C


### Loop Control Statements

`break` and `continue` statements can be used in both `while` and `for` loops. Both end the current iteration of a loop immediately, but `continue` moves on to the next iteration, whereas `break` exits the loop completely.

The `continue` statement returns to the beginning of the loop. When the `continue` statement is executed, the remainder of the loop block is skipped for the current iteration.

In [4]:
dna_sequence = "ATGCAT"
for base in dna_sequence:
    if base == 'C':
        continue
    print('Current Base: {}'.format(base))

Current Base: A
Current Base: T
Current Base: G
Current Base: A
Current Base: T


When a `break` statement is executed, it exits the current loop without executing the remainder of the current iteration or any future iterations of that loop.


In [16]:
dna_sequence = "ATGCAT"
for base in dna_sequence:
    if base == 'C':
        break
    print('Current Base: {}'.format(base))

Current Base: A
Current Base: T
Current Base: G


#### reversed()

The `reversed()` function reverses the order of iteration through a string (it can also be used on lists):

In [10]:
sentence = "?was I tac a ti saW"
for letter in reversed(sentence):
    print(letter, end='')

Was it a cat I saw?

N.B. In the above example, we have added `end=''` to the `print()` call - this stops `print()` from adding a newline after each letter, so all the letters appear on one line.
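
`reversed()` works on lists in the same way. A minimal sketch, reusing the `oligos_list` values from earlier in this section:

```python
oligos_list = ["AGTCGT", "GCTAGC", "TAGTAG", "AGAATC"]

# reversed() yields the items from last to first;
# the original list itself is left unchanged
for oligo in reversed(oligos_list):
    print(oligo)
```

The loop prints the oligos starting with `AGAATC` and ending with `AGTCGT`, while `oligos_list` keeps its original order.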

### While loops

`while` loops repeat the contents of a code block **while** a condition is `True`. Most loops could be written using either `for` or `while`. In general, you should use whichever is simpler - where you are iterating over a defined object (string, list, etc.), it is generally better to use `for`.

|Loop Type|Description|
|:-------|:------|
|`for` loop|Executes a block of code once for each item in a sequence, until the sequence has been exhausted. The current item is made available through the iterating variable defined in the `for` loop.|
|`while` loop|Repeats a block of code while a given condition is `True`. The condition is evaluated before each execution of the loop block.|

In [5]:
count = 0
while count < 5:
    print('The count is {}'.format(count))
    count += 1


The count is 0
The count is 1
The count is 2
The count is 3
The count is 4
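
A `while` loop is most natural when the number of repetitions is not known in advance. As a rough sketch, the loop below removes a run of trailing As from a sequence, one base at a time. The example sequence is made up, and the code assumes you are already familiar with string slicing and the `endswith()` string method.

```python
# Hypothetical sequence ending in a run of As
sequence = "ATGCGTAAAA"
removed = 0  # number of trailing As removed so far

# Keep looping while the sequence still ends with an A
while sequence.endswith("A"):
    sequence = sequence[:-1]  # drop the last character
    removed += 1

print("Trimmed sequence: {}".format(sequence))
print("Removed {} trailing As".format(removed))
```

The condition is checked before every pass, so the loop stops as soon as the last base is no longer an A, and it never runs at all for a sequence that does not end in A.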


# Exercises

1) The `%` (modulo) operator yields the remainder from the division of the first argument by the second. Write a program to calculate whether each of the following numbers is an even number, returning True if even and False if odd. 

`test_numbers = [1, 39, 93, 28, 32, 42, 52, 19, 48, 84]`

***Hint*** <br>
`print(6%2) #prints 0` <br>
Dividing an even number by 2 gives no remainder.

2) Write a program that will reverse a DNA sequence (it does not need to give the reverse complement). Use a loop, not extended slices (`[::-1]`).

3) Modify your program from 2) above so that it leaves out thymidines (T) from the reversed sequence. Hint: use `continue`.
