# Programming: Flow Control

## Decision Making

While data structures deal with storing data before and after processing, the processing itself involves making decisions about how the data moves through the program.

Sometimes we write programs that make no decisions at all: the data flows linearly through the code as each line is executed, one after the other.

For example, we can write an interactive program that asks the user to input their first and last names separated by a single whitespace and then prints them.

In [None]:
#!/usr/bin/env python3
# Ask the user to enter their first and last name separated by whitespace and store it in the variable named 'first_last_name'.
first_last_name = input("Enter your first and last name separated by whitespace: ")
# Make a list containing both first and last names by splitting the input
names_list = first_last_name.split()
# Extract both names in variables.
first_name = names_list[0]
last_name = names_list[-1]
# Pass both names to the output stream.
print("""First name : {}
Last name : {}""".format(first_name, last_name))
# Tell the user that they have reached the end of the program.
print("The program ends here.")

The code does not make any decision by itself, and each line gets executed no matter what.

However, we can write the code so that it checks the input and prints the name only if it is in the correct format.

## if Statement
The **if statement** is a conditional flow control structure. It acts upon the result of an evaluation. **if** evaluates the expression next to it. When true, the block of code after the **if** clause is executed. Otherwise, the program passes over the **if block** and moves to the next code block.

<img src="If-Statement-in-Python.png" alt="Drawing" style="width: 400px;"/>

In [None]:
#!/usr/bin/env python3
dna=input('Enter DNA sequence:')
if 'n' in dna.lower():
    nbases=dna.lower().count('n')
    print("dna sequence has {} undefined bases ".format(nbases))
print(dna.upper())

1. The expression must evaluate to **True** or **False**. Such expressions are called **Boolean expressions** and are formed with the help of **comparison**, **identity**, and **membership operators**.
2. The colon (:) after the expression is required.
3. The block governed by the if statement has to be indented, and only gets executed if the expression evaluates to True.

#### Membership operators

In [None]:
motif='gtccc'
dna='atatattgtcccattt'
motif in dna

In [None]:
motif not in dna

#### Comparison operators

| Comparison | Operator |
| --- | --- |
| Equal | == |
| Not equal | != |
| Less than | < |
| Greater than | > |
| Less than or equal to | <= |
| Greater than or equal to | >= |

In [None]:
0<1

In [None]:
len('atgcgt')>=10

In [None]:
'a'=='A'

In [None]:
'GT' != 'AG'

In [None]:
'A'<'C'

In [None]:
10+1==11

####  Do not confuse the assignment sign **=** with the comparison operator **==** 
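
As a small illustration of this warning, the sketch below contrasts the two signs. The variable names `base`, `is_adenine`, and `is_guanine` are made up for this example.

```python
# '=' assigns: it binds the name 'base' to the string 'a'.
base = 'a'
# '==' compares: it evaluates to True or False and leaves 'base' unchanged.
is_adenine = (base == 'a')
is_guanine = (base == 'g')
print(is_adenine, is_guanine)
```

Writing `if base = 'a':` by mistake is a syntax error in Python, so this particular slip is usually caught before the program runs.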

#### Identity operators
The identity operator `is` tests if the two compared objects reside at the same memory address.

| Operator | Description |
| --- | --- |
| is | True if the variables on either side of the operator point to the same object and false otherwise. |
| is not | False if the variables on either side of the operator point to the same object and true otherwise. |

In [None]:
alphabet=['a', 't', 'g', 'c']

In [None]:
# Make a copy and check equality and identity
newalphabet=alphabet[:]
alphabet == newalphabet, alphabet is newalphabet

In [None]:
# Take a reference (another name for the same object)
alphabet_reference = alphabet
alphabet_reference == alphabet, alphabet_reference is alphabet
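
The practical consequence of identity shows up when a list is modified. The following sketch, which reuses the names from the cells above and adds the hypothetical name `alphabet_copy`, suggests how a change made through a reference is visible under the original name, while a copy stays untouched.

```python
alphabet = ['a', 't', 'g', 'c']
alphabet_copy = alphabet[:]       # a new list with the same elements
alphabet_reference = alphabet     # a second name for the same list

# Appending through the reference changes the one shared list object.
alphabet_reference.append('n')

print(alphabet)        # the original name sees the appended 'n'
print(alphabet_copy)   # the copy is unaffected
print(alphabet_reference is alphabet, alphabet_copy is alphabet)
```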

Now we can edit our program to ensure that wrong entries are not printed.

In [None]:
#!/usr/bin/env python3
# Ask the user to enter their first and last name separated by whitespace and store it in the variable named 'first_last_name'.
first_last_name = input("Enter your first and last name separated by whitespace: ")
# Make a list containing both first and last names by splitting the input
names_list = first_last_name.split()
# Check if the list has just two elements, i.e. length of the names_list is 2.
if len(names_list) == 2:
    #Now we know that there are only two elements in the list so execute the following line
    # Extract both names in variables
    first_name = names_list[0]
    last_name = names_list[-1]
    #Pass both names to the output stream
    print("""
    First name : {}
    Last name : {}""".format(first_name, last_name))
# Tell the user that they have reached the end of the program.
print("The program ends here.")

## if-else 

An alternative block of code gets executed if the evaluation of the expression next to the if statement does not return True.

<img src="If-Else-in-Python.jpg" alt="Drawing" style="width: 400px;"/>

Now we can inform the user that they have entered an incorrect format.

In [None]:
#!/usr/bin/env python3
# Ask the user to enter their first and last name separated by whitespace and store it in the variable named 'first_last_name'.
first_last_name = input("Enter your first and last name separated by whitespace: ")
# Make a list containing both first and last names by splitting the input
names_list = first_last_name.split()
# Check if the list has just two elements, i.e. length of the names_list is 2.
if len(names_list) == 2:
    #Now we know that there are only two elements in the list so execute the following line
    # Extract both names in variables
    first_name = names_list[0]
    last_name = names_list[-1]
    #Pass both names to the output stream
    print("""
    First name : {}
    Last name : {}""".format(first_name, last_name))
else:
    # Since the list does not have exactly two members, tell the user that they did not use the correct format
    print('You did not use the correct format. Please try again!')
print("The program ends here.")

## Elif and Nested if-else

To test multiple expressions one after the other we use **elif**, which is short for **else if**.

We can tell the user whether they have entered more or fewer than two names.

In [None]:
#!/usr/bin/env python3
# Ask the user to enter their first and last name separated by whitespace and store it in the variable named 'first_last_name'.
first_last_name = input("Enter your first and last name separated by whitespace: ")
# Make a list containing both first and last names by splitting the input
names_list = first_last_name.split()
# Check if the list has just two elements, i.e. length of the names_list is 2.
if len(names_list) == 2:
    #Now we know that there are only two elements in the list so execute the following line
    # Extract both names in variables
    first_name = names_list[0]
    last_name = names_list[-1]
    #Pass both names to the output stream
    print("""
    First name : {}
    Last name : {}""".format(first_name, last_name))
elif len(names_list) > 2:
    # Tell the user that they have entered more than two names
    print('You entered more than two names. Please try again!')
else:
    # Tell the user that they have entered fewer than two names
    print('You entered fewer than two names. Please try again!')
print("The program ends here.")

To test expressions that are subsets of a previous expression we use **nested if-else**.

In [None]:
#!/usr/bin/env python3
# Ask the user to enter their first and last name separated by whitespace and store it in the variable named 'first_last_name'.
first_last_name = input("Enter your first and last name separated by whitespace: ")
# Make a list containing both first and last names by splitting the input
names_list = first_last_name.split()
# Check if the list has just two elements, i.e. length of the names_list is 2.
if len(names_list) == 2:
    #Now we know that there are only two elements in the list so execute the following line
    # Extract both names in variables
    first_name = names_list[0]
    last_name = names_list[-1]
    #Pass both names to the output stream
    print("""
    First name : {}
    Last name : {}""".format(first_name, last_name))
elif len(names_list) > 2:
    # Tell the user that they have entered more than two names
    print('You entered more than two names. Please try again!')
else:
    if len(names_list) == 0:
        # Tell the user that they did not enter any name
        print('You did not enter any name. Please try again!')
    elif len(names_list) == 1:
        # Tell the user that they have entered just one name
        print('You either entered just one name or forgot the whitespace. Please try again!')
print("The program ends here.")

### Logical operators

| Operator | Description |
| --- | --- |
| and | True if both conditions are true |
| or | True if at least one condition is true |
| not | True if condition is false |

In [None]:
#!/usr/bin/env python3
dna=input('Enter DNA sequence:')
if 'n' in dna or 'N' in dna:
    #counts undefined bases for both upper and lower cases
    nbases=dna.count('n')+dna.count('N')
    print("dna sequence has {} undefined bases ".format(nbases))
else:
    print("dna sequence has no undefined bases")
print(dna)
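
The cell above only uses `or`. As a rough sketch of the other two operators, the example below combines a membership test with `and` and inverts one with `not`. The sequence, the motif, and the names `has_motif_and_gaps` and `is_fully_defined` are assumptions made for illustration.

```python
dna = 'atgcnnatgc'   # example sequence, chosen for illustration
motif = 'atg'

# 'and' is True only when both conditions hold.
has_motif_and_gaps = motif in dna and 'n' in dna.lower()
# 'not' inverts a condition.
is_fully_defined = not ('n' in dna.lower())

if has_motif_and_gaps:
    print("motif found, but the sequence has undefined bases")
if is_fully_defined:
    print("every base is defined")
```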

# Loops

*The Njoy FM station seems to be playing the same songs on a loop.*

Loops are the Python control structures that allow us to perform the same task repeatedly.

Python has two kinds of loops: the **for** loop and the **while** loop.


## while Loop

A `while` loop keeps executing the block of code inside it until its condition evaluates to `False`. Note that the condition must evaluate to `True` for the body of the loop to be entered in the first place.

<img src="while-loop.jpg" alt="Drawing" style="width: 400px;"/>

#### The block of code inside the while loop must eventually make the condition evaluate to False. Otherwise the loop will never end.

In [None]:
n = 1
while n < 10:
    print(n)
    n = n + 1

We can keep asking the user to enter their name in the correct format.

In [None]:
#!/usr/bin/env python3
# Initialize an empty names_list
names_list = []
# Repeat until the user provides the names in the correct format
while len(names_list) != 2:
    # Ask the user to enter their first and last name separated by whitespace and store it in the variable named 'first_last_name'.
    first_last_name = input("Enter your first and last name separated by whitespace: ")
    # Reassign the names_list by splitting the input
    names_list = first_last_name.split()
    # Check if the list has just two elements, i.e. length of the names_list is 2.
    if len(names_list) == 2:
        #Now we know that there are only two elements in the list so execute the following line
        # Extract both names in variables
        first_name = names_list[0]
        last_name = names_list[-1]
        #Pass both names to the output stream
        print("""
        First name : {}
        Last name : {}""".format(first_name, last_name))
    else:
        # Since the list does not have exactly two members, tell the user that they did not use the correct format
        print('You did not use the correct format. Please try again!')
print("The program ends here.")

## for Loop

Python’s for loop iterates over the items of any sequence (a list, a string, or a tuple) or any other iterable objects, in the order that they appear in the sequence.

<img src="for-loop.jpg" alt="Drawing" style="width: 400px;"/>

In [None]:
motifs=["attccgt","agggggtttttcg","gtagc"]
for m in motifs:
    print(m, len(m))    

### range() Function
The range() built-in function allows you to iterate over a sequence of numbers.

In [None]:
for n in range(10):
    print(n)

**range(start, stop, step)** defines where the sequence starts, where it stops, and the step (the interval between consecutive numbers). The `stop` value itself is not included.

In [None]:
for n in range(1,10,2):
    print(n)
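
To inspect what a `range` produces without writing a loop, it can be converted to a list. The sketch below is only an illustration; the variable names are made up.

```python
# stop is exclusive: the sequence ends before reaching 10.
odd_numbers = list(range(1, 10, 2))
# A negative step counts down; again the stop value (0) is not included.
countdown = list(range(5, 0, -1))
# With one argument, counting starts at 0 and the step is 1.
first_three = list(range(3))
print(odd_numbers, countdown, first_three)
```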

#### **Problem**: Find if all characters in a given protein sequence are valid amino acids.

In [None]:
protein='SDVIHRYKUUPAKSHGWYVCJRSRFTWMVWWRFRSCRA'
#for each character in protein sequence
for i in range(len(protein)):
    #if character is not amino acid
    if protein[i] not in 'ABCDEFGHIKLMNPQRSTVWXYZ':
        #print invalid character and its position in protein
        print("protein contains invalid amino acid {} at position {}".format(protein[i],i))

### break Statement
The break statement terminates the nearest enclosing loop (a `for` loop or a `while` loop).

<img src="flowchart-break-statement.jpg" alt="Drawing" style="width: 400px;"/>

<img src="how-break-statement-works.jpg" alt="Drawing" style="width: 400px;"/>

In [12]:
# Note: it would be better to use enumerate, see below.
protein='SDVIHRYKUUPAKSHGWYVCJRSRFTWMVWWRFRSCRA'
#for each character in protein sequence
for i in range(len(protein)):
    #if character is not amino acid
    if protein[i] not in 'ABCDEFGHIKLMNPQRSTVWXYZ':
        #print this is not a valid sequence
        print(" this is not a valid protein sequence!")
        #break out of the loop
        break

 this is not a valid protein sequence!


A more elegant way to access both the index and the element in a for loop is to use the built-in `enumerate` function. `enumerate` takes a list or any other iterable object as an argument and yields, for each element, a tuple consisting of the index and the element itself.


Using the `enumerate` function, the above example becomes

In [11]:
protein='SDVIHRYKUUPAKSHGWYVCJRSRFTWMVWWRFRSCRA'
#for each index and character in protein sequence
for idx, aa in enumerate(protein):
    #if character is not amino acid
    if aa not in 'ABCDEFGHIKLMNPQRSTVWXYZ':
        #print this is not a valid sequence
        print("Found invalid character {} at position {} in the given sequence.\
              This is not a valid protein sequence!".format(aa, idx))
        #break out of the loop
        break

Found invalid character U at position 8 in the given sequence.              This is not a valid protein sequence!
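
The position reported above is zero-based. If one-based positions are preferred, `enumerate` accepts an optional `start` argument. The sketch below shows one way this might look; the names `valid_residues`, `position`, and `first_invalid` are introduced here only for illustration.

```python
protein = 'SDVIHRYKUUPAKSHGWYVCJRSRFTWMVWWRFRSCRA'
valid_residues = 'ABCDEFGHIKLMNPQRSTVWXYZ'

first_invalid = None
# start=1 makes enumerate count from 1 instead of 0.
for position, aa in enumerate(protein, start=1):
    if aa not in valid_residues:
        # Remember the first invalid residue and stop searching.
        first_invalid = (position, aa)
        break

print(first_invalid)
```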


### continue Statement
The continue statement causes the program to continue with the next iteration of the nearest enclosing loop, skipping the rest of the code in the loop.

<img src="continue-statement-flowchart.jpg" alt="Drawing" style="width: 400px;"/>

<img src="how-continue-statment-works.jpg" alt="Drawing" style="width: 400px;"/>

#### **Problem**: Delete all invalid amino acid characters from a protein sequence.

In [None]:
protein='SDVIHRYKUUPAKSHGWYVCJRSRFTWMVWWRFRSCRA'
corrected_protein=''
for i in range(len(protein)):
    if protein[i] not in 'ABCDEFGHIKLMNPQRSTVWXYZ':
        # If the residue is not a valid residue continue to the next residue
        continue
    # If the residue is a valid residue add it to corrected_protein
    corrected_protein=corrected_protein+protein[i]
print("Corrected protein sequence is: {}".format(corrected_protein))

### pass Statement

1. Python's pass statement is a placeholder: it does nothing.
2. It is used when a statement is required syntactically but you do not want any command or code to execute.

3. Sometimes you can use the pass statement if you didn’t yet write the code but need the placeholder so that the syntax of the rest of your program is correct.
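
A minimal sketch of the placeholder use, assuming we have not yet decided how to handle invalid residues; the counter `valid_count` is a hypothetical name used only for this example.

```python
protein = 'SDVIHRYKUUPAKSHGWYVCJRSRFTWMVWWRFRSCRA'
valid_count = 0
for aa in protein:
    if aa not in 'ABCDEFGHIKLMNPQRSTVWXYZ':
        # Handling of invalid residues is not written yet; 'pass' keeps the
        # if block syntactically valid in the meantime.
        pass
    else:
        valid_count = valid_count + 1
print(valid_count)
```

Without `pass`, the empty `if` block would raise an `IndentationError`, because Python requires at least one statement in every block.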

## else Statement with Loops

Loop statements may have an **else** clause.

* If used with a for loop, the else clause is executed when the loop has finished iterating over the sequence.

* If used with a while loop, the else clause is executed when the condition becomes false.

####  The else statement is not executed if the loop is terminated by the break statement!

#### **Problem**: Find all prime numbers smaller than a given integer.

In [None]:
N=10
for y in range(2, N): # should skip all even numbers :)
    for x in range(2, y):
        if y % x == 0:
            print(y, 'equals', x, '*', y//x)
            break
    else:
        # The inner loop finished without finding a factor, so y is prime
        print(y, 'is a prime number')
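
The same clause works with a `while` loop. The sketch below is an illustrative example, not part of the problem above: it scans a made-up sequence for an undefined base and reaches the `else` block only when the loop ends without a `break`.

```python
sequence = 'atgcatgc'   # example sequence with no undefined bases
position = 0
found_gap = False
while position < len(sequence):
    if sequence[position] == 'n':
        found_gap = True
        print("undefined base at position", position)
        break
    position = position + 1
else:
    # Runs only because the condition became False without a break.
    print("no undefined bases found")
```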

## References

* [Python for Bioinformatics](https://www.routledge.com/Python-for-Bioinformatics/Bassi/p/book/9781138035263)