## Program control and logic

A program will normally run by executing the stated commands one after the other, in sequential order. Frequently, however, you will need the program to deviate from this. There are several ways of diverting from the line-by-line paradigm:

- With conditional statements. Here you can check if some statement or expression is true, and if it is then you continue on with the following block of code, otherwise you might skip it or execute a different bit of code.

- By performing repetitive loops through the same block of code, where each time through the loop different values may be used for the variables.

- Through the use of functions (subroutines) where the program’s execution jumps from a particular line of code to an entirely different spot, even in a different file or module, to do a task before (usually) jumping back again. Functions are covered in the next session, so we will not discuss them yet.

- By checking if an error or exception occurs, i.e. something illegal has happened, and executing different blocks of code accordingly.

## Code blocks

With all of the means by which Python code execution can jump about we naturally need to be aware of the boundaries of the block of code we jump into, so that it is clear at what point the job is done, and program execution can jump back again. In essence it is required that the end of a function, loop or conditional statement be defined, so that we know the bounds of their respective code blocks.

Python uses indentation to show which statements are in a block of code, whereas other languages use specific `begin` and `end` statements or curly braces `{}`. It doesn't matter how much indentation you use, but the whole block must be consistent, i.e., if the first statement is indented by four spaces, the rest of the block must be indented by the same amount. The Python style guide recommends using 4-space indentation. Use spaces, rather than tabs, since different editors display tab characters with different widths.

The use of indentation to delineate code blocks is illustrated in an abstract manner in the following scheme: 

Statement 1:

    Command A – in the block of statement 1
    Command B – in the block of statement 1
  
    Statement 2:
        Command C – in the block of statement 2
        Command D – in the block of statement 2
  
    Command E – back in the block of statement 1

Command F – outside all statement blocks
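
As a concrete illustration of the scheme above, here is a minimal sketch. The variable names are made up for this example, and `print` is written with parentheses so that the snippet runs under both Python 2 and Python 3:

```python
values = [3, -1, 4]                  # hypothetical example data
positives = []

for v in values:                     # statement 1: opens a block
    doubled = v * 2                  # in the block of the for loop
    if doubled > 0:                  # statement 2: opens a nested block
        positives.append(doubled)    # in the block of the if statement
    # back in the block of the for loop

print(positives)                     # outside all statement blocks
```

Only the indentation tells Python where each block ends; there is no closing keyword.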


## Conditional execution

### The <tt>if</tt> statement

A conditional <tt>if</tt> statement is used to specify that some block of code should only be executed if an associated test is upheld, i.e. a conditional expression evaluates to <tt>True</tt>. This might also involve subsidiary checks using the <tt>elif</tt> statement to use an alternative block if the previous expression turns out to be <tt>False</tt>. There can even be a final <tt>else</tt> statement to do something if none of the checks are passed.

The following uses statements that test whether a number is less than zero, greater than zero or otherwise equal to zero and will print out a different message in each case:

In [None]:
x = -3

if x > 0:
    print "Value is positive"

elif x < 0:
    print "Value is negative"

else:
    print "Value is zero"

The general form of writing out such combined conditional statements is as follows:

<pre>
if conditionalExpression1:
    # codeBlock1

elif conditionalExpression2:
    # codeBlock2

elif conditionalExpressionN:
    # codeBlockN
    # ...any number of additional elif statements, then finally:

else:
    # codeBlockE
</pre>


The <tt>elif</tt> block is optional, and we can use as many as we like. The <tt>else</tt> block is also optional, so we may have only the <tt>if</tt> statement, which is a fairly common situation. It is often good practice to include <tt>else</tt> where possible though, so that you always catch cases that do not pass any of the tests; otherwise such values might go unnoticed, which might not be the desired behaviour.

A code block cannot be left empty, so the <tt>pass</tt> statement, which does nothing, is needed as a placeholder for an “empty” block:

In [None]:
gene = "BRCA2"
geneExpression = -1.2

if geneExpression < 0:
    print gene, "is downregulated"
        
elif geneExpression > 0:
    print gene, "is upregulated"
        
else:
    pass

For very simple conditional checks, you can write the choice on a single line as a conditional expression: the result is the expression before the `if` when the condition is true, and the expression after the `else` otherwise.



In [None]:
x = 11
s = "Yes" if x < 10 else "No"
print s

### Comparisons and truth

With conditional execution the question naturally arises as to which expressions are deemed to be true and which false. For the Python Boolean values <tt>True</tt> and <tt>False</tt> the answer is (hopefully) obvious. The logical states of truth and falsehood that result from conditional checks like “Is x greater than 5?” or “Is y in this list?” are also clear. When comparing values Python has the standard comparison (or relational) operators, some of which we have already seen:


<table>
    <tr><th>Operator</th><th>Description</th><th>Example</th></tr>
    <tr><td><tt>==</tt></td><td>equality</td><td><tt>1 == 2 # False</tt></td></tr>    
    <tr><td><tt>!=</tt></td><td>inequality</td><td><tt>1 != 2 # True</tt></td></tr>
    <tr><td><tt><</tt></td><td>less than</td><td><tt>1 < 2 # True</tt></td></tr>
    <tr><td><tt><=</tt></td><td>less than or equal to</td><td><tt>2 <= 2 # True</tt></td></tr>
    <tr><td><tt>></tt></td><td>greater than</td><td><tt>1 > 2 # False</tt></td></tr>
    <tr><td><tt>>=</tt></td><td>greater than or equal to</td><td><tt>1 >= 1 # True</tt></td></tr>
</table>

It is notable that comparison operations can be combined, for example to check if a value is within a range.

In [None]:
x = -5

if x > 0 and x < 10:
    print "In range A"
    
elif x < 0 or x > 10:
    print "In range B"
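
Python also allows comparisons to be chained, which is often the most readable way to write a range check. The following is a small sketch (the variable names are illustrative, and `print` is written with parentheses so that it runs under both Python 2 and Python 3):

```python
x = 5

# Chained form: equivalent to (0 < x) and (x < 10)
in_range_chained = 0 < x < 10

# Explicit form using the "and" operator
in_range_explicit = x > 0 and x < 10

print(in_range_chained)
print(in_range_explicit)
```

Both expressions give the same result; the chained form simply avoids repeating `x`.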

Python has two additional comparison operators <tt>is</tt> and <tt>is not</tt>. These compare whether two objects are the same object, whereas <tt>==</tt> and <tt>!=</tt> compare whether values are the same.

As an example in Python:

In [None]:
x = [123, 54, 92, 87, 33]
y = x[:] # y is a copy of x
z = x
print y == x  # values are the same?                   
print y is x  # objects are the same?             
print y is not x  # objects are not the same? 
print z is x  # objects are the same? 


In Python even expressions that do not involve an obvious boolean value can be assigned a status of "truthfulness";  the value of an item itself can be forced to be considered as either True or False inside an if statement. For the Python built-in types discussed in this chapter the following are deemed to be False in such a context:

<table>
    <tr><th>False value</th><th>Description</th></tr>
    <tr><td><tt>None</tt></td><td>the null object</td></tr>
    <tr><td><tt>False</tt></td><td>False boolean</td></tr>
    <tr><td><tt>0</tt></td><td>0 integer</td></tr>
    <tr><td><tt>0.0</tt></td><td>0.0 floating point</td></tr>
    <tr><td><tt>""</tt></td><td>empty string</td></tr>
    <tr><td><tt>()</tt></td><td>empty tuple</td></tr>
    <tr><td><tt>[]</tt></td><td>empty list</td></tr>
    <tr><td><tt>{}</tt></td><td>empty dictionary</td></tr>
    <tr><td><tt>set()</tt></td><td>empty set</td></tr>
</table>

And everything else is deemed to be True in a conditional context.

In [None]:
x = ''    # An empty string
y = 'a'   # A string with one character

if x:
    print "x is true"
elif y:
    print "y is true"
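
One way to check the table above for yourself is to pass each value to the built-in `bool()` function, which returns the truth value that an `if` statement would use. This is only a sketch: the "true" examples are arbitrary values chosen for illustration.

```python
# The values listed in the table above
false_values = [None, False, 0, 0.0, "", (), [], {}, set()]

# A few arbitrary non-empty or non-zero values for contrast
true_values = [1, -0.5, "a", " ", (0,), [0], {"k": 0}]

false_results = [bool(v) for v in false_values]
true_results = [bool(v) for v in true_values]

print(false_results)
print(true_results)
```

Note that a container holding a "false" item, such as `[0]`, is still true because it is not empty.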

__[2.0] Exercises__

1. Create an `if..elif..else` block that will compare a variable containing your age to another variable containing another person's age and print a statement which says if you are younger, older or the same age as that person.
2. Use an `if` statement to check if some variable containing a DNA sequence contains a stop codon (e.g. `dna = "ATGGCGGTCGAATAG"`). First just check for one possible stop, then extend your code to look for any of the 3 stop codons (`TAG`, `TAA`, `TGA`). Hint: recall that the `in` operator lets you check if a string contains some substring, and returns `True` or `False` accordingly.

## Loops

When an operation needs to be repeated multiple times, for example on all of the items in a list, we 
avoid having to type (or copy and paste) repetitive code by creating a loop. There are two ways of creating loops in Python, the <tt>for</tt> loop and the <tt>while</tt> loop.

#### The <tt>for</tt> loop

The for loop in Python iterates over each item in a sequence (such as a list or tuple) in the order that they appear in the sequence. What this means is that a variable (<tt>code</tt> in the below example) is set to each item from the sequence of values in turn, and each time this happens the indented block of code is executed again.

In [None]:
codeList = ['NA06984', 'NA06985', 'NA06986', 'NA06989', 'NA06991']

for code in codeList:
    print code

A <tt>for</tt> loop can iterate over the individual characters in a string:

In [None]:
dnaSequence = 'ATGGTGTTGCC'

for base in dnaSequence:
    print base

And also over the keys of a dictionary: 

In [None]:
rnaMassDict = {"G":345.21, "C":305.18, "A":329.21, "U":302.16}

for x in rnaMassDict:
    print x, rnaMassDict[x]

Any variables that are defined before the loop can be accessed from inside the loop. So for example to calculate the summation of the items in a list of values we could define the total initially to be zero and add each value to the total in the loop:

In [None]:
total = 0
values = [1, 2, 4, 8, 16]

for v in values:
    total = total + v
    print total

print total

Naturally we can combine a <tt>for</tt> loop with an <tt>if</tt> statement, noting that we need two indentation levels, one for the outer loop and another for the conditional blocks:

In [None]:
geneExpression = {
    'Beta-Catenin': 2.5, 
    'Beta-Actin': 1.7, 
    'Pax6': 0, 
    'HoxA2': -3.2
}

for gene in geneExpression:
    if geneExpression.get(gene) < 0:
        print gene, "is downregulated"
        
    elif geneExpression.get(gene) > 0:
        print gene, "is upregulated"
        
    else:
        print "No change in expression of", gene

#### The <tt>while</tt> loop

In addition to the <tt>for</tt> loop that operates on a collection of items, there is a <tt>while</tt> loop that simply repeats while some expression evaluates to True and stops when it is False. Note that if the tested expression never evaluates to False then you have an “infinite loop”: the program never moves past the loop and has to be interrupted by hand.

In this example we generate a series of numbers by doubling a value after each iteration, until a limit is reached: 

In [None]:
value = 0.25
while value < 8:
    value = value * 2
    print value

print "final value:", value

What's going on here is that the value is doubled in each loop and once it gets to 8 the while test fails (8 is not less than 8) and that last value is preserved. Note that if the test were instead `value <= 8` then we would get one more doubling and the value would reach 16.
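
The effect of the two tests can be compared side by side. In this sketch the names `strict_result` and `inclusive_result` are made up for illustration, and `print` is written with parentheses so that it runs under both Python 2 and Python 3:

```python
# Loop with the strict test: stops as soon as value reaches 8
value = 0.25
while value < 8:
    value = value * 2
strict_result = value

# Loop with the inclusive test: 8 still passes, so it doubles once more
value = 0.25
while value <= 8:
    value = value * 2
inclusive_result = value

print(strict_result)
print(inclusive_result)
```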

#### Skipping and breaking loops

Python has two ways of affecting the flow of the <tt>for</tt> or <tt>while</tt> loop inside the block. The <tt>continue</tt> statement means that the rest of the code in the block is skipped for this particular item in the collection, i.e. jump to the next iteration. In this example negative numbers are left out of a summation:

In [None]:
values = [10, -5, 3, -1, 7]

total = 0
for v in values:
    if v < 0:
        continue  # Skip the rest of this iteration

    total += v

print total

The other way of affecting a loop is with the <tt>break</tt> statement. In contrast to the <tt>continue</tt> statement, this immediately causes the enclosing loop to finish, and execution is resumed at the next statement _after_ the loop.

In [None]:
geneticCode = {'TAT': 'Tyrosine',  'TAC': 'Tyrosine',
               'CAA': 'Glutamine', 'CAG': 'Glutamine',
               'TAG': 'STOP'}

sequence = ['CAG','TAC','CAA','TAG','TAC','CAG','CAA']

for codon in sequence:
    if geneticCode[codon] == 'STOP':
        break            # Quit looping at this point
    else:
        print geneticCode[codon]

#### Looping gotchas

When looping over a list, an internal counter is used to keep track of which item is used next, and this is incremented on each iteration. When this counter has reached the length of the sequence the loop terminates. This means that if you delete the current item from the sequence, the next item will be skipped (since it gets the index of the current item, which has already been treated). Likewise, if you insert an item in a sequence before the current item, the current item will be treated again the next time through the loop. This can lead to nasty bugs that can be avoided by making a temporary copy using a slice of the whole sequence.

In the following example the second `2` is skipped, so it is never removed from the list:

In [None]:
values = [1, 2, 2, 4, 5]
for v in values:
    if v == 2:
        values.remove(v)
    print v

print values

In [None]:
valuesCopy = values[:]
for v in values:
    if v == 2:
        valuesCopy.remove(v)
    
print valuesCopy
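
A common variant of the same idea is to loop over the temporary slice copy and modify the original list. The sketch below restarts from the original data, and `print` is written with parentheses so that it runs under both Python 2 and Python 3:

```python
values = [1, 2, 2, 4, 5]

for v in values[:]:        # loop over a temporary copy of the list
    if v == 2:
        values.remove(v)   # modify the original list safely

print(values)
```

Because the loop's internal counter refers to the copy, removing items from `values` no longer causes any item to be skipped.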

Similarly, you cannot add or remove dictionary keys while looping through the dictionary:

In [None]:
rnaMassDict = {"G":345.21, "C":305.18, "A":329.21, "U":302.16}

for key in rnaMassDict:
    mass = rnaMassDict[key]
    
    if mass < 305:
        del rnaMassDict[key]  # Fails! Raises a RuntimeError
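
One possible way around this, shown here as a sketch rather than the only approach, is to loop over a separate list of the keys made with `list()`, so that the dictionary itself is free to change:

```python
rnaMassDict = {"G": 345.21, "C": 305.18, "A": 329.21, "U": 302.16}

for key in list(rnaMassDict):    # list() takes a snapshot of the keys
    mass = rnaMassDict[key]

    if mass < 305:
        del rnaMassDict[key]     # safe: we are not iterating over the dict

print(sorted(rnaMassDict))
```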

__[2.1] Exercises__

1. Create a list where each element is an individual base of DNA. Make the list 15 bases long.
2. Create a for loop to output every base of the sequence on a new line.
3. Create a <tt>while</tt> loop similar to the one above that starts at the third base in the sequence and outputs every third base until the 12th.

## More looping

If you would like to iterate over a numeric sequence then this is possible by combining the `range()` function and a for loop.

In [None]:
print range(10)

print range(5, 10)

print range(0, 10, 3)

print range(7, 2, -2)

Looping through ranges 

In [None]:
for x in range(8):
    print x*x

In [None]:
squares = []
for x in range(8):
    s = x*x
    squares.append(s)
    
print squares

Python also has a convenient syntax for building a list from a loop, called a list comprehension, which is shorter than writing out the <tt>for</tt> loop. Here is an alternative way to write the loop above.

In [None]:
squares = [x*x for x in range(8)]

print squares

Looping through list indices

In [None]:
codes = ['NA06984', 'NA06985', 'NA06986', 'NA06989', 'NA06991']

for index in range(len(codes)):
    print index, codes[index]

Looping through indices for two lists

In [None]:
more_codes = ['NA06993', 'NA06994', 'NA06995', 'NA06997', 'NA07000']

for index in range(len(codes)):
    print index, codes[index], more_codes[index]

#### Using enumerate

Given a sequence, `enumerate()` allows you to iterate over the sequence, generating for each item a tuple that contains its index followed by its value.

In [None]:
letters = ['A','C','G','T']
print enumerate(letters)
for index, letter in enumerate(letters):
    print index, letter

In [None]:
numbered_letters = list(enumerate(letters))

print numbered_letters

#### Using zip

The function zip() returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The returned list is truncated in length to the length of the shortest argument sequence.

In [None]:
letters = ['A', 'B', 'C', 'D', 'E']
numbers = [1, 2, 3, 4, 5, 6]

print zip(letters, numbers)

for l, n in zip(letters, numbers):
    print l, n

#### Filtering in loops

In [None]:
city_pops = {
    'London': 8200000,
    'Cambridge': 130000,
    'Edinburgh': 420000,
    'Glasgow': 1200000
}

big_cities = []
for city in city_pops:
    if city_pops[city] >= 1000000:
        big_cities.append(city)

print big_cities

Filtering in a list comprehension loop

In [None]:
big_cities = [city for city in city_pops if city_pops[city] >= 1000000]
print big_cities

In [None]:
total = 0
for city in city_pops:
    total += city_pops[city]
print "total population:", total

In [None]:
pops = [city_pops[city] for city in city_pops]
print pops

print "total population:", sum(pops)

__[2.2] Exercises__

1. Let's calculate the GC content of a DNA sequence. Use the 15-base list you created for the exercises above. Create a variable, `gc`, which we will use to count the number of Gs or Cs in our sequence.
2. Create a loop to iterate over the bases in your sequence. If the base is a G or the base is a C, add one to your `gc` variable.
3. When the loop is done, divide the number of GC bases by the length of the sequence and multiply by 100 to get the GC percentage.

## Exceptions

Even if a statement or expression is syntactically correct, it may cause an error when an attempt is made to execute it. Errors detected during execution are called exceptions and unless handled by your program will result in error messages and your program stopping, or _crashing_.

For example:

In [None]:
x = 1/0

It is possible to write programs that handle selected exceptions using a `try` statement.

* Code that might throw an exception is placed in the `try` block and is executed.
* If no exception occurs, the `try` block runs to completion and the `except` clause is skipped.
* If an exception occurs during execution of the `try` block, the rest of the block is skipped. Then, if its type matches the exception named after the `except` keyword, the `except` clause is executed, and execution continues after the `try` statement.
* If an exception occurs which does not match the exception named in the `except` clause, it is passed on to outer `try` statements; if no handler is found, it is an unhandled exception and execution stops with a message as shown above.


In [None]:
x = 1 
y = 0 

try:
    z = x + y 
    w = x / y 
    t = x * y 

except ZeroDivisionError:
    
    print "divided by zero"

print "program did not stop"
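
To see which lines actually run, the following sketch records each step in a list. The `steps` list is a made-up helper for this illustration, and `print` is written with parentheses so that it runs under both Python 2 and Python 3:

```python
x = 1
y = 0
steps = []                      # hypothetical log of which lines ran

try:
    steps.append("add")         # runs normally
    w = x / y                   # raises ZeroDivisionError
    steps.append("multiply")    # skipped: the rest of the block is abandoned
except ZeroDivisionError:
    steps.append("handler")     # the matching except clause runs

steps.append("after try")       # execution continues after the try statement
print(steps)
```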

The error object represents the cause of the exception, and can be printed out to reveal what went wrong.

In [None]:
x = 1 
y = 0 

try: 
    w = x / y 

except ZeroDivisionError, errorObj: 
    print errorObj
    
print "program did not stop"

Common programming actions that throw exceptions:

* opening a file
* reading from a file

Handling multiple exceptions

* A try statement may have more than one except clause, to specify handlers for different exceptions. At most one handler will be executed. Handlers only handle exceptions that occur in the corresponding try clause, not in other handlers of the same try statement. An except clause may also name multiple exceptions as a parenthesized tuple, e.g. `except (ZeroDivisionError, TypeError):`. The following example uses a separate except clause for each exception:

In [None]:
x=1
y='3'
try:
    w = x / y
except ZeroDivisionError, errorObj:
    print "divided by zero", errorObj
except TypeError, errorObj:
    print "divided by something silly", errorObj
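
When the same handler is suitable for several exception types, they can be named together as a parenthesized tuple. The sketch below uses the `as` form of the `except` clause, which is accepted by Python 2.6 and later as well as Python 3; the test values and the `results` list are made up for illustration:

```python
results = []

for y in [0, '3', 4]:
    try:
        results.append(1.0 / y)
    except (ZeroDivisionError, TypeError) as errorObj:
        # One handler for both kinds of failure; record which one occurred
        results.append(type(errorObj).__name__)

print(results)
```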

Putting it all together

In [None]:
x=1
y=0
try:
    w = x / y
except ZeroDivisionError, e:
    print "divided by zero"
except TypeError, errorObj:
    print "divided by something silly", errorObj
finally:
    print "finished with division"

In your own code you may make some assumption that must hold for the rest of your program to make sense. Rather than carrying on with invalid values, you can check that this assumption is true before continuing and raise an exception of your own if it is not. You should supply a useful error message to let you (or someone else) know what went wrong.

In [None]:
if y <= 0: 
    raise Exception("y must be > 0")
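
An exception that you raise yourself can be caught in the same way as any other. This sketch uses the built-in `ValueError` instead of the generic `Exception`, which is a common choice for an unsuitable value; the variable names are illustrative, and the `as` form of `except` is used so that the snippet runs under both Python 2.6+ and Python 3:

```python
y = -2
message = None

try:
    if y <= 0:
        raise ValueError("y must be > 0, got %s" % y)
    ratio = 10.0 / y             # only reached when the check passes
except ValueError as errorObj:
    message = str(errorObj)      # the text supplied when raising

print(message)
```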

Handling exceptions before reraising them

In [None]:
try:
    w = x / y
except ZeroDivisionError, errorObj:
    print "Program quitting due to zero division"
    # do some last moment clean-up
    raise  # re-raise the exception currently being handled

The `finally` clause is always executed, whether or not an exception occurred; it typically holds clean-up code:

In [None]:
try: 
    w = x / y 
finally: 
    print "finished with division"
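
Note that `finally` does not handle the exception: after the clean-up code has run, the exception carries on to any outer handler. The following sketch makes the order of events visible. The `events` list is a made-up helper for this illustration, and `print` is written with parentheses so that it runs under both Python 2 and Python 3:

```python
events = []

try:
    try:
        w = 1 / 0                     # raises ZeroDivisionError
    finally:
        events.append("cleanup")      # runs even though the division failed
except ZeroDivisionError:
    events.append("handled")          # the exception then reaches this handler

print(events)
```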

__[2.3] Exercises__

1. Modify your code that calculates the GC content of a DNA sequence to raise an exception if there is any non DNA character in the string (i.e. a character that is not one of "A", "C", "G", or "T"). 

## Importing modules and libraries

Like other languages, Python has the ability to import external modules (or libraries) into the current program. These modules may be part of the standard library that is automatically included with the Python installation, they may be extra libraries which you install separately or they may be other Python programs you have written yourself. Whatever the source of the module, it is imported into a program via an <tt>import</tt> statement.

For example, if we wish to access the mathematical constants pi and e we can use the import keyword to get the module named <tt>math</tt> and access its contents with the dot notation:


In [None]:
import math
print math.pi, math.e

Also we can use the `as` keyword to give the module a different name in our code, which can be useful for brevity and avoiding name conflicts:


In [None]:
import math as m
print m.pi, m.e

Alternatively we can import the separate components using the `from … import` keyword combination:

In [None]:
from math import pi, e
print pi, e

We can import multiple components from a single module, either on one line:

In [None]:
from sys import argv, exit

Or on separate lines:

In [None]:
from sys import argv 
from sys import exit

Listing module contents:

In [None]:
import math
dir(math)

String instance

In [None]:
dir("mystring")

String class

In [None]:
dir(str)

Quick help on methods:

In [None]:
help(str.title)