#Exception handling

What can go wrong when we are writing code? Syntax errors, such as incorrect indentation:

In [None]:
print("starting...")
for base in ['a', 't', 'g', 'c']:
print(base)

will prevent the code from running at all; Python can tell that something is wrong before it starts to execute. 
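One way to see that this kind of error is detected before anything runs is to ask Python to parse the code without executing it. The following is a minimal sketch using the built-in `compile()` function; the `source` string and the `'<example>'` label are just illustrative names:

```python
# The same badly indented code as above, held in a string so we can inspect it.
source = (
    "print('starting...')\n"
    "for base in ['a', 't', 'g', 'c']:\n"
    "print(base)\n"
)

try:
    # compile() only parses the code; nothing in it is executed,
    # so 'starting...' is never printed.
    compile(source, '<example>', 'exec')
    error_name = None
except SyntaxError as ex:
    # IndentationError is a subclass of SyntaxError.
    error_name = type(ex).__name__
    print(error_name + ' found on line ' + str(ex.lineno))
```

The error is reported even though the first line of the code is perfectly valid, which is why the program never gets as far as printing anything.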

Things like type errors e.g. using an integer as a string:

In [None]:
print("starting..")
print('abc' + 3)

will cause the program to crash out when it reaches that point. 

Then there are bugs, which will silently give the wrong answer:

In [None]:
dna = 'atctgcatattgcgtctgatg'
a_count = dna.count('A') 
print(a_count)

All these errors are intrinsic to the code and repeatable. 

Some types of errors are caused by the environment:

In [None]:
o = open('missingfile.txt')

When something bad like this happens it's called an **exception**. When writing code, we can decide what to do when an exception occurs.

Warning: extended non-biological examples ahead. 

###Catching exceptions

The default way to deal with exceptions is as above: do nothing and let Python print an error message. If we want to actually do something based on the exception, wrap the code that might cause the exception in a `try` block and put the exception-handling code in an `except` block. 

E.g. to print a more user-friendly message:

In [None]:
try: 
    f = open('misssing.txt') 
    print('file contents: ' + f.read())
except: 
    print("Sorry, couldn't find the file you asked for") 

If the file is there, then lines 2 and 3 run. If the file is missing, then we jump straight from line 2, where the exception is raised, to line 5 inside the `except`. 

Because we have *handled* or *caught* the exception, the program continues running:

In [None]:
try: 
    f = open('misssing.txt') 
    print('file contents: ' + f.read())
except: 
    print("Sorry, couldn't find the file you asked for") 
    
print("still running!")

So we can take some action to recover e.g. ask the user to type in a different file name. 
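As a rough sketch of what recovering might look like, here is one way to fall back to a second file name when the first can't be opened. The function `read_first_available()` and the file names are made up for this illustration, and the temporary folder is only there so that the example can run anywhere:

```python
import os
import tempfile

def read_first_available(filenames):
    """Return the contents of the first file that can be opened, or None."""
    for name in filenames:
        try:
            f = open(name)
            contents = f.read()
            f.close()
            return contents
        except IOError:
            # recover by moving on to the next candidate
            print("couldn't open " + name + ", trying the next one")
    return None

# create one real file to fall back on
folder = tempfile.mkdtemp()
real_file = os.path.join(folder, 'present.txt')
out = open(real_file, 'w')
out.write('atgc')
out.close()

missing_file = os.path.join(folder, 'missing.txt')
contents = read_first_available([missing_file, real_file])
print('file contents: ' + contents)

# tidy up the example files
os.remove(real_file)
os.rmdir(folder)
```

In an interactive program the list of candidates could just as well come from asking the user again.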

Problem: the `except` block will catch any type of exception, not just `IOError`. What if there are two things that could go wrong?

In [None]:
try: 
    f = open('my_file.txt') 
    my_number = int(f.read()) 
    print(my_number + 5) 
except: 
    print("sorry, couldn't find the file") 

Actually, the file does exist, but it doesn't contain an integer:

In [None]:
!cat my_file.txt

The exception was thrown by the call to `int()`. Better to specify the type of exception to handle:

In [None]:
try: 
    f = open('my_file.txt') 
    my_number = int(f.read()) 
    print(my_number + 5) 
except IOError: 
    print("sorry, couldn't find the file") 

Or we can catch multiple different types of exceptions with multiple blocks:

In [None]:
try: 
    f = open('my_file.txt') 
    my_number = int(f.read()) 
    print(my_number + 5) 
except IOError: 
    print("sorry, couldn't find the file") 
    # fix the problem somehow...
except ValueError: 
    print("sorry, couldn't parse the number") 
    # fix the problem somehow...

To handle several exception types with the same `except` block, use a tuple:

In [None]:
try: 
    f = open('my_file.txt') 
    my_number = int(f.read()) 
    print(my_number + 5) 
except (IOError, ValueError): 
    print("sorry, something went wrong") 

####Getting exception information

Exceptions are objects which we can access:

In [None]:
try: 
    f = open('missing.txt') 
    my_number = int(f.read()) 
    print(my_number + 5) 
except IOError as ex: 
    print("sorry, couldn't open the file: " + ex.strerror) 
except ValueError: 
    print("sorry, couldn't parse the number") 

Now we get different error messages depending on the type of problem:
- sorry, couldn't open the file: No such file or directory
- sorry, couldn't open the file: Permission denied
- sorry, couldn't open the file: Is a directory
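Besides `strerror`, the exception object carries a few other attributes that can be useful when deciding how to respond. The sketch below looks at `errno` and `filename` for a file that is guaranteed to be missing; the temporary folder and variable names are assumptions made for the example:

```python
import errno
import os
import tempfile

# an empty temporary folder, so the file below definitely doesn't exist
folder = tempfile.mkdtemp()
missing_file = os.path.join(folder, 'missing.txt')

try:
    f = open(missing_file)
except IOError as ex:
    error_number = ex.errno
    print('error number: ' + str(ex.errno))
    print('description:  ' + ex.strerror)
    print('file name:    ' + str(ex.filename))
    # the errno module gives readable names to the error numbers
    if ex.errno == errno.ENOENT:
        print("the file doesn't exist")

os.rmdir(folder)
```

Comparing `errno` against the constants in the `errno` module is more robust than comparing the text of the message, which can vary between systems.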

###More control over exception handling flow

Where should we put the line that prints the number (more generally: the code that relies on the lines that might raise an exception)? With the version above, `print(my_number + 5)` might also raise an `IOError`, which would then be misreported as a problem with the file, so it's not a great idea to have it inside the `try` block. We could put it outside:

In [None]:
try: 
    f = open('my_file.txt') 
    my_number = int(f.read()) 
except IOError as ex: 
    print("sorry, couldn't find the file: " + ex.strerror) 
except ValueError as ex: 
    print("sorry, couldn't parse the number: " +  ex.args[0]) 
print(my_number + 5) 

but there's no point trying to print the number if it hasn't been successfully read from the file. Solution: use an `else` block:

In [None]:
try: 
    f = open('my_file.txt') 
    my_number = int(f.read()) 
except IOError as ex: 
    print("sorry, couldn't find the file: " + ex.strerror) 
except ValueError as ex: 
    print("sorry, couldn't parse the number: " +  ex.args[0]) 
else:
    print(my_number + 5) 

`else` gets run if the `try` ran with no exceptions. 

What if there's code that needs to run regardless of whether there was an exception or not? Consider this:

In [None]:
import os 

# write some temporary data to a file
t = open('temp.txt', 'w') 
t.write('some important temporary text') 
t.close() 

# do some other processing
f = open('my_file.txt') 
my_number = int(f.read()) 
print(my_number + 5) 

# delete the temporary file
os.remove('temp.txt') 

When the exception is raised by `int()` the program exits and so the temp file does not get cleaned up. Where should we put `os.remove()` if we want to make sure it always runs? Not using `else`, because `else` only runs in the absence of errors. Also not at the end of the code:

In [None]:
import os 
t = open('temp.txt', 'w') 
t.write('some important temporary text') 
t.close() 
try: 
    f = open('my_file.txt') 
    my_number = int(f.read()) 
    print(my_number + 5) 
except IOError as ex: 
    print("sorry, couldn't find the file: " + ex.strerror) 
except ValueError as ex: 
    print("sorry, couldn't parse the number: " +  ex.args[0]) 

os.remove('temp.txt')

That fails because `os.remove()` won't run if an exception is raised inside the `try` but not caught (anything other than `IOError` or `ValueError`). Solution: `finally` blocks are always run:

In [None]:
import os 
t = open('temp.txt', 'w') 
t.write('some important temporary text') 
t.close() 
try: 
    f = open('my_file.txt') 
    my_number = int(f.read()) 
    print(my_number + 5) 
except IOError as ex: 
    print("sorry, couldn't find the file: " + ex.strerror) 
except ValueError as ex: 
    print("sorry, couldn't parse the number: " +  ex.args[0]) 
finally: 
    os.remove('temp.txt')

`finally` blocks are useful for clean-up code (closing files, network connections and database connections, writing logs, etc.). 

###Summary of all exception handling features:

In [None]:
try:
    # code in here will be run until an exception is raised
except ExceptionTypeOne:
    # code in here will be run if an ExceptionTypeOne
    # is raised in the try block
except ExceptionTypeTwo:
    # code in here will be run if an ExceptionTypeTwo 
    # is raised in the try block
else:
    # code in here will be run after the try block 
    # if it doesn't raise an exception
finally:
    # code in here will always be run
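
To check the summary against a concrete case, here is a small sketch that records which blocks run for a valid input, an invalid input, and an input that raises an exception we don't catch. The function `trace_blocks()` and the list names are invented for this illustration:

```python
def trace_blocks(text, blocks):
    """Convert text to an int, recording in `blocks` which blocks ran."""
    try:
        blocks.append('try')
        int(text)
    except ValueError:
        blocks.append('except')
    else:
        blocks.append('else')
    finally:
        blocks.append('finally')

valid = []
trace_blocks('42', valid)
print(valid)

invalid = []
trace_blocks('five', invalid)
print(invalid)

# int(None) raises a TypeError, which trace_blocks() does not catch,
# but the finally block still runs before the exception leaves the function
uncaught = []
try:
    trace_blocks(None, uncaught)
except TypeError:
    pass
print(uncaught)
```

In all three cases `'finally'` is the last entry, and `'else'` only appears when nothing went wrong.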

##Context managers

Some situations always call for `try/finally` e.g. file reading:

In [None]:
f = open('somefile.txt')
try:
    # do something with f
finally:
    f.close()

We always want to `close()` a file after we've opened it. Use a **context manager** to encapsulate this bit of logic:

In [None]:
with open('somefile.txt') as f:
    # do something with f

This file example is by far the most common; we can also write our own. 
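
As a sketch of what writing our own might look like, the standard library's `contextlib.contextmanager` decorator turns a function containing a `try/finally` around a `yield` into a context manager. The `temporary_file()` function below is a hypothetical helper that revisits the temporary file clean-up problem from earlier; it is one possible approach rather than the only way to do it:

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temporary_file(text):
    """Create a temporary file containing text, and delete it afterwards."""
    handle, path = tempfile.mkstemp(suffix='.txt')
    os.close(handle)
    with open(path, 'w') as t:
        t.write(text)
    try:
        # the body of the with block runs at this point
        yield path
    finally:
        # runs whether or not the body raised an exception
        os.remove(path)

try:
    with temporary_file('some important temporary text') as path:
        existed_inside = os.path.exists(path)
        my_number = int('five')  # raises a ValueError
except ValueError:
    print("sorry, couldn't parse the number")

exists_after = os.path.exists(path)
print(existed_inside, exists_after)
```

The clean-up logic now lives in one place, and any code that uses `with temporary_file(...)` gets it automatically.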

###Use with caution: nested `try` blocks

What's wrong with this code:

In [27]:
try: 
    f = open('my_file.txt') # this line might raise an IOError
    my_number = int(f.read()) # this line might raise a ValueError
except IOError: 
    print('cannot open file!') 
except ValueError: 
    print('not an integer!') 
finally: 
    f.close() 

not an integer!


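The problem is in the `finally` block. If `open()` raises an `IOError`, the assignment to `f` never happens, so `f.close()` fails with a `NameError` (a `try` block does not create a new scope; the variable simply doesn't exist yet). The run above only worked because the file could be opened. Nesting two `try` blocks (for `open()` and `int()`) ensures that `f.close()` is only called once the file has actually been opened: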

In [28]:
try: 
    f = open('my_file.txt') 
    try: 
        my_number = int(f.read()) 
    except ValueError: 
        print('not an integer!') 
    finally: 
        f.close() 
except IOError: 
    print('cannot open file') 

not an integer!


Of course, for this example just use the file context manager (`with open(...) as f`). 

##The life of an exception

>Exceptions bubble up

What does that mean? If we have two functions:

In [30]:
def function_one():
    # do some processing...
    return 5

def function_two():
    my_number = function_one()
    return my_number + 2

print(function_two())

7


and `# do some processing` might raise an exception, we could catch and handle it as discussed:

In [32]:
def function_one():
    try:
        # do some processing...
        return 5
    except SomeException:
        print("Handling exception")
        # handle the exception...
        
def function_two():
    my_number = function_one()
    return my_number + 2

print(function_two())

7


but what happens if we don't? Answer: the exception gets passed *up the stack* to the function that called `function_one()` so we have a second chance to handle it: 

In [34]:
def function_one():
    # do some processing...
    return 5

def function_two():
    try:
        my_number = function_one()
        return my_number + 2
    except SomeException:
        print("Handling exception")
        # handle the exception...

        
print(function_two())

7


and if we don't catch the exception in `function_two()` then it gets passed up again to the top level of code, so we have a third chance to catch and handle it:

In [35]:
def function_one():
    # do some processing...
    return 5

def function_two():
    my_number = function_one()
    return my_number + 2
try:
    print(function_two())
except SomeException:
    print("Handling exception")
    # handle the exception

7


This is what we mean when we say that exceptions bubble up. Handle exceptions in the place where your program can do something about them. 
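
The examples above use a placeholder exception and never actually raise it. Here is a minimal sketch of the same structure with a real exception, where the call to `int('five')` stands in for the processing step that goes wrong:

```python
def function_one():
    # int() raises a ValueError here; function_one() doesn't catch it
    return int('five')

def function_two():
    # function_two() doesn't catch it either, so it is passed up again
    my_number = function_one()
    return my_number + 2

try:
    result = function_two()
except ValueError as ex:
    # the top level is the last chance to handle the exception
    result = None
    print('Handling exception: ' + str(ex))
```

Neither function returns a value in this case: the exception travels from `int()` through both functions until it reaches the `except` block at the top level.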

---

##Raising exceptions

An exception is a signal that something has gone wrong. As well as responding to these signals, our code can create them. Raising an exception is simple:

In [36]:
raise ValueError("this is a description of the problem")

ValueError: this is a description of the problem

An example in context will be more useful. Here's a familiar function:

In [37]:
def get_at_content(dna): 
    length = len(dna) 
    a_count = dna.count('A') 
    t_count = dna.count('T') 
    at_content = (a_count + t_count) / length 
    return at_content 

that can only handle AGCT bases. Let's give it some error checking:

In [43]:
from __future__ import division
import re 
def get_at_content(dna): 
    if re.search(r'[^ATGC]', dna): 
        raise ValueError('Sequence cannot contain non-ATGC bases') 
    length = len(dna) 
    a_count = dna.count('A') 
    t_count = dna.count('T') 
    at_content = (a_count + t_count) / length 
    return at_content 

print(get_at_content('ATCGCTGTTATCGACTGACT'))
print(get_at_content('ATCGCTGANCGACTGATTCT'))

0.55


ValueError: Sequence cannot contain non-ATGC bases

Now we can make use of this. For example, given a large collection of sequences we don't want a single "bad" sequence to cause the whole program to crash:

In [44]:
sequences = ['ACGTACGTGAC', 'ACTGCTNAACT', 'ATGGCGCTAGC'] 
for seq in sequences: 
    print('AT content for ' + seq + ' is ' + str(get_at_content(seq)))

AT content for ACGTACGTGAC is 0.454545454545


ValueError: Sequence cannot contain non-ATGC bases

So we can wrap the call to `get_at_content()` in a `try` block:

In [48]:
for seq in sequences: 
    try: 
        print('AT content for ' + seq + ' is ' + str(get_at_content(seq)))
    except ValueError: 
        print('skipping invalid sequence '+ seq) 

AT content for ACGTACGTGAC is 0.454545454545
skipping invalid sequence ACTGCTNAACT
AT content for ATGGCGCTAGC is 0.363636363636


Problem: what happens if something else raises a `ValueError` (there are lots of things that can do this)?

In [57]:
for seq in sequences: 
    try: 
        number = int('five')
        print('AT content for ' + seq + ' is ' + str(get_at_content(seq)))
    except ValueError as ex: 
        print('skipping invalid sequence '+ seq) 

skipping invalid sequence ACGTACGTGAC
skipping invalid sequence ACTGCTNAACT
skipping invalid sequence ATGGCGCTAGC


We get incorrect messages. The problem is that `ValueError` is too generic. We could use the error message to distinguish:

In [59]:
for seq in sequences: 
    try: 
        print('AT content for ' + seq + ' is ' + str(get_at_content(seq)))
        number = int('five')
    except ValueError as ex: 
        print('something went wrong with sequence '+ seq) 
        print("sorry, couldn't parse the number: " +  ex.args[0]) 

AT content for ACGTACGTGAC is 0.454545454545
something went wrong with sequence ACGTACGTGAC
sorry, couldn't parse the number: invalid literal for int() with base 10: 'five'
something went wrong with sequence ACTGCTNAACT
sorry, couldn't parse the number: Sequence cannot contain non-ATGC bases
AT content for ATGGCGCTAGC is 0.363636363636
something went wrong with sequence ATGGCGCTAGC
sorry, couldn't parse the number: invalid literal for int() with base 10: 'five'


But that doesn't really help us to recover from the error. We need a custom exception to signal a specific error:

In [60]:
class AmbiguousBaseError(Exception): 
    pass 

A custom exception type is surprisingly simple: it's just a class that inherits from `Exception`, and it doesn't even need a body. We can start using it:

In [64]:
def get_at_content(dna): 
    if re.search(r'[^ATGC]', dna): 
        raise AmbiguousBaseError('Sequence cannot contain non-ATGC bases') 
    length = len(dna) 
    a_count = dna.count('A') 
    t_count = dna.count('T') 
    at_content = (a_count + t_count) / length 
    return at_content 
 
sequences = ['ACGTACGTGAC', 'ACTGCTNAACT', 'ATGGCGCTAGC'] 
for seq in sequences: 
    try: 
        print('AT content for ' + seq + ' is ' + str(get_at_content(seq)))
    except AmbiguousBaseError: 
        print('skipping invalid sequence '+ seq) 

AT content for ACGTACGTGAC is 0.454545454545
skipping invalid sequence ACTGCTNAACT
AT content for ATGGCGCTAGC is 0.363636363636


Now we will only catch `AmbiguousBaseError`; any other exception can be dealt with separately. 
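
To illustrate dealing with another exception separately, here is a sketch that also feeds the function an empty sequence, which causes a `ZeroDivisionError` when dividing by the length. The `results` list and the labels stored in it are assumptions made for this example, and `float()` is used so that the division behaves the same without the `__future__` import:

```python
import re

class AmbiguousBaseError(Exception):
    pass

def get_at_content(dna):
    if re.search(r'[^ATGC]', dna):
        raise AmbiguousBaseError('Sequence cannot contain non-ATGC bases')
    length = len(dna)
    a_count = dna.count('A')
    t_count = dna.count('T')
    return (a_count + t_count) / float(length)

results = []
for seq in ['ACGTACGTGAC', 'ACTGCTNAACT', '']:
    try:
        results.append(get_at_content(seq))
    except AmbiguousBaseError:
        # only raised by our own check for invalid bases
        print('skipping invalid sequence ' + seq)
        results.append('ambiguous')
    except ZeroDivisionError:
        # raised by the division when the sequence is empty
        print('skipping empty sequence')
        results.append('empty')

print(results)
```

Each kind of problem gets its own message, and there is no risk of confusing one with the other.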

##Exercises

###Responding to exceptions

Here's a piece of code that reads a DNA sequence from a file and splits it up into a number of equal sized pieces. It asks the user to enter the name of the file and the number of pieces, calculates the length of each piece (by dividing the total length by the number of pieces), then uses a `range()` to print out each piece:

In [None]:
# ask the user for the filename, open it and read the DNA sequence
input_file = raw_input('enter filename:\n') 
f = open(input_file) 
dna = f.read().rstrip("\n") 

# ask the user for the number of pieces and calculate the piece length
pieces = int(raw_input('enter number of pieces:\n')) 
piece_length = int(len(dna) / pieces) 
print('piece length is ' + str(piece_length)) 

# print out each piece of DNA in turn
for start in range(0, len(dna)-piece_length+1, piece_length): 
    print(dna[start:start+piece_length]) 

As you will see if you play around with this code, it's quite easy to break it:
- give a nonexistent file name
- give zero for the number of pieces
- give 'banana' for the number of pieces

Here's the same code with error-checking:

In [None]:
import os 
import sys 
 
# check for valid filename
input_file = raw_input('enter filename:\n') 
if not os.path.isfile(input_file): 
    sys.exit('not a valid filename') 
f = open(input_file) 
dna = f.read().rstrip("\n") 

# check for valid number
pieces = raw_input('enter number of pieces:\n') 
if not pieces.isdigit(): 
    sys.exit('not a valid number') 

# check that number is not zero or negative
pieces = int(pieces) 
if pieces <= 0: 
    sys.exit('number of pieces must be greater than zero') 

# do the processing
piece_length = int(len(dna) / pieces) 
print('piece length is ' + str(piece_length)) 
for start in range(0, len(dna)-piece_length+1, piece_length): 
    print(dna[start:start+piece_length]) 

Rewrite this code to use exceptions rather than ugly `if` statements. Hint: you can find out what type of exception each invalid input will raise by trying it using the original code.

###Bonus exercise: Exceptions for the SequenceRecord class

Take a look back at the classes that we designed for working with DNA and protein sequences in the chapter on object-oriented programming. Reminder:



In [None]:
class DNARecord(object): 
    
    def __init__(self, sequence, gene_name, species_name):
        self.sequence = sequence
        self.gene_name = gene_name
        self.species_name = species_name

The constructor has no error-checking, so there's nothing to stop us doing things like...

In [None]:
# invalid bases in the sequence
d = DNARecord('ATGYCNNCR', 'COX1', 'Homo sapiens')

# an empty string for the gene name
d = DNARecord('ATGCGGTGA', '', 'Homo sapiens')

# an incorrectly-formatted species name
d = DNARecord('ATGCGGTGA', 'COX1', 'homosapiens')

# non-string properties
d = DNARecord(3.1415, 42, -1)

Add error-checking using exceptions to the class definition.