In [None]:
import imageio
import pandas as pd

## Introduction to error handling

Errors in Python are called 'exceptions'. When an error occurs, we say that the code has 'thrown' or 'raised' an exception.

Errors can occur for many reasons, and Python comes with many [built-in exceptions](https://docs.python.org/3/library/exceptions.html) that are specific to different situations. Here are some examples:

In [None]:
# syntax errors (here, a mismatched closing bracket)
for ind in (1, 2, 3]:
    print(ind)

In [None]:
# referring to a variable that is not defined
print(some_variable)

In [None]:
# adding a string to a number
1 + '1'

In [None]:
# dividing by zero
1/0

In [None]:
# opening a file that doesn't exist
open('/some/file.txt')

In [None]:
# accessing a list element that doesn't exist (valid indices are 0, 1, and 2)
[1, 2, 3][3]

In [None]:
# accessing a dictionary item that doesn't exist
{'a': 1, 'b': 2}['c']
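
Each of the cells above raises a different built-in exception type. As a rough sketch (the helper name `exception_name` is made up for illustration, and the missing path is only assumed not to exist), the snippet below runs similar operations and reports which exception class each one raises:

```python
import os
import tempfile


def exception_name(func):
    """Call func() and return the name of the exception class it raises (or None)."""
    try:
        func()
    except Exception as error:
        return type(error).__name__
    return None


# a path that is assumed not to exist on this machine
missing_path = os.path.join(tempfile.gettempdir(), 'no-such-directory', 'file.txt')

examples = {
    'mismatched bracket': lambda: compile('for ind in (1, 2, 3]:\n    print(ind)', '<example>', 'exec'),
    'undefined variable': lambda: some_variable,
    'string plus number': lambda: 1 + '1',
    'division by zero': lambda: 1 / 0,
    'missing file': lambda: open(missing_path),
    'missing list element': lambda: [1, 2, 3][3],
    'missing dictionary key': lambda: {'a': 1, 'b': 2}['c'],
}

for description, func in examples.items():
    print('%s: %s' % (description, exception_name(func)))
```

Knowing the exact class name matters because it is what an `except` clause has to name in order to catch the error.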

### 'Error handling' means preventing, anticipating, and intercepting errors

There are two ways to do this:

1) Use Python's syntax for intercepting or 'catching' exceptions. This is called a 'try-except' block (see below).

2) Write code to prevent errors from occurring in the first place and/or raise exceptions when necessary.

In [None]:
# here is the syntax of the try-except block

try:
    # line or block of code that may raise an exception
    pass
except SomeExceptionType:
    # what to do if an exception of the specified type was raised
    pass
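
As a concrete instance of this pattern, here is a minimal sketch; the function name `parse_int` and its `default` parameter are invented for illustration. `int()` raises a `ValueError` when given a string that is not a valid integer, so that is the exception type named in the `except` clause:

```python
def parse_int(text, default=None):
    """Convert text to an integer, returning default if it cannot be parsed."""
    try:
        # this line may raise a ValueError
        return int(text)
    except ValueError:
        # only reached if int() raised a ValueError
        return default


print(parse_int('42'))
print(parse_int('forty-two'))
```

Any other exception type (for example the `TypeError` raised by `int(None)`) is not intercepted and propagates to the caller as usual.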

#### A simple example
Suppose we want to calculate a rate from a count and a total, but we don't want to assume that the variables are numbers. We can 'wrap' the line that calculates the rate in a try-except block and catch the `TypeError` exception (which is the kind of exception we'd expect if the variables were not numbers).

In [None]:
def calculate_rate(count, total):
    rate = count/total
    return rate

In [None]:
calculate_rate(30, 0)

In [None]:
def calculate_rate(count, total):
    rate = None
    try:
        rate = count/total
    except TypeError as error:
        print('Error: count and/or total are not numbers')
    except ZeroDivisionError:
        print('Error: total cannot be zero')
    return rate

In [None]:
calculate_rate(30, 0)

In [None]:
# we can also catch nearly all errors, regardless of type, by catching the base class `Exception`
def calculate_rate(count, total):
    rate = None
    try:
        rate = count/total
    except Exception as error:
        print('Error calculating the rate: %s' % error)
    return rate

In [None]:
calculate_rate(30, '150')

#### The (better) alternative to try-except: preventing and anticipating errors

Another, and generally better, approach to error handling is to prevent errors from occurring in the first place by explicitly checking that variable types and values are correct.

In this example, this would mean explicitly checking that the `count` and `total` variables are the expected type, and that `total` is not zero (which does not make sense and would yield a division-by-zero error). This kind of validation also has the added benefit of implicitly documenting how the function is intended to behave.

In [None]:
def calculate_rate(count, total):
    
    if type(count) is not int:
        raise TypeError('count must be an integer')

    if type(total) is not int:
        raise TypeError('total must be an integer')
    
    if total == 0:
        raise ValueError('total cannot be zero')

    rate = count/total
    return rate

In [None]:
calculate_rate(30, '')
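
Note that `type(count) is not int` also rejects floats such as `30.0`. If any real number should be accepted, one possible variant (a sketch only; the name `calculate_rate_flexible` is hypothetical) checks against `numbers.Real` from the standard library instead. Booleans are excluded explicitly because `bool` is a subclass of `int`:

```python
import numbers


def calculate_rate_flexible(count, total):
    # accept ints and floats, but not booleans or strings
    for name, value in (('count', count), ('total', total)):
        if isinstance(value, bool) or not isinstance(value, numbers.Real):
            raise TypeError('%s must be a number' % name)

    if total == 0:
        raise ValueError('total cannot be zero')

    return count / total


print(calculate_rate_flexible(30, 150.0))
```

Whether floats should be allowed at all depends on what `count` and `total` represent; for well counts, the stricter integer check may be the better choice.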

#### Striking a balance between preventing and catching errors

Wrapping code in try-except blocks is much less work than preventing errors, and it is in any case impossible to anticipate and prevent every possible error. The right balance is inevitably a matter of judgement and depends on how robust and user-friendly a function needs to be.

As a general rule, it's best to include at least some error-prevention logic, at minimum validating variable types and values, and to rely on try-except only in two situations: as a last resort to catch unusual errors that would be hard or impossible to anticipate, or when calling external functions whose behavior you aren't confident about.

Here is an example of the latter scenario, using an external function provided by the `imageio` package to save an array as an image. Because it is hard to anticipate how this function will handle different inputs, it is wise to wrap it in a try-except block.

In [None]:
def save_image(image):
    
    try:
        imageio.imsave('tmp.jpg', image)
        print('Image saved!')
    except Exception as error:
        print('An error occurred while saving the image: %s' % error)

In [None]:
# we can save this 2D array without an error
image = [
    [1, 0, 0],
    [0, 3, 0],
    [0, 0, 9]
]

save_image(image)

In [None]:
# what happens if one element of the array is not numeric?
image = [
    [1, 0, 0],
    [0, 3, 0],
    [0, 0, '9']
]

save_image(image)

In [None]:
# what happens if we try to save a string?
save_image('hello-world')

### Error handling (in all its forms) has many benefits

- It helps the user to figure out what happened when things go wrong.

- It makes your code more robust by allowing it to recover from errors that aren't serious enough to warrant exiting.

- It adds a layer of 'safety' when calling external functions whose behavior may not be reliable.

- It implicitly documents your code.
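
To illustrate the second point, here is a small sketch of a loop that recovers from a failed calculation and moves on to the next input rather than exiting. The input pairs and the name `calculate_rates` are made up for illustration:

```python
def calculate_rates(pairs):
    """Return (rates, failures) for a list of (count, total) pairs."""
    rates = []
    failures = []
    for count, total in pairs:
        try:
            rates.append(count / total)
        except (TypeError, ZeroDivisionError) as error:
            # record the problem and continue with the remaining pairs
            failures.append((count, total, str(error)))
    return rates, failures


rates, failures = calculate_rates([(30, 150), (5, 0), (12, '96'), (48, 96)])
print(rates)
for count, total, message in failures:
    print('Skipped (%r, %r): %s' % (count, total, message))
```

Collecting the failures rather than only printing them lets the caller decide afterwards whether the skipped inputs matter.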

In [None]:
# create dummy CSV files
df = pd.DataFrame([
    ['Plate1', 10],
    ['Plate1', 11],
    ['Plate2', 12],
    ['Plate3', 9],
    ['Plate4', 5],
])

df.columns = ['plate_id', 'num_hits']
df.to_csv('data.csv', index=False)

df.columns = ['plate_id', 'num_hit']
df.to_csv('data-missing-column.csv', index=False)

df = pd.DataFrame([
    ['Plate1', 10],
    ['Plate1', 11],
    ['Plate2', 12],
    ['Plate3', ],
    ['Plate4', 5],
])
df.columns = ['plate_id', 'num_hits']
df.to_csv('data-missing-value.csv', index=False)


df = pd.DataFrame([
    ['Plate1', 10, True],
    ['Plate1', 11],
    ['Plate2', 12],
    ['Plate3', 9],
    ['Plate4', 5],
])
df.columns = ['plate_id', 'num_hits', 'qc_flag']
df.to_csv('data-extra-column.csv', index=False)

df.columns = ['plate_id', 'num_hit', 'qc_flag']
df.to_csv('data-messy.csv', index=False)

In [None]:
def load_data(filepath):
    
    # load the data
    df = pd.read_csv(filepath)
    
    # coerce the num_hits column to integers
    df['num_hits'] = df.num_hits.astype(int)
    
    # calculate the success rate 
    num_wells = 96
    df['success_rate'] = df['num_hits']/num_wells
    
    return df

In [None]:
load_data('data-missing-value.csv')

In [None]:
def load_data_safer(filepath):
    
    # try to load the file
    try:
        df = pd.read_csv(filepath)
    except FileNotFoundError as err:
        print('File %s does not exist' % filepath)
        return

    # try to parse the num_hits column,
    # and return the raw dataframe if it cannot be parsed
    try:
        df['num_hits'] = df.num_hits.astype(int)
    except ValueError:
        print('Warning: could not convert num_hits column to an integer')
        return df
    
    # if the column was parsed, calculate the success rate
    num_wells = 96
    df['success_rate'] = df['num_hits']/num_wells
    return df    

In [None]:
load_data_safer('data-missing-column.csv')
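
This call still fails because the column lookup `df.num_hits` raises an `AttributeError` when the column is missing, and the `except ValueError` clause does not cover that type. One option is to list several exception types in a single `except` clause. The sketch below shows the idea on a plain dictionary standing in for one CSV row (so the missing column surfaces as a `KeyError` rather than an `AttributeError`); the name `parse_num_hits` is hypothetical:

```python
def parse_num_hits(row):
    """Return the num_hits value of a row as an integer, or None if unavailable."""
    try:
        return int(row['num_hits'])
    except (KeyError, ValueError) as error:
        # KeyError: the column is missing; ValueError: the value is not an integer
        print('Warning: could not parse num_hits (%s)' % type(error).__name__)
        return None


print(parse_num_hits({'plate_id': 'Plate1', 'num_hits': '10'}))
print(parse_num_hits({'plate_id': 'Plate3', 'num_hits': ''}))
print(parse_num_hits({'plate_id': 'Plate1', 'num_hit': '10'}))
```

Listing the types explicitly keeps the handler narrow: anything that is neither a `KeyError` nor a `ValueError` still propagates.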

### Combining try-except blocks, data validation, and error raising

In [None]:
def load_data_even_safer(filepath):
    
    try:
        df = pd.read_csv(filepath)
    except FileNotFoundError as err:
        print('File %s does not exist' % filepath)
        return

    # check for missing and unexpected columns by defining a list of required/expected columns
    required_columns = ['plate_id', 'num_hits']
    missing_columns = set(required_columns).difference(df.columns)
    unexpected_columns = set(df.columns).difference(required_columns)
    
    # missing required columns are likely to cause errors when parsing or processing the data,
    # so it makes sense to raise an error here
    if missing_columns:
        raise ValueError('Required columns %s not found' % missing_columns)
    
    # unexpected columns, however, are unlikely to cause errors later,
    # but may still indicate that something is wrong, so it is a good idea to warn the user
    if unexpected_columns:
        print('Warning: unexpected columns %s found' % unexpected_columns)
    
    # note that if execution reaches this line, we know that the num_hits column exists,
    # so we only need to catch a ValueError (and not an AttributeError)
    try:
        df['num_hits'] = df.num_hits.astype(int)
    except ValueError:
        print('Warning: could not convert num_hits column to an integer')
        return df
    
    num_wells = 96
    df['success_rate'] = df['num_hits']/num_wells
    return df

In [None]:
load_data_even_safer('data-messy.csv')
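
The warnings in this function are printed, which makes them hard for calling code to filter or capture. As an alternative sketch, the standard library's `warnings` module can be used instead; the helper `check_columns` below is hypothetical and works on any list of column names:

```python
import warnings


def check_columns(columns, required_columns):
    """Raise if required columns are missing; warn about unexpected ones."""
    missing_columns = set(required_columns).difference(columns)
    unexpected_columns = set(columns).difference(required_columns)

    if missing_columns:
        raise ValueError('Required columns %s not found' % sorted(missing_columns))

    if unexpected_columns:
        warnings.warn('Unexpected columns %s found' % sorted(unexpected_columns))


# issues a warning about qc_flag, but does not raise
check_columns(['plate_id', 'num_hits', 'qc_flag'], ['plate_id', 'num_hits'])
```

A warning issued this way can be silenced, logged, or turned into an error by the caller through the filters in the `warnings` module, none of which is possible with `print`.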

### Next topic: SQL

Suppose we want to know the number of plates with at least 10 hits.

In [None]:
df

In [None]:
# this is how we would answer this question
count = 0
for ind, row in df.iterrows():
    if row.plate_id != 'Plate4' and row.num_hit >= 10:
        count = count + 1
count