# Unit Testing

Unit testing means adding checks to your code that verify that individual units of it (e.g. functions or methods) behave in the way you expect them to.

### Why Is It Important?

**<a href="http://www.sciencemag.org/content/314/5807/1856.full">A Scientist's Nightmare</a>**

This 2006 article from Science describes the retraction of 5 papers due to a software error in which two columns of data were accidentally swapped.

While most errors of this type will be obvious, some are not!

Testing allows us to catch these errors during the development process and to check for regressions as we refactor code.

### The Example

I am going to use a single example function to describe the process of writing tests.  The function will take a DNA or protein sequence as a string and return all the kmers of length k that occur at least n times.

The first example is deliberately badly written so I can go through the process of refactoring!

In [None]:
# return kmers of length k that occur at least n times in the sequence

def findKmers(sequence, k, n):
    units = list(sequence)
    result = []
    for start in range(len(units) +1 -k):
        kmer = ''.join(units[start:start+k])
        if sequence.count(kmer) >= n:
            result.append(kmer)
    return result

findKmers('ATGATGA', 3, 2)

        

### A Bug

This function returns the correct kmers, but a kmer that occurs twice will appear in the list twice: the call above returns `['ATG', 'TGA', 'ATG', 'TGA']`.

If we had run this on an entire genome, we might never have noticed this error.

### Write the test **THEN** fix the bug!

assert is a keyword built into Python.  It evaluates an expression, here a comparison between the output of a function called with a known input and the output you expect.  If the expression is true, assert runs silently.  If it is false, assert raises an AssertionError, which stops your script with an error unless something catches it.
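
As a minimal sketch of how assert behaves on its own, independent of findKmers (the message string is purely illustrative):

```python
# A true expression: assert does nothing and execution continues.
assert 1 + 1 == 2

# A false expression: assert raises AssertionError. The optional second
# operand (after the comma) becomes the error message.
try:
    assert 1 + 1 == 3, "expected 1 + 1 to equal 3"
except AssertionError as err:
    message = str(err)

print(message)
```

The try/except is only there so the example can show the message without halting; in a real test you would let the error propagate.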


In [None]:
# this statement will test for the behaviour we want - at this point we expect it to fail!
assert findKmers('ATGATGA', 3, 2) == ['ATG', 'TGA']

Now we can fix the bug.  In a script, we would fix it *in situ* and leave the assert statement in place - here I will write them out again.

In [None]:
# fix by adding an extra clause to the if statement

def findKmers(sequence, k, n):
    units = list(sequence)
    result = []
    for start in range(len(units) +1 -k):
        kmer = ''.join(units[start:start+k])
        if sequence.count(kmer) >= n and kmer not in result:
            result.append(kmer)
    return result

assert findKmers('ATGATGA', 3, 2) == ['ATG', 'TGA']

findKmers('GATCGATCGATC', 3, 2)


The assert statement passed and the script has gone on to find kmers in another input sequence.

### Making a Suite of Tests

You could, of course, write an endless number of tests.  A good approach is to test the extreme ends of what you expect your input to be.

In [None]:
# Test limits of k
assert findKmers('ATGATGA', 1, 1) == ['A', 'T', 'G']
assert findKmers('ATGATGA', 7, 1) == ['ATGATGA']

#Test limits of n
assert findKmers('ATGATGA', 2, 1) == ['AT', 'TG', 'GA']
assert findKmers('ATGATGA', 1, 3) == ['A']

Another useful thing to do is test unrealistic input.

What would happen if we ran this function on our test input with a value of k=8?  In order to write a test, we need to think about what we **want** this behaviour to be.  This can be a useful exercise in its own right.

In this case, the function returns an empty list.  This is something we can easily check for downstream so I will stick with it.

In [None]:
# Test unreasonable inputs
assert findKmers('ATGATGA', 8, 1) == []
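
Two more unrealistic inputs are worth a look before deciding what we want: an empty sequence and k=0. The sketch below only records what the de-duplicated function currently does (it repeats that function so the block runs on its own). Whether `['']` is an acceptable answer for k=0 is a design decision this example leaves open.

```python
def findKmers(sequence, k, n):
    # Copy of the de-duplicated, list-based version defined above
    units = list(sequence)
    result = []
    for start in range(len(units) + 1 - k):
        kmer = ''.join(units[start:start + k])
        if sequence.count(kmer) >= n and kmer not in result:
            result.append(kmer)
    return result

# An empty sequence: the loop never runs, so we get an empty list
empty_sequence_result = findKmers('', 3, 1)

# k = 0: every slice is the empty string, and str.count('') is never zero,
# so the function reports a single empty-string "kmer"
zero_k_result = findKmers('ATGATGA', 0, 1)

print(empty_sequence_result, zero_k_result)
```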

### Refactoring and Regressions

You have probably already spotted this, but this code will not scale well to large sequences. This is because:
* The .count() method scales linearly with the length of the string it queries
* Every time we add a new kmer to the results list, we scan the list to check if it is already there.  This scales linearly as the list grows
* The number of iterations of the for loop scales linearly with the length of the sequence

We can improve the efficiency of the function without changing the inputs or the outputs - this is called refactoring.

In [None]:
# Our more efficient new function
# We will use a dict to keep track of how many times we see each kmer

def findKmers(sequence, k, n):
    units = list(sequence)
    kmerCounts = {}
    for start in range(len(units) +1 - k):
        kmer = ''.join(units[start:start+k])
        currentCount = kmerCounts.get(kmer, 0)
        kmerCounts[kmer] = currentCount + 1
    
    # now return just the kmers with count >= n
    result = []
    for kmer, count in kmerCounts.items():
        if count >= n:
            result.append(kmer)
        
    return result

# And now we should re-run our tests
# Test limits of k
assert findKmers('ATGATGA', 1, 1) == ['A', 'T', 'G']
assert findKmers('ATGATGA', 7, 1) == ['ATGATGA']

#Test limits of n
assert findKmers('ATGATGA', 2, 1) == ['AT', 'TG', 'GA']
assert findKmers('ATGATGA', 1, 3) == ['A']

# Test unreasonable inputs
assert findKmers('ATGATGA', 8, 1) == []

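On versions of Python before 3.7, where dicts do not guarantee any particular key order, these tests can fail!  (On Python 3.7 and later they pass, because dicts preserve insertion order.)  When a plain assert does fail, it is not easy to see why.  There are more sophisticated tools we can use for this.  In Python, these include: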

* unittest (part of the standard library)
* nose (extends unittest; no longer actively maintained)
* pytest

And probably many others!

Let's have a look at unittest.

In [None]:
# Note: unittest.main() is meant to be run from a script and won't run as-is in Jupyter
import unittest

class TestKmerMethods(unittest.TestCase):

    def test_findKmers(self):
        # limits of k
        self.assertEqual(findKmers('ATGATGA', 1, 1), ['A', 'T', 'G'])
        self.assertEqual(findKmers('ATGATGA', 7, 1), ['ATGATGA'])
        
        #limits of n
        self.assertEqual(findKmers('ATGATGA', 2, 1), ['AT', 'TG', 'GA'])
        self.assertEqual(findKmers('ATGATGA', 1, 3), ['A'])
        
        #unreasonable inputs
        self.assertEqual(findKmers('ATGATGA', 8, 1), [])  

if __name__ == '__main__':
    unittest.main()
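
If you do want to run a test case from inside a notebook cell, one option (a sketch, not the only way) is to build the suite and hand it to a runner yourself rather than calling unittest.main(). TestKmerSmoke is a hypothetical test case with two order-independent checks, and the findKmers here is a condensed copy of the dict-based version.

```python
import unittest

def findKmers(sequence, k, n):
    # Condensed copy of the dict-based version above
    kmerCounts = {}
    for start in range(len(sequence) + 1 - k):
        kmer = sequence[start:start + k]
        kmerCounts[kmer] = kmerCounts.get(kmer, 0) + 1
    return [kmer for kmer, count in kmerCounts.items() if count >= n]

class TestKmerSmoke(unittest.TestCase):  # hypothetical name
    def test_whole_sequence(self):
        self.assertEqual(findKmers('ATGATGA', 7, 1), ['ATGATGA'])

    def test_k_too_large(self):
        self.assertEqual(findKmers('ATGATGA', 8, 1), [])

# Load the tests explicitly and run them without touching sys.argv
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestKmerSmoke)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.testsRun, result.wasSuccessful())
```

The runner reports each test by name, and the returned result object can be inspected afterwards (for example, result.failures).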

### Regression

Regression is worth mentioning here: it is the tendency for old bugs to come back when you change code, and it is an important reason to build test suites.  However, I couldn't come up with an artificial example, so I won't discuss it further here!

### Checking your tests are appropriate

If you look at the output from unittest, you can see why the test failed.  The items in the list that the function returns are the same, but they are in a different order.  That is because we changed from using a list, where the order of the items is predictable, to a dict, whose key order was not guaranteed before Python 3.7 (from 3.7 onwards dicts preserve insertion order, so this particular failure no longer appears).  If an arbitrary order is acceptable, we need to change the test and not the code.  This is another example of how testing can make us think more carefully about our code.

unittest has a method to deal with this.  assertCountEqual checks that two sequences contain the same *elements*, each the same number of times, regardless of their order:

In [None]:
# Again, unittest.main() won't run as-is in Jupyter
class TestKmerMethods(unittest.TestCase):

    def test_findKmers(self):
        # limits of k
        self.assertCountEqual(findKmers('ATGATGA', 1, 1), ['A', 'T', 'G'])
        self.assertCountEqual(findKmers('ATGATGA', 7, 1), ['ATGATGA'])
        
        #limits of n
        self.assertCountEqual(findKmers('ATGATGA', 2, 1), ['AT', 'TG', 'GA'])
        self.assertCountEqual(findKmers('ATGATGA', 1, 3), ['A'])
        
        #unreasonable inputs
        self.assertCountEqual(findKmers('ATGATGA', 8, 1), [])  

if __name__ == '__main__':
    unittest.main()
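
One further difference between the two implementations is not exercised by the suite above, and is offered here as a sketch rather than as part of the original tests. str.count counts non-overlapping occurrences, while the dict version counts every window, so the two can disagree on repetitive sequences. The function names below are hypothetical stand-ins for the list-based and dict-based versions.

```python
def find_kmers_with_count(sequence, k, n):
    # Stand-in for the list-based version, which relies on str.count
    result = []
    for start in range(len(sequence) + 1 - k):
        kmer = sequence[start:start + k]
        if sequence.count(kmer) >= n and kmer not in result:
            result.append(kmer)
    return result

def find_kmers_with_dict(sequence, k, n):
    # Stand-in for the dict-based version, which counts every window
    kmer_counts = {}
    for start in range(len(sequence) + 1 - k):
        kmer = sequence[start:start + k]
        kmer_counts[kmer] = kmer_counts.get(kmer, 0) + 1
    return [kmer for kmer, count in kmer_counts.items() if count >= n]

# 'AA' fits into 'AAAA' at three overlapping positions,
# but str.count only sees two non-overlapping ones
non_overlapping = 'AAAA'.count('AA')
count_based = find_kmers_with_count('AAAA', 2, 3)
dict_based = find_kmers_with_dict('AAAA', 2, 3)

print(non_overlapping, count_based, dict_based)
```

Which answer is right depends on whether overlapping kmers should count. A test with a repetitive input like this one forces that decision to be made explicitly.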

### Other Things Unittest Can Do

* Test discovery
* Set-up and tear-down (allows construction of tests for functions that can change state)
* assertAlmostEqual - compares two numbers to a given number of decimal places (7 by default), which tolerates the small errors introduced by floating point calculations
* Testing exception handling
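
A short sketch touching three of these features. TestOtherFeatures and its contents are hypothetical and unrelated to findKmers; they only illustrate the calls.

```python
import unittest

class TestOtherFeatures(unittest.TestCase):  # hypothetical example class
    def setUp(self):
        # Runs before every test method, so each test gets a fresh list
        self.kmers = ['ATG', 'TGA']

    def tearDown(self):
        # Runs after every test method; here it just drops the reference
        self.kmers = None

    def test_can_modify_state(self):
        self.kmers.append('GAT')
        self.assertEqual(len(self.kmers), 3)

    def test_state_is_not_shared(self):
        # Unaffected by the append in the other test
        self.assertEqual(self.kmers, ['ATG', 'TGA'])

    def test_floating_point(self):
        # Exact comparison fails because of rounding error...
        self.assertNotEqual(0.1 + 0.2, 0.3)
        # ...but the values agree to 7 decimal places
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_exception(self):
        # The test passes only if the block raises ValueError
        with self.assertRaises(ValueError):
            int('ATG')

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOtherFeatures)
result = unittest.TextTestRunner().run(suite)
print(result.testsRun, result.wasSuccessful())
```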

<a href="https://docs.python.org/3.4/library/unittest.html">unittest documentation</a>