In [1]:
from IPython.display import HTML
css_file = "./presentation_notebook_style.css"
# Read the stylesheet with a context manager so the file is closed afterwards
with open(css_file, 'r') as f:
    css = f.read()
HTML(css)

In [2]:
import numpy
from numpy.random import rand
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams.update({'font.size': 18})
from scipy.integrate import quad
import unittest

# 02 Testing Scientific Codes

# Why do we test?
- In the experimental sciences, new theories are developed by applying the scientific method
- Perform tests to demonstrate results are ***accurate, reproducible*** and ***reliable***
- Test the experimental setup to show it is working as designed and to eliminate or quantify any systematic errors
- A result will not be trusted unless the experiment itself has been carried out to a suitable standard

# Why do we test?
- In computational science, we should apply the same principles to our code
- A result should only be trusted if the code that has produced it has undergone rigorous testing which demonstrates:
    * it is working as intended 
    * any limitations of the code (e.g. numerical errors) are understood and quantified

# Testing scientific code is hard
- Often investigate systems where *exact behaviour unknown*
- Can be very *complex*, built over a number of years (or even decades!) with contributions from a vast number of people
- Even for the most complicated of codes there are a number of different tests we can apply to build ***robust, reliable code*** whose results can be ***trusted***. 

<center>![Good code](https://imgs.xkcd.com/comics/good_code.png)
[xkcd](https://xkcd.com/844/)</center>

# When should I test?
<br />
## Always and often

- The earlier you start testing the better:
    * catch bugs as they develop and before they become too entrenched in the code. 
- Try to execute tests every time changes are made
- Continuous integration is useful to make sure tests are run frequently

# When should I test?
- Important to ***review your tests regularly***
- In actively developed code, tests must be amended and new tests written so new features are also tested
- Regression tests are useful for checking that changes to the code improve its performance rather than making it worse
- Code coverage measures how much of the code the tests exercise; aim for ***all*** code to be tested
    * If only 20% of the code has been tested, cannot trust that the other 80% is producing reliable results

# Effective testing

- Necessary to make sure entire parameter space is tested, not just one or two nice cases
- Particularly important: *edge* and *corner* cases
    * Edge cases: at beginning and end of input parameter space
    * Corner cases: one or more edge cases are combined
- Errors frequently arise here - often special code is required to deal with boundary values
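
As a minimal sketch of edge and corner case testing, consider a hypothetical function `cell_index` (not part of any real code base) that maps a point in the unit square to a cell on an `n` by `n` grid. The edges are `x = 0`, `x = 1`, `y = 0` and `y = 1`; the corners combine two of them. Without the `min` clamp, `x = 1` would give an out-of-range index, which is the kind of special boundary handling that tests should exercise.

```python
import unittest


def cell_index(x, y, n):
    """Hypothetical example: map (x, y) in the unit square to grid cell (i, j)."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("point lies outside the unit square")
    # Special handling of the upper boundary: x = 1 belongs to the last cell
    i = min(int(x * n), n - 1)
    j = min(int(y * n), n - 1)
    return i, j


class TestCellIndex(unittest.TestCase):

    def test_interior_point(self):
        # A "nice" case away from any boundary
        self.assertEqual(cell_index(0.55, 0.25, 10), (5, 2))

    def test_edges(self):
        # One coordinate sits on a boundary of the domain
        self.assertEqual(cell_index(0.0, 0.5, 10), (0, 5))
        self.assertEqual(cell_index(1.0, 0.5, 10), (9, 5))

    def test_corners(self):
        # Both coordinates sit on a boundary at the same time
        self.assertEqual(cell_index(0.0, 0.0, 10), (0, 0))
        self.assertEqual(cell_index(1.0, 1.0, 10), (9, 9))
        self.assertEqual(cell_index(0.0, 1.0, 10), (0, 9))
        self.assertEqual(cell_index(1.0, 0.0, 10), (9, 0))
```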

# Tests should break your code
- Also important to check code breaks as expected
- If code input is garbage but still manages to run as normal, that is not good behaviour and suggests some validation of inputs is needed
- Highlights where runtime testing and exceptions are needed in code
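
One way to check that code breaks as expected is `assertRaises` from `unittest`. The sketch below assumes a hypothetical function `kinetic_energy` that validates its inputs; the function and its error messages are illustrative only.

```python
import math
import unittest


def kinetic_energy(mass, velocity):
    """Hypothetical example: kinetic energy with basic input validation."""
    # math.isfinite raises TypeError for non-numeric input such as strings
    if not (math.isfinite(mass) and math.isfinite(velocity)):
        raise ValueError("mass and velocity must be finite")
    if mass < 0:
        raise ValueError("mass must be non-negative")
    return 0.5 * mass * velocity ** 2


class TestKineticEnergyInputs(unittest.TestCase):

    def test_valid_input(self):
        self.assertEqual(kinetic_energy(2.0, 3.0), 9.0)

    def test_negative_mass_is_rejected(self):
        with self.assertRaises(ValueError):
            kinetic_energy(-1.0, 3.0)

    def test_nan_is_rejected(self):
        with self.assertRaises(ValueError):
            kinetic_energy(float("nan"), 3.0)

    def test_wrong_type_is_rejected(self):
        with self.assertRaises(TypeError):
            kinetic_energy("heavy", 3.0)
```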

# Unit tests

- For complicated codes made up of many functions, useful to write tests that check small parts - *units* - at a time
- Easier to track down exact location of bugs
- Units may be individual functions or groups of shorter functions
- Encourages good coding practice, as it requires code to be modular
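
A minimal sketch of unit tests, assuming two small hypothetical functions, `vector_norm` and `normalise`. Each unit gets its own test class, so a failure points directly at the function responsible.

```python
import numpy
import unittest


def vector_norm(v):
    """Hypothetical unit 1: Euclidean norm of a vector."""
    return numpy.sqrt(numpy.sum(numpy.asarray(v, dtype=float) ** 2))


def normalise(v):
    """Hypothetical unit 2: scale a vector to unit length."""
    return numpy.asarray(v, dtype=float) / vector_norm(v)


class TestVectorNorm(unittest.TestCase):

    def test_three_four_five_triangle(self):
        self.assertAlmostEqual(vector_norm([3, 4]), 5.0)

    def test_zero_vector(self):
        self.assertEqual(vector_norm([0, 0, 0]), 0.0)


class TestNormalise(unittest.TestCase):

    def test_result_has_unit_length(self):
        self.assertAlmostEqual(vector_norm(normalise([1, 2, 2])), 1.0)

    def test_direction_is_preserved(self):
        self.assertTrue(numpy.allclose(normalise([0, 2]), [0, 1]))
```

In a notebook, the tests can be run with `unittest.main(argv=[''], exit=False)`, which stops `unittest` from parsing the notebook's own command-line arguments and from exiting the kernel.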

# Integration tests

- Need to verify smaller units work together.
- Individual functions may work, but this is no guarantee that they will work when put together
- Can encompass a small section of code, e.g. to check that one function correctly calls another, all the way up to the entire code
- Integration tests can be difficult to design - can involve many different functions, so often a lot more complex than unit tests
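
As a small sketch of an integration test, the example below checks that a hypothetical `gaussian` function and a hypothetical wrapper `probability_between` work together with `scipy.integrate.quad`. Neither function is tested on its own here; the test only passes if the arguments are forwarded correctly and the pieces cooperate.

```python
import numpy
import unittest
from scipy.integrate import quad


def gaussian(x, mu, sigma):
    """Hypothetical example: normal probability density."""
    return numpy.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * numpy.sqrt(2 * numpy.pi))


def probability_between(a, b, mu, sigma):
    """Hypothetical example: integrate the density between a and b using quad."""
    value, _ = quad(gaussian, a, b, args=(mu, sigma))
    return value


class TestProbabilityIntegration(unittest.TestCase):

    def test_total_probability_is_one(self):
        total = probability_between(-numpy.inf, numpy.inf, 0.0, 1.0)
        self.assertAlmostEqual(total, 1.0, places=7)

    def test_one_sigma_interval(self):
        # Known result: about 68.27% of the probability lies within one sigma
        p = probability_between(-1.0, 1.0, 0.0, 1.0)
        self.assertAlmostEqual(p, 0.6826894921, places=7)
```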

# Convergence tests

- Often calculate solution on some grid - a discretised approximation of exact continuous solution
- As grid resolution increases, solution should approach exact solution
- *Convergence tests* check this
- Calculate solution at various resolutions, calculate error
- Error should decrease with increasing resolution at algorithm's order of convergence
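
A minimal sketch of a convergence test, using the composite trapezium rule as an assumed example algorithm (it is second order for smooth integrands). The helper names `trapezium` and `observed_orders` are illustrative. The error is computed at successively doubled resolutions, and the observed order is estimated from the ratio of consecutive errors.

```python
import numpy


def trapezium(f, a, b, n):
    """Composite trapezium rule with n equal subintervals."""
    x = numpy.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h * (0.5 * y[0] + numpy.sum(y[1:-1]) + 0.5 * y[-1])


def observed_orders(f, a, b, exact, resolutions):
    """Estimate the convergence order, assuming each resolution doubles the last."""
    errors = numpy.array([abs(trapezium(f, a, b, n) - exact) for n in resolutions])
    # Halving the grid spacing should divide the error by 2**order
    return numpy.log2(errors[:-1] / errors[1:])


# Integral of sin(x) from 0 to pi is exactly 2
orders = observed_orders(numpy.sin, 0.0, numpy.pi, 2.0, [10, 20, 40, 80])

# The test: observed order should be close to the expected order of 2
assert numpy.all(numpy.abs(orders - 2.0) < 0.1)
```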

# Regression tests

- When building your code, generally aim for its performance to improve with time
- Results should get more accurate (or at least not deteriorate)
- *Regression tests* check this
- Run multiple versions of code, compare outputs
- If output has changed, test fails
- Helps catch bugs other types of tests may miss, and ensures the project remains backwards-compatible
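
One possible shape for a regression test is sketched below: output from a trusted version of the code is stored in a reference file, and later versions are compared against it. The function `simulate` is a hypothetical stand-in for the real code, and the temporary directory is used only to keep the example self-contained; in practice the reference file would typically be kept alongside the tests.

```python
import os
import tempfile
import numpy


def simulate(n_steps):
    """Hypothetical stand-in for the code under test: simple smoothing of a profile."""
    x = numpy.linspace(0.0, 1.0, 11)
    u = numpy.sin(numpy.pi * x)
    for _ in range(n_steps):
        u[1:-1] = 0.5 * (u[:-2] + u[2:])
    return u


def matches_reference(result, reference_file, rtol=1e-12, atol=1e-14):
    """Return True if result agrees with the stored reference output."""
    reference = numpy.load(reference_file)
    return (result.shape == reference.shape
            and numpy.allclose(result, reference, rtol=rtol, atol=atol))


with tempfile.TemporaryDirectory() as tmp:
    reference_file = os.path.join(tmp, "reference.npy")
    # Output of the "trusted" version, stored once
    numpy.save(reference_file, simulate(50))
    # Same code, same output: the regression test passes
    unchanged = matches_reference(simulate(50), reference_file)
    # A change in output (simulated here by a small offset) makes it fail
    changed = matches_reference(simulate(50) + 1e-6, reference_file)
```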

# Common problems and how to solve them

# My code has some randomness
- Time evolution problems: output at individual timestep may be random, but behaviour averaged over several timesteps is known - test this!
- Other problems: test average behaviour across entire domain or sections of domain
- Even if the output is completely random, so it is not possible to take meaningful averages, outputs should still be within a set of known values - test this!
- Write tests that isolate random parts so can check non-random parts work
- If using a random number generator, eliminate non-determinism by testing using a fixed seed value
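
The ideas above can be sketched for a hypothetical function `random_steps` that draws the steps of a random walk. The generator is passed in as an argument, so tests can supply a fixed seed; `numpy.random.default_rng` is used here as one way of doing this.

```python
import numpy
import unittest


def random_steps(n, rng):
    """Hypothetical example: n random steps, each either -1 or +1."""
    return rng.choice([-1, 1], size=n)


class TestRandomSteps(unittest.TestCase):

    def test_fixed_seed_is_reproducible(self):
        # Same seed, same sequence: removes the non-determinism
        a = random_steps(100, numpy.random.default_rng(42))
        b = random_steps(100, numpy.random.default_rng(42))
        self.assertTrue(numpy.array_equal(a, b))

    def test_outputs_lie_in_known_set(self):
        steps = random_steps(1000, numpy.random.default_rng(0))
        self.assertTrue(numpy.all(numpy.isin(steps, [-1, 1])))

    def test_average_behaviour(self):
        # Mean step is 0 with standard error 1/sqrt(n); allow 5 standard errors
        n = 100_000
        steps = random_steps(n, numpy.random.default_rng(1))
        self.assertLess(abs(steps.mean()), 5.0 / numpy.sqrt(n))
```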

# I don't know the correct solution

- In experimental science, test experimental setup using a *control*
    * Use input data where outcome is known so any bugs in apparatus or systematic errors can be identified and understood
- In computational science, there's usually a simple system whose behaviour is known
    * Time evolution problems: system which is initially static should remain that way
- If this is not the case, there is something seriously wrong with the code! 
- In physics, can check for symmetries (e.g. translation, reflection, time reversal), conserved quantities (e.g. mass, energy, charge)
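
As a sketch of these checks, assume a hypothetical explicit diffusion solver `diffuse` on a periodic 1D grid. The exact solution for a general initial state is not needed: a uniform state should stay uniform, the total "mass" should be conserved, and mirroring the input should mirror the output.

```python
import numpy
import unittest


def diffuse(u, alpha, n_steps):
    """Hypothetical example: explicit diffusion on a periodic grid (alpha <= 0.5)."""
    u = numpy.array(u, dtype=float)
    for _ in range(n_steps):
        u = u + alpha * (numpy.roll(u, 1) - 2.0 * u + numpy.roll(u, -1))
    return u


class TestDiffusionPhysics(unittest.TestCase):

    def test_static_state_stays_static(self):
        u0 = numpy.full(32, 3.0)
        self.assertTrue(numpy.allclose(diffuse(u0, 0.25, 100), u0))

    def test_mass_is_conserved(self):
        u0 = numpy.zeros(32)
        u0[10] = 1.0
        self.assertAlmostEqual(diffuse(u0, 0.25, 100).sum(), u0.sum(), places=12)

    def test_reflection_symmetry(self):
        u0 = numpy.arange(32, dtype=float) ** 2
        forward = diffuse(u0, 0.25, 10)
        mirrored = diffuse(u0[::-1], 0.25, 10)
        self.assertTrue(numpy.allclose(mirrored, forward[::-1]))
```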

# I didn't write most of the code - how do I know that the bit I wrote works?

- Unit tests! 
- Test the original code in isolation first
- Any failures in subsequent tests that then incorporate your code can therefore be attributed to bugs in your code

# I know there is some numerical error in my code - how can I test my code is correct up to this error?

- In numerical calculations, there will always be some computational error that cannot be avoided
    * floating point representation of numerical data 
    * accuracy of algorithm
- Rarely require result to be 100% precise, but instead correct up to some tolerance
- Build tests to reflect this. 
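
A short sketch of tolerance-based comparisons follows. The tolerances are illustrative assumptions; suitable values depend on the algorithm and the problem. Note that the second value returned by `quad` is an estimate of the absolute error, not a guaranteed bound.

```python
import math
import numpy
from scipy.integrate import quad

# Floating point representation: exact equality fails, a tolerance succeeds
exact_comparison = (0.1 + 0.2 == 0.3)
tolerant_comparison = math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-12)

# Algorithmic accuracy: compare a numerical integral with the known answer
value, error_estimate = quad(numpy.exp, 0.0, 1.0)
exact = numpy.e - 1.0

# Pass if the result agrees to within the estimated error
# (with a small floor in case the estimate is extremely small)
within_estimate = abs(value - exact) <= max(error_estimate, 1e-14)

# numpy.testing raises AssertionError if the values differ by more than rtol
numpy.testing.assert_allclose(value, exact, rtol=1e-10)
```

Inside a `unittest.TestCase`, `assertAlmostEqual(value, exact, places=10)` plays a similar role.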