[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JamesFergusson/Introduction-to-Research-Computing/blob/master/07_BestPractice_Validation.ipynb)

# Validating your code

Once you have turned your prototype into some initial code with good documentation and a nice modular structure (and, of course, carefully tracked with git), you will need to check that it is correct. There are a few things you can do to make this much easier:

- Debug
- Error trapping
- Unit testing
- Continuous Integration

For the first, Python has a good built-in debugger that will help you: `pdb`.

## Debugging
If you run code that raises an exception in Jupyter or IPython, you can use the magic command `%debug` to launch an interactive debugger at the point where the exception occurred. From there you can move through the call stack and examine variables. Here are the most useful commands:

| Command | Description |
|---|---|
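| u | Move up one level in the call stack |
| d | Move down one level in the call stack |
| p | Print the value of an expression, e.g. `p x` |
| q | Quit the debugger |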

This is best seen in an example:

In [None]:
def bottom_func(x):
    y = x**2
    return str(x) + " squared equals " + str(y)

def top_func(z):
    result = bottom_func(z)
    return result
        
top_func('7')  # passing a string makes x**2 raise a TypeError

In [None]:
%debug

This can be turned on with `%pdb on`, so that the debugger is launched automatically whenever an exception is raised.

You can also run code interactively line by line using `%run -d`, which can be more useful if your code runs but gives the wrong answer rather than crashing. If you launch code this way, you can step through it with the following commands:

| Command | Description |
|---|---|
| n | Next line |
| s | Step into function|
| c | Continue to run normally |
| q | Quit |




In [None]:
%run -d Examples/UnitTesting/simple.py

You can also run the debugger from the command line with:

In [None]:
python3 -m pdb myscript.py
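As well as `%debug`, `%run -d` and `python3 -m pdb`, you can hard-code a pause point with the built-in `breakpoint()` (Python 3.7+). By default it calls `pdb.set_trace()` and opens the same debugger at that line. The sketch below reuses the earlier functions. The `debug` flag is just an illustrative switch we added, not part of the original code. Setting the environment variable `PYTHONBREAKPOINT=0` disables every `breakpoint()` call without editing the code.

```python
def bottom_func(x, debug=False):
    if debug:
        # Opens pdb here: inspect x with `p x`, step with `n`, continue with `c`
        breakpoint()
    y = x**2
    return str(x) + " squared equals " + str(y)

def top_func(z, debug=False):
    return bottom_func(z, debug=debug)

# With debug=True, execution would pause inside bottom_func before y is computed
print(top_func(7))  # 7 squared equals 49
```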

## Error trapping

It is a good idea to write your code so that when errors occur, the code reports them to you. Python is pretty good at providing reasonable error messages, but it's good practice to make your own. One simple place to check things is whether the input makes sense. So for our Fibonacci function we could add:

In [None]:
def fibonacci(num):
    
    if not isinstance(num,int):
        print("non-integer input given to function fibonacci; num="+str(num))
        return 0
    
    a=0
    b=1
    for i in range(num):
        a,b = b,a+b
    return a

# In a notebook only the last value is displayed, so print both
print(fibonacci(3.4))
print(fibonacci(3))
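Printing a message and returning 0 keeps the program running, but a caller that ignores the message will carry on with a wrong answer. A common alternative is to raise an exception. This stops the program with your message unless the caller deliberately handles it. Here is a sketch of that approach. Using `TypeError` for the wrong type and `ValueError` for negative input is our own choice, not something the original code prescribes:

```python
def fibonacci(num):
    """Return the num-th Fibonacci number, raising an exception on invalid input."""
    # bool is a subclass of int, so exclude it explicitly
    if not isinstance(num, int) or isinstance(num, bool):
        raise TypeError(f"fibonacci expects an integer, got {num!r}")
    if num < 0:
        raise ValueError(f"fibonacci expects a non-negative integer, got {num}")

    a, b = 0, 1
    for _ in range(num):
        a, b = b, a + b
    return a

print(fibonacci(3))  # 2

try:
    fibonacci(3.4)
except TypeError as err:
    # The caller decides how to handle the error; here we just report it
    print("Caught:", err)
```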



You should also check error codes that come back from routines and report on them, for instance when reading a file. Better examples are checking that inputs are in valid ranges for functions you've created, or other conditions that might not crash Python but would make your code give the wrong answer or an unhelpful error. For example:

In [None]:
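def harmonic_mean(values):
    """Return the harmonic mean of a sequence of non-zero numbers, or None if it is undefined."""
    if len(values) == 0:
        print("Harmonic mean of an empty list is undefined.  Returned 'None'")
        return None

    x = 0.0
    for item in values:
        if item == 0:
            print("Harmonic mean does not exist for data containing zeros.  Returned 'None'")
            return None
        x += 1.0 / item

    return len(values) / x

list1 = [1,2,3,4,5,6,7,8,9]
list2 = [1,2,3,4,5,6,7,8,9,0]
print(harmonic_mean(list1))
print(harmonic_mean(list2))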
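For the file-reading case, routines such as `open` and `float` report failures in Python by raising exceptions rather than returning error codes. "Checking the error code" therefore usually means catching the exception and reporting something useful. Here is a minimal sketch. `read_numbers` is a hypothetical helper, and the one-number-per-line file format is an assumption:

```python
def read_numbers(path):
    """Read one number per line from a text file, reporting problems clearly.

    Returns a list of floats, or None if the file cannot be read or parsed.
    """
    try:
        with open(path) as f:
            # skip blank lines; float() raises ValueError on anything non-numeric
            return [float(line) for line in f if line.strip()]
    except OSError as err:
        # covers missing files, permission problems, directories, ...
        print(f"read_numbers: could not read {path}: {err}")
    except ValueError as err:
        print(f"read_numbers: non-numeric data in {path}: {err}")
    return None

print(read_numbers("missing_data.txt"))
```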

## Unit testing

The best way to deal with bugs is to catch them before they happen, which you can do with unit testing. The idea is that you set up a collection of tests for your code. You then run them every time you commit to the main branch or make major edits, to check that you haven't broken anything. For the code calculating the integral in the previous notebook, you may want to create tests that check your polynomials are OK (by checking orthonormality). You could also check that the final integral with Gauss-Legendre quadrature is correct for large l (i.e. set X = $\delta_{l,200}$ and see if you get the correct answer, 0.000018285996687338485).
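The test below sketches the orthonormality check. It uses NumPy's own Legendre polynomials (`numpy.polynomial.legendre`) as a stand-in for the polynomials built in the previous notebook, which are not reproduced here. It relies on the fact that n-point Gauss-Legendre quadrature integrates polynomials of degree up to 2n-1 exactly, so the matrix of overlaps should equal the identity up to rounding error:

```python
import numpy as np
from numpy.polynomial import legendre

def normalised_legendre(l, x):
    """Legendre polynomial P_l scaled to unit norm on [-1, 1].

    Uses the standard result that the integral of P_l**2 over [-1, 1] is 2 / (2l + 1).
    """
    coeffs = np.zeros(l + 1)
    coeffs[l] = 1.0  # series containing only the P_l term
    return np.sqrt((2 * l + 1) / 2) * legendre.legval(x, coeffs)

def test_legendre_orthonormal(lmax=20):
    # (lmax + 1)-point Gauss-Legendre quadrature is exact for polynomials of
    # degree <= 2*lmax + 1, which covers every product P_l * P_m with l, m <= lmax
    x, w = legendre.leggauss(lmax + 1)
    basis = np.array([normalised_legendre(l, x) for l in range(lmax + 1)])
    # overlaps[l, m] approximates the integral of P_l * P_m over [-1, 1]
    overlaps = (basis * w) @ basis.T
    assert np.allclose(overlaps, np.eye(lmax + 1), atol=1e-10)

test_legendre_orthonormal()
print("Legendre polynomials are orthonormal to within 1e-10")
```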

Having set up these tests, you can then automate them into a test suite that runs, say, before every push to a central repository. This is standard practice in commercial software development. If you include tests in your answers to coding questions in interviews, it will put you above the majority of applicants.

It's a good idea to get into the habit of writing tests for your functions, ideally before you write the function itself. You can then use them to check that your code does what you intended. You will end up spending a lot of time checking your code when you are trying to fix bugs, so setting up tests in advance can save a lot of time. Luckily, basic tests are easy to write in Python: you can add them to the docstring and run them with the `doctest` module.

In [None]:
import doctest

def function(x):
    """
    Calculate x + 2
    >>> function(5)
    7
    """
    return x+3  # note: adds 3, not 2, so the doctest above will fail

doctest.testmod()
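The doctest above reports a failure because the function adds 3 instead of 2. That is exactly the kind of mistake doctests are good at catching. For comparison, here is a sketch with several passing examples, using the Fibonacci function from earlier. Outside a notebook you can also run all the docstring examples in a file with `python -m doctest -v yourfile.py`, where `yourfile.py` is a placeholder:

```python
import doctest

def fibonacci(num):
    """
    Return the num-th Fibonacci number.

    >>> fibonacci(0)
    0
    >>> fibonacci(10)
    55
    >>> [fibonacci(n) for n in range(7)]
    [0, 1, 1, 2, 3, 5, 8]
    """
    a, b = 0, 1
    for _ in range(num):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    # Runs every >>> example in this module's docstrings and
    # returns TestResults(failed, attempted)
    print(doctest.testmod())
```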

This is fine for very simple tests but isn't much use once you write functions that process data rather than just a number. There are a lot of testing packages available, but `pytest` is the de facto standard (which is not to say it's the best). It runs from the terminal, looks for functions whose names start with `test` (such as `test_somefunction`) and runs them. These functions should run some code and then check the outputs with `assert` statements. An `assert` accepts any boolean expression and fails the test if it is false. If our function was:

In [None]:
def addtwo(x):
    """
        Add 2 to x
    """
    return x+1  # note: adds 1, not 2, so test_addtwo below will fail

Then our test could look like:

In [None]:
def test_addtwo():
    """
        Test addtwo
    """
    assert addtwo(3) == 5

These are in the file `simple.py` in the directory `Examples/UnitTesting`. We can test them from the command line using:

In [None]:
%%bash
pytest Examples/UnitTesting/simple.py

The tests can also be kept in separate files, such as `test_simple1.py`:

In [None]:
%%bash
pytest Examples/UnitTesting/test_simple1.py

Remember that floating point arithmetic is not exact, so the test of `add02` in `test_simple2.py` fails:

In [None]:
%%bash
pytest Examples/UnitTesting/test_simple2.py

To fix this, pytest has a function called `approx`. By default it allows a relative tolerance of 1e-6 (plus an absolute tolerance of 1e-12), which is usually fine. It works on numbers, tuples, lists, dicts and NumPy arrays:

In [None]:
from pytest import approx
import numpy as np
print(0.1 + 0.2 == approx(0.3))
print((0.1 + 0.2, 0.2+0.4) == approx((0.3,0.6)))
print({'a': 0.1 + 0.2, 'b': 0.2 + 0.4} == approx({'a': 0.3, 'b': 0.6}))
print(np.array([0.1, 0.2]) + np.array([0.2, 0.4]) == approx(np.array([0.3, 0.6])))
print(np.array([0.1, 0.2]) + np.array([0.2, 0.1]) == approx(0.3))

However, if you are testing values near zero, relative tolerances are useless, because any relative tolerance around 0 is still 0. Luckily `approx` also lets you change the tolerances and make them relative or absolute. If you specify both, the comparison is true if either one is satisfied.

In [None]:
from pytest import approx
print(1.0001 == approx(1))
print(1.0001 == approx(1, rel=1e-3))
print(1.0001 == approx(1, abs=1e-3))
print(1.0001 == approx(1, rel=1e-5, abs=1e-3))
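The same relative-versus-absolute issue appears outside pytest. The sketch below uses two alternatives. The standard library's `math.isclose` has defaults `rel_tol=1e-09` and `abs_tol=0.0`, so any non-zero value compared with exactly 0 fails unless you set `abs_tol`. NumPy's `numpy.testing.assert_allclose` has defaults `rtol=1e-07` and `atol=0`, and it raises an `AssertionError` listing the mismatched elements:

```python
import math
import numpy as np

# Standard library: with abs_tol=0.0, nothing non-zero is "close" to 0
print(math.isclose(1e-10, 0.0))                # False
print(math.isclose(1e-10, 0.0, abs_tol=1e-9))  # True

# NumPy: raises AssertionError on failure, otherwise returns None
np.testing.assert_allclose(np.array([0.1, 0.2]) + np.array([0.2, 0.4]),
                           np.array([0.3, 0.6]))
np.testing.assert_allclose(np.array([1e-10, 0.0]), np.zeros(2), atol=1e-9)
print("both array comparisons passed")
```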

You can also fail a test with a specific message using `fail`, e.g.:

In [None]:
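from pytest import fail

badthings = [None, float('inf')]  # hypothetical list of unacceptable results
def somefunc():  # hypothetical function under test
    return 42

def test_something():
    x = somefunc()
    if x in badthings:
        fail('A bad thing came back from somefunc()')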
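Two other patterns you will use a lot are giving `assert` a second argument, which pytest prints when the assertion fails, and `pytest.raises`, which checks that code raises a particular exception. The sketch below uses a hypothetical variant of the `harmonic_mean` function from the error-trapping section that raises a `ValueError` instead of printing a message:

```python
import pytest

def harmonic_mean(values):
    """Hypothetical variant of harmonic_mean that raises instead of printing."""
    if len(values) == 0 or any(v == 0 for v in values):
        raise ValueError("harmonic mean is undefined for empty data or data containing zeros")
    return len(values) / sum(1.0 / v for v in values)

def test_harmonic_mean_value():
    result = harmonic_mean([1, 2, 4])
    # the message after the comma is shown by pytest if the assertion fails
    assert result == pytest.approx(12 / 7), f"unexpected harmonic mean {result}"

def test_harmonic_mean_rejects_zero():
    # passes only if the block raises ValueError with a message matching "zeros"
    with pytest.raises(ValueError, match="zeros"):
        harmonic_mean([1, 0, 2])
```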

Finally, note that if you give `pytest` a directory (or no arguments, in which case it uses the current directory), it searches for files named `test_*.py` or `*_test.py` and runs the tests in them. In general it is best practice to put your source code and your tests in different directories, as this keeps them separate and safe. You should have one test file for each module. Running pytest in the test directory will then check all your code, or you can run it on individual modules if you want.

In [None]:
%%bash
pytest Examples/UnitTesting/

## Continuous Integration

The above is all useful, but the real trick is to automate it. Continuous Integration (CI) is a development practice where developers integrate code into a shared repository frequently, preferably several times a day. Each integration can then be verified by an automated build and automated tests. This stops people from introducing errors that are not found until much later, because the code must pass all the tests before it can be accepted into the main repository.

As this is a standard development practice, there are many tools for it. Common options are `Jenkins`, `Travis CI`, `CircleCI`, `TeamCity` and `Bamboo`. You can run them locally on your machine or in the cloud. Most can create multiple virtual machines, so you can test on several versions of Python simultaneously to ensure code stability, and most have free options. Setting these up can be a bit tricky, so one of the easiest options is to use the CI tools built into the popular Git hosting sites. GitHub, GitLab and Bitbucket all have them built in, along with template scripts you can use.

We will show you how to set up a simple CI routine using the tools in GitHub to get you started.  We will use the following code for our example:

In [None]:
%%file CI_Test/basic_maths.py
"""
basic math library.
"""


def add(a, b):
    """
    add a and b
    """
    return a+b


def minus(a, b):
    """
    subtract b from a
    """
    return a-b


def multiply(a, b):
    """
    multiply a and b
    """
    return a*b


def divide(a, b):
    """
    divide a by b
    """
    return a/b



In [None]:
%%file CI_Test/test_basic_maths.py
"""
test basic_maths.py.
"""

import basic_maths as bm


def test_add():
    """
    Test add
    """
    assert bm.add(6, 3) == 9


def test_minus():
    """
    Test minus
    """
    assert bm.minus(6, 3) == 3


def test_multiply():
    """
    Test multiply
    """
    assert bm.multiply(6, 3) == 18


def test_divide():
    """
    Test divide
    """
    assert bm.divide(6, 3) == 2
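These tests only check one input each. The sketch below shows how you might cover more cases with `pytest.mark.parametrize`, which turns each tuple into a separate test. It also uses `pytest.approx` for floating point results and `pytest.raises` for division by zero. To keep the block self-contained it uses a stand-in copy of `divide`; in `test_basic_maths.py` you would call `bm.divide` instead:

```python
import pytest

def divide(a, b):
    """Stand-in copy of basic_maths.divide so this block runs on its own."""
    return a / b

# Each tuple becomes a separate test case, reported individually by pytest
@pytest.mark.parametrize("a, b, expected", [
    (6, 3, 2),
    (1, 3, 1 / 3),
    (-9, 2, -4.5),
    (0.3, 0.1, 3),
])
def test_divide_cases(a, b, expected):
    # approx avoids spurious failures from rounding, e.g. 0.3 / 0.1
    assert divide(a, b) == pytest.approx(expected)

def test_divide_by_zero():
    with pytest.raises(ZeroDivisionError):
        divide(1, 0)
```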



1st. We copy the code in these two files into a folder, create a git repository and commit them:

In [None]:
git init
git add basic_maths.py test_basic_maths.py
git commit -m "Initial commit"

2nd. We upload it to GitHub:

- Log in to GitHub
- Click 'Repositories'
- Click 'New'
- Follow the instructions for "push an existing repository from the command line"

3rd. We click `actions` and select `Python Package` (or `Python Package using Anaconda` / `Python Application`), then `start commit` and `commit new file`. This creates the following file in the new folder `.github/workflows`:

In [None]:
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Python package

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        # versions quoted so YAML does not read e.g. 3.10 as the number 3.1
        python-version: ["3.9", "3.10", "3.11", "3.12"]

    steps:
    - uses: actions/checkout@v4
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install flake8 pytest
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pytest

The tests are now set up and will run on any push or pull request to the `main` branch. To see this, first do a `git pull` to fetch the workflow file GitHub has just committed. Then edit the comment at the top of `basic_maths.py` and `add`/`commit`/`push`. If we now navigate to GitHub and click on `actions`, we can see the tests running. At the moment, though, there is nothing to stop us pushing broken code that fails the tests and having it added to `main`. To stop this we need to protect the branch.

Go to `settings`, click `branches` and then `add rule`. For `branch name pattern` enter "main", then select `Require status checks to pass before merging` and select the 4 tests (one build job per Python version). Finally select `Include administrators` and click `Create`.

Now go back, edit the comment at the top of `basic_maths.py` again and try to push. You should now get a message like:

In [None]:
remote: error: GH006: Protected branch update failed for refs/heads/main.
remote: error: 4 of 4 required status checks are expected. At least 1 approving review is required by reviewers with write access.
To https://github.com/JamesFergusson/CI_Test.git
 ! [remote rejected] main -> main (protected branch hook declined)
error: failed to push some refs to 'https://github.com/JamesFergusson/CI_Test.git'

To edit the code we now have to use branches, which we then merge into `main` on GitHub. So create a branch and switch to it:

In [None]:
git branch "test"
git checkout test

Now add the following function to `basic_maths.py` and the following test to `test_basic_maths.py`:

In [None]:
def exponentiate(a, b):
    """
    calculate a to the power of b
    """
    return a^b

In [None]:
def test_exponentiate():
    """
    Test exponentiate
    """
    assert bm.exponentiate(6, 3) == 216

Now `add`/`commit` then `git push --set-upstream origin test`.  You should get the message:

In [None]:
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
remote: 
remote: Create a pull request for 'test' on GitHub by visiting:
remote:      https://github.com/JamesFergusson/CI_Test2/pull/new/test
remote: 

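Now go back to GitHub and click `Pull Requests`, then `new pull request`, and under `compare` select `test`. This should say that the tests have failed, so we can't merge into `main`. They fail because in Python `^` is bitwise XOR, not exponentiation, so `exponentiate(6, 3)` returns 5 rather than 216. Go back and fix the function to: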

In [None]:
def exponentiate(a, b):
    """
    calculate a to the power of b
    """
    return a**b
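To see why the first version failed, compare the two operators directly. `^` combines the binary digits of its arguments with exclusive-or, while `**` is exponentiation. The function names below are just for illustration:

```python
def exponentiate_buggy(a, b):
    return a ^ b   # bitwise XOR of the binary digits, not a power

def exponentiate(a, b):
    return a ** b  # a to the power of b

# 6 is 0b110 and 3 is 0b11; XOR keeps the bits set in exactly one of them
print(bin(6), bin(3), bin(exponentiate_buggy(6, 3)))  # 0b110 0b11 0b101
print(exponentiate_buggy(6, 3), exponentiate(6, 3))   # 5 216
```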

Then `add`/`commit`/`push`. Now go back to the pull request, which should pass, and you can merge `test` into `main`. Your `main` branch is now safe: nothing can be merged into it unless all the tests pass. This is a fairly basic CI setup that works fine for very small teams, but there are a huge number of options. Once you have the basics, the documentation is mostly good enough to teach yourself. If you want to set up CI for a medium-sized team (3+ people), though, it might be best to ask your computer officers for support. That said, the tools for managing CI are advancing rapidly, so it's worth regularly exploring what is available.

### Exercise: Game of life

Write a small module that runs Conway's game of life, https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life.  You start with a 2D grid (whose size is to be specified at runtime) where the cells are either 'alive' or 'dead'.  Then the rules for stepping in time are:

- Overpopulation. Live cells with more than 3 neighbours die
- Underpopulation. Live cells with fewer than 2 neighbours die
- Survival. Live cells with 2 or 3 neighbours stay alive
- Reproduction. Dead cells with exactly 3 neighbours become live

Boundaries are periodic.

The goal is to:
1. Prototype
    - It should take a starting set of cells or generate a random board
    - Have a "step" function which advances the board one timestep
    - Have a "run" function which evolves the game n steps and plots/animates the result
2. Design unit tests (the "Oscillators" from the Wikipedia page are good test cases)
3. THEN! Write the code, try to use the debugger and include error traps. You can use your unit tests to validate as you go. See if you can use CI for the project.

You can write all the logic yourself, but the following functions will make it much easier. The first is 2D convolution (`scipy.signal.convolve2d`, see https://docs.scipy.org/doc/scipy-0.15.1/reference/signal.html), which will count your neighbours; its `boundary='wrap'` option gives periodic boundaries. The second is `numpy.where` (https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html), for implementing the rules.
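As a hint, here is a minimal sketch of the neighbour count using `convolve2d` with a 3x3 kernel of ones that excludes the centre cell. It leaves the rules, the `step` and `run` functions, and the tests to you. The kernel name and the blinker set-up are our own illustrative choices:

```python
import numpy as np
from scipy.signal import convolve2d

# 3x3 kernel that sums the 8 surrounding cells but not the cell itself
NEIGHBOUR_KERNEL = np.array([[1, 1, 1],
                             [1, 0, 1],
                             [1, 1, 1]])

def count_neighbours(grid):
    """Return an array giving the number of live neighbours of every cell.

    grid is a 2D array of 0s (dead) and 1s (alive). mode="same" keeps the
    output the same shape as grid, and boundary="wrap" makes the edges periodic.
    """
    return convolve2d(grid, NEIGHBOUR_KERNEL, mode="same", boundary="wrap")

# A horizontal "blinker" (period-2 oscillator) on a 5x5 board
board = np.zeros((5, 5), dtype=int)
board[2, 1:4] = 1
print(count_neighbours(board))
```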