# Testing

One of the essentials when the time comes to deploy our code (or models) is ensuring it works as we intended.

> __Testing allows us to have higher confidence in how our code works__

There are a lot of approaches; we will briefly go over some of them before diving into __testing Python code specifically__.

## Functional testing

> __Functional testing checks whether our product (code, mobile app, UI etc.) does what it is supposed to do, either manually or with automated tests__

There are a few levels at which one can test (usually conducted in this order), each one a higher-level approach than the previous:
- __`unit testing` - tells us WHERE and HOW something failed__:
    - usually done by the developers themselves
    - tests specific functions/methods to check that they return the expected values/do the expected things
- __`integration testing` - tells us WHAT failed__:
    - framed as scenarios (e.g. a user wants to log something into a file)
    - tests how different units cooperate with each other
- __`system testing`__:
    - the whole system/product is tested against the desired functionality
    - treated as a black box
    - usually done by a specialized testing team
- __`acceptance testing`__:
    - rolling out to production
    - beta testing (by your end users)
    - verifying that the product's requirements are really met

> __Some of those tests can be generated automatically, which saves hours of work if done correctly!__
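
To make the difference between the first two levels concrete, here is a minimal sketch of a unit test next to an integration test for the "user wants to log something into a file" scenario. The `format_message` and `log_to_file` functions are hypothetical, made up only for this illustration:

```python
import os
import tempfile
import unittest


def format_message(level: str, text: str) -> str:
    # Hypothetical unit: turn a level and a message into a single log line
    return f"[{level.upper()}] {text}"


def log_to_file(path: str, level: str, text: str) -> None:
    # Hypothetical unit: append a formatted message to a log file
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(format_message(level, text) + "\n")


class TestFormatMessage(unittest.TestCase):
    # Unit test: a single function, checked in isolation
    def test_level_is_uppercased(self):
        self.assertEqual(format_message("info", "hi"), "[INFO] hi")


class TestLoggingScenario(unittest.TestCase):
    # Integration test: "a user wants to log something into a file",
    # exercising format_message, log_to_file and the file system together
    def test_message_ends_up_in_file(self):
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = os.path.join(tmp_dir, "app.log")
            log_to_file(path, "warning", "disk almost full")
            with open(path, encoding="utf-8") as handle:
                self.assertEqual(handle.read(), "[WARNING] disk almost full\n")
```

If only the unit test fails, we know exactly which function is wrong; if only the integration test fails, the scenario is broken and we have to look at how the pieces fit together. In a notebook these can be run with `unittest.main(argv=[''], exit=False)`, from the command line with `python -m unittest <module_name>`.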

## Non-functional testing

> __Testing aspects of the system that are not directly related to its desired functionality__

### Performance testing

> __How performant our product is under different conditions__

During those tests there are a few key things to keep in mind:
- __Find bottlenecks__ - where your code takes absurdly long to run (maybe it is a single slow operation you can change?)
- __Premature optimization is the root of all evil__ - if it is fast enough, don't try to improve it by `0.1%`
- __Test under different loads__, some examples:
    - How does performance change as the batch size increases?
    - __Spike testing__: what if our machine learning app deployed on AWS gets a sudden spike of users?
    - __Stress testing__: how does your product behave at __or even above__ its limits (e.g. large input values, large data, large traffic)? Is this how you envisioned it?
    - __Endurance testing__: normal load, but for a long time; how often is your web app down?
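
A minimal sketch of finding a bottleneck and testing under growing batch sizes, using the standard `timeit` module. The two sum-of-squares functions and the `benchmark` helper are hypothetical stand-ins for your own code:

```python
import timeit

import numpy as np


def python_sum_of_squares(values):
    # Pure-Python loop: a typical bottleneck candidate
    return sum(v * v for v in values)


def numpy_sum_of_squares(values):
    # Vectorised version of the same computation
    return float(np.dot(values, values))


def benchmark(func, batch_sizes, number=10, repeat=5):
    """Return {batch_size: seconds per call}, using the fastest of `repeat` runs."""
    rng = np.random.default_rng(0)
    timings = {}
    for size in batch_sizes:
        data = rng.random(size)
        # each repeat calls func(data) `number` times and returns the total time
        runs = timeit.repeat(lambda: func(data), number=number, repeat=repeat)
        timings[size] = min(runs) / number
    return timings


for name, func in [("python", python_sum_of_squares), ("numpy", numpy_sum_of_squares)]:
    for size, seconds in benchmark(func, [1_000, 10_000, 50_000]).items():
        print(f"{name:>6} | batch {size:>6} | {seconds * 1e3:.3f} ms per call")
```

Taking the minimum of several repeats is a common way to reduce noise from other processes. Which version wins, and by how much, depends on your machine, so measure rather than assume.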
    
    
### Security testing

> __Keep in mind this topic is way too broad and worthy of another course on its own!__

The importance of this topic is often underestimated, but it is an essential piece of many infrastructures.
A few things you should keep in mind:
- __Minimum trust approach__ (principle of least privilege) - give users/coworkers only the permissions they absolutely need
- __Separate roles__ - grant permissions per role, so each person only gets what their role requires
- __Try to break it__ - check out penetration testing (pentesting) or ethical hacking
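
A minimal sketch combining these points: a deny-by-default permission check plus tests that actively try to obtain more access than a role should have. The role names, `ROLE_PERMISSIONS` and `is_allowed` are hypothetical:

```python
import unittest

# Hypothetical roles: each one gets only the permissions it really needs
ROLE_PERMISSIONS = {
    "viewer": frozenset({"read"}),
    "editor": frozenset({"read", "write"}),
    "admin": frozenset({"read", "write", "delete", "manage_users"}),
}


def is_allowed(role: str, action: str) -> bool:
    # Deny by default: unknown roles and unknown actions get no access at all
    return action in ROLE_PERMISSIONS.get(role, frozenset())


class TestLeastPrivilege(unittest.TestCase):
    def test_viewer_can_only_read(self):
        self.assertTrue(is_allowed("viewer", "read"))
        for action in ("write", "delete", "manage_users"):
            self.assertFalse(is_allowed("viewer", action))

    def test_editor_cannot_manage_users(self):
        self.assertFalse(is_allowed("editor", "manage_users"))

    def test_unknown_role_or_action_is_denied(self):
        # "Try to break it": made-up inputs must not fall through to some access
        self.assertFalse(is_allowed("intern", "read"))
        self.assertFalse(is_allowed("admin", "format_disk"))
```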

### Compatibility testing

> __How compatible is our product with its previous iterations and/or with different environments__

Luckily, compatibility with different environments can be improved simply by using `docker` (__principle of shifting responsibility to providers__)

## Other helpful techniques

> In order to keep your code in check, one can employ a few simple additional techniques

- __Peer review__ - each thing you do is checked by another person:
    - Pull Requests are often checked by assigned reviewers
    - Scientific papers often go through (double-blind) peer review
- __Code analyzers__ - GitHub offers a lot of integrations __which look for possible bugs in your code automatically__; a few examples with easy integration:
    - [`codebeat`](https://codebeat.co/)
    - [`sonarqube`](https://www.sonarqube.org/)
    - [`codacy`](https://www.codacy.com/)
    - [`codeclimate`](https://codeclimate.com/)
- __Test coverage__ - what percentage of our code is actually executed by the tests. One can measure it with [`coverage.py`](https://coverage.readthedocs.io/en/coverage-5.5/) together with the testing framework of choice (it can also be integrated with GitHub Actions, which we will see in a few lessons)
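
To get an intuition for what a coverage number means, here is a toy line-coverage tracker built on the standard `sys.settrace` hook. It is only a sketch of the idea, not how `coverage.py` is implemented in detail; use `coverage.py` for real projects, as it also handles branches, multiple files and reporting. `classify` is a hypothetical function under test:

```python
import dis
import sys


def classify(x):
    if x > 0:
        return "positive"
    return "non-positive"


def body_lines(func):
    """Source lines of `func`'s body that contain executable bytecode."""
    first = func.__code__.co_firstlineno  # the `def` line itself is excluded
    return {
        line
        for _, line in dis.findlinestarts(func.__code__)
        if line is not None and line > first
    }


def executed_lines(func, *args):
    """Run func(*args) and record which of its body lines were executed."""
    code = func.__code__
    first = code.co_firstlineno
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code is code and event == "line" and frame.f_lineno > first:
            hit.add(frame.f_lineno)
        return tracer  # keep receiving "line" events inside each frame

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return hit


all_lines = body_lines(classify)
covered = executed_lines(classify, 5)
print(f"coverage with classify(5): {len(covered) / len(all_lines):.0%}")
covered |= executed_lines(classify, -5)
print(f"coverage with classify(5) and classify(-5): {len(covered) / len(all_lines):.0%}")
```

A single positive input never reaches the last `return`, so coverage only becomes complete once a non-positive input is tested as well.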

## unittest

> `unittest` is a built-in `python` module which allows us to write unit tests efficiently

The module's structure is simple and is best explained by going over an example:

```python
import unittest

class TestStringMethods(unittest.TestCase):

    def test_upper(self):
        self.assertEqual('foo'.upper(), 'FOO')

    def test_isupper(self):
        self.assertTrue('FOO'.isupper())
        self.assertFalse('Foo'.isupper())

    def test_split(self):
        s = 'hello world'
        self.assertEqual(s.split(), ['hello', 'world'])
        # check that s.split fails when the separator is not a string
        with self.assertRaises(TypeError):
            s.split(2)
```

```python
# In a regular Python script you would simply call unittest.main();
# in a notebook, argv=[''] stops unittest from parsing the kernel's
# command-line arguments and exit=False stops it from calling sys.exit()
unittest.main(argv=[''], verbosity=2, exit=False)
```

```
test_isupper (__main__.TestStringMethods) ... ok
test_split (__main__.TestStringMethods) ... ok
test_upper (__main__.TestStringMethods) ... ok
```

As one can see above, a `unittest` test module consists of:
- `classes` inheriting from `unittest.TestCase` - each should group semantically related tests
- methods of said class whose names start with `test` - each one runs a specific unit test
- the `unittest.main` function, which looks for test cases in the module (as specified by the `classes` and `methods`) and runs them

There are also a number of `assert*` methods (e.g. `assertEqual`, `assertTrue`, `assertRaises`) which act like the `assert` statement; __each of them has to pass__ for the test to pass.
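
A few more of the `assert*` methods provided by `unittest.TestCase`, collected in one small test case (the class name and the checked values are just an illustration):

```python
import unittest


class TestAssertMethods(unittest.TestCase):
    def test_common_assertions(self):
        self.assertEqual(2 + 2, 4)            # a == b
        self.assertNotEqual("a", "b")         # a != b
        self.assertIn(3, [1, 2, 3])           # a in b
        self.assertIsNone({}.get("missing"))  # a is None
        self.assertIsInstance(1.5, float)     # isinstance(a, b)
        self.assertGreater(10, 3)             # a > b
        # floats: compare up to 7 decimal places (the default) instead of exactly
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_exception_message(self):
        # assertRaises used as a context manager also exposes the exception
        with self.assertRaises(ValueError) as ctx:
            int("not a number")
        self.assertIn("invalid literal", str(ctx.exception))
```

Inside `TestCase` classes these methods are preferred over a bare `assert`, because on failure they report both compared values (e.g. `AssertionError: 'foo' != 'bar'`).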

Let's see how a failed test case looks:

```python
class DummyFailedTest(unittest.TestCase):
    def test_obvious(self):
        self.assertEqual("foo", "foo")

    def test_stupid(self):
        self.assertEqual("foo", "bar")

unittest.main(argv=[''], verbosity=2, exit=False)
```

```
test_obvious (__main__.DummyFailedTest) ... ok
test_stupid (__main__.DummyFailedTest) ... FAIL
test_isupper (__main__.TestStringMethods) ... ok
test_split (__main__.TestStringMethods) ... ok
test_upper (__main__.TestStringMethods) ... ok

FAIL: test_stupid (__main__.DummyFailedTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-38-303e6c82e795>", line 6, in test_stupid
    self.assertEqual("foo", "bar")
AssertionError: 'foo' != 'bar'
- foo
+ bar
```

Please notice that __all of the tests defined in this notebook were run__, not only the new ones.
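
If you would rather run only some of the test cases, one option is to build a suite yourself with `unittest.TestLoader` and run it with `unittest.TextTestRunner`. A minimal sketch; `TestOnlyThis` and `run_only` are hypothetical names:

```python
import io
import unittest


class TestOnlyThis(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)


def run_only(*test_case_classes, verbosity=2):
    """Run only the given TestCase classes instead of every test in the module."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(cls) for cls in test_case_classes
    )
    stream = io.StringIO()  # collect the report instead of writing it to stderr
    result = unittest.TextTestRunner(stream=stream, verbosity=verbosity).run(suite)
    print(stream.getvalue())
    return result


result = run_only(TestOnlyThis)
```

In this notebook, `run_only(DummyFailedTest)` would, for example, run just the two dummy tests from the previous cell.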

### setUp() and tearDown()

One can add two more methods:
- `setUp()` - runs before __each__ test method of the class and sets up necessary things like opening files, connecting to a database etc.
- `tearDown()` - runs after each test method to clean up leftovers created by the test

A simple example could be:

```python
import unittest


class FileTestWithHandle(unittest.TestCase):
    def setUp(self):
        # Runs before each test method; point the path at any existing text file
        self.handle = open("./0. Docker/__main__.py", "r")

    def test_length(self):
        length = len(self.handle.readlines())
        self.assertGreater(length, 20)

    def tearDown(self):
        # Runs after each test method, so the file is closed even if the test fails
        self.handle.close()

unittest.main(argv=[''], verbosity=2, exit=False)
```

```
test_length (__main__.FileTestWithHandle) ... ok
```
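
Since `setUp()`/`tearDown()` run around every single test method, `unittest` also offers the class methods `setUpClass()`/`tearDownClass()`, which run once per class. A self-contained sketch using a temporary directory instead of a file from the course repository (the class and file names are made up):

```python
import os
import tempfile
import unittest


class TempFileTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class: a good place for expensive shared resources
        cls.tmp_dir = tempfile.TemporaryDirectory()

    @classmethod
    def tearDownClass(cls):
        cls.tmp_dir.cleanup()

    def setUp(self):
        # Runs before EACH test method: every test starts from a fresh file
        self.path = os.path.join(self.tmp_dir.name, "data.txt")
        with open(self.path, "w", encoding="utf-8") as handle:
            handle.write("line 1\nline 2\nline 3\n")

    def tearDown(self):
        # Runs after EACH test method (as long as setUp succeeded)
        os.remove(self.path)

    def test_append_does_not_leak(self):
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.write("line 4\n")
        with open(self.path, encoding="utf-8") as handle:
            self.assertEqual(len(handle.readlines()), 4)

    def test_line_count(self):
        # The line appended by the other test is gone, thanks to tearDown/setUp
        with open(self.path, encoding="utf-8") as handle:
            self.assertEqual(len(handle.readlines()), 3)
```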

### Skipping tests

Using decorators we can further control which tests are run (e.g. skip some of them based on a condition).

See the options below (`unittest.skip` and its variants can also be used as decorators on a whole class, a.k.a. a __test suite__):

```python
import sys
import unittest

class MyTestCase(unittest.TestCase):
    @unittest.skip("demonstrating skipping")
    def test_nothing(self):
        # This line will not run at all
        self.fail("shouldn't happen")

    @unittest.skipIf(sys.version_info.major < 3, "Not supported for Python 2")
    def test_py3_format(self):
        self.assertEqual("{}".format("aaa"), "aaa")

    @unittest.skipUnless(sys.platform.startswith("win"), "Windows required")
    def test_windows_support(self):
        # Windows-specific testing code
        pass

    def test_maybe_skipped(self):
        # Skip the test from within the function body
        if 5 == 5:
            self.skipTest("Yes, 5 is equal to 5 so we skip")
        # test code which would run if 5 != 5 (essentially never, we know)
        ...

unittest.main(argv=[''], verbosity=2, exit=False)
```

```
test_maybe_skipped (__main__.MyTestCase) ... skipped 'Yes, 5 is equal to 5 so we skip'
test_nothing (__main__.MyTestCase) ... skipped 'demonstrating skipping'
test_py3_format (__main__.MyTestCase) ... ok
test_windows_support (__main__.MyTestCase) ... skipped 'Windows required'
```
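
Two related options, sketched below: a skip decorator applied to a whole class, and `@unittest.expectedFailure`, which marks a test that is known to fail (e.g. a reported but not yet fixed bug) so it is not reported as `FAIL`. The class and test names are hypothetical:

```python
import os
import sys
import unittest


@unittest.skipUnless(sys.platform.startswith("linux"), "Linux required")
class LinuxOnlyTests(unittest.TestCase):
    # The decorator on the class skips every test method inside it
    def test_path_separator(self):
        self.assertEqual(os.sep, "/")


class KnownBugTests(unittest.TestCase):
    @unittest.expectedFailure
    def test_round_half_up(self):
        # 2.675 cannot be represented exactly as a binary float, so
        # round(2.675, 2) gives 2.67 and this assertion fails as expected
        self.assertEqual(round(2.675, 2), 2.68)
```

Should such a test start passing (e.g. after a fix), `unittest` reports it as an "unexpected success", which is a hint to remove the decorator.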

## Exercise

- Read the following code and create unit tests to find its implementation bugs
- Fix the code below according to the bugs you found

```python
import numpy as np

# NOTE: this code intentionally contains bugs, find them with your tests!
def sigmoid(data: np.ndarray):
    return 1 / (1 + np.exp(data))

def softmax(x):
    exponentials = np.exp(x)
    return exponentials / exponentials.sum(axis=1).reshape(-1, 1)
```

## Hypothesis

> __Hypothesis is a `python` library for `property-based testing`, which is often easier, more powerful and covers far more test cases than standard unit testing__

The difference between `unit testing` and `property-based testing` is explained really well on the [Hypothesis Welcome Page](https://hypothesis.readthedocs.io/en/latest/):

__Think of a normal unit test as being something like the following:__

1. Set up some data.
2. Perform some operations on the data.
3. Assert something about the result.

__Hypothesis lets you write tests which instead look like this:__

1. For all data matching some specification.
2. Perform some operations on the data.
3. Assert something about the result.

This idea was popularized by the [Haskell](https://www.haskell.org/) (a purely functional programming language) library [QuickCheck](https://hackage.haskell.org/package/QuickCheck).

> __Hypothesis generates testing data based on your specification and checks whether the guarantees you want to give hold true.__
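
To see the idea without any library, here is a toy version of "for all data matching some specification" built on the standard `random` module. It is only an illustration: unlike Hypothesis, it does not shrink failing inputs to a minimal example or remember them between runs. `random_text` and `check_property` are hypothetical helpers:

```python
import random


def random_text(rng, max_size=10):
    # Specification: strings of length 0..max_size over a small alphabet
    size = rng.randint(0, max_size)
    return "".join(rng.choice("abc") for _ in range(size))


def check_property(prop, generate, runs=200, seed=0):
    """Return the first generated value for which prop(value) is False, else None."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = generate(rng)
        if not prop(value):
            return value
    return None


# A property that holds: reversing a string twice gives back the original
print(check_property(lambda s: s[::-1][::-1] == s, random_text))
# A property that does not hold: not every string is already sorted
print(check_property(lambda s: list(s) == sorted(s), random_text))
```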

### Installation

As usual, one can install `hypothesis` via `pip` or [`conda`](https://anaconda.org/conda-forge/hypothesis).

There are also a few extensions provided especially for the scientific stack (like `numpy` or `pandas`).

To install `hypothesis` with the `numpy` generation strategies via `pip`, one could run the command below (__and of course you should also install it inside your `AiCore` `conda` environment__). The quotes are needed in shells such as `zsh`, which otherwise treat the square brackets as a glob pattern and fail with `no matches found`:

```
!pip install "hypothesis[numpy]"
```


### General

First, let's set up an example which `encode`s a string (run-length encoding) and `decode`s it:

```python
def encode(input_string):
    count = 1
    prev = ""
    lst = []
    for character in input_string:
        if character != prev:
            if prev:
                entry = (prev, count)
                lst.append(entry)
            count = 1
            prev = character
        else:
            count += 1
    entry = (character, count)
    lst.append(entry)
    return lst


def decode(lst):
    q = ""
    for character, count in lst:
        q += character * count
    return q
```

It should be fairly obvious that `decode(encode(<string>))` should return the original `<string>`.

Hypothesis can generate `<string>` examples for us (just like hand-written unit tests, but much easier and automated) using:
- `strategy` - a way to create testing data (in this case `text`)
- `given` - a decorator which runs the test with samples generated from the specified strategy

With that in mind, let's see how we could do that:

```python
from hypothesis import given
import hypothesis.strategies as st

class TestEncoding(unittest.TestCase):
    # @given works directly on unittest.TestCase methods: each call gets a generated string `s`
    @given(st.text())
    def test_decode_inverts_encode(self, s):
        self.assertEqual(decode(encode(s)), s)

unittest.main(argv=[''], verbosity=2, exit=False)
```

```
test_decode_inverts_encode (__main__.TestEncoding) ... ERROR

Falsifying example: test_decode_inverts_encode(
    s='', self=<__main__.TestEncoding testMethod=test_decode_inverts_encode>,
)

ERROR: test_decode_inverts_encode (__main__.TestEncoding)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-44-d413d060cb54>", line 8, in test_decode_inverts_encode
```

First of all, notice how easy it is to mix `unittest` with `hypothesis` to create a far more comprehensive test suite.

You can see that our `encode` function fails when the input is an empty string: the `for` loop body never runs, so `character` is never assigned before `entry = (character, count)` is reached. We can fix the code by adding a check for the empty string at the start:

```python
def encode(input_string):
    # This is an example fix
    if input_string == "":
        return []

    count = 1
    prev = ""
    lst = []
    for character in input_string:
        if character != prev:
            if prev:
                entry = (prev, count)
                lst.append(entry)
            count = 1
            prev = character
        else:
            count += 1
    entry = (character, count)
    lst.append(entry)
    return lst


def decode(lst):
    q = ""
    for character, count in lst:
        q += character * count
    return q
```

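And re-run the test (tip: decorating a test with `@example(...)` from `hypothesis` makes sure a specific input is always tried in addition to the generated ones, which is good for pinning down edge cases like this one):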

```python
unittest.main(argv=[''], verbosity=2, exit=False)
```

```
test_decode_inverts_encode (__main__.TestEncoding) ... ok
```

The generated test cases now pass.
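
A minimal sketch of pinning the empty-string edge case with `@example`, so it is always tried on top of the generated inputs. To keep the block self-contained it uses our own `itertools.groupby`-based rewrite (`rle_encode`/`rle_decode`, hypothetical names, not the version above), which handles the empty string naturally:

```python
import unittest
from itertools import groupby

from hypothesis import example, given, strategies as st


def rle_encode(input_string):
    # One (character, run length) pair per run of equal characters;
    # an empty string simply produces an empty list
    return [(char, len(list(run))) for char, run in groupby(input_string)]


def rle_decode(pairs):
    return "".join(char * count for char, count in pairs)


class TestRunLengthEncoding(unittest.TestCase):
    @given(s=st.text())
    @example(s="")  # always tried, in addition to the generated strings
    def test_decode_inverts_encode(self, s):
        self.assertEqual(rle_decode(rle_encode(s)), s)
```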

> __Hypothesis is smart about running tests: failing examples are saved in its own local database (the `.hypothesis` directory) and replayed first on the next run, before new examples are generated!__
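
Related knobs live in `hypothesis.settings`: for example `max_examples` controls how many inputs are generated per test (100 by default) and `database=None` turns the example database off for a test. A small sketch with made-up test names:

```python
from hypothesis import given, settings, strategies as st


@settings(max_examples=500)  # try more inputs than the default 100
@given(st.integers(), st.integers())
def test_addition_commutes(a, b):
    assert a + b == b + a


@settings(database=None)  # do not save (or replay) failing examples for this test
@given(st.lists(st.integers()))
def test_sorting_is_idempotent(values):
    assert sorted(sorted(values)) == sorted(values)


# @given-decorated functions can be called directly; each call runs many examples
test_addition_commutes()
test_sorting_is_idempotent()
```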

### Hypothesis tricks

A few useful things, which should help you with your tests:

> __`filter` values generated by a strategy__

```python
@given(st.integers().filter(lambda x: x % 2 == 0))
def test_even_integers(i):
    assert i % 2 == 0
```
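
When it is easy to build valid values directly, `map` is usually a better choice than `filter`, because no generated values are thrown away. The same even-integers idea, sketched with `map` (the test name is made up):

```python
from hypothesis import given, strategies as st


# Doubling any integer always gives an even one, so nothing is rejected
@given(st.integers().map(lambda x: x * 2))
def test_even_integers_via_map(i):
    assert i % 2 == 0


test_even_integers_via_map()  # a @given test can be called directly
```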

> __`assume` that input is/is not something__

> __NOTE:__ Hypothesis will raise an error if your assumptions discard too many of the generated samples

Generated examples for which the assumption is `False` are discarded, so they will __not__ mark the test as `failing`:

```python
from math import isnan

from hypothesis import assume

@given(st.floats())
def test_negation_is_self_inverse_for_non_nan(x):
    assume(not isnan(x))
    assert x == -(-x)
```

> __strategies are highly customizable__

One can specify a lot of parameters for the strategies, for example (the value is drawn at random, so you will likely get a different number):

```python
st.integers(min_value=0, max_value=10).example()
```

```
9
```
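
A few more strategy parameters, sketched as tests (the test names are made up). Note that `st.floats(allow_nan=False)` excludes NaN at the source, an alternative to the `assume(not isnan(x))` approach:

```python
from hypothesis import given, strategies as st


@given(st.floats(allow_nan=False))
def test_negation_without_assume(x):
    # NaN is excluded by the strategy itself, so no assume() call is needed
    assert x == -(-x)


@given(st.lists(st.integers(min_value=0, max_value=10), min_size=1, max_size=5))
def test_list_bounds(values):
    assert 1 <= len(values) <= 5
    assert all(0 <= v <= 10 for v in values)


@given(st.text(alphabet="abc", min_size=1))
def test_text_alphabet(s):
    assert len(s) >= 1 and set(s) <= set("abc")


test_negation_without_assume()
test_list_bounds()
test_text_alphabet()
```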

> __`given` can specify some or all arguments of the function via `args` or `kwargs`__

With the following signature:

```
hypothesis.given(*_given_arguments, **_given_kwargs)
```

Valid cases could be (amongst others):

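```python
@given(st.integers(), st.integers())
def a(x, y):
    pass


# Positional strategies fill the arguments from the right, so only `y` is generated
# and `x` has to be passed in by the caller (like `self` in a TestCase method)
@given(st.integers())
def b(x, y):
    pass


# Keyword strategies are matched by name; `x` again comes from the caller
@given(y=st.integers())
def c(x, y):
    pass
```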

## Exercise

- Once again, test `sigmoid` and `softmax` functions, this time using `hypothesis` strategies
- Check out [documentation](https://hypothesis.readthedocs.io/en/latest/numpy.html#hypothesis.extra.numpy.arrays) to see how to generate `np.ndarray` instances

## Challenges

### Assessment

- Learn the basics of `pytest` (documentation [here](https://docs.pytest.org/en/latest/contents.html))
- What is `pytest`'s [mark.parametrize](https://docs.pytest.org/en/stable/parametrize.html)? 
- What is [Test Driven Development](https://en.wikipedia.org/wiki/Test-driven_development) and what are the general steps needed to follow this approach?

### Non-assessment

- Check out the [`doctest`](https://docs.python.org/3/library/doctest.html#module-doctest) Python module, which allows you to test code examples placed inside `docstring`s by running them and comparing their output with what is written there (you can think of them like "smoke" tests, which mostly check whether everything runs correctly)
- Read more about automated test case generation via [Hypothesis](https://hypothesis.readthedocs.io/en/latest/). We don't want to do more manual work than needed, do we?
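
A minimal `doctest` sketch: the `>>>` lines in the docstring are executed and their results compared with the lines below them. `mean` and `run_doctests` are hypothetical examples; in a script you would typically just call `doctest.testmod()`:

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    >>> mean([1, 2, 3])
    2.0
    >>> mean([0.5, 1.5])
    1.0
    """
    return sum(values) / len(values)


def run_doctests(func):
    """Run the docstring examples of a single function; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


print(run_doctests(mean))
```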