# Testing

Throughout these notebooks, we have seen how to keep your code clean and your functions and classes as flexible as possible. However, adding more features to flexible code can cause problems. Can we be sure that, after adding a feature, the existing functions and classes still do what they were intended to do?

We need to make sure our code lives on long into the future, so we need some means to assure its long-term stability. __Testing__ is one of the most important tools to do that.

Testing is a thorough and formal process for applications that must not fail. In essence, it consists of verifying that a program works as intended.

> ## Testing is the process of verifying that a piece of software behaves the way you expect.

One reason to test software is to determine whether it does what it claims to do. Imagine you are using a function that doesn't raise an error but returns an output that is not what you expected. That mistake will cascade as the application progresses.

In general terms, we can divide testing into two categories: Functional and Non-Functional. This notebook focuses on Functional Testing, so we will briefly cover Non-Functional Testing first and then give Functional Testing our full attention.

# Non-functional testing



> __Testing various components of the system that are not directly related to the desired functionality__


## Performance testing

> __How performant our product is under different conditions__

During those tests there are a few key things to keep in mind:
- __Find bottlenecks__ - where your code takes absurdly long to run (maybe it is a single slow operation you can change?)
- __Premature optimization is the root of all evil__ - if it is fast enough, don't try to improve it by `0.1%`
- __Test under different loads__, some examples:
    - How does the performance change with increased batch size?
    - __Spike testing__: what if our machine learning app deployed on AWS has a sudden user spike?
    - __Stress testing__: how does your product behave at __or even above__ its limits (e.g. large input values, large data, large traffic)? Is this how you envisioned it?
    - __Endurance testing__: normal load but for a long time; how often is your web app down?
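
A minimal sketch of testing under different loads, using the standard-library `timeit` module. The function `process_batch` and the batch sizes are hypothetical stand-ins for your own workload. Timings depend on the machine, so no expected output is shown.

```python
import timeit


def process_batch(batch):
    """Hypothetical workload: sum of squares of the batch."""
    return sum(x * x for x in batch)


def time_batches(batch_sizes, repeats=5):
    """Return the best time (in seconds, for 10 calls) for each batch size."""
    timings = {}
    for size in batch_sizes:
        batch = list(range(size))
        # repeat() returns one total time per repetition; the minimum is
        # usually the least noisy estimate
        runs = timeit.repeat(lambda: process_batch(batch), number=10, repeat=repeats)
        timings[size] = min(runs)
    return timings


timings = time_batches([100, 1_000, 10_000])
for size, seconds in timings.items():
    print(f"batch size {size:>6}: {seconds:.6f} s for 10 calls")
```

Comparing how the time grows with the batch size is one simple way to spot a bottleneck before reaching for a dedicated load-testing tool.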


## Security testing

> __Keep in mind this topic is way too broad and worthy of another course on its own!__

The importance of this topic is often underestimated, but it is an essential piece of many infrastructures.
A few things you should keep in mind:
- __Minimum trust approach__ - give only absolutely necessary permissions to users/coworkers
- __Separate roles__ - permissions only related to their roles
- __Try to break it__ - check out pentesting or ethical hacking


## Compatibility testing

> __How compatible our product is with previous iterations and/or different environments__

Luckily, the second type of compatibility (different environments) can be improved simply by using `docker` (__principle of shifting responsibility to providers__)

## Other helpful techniques

> In order to keep your code in check one can employ a few simple additional techniques

- __Peer review__ - each thing you do is checked by another person:
    - Pull Requests are often checked by assigned reviewers
    - Scientific papers undergo double-blind peer review
- __Code analyzers__ - GitHub offers a lot of integrations __which look for possible bugs in your code automatically__; a few examples with easy integration:
    - [`codebeat`](https://codebeat.co/)
    - [`sonarqube`](https://www.sonarqube.org/)
    - [`codacy`](https://www.codacy.com/)
    - [`codeclimate`](https://codeclimate.com/) 
- __Test coverage__ - how much of our code (as a percentage) was exercised by tests. One can obtain it via [`coverage.py`](https://coverage.readthedocs.io/en/coverage-5.5/) with the testing framework of choice (it is also possible to integrate it with GitHub Actions, which we will see in a few lessons)

# Functional Testing


_Functional Testing_ consists of making sure that a piece of code _functions_ correctly. The basic structure of a functional test is:
1. Prepare the inputs to the software.
2. Identify the outputs of the software.
3. Run the software with the inputs and check the outputs.
4. Compare the outputs with the expected outputs to see if they match. 

The first two steps are performed by the developer or author of the software. The third step is done by the software itself. And the last step is done by a tester (we will see testers later).
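
The four steps above can be sketched in a few lines. The function `celsius_to_fahrenheit` is a hypothetical piece of software chosen only for illustration; it is not part of the lesson's code.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Hypothetical function under test."""
    return celsius * 9 / 5 + 32


# 1. Prepare the inputs
inputs = [0, 100, -40]
# 2. Identify the outputs we expect for those inputs
expected_outputs = [32, 212, -40]
# 3. Run the software with the inputs
actual_outputs = [celsius_to_fahrenheit(value) for value in inputs]
# 4. Compare the actual outputs with the expected ones
for value, expected, actual in zip(inputs, expected_outputs, actual_outputs):
    assert actual == expected, f"{value}: expected {expected}, got {actual}"
```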

![](images/functional_testing.png)


Let's say for example that we want to check the mean of a list of numbers.

In [1]:
def mean_list(my_list: list) -> float:
    """
    Return the mean of the values in the list.
    """
    running_sum = 0
    for num in my_list:
        running_sum += num
    return running_sum / len(my_list)

We know that the list `[1, 2, 3]` should return a mean of 2, the list `[1, 2, 3, 4]` should return a mean of 2.5, and the list `[1, 2, 3, 4, 5]` should return a mean of 3...

We can create some manual tests to check that the output is what we are expecting. This is called _Manual Testing_, and it is a good idea to do it before you write any code. However, as your project grows, you won't be able to manually test every possible case because of the time it would take.

In [2]:
assert mean_list([1, 2, 3]) == 2
assert mean_list([-1, -2, 1, 2]) == 0
assert mean_list([]) == 0


ZeroDivisionError: division by zero

Thus, we might rely on _Automatic Testing_, which consists of writing a large number of tests that can then be executed as many times as needed. Automated tests will eventually discover things to fix, so you should modify your code to make the tests, and therefore your code, more robust.
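
As a sketch of what an automated version of the checks above could look like, the block below collects the known cases in a table and runs them with `unittest`. Raising `ValueError` for an empty list is an assumption about how one might make `mean_list` more robust; returning `0` or another value would be an equally valid design choice.

```python
import unittest


def mean_list(my_list: list) -> float:
    """Return the mean of the values in the list."""
    if not my_list:
        # Assumption: an empty list is treated as invalid input
        raise ValueError("mean_list() requires a non-empty list")
    return sum(my_list) / len(my_list)


class MeanListTestCase(unittest.TestCase):
    def test_known_means(self):
        cases = [
            ([1, 2, 3], 2),
            ([1, 2, 3, 4], 2.5),
            ([-1, -2, 1, 2], 0),
        ]
        for values, expected in cases:
            # subTest reports each failing case separately
            with self.subTest(values=values):
                self.assertEqual(mean_list(values), expected)

    def test_empty_list_is_rejected(self):
        with self.assertRaises(ValueError):
            mean_list([])


mean_suite = unittest.TestLoader().loadTestsFromTestCase(MeanListTestCase)
mean_result = unittest.TextTestRunner(verbosity=2).run(mean_suite)
```

Adding a new case is now a one-line change to the `cases` table, and the whole set can be re-run as often as needed.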

## Acceptance Testing
These tests, as you might notice, check small pieces of code. However, we will also need to test the code as a whole. This is called __Acceptance Testing__, which is often performed by business stakeholders. It can also be automated using _end-to-end testing_ to make sure that a list of actions is carried out.

> Acceptance testing: Is the API satisfying the stakeholders' needs?

As you can see, __Acceptance Testing__ is not very granular; in fact, it is the least granular of all the testing techniques.

At the next level of granularity we find __System Testing__.

## System Testing
System Testing is a testing technique that is used to test the entire system. This is done by testing the entire system from the very beginning and then adding features one by one.

> System Testing: Is the API available and performing as expected?

System Testing is still not very granular, since it takes the whole system.

## Integration Testing
Integration Testing is a testing technique that is used to test the interconnection between different parts of the system. This is done by testing parts of the code framed as scenarios.

> Integration Testing: Is every endpoint working as expected?

<br>

## Unit Testing
The finest level of granularity is __Unit Testing__. Unit Testing is a testing technique that is used to test individual units of code. It is usually done by the developers themselves, and it consists of testing specific functions or methods to check that they return the expected values and do the expected things.

> Unit Testing: Is each code module working properly?

## Pyramid Testing

Let's recap the type of testing we've seen so far:

![](images/pyramid_testing.png)

We can see that unit testing is the foundation of the testing pyramid. Thus, in this notebook we will focus on this type of testing, using the `unittest` module.

## Regression Testing

Regression testing is not a type of testing per se, but rather an approach to developing your application. It consists of adding tests to your collection as you develop your application and find more and more bugs. This collection is called a _test suite_, and developers usually run it on a _continuous integration_ (CI) server. Some of the most famous CI servers are [Jenkins](https://www.jenkins.io), [Travis CI](https://travis-ci.org) or [CircleCI](https://circleci.com).
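
A minimal sketch of the regression idea: once a bug is found and fixed, a test for it stays in the suite so the bug cannot silently come back. The function `parse_price` and the bug it describes are hypothetical, invented only for this illustration.

```python
import unittest


def parse_price(text: str) -> float:
    """Hypothetical function: convert a price string such as '1,200.50' to a float."""
    # Fix for a (hypothetical) reported bug: thousands separators used to
    # make float() raise ValueError
    return float(text.replace(",", ""))


class ParsePriceRegressionTestCase(unittest.TestCase):
    def test_plain_price(self):
        self.assertEqual(parse_price("19.99"), 19.99)

    def test_thousands_separator_regression(self):
        # Added when the bug was found; it now runs with every change
        self.assertEqual(parse_price("1,200.50"), 1200.5)


regression_suite = unittest.TestLoader().loadTestsFromTestCase(ParsePriceRegressionTestCase)
regression_result = unittest.TextTestRunner(verbosity=2).run(regression_suite)
```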

# Unit Testing

> ## A Unit is a small, fundamental piece of software

Unit testing seeks to verify that all the individual units of code in your application work correctly. We can implement unit testing using the `unittest` module.



## Unit Testing using `unittest`

`unittest` is Python's built-in testing framework. Despite its name, it can also be used for integration testing. The module provides features for making assertions about the behaviour of code, and for comparing the behaviour of code to a known result. It also includes a tool for running tests.

To start testing your code using unittest, you need to create a class that inherits from `unittest.TestCase`. By convention, the name of the class should have this syntax:
```
class <Object>TestCase(unittest.TestCase):
```
Where `<Object>` is the name of the class or function you want to test. For example, if you are testing a scraper, the name of the class would be `ScraperTestCase`. In this case, we are going to test the `Product` class in `product.py`.

This class will have methods, and each method will assert a certain aspect of the behaviour of the class under test. For example, you can test the `transform_name_for_sku` method, which should return the product name in upper case.

`unittest` includes assertion methods that you can use to compare the actual output with the expected output. It also includes decorators to skip tests if a condition is met.

The final step is running the tests. To do this, we need to import the `unittest` module; then we can run the tests from within the script using the `unittest.main()` function, or from the command line using the `python -m unittest` command. The latter is the most common way to run tests in Python, but the former is useful when developing a script. Thus, in this notebook, we will see how to do both.

## Creating the Test class

Before we test something, we need something to test! Luckily, we have a script named `product.py` that we can use to test. Take a look at that script and try to figure out what it does.

In [3]:
from example.product import Product
import unittest

class ProductTestCase(unittest.TestCase):
    pass
# unittest.main looks at sys.argv by default, which is what started IPython, hence the error about the kernel connection file not being a valid attribute. You can pass an explicit list to main to avoid looking up sys.argv.
# In the notebook, you will also want to include exit=False to prevent unittest.main from trying to shutdown the kernel process:
# If you are using a script, you don't need to pass any argument to main.

unittest.main(argv=['first-arg-is-ignored'], exit=False)


----------------------------------------------------------------------
Ran 0 tests in 0.000s

OK


<unittest.main.TestProgram at 0x7f948f3775e0>

The report says that zero tests ran and there were no errors, which makes sense, because we haven't written any tests yet!

As mentioned, we can also use the `unittest` module from the command line to test our code. To do this, the tests need to live in a script whose name starts with `test_` (we can actually change that convention, but it's not recommended).

Create a new file called `test_product.py`. We will see how to organize the tests in the next notebook (the infamous Notebook 6). In the file, include the same `ProductTestCase`:

```
from Cart.product import Product
import unittest

class ProductTestCase(unittest.TestCase):
    pass
```

Then, in the command line, type `python -m unittest`. This will detect all modules whose names start with `test`, and inside them it will look for the tests defined in each `TestCase` class. The output you will see now is the same as in the cell:

```
----------------------------------------------------------------------
Ran 0 tests in 0.000s

OK
```

Which makes sense, because we haven't programmed any test yet.

## Your first unit test

Let's populate the `ProductTestCase`. When we run a unit test, the module checks the assertions included in the test methods. Note that only a failed assertion counts as a failed test; a problem caused by anything else (for example, an unexpected exception in the code under test) is reported separately as an error.

In [13]:
from example.product import Product
import unittest

class ProductTestCase(unittest.TestCase):
    def test_transform_name(self):
        small_black_shoes = Product('shoes', 'S', 'black')
        expected_value = 'SHOES'
        actual_value = small_black_shoes.transform_name_for_sku()
        self.assertEqual(expected_value, actual_value)

unittest.main(argv=[''], verbosity=0, exit=False)

----------------------------------------------------------------------
Ran 1 test in 0.000s

OK


<unittest.main.TestProgram at 0x7f948f25f550>


### Try it yourself (20 min)

- Try changing the name of the method to something that doesn't contain `test` at the beginning
    - Add the 'incorrect' function to the `test_product.py` script.
    - Run unittest on the command line
- Go back to the 'right' function and try changing the expected value
    - Add the 'right' function to the `test_product.py` script.
    - Run unittest on the command line
- Change the functionality of the `transform_name_for_sku()` method:
    - It will now Capitalize the name of the product instead of UPPERCASING it
    - Run the same test again
    - Observe the errors
    - This will happen every time you make a slight change to your code, and it might be annoying to change the test every time. But imagine what will happen when you start adding more and more code. A slight change can cause a huge snowball effect!
- Add two more test methods:
    - `test_transform_color_for_sku`, which asserts that the colour is well written after using `transform_color_for_sku()`
    - `test_generate_sku`, which asserts that the SKU is well written after using `generate_sku()`
    - for both cases, use `Product('shoes', 'S', 'black')`

## setUp() and tearDown()

In the last part of the exercise, you had to create the same product `Product('shoes', 'S', 'black')` twice; added to the already existing product in `test_transform_name`, that's three times. Is there a way to use the same scenario for each test? Well, of course there is; otherwise, I wouldn't be writing this. You can do it using the method `setUp()`, which runs before each test. The opposite of `setUp()` is `tearDown()`, which runs every time a test finishes.

In [13]:
from example.product import Product
import unittest

class ProductTestCase(unittest.TestCase):
    # Initialize the scenario for your test
    def setUp(self):
        self.product = Product('shoes', 'S', 'black')
    def test_transform_name(self):
        expected_value = 'SHOES'
        actual_value = self.product.transform_name_for_sku()
        self.assertEqual(expected_value, actual_value)
    # Finish 
    def tearDown(self):
        del self.product

unittest.main(argv=[''], verbosity=2, exit=False)

test_add_and_remove_product (__main__.ShoppingCartTestCase) ... ok
test_transform_name (__main__.TestProduct) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.024s

OK


<unittest.main.TestProgram at 0x7fec11718df0>

In this case `tearDown` is not very useful, because we don't care about the product once the test finishes anyway. Another example is using `setUp` and `tearDown` to open and close a file:

In [16]:
import unittest


class FileTestCase(unittest.TestCase):
    def setUp(self):
        self.handle = open("Cart/product.py", "r")

    def test_length(self):
        length = len(self.handle.readlines())
        self.assertTrue(length > 20)

    def tearDown(self):
        self.handle.close()

unittest.main(argv=[''], verbosity=2, exit=False)

test_length (__main__.FileTestCase) ... ok
test_add_and_remove_product (__main__.ShoppingCartTestCase) ... ok
test_transform_name (__main__.TestProduct) ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.007s

OK


<unittest.main.TestProgram at 0x7fec12830b80>

## Assertions

Python has a built-in `assert` statement that you can use in your code to make sure a condition is met. It works similarly to an `if/else` statement, but in this case, if the condition is not fulfilled, the code will directly raise an error.

In [6]:
ivan = 'dumb'
assert ivan == 'dumb' # The assertion doesn't raise an error

In [8]:
assert ivan == 'smart' # The assertion raises an error. I guess I should study more

AssertionError: 

`unittest` has its own methods for creating these assertions; that way, when running a test, it adds each failed assertion to a total. Examples of these assertions are:

- `assertEqual(a, b)`
- `assertNotEqual(a, b)`
- `assertTrue(x)`
- `assertFalse(x)`
- `assertIsInstance(a, b)`
- `assertIn(a, b)`
- `assertAlmostEqual(a, b)` like `round(a-b, 7) == 0`
- `assertGreaterEqual(a, b)`
- `assertRegex(s, r)` like `r.search(s)`
- `assertMultiLineEqual(a, b)` compares two strings
- `assertDictEqual(a, b)` compares two dictionaries
- `assertSequenceEqual(a, b)` compares two sequences


Take a look at this page for a more [exhaustive list](https://docs.python.org/3/library/unittest.html#module-unittest)
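
The sketch below exercises several of the assertion methods listed above in one made-up test class, so you can see the call shapes side by side. The SKU string is only an illustrative value.

```python
import unittest


class AssertionExamplesTestCase(unittest.TestCase):
    def test_numbers(self):
        self.assertEqual(2 + 2, 4)
        self.assertNotEqual(2 + 2, 5)
        # 0.1 + 0.2 is not exactly 0.3 in floating point
        self.assertAlmostEqual(0.1 + 0.2, 0.3)
        self.assertGreaterEqual(10, 10)

    def test_containers_and_types(self):
        self.assertIn("a", ["a", "b"])
        self.assertIsInstance("text", str)
        self.assertDictEqual({"quantity": 1}, {"quantity": 1})
        self.assertSequenceEqual([1, 2, 3], [1, 2, 3])

    def test_strings(self):
        self.assertRegex("SHOES-S-BLUE", r"^[A-Z]+-[A-Z]+-[A-Z]+$")
        self.assertTrue("SHOES".isupper())
        self.assertFalse("shoes".isupper())


assertion_suite = unittest.TestLoader().loadTestsFromTestCase(AssertionExamplesTestCase)
assertion_result = unittest.TextTestRunner(verbosity=2).run(assertion_suite)
```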

## Write your first test suite

The `product.py` file was fairly simple, so let's spice things up. We are going to add these products to a Shopping Cart, so let's create a new module for said Cart. Thanks to the magic of preparing lessons, the package `Cart` already has a `cart.py` module. The ShoppingCart class has two methods: `add_product`, and `remove_product`. The product(s) we will add are products from the Product class.

So, if everything is fine, we should be able to add a product, and after removing, the cart will be empty.

In [9]:
import unittest
from example.cart import ShoppingCart
from example.product import Product

class ShoppingCartTestCase(unittest.TestCase):
    def test_add_and_remove_product(self):
        cart = ShoppingCart()
        product = Product('Polo', 'S', 'Navy Blue')
        
        cart.add_product(product)
        cart.remove_product(product)
        # Check if the products attribute is empty
        # assertDictEqual checks whether two dicts are equal
        self.assertDictEqual({}, cart.products) 

unittest.main(argv=[''], verbosity=2, exit=False)

test_add_and_remove_product (__main__.ShoppingCartTestCase) ... ok
test_transform_name (__main__.TestProduct) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.037s

OK


<unittest.main.TestProgram at 0x7fec11a19fd0>

Looks great! Something that you might have noticed is that it ran two tests, but we just specified one. This is because, in a notebook, every test class you have already run stays defined in the session, so `unittest.main()` picks all of them up; on the command line, all discovered tests are run.

This is actually one of the beauties of unittest: as you write tests, they are added to your _test suite_, so the next time you change your code you can simply run the _test suite_ to check all functions at once.
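
If you want to control exactly which test classes end up in a suite, rather than letting `unittest.main()` collect everything, one option is to build the suite by hand. This is a minimal sketch; `FirstTestCase` and `SecondTestCase` are placeholder classes.

```python
import unittest


class FirstTestCase(unittest.TestCase):
    def test_upper(self):
        self.assertEqual("shoes".upper(), "SHOES")


class SecondTestCase(unittest.TestCase):
    def test_strip(self):
        self.assertEqual("  polo ".strip(), "polo")


loader = unittest.TestLoader()
combined_suite = unittest.TestSuite()
for test_case in (FirstTestCase, SecondTestCase):
    # Collect every test_* method of the class into the suite
    combined_suite.addTests(loader.loadTestsFromTestCase(test_case))

combined_result = unittest.TextTestRunner(verbosity=2).run(combined_suite)
print("tests run:", combined_result.testsRun)  # tests run: 2
```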

One thing that you might not have noticed is that we are already implementing integration testing, the second level of granularity in the testing pyramid. Observe that `test_add_and_remove_product` involves `generate_sku` from `product.py` as well as `add_product` and `remove_product` from `cart.py`. You will have the chance to add more tests to the test suite during the challenges.


## Skipping tests

We can skip certain tests under specific conditions using decorators:

In [None]:
import sys
import unittest

class MyTestCase(unittest.TestCase):
    @unittest.skip("demonstrating skipping")
    def test_nothing(self):
        # This line will not run at all
        self.fail("shouldn't happen")

    @unittest.skipIf(sys.version_info.major < 3, "Not supported for Python 2")
    def test_py3_format(self):
        self.assertEqual("{}".format("aaa"), "aaa")

    @unittest.skipUnless(sys.platform.startswith("win"), "Windows required")
    def test_windows_support(self):
        # windows specific testing code
        pass

    def test_maybe_skipped(self):
        # Skip test from within the function body
        if 5 == 5:
            self.skipTest("Yes, 5 is equal to 5 so we skip")
        # test code which would run if 5 != 5 (essentially never, we know)
        ...
        
unittest.main(argv=[''], verbosity=2, exit=False)

## Hypothesis

> __Hypothesis is a `python` library for `property-based testing` which is easier, more powerful and has way larger test cases coverage than standard unit testing__

The difference between `unit testing` and `property-based testing` is explained very well on the [Hypothesis Welcome Page](https://hypothesis.readthedocs.io/en/latest/):

__Think of a normal unit test as being something like the following:__

1. Set up some data.
2. Perform some operations on the data.
3. Assert something about the result.

__Hypothesis lets you write tests which instead look like this:__

1. For all data matching some specification.
2. Perform some operations on the data.
3. Assert something about the result.

This idea was popularized by [Haskell](https://www.haskell.org/) (purely functional programming language) library [QuickCheck](https://hackage.haskell.org/package/QuickCheck).

> __Hypothesis generates testing data based on your specification and checks whether the guarantees you want to give hold true.__

### Installation

As per usual, one can install `hypothesis` via `pip` or [`conda`](https://anaconda.org/conda-forge/hypothesis).

There are also a few extensions provided, especially for the scientific stack (like `numpy` or `pandas`).

To install `hypothesis` with `numpy` generation strategies via `pip`, one could run `pip install hypothesis[numpy]`.

### General

First, let's set up an example which `encode`s the string and `decode`s it:

In [None]:
def encode(input_string):
    count = 1
    prev = ""
    lst = []
    for character in input_string:
        if character != prev:
            if prev:
                entry = (prev, count)
                lst.append(entry)
            count = 1
            prev = character
        else:
            count += 1
    entry = (character, count)
    lst.append(entry)
    return lst


def decode(lst):
    q = ""
    for character, count in lst:
        q += character * count
    return q

It should be fairly obvious that `decode(encode(<string>))` should return the original `<string>`.

Hypothesis can generate `<string>` examples for us (just like unit tests, but way easier and automated) using:
- `strategy` - way to create testing data (in this case `text`)
- `given` - generate samples from the specified strategy

With that in mind, let's see how we could do that:

In [None]:
import unittest

from hypothesis import given
import hypothesis.strategies as st

class TestEncoding(unittest.TestCase):
    @given(st.text())
    def test_decode_inverts_encode(self, s):
        self.assertEqual(decode(encode(s)), s)
        
unittest.main(argv=[''], verbosity=2, exit=False)

First of all, notice how easy it is to mix `unittest` with `hypothesis` to create a far more comprehensive test suite.

You can see that our string encoding function fails when the input is an empty string. We can fix the above code by adding a check for the empty string:

In [None]:
def encode(input_string):
    # This is an example fix
    if input_string == "":
        return []
    
    count = 1
    prev = ""
    lst = []
    for character in input_string:
        if character != prev:
            if prev:
                entry = (prev, count)
                lst.append(entry)
            count = 1
            prev = character
        else:
            count += 1
    entry = (character, count)
    lst.append(entry)
    return lst


def decode(lst):
    q = ""
    for character, count in lst:
        q += character * count
    return q

Now re-run the test. (Hypothesis also provides the `@example` decorator, which specifies an input that will always be run; this is good for pinning edge cases such as the empty string.)

In [None]:
unittest.main(argv=[''], verbosity=2, exit=False)

Generated tests pass correctly.
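
A sketch of how `@example` could be used to pin the empty string as a case that is always tried, on top of the generated data. It assumes `hypothesis` is installed and repeats the fixed `encode` and `decode` functions so that the block is self-contained; the class name `TestEncodingWithExample` is made up.

```python
import unittest

from hypothesis import example, given
import hypothesis.strategies as st


def encode(input_string):
    """Run-length encode a string into (character, count) pairs."""
    if input_string == "":
        return []
    count = 1
    prev = ""
    lst = []
    for character in input_string:
        if character != prev:
            if prev:
                lst.append((prev, count))
            count = 1
            prev = character
        else:
            count += 1
    lst.append((character, count))
    return lst


def decode(lst):
    """Rebuild the string from (character, count) pairs."""
    return "".join(character * count for character, count in lst)


class TestEncodingWithExample(unittest.TestCase):
    @given(st.text())
    @example("")  # always run the edge case that used to fail
    def test_decode_inverts_encode(self, s):
        self.assertEqual(decode(encode(s)), s)


example_suite = unittest.TestLoader().loadTestsFromTestCase(TestEncodingWithExample)
example_result = unittest.TextTestRunner(verbosity=2).run(example_suite)
```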

> __Hypothesis is smart about running tests: it saves failing examples in its own internal database and replays them first on the next run!__

### Hypothesis tricks

A few useful things, which should help you with your tests:

> __`filter` values generated by a strategy__

In [None]:

@given(st.integers().filter(lambda x: x % 2 == 0))
def test_even_integers(i):
    pass


> __`assume` that input is/is not something__

> __NOTE:__ Hypothesis will fail if your assumptions get rid of too many generated samples

A test will not be marked as `failing` __if the assumption is `false`__:

In [None]:
from math import isnan
from hypothesis import assume

@given(st.floats())
def test_negation_is_self_inverse_for_non_nan(x):
    assume(not isnan(x))
    assert x == -(-x)

> __strategies are highly customizable__

One can specify a lot of parameters for the strategies, for example:

In [None]:
st.integers(min_value=0, max_value=10).example()

> __`given` can supply some or all of a function's arguments via `args` or `kwargs`__

With the following signature:

```
hypothesis.given(*_given_arguments, **_given_kwargs)
```

Valid cases could be (amongst others):

In [None]:
@given(st.integers(), st.integers())
def a(x, y):
    pass


@given(st.integers())
def b(x, y):
    pass


@given(y=st.integers())
def c(x, y):
    pass

# Summary

- Testing your code is crucial as your software starts growing
- Good test suites are key for a continuous integration (CI) environment
- Non-Functional testing checks parameters such as performance and security
- Functional testing checks that your code works as intended with no issues
- Unit testing is the lowest level of granularity in your testing schema
- You can use `unittest` to create unit testing and integration testing
- As you create more test classes, the test suite will grow bigger
- Instead of giving a whole range of values, you can use the `hypothesis` library to set a strategy for a test

# Challenges

### Q1. The Stock Keeping Unit has to be updated so that, if you add products or colours with whitespace, the SKU removes it:
`'tank top', 'm', 'navy blue'` should look like `'TANKTOP-M-NAVYBLUE'`
### Q2. The Stock Keeping Unit has to be updated so that, if you add products with dashes, the SKU removes them:
`'t-shirt', 'm', 'navy blue'` should look like `'TSHIRT-M-NAVYBLUE'`
### Q3. Run the test_product you created during the lesson. Modify it so you don't get any failed test
### Q4. Add the following tests to the test suite:
#### $\qquad$ 1. `test_cart_initially_empty()`: tests that, after creating the cart instance, it starts as an empty dictionary
#### $\qquad$ 2. `test_add_product()`: tests that, after adding a product to the cart, `cart.products` will be equal to a dictionary like this: `{'SHOES-S-BLUE': {'quantity': 1}}`
#### $\qquad$ 3. `test_add_two_of_a_product`: tests that, after adding a product passing the argument `quantity` equal to 2, `cart.products` has a dictionary whose quantity is 2
#### $\qquad$ 4. `test_add_two_different_products`
### Q5. Add the following test to the test suite:
```
def test_remove_too_many(self):
    cart = ShoppingCart()
    product = Product('shoes', 'S', 'blue')
    
    cart.add_product(product)
    cart.remove_product(product, quantity=2)

    self.assertDictEqual({}, cart.products)
```
#### Does it run fine? How can you fix it?

### Q6. Decorate with hypothesis at least two of the tests in your test suite


# Assessments:

### 1. Look for Behaviour-Driven Development
Take a look at [Cucumber.io](https://cucumber.io) 
### 2. Learn basics of `pytest` (documentation [here](https://docs.pytest.org/en/latest/contents.html))
### 3. What is `pytest`'s [mark.parametrize](https://docs.pytest.org/en/stable/parametrize.html)? 
### 4. What is [Test Driven Development](https://en.wikipedia.org/wiki/Test-driven_development) and what are the general steps needed to follow this approach?
### 5. What is [Unittest Mocking](https://docs.python.org/3/library/unittest.mock.html) and what are the general steps needed to follow this approach?