[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gao-hongnan/gaohn-mlops-docs/blob/main/docs/mlops_docs/testing/07_testing.ipynb)

In [None]:
import numpy as np
import pytest
import torch

## Intuition

Tests are a way for us to ensure that something works as intended. We're incentivized to implement tests and discover sources of error as early in the development cycle as possible so that we can reduce [increasing downstream costs](https://assets.deepsource.io/39ed384/images/blog/cost-of-fixing-bugs/chart.jpg) and wasted time. Once we've designed our tests, we can automatically execute them every time we implement a change to our system and continue to build on them over time. In this lesson, we'll learn how to test machine learning code, data and models to construct a system that we can reliably iterate on.

## Types of tests

There are five major types of tests, which are used at different points in the development cycle:

- Unit tests: tests on individual components that each have a single responsibility (ex. function that filters a list).
- Integration tests: tests on the combined functionality of individual components (ex. data processing).
- System tests: tests on the design of a system for expected outputs given inputs (ex. training, inference, etc.).
- Acceptance tests: tests to verify that requirements have been met, usually referred to as User Acceptance Testing (UAT).
- Regression tests: testing errors we've seen before to ensure new changes don't reintroduce them.

## How should we test?

The framework to use when composing tests is the [Arrange Act Assert methodology](http://wiki.c2.com/?ArrangeActAssert).

- Arrange: set up the different inputs to test on.
- Act: apply the inputs on the component we want to test.
- Assert: confirm that we received the expected output.
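
As a minimal sketch of the three steps, assuming a hypothetical `filter_even` function (it is not part of this lesson's code), a test could be laid out like this:

```python
def filter_even(numbers):
    """Hypothetical unit under test: keep only the even numbers."""
    return [n for n in numbers if n % 2 == 0]


def test_filter_even():
    # Arrange: set up the inputs.
    numbers = [1, 2, 3, 4]

    # Act: apply the inputs to the component under test.
    result = filter_even(numbers)

    # Assert: confirm that we received the expected output.
    assert result == [2, 4]
```

Keeping the three steps visually separate makes it easy to see what a failing test was trying to check.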

## What should we be testing for?

An example:

> When arranging our inputs and asserting our expected outputs, what are some aspects of our inputs and outputs that we should be testing for?

- inputs: data types, format, length, edge cases (min/max, small/large, etc.)
- outputs: data types, formats, exceptions, intermediary and final outputs
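
To make these aspects concrete, here is a small sketch, assuming a hypothetical `min_max_scale` helper (introduced only for illustration), that checks output type and length, edge cases, and an expected exception:

```python
import pytest


def min_max_scale(values):
    """Hypothetical helper: scale numbers linearly into [0, 1]."""
    if len(values) == 0:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    if low == high:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def test_output_type_and_length():
    result = min_max_scale([2, 4, 6])
    assert isinstance(result, list)
    assert all(isinstance(v, float) for v in result)
    assert len(result) == 3


def test_edge_cases():
    assert min_max_scale([5]) == [0.0]  # single element
    assert min_max_scale([-1, 1]) == [0.0, 1.0]  # min and max only


def test_empty_input_raises():
    with pytest.raises(ValueError):
        min_max_scale([])
```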

## Best practices

Regardless of the framework we use, it's important to strongly tie testing into the development process.

- atomic: when creating unit components, we need to ensure that they have a [single responsibility](https://en.wikipedia.org/wiki/Single-responsibility_principle) so that we can easily test them. If not, we'll need to split them into more granular units.

- compose: when we create new components, we want to compose tests to validate their functionality. It's a great way to ensure reliability and catch errors early on.

- regression: we want to account for new errors we come across with a regression test so we can ensure we don't reintroduce the same errors in the future.

- coverage: we want to ensure that 100% of our codebase has been accounted for. This doesn't mean writing a test for every single line of code but rather accounting for every single line (more on this in the coverage section below).

- automate: in case we forget to run our tests before committing to a repository, we want to run tests automatically for every commit. We'll learn how to do this locally using pre-commit hooks and remotely (i.e. on the main branch) via GitHub Actions in subsequent lessons.

## Test-driven development or Otherwise?

[Test-driven development (TDD)](https://en.wikipedia.org/wiki/Test-driven_development) is the process where you write a test before completely writing the functionality to ensure that tests are always written. This is in contrast to writing functionality first and then composing tests afterwards. Here are my thoughts on this:

- it is good to write tests as we progress, but passing tests are not proof of correctness.

- initial time should be spent on design before ever getting into the code or tests.

- using a test as a guide doesn't mean that our functionality is error free.

Perfect coverage doesn't mean that our application is error free if those tests aren't meaningful and don't encompass the field of possible inputs, intermediates and outputs. Therefore, we should work towards better design and agility when facing errors, quickly resolving them and writing test cases around them to avoid them next time.


## Pytest

We're going to be using [pytest](https://docs.pytest.org/en/stable/) as our testing framework for its powerful built-in features such as parametrization, fixtures, markers, etc.

### Configuration

By convention, we organize our tests under a `tests` directory, and we can use our `pyproject.toml` file to tell pytest which test paths to search (reading `pyproject.toml` requires pytest 6.0 or newer). Inside those directories, pytest looks for Python files named `test_*.py` (or `*_test.py`) by default, but we can configure it to match other file patterns as well.

```toml
# Pytest
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"
```

### Assertions

Simple assertion testing example.

In [None]:
from pathlib import Path

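# Creating Directories
# `__file__` is not defined in a notebook, so Path("__file__") is only a relative
# placeholder path whose parent resolves to the current working directory.
BASE_DIR = Path("__file__").parent.absolute()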

SRC_DIR = Path.joinpath(BASE_DIR, "src")
TEST_DIR = Path.joinpath(BASE_DIR, "tests")
SRC_DIR.mkdir(parents=True, exist_ok=True)
TEST_DIR.mkdir(parents=True, exist_ok=True)

In [None]:
%%writefile {BASE_DIR}/pyproject.toml
# Pytest
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"

Writing /content/pyproject.toml


In [None]:
%%writefile {SRC_DIR}/__init__.py
"init file"

Writing /content/src/__init__.py


In [None]:
%%writefile {SRC_DIR}/fruits.py
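def is_crisp(fruit):
    """Return True for crisp fruits, False for soft ones; raise ValueError otherwise."""
    if fruit:
        fruit = fruit.lower()
    if fruit in ["apple", "watermelon", "cherries"]:
        return True
    elif fruit in ["orange", "mango", "strawberry"]:
        return False
    else:
        # Unknown fruits (including None) are rejected explicitly.
        raise ValueError(f"{fruit} not in known list of fruits.")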

Writing /content/src/fruits.py


In [None]:
%%writefile {TEST_DIR}/test_fruits.py
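import sys

import pytest

sys.path.append("/content")  # make the `src` package importable
from src.fruits import is_crisp


def test_is_crisp():
    assert is_crisp(fruit="apple")  # or == True
    assert is_crisp(fruit="Apple")
    assert not is_crisp(fruit="orange")
    # Each call needs its own context manager: the block exits at the first
    # exception, so a second call inside the same block would never run.
    with pytest.raises(ValueError):
        is_crisp(fruit=None)
    with pytest.raises(ValueError):
        is_crisp(fruit="pear")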

Writing /content/tests/test_fruits.py


In [None]:
!pytest                                      # all tests
!pytest tests/                               # tests under a directory
!pytest tests/test_fruits.py                 # tests for a single file
!pytest tests/test_fruits.py::test_is_crisp  # tests for a single function

platform linux -- Python 3.7.14, pytest-3.6.4, py-1.11.0, pluggy-0.7.1
rootdir: /content, inifile:
plugins: typeguard-2.7.1
collected 1 item

tests/test_fruits.py .                                                   [100%]

platform linux -- Python 3.7.14, pytest-3.6.4, py-1.11.0, pluggy-0.7.1
rootdir: /content, inifile:
plugins: typeguard-2.7.1
collected 1 item

tests/test_fruits.py .                                                   [100%]

platform linux -- Python 3.7.14, pytest-3.6.4, py-1.11.0, pluggy-0.7.1
rootdir: /content, inifile:
plugins: typeguard-2.7.1
collected 1 item

tests/test_f

### Classes

See [examples from madewithml repo](https://github.com/GokuMohandas/follow/blob/testing/tests/tagifai/test_data.py) to understand better.

### Interfaces

See [madewithml interface section](https://madewithml.com/courses/mlops/testing/#interfaces).

### Parametrize

So far, in our tests, we've had to create individual assert statements to validate different combinations of inputs and expected outputs. However, there's a bit of redundancy here because the inputs always feed into our functions as arguments and the outputs are compared with our expected outputs. To remove this redundancy, pytest has the [`@pytest.mark.parametrize`](https://docs.pytest.org/en/stable/parametrize.html) decorator which allows us to represent our inputs and outputs as parameters.

Let us create a new python file `test_fruits_parametrize.py` to test it out.

In [None]:
%%writefile {TEST_DIR}/test_fruits_parametrize.py
import pytest
import sys 
sys.path.append("/content") # append to import properly.
from src.fruits import is_crisp

@pytest.mark.parametrize(
    "fruit, crisp",
    [
        ("apple", True),
        ("Apple", True),
        ("orange", False),
    ],
)
def test_is_crisp_parametrize(fruit, crisp):
    assert is_crisp(fruit=fruit) == crisp

@pytest.mark.parametrize(
    "fruit, exception",
    [
        ("pear", ValueError),
    ],
)
def test_is_crisp_exceptions(fruit, exception):
    with pytest.raises(exception):
        is_crisp(fruit=fruit)

Overwriting /content/tests/test_fruits_parametrize.py


The line numbers below are counted from the first `@pytest.mark.parametrize` decorator in the file above.

- [Line 2]: define the names of the parameters, e.g. `"fruit, crisp"` (note that this is a single string). The names in this string must match the argument names of the test function defined under the decorator.

- [Lines 3-7]: provide a list of combinations of values for the parameters named in line 2.

- [Line 9]: pass the parameter names to the test function.

- [Line 10]: include the necessary assert statements, which will be executed for each combination in the list from lines 3-7.

- [Lines 12-20]: the same pattern tests exception handling when the expected exception is passed in as a parameter.

In [None]:
!pytest tests/test_fruits_parametrize.py  # tests for a single file

platform linux -- Python 3.7.13, pytest-3.6.4, py-1.11.0, pluggy-0.7.1
rootdir: /content, inifile:
plugins: typeguard-2.7.1
collected 4 items

tests/test_fruits_parametrize.py ....                                    [100%]



### Fixtures

[What are the benefits of using fixtures?](https://realpython.com/pytest-python-testing/#fixtures-managing-state-and-dependencies)

One obvious benefit is that fixtures remove the redundancy of re-defining the same inputs in every test.

In [None]:
import numpy as np 

def add(nums_list):
    return np.sum(nums_list)


def mul(nums_list):
    return np.prod(nums_list)

def test_add():
    nums_list = [1, 2, 3, 4, 5]
    assert add(nums_list) == 15

def test_mul():
    nums_list = [1, 2, 3, 4, 5]
    assert mul(nums_list) == 120

Notice that we defined `nums_list` twice because we want to test different functions with the **same inputs**. To reduce this redundancy, we can define the inputs once in a fixture:

In [None]:
import pytest

@pytest.fixture
def sample_nums_list():
    nums_list = [1, 2, 3, 4, 5]
    return nums_list 
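
A fixture is consumed by naming it as an argument of a test function; pytest then calls the fixture and passes in its return value. The sketch below repeats `add`, `mul` and the fixture so that it is self-contained, and shows how the two tests could request `sample_nums_list`:

```python
import numpy as np
import pytest


def add(nums_list):
    return np.sum(nums_list)


def mul(nums_list):
    return np.prod(nums_list)


@pytest.fixture
def sample_nums_list():
    return [1, 2, 3, 4, 5]


# pytest matches the argument name to the fixture name and injects its value.
def test_add(sample_nums_list):
    assert add(sample_nums_list) == 15


def test_mul(sample_nums_list):
    assert mul(sample_nums_list) == 120
```

The list is now defined in one place, so changing the shared input only requires editing the fixture.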

## Unit Test

### Mock

Readings:

- https://realpython.com/python-mock-library/
- https://docs.python.org/3/library/unittest.mock.html (READ THE API)
- PeekingDuck `draw.poses` test suites.
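
The readings above cover mocking in depth. As a minimal sketch of the idea, assuming a hypothetical `fetch_label` function and a hypothetical client object with a `get` method (neither belongs to this lesson's code), `unittest.mock.Mock` from the standard library can stand in for the real dependency:

```python
from unittest.mock import Mock


def fetch_label(client, image_id):
    """Hypothetical function under test: look up a label through a client."""
    response = client.get(f"/labels/{image_id}")
    return response["label"]


def test_fetch_label():
    # Arrange: replace the real client with a mock that returns a canned response.
    client = Mock()
    client.get.return_value = {"label": "cat"}

    # Act
    label = fetch_label(client, 7)

    # Assert: check both the result and how the dependency was called.
    assert label == "cat"
    client.get.assert_called_once_with("/labels/7")
```

The test never touches a network or a database, which keeps it fast and deterministic.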

## Example Walkthrough

> **The example walkthrough assumes you have a basic understanding of pytests.**

### Problem Setup

In the field of object detection, given a query image, our task is to **localize** and **classify** the objects in it. **Localization** needs labels, which come in the form of **bounding boxes**.

As an example, we take the image from [albumentations](https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/), which shows a cat with a bounding box drawn around it. The coordinates are marked accordingly. Note that the origin of the image coordinates is at the top-left corner: the x-axis points right and the y-axis points down, so it is like a Cartesian coordinate system with the y-axis flipped.

![cat_bbox_example](https://albumentations.ai/docs/images/getting_started/augmenting_bboxes/bbox_example.jpg)

The bounding boxes can be represented in various formats, most notably the Pascal-VOC, COCO and YOLO formats. More information can be found [here](https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/).

Our task is twofold: first, create utility functions that transform one format into another; second, write unit tests to ensure the correctness of our code.

For our purpose, we will deal with two formats: given a bounding box in Pascal-VOC format, we want to convert it to YOLO format, and vice versa.

First, we briefly quote from [albumentations](https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/) on the two formats.

Pascal-VOC is a format used by the Pascal VOC dataset. Coordinates of a bounding box are encoded with four values in pixels: `[x_min, y_min, x_max, y_max]`. `x_min` and `y_min` are coordinates of the top-left corner of the bounding box. `x_max` and `y_max` are coordinates of bottom-right corner of the bounding box. For our purpose, we will call them `xyxy` for abbreviation.

YOLO format's bounding box is represented by four values `[x_center, y_center, width, height]`. `x_center` and `y_center` are the normalized coordinates of the center of the bounding box. To make coordinates normalized, we take pixel values of x and y, which marks the center of the bounding box on the x- and y-axis. Then we divide the value of x by the width of the image and value of y by the height of the image. width and height represent the width and the height of the bounding box. They are normalized as well. For our purpose, we will call them `xywhn` for abbreviation.

More concretely, if an image has a height of 480 and a width of 640, and its bounding box in Pascal-VOC format has coordinates

```python
xyxy = [98, 345, 420, 462] # x1 y1 x2 y2
```

we want to transform it to the equivalent YOLO coordinates

```python
xywhn = [0.4046875, 0.840625, 0.503125, 0.24375] # x y w h
```

and vice versa. 

We have verified by hand that `[98, 345, 420, 462]` indeed converts to `[0.4046875, 0.840625, 0.503125, 0.24375]` for the given height and width. But as programmers, our task is to reduce manual work, so let's write functions that convert bounding boxes between the two formats.
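
For reference, the hand verification can be written out as plain arithmetic. This is only a worked check of the example values using the format definitions quoted above, not the implementation used later:

```python
import math

# Values taken from the example in the text.
height, width = 480, 640
x_min, y_min, x_max, y_max = 98, 345, 420, 462

# Box size in pixels, normalized by the image size.
w = (x_max - x_min) / width    # 322 / 640
h = (y_max - y_min) / height   # 117 / 480

# Box center in pixels, normalized by the image size.
x_center = (x_min + x_max) / 2 / width    # 259 / 640
y_center = (y_min + y_max) / 2 / height   # 403.5 / 480

xywhn = [x_center, y_center, w, h]
expected = [0.4046875, 0.840625, 0.503125, 0.24375]
assert all(math.isclose(a, b) for a, b in zip(xywhn, expected))
```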

### Step 0. Setting up Folders/Scripts

In [None]:
%%bash
mkdir -p bbox

In [None]:
%cd /content/bbox

/content/bbox


In [None]:
from pathlib import Path

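# Creating Directories
# `__file__` is not defined in a notebook, so Path("__file__") is only a relative
# placeholder path whose parent resolves to the current working directory.
BASE_DIR = Path("__file__").parent.absolute()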

SRC_DIR = Path.joinpath(BASE_DIR, "src")
TEST_DIR = Path.joinpath(BASE_DIR, "tests")
SRC_DIR.mkdir(parents=True, exist_ok=True)
TEST_DIR.mkdir(parents=True, exist_ok=True)

In [None]:
%%writefile {SRC_DIR}/__init__.py
"init file"

Overwriting /content/bbox/src/__init__.py


In [None]:
%%writefile {BASE_DIR}/pyproject.toml
# Pytest
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"

Overwriting /content/bbox/pyproject.toml


### Step 1. Writing our functions

We will create two functions, `xyxy2xywhn` and `xywhn2xyxy`. The former takes a Pascal-VOC style bounding box as `inputs` and **transforms** it to its equivalent YOLO format, while the latter does the opposite.

We will write our functions into `src/bbox_utils.py`.

In [None]:
%%writefile {SRC_DIR}/bbox_utils.py
from typing import Union

import numpy as np
import torch

BboxType = Union[np.ndarray, torch.Tensor]


def cast_to_float(inputs: BboxType) -> BboxType:
    if isinstance(inputs, torch.Tensor):
        return inputs.float()
    return inputs.astype(np.float32)


def clone(inputs: BboxType) -> BboxType:
    if isinstance(inputs, torch.Tensor):
        return inputs.clone()
    return inputs.copy()

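def xyxy2xywhn(inputs: BboxType, height: float, width: float) -> BboxType:
    """Convert Pascal-VOC [x_min, y_min, x_max, y_max] to YOLO [x_center, y_center, w, h]."""
    outputs = clone(inputs)
    outputs = cast_to_float(outputs)

    # Normalize x coordinates by the image width and y coordinates by the image height.
    outputs[..., [0, 2]] /= width
    outputs[..., [1, 3]] /= height

    # Turn (x_max, y_max) into (w, h).
    outputs[..., 2] -= outputs[..., 0]
    outputs[..., 3] -= outputs[..., 1]

    # Turn (x_min, y_min) into (x_center, y_center).
    outputs[..., 0] += outputs[..., 2] / 2
    outputs[..., 1] += outputs[..., 3] / 2

    return outputs


def xywhn2xyxy(inputs: BboxType, height: float, width: float) -> BboxType:
    """Convert YOLO [x_center, y_center, w, h] to Pascal-VOC [x_min, y_min, x_max, y_max]."""
    outputs = clone(inputs)
    outputs = cast_to_float(outputs)

    # Scale the normalized values back to pixels.
    outputs[..., [0, 2]] *= width
    outputs[..., [1, 3]] *= height

    # Turn (x_center, y_center) into (x_min, y_min), then (w, h) into (x_max, y_max).
    outputs[..., 0] -= outputs[..., 2] / 2
    outputs[..., 1] -= outputs[..., 3] / 2
    outputs[..., 2] += outputs[..., 0]
    outputs[..., 3] += outputs[..., 1]

    return outputs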

Overwriting /content/bbox/src/bbox_utils.py


After writing the functions, we can test the **correctness** of the transformation by passing in the ground truths defined earlier.

For example, when I pass `xyxy = [98, 345, 420, 462]` to `xyxy2xywhn`, I expect `xyxy2xywhn(xyxy, height=480, width=640)` to output `xywhn = [0.4046875, 0.840625, 0.503125, 0.24375]`.

Without using any testing library, we can do this with a plain `assert` statement:

In [None]:
%%writefile --append {SRC_DIR}/bbox_utils.py

xyxy = np.asarray([98, 345, 420, 462])
xywhn = np.asarray([0.4046875, 0.840625, 0.503125, 0.24375])

assert np.allclose(xyxy2xywhn(xyxy, height=480, width=640), xywhn, rtol=1e-05, atol=1e-08)
assert np.allclose(xywhn2xyxy(xywhn, height=480, width=640), xyxy, rtol=1e-05, atol=1e-08)

Appending to /content/bbox/src/bbox_utils.py


In [None]:
%%bash
python src/bbox_utils.py

The assertion passed!

This may seem fine, but it does not scale as we add more transformations: with 10 pairs of transformation functions, we would need to write the assertion 20 times.

Furthermore, functions like these often have implicit assumptions that need to be rigorously tested as well.

For example, we defined `BboxType = Union[np.ndarray, torch.Tensor]` and type hinted our functions' `inputs` and return values with this type. In particular, when a user passes in a `torch.Tensor`, I expect the output to be a `torch.Tensor` as well. This is important because many operations available on `torch.Tensor` do not exist on `np.ndarray`. ***Our assert statement above does not check this, and this will be a problem. See the example below.***

```python
def xywhn2xyxy(inputs: BboxType, height: float, width: float) -> BboxType:
    outputs = clone(inputs)
    outputs = cast_to_float(outputs)

    outputs[..., [0, 2]] *= width
    outputs[..., [1, 3]] *= height

    outputs[..., 0] -= outputs[..., 2] / 2
    outputs[..., 1] -= outputs[..., 3] / 2
    outputs[..., 2] += outputs[..., 0]
    outputs[..., 3] += outputs[..., 1]

    if isinstance(outputs, torch.Tensor):
        outputs = outputs.detach().cpu().numpy()

    return outputs
```

Imagine we coded our `xywhn2xyxy` as above, with a hard-coded conversion of `outputs` to `numpy`. The user will then run into errors downstream.

```python
xywhn_tensor = torch.tensor([0.1, 0.2, 0.3, 0.4])
xyxy_tensor = xywhn2xyxy(xywhn_tensor, height=480, width=640)

# AttributeError: 'numpy.ndarray' object has no attribute 'unsqueeze'
unsqueezed_xyxy = xyxy_tensor.unsqueeze(0)
```

The user trusted that passing in a `torch.Tensor` would return `outputs` of the same type, and went on to call `unsqueeze`, a method that exists on `torch.Tensor` but not on `np.ndarray`. An error ensues because `xyxy_tensor` is now a `np.ndarray`.

Things get a bit more complicated when I also want to check that the output has the same number of dimensions as the input. For example, if I pass in a 3d-array as input, I expect a 3d-array as output. ***Our assert statement above does not check this.***

This is where pytest comes in.

### Step 2. Writing our tests

This section will be written into `tests/test_bbox_utils_before_refactor.py`. 

#### Defining Global Variables

We start by importing the libraries and defining some global constants.

In [None]:
%%writefile {TEST_DIR}/test_bbox_utils_before_refactor.py
import sys
from typing import Union

import numpy as np
import numpy.testing as npt
import pytest
import torch

sys.path.append("/content/bbox")  # append to import properly.
from src.bbox_utils import clone, xywhn2xyxy, xyxy2xywhn

# tolerance to assert_allclose
ATOL, RTOL = 1e-4, 1e-07

# image width and height
HEIGHT, WIDTH = 480, 640

xyxy = [98, 345, 420, 462]
xywhn = [0.4046875, 0.840625, 0.503125, 0.24375]

GT_BBOXES = {"xyxy": xyxy, "xywhn": xywhn}

Overwriting /content/bbox/tests/test_bbox_utils_before_refactor.py


Line numbers here count the `%%writefile` line of the cell as line 1.

- In `line 14`, we define the tolerance levels for the closeness checks, which allow slight numerical differences.
- In `line 17`, we define the height and width of the image.
- In `lines 19-20`, we define our ground truth values.
- In `line 22`, we define a global variable `GT_BBOXES`, a dictionary that maps each **bounding box format name** to its **ground truth** values. Note that the ground truth values describe the same box, each in its own format.

#### Parametrize Input Types

We now want to test the correctness of the transformation functions we wrote.

Recall earlier we used 

```python
assert np.allclose(xyxy2xywhn(xyxy, height=480, width=640), xywhn, rtol=1e-05, atol=1e-08)
```

to test the correctness. This assumes that our input is a `np.ndarray`, but since our functions accept `torch.Tensor` as well, we need to ensure that our test function works for both input types.

So for one transform function `xyxy2xywhn`, we need to test it ***twice***: once with a `np.ndarray` input and once with a `torch.Tensor` input. This means we need to write more assertions!

Fortunately, as described in the [pytest documentation](https://docs.pytest.org/en/6.2.x/parametrize.html), the decorator `pytest.mark.parametrize` lets us run the same test function once per input type.

In [None]:
%%writefile --append {TEST_DIR}/test_bbox_utils_before_refactor.py

def list2numpy(input_list):
    return np.asarray(input_list)

def list2torch(input_list):
    return torch.tensor(input_list)

@pytest.mark.parametrize("convert_type", [list2numpy, list2torch])
def test_correct_transformation_xyxy2xywhn(convert_type):
    """Test correctness of conversion from VOC to YOLO."""
    from_bbox = convert_type(GT_BBOXES["xyxy"])
    to_bbox = xyxy2xywhn(from_bbox, height=HEIGHT, width=WIDTH)

    expected_bbox = convert_type(GT_BBOXES["xywhn"])
    
    if isinstance(to_bbox, torch.Tensor):
        torch.testing.assert_close(to_bbox, expected_bbox, atol=ATOL, rtol=RTOL)
    else:
        npt.assert_allclose(to_bbox, expected_bbox, atol=ATOL, rtol=RTOL)

Appending to /content/bbox/tests/test_bbox_utils_before_refactor.py


- `lines 3-7` consist of two utility functions, `list2numpy` and `list2torch`, which convert the ground truth bounding box to either a `np.ndarray` or a `torch.Tensor` (the ground truth is stored as a `list` so that the conversion is easy).

- `line 9` defines the `pytest.mark.parametrize` decorator, where
    - the *first argument* is a comma-delimited string of parameter names, which become the argument names of the test function that follows. Here I named it `"convert_type"`;
    - the *second argument* lists the *values* that the parameters take on. It is a list of values when there is a single parameter name, and a list of tuples (one entry per name) when there are several. In our example, `convert_type` can be either `list2numpy` or `list2torch`, so the second argument is a list of two elements: `[list2numpy, list2torch]`.

- `line 10` is our test function `test_correct_transformation_xyxy2xywhn` and, as the name suggests, it tests whether our conversion from `xyxy` to `xywhn` is correct. Note that the argument is named `convert_type`, corresponding exactly to the *first argument* of the decorator.

- `line 12` applies `convert_type` to the input `GT_BBOXES["xyxy"] = [98, 345, 420, 462]`. Because of the `parametrize` decorator, the test runs once with `list2numpy` and once with `list2torch`, converting the `list` to a `np.ndarray` and a `torch.Tensor` respectively. We name this input `from_bbox`.

- `line 13` converts the input to its YOLO equivalent using our function `xyxy2xywhn`. We name this converted input `to_bbox`.

- `line 15` gets the ground truth in YOLO format. Note that I need to convert it to the same type as the input using `convert_type`. We name this variable `expected_bbox`.

- `lines 17-20` check whether our converted bounding box `to_bbox` matches the YOLO ground truth `expected_bbox`, using `torch.testing` or `numpy.testing` depending on the type.

The process does not stop here: since we passed in two values for `convert_type`, the test runs a second time with `list2torch`. Let's run `pytest` and see this in action.


In [None]:
!pytest -v tests/test_bbox_utils_before_refactor.py -s       # tests for a single file

platform linux -- Python 3.7.14, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/bbox, inifile:
plugins: typeguard-2.7.1
collected 2 items

tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn[list2numpy] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn[list2torch] PASSED



As we can see, the lines

```bash
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn[list2numpy] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn[list2torch] PASSED
```

mean that the test function ran for both cases, once with a `np.ndarray` input and once with a `torch.Tensor` input, and both passed the assertion!

An important thing to realize here is that we are only testing the ***correctness*** of this transformation. It does not test whether our transformation functions return the same type as their inputs.
For that, we need to write a test to ensure the outputs have the same type as the inputs.

In [None]:
%%writefile --append {TEST_DIR}/test_bbox_utils_before_refactor.py

@pytest.mark.parametrize("convert_type", [list2numpy, list2torch])
def test_correct_return_type_xyxy2xywhn(convert_type):
    from_bbox = convert_type(GT_BBOXES["xyxy"])
    to_bbox = xyxy2xywhn(from_bbox, height=HEIGHT, width=WIDTH)

    assert isinstance(to_bbox, type(from_bbox))

Appending to /content/bbox/tests/test_bbox_utils_before_refactor.py


In [None]:
!pytest -v tests/test_bbox_utils_before_refactor.py -s       # tests for a single file

platform linux -- Python 3.7.14, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/bbox, inifile:
plugins: typeguard-2.7.1
collected 4 items

tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn[list2numpy] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn[list2torch] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_return_type_xyxy2xywhn[list2numpy] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_return_type_xyxy2xywhn[list2torch] PASSED



Now we are sure that the `inputs` and `outputs` of our functions are of the same type!

#### Parametrize Consistent Dimensions

Our next step is to test that our transform functions can handle inputs with different numbers of dimensions. Whether the input is a 3d-tensor or a 10d-array, it should work. Of course, our main goal here is still to test the correctness of the transformation, but bear in mind that we need a separate test to check the consistency of input and output dimensions (i.e. passing in a 2d-array results in a 2d-array).

Let's say we want to test 3 cases, which means checking that the code executes correctly for inputs in `[1d, 2d, 3d]`, i.e. the original 1d bounding box with 0, 1 or 2 extra leading dimensions.

This is not trivial as we need to check 6 different cases, the result of the cartesian product

```
[list2numpy, list2torch] x [0, 1, 2] = {(list2numpy, 0), (list2numpy, 1), ...}
```

a total of 6 combinations.
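
To make the six combinations concrete, the small sketch below enumerates them with the standard library's `itertools.product`. The strings are only stand-ins for the converter functions:

```python
from itertools import product

converters = ["list2numpy", "list2torch"]  # names only, standing in for the functions
num_dims = [0, 1, 2]

# Every converter is paired with every number of extra dimensions.
combinations = list(product(converters, num_dims))
for converter, dims in combinations:
    print(converter, dims)

assert len(combinations) == 6
```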

We will continue to leverage `pytest`'s parametrize to test all 6 cases.

There will not be much change besides defining an extra utility function `expand_dim`, which expands the input's dimensions according to the `num_dims` argument.

To get the cartesian product, we simply stack one more `parametrize` decorator below the `convert_type` one; it takes `"num_dims"` as its first argument and `[0, 1, 2]` as its second, indicating that we want to test the 3 aforementioned cases. Stacking two `parametrize` decorators runs the test for every combination, which is exactly what we want.

In [None]:
%%writefile --append {TEST_DIR}/test_bbox_utils_before_refactor.py

def expand_dim(
    bboxes: Union[np.ndarray, torch.Tensor],
    num_dims: int,
) -> Union[np.ndarray, torch.Tensor]:
    """Expand the dimension of bboxes to num_dims.

    Note:
        np.expand_dims will not work for tuple dim numpy < 1.18.0 which
        is not the version in our cicd.
    """
    bboxes = clone(bboxes)
    return bboxes[(None,) * num_dims]

@pytest.mark.parametrize("convert_type", [list2numpy, list2torch])
@pytest.mark.parametrize("num_dims", [0, 1, 2])
def test_correct_transformation_xyxy2xywhn_with_dims(convert_type, num_dims):
    """Test conversion from VOC to YOLO."""
    from_bbox = convert_type(GT_BBOXES["xyxy"])
    from_bbox = expand_dim(from_bbox, num_dims)

    to_bbox = xyxy2xywhn(from_bbox, height=HEIGHT, width=WIDTH)

    expected_bbox = expand_dim(convert_type(GT_BBOXES["xywhn"]), num_dims)
    
    if isinstance(to_bbox, torch.Tensor):
        torch.testing.assert_close(to_bbox, expected_bbox, atol=ATOL, rtol=RTOL)
    else:
        npt.assert_allclose(to_bbox, expected_bbox, atol=ATOL, rtol=RTOL)

Appending to /content/bbox/tests/test_bbox_utils_before_refactor.py


In [None]:
!pytest -v tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_dims -s       # tests for a single function

platform linux -- Python 3.7.14, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/bbox, inifile:
plugins: typeguard-2.7.1
collected 6 items

tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_dims[0-list2numpy] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_dims[0-list2torch] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_dims[1-list2numpy] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_dims[1-list2torch] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_dims[2-list2numpy] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_dims[2-list2torch] PASSED



We see that a total of 6 results were tested by `test_correct_transformation_xyxy2xywhn_with_dims`. 

Notice that in each line they indicate the combination, for example, the first line says `test_voc2yolo[0-list2numpy] PASSED`, which means they tested for the case of `0` dimensions and the input type of `np.ndarray`.

We can also test whether the input dimensions must match the output dimensions. But since `np.testing.assert_allclose` will raise an error if their shape mismatch, we will skip over this. It may still be good practice to write test as we typically want one test to handle one type of error.

```python
import numpy as np

a = np.asarray([1, 2, 3])    # shape (3,)
b = np.asarray([[1, 2, 3]])  # shape (1, 3)

np.testing.assert_allclose(a, b)  # raises AssertionError because the shapes differ
```
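If we did want a dedicated shape check, a minimal sketch could look like the following. The helper name `assert_same_shape` is hypothetical (it is not part of `numpy` or of this project); it only illustrates keeping the shape failure separate from the value comparison so the error message points at one cause.

```python
import numpy as np


def assert_same_shape(actual, expected):
    """Fail with a shape-specific message (hypothetical helper, not a numpy API)."""
    actual_shape, expected_shape = np.shape(actual), np.shape(expected)
    assert actual_shape == expected_shape, (
        f"shape mismatch: got {actual_shape}, expected {expected_shape}"
    )


a = np.asarray([1, 2, 3])    # shape (3,)
b = np.asarray([[1, 2, 3]])  # shape (1, 3)

assert_same_shape(a, a)  # passes silently

try:
    assert_same_shape(a, b)
except AssertionError as err:
    print(err)  # shape mismatch: got (3,), expected (1, 3)
```

Inside a test, this check would run before the `assert_allclose` call, so a shape bug and a numerical bug fail with different messages.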

#### Creating Fixtures to Manage State and Dependencies

The idea of fixtures is that if multiple test functions take in the same "fixed" set of inputs (e.g. `GT_BBOXES`), we should consider defining those inputs as a fixture.

Our past 2 test functions, `test_correct_return_type_xyxy2xywhn` and `test_correct_transformation_xyxy2xywhn`, both used the same global constant `GT_BBOXES`, and we can imagine that our tests for `xywhn2xyxy` would use it again.

For that, we can write a function `gt_bboxes` decorated with `pytest.fixture`. Subsequently, we can pass `gt_bboxes` as an argument to any test function.

For example,

```python
@pytest.fixture(scope="module")
def gt_bboxes():
    return GT_BBOXES

@pytest.mark.parametrize("convert_type", [list2numpy, list2torch])
@pytest.mark.parametrize("num_dims", [0, 1, 2])
def test_voc2yolo(gt_bboxes, convert_type, num_dims):
    from_bbox = convert_type(gt_bboxes["xyxy"])
    ...
```

Here we simply defined a new fixture function called `gt_bboxes`, decorated with `pytest.fixture(scope="module")`. The [scope](https://betterprogramming.pub/understand-5-scopes-of-pytest-fixtures-1b607b5c19ed) defines how often the fixture is re-created: `function` (the default) runs it once per test, while `module` runs it once per test file and `session` once per test run. Expensive operations often warrant a broader scope.

However, our old method of defining a global constant `GT_BBOXES` achieves the same result. This is true in our case because creating `GT_BBOXES` is relatively cheap. But imagine if creating `GT_BBOXES` were an expensive operation (e.g. it involves creating a 10,000 by 10,000 array); then defining it as a fixture with a proper scope becomes important.

For our purpose, imagine a scenario where we have 3 test files, `test_1.py`, `test_2.py` and `test_3.py`, which all need `GT_BBOXES`. We can put this fixture in a file called `conftest.py` so that all 3 files can use it. With `scope="module"`, `gt_bboxes` is called once per test file and its result is reused across all test functions in that file. If we want it to be called only once for the whole run across all 3 files, we would use `scope="session"` instead.
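One way to see what a scope does is to count how often the fixture body actually runs. The sketch below is an experiment under stated assumptions rather than part of the project: it writes a throwaway `conftest.py` and three test files (named after the scenario above) into a temporary directory, runs `pytest` on them in a subprocess, and counts calls through a hypothetical log file `fixture_calls.log`. It assumes `pytest` is installed for the current interpreter.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Throwaway conftest.py; {scope} is filled in per experiment.
CONFTEST_TEMPLATE = '''
import pathlib
import pytest

LOG = pathlib.Path(__file__).with_name("fixture_calls.log")

@pytest.fixture(scope="{scope}")
def gt_bboxes():
    # Record every time the fixture body actually runs.
    with LOG.open("a") as f:
        f.write("called\\n")
    return {{"xyxy": [98, 345, 420, 462]}}
'''

# Each test file contains two tests that request the fixture.
TEST_FILE = '''
def test_first(gt_bboxes):
    assert len(gt_bboxes["xyxy"]) == 4

def test_second(gt_bboxes):
    assert len(gt_bboxes["xyxy"]) == 4
'''


def count_fixture_calls(scope: str) -> int:
    """Run 3 test files x 2 tests and return how often the fixture body ran."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "conftest.py").write_text(CONFTEST_TEMPLATE.format(scope=scope))
        for name in ("test_1.py", "test_2.py", "test_3.py"):
            (root / name).write_text(TEST_FILE)
        subprocess.run(
            [sys.executable, "-m", "pytest", "-q", str(root)],
            check=True,
            capture_output=True,
        )
        return len((root / "fixture_calls.log").read_text().splitlines())


if __name__ == "__main__":
    for scope in ("function", "module", "session"):
        print(scope, count_fixture_calls(scope))
```

With two tests in each of the three files, the counts should come out as 6 for `function`, 3 for `module` and 1 for `session`, which is the difference that matters once the fixture is expensive to build.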

In [None]:
%%writefile {TEST_DIR}/conftest.py
import pytest 

xyxy = [98, 345, 420, 462]
xywhn = [0.4046875, 0.840625, 0.503125, 0.24375]

GT_BBOXES = {"xyxy": xyxy, "xywhn": xywhn}

@pytest.fixture(scope="module")
def gt_bboxes():
    return GT_BBOXES

Overwriting /content/bbox/tests/conftest.py


In [None]:
%%writefile --append {TEST_DIR}/test_bbox_utils_before_refactor.py

@pytest.mark.parametrize("convert_type", [list2numpy, list2torch])
@pytest.mark.parametrize("num_dims", [0, 1, 2])
def test_correct_transformation_xyxy2xywhn_with_fixture(gt_bboxes, convert_type, num_dims):
    """Test conversion from VOC to YOLO."""
    from_bbox = convert_type(gt_bboxes["xyxy"])
    from_bbox = expand_dim(from_bbox, num_dims)

    to_bbox = xyxy2xywhn(from_bbox, height=HEIGHT, width=WIDTH)

    expected_bbox = expand_dim(convert_type(gt_bboxes["xywhn"]), num_dims)
    
    if isinstance(to_bbox, torch.Tensor):
        torch.testing.assert_close(to_bbox, expected_bbox, atol=ATOL, rtol=RTOL)
    else:
        npt.assert_allclose(to_bbox, expected_bbox, atol=ATOL, rtol=RTOL)

Appending to /content/bbox/tests/test_bbox_utils_before_refactor.py


We now run the test function we just created, `test_correct_transformation_xyxy2xywhn_with_fixture`. The only difference from `test_correct_transformation_xyxy2xywhn` is that it takes an argument `gt_bboxes`, matching the fixture name, and all uses of the global variable `GT_BBOXES` are replaced by `gt_bboxes`.

In [None]:
!pytest -v tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_fixture -s       # runs a single test function

platform linux -- Python 3.7.14, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/bbox, inifile:
plugins: typeguard-2.7.1
collected 6 items

tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_fixture[0-list2numpy] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_fixture[0-list2torch] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_fixture[1-list2numpy] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_fixture[1-list2torch] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_fixture[2-list2numpy] PASSED
tests/test_bbox_utils_before_refactor.py::test_correct_transformation_xyxy2xywhn_with_fixture[2-list2torch] PASSED

There is no need to import `conftest` into any of our tests; `pytest` discovers it automatically.

### Step 3. Refactoring our tests

#### Using parametrize for different transforms

I actually have more than 10 transform functions that convert bounding boxes. This means I would have to write more than 10 test functions: `test_xyxy2xywhn`, `test_xywhn2xyxy`, ...

Most of the code inside these test functions is the same.

We can again leverage `parametrize` and define an argument `conversion_name` that takes values such as `["xyxy2xywhn", "xywhn2xyxy"]`. These values then identify which conversion/transform to use.



For that, we need to revamp our fixture `gt_bboxes`.

In [None]:
%%writefile {TEST_DIR}/conftest.py
import sys
import pytest 

sys.path.append("/content/bbox")  # append to import properly.
from src.bbox_utils import xywhn2xyxy, xyxy2xywhn

xyxy = [98, 345, 420, 462]
xywhn = [0.4046875, 0.840625, 0.503125, 0.24375]

@pytest.fixture(scope="module")
def gt_bboxes():
    return {"xyxy2xywhn": [xyxy, xywhn, xyxy2xywhn], "xywhn2xyxy": [xywhn, xyxy, xywhn2xyxy]}

Overwriting /content/bbox/tests/conftest.py


The changes are:

- The key is now the exact name of the transform function. If our transform function is `xyxy2xywhn`, then the key is `"xyxy2xywhn"`.
- The corresponding value is a list, where
    - the first element is the ground truth input (e.g. `xyxy`),
    - the second element is the ground truth output after the conversion (e.g. `xywhn`),
    - the third element is the transform function itself.
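To make the lookup structure concrete without depending on `src.bbox_utils`, here is a minimal NumPy-only sketch. The functions `toy_xyxy2xywhn` and `toy_xywhn2xyxy` are illustrative stand-ins written for this example (assumed to mirror the usual corner-to-normalized-center conversion, not the project's actual implementation), and `CASES` and `check_transformation` are hypothetical names.

```python
import numpy as np

HEIGHT, WIDTH = 480, 640


def toy_xyxy2xywhn(bboxes, height, width):
    """Stand-in: corner format -> normalized center/width/height format."""
    x1, y1, x2, y2 = (bboxes[..., i] for i in range(4))
    return np.stack(
        [(x1 + x2) / 2 / width, (y1 + y2) / 2 / height,
         (x2 - x1) / width, (y2 - y1) / height],
        axis=-1,
    )


def toy_xywhn2xyxy(bboxes, height, width):
    """Stand-in: normalized center/width/height format -> corner format."""
    xc, yc, w, h = (bboxes[..., i] for i in range(4))
    return np.stack(
        [(xc - w / 2) * width, (yc - h / 2) * height,
         (xc + w / 2) * width, (yc + h / 2) * height],
        axis=-1,
    )


xyxy = [98, 345, 420, 462]
xywhn = [0.4046875, 0.840625, 0.503125, 0.24375]

# name -> [ground truth input, ground truth output, transform function]
CASES = {
    "xyxy2xywhn": [xyxy, xywhn, toy_xyxy2xywhn],
    "xywhn2xyxy": [xywhn, xyxy, toy_xywhn2xyxy],
}


def check_transformation(conversion_name):
    """Unpack one entry and compare the transform output to the ground truth."""
    from_bbox, expected_bbox, conversion_fn = CASES[conversion_name]
    to_bbox = conversion_fn(np.asarray(from_bbox), height=HEIGHT, width=WIDTH)
    np.testing.assert_allclose(to_bbox, np.asarray(expected_bbox), atol=1e-4, rtol=1e-7)


for name in CASES:
    check_transformation(name)
```

Adding another transform then only means adding one more entry to the dictionary; the checking code stays the same, which is the point of keying the fixture by conversion name.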

In [None]:
%%writefile {TEST_DIR}/test_bbox_utils_after_refactor.py
from typing import Union

import numpy as np
import numpy.testing as npt
import pytest
import torch

from src.bbox_utils import clone

CONVERT_TYPES = [np.array, torch.tensor]
NUM_DIMS = [0, 1, 2]
TRANSFORMS = ["xyxy2xywhn", "xywhn2xyxy"]

# tolerance to assert_allclose
ATOL, RTOL = 1e-4, 1e-07

# image height and width
HEIGHT, WIDTH = 480, 640

def list2numpy(input_list):
    return np.asarray(input_list)

def list2torch(input_list):
    return torch.tensor(input_list)

def expand_dim(
    bboxes: Union[np.ndarray, torch.Tensor],
    num_dims: int,
) -> Union[np.ndarray, torch.Tensor]:
    """Expand the dimension of bboxes to num_dims.

    Note:
        np.expand_dims will not work for tuple dim numpy < 1.18.0 which
        is not the version in our cicd.
    """
    bboxes = clone(bboxes)
    return bboxes[(None,) * num_dims]

@pytest.mark.parametrize("convert_type", CONVERT_TYPES)
class TestBboxTransforms:
    @pytest.mark.parametrize("conversion_name", TRANSFORMS)
    def test_correct_return_type(self, gt_bboxes, convert_type, conversion_name):
        from_bbox, _, conversion_fn = gt_bboxes[conversion_name]
        from_bbox = convert_type(from_bbox)

        to_bbox = conversion_fn(from_bbox, height=HEIGHT, width=WIDTH)

        assert isinstance(to_bbox, type(from_bbox))

    @pytest.mark.parametrize("num_dims", NUM_DIMS)
    @pytest.mark.parametrize("conversion_name", TRANSFORMS)
    def test_correct_transformation(
        self, gt_bboxes, convert_type, num_dims, conversion_name
    ):
        from_bbox, expected_bbox, conversion_fn = gt_bboxes[conversion_name]
        from_bbox = convert_type(from_bbox)
        from_bbox = expand_dim(from_bbox, num_dims)

        to_bbox = conversion_fn(from_bbox, height=HEIGHT, width=WIDTH)  

        expected_bbox = expand_dim(convert_type(expected_bbox), num_dims)

        if isinstance(to_bbox, torch.Tensor):
            torch.testing.assert_allclose(to_bbox, expected_bbox, atol=ATOL, rtol=RTOL)
        else:
            npt.assert_allclose(to_bbox, expected_bbox, atol=ATOL, rtol=RTOL)

Overwriting /content/bbox/tests/test_bbox_utils_after_refactor.py


In [None]:
!pytest -W ignore -v tests/test_bbox_utils_after_refactor.py -s       # tests for a single file

platform linux -- Python 3.7.14, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/bbox, inifile:
plugins: typeguard-2.7.1
[1mcollecting 0 items                                                             [0m[1mcollecting 16 items                                                            [0m[1mcollecting 16 items                                                            [0m[1mcollecting 16 items                                                            [0m[1mcollected 16 items                                                             [0m

tests/test_bbox_utils_after_refactor.py::TestBboxTransforms::test_correct_return_type[xyxy2xywhn-convert_type0] [32mPASSED[0m
tests/test_bbox_utils_after_refactor.py::TestBboxTransforms::test_correct_return_type[xyxy2xywhn-convert_type1] [32mPASSED[0m
tests/test_bbox_utils_after_refactor.py::TestBboxTransforms::test_correct_return_type[xywhn2xyxy-convert_type0] [32mPASSED[0m
t

We see that we achieved the same results.

## References & Citations

- https://madewithml.com/courses/mlops/testing/

    ```
    @misc{goku mohandas_2020, title={Testing Machine Learning Systems: Code, Data and Models - Made With ML}, url={https://madewithml.com/courses/mlops/testing/}, journal={Madewithml.com}, author={Goku Mohandas}, year={2020} }
    ```

- https://realpython.com/pytest-python-testing/

    ```
    @misc{real python_2022, title={Effective Python Testing With Pytest}, url={https://realpython.com/pytest-python-testing/}, journal={Realpython.com}, publisher={Real Python}, author={Real Python}, year={2022}, month={Jun} }
    ```
