(05:Testing)=
# Testing
<hr style="height:1px;border:none;color:#666;background-color:#666;" />

Testing is an important part of Python package development but one that is often neglected due to the perceived additional workload. However, the reality is quite the opposite! Introducing formal, automated testing into your workflow can have several benefits:

1. **Fewer bugs:** you’re explicitly constructing and testing your code from the viewpoint of a developer and a user;
2. **Better code structure:** writing tests forces you to structure and compartmentalise your code so that it's easier to test and understand;
3. **Easier development:** formal tests will help you and others add features to your code without breaking the tried-and-tested base functionality.

**{numref}`03:Testing-your-package`** briefly introduced testing in Python package development. This chapter now goes into more detail about how to write tests, different types of tests (unit tests, regression tests, integration tests), and code coverage.

(05:Testing-workflow)=
## Testing workflow

In general, the goal of testing is to check that your code produces the results you expect it to. You probably already conduct informal tests of your code in your current workflow. In a typical workflow, we write code, run it in a Python session to see if it's working as we expect, make changes, repeat. This is sometimes called "manual testing" or "exploratory testing" and is common in the early stages of development of your code. But when developing code you intend to package up, reuse, and potentially share with others, you'll want to test it in a more formal and reproducible way.

In Python, tests are usually written using an `assert` statement, which checks the truth of a given expression and raises an `AssertionError` with a user-defined error message if the expression is false. To demonstrate this process, imagine we want to create a function called `count_letters()` that counts the number of letters in a string. We come up with the following code as a first version of that function:

```python
def count_letters(text):
    """Count letters in a string."""
    return len(text)
```

We can write some tests for that function using the `assert` statement to check it's working as we expect it to. For example, we would expect our function to count 5 letters in the string `"Hello"` and 10 letters in the string `"Hello world"`:

```{prompt} python >>> auto
>>> assert count_letters("Hello") == 5, "'Hello' should have 5 letters"
>>> assert count_letters("Hello world") == 10, "'Hello world' should have 10 letters"
```

If we ran the above `assert` statements, the first would pass without error, but the second would throw an error:

```console
AssertionError: 'Hello world' should have 10 letters
```

What went wrong? When we call `len()` on a string, it counts all the characters in the string, including the spaces. So, we need to go back to our `count_letters()` function and remove spaces before counting letters. One way we can do this is by using the `.replace()` method to replace spaces with an empty string (i.e., nothing):

```{code-block} python
---
emphasize-lines: 3
---
def count_letters(text):
    """Count letters in a string."""
    return len(text.replace(" ", ""))
```

Now our previous `assert` statements should both pass. This process we just went through roughly followed the typical testing workflow of:

1. Write a test;
2. Write the code to be tested;
3. Test the code; 
4. Refactor code (make small changes); and,
5. Repeat.

```{figure} images/05-test-workflow.png
---
width: 80%
name: 05-test-workflow
alt: The testing workflow.
---
The testing workflow.
```

In our earlier demonstration with the `count_letters()` function, we swapped steps 1 and 2; we wrote the first version of our function's code before we wrote our tests, and this is a common workflow too. However, you can see how it might have been beneficial to write the tests (or at least think about them) before writing the code; if we knew we were testing text with a space in it, we might have handled that in our function in the first place. Writing your tests before your code is known as "test-driven development", and advocates of this approach suggest that it helps you better understand the code you need to write, prevents bugs, and ultimately saves you time. However, in practice, writing your tests first or last doesn't seem to have a significant impact on overall development time {cite:p}`fucci2016`. Regardless of when you choose to formally write your tests, all developers should at least think about the specifications of their code before they write it. What might the inputs look like? What will the output look like? Are there any [edge cases](https://en.wikipedia.org/wiki/Edge_case) to consider? Taking a moment to consider and write down these specifications will help you write code effectively and efficiently.
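One lightweight way to write such specifications down is as a handful of `assert` statements. The sketch below does this for the revised `count_letters()` function from earlier; the particular edge cases (an empty string, a string of only spaces) are illustrative choices, not an exhaustive specification.

```python
def count_letters(text):
    """Count letters in a string, ignoring spaces."""
    return len(text.replace(" ", ""))


# Typical inputs
assert count_letters("Hello") == 5, "'Hello' should have 5 letters"
assert count_letters("Hello world") == 10, "'Hello world' should have 10 letters"

# Edge cases (illustrative choices)
assert count_letters("") == 0, "An empty string should have 0 letters"
assert count_letters("   ") == 0, "A string of only spaces should have 0 letters"
```

Note that this version still counts punctuation and digits as letters. Thinking through cases like that before coding is exactly the kind of specification work described above.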

Ultimately, the testing workflow is all about working incrementally and iteratively. The idea is to make small changes to your code as you add features or identify bugs, test it, write more tests, repeat. Managing and executing such a workflow manually like we did above would clearly be inefficient. Instead, a test framework is typically used to help manage the testing workflow in an efficient, automated, and reproducible way. `pytest` is one of the most common test frameworks for Python packages. We used it to help test our `pycounts` package in **{numref}`03:Running-tests`**. In the rest of this chapter we'll continue to explore how `pytest` can be used to test your package and will demonstrate concepts by building on the `pycounts` package.

(05:Test-structure)=
## Test structure

`pytest` expects tests to be structured as follows:

1. Tests are defined as functions prefixed with `test_` and contain one or more statements that `assert` code produces an expected result or raises a particular error;
2. Tests are put in files of the form *`test_*.py`* or *`*_test.py`*, and are usually placed in a directory called *`tests/`* in a package's root.

Tests can be executed using the command `pytest` at the command line and pointing it to the directory your tests live in (i.e., `pytest tests/`). `pytest` will find all files of the form *`test_*.py`* or *`*_test.py`* in that directory and its sub-directories, and execute any functions with names prefixed with `test_`.

```{note}
Tests are sometimes put in the *`src/`* folder of a package and included in the distribution that users will install. This isn't common because users don't usually need (or want) to run tests for your package - they expect that you as a developer have done that and that the package they are installing is going to work - so there's no need to bloat their installation with tests.
```

As an example, consider the structure of our `pycounts` package:

```{code-block}
---
emphasize-lines: 13-15
---
pycounts
├── .readthedocs.yml
├── CHANGELOG.md
├── CONDUCT.md
├── CONTRIBUTING.md
├── docs
│   └── ...
├── LICENSE
├── README.md
├── pyproject.toml
├── src
│   └── ...
└── tests
    ├── einstein.txt
    └── test_pycounts.py
```

The file *`einstein.txt`* is a text file we created in **{numref}`03:Writing-tests`** to use in our tests. It includes a quote from Albert Einstein:

>*"Insanity is doing the same thing over and over and expecting different results."*

The file *`test_pycounts.py`* is where the tests we want to run with `pytest` should be. That file contains the following test we wrote in **{numref}`03:Running-tests`**, using the format expected by `pytest`: a function prefixed with `test_` that includes an `assert` statement.

```python
from pycounts.pycounts import count_words
from collections import Counter

def test_count_words():
    """Test word counting from a file."""
    expected = Counter({'over': 2, 'and': 2, 'insanity': 1, 'is': 1,
                        'doing': 1, 'the': 1, 'same': 1, 'thing': 1,
                        'expecting': 1, 'different': 1, 'results': 1})
    actual = count_words("tests/einstein.txt")
    assert actual == expected, "Einstein quote words counted incorrectly!"
```

To use `pytest` to run this test it should first be installed as a development dependency of your package. If using `poetry` as a packaging tool, as we do in this book, that can be done with the following command:

```{prompt} bash \$ auto
$ poetry add --dev pytest
```

```{note}
If you're following on from **Chapter 3: {ref}`03:How-to-package-a-Python`**, specifically, **{numref}`03:Running-tests`**, we already installed `pytest` as a development dependency of our `pycounts` package so running the above code won't do anything.
```

With `pytest` installed, we use the following command from our root package directory to run our test:

```{prompt} bash \$ auto
$ pytest tests/
```

```console
============================= test session starts ==============================
platform darwin -- Python 3.9.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /Users/tomasbeuzen/pycounts
collected 1 item

tests/test_pycounts.py .                                                  [100%]

============================== 1 passed in 0.01s ===============================
```

The output of `pytest` provides some basic system information, along with how many tests were run and what percentage passed. If a test fails, it will output the traceback of the error, so you can see exactly which test failed and why. In the next section, we'll go into more detail about how to write tests in `pytest` and demonstrate concepts by continuing to build on our `pycounts` package.

## Writing tests

Your test code will typically do one of the following:
1. Assert that an output matches an expected result;
2. Assert that a specific error is raised when code is used in a particular way.

There are also different types of tests:
1. Unit test: the most common type of test you will write. A unit test verifies that an independent unit of code (e.g., a Python function) is working as expected.
2. Integration test: checks that the individual units of code work together.
3. Regression test: checks that your code produces the same results as it did in the past.

In this section, we'll explore and demonstrate these different test elements. We'll begin by writing unit tests, and then explore integration testing and regression testing.

### Unit tests

Unit tests are the most common type of test you will write. A unit test verifies that an independent unit of code (e.g., a Python function) is working as expected. It will typically comprise:
1. Some data to test the code with (called a "*fixture*"). The fixture is usually a small or simple version of the type of data the function will typically process;
2. The *actual* result that the code produces given the fixture; and,
3. The *expected* result of the test, which is compared to the *actual* result, typically using an `assert` statement.

The `test_count_words()` function of our `pycounts` package is an example of a unit test. Recall that our `count_words()` function can be used to calculate word counts in a text file. To test it, we've created a small, sample text file called *`einstein.txt`* (our *fixture*) which contains the following quote:

>*"Insanity is doing the same thing over and over and expecting different results."*

The result of using our `count_words()` function to count the words in this fixture will be the *actual* result. The fixture is small enough that we can count the words by hand, so we also have the *expected* result (i.e., what the `count_words()` function should output when used with this fixture). In Python code, that unit test looks as follows:

```python
from pycounts.pycounts import count_words
from collections import Counter

def test_count_words():
    """Test word counting from a file."""
    expected = Counter({'over': 2, 'and': 2, 'insanity': 1, 'is': 1,
                        'doing': 1, 'the': 1, 'same': 1, 'thing': 1,
                        'expecting': 1, 'different': 1, 'results': 1})
    actual = count_words("tests/einstein.txt")
    assert actual == expected, "Einstein quote words counted incorrectly!"
```

This is a unit test because it tests one function in one particular situation. 

The `assert` statement can be used with any statement that evaluates to a boolean (`True`/`False`) and should be followed by a user-defined error message providing some helpful debugging information in case the `assert` fails, as in the example above.

A `pytest` test function can include multiple `assert` statements, and if any of the included `assert` statements fail, the whole test will fail. As an example, let's write a test for the `plot_words()` function of our `pycounts.plotting` module, which we developed in **{numref}`03:Adding-code-with-dependencies-to-your-package`** (but currently don't have a test for) and which is shown below:

```python
import matplotlib.pyplot as plt

def plot_words(word_counts, n=10):
    """Plot a bar chart of word counts.

    ...rest of docstring hidden...
    """
    top_n_words = word_counts.most_common(n)
    word, count = zip(*top_n_words)
    fig = plt.bar(range(n), count)
    plt.xticks(range(n), labels=word, rotation=45)
    plt.xlabel("Word")
    plt.ylabel("Count")
    return fig
```

This function takes in a `Counter` object of word counts and outputs a `matplotlib` bar chart. To test that it's working as expected, we'll:
- Use the manually counted words from the Einstein quote as a fixture;
- Use that fixture as an input to the `plot_words()` function to create a bar plot (the actual result); and,
- `assert` that the plot is a `matplotlib` bar chart and `assert` that there are 10 bars in the bar chart (`n=10` is the default number of bars to plot in the `plot_words()` function, as you can see above).

Here's that unit test in Python code:

```{code-block} python
---
emphasize-lines: 2-3, 9-16
---
from pycounts.pycounts import count_words
from pycounts.plotting import plot_words
import matplotlib
from collections import Counter

def test_count_words():
    # ... same as before ...

def test_plot_words():
    """Test plotting of word counts."""
    counts = Counter({'over': 2, 'and': 2, 'insanity': 1, 'is': 1,
                      'doing': 1, 'the': 1, 'same': 1, 'thing': 1,
                      'expecting': 1, 'different': 1, 'results': 1})
    fig = plot_words(counts)
    assert isinstance(fig, matplotlib.container.BarContainer), "Wrong plot type"
    assert len(fig.datavalues) == 10, "Incorrect number of bars plotted"
```

Now that we've written a new test, we need to check that it is working. Running `pytest` at the command line should now show two tests were run:

```{prompt} bash \$ auto
$ pytest tests/
```

```{code-block}
---
emphasize-lines: 4, 9
---
============================= test session starts ==============================
platform darwin -- Python 3.9.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /Users/tomasbeuzen/pycounts
collected 2 items


tests/test_pycounts.py ..                                                 [100%]

============================== 2 passed in 0.23s ===============================
```

Looks like things are working as expected!

As mentioned earlier, the `assert` statement can be used with any statement that evaluates to a boolean (`True`/`False`). However, if your package uses floating-point numbers and you want to `assert` the equality of floating-point numbers in your tests, there's one thing to watch out for. Due to the limitations of floating-point arithmetic in computers, numbers that we would expect to be equal are sometimes not. Consider the following infamous example:

```{prompt} python >>> auto
>>> assert 0.1 + 0.2 == 0.3, "Numbers are not equal!"
```

```console
AssertionError: Numbers are not equal!
```

You can read more about the nuances of floating-point arithmetic in the Python [documentation](https://docs.python.org/3/tutorial/floatingpoint.html), but the important point here is that, when working with floating-point numbers, we usually `assert` that numbers are *approximately* equal, rather than *exactly* equal. For this we can use the `pytest.approx()` function:

```{prompt} python >>> auto
>>> import pytest
>>> assert 0.1 + 0.2 == pytest.approx(0.3), "Numbers are not equal!"
```

You can control how approximate you want the equality to be by using the `abs` and `rel` arguments to specify how much absolute or relative error you want to allow respectively.
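As a small sketch of those two arguments (the specific numbers below are arbitrary, chosen only to illustrate the tolerances):

```python
import pytest

# The default tolerance is tight (a relative error of about one part in a
# million), so a difference of 0.0001 is not considered approximately equal.
assert not (1.0001 == pytest.approx(1.0))

# Allow an absolute error of up to 0.001.
assert 1.0001 == pytest.approx(1.0, abs=1e-3)

# Allow a relative error of up to 1% of the expected value (here, 1.0).
assert 100.5 == pytest.approx(100, rel=0.01)
```

An absolute tolerance is usually the simpler choice when the expected value is close to zero, because a relative tolerance shrinks along with the expected value.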

(05:Assert-that-a-specific-error-is-raised)=
### Test that a specific error is raised

Rather than `assert` that your code produces a particular output, sometimes you want to check that your code raises a particular error when used in the wrong way by a user. Consider again the `plot_words()` function of our `pycounts.plotting` module. From the docstring, we see that the function expects users to pass a `Counter` object to the function:

```{code-block} python
---
emphasize-lines: 8, 9
---
import matplotlib.pyplot as plt

def plot_words(word_counts, n=10):
    """Plot a bar chart of word counts.

    Parameters
    ----------
    word_counts : collections.Counter
        Counter object of word counts.
    n : int, optional
        Plot the top n words. By default, 10.

    ...rest of docstring hidden...
    """
    top_n_words = word_counts.most_common(n)
    word, count = zip(*top_n_words)
    fig = plt.bar(range(n), count)
    plt.xticks(range(n), labels=word, rotation=45)
    plt.xlabel("Word")
    plt.ylabel("Count")
    return fig
```

What happens if a user inputs a different object? For the sake of argument, let's consider what happens if they pass a list of words to our function:

```{prompt} python >>> auto
>>> from pycounts.plotting import plot_words
>>> word_list = ["Pythons", "are", "non", "venomous"]
>>> plot_words(word_list)
```

```console
AttributeError: 'list' object has no attribute 'most_common'
```

This `AttributeError` message is not overly useful to our users. The problem is that our code uses a method, `.most_common()`, which is specific to the `Counter` object and retrieves the top `n` counts from that object. To improve the user experience, we might want to raise a more helpful error that tells users what went wrong if they pass the wrong object type.

Let's modify our `plot_words()` function to check that the `word_counts` argument is a `Counter` object using the `isinstance()` function, and if it's not, `raise` a `TypeError` with a useful message. The `raise` statement raises an exception, which stops the program (unless the exception is handled) and allows you to notify users of a particular error. There are many error types to choose from and you can even create your own, as discussed in the Python [documentation](https://docs.python.org/3/library/exceptions.html). We'll use the `TypeError` here because it is used to indicate that an object is of the wrong type. Our function with this new checking code in it looks like this:

```{code-block} python
---
emphasize-lines: 2, 9, 10 
---
import matplotlib.pyplot as plt
from collections import Counter

def plot_words(word_counts, n=10):
    """Plot a bar chart of word counts.

    ...rest of docstring hidden...
    """
    if not isinstance(word_counts, Counter):
        raise TypeError("'word_counts' should be of type 'Counter'.")
    top_n_words = word_counts.most_common(n)
    word, count = zip(*top_n_words)
    fig = plt.bar(range(n), count)
    plt.xticks(range(n), labels=word, rotation=45)
    plt.xlabel("Word")
    plt.ylabel("Count")
    return fig
```

```{tip}
Other common exceptions used in tests include:
- `AttributeError`: for when an object does not support a referenced attribute (i.e., of the form `object.attribute`).
- `ValueError`: for when an argument has the right type but an inappropriate value.
- `FileNotFoundError`: for when a specified file or directory doesn’t exist.
- `ImportError`: for when the `import` statement can't find a module.
```

We can check that our new error-handling code is working by starting a new Python session and retrying our code from before, which passed a `list` to our function:

```{prompt} python >>> auto
>>> from pycounts.plotting import plot_words
>>> word_list = ["Pythons", "are", "non", "venomous"]
>>> plot_words(word_list)
```

```console
TypeError: 'word_counts' should be of type 'Counter'.
```

Great, our `plot_words()` function now raises a helpful `TypeError` when a user inputs the wrong type of object. How can we now test this functionality with `pytest`? We can use `pytest.raises()`. `pytest.raises()` is used as part of a `with` statement which contains the code you expect to throw a particular error. Let's add a new unit test called `test_plot_words_raises()` to our test file *`test_pycounts.py`* to demonstrate this functionality.

```{tip}
We'll write a new test function (`test_plot_words_raises()`) for checking this error, rather than adding it to our existing function (`test_plot_words()`), because unit tests should be written to check one unit of code (i.e., a function) in one particular situation.
```

```{code-block} python
---
emphasize-lines: 5, 13-17
---
from pycounts.pycounts import count_words
from pycounts.plotting import plot_words
import matplotlib
from collections import Counter
import pytest

def test_count_words():
    # ... same as before ...

def test_plot_words():
    # ... same as before ...

def test_plot_words_raises():
    """Check TypeError raised when Counter not used."""
    with pytest.raises(TypeError):
        list_object = ["Pythons", "are", "non", "venomous"]
        plot_words(list_object)
```

In the new test above, we purposefully pass the wrong object type (a list) to `plot_words()` and expect it to raise a `TypeError`. Let's check that this new test, and our existing tests, all pass by running `pytest` at the terminal. `pytest` should now find and execute three tests:

```{prompt} bash \$ auto
$ pytest tests/
```

```{code-block}
---
emphasize-lines: 4, 9
---
============================= test session starts ==============================
platform darwin -- Python 3.9.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /Users/tomasbeuzen/pycounts
collected 3 items


tests/test_pycounts.py ...                                                [100%]

============================== 3 passed in 0.39s ===============================
```
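`pytest.raises()` also accepts a `match` argument, a regular expression that is searched for in the error message, which lets a test check the message as well as the error type. The sketch below is self-contained, so it uses a hypothetical stand-in function, `check_word_counts()`, that performs the same type check as `plot_words()` rather than importing `pycounts` itself.

```python
from collections import Counter

import pytest


def check_word_counts(word_counts):
    """Hypothetical stand-in for the type check inside plot_words()."""
    if not isinstance(word_counts, Counter):
        raise TypeError("'word_counts' should be of type 'Counter'.")
    return word_counts


def test_check_word_counts_raises():
    """Check both the error type and (part of) its message."""
    with pytest.raises(TypeError, match="should be of type 'Counter'"):
        check_word_counts(["Pythons", "are", "non", "venomous"])
```

Checking the message is optional, but it guards against a test passing because some unrelated `TypeError` was raised elsewhere in the function.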

### Integration tests

The unit tests we've written above verify that the individual functions of our package work in isolation. But we should also test that they work correctly together. Such a test is called an "integration test" (because individual units of code are integrated into a single test).

Integration tests are structured the same way as unit tests. We use a fixture to produce an actual result with our code, which is then compared to an expected result. As an example of an integration test we'll:
- Use the "Einstein quote" text file, *`einstein.txt`*, as a fixture;
- Count the words in the quote using the `count_words()` function;
- Plot the word counts using the `plot_words()` function; and,
- `assert` that a `matplotlib` bar chart was created, that the chart has 10 bars, and that the maximum word count in the chart is 2 (no word appears more than twice in the quote in the *`einstein.txt`* file).

The overall aim of this test is to check that the two core functions of our package, `count_words()` and `plot_words()`, work together (at least to our test specifications). It can be written as follows:

```{code-block} python
---
emphasize-lines: 16-22
---
from pycounts.pycounts import count_words
from pycounts.plotting import plot_words
import matplotlib
from collections import Counter
import pytest

def test_count_words():
    # ... same as before ...

def test_plot_words():
    # ... same as before ...

def test_plot_words_raises():
    # ... same as before ...

def test_integration():
    """Test count_words() and plot_words() workflow."""
    counts = count_words("tests/einstein.txt")
    fig = plot_words(counts)
    assert isinstance(fig, matplotlib.container.BarContainer), "Wrong plot type"
    assert len(fig.datavalues) == 10, "Incorrect number of bars plotted"
    assert max(fig.datavalues) == 2, "Highest word count should be 2"
```

### Regression tests

Regression testing is about checking that your code produces *consistent* results, as opposed to *expected* results. It is most useful when the expected result can't easily be worked out in advance; in most other cases, you can get away with only having unit and integration tests. For example, we don't really know exactly how many words are in *`flatland.txt`*, and it would be next to impossible to count them all by hand. What we can do instead is find out how our code counts the words now, write a test for that result, and then, as we make changes in the future, see whether the result changes for some reason (either for better or for worse).
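A minimal sketch of a regression test is shown below. To keep it self-contained, it uses a hypothetical stand-in for `count_words()` (called `count_words_sketch()`, which works on a string rather than a file) and a short made-up text in place of *`flatland.txt`*. The baseline numbers are assumed to be what the code produced when the test was first written, which is the defining feature of a regression test.

```python
from collections import Counter
from string import punctuation


def count_words_sketch(text):
    """Hypothetical stand-in for count_words(): lowercase, strip punctuation, count."""
    cleaned = text.lower().translate(str.maketrans("", "", punctuation))
    return Counter(cleaned.split())


# Made-up text standing in for a file too large to count by hand.
TEXT = "A square lived in Flatland. A sphere visited the square."

# Baseline recorded from an earlier run of the code, not counted by hand.
BASELINE = {"total_words": 10, "unique_words": 8}


def test_regression():
    """Check that word counts have not changed since the baseline was recorded."""
    counts = count_words_sketch(TEXT)
    assert sum(counts.values()) == BASELINE["total_words"], "Total word count changed"
    assert len(counts) == BASELINE["unique_words"], "Unique word count changed"
```

If this test starts failing after a change, the change altered the code's behaviour; whether that is a bug or an improvement is then a judgment call, and the baseline should only be updated deliberately.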

### How many tests should you write

Now that you know how to write tests, how many should you actually write? There's no single answer to this question. In general, you want your tests to validate the core functionality of your program. Combining unit tests and integration tests can be helpful. Code coverage can help here, but even 100% coverage doesn't guarantee your code is perfect, only that it passes the specific tests you wrote. It might be near impossible to write tests for every single use-case of your package (you'd be amazed at the weird and wonderful ways users can find to unwittingly break your code!). That's why testing is an iterative procedure, as we discussed in **{numref}`05:Testing-workflow`**; as you refactor and add to your code, as users find ways to use your functions that you didn't expect, or as your code produces results you didn't account for, write new tests.

## Advanced testing methods

As the complexity and number of tests you write increases, it can be helpful to streamline and organize your tests in a more efficient and accessible manner. Fixtures and parameterizations in `pytest` are two useful concepts to know about. As we'll discuss below, fixtures help define data that can be used across your tests, and parameterizations allow you to run the same test multiple times but with different inputs/outputs.

### Fixtures

In the previous section we used fixtures in our tests to provide context, e.g., the data to use for testing. However, you may have noticed some repetition of fixtures throughout the test suite we've written for `pycounts` so far; we define a `Counter` object from a manually defined dictionary of the words in the Einstein quote multiple times. 

```{code-block} python
---
emphasize-lines: 9-11, 17-19
---
from pycounts.pycounts import count_words
from pycounts.plotting import plot_words
from collections import Counter
import matplotlib
import pytest

def test_count_words():
    """Test word counting from a file."""
    expected = Counter({'over': 2, 'and': 2, 'insanity': 1, 'is': 1,
                        'doing': 1, 'the': 1, 'same': 1, 'thing': 1,
                        'expecting': 1, 'different': 1, 'results': 1})
    actual = count_words("tests/einstein.txt")
    assert actual == expected, "Einstein quote words counted incorrectly!"

def test_plot_words():
    """Test plotting of word counts."""
    counts = Counter({'over': 2, 'and': 2, 'insanity': 1, 'is': 1,
                      'doing': 1, 'the': 1, 'same': 1, 'thing': 1,
                      'expecting': 1, 'different': 1, 'results': 1})
    fig = plot_words(counts)
    assert isinstance(fig, matplotlib.container.BarContainer), "Wrong plot type"
    assert len(fig.datavalues) == 10, "Incorrect number of bars plotted"

def test_plot_words_raises():
    # ... hidden ...

def test_integration():
    # ... hidden ...
```

This is inefficient and violates the "don't repeat yourself" (DRY) principle of software development. Fortunately, there's a solution. In `pytest`, fixtures can be defined as functions that can be reused across your test suite. In our case, we could create a fixture that defines our manually counted "Einstein quote" `Counter` object, and make it available to any test that wants to use it.

It's easiest to see the utility of a fixture by example. Fixtures can be created in `pytest` using the `@pytest.fixture` decorator. A decorator in Python is defined using the `@` symbol and immediately precedes a function definition. Decorators add functionality to the function they are "decorating"; understanding them isn't necessary to use them here, but for those interested in learning more, check out this [Primer on Python Decorators](https://realpython.com/primer-on-python-decorators).

In the code below we define a function called `einstein_counts()` and decorate it with the `@pytest.fixture` decorator. This fixture returns the manually counted words in the Einstein quote as a `Counter` object. To use it in our tests, we can pass it as an argument to the test function, just like you would usually use a function argument. We'll use this fixture in both the `test_count_words()` and `test_plot_words()` functions:

```{code-block} python
---
emphasize-lines: 7-8, 13, 15, 19, 21
---
from pycounts.pycounts import count_words
from pycounts.plotting import plot_words
from collections import Counter
import matplotlib
import pytest

@pytest.fixture
def einstein_counts():
    return Counter({'over': 2, 'and': 2, 'insanity': 1, 'is': 1,
                    'doing': 1, 'the': 1, 'same': 1, 'thing': 1,
                    'expecting': 1, 'different': 1, 'results': 1})

def test_count_words(einstein_counts):
    """Test word counting from a file."""
    expected = einstein_counts
    actual = count_words("tests/einstein.txt")
    assert actual == expected, "Einstein quote words counted incorrectly!"

def test_plot_words(einstein_counts):
    """Test plotting of word counts."""
    fig = plot_words(einstein_counts)
    assert isinstance(fig, matplotlib.container.BarContainer), "Wrong plot type"
    assert len(fig.datavalues) == 10, "Incorrect number of bars plotted"
        
def test_plot_words_raises():
    # ... same as before ...
```

We now have a way of defining a fixture once, but using it in multiple tests. However, at this point you might wonder why we used the `@pytest.fixture` decorator at all. Why not just define a variable as normal at the top of the script, like this:

```python
from pycounts.pycounts import count_words
from pycounts.plotting import plot_words
from collections import Counter
import matplotlib
import pytest

einstein_counts = Counter({'over': 2, 'and': 2, 'insanity': 1, 'is': 1,
                           'doing': 1, 'the': 1, 'same': 1, 'thing': 1,
                           'expecting': 1, 'different': 1, 'results': 1})

def test_count_words():
    """Test word counting from a file."""
    expected = einstein_counts
    # ... rest of file hidden ...
```

The short answer is that fixtures provide far more functionality and reliability than manually defined variables. For example, each time you use a `pytest` fixture, it triggers the fixture function, meaning that each test gets a fresh copy of the data; you don't have to worry about accidentally mutating or deleting your fixture during a test. However, you can also control this behaviour using the fixture's `scope` argument: should the fixture be executed once per test function, once per module, or once per session? This can be helpful if the fixture is large or time-consuming to create. Finally, we've only explored the use of fixtures as data for a test, but fixtures can also be used to set up the environment for a test, for example, the directory structure a test should run in, or the environment variables it should have access to. `pytest` fixtures can help you easily set up these kinds of contexts, as you can read more about in the `pytest` [documentation](https://pytest.readthedocs.io/en/latest/fixture.html).
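The "fresh copy" point can be illustrated without `pytest` at all. In the plain-Python sketch below (the names `SHARED_COUNTS`, `make_counts()`, `use_shared()`, and `use_fresh()` are hypothetical), mutating a module-level variable leaks into later code, whereas calling a function that rebuilds the data each time does not. The latter is roughly what `pytest` does by default when a test requests a fixture.

```python
from collections import Counter

# Module-level variable: one object shared by everything that uses it.
SHARED_COUNTS = Counter({"over": 2, "and": 2, "insanity": 1})


def make_counts():
    """Build a new Counter on every call, as a fixture function would."""
    return Counter({"over": 2, "and": 2, "insanity": 1})


def use_shared():
    """Accidentally mutate the shared object, then report its size."""
    SHARED_COUNTS.pop("insanity", None)
    return len(SHARED_COUNTS)


def use_fresh():
    """Mutate a fresh copy; the next caller is unaffected."""
    counts = make_counts()
    counts.pop("insanity", None)
    return len(counts)


use_shared()
assert len(SHARED_COUNTS) == 2  # the shared data has lost an entry
use_fresh()
assert len(make_counts()) == 3  # a fresh copy is still intact
```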

### Parameterizations

Parameterizations can be useful for running a test multiple times using different arguments. For example, recall in **{numref}`05:Assert-that-a-specific-error-is-raised`** that we added some code to `pycounts`'s `plot_words()` function that raises a `TypeError` if a user inputs an object other than a `Counter` object to the function. We wrote a test for that new functionality as follows:

```{code-block} python
# ... rest of file hidden ...

def test_plot_words_raises():
    """Check TypeError raised when Counter not used."""
    with pytest.raises(TypeError):
        list_object = ["Pythons", "are", "non", "venomous"]
        plot_words(list_object)
        
# ... rest of file hidden ...
```

That test only checks that the error is raised if a `list` object is passed, but we should also test what happens when other objects are passed, such as numbers or strings. Rather than writing new tests for each object we want to try, we can parameterize this test with all the different data we want to try and `pytest` will run the test for each piece of data.

Parameterizations can be created in `pytest` using the `@pytest.mark.parametrize(argnames, argvalues)` decorator. `argnames` is the name(s) of the test variable(s) you want to use in your function (you can use any name you want), and `argvalues` is a list of the values those test variable(s) will take. In the code example below, we create a test variable named `obj` which can take three values: a float (`3.141`), a string (`"test.txt"`), or a list of strings (`["list", "of", "words"]`). `pytest` will run our test three times, once for each value that `obj` can take.


```{code-block} python
---
emphasize-lines: 3-10
---
# ... same as before ...

@pytest.mark.parametrize(
    "obj",
    [
        3.141,
        "test.txt",
        ["list", "of", "words"]
    ]
)
def test_plot_words_error(obj):
    with pytest.raises(TypeError):
        plot_words(obj)
```

If we ran this test module with `pytest`, it would run `test_plot_words_error()` three times, once for each of the three objects we parameterized it with. To show this explicitly, we can add the `--verbose` flag to our `pytest` command:

```{prompt} bash \$ auto
$ pytest tests/ --verbose
```

```{code-block}
---
emphasize-lines: 8, 9, 10
---
============================= test session starts =================================
platform darwin -- Python 3.9.2, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
rootdir: /Users/tomasbeuzen/GitHub/py-pkgs/pycounts
collected 6 items                                                                                

tests/test_pycounts.py::test_count_words PASSED                               [ 16%]
tests/test_pycounts.py::test_plot_words PASSED                                [ 33%]
tests/test_pycounts.py::test_plot_words_error[3.141] PASSED                   [ 50%]
tests/test_pycounts.py::test_plot_words_error[test.txt] PASSED                [ 66%]
tests/test_pycounts.py::test_plot_words_error[obj2] PASSED                    [ 83%]
tests/test_pycounts.py::test_integration PASSED                               [100%]

============================== 6 passed in 0.52s ===================================
```
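If you would like to choose the bracketed labels shown in the verbose output yourself, one option is to wrap each value in `pytest.param(..., id=...)`. The sketch below is illustrative only: the `plot_words()` defined here is a hypothetical stand-in that reproduces just the type check (so the example runs without `pycounts` installed), and the labels `float`, `string`, and `list` are arbitrary choices.

```python
from collections import Counter

import pytest


def plot_words(word_counts, n=10):
    """Hypothetical stand-in for pycounts' plot_words(): type check only."""
    if not isinstance(word_counts, Counter):
        raise TypeError("'word_counts' should be of type 'Counter'.")
    return word_counts.most_common(n)


@pytest.mark.parametrize(
    "obj",
    [
        pytest.param(3.141, id="float"),
        pytest.param("test.txt", id="string"),
        pytest.param(["list", "of", "words"], id="list"),
    ],
)
def test_plot_words_error(obj):
    """Check TypeError raised when Counter not used."""
    with pytest.raises(TypeError):
        plot_words(obj)
```

With explicit IDs, a failing case should be reported under the label you chose, which can be easier to read than a label derived from the value itself.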

Often you'll want to test a function whose output depends on its input. As an example, consider the function `is_even()` below:

```python
def is_even(n):
    """Check if n is even."""
    if n % 2 == 0:
        return True
    else:
        return False
```

To parameterize a test of this function with different input/output pairs, we use the same syntax as before with `@pytest.mark.parametrize()` except we comma-separate the test arguments in a string (`"n, result"`) and group the pairs of values we want those arguments to take in a tuple (e.g., `(2, True)`, `(3, False)`, etc.). In the example below, we'll purposefully add a wrong input/output pair (`(4, False)`) to show what the output of `pytest` looks like in the case of a failed parameterization:

```python
@pytest.mark.parametrize(
    "n, result",
    [
        (2, True),
        (3, False),
        (4, False)  # this last pair is purposefully wrong so we can
                    # show an example of the pytest error message
    ]
)
def test_is_even(n, result):
    assert is_even(n) == result
```

The above test would run successfully for the first two parameterized input/output pairs but would fail for the last one with the following helpful error message that points out exactly which parameterization failed:

```{code-block}
---
emphasize-lines: 4, 19
---
================================== FAILURES ====================================
____________________________ test_is_even[4-False] _____________________________

n = 4, result = False

    @pytest.mark.parametrize(
        "n, result",
        [
            (2, True),
            (3, False),
            (4, False)
        ]
    )
    def test_is_even(n, result):
>       assert is_even(n) == result

tests/test_example.py:13: AssertionError
=========================== short test summary info ============================
FAILED tests/test_example.py::test_is_even[4-False] - assert True == False
```
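Conceptually, a parameterized test behaves like a loop over the input/output pairs, except that `pytest` reports every pair as its own test instead of stopping at the first failing assertion. The following is a rough, hand-rolled sketch of that idea using plain Python; the `cases` and `failures` names are hypothetical and this is not how `pytest` is implemented.

```python
def is_even(n):
    """Check if n is even."""
    return n % 2 == 0


# Same pairs as in the parameterized test; the last one is purposefully wrong
cases = [(2, True), (3, False), (4, False)]

# Check every pair and remember the ones that do not match
failures = [(n, expected) for n, expected in cases if is_even(n) != expected]

print(failures)  # [(4, False)]
```

Using `@pytest.mark.parametrize()` gives the same per-case bookkeeping without writing the loop, plus a readable report for each case.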

You can read more about parameterizations in the `pytest` [documentation](https://docs.pytest.org/en/6.2.x/parametrize.html).

(05:Code-coverage)=
## Code coverage

How much of your package's source code your tests actually run is called "code coverage". There are several ways to measure code coverage, and each can provide useful insights about how your code is written and how it might fail. Line and branch coverage are the two most popular methods and will be sufficient for the large majority of Python packagers, but other methods exist that might prove useful in different situations, such as condition coverage, function coverage, and mutation coverage. Regardless of the exact method of calculating coverage, a key takeaway is that a 100% score in one metric, like line coverage, doesn't mean your package code or tests are perfect!

### Line coverage

The simplest and most intuitive metric is line coverage, which is the proportion of lines of your package's code that are executed by your tests:

$$
  \text{coverage} = \frac{\text{lines covered}}{\text{total lines}} * 100\%
$$

Consider the following hypothetical code, consisting of 8 lines (not including the function definition line `def lines(x):`):

```python
def lines(x):
    if x > 0:                        # Line 1
        print("x above threshold!")  # Line 2
        print("Running analysis.")   # Line 3
        y = round(x)                 # Line 4
        z = y ** 2                   # Line 5
    else:                            # Line 6
        z = abs(x)                   # Line 7
    return z                         # Line 8
```

Imagine we write the following unit test for that code. This unit test uses `x=10.25` as a test fixture (the expected result for that input is 100):

```python
def test_lines():
    assert lines(x=10.25) == 100
```

That test only covers the condition `x > 0` and hence will only execute lines 1-5 and line 8 of our `lines()` function: a total of 6 of 8 possible lines. The coverage would therefore be:

$$
  \text{coverage} = \frac{\text{6}}{\text{8}} * 100\% = 75\%
$$

Line coverage is simple and intuitive to understand, and many developers use it as a measure of how much of their codebase is covered by their tests. But you can see how line coverage can be misleading. Our `lines()` function has two possible paths, and two possible outputs, depending on whether the value of `x` passed to the function satisfies the `if` condition. These two possible code paths are called "branches". They might be equally important to our package, but our line coverage depends heavily on which branch we actually test. If our test instead passed a value `x <= 0`,

```python
def test_lines():
    assert lines(x=-5) == 5
```

then the test would only have covered line 1, and lines 6-8, resulting in a coverage of 50%. One way we can equally weight the different branches of our code is to use "branch coverage", which we'll discuss in the next section.
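To see which lines a given input really executes, the standard library's `sys.settrace()` hook can record them. The sketch below is only an illustration of the idea, not how you would measure coverage in practice; `executed_lines()` is a hypothetical helper, and the numbers it returns are counted from the `def` line so they match the `# Line N` comments above.

```python
import sys


def lines(x):
    if x > 0:
        print("x above threshold!")
        print("Running analysis.")
        y = round(x)
        z = y ** 2
    else:
        z = abs(x)
    return z


def executed_lines(func, *args, **kwargs):
    """Return the line numbers (counted from the def line) that func ran."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Ignore every frame except the function being measured
        if frame.f_code is not code:
            return None
        if event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore the previous trace hook
    return sorted(hit)


print(executed_lines(lines, 10.25))  # lines run for a positive input
print(executed_lines(lines, -5))     # lines run for a non-positive input
```

Note that Python only reports executable lines, so a bare `else:` never shows up in the result. Coverage tools count statements in a similar way, which is why their percentages can differ slightly from a hand count of physical lines like the one above.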

### Branch coverage

In contrast to line coverage, branch coverage evaluates how many branches in your code are executed by tests, where a branch is a possible execution path the code can take, usually in the form of an `if` statement. Our `lines()` function, for example, has two branches:

```python
def lines(x):
    # Branch 1
    if x > 0:
        print("x above threshold!")
        print("Running analysis.")
        y = round(x)
        z = y ** 2
    # Branch 2
    else:
        z = abs(x)
    return z
```

$$
  \text{coverage} = \frac{\text{branches covered}}{\text{total branches}} * 100\%
$$

Regardless of whether we run the first test from the previous section (`x=10.25`) or the second (`x=-5`), we would get 50% branch coverage, because each test exercises only one of the two branches.
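Reaching 100% branch coverage for `lines()` therefore takes at least one test per branch. A minimal sketch is shown below; the test names `test_lines_positive()` and `test_lines_non_positive()` are hypothetical, and `lines()` is repeated so the example is self-contained.

```python
def lines(x):
    # Branch 1
    if x > 0:
        print("x above threshold!")
        print("Running analysis.")
        y = round(x)
        z = y ** 2
    # Branch 2
    else:
        z = abs(x)
    return z


def test_lines_positive():
    """Exercise branch 1 (x > 0)."""
    assert lines(x=10.25) == 100


def test_lines_non_positive():
    """Exercise branch 2 (x <= 0)."""
    assert lines(x=-5) == 5
```

Run together, these two tests execute both branches, whereas either test on its own would leave one branch untested.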

### Calculating coverage

We can calculate code coverage with `pytest` using the extension `pytest-cov`. For a `poetry`-managed package, `pytest-cov` can be installed as a development dependency with the following command:

```{prompt} bash \$ auto
$ poetry add --dev pytest-cov
```

```{tip}
`pytest-cov` is an implementation of the `coverage` package. It can sometimes be helpful to visit the latter's [documentation](https://coverage.readthedocs.io/en/latest/) if you're looking for more information about how `pytest-cov` calculates coverage.
```

Code coverage can then be calculated using the `pytest` command with the argument `--cov=<pkg-name>` specified. For example, the following command determines the coverage our tests have of our `pycounts` package:

```{prompt} bash \$ auto
$ pytest tests/ --cov=pycounts
```

```console
============================= test session starts ==============================

---------- coverage: platform darwin, python 3.9.6-final-0 -----------
Name                            Stmts   Miss  Cover
---------------------------------------------------
src/pycounts/__init__.py            4      0   100%
src/pycounts/data/__init__.py       0      0   100%
src/pycounts/datasets.py            5      5     0%
src/pycounts/plotting.py           12      0   100%
src/pycounts/pycounts.py           16      0   100%
---------------------------------------------------
TOTAL                              37      5    86%

============================== 5 passed in 0.46s ===============================
```

The output summarizes the coverage of the individual modules in our `pycounts` package. By default, `pytest-cov` calculates line coverage. `Stmts` is how many lines are in a module, `Miss` is how many lines were not executed by tests, and `Cover` is the percentage of lines executed by your tests.

We can calculate branch coverage by specifying the argument `--cov-branch`:

```{prompt} bash \$ auto
$ pytest --cov=pycounts --cov-branch
```

```console
============================= test session starts ==============================

---------- coverage: platform darwin, python 3.9.6-final-0 -----------
Name                            Stmts   Miss Branch BrPart  Cover
-----------------------------------------------------------------
src/pycounts/__init__.py            4      0      0      0   100%
src/pycounts/data/__init__.py       0      0      0      0   100%
src/pycounts/datasets.py            5      5      0      0     0%
src/pycounts/plotting.py           12      0      2      0   100%
src/pycounts/pycounts.py           16      0      2      0   100%
-----------------------------------------------------------------
TOTAL                              37      5      4      0    88%

============================== 5 passed in 0.46s ===============================
```

In this output, `Branch` is the number of branches in the module, and `BrPart` is the number of branches that were only partially executed by tests (that is, not every possible outcome of the branch was taken). "Branch coverage" in `pytest-cov` is actually calculated using a mix of branch and line coverage, which can be useful to get the best of both:

$$
  \text{coverage} = \frac{\text{lines covered} + \text{branches covered}}{\text{total lines} + \text{total branches}} * 100\%
$$
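As a quick check of the arithmetic, the sketch below recomputes the `TOTAL` rows of the two reports above. The helper names are hypothetical, and the number of covered branches (4 of 4) is an assumption inferred from the report, since it is the value that reproduces the 88% shown.

```python
def line_coverage(stmts, miss):
    """Percentage of statements executed by the tests."""
    return 100 * (stmts - miss) / stmts


def combined_coverage(stmts, miss, branches, branches_covered):
    """Percentage of statements plus branches executed by the tests."""
    covered = (stmts - miss) + branches_covered
    return 100 * covered / (stmts + branches)


# TOTAL row of the line coverage report: 37 statements, 5 missed
print(round(line_coverage(stmts=37, miss=5)))  # 86

# TOTAL row of the branch coverage report: 4 branches, assumed all covered
print(round(combined_coverage(stmts=37, miss=5, branches=4,
                              branches_covered=4)))  # 88
```

The combined score is slightly higher here because every branch is assumed covered, so the extra terms are added to both the numerator and the denominator.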

(05:Types-of-code-coverage)=
### Coverage reports

As we've seen, `pytest --cov` provides a helpful high-level summary of our test coverage at the command line. But if we want to see a more detailed output we can generate a useful HTML report using the argument `--cov-report html` as follows:

```{prompt} bash \$ auto
$ pytest --cov=pycounts --cov-report html
```

The report will be available at *`htmlcov/index.html`* (relative to your working directory) and will look like the following:

```{figure} images/test-report-1.png
---
width: 100%
name: 05-test-report-1
alt: HTML test report.
---
HTML test report.
```

We can click on elements of the report, like the *`datasets.py`* module, to see exactly what lines/branches tests are hitting/missing:

```{figure} images/test-report-2.png
---
width: 100%
name: 05-test-report-2
alt: Detailed view of the datasets module in the HTML report.
---
Detailed view of the datasets module in the HTML report.
```

(05:Version-control)=
## Version control

We've now updated our *`test_pycounts.py`* file to include tests for the *`datasets.py`* module and we've added `pytest-cov` as a development dependency (which is recorded in *`pyproject.toml`* and *`poetry.lock`*). Let's commit these changes to local version control, and then push all the changes we've made in this chapter to our remote repository:

```{prompt} bash \$ auto
$ git add poetry.lock pyproject.toml
$ git commit -m "build: add pytest-cov as dev dependency"
$ git add tests/test_pycounts.py
$ git commit -m "test: add tests to cover datasets module"
$ git push
```