# Code development: pytest and pip

*Davide Gerosa (Milano-Bicocca)*

**Sources**: Michael Zingale at Stony Brook University: https://sbu-python-class.github.io

# Part 1. Share your own package 

In [1]:
import os, sys

So far, we've been writing our code all in Jupyter.  But when it comes
time to write code that we want to reuse, we want to put it into a
standalone `*.py` file.

Then we can load it in python (or Jupyter) and use the capabilities
it provides or make it a standalone program that can be run from the
command line.

Jupyter is great for interactive explorations and sharing your workflow with others
in a self-contained way.  But if there is an operation that you do over and over,
you should put it into a separate module that you import.  That way you only need to
maintain and debug a single instance of the function, and all your workflows can reuse it.


## Editors

There are a number of popular editors for writing python source.  Some
popular ones include:

* VS Code: https://code.visualstudio.com/
* spyder: https://www.spyder-ide.org/
* emacs / vi


## Standalone module

Here's a very simple module (let's call it `hello.py`):

```python
def hello():
    print("hello")

if __name__ == "__main__":
    hello()
```

There are two ways we can use this.

* Inside of python (or jupyter), we can do:

  ```python
  import hello
  hello.hello()
  ```

* From the command line, we can do:

  ```bash
  python hello.py
  ```

Additionally, on a Unix system, we can add:

```python
#!/usr/bin/env python3
```

to the top and then mark the file as executable, via:

```bash
chmod a+x hello.py
```

allowing us to execute it simply as:

```bash
./hello.py
```

Here we see how the `__name__` variable is treated by python:

* If we import our module into python, then `__name__` is set to the module name
* If we run the module from the command line, then `__name__` is set to `__main__`
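
To see both cases side by side, here is a minimal sketch that writes a throwaway module to a temporary directory, imports it, and then runs it as a script. The module name `whoami_demo` is made up for this illustration and is not part of the course material.

```python
import importlib
import subprocess
import sys
import tempfile
from pathlib import Path

# hypothetical throwaway module: it just reports its own __name__
SOURCE = 'print(f"__name__ is {__name__}")\n'

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "whoami_demo.py"
    path.write_text(SOURCE)

    # case 1: import it -> __name__ is the module name
    sys.path.insert(0, tmp)
    module = importlib.import_module("whoami_demo")
    imported_name = module.__name__
    sys.path.remove(tmp)

    # case 2: run it as a script -> __name__ is "__main__"
    result = subprocess.run([sys.executable, str(path)],
                            capture_output=True, text=True, check=True)
    script_output = result.stdout.strip()

print(imported_name)
print(script_output)
```

The import reports `__name__ is whoami_demo`, while the script run reports `__name__ is __main__`. This is why the `if __name__ == "__main__":` block only runs in the second case.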


## Changing module contents

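If we make changes to our module file, then we need to re-import it.  A plain `import` of an already-loaded module does nothing, so we use `importlib.reload` instead (here for a module previously imported as `example`):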

```python
import importlib
example = importlib.reload(example)
```
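
As a rough illustration of what reloading does, the sketch below creates a small module on disk, imports it, edits the file, and reloads it. The module name `reload_demo` and its `VALUE` variable are hypothetical, chosen only for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "reload_demo.py"   # hypothetical module name
    path.write_text("VALUE = 1\n")

    sys.path.insert(0, tmp)
    import reload_demo
    before = reload_demo.VALUE

    # edit the file on disk; the already-imported module is unchanged
    path.write_text("VALUE = 'edited'\n")
    stale = reload_demo.VALUE

    # reload re-executes the source and updates the module in place
    importlib.reload(reload_demo)
    after = reload_demo.VALUE

    # clean up so the demo leaves no trace in this session
    sys.path.remove(tmp)
    sys.modules.pop("reload_demo", None)

print(before, stale, after)
```

Keep in mind that names pulled in with `from module import name` still point to the old objects after a reload, so you need to repeat that import as well.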


# Command line arguments

For standalone programs, we often want to have our program take
command line arguments that affect the runtime behavior of our
program.  There are a variety of mechanisms to do this in python, but
the best option is the [argparse
module](https://docs.python.org/3/library/argparse.html).

Here's an example of using `argparse` to take a variety of options:


In [2]:
# %load argparse_example.py


```python
#!/usr/bin/env python3

# to get usage: use -h
import argparse


def setup_args():

    # simple example of argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("-a", help="the -a option", action="store_true")
    parser.add_argument("-b", help="-b takes a number", type=int, default=0)
    parser.add_argument("-c", help="-c takes a string", type=str, default=None)
    parser.add_argument("--darg", help="the --darg option", action="store_true")
    parser.add_argument("--earg", help="--earg takes a string", type=str, metavar="test",
                        default="example string")

    # extra arguments (positional)
    parser.add_argument("extras", metavar="extra", type=str, nargs="*",
                        help="optional positional arguments")

    return parser.parse_args()


if __name__ == "__main__":

    args = setup_args()


    if args.a:
        print("-a set")
    print(f"-b = {args.b}")
    print(f"-c = {args.c}")
    if args.darg:
        print("--darg set")
    print(f"--earg value = {args.earg}")

    print(" ")
    print("extra positional arguments: ")
    if len(args.extras) > 0:
        for e in args.extras:
            print(e)
```

A nice feature of `argparse` is that it automatically generates help for us.  If
we place the above code in `argparse_example.py` then we can do:

In [3]:
os.system('python argparse_example.py --help');

usage: argparse_example.py [-h] [-a] [-b B] [-c C] [--darg] [--earg test]
                           [extra ...]

positional arguments:
  extra        optional positional arguments

options:
  -h, --help   show this help message and exit
  -a           the -a option
  -b B         -b takes a number
  -c C         -c takes a string
  --darg       the --darg option
  --earg test  --earg takes a string
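
When experimenting in Jupyter it is handy that `parse_args()` also accepts an explicit list of strings instead of reading `sys.argv`. The sketch below uses a reduced version of the parser above (the helper name `build_parser` is made up here) to show how options, defaults, and positional arguments end up in the returned namespace.

```python
import argparse


def build_parser():
    # a reduced version of the parser above
    parser = argparse.ArgumentParser()
    parser.add_argument("-a", action="store_true", help="the -a option")
    parser.add_argument("-b", type=int, default=0, help="-b takes a number")
    parser.add_argument("extras", nargs="*", help="optional positional arguments")
    return parser


parser = build_parser()

# pass the arguments explicitly instead of reading sys.argv
args = parser.parse_args(["-a", "-b", "3", "file1", "file2"])
print(args.a, args.b, args.extras)

# with no arguments at all, the defaults are used
defaults = parser.parse_args([])
print(defaults.a, defaults.b, defaults.extras)
```

This is also a convenient way to unit test the argument handling of a command-line program without launching a subprocess.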


# Paths

How does python find modules?  It has a [search order](https://docs.python.org/3/tutorial/modules.html#the-module-search-path):

* current directory

* `PYTHONPATH` environment variable (this follows the same format as
  the shell `PATH` environment variable)

* System-wide python installation default path (usually has a
  `site-packages` directory)

We can look at the path via ``sys.path``.  On my machine I get:

In [4]:
sys.path

['/Users/dgerosa/Documents/reps/scientificcomputing_bicocca_2023/lectures',
 '/opt/homebrew/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python311.zip',
 '/opt/homebrew/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11',
 '/opt/homebrew/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/lib-dynload',
 '',
 '/Users/dgerosa/box/lib/python3.11/site-packages']

You can add directories explicitly to the `PYTHONPATH` shell variable.

Using `PYTHONPATH` to quickly add a module to your search path is an easy hack,
but if you are developing a library that will be used by others, it is better
to make the modules installable to the system search paths.  This is where
_packaging_ comes into play.
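
If you are unsure which copy of a module python would pick up, you can ask the import system directly. The sketch below wraps `importlib.util.find_spec` in a small helper (the name `where_is` is made up for this example) that returns the file a module would be loaded from, without actually importing it.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file a module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


print(where_is("json"))                 # a standard-library package
print(where_is("no_such_module_xyz"))   # hypothetical name that is not found

# the search path is an ordinary list that can be inspected at runtime
for entry in sys.path:
    print(repr(entry))
```

The exact paths depend on your installation, so expect output that differs from machine to machine.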

# Packaging



Let's look at the structure of creating an installable python package.
The python packaging system is constantly evolving, and the current tool
recommendations are listed here: https://packaging.python.org/en/latest/guides/tool-recommendations/

Eventually, you'll want people to be able to do `pip install myawesomepackage`. One thing at a time...


## Our example

We'll work on an example that builds on the Mandelbrot set exercise
from the matplotlib session.  Our example is hosted here:

https://github.com/dgerosa/scientificcomputing_bicocca_2023/tree/main/lectures/mymodule

On your local computer, if you have `git` installed and know how to use it, you can clone the repository. Otherwise go [here](https://github.com/dgerosa/scientificcomputing_bicocca_2023/tree/main), then "Code", and "Download ZIP".

The directory structure appears as:

```
mymodule
├── mymodule
│   ├── __init__.py
│   └── mandel.py
└── README.md
```

This is a rather common way of structuring a project:

* The top-level `mymodule` directory is not part of the python
  package, but instead is where the source control (e.g. git) begins,
  and also hosts setup files that are used for installation

* `mymodule/mymodule` is the actual python module that we will load. To make python recognize this as a module, we need an `__init__.py` file there --- it can be completely empty.
   
   
* The actual `*.py` files that make up our module are in `mymodule/mymodule`

Right now, this package does not appear in our python search path, so
the only way to load it is to work in the top-level `mymodule/`
directory (that's because the current directory is *always* in the path). Then we can do:

```python
import mymodule.mandel
```

we could also do:

```python
from mymodule.mandel import mandelbrot
```

## setuptools

The current python packaging recommendations are:

* Installation:

  * `pip` to install packages from PyPI
  * `conda` for distributing cross-platform software stacks

* Packaging tools:

  * `setuptools` to create source distributions
  * `build` for binary distributions
  * `twine` to upload to PyPI

Let's look at how to use [`setuptools`](https://setuptools.pypa.io/en/latest/index.html) to package our library.  See the
packaging guidelines here:
https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/

The main thing we need to do is create a `setup.py` that describes our
package and its requirements:

Here's a first `setup.py`:

```python
from setuptools import setup, find_packages

setup(name='mymodule',
      description='test module for the SciComp class',
      url='https://github.com/dgerosa',
      author='Davide Gerosa',
      author_email='davide.gerosa@unimib.it',
      license='MIT',
      packages=find_packages(),
      install_requires=['numpy', 'matplotlib'])
```


The packaging ecosystem is always evolving.  There are 2 special config files that can
help customize the package and contain the defaults for other tools: `setup.cfg` and
`pyproject.toml`.  See:

* [Configuring setuptools using `setup.cfg` files](https://setuptools.pypa.io/en/latest/userguide/declarative_config.html)

* [Configuring setuptools using `pyproject.toml` files](https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html)


## Installing

We can use setup in a variety of ways.  Two useful ways are:

* Install:

  `python setup.py install --user`

  This will copy the source files into your install location (likely
  `~/.local/...`) putting them into your python search path.  Then you
  can use this package from anywhere.

* Development mode (https://setuptools.pypa.io/en/latest/userguide/development_mode.html):

  `python setup.py develop --user`

  This doesn't actually install anything in your user- or site-wide
  install location, but instead it creates a special link in that
  install directory back to your actual project code.

  This allows you to continue to develop the package without needing to
  re-install each time you change the source.

  You can uninstall via:

  `pip uninstall mymodule`

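The commands above put the package in the user-specific install location
(because of the `--user` flag).  If you leave this off, it will try to
install in the system-wide path, which might require admin privileges.

Note that recent `setuptools` releases deprecate calling `setup.py install` and
`setup.py develop` directly.  The `pip` equivalents, run from the top-level
`mymodule/` directory, are `pip install .` and `pip install -e .` (again
with `--user` if desired).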


## Using our module

Once the module is installed, we can use it from any directory.  For example, if we do:

```python
import mymodule
print(mymodule.__file__)
```

it shows us where the module is installed on our system.  In my case, it is:

```
/Users/dgerosa/Documents/reps/scientificcomputing_bicocca_2023/lectures/mymodule/mymodule/__init__.py
```
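
A complementary check is to ask the packaging metadata whether a distribution is installed at all. The sketch below uses the standard-library `importlib.metadata` for this. The helper name `installed_version` is made up for this example, and what it prints for `mymodule` depends on whether you have installed the package.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# prints a version string if mymodule is installed, otherwise None
print(installed_version("mymodule"))
```

This looks up the name given to `setup(name=...)`, which is not necessarily the same as the name you use in `import`.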

Let's generate a plot:

```python
from mymodule.mandel import mandelbrot
fig = mandelbrot(128)
fig.savefig("test.png")
```

This produces the plot shown below:

![test.png](attachment:test.png)


## pip Deployment 

Making your code pip-installable is now trivial. The one thing left to do is uploading your code to https://pypi.org/ which is where pip looks for information. 

You need to create an account on PyPI and generate an API token; instructions are [here](https://packaging.python.org/en/latest/tutorials/packaging-projects/). Then create the distribution bundle with

```python setup.py sdist```

PyPI has a test server https://test.pypi.org/ where you can try things out before going to the real PyPI repository (don't want to mess things up...). Upload with


```twine upload --repository pypitest dist/* #for the test server```

When you're confident everything works

```twine upload dist/* #for the real pip```

and you're done! Everybody will be able to pip-install and use your code. 


For a complete example from my research, have a look at https://github.com/dgerosa/precession which is indeed pip installable.




<div class="alert alert-block alert-warning">

Let me say here that producing good, well documented, and reusable code is *crucial* for being influential in modern research. No point understanding a beautiful piece of physics if nobody else can use it!  
    
</div>



# Tools to Make Your Life Easier

## Version control

Put your project under git version control. Just do it. 


## Code checkers

There are a number of tools that help check code for formatting and
syntax errors that are quite useful for developers.  Many projects
automatically enforce these tools on changes submitted to github.

Many editors have plugins that can automatically run these tools
as you write your code.

* [flake8](https://flake8.pycqa.org/en/latest/)

  `flake8` is a checker for [PEP 8](https://peps.python.org/pep-0008/)
  style conformance.  You can turn off checks that you don't like
  via a [`.flake8`
  file](https://flake8.pycqa.org/en/latest/user/configuration.html#configuration-locations).

* [pylint](https://pypi.org/project/pylint/)

  `pylint` is a static code analyzer.  It can find errors and also suggest improvements
  to your code.  You can [generate a configuration file](https://pylint.readthedocs.io/en/latest/user_guide/configuration/index.html)
  to customize its behavior (or add a section to `pyproject.toml`).

* [black](https://pypi.org/project/black/)

  `black` is an _uncompromising code formatter_.  It will automatically rewrite your code
  based on PEP-8 style.

* [pyupgrade](https://github.com/asottile/pyupgrade)

  `pyupgrade` will upgrade source to a later python standard, making
  use of new features where available.  For instance, you can run as:

  ```
  pyupgrade --py39-plus file.py
  ```

  to rewrite the file using syntax available in python 3.9 and later.

* [isort](https://pycqa.github.io/isort/)

  `isort` simply sorts the module imports at the top of your modules,
  grouping the standard python ones together followed by
  package-specific ones.


# Part 2. Unit testing with pytest

Testing is an integral part of the software development process.  We want to catch
mistakes early, before they go on to affect our results.

## Types of testing

There are a lot of different types of software testing that exist.
Most commonly, for scientific codes, we hear about:

* Unit testing : Tests that a single function does what it was designed to do

* Integration testing : Tests whether the individual pieces work together as intended.
  Sometimes done one piece at a time (iteratively)

* Regression testing : Checks whether changes have changed answers

* Verification & Validation (from the science perspective)

  * Verification: are we solving the equations correctly?

  * Validation: are we solving the correct equations?

## Automating testing

The best testing is automated.  Github provides a *continuous integration* service that can
be run on pull requests.  You write a short definition (a Github workflow) that tells Github
how to run your tests and then any time there is a change, the tests are run.

## Unit testing

* When to write tests?

  * Some people advocate writing a unit test for a specification
    before you write the functions they will test

    * This is called Test-driven development (TDD):
      https://en.wikipedia.org/wiki/Test-driven_development

  * This helps you understand the interface, return values,
    side-effects, etc. of what you intend to write

* Often we already have code, so we can start by writing tests to
  cover some core functionality

  * Add new tests when you encounter a bug, precisely to ensure that
    this bug doesn't arise again

* Tests should be short

  * You want to be able to run them frequently





# pytest

`pytest` is a unit testing framework for python code.

Basic elements:

* Discoverability: it will find the tests

* Automation

* Fixtures (setup and teardown)

## Installing

You can install `pytest` for a single user as:

```
pip install pytest
```

This should put `pytest` in your search path, likely in `~/.local/bin`.

If you want to generate coverage reports, you should also install `pytest-cov`:

```
pip install pytest-cov
```

## Test discovery

Adhering to these naming conventions will ensure that your tests are automatically found:

* File names should start with `test_` or end with `_test`:

  * `test_example.py`
  * `example_test.py`

* For tests in a class, the class name should begin with `Test`

  * e.g., `TestExample`
  * There should be no `__init__()`

* Test method / function names should start with `test_`

  * e.g., `test_example()`

## Assertions

Tests use assertions (via python’s `assert` statement) to check behavior at runtime

* https://docs.python.org/3/reference/simple_stmts.html#assert 

* Basic usage: `assert expression`

  * Raises `AssertionError` if expression is not true

  * e.g., `assert 1 == 0` will fail with an exception
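
Here is a short sketch of `assert` in action, using only the standard library. The `mean` function is a made-up example. It shows a passing assertion, a failing one with an explanatory message, and why floating-point results should be compared with a tolerance rather than with `==`.

```python
import math


def mean(values):
    return sum(values) / len(values)


# an assertion that holds passes silently
assert mean([1, 2, 3]) == 2

# an optional message after the comma is attached to the AssertionError
try:
    assert mean([1, 2, 3]) == 3, "mean of [1, 2, 3] should be 2, not 3"
except AssertionError as err:
    message = str(err)
    print(f"AssertionError: {message}")

# floating-point results rarely compare equal exactly; use a tolerance
total = 0.1 + 0.2
assert total != 0.3
assert math.isclose(total, 0.3, rel_tol=1e-12)
```

Inside a real test you would not catch the `AssertionError` yourself; pytest does that and reports the failure.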

## Simple pytest example

Create a file named `test_simple.py` with the following content:

```python
def multiply(a, b):
    return a*b

def test_multiply():
    assert multiply(4, 6) == 24

def test_multiply2():
    assert multiply(5, 6) == 2
```

then we can run the tests as:

```
pytest -v
```

and we get the output:

```
============================= test session starts ==============================
platform linux -- Python 3.11.3, pytest-7.2.2, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /home/zingale/temp/pytest
plugins: anyio-3.6.2
collected 2 items                                                              

test_simple.py::test_multiply PASSED                                     [ 50%]
test_simple.py::test_multiply2 FAILED                                    [100%]

=================================== FAILURES ===================================
________________________________ test_multiply2 ________________________________

    def test_multiply2():
>       assert multiply(5, 6) == 2
E       assert 30 == 2
E        +  where 30 = multiply(5, 6)

test_simple.py:8: AssertionError
=========================== short test summary info ============================
FAILED test_simple.py::test_multiply2 - assert 30 == 2
========================= 1 failed, 1 passed in 0.04s ==========================
```

This tells us that one of our tests failed: `test_multiply2` asserts that `multiply(5, 6)` is 2, but the function returns 30.


# More pytest

Unit tests sometimes require some setup to be done before the test is run.  Fixtures
provide this capability.

pytest provides `setup` and `teardown` functions/methods for tests --
see https://docs.pytest.org/en/6.2.x/fixture.html for more details

Note:  By default, pytest will capture stdout and only show it on failures.  If you want
to always show stdout, add the `-s` flag.


## Example class

It is common to use a class to organize a set of related unit tests.  This is
not a full-fledged class -- it simply helps to organize data.  In particular,
there is no constructor, `__init__()`.  See https://stackoverflow.com/questions/21430900/py-test-skips-test-class-if-constructor-is-defined

We'll look at an example with a NumPy array

* We always want the array to exist for our tests, so we'll use
  fixtures (in particular `setup_method()`) to create the array

* Using a class means that we can access the array created in setup from our class.

* We'll use numpy's own assertion functions: https://numpy.org/doc/stable/reference/routines.testing.html


Here's an example:

```python
# a test class is useful to hold data that we might want setup
# for every test.

import numpy as np
from numpy.testing import assert_array_equal

class TestClassExample:

    @classmethod
    def setup_class(cls):
        """ this is run once for each class, before any tests """
        pass

    @classmethod
    def teardown_class(cls):
        """ this is run once for each class, after all tests """
        pass

    def setup_method(self):
        """ this is run before each of the test methods """
        self.a = np.arange(24).reshape(6, 4)

    def teardown_method(self):
        """ this is run after each of the test methods """
        pass

    def test_max(self):
        assert self.a.max() == 23

    def test_flat(self):
        assert_array_equal(self.a.flat, np.arange(24))
```


Here we see the [`@classmethod` decorator](https://docs.python.org/3/library/functions.html#classmethod).
This means that the function receives the class itself as the first argument rather than an instance,
e.g., `self`.
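
To make the difference concrete outside of pytest, here is a small sketch with a made-up `Counter` class. The class method works on state shared by the whole class and can be called without creating an instance.

```python
class Counter:
    created = 0   # class-level state shared by all instances

    @classmethod
    def reset(cls):
        # cls is the class itself, so no instance is needed
        cls.created = 0

    def __init__(self):
        # count every instance that gets created
        type(self).created += 1


Counter()
Counter()
print(Counter.created)   # 2

Counter.reset()          # called on the class, not on an instance
print(Counter.created)   # 0
```

In the test class above, `setup_class` plays the same role: it prepares state once for the whole class, while `setup_method` prepares state for each individual test.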


Put this into a file called `test_class.py` and then we can run as:

```
pytest -v
```



## Modern workflow

Modern code testing is collaborative and automated. Say you found a bug in numpy and want to fix it. Your pull request to their git repo can be merged only if your patch passes a whole suite of tests.

That's for big things, but I argue having some basic unit tests is important for every code, even your little PhD project (and well... if you end up having a pull request merged into numpy, that should go straight into your CV!!!)


## Other types of tests

Unit tests are only one form of testing - they test a function in
isolation from others.  Sometimes we need to test everything working together.
For scientific codes, regression testing is often used.  The basic workflow
is:

* Start with the project working in a way you are happy with

* Store the output of one (or more) runs as a _benchmark_.

* Each time you make changes, run the code and compare the new output
  to the stored benchmark.

  * If there are no differences, then your changes are likely good
    (but there is always the case of some feature not being tested).

  * If there are differences, then either you introduced a bug, in which
    case you should fix it, or you fixed a bug, in which case you should
    update the benchmarks.
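
The workflow above can be sketched in a few lines of standard-library python. Everything here is an assumption made for illustration: `simulate` is a stand-in for your real calculation, the benchmark is stored as JSON, and the helper names and tolerances are arbitrary choices rather than a prescribed method.

```python
import json
import math
import tempfile
from pathlib import Path


def simulate(n):
    """Stand-in for a real calculation (hypothetical)."""
    return [math.sin(0.1 * i) for i in range(n)]


def store_benchmark(path, data):
    """Save the output of a trusted run."""
    Path(path).write_text(json.dumps(data))


def matches_benchmark(path, data, rel_tol=1e-12):
    """Compare new output to the stored benchmark, element by element."""
    reference = json.loads(Path(path).read_text())
    return len(reference) == len(data) and all(
        math.isclose(new, old, rel_tol=rel_tol, abs_tol=1e-15)
        for new, old in zip(data, reference)
    )


with tempfile.TemporaryDirectory() as tmp:
    benchmark = Path(tmp) / "benchmark.json"
    store_benchmark(benchmark, simulate(10))

    # same code, same answer: the regression test passes
    unchanged = matches_benchmark(benchmark, simulate(10))

    # a change in the results is detected
    changed = matches_benchmark(benchmark, [2 * x for x in simulate(10)])

print(unchanged, changed)
```

In a real project the benchmark file would live in the repository next to the tests, and the comparison would sit inside a `test_` function so that pytest runs it automatically.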


# Exercise

For the exams, work on at least two of these three exercises.

## Q1: I love pip 

- Take a piece of python code you wrote (for instance pick one of the exercises you've done for this class). 
- Turn it into a module
- Install it locally
- Deploy on pypi (only using the test-pypi server!)

<div class="alert alert-block alert-warning">
<span class="fa fa-flash"></span> VERY IMPORTANT
    
Both the pypi and test-pypi server are public on the web! Don't compromise your research by putting up something your competitors can use and steal your idea before you've published a paper on it!!! 

</div>


## Q2: My own test

- Pick a piece of python code that you like (your own PhD project, or take one of the exercises from this class). Implement a unit test and a regression test. 
- Put it under git version control, and write a github action that runs the tests at every commit.
- Edit the github options to make sure the code *cannot* be committed if the tests fail (that's a common thing for big projects with many collaborators, nobody is allowed to break the code)



## Q3: How do professionals do it?
Pick a big git repo, say [scipy](https://github.com/scipy/scipy) or [numpy](https://github.com/numpy/numpy),  and have a look at their development workflow, including their testing strategies.