(03:How-to-package-a-Python)=
# How to package a Python
<hr style="height:1px;border:none;color:#666;background-color:#666;" />

```{attention}
We are currently making awesome improvements to this chapter and the [Cookiecutter template](https://github.com/UBC-MDS/cookiecutter-ubc-mds) it uses. If you find an inconsistency in the text, please let us know by [making an issue](https://github.com/UBC-MDS/py-pkgs/issues) in the GitHub repository. 
```

To start this book, we will first develop an entire Python package from beginning to end. The aim of this chapter is to give the reader a simple and high level overview of the key steps involved in developing a Python package and what the final product we're building actually looks like. Later chapters will then explore each of these key steps in more detail. This chapter was inspired by [The Whole Game chapter](https://r-pkgs.org/whole-game.html) of the [R packages book](https://r-pkgs.org/) written by Hadley Wickham and Jenny Bryan.

## *partypy*: simulate attendance at your party!

The example package we are going to create in this book will help us simulate guest attendance at an event. Have you ever thrown a party, had a wedding, planned a conference, or organized any other kind of event and wondered how many of the invited guests will actually show up? We can assign a "probability of attendance" to each invited guest and then run simulations of our event to estimate how many guests will attend.

We'll explore this concept more in code later in the chapter.

## Package structure

The first thing we need to do to develop our `partypy` package is create an appropriate directory structure. Without getting too technical, a Python package is just a particular file and directory structure, consisting of one or more modules. A module is simply a file with a *.py* extension that contains Python definitions and statements you wish to reuse, such as functions, classes, variables or executable statements. We'll discuss modules and package structure more in **Chapter 4: {ref}`04:Package-structure-and-state`**. For now, we will use the Python package `cookiecutter` (which you installed back in **Chapter 2: {ref}`02:System-setup`**) to quickly create our package structure for us.
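To make the idea of modules and packages concrete, here is a minimal sketch that builds a throwaway package in a temporary directory and imports a function from it. The names `demo_pkg`, `greet` and `hello` are hypothetical and exist only for this illustration; they are not part of `partypy`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk: a directory containing an
# __init__.py file and one module (greet.py).
root = Path(tempfile.mkdtemp())
package_dir = root / "demo_pkg"  # hypothetical package name
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "greet.py").write_text(
    "def hello(name):\n"
    "    return f'Hello, {name}!'\n"
)

# Make the parent directory importable, then import from the module.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
from demo_pkg.greet import hello

print(hello("party people"))
```

The `cookiecutter` template used in this chapter creates the same kind of structure for us, along with many supporting files.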

The `cookiecutter` package is a tool for populating a file and directory structure from a pre-made template. We have developed our own [`cookiecutter` template](https://github.com/UBC-MDS/cookiecutter-ubc-mds) for creating Python packages to complement this book. To use the `cookiecutter` template to set up the structure of our Python package, open up a terminal, change into the directory where you want your package to live and run the line of code below:

```{prompt} bash
cookiecutter https://github.com/UBC-MDS/cookiecutter-ubc-mds.git
```

You will be prompted to provide information that will help customize the project. Below is an example of how to respond to the `cookiecutter` prompts (default values for each attribute are shown in square brackets, and hitting enter without entering any text will accept the default value). In this tutorial we will be calling our package `partypy`. However, we will eventually be publishing our package to Python's main package index, [PyPI](https://pypi.org/), and package names on PyPI must be unique. As a result, if you plan to follow along with this tutorial you should choose a unique name for your package. Something like `partypy_[your initials]` might be appropriate, but you can always check if a particular name is already taken by visiting PyPI and searching for that name.

```console
author_name [Monty Python]: Tomas Beuzen
github_username [mpython]: TomasBeuzen
project_name [My Python package]: partypy
project_slug [pypkgs]: partypy
project_short_description [A package for doing great things!]: Simulate attendance at your party!
version [0.1.0]: 
python_version [3.9]: 
Select open_source_license:
1 - MIT
2 - Apache License 2.0
3 - GNU General Public License v3.0
4 - Creative Commons Attribution 4.0
5 - None
Choose from 1, 2, 3, 4, 5 [1]: 
Select include_github_actions:
1 - no
2 - build
3 - build+deploy
Choose from 1, 2, 3 [1]:
```

```{attention}
In the example above we chose not to include any GitHub Actions files in our initial directory structure. GitHub Actions can help automate the building, testing and deployment of your Python package. We'll explore these topics in more detail in **Chapter 8: {ref}`08:Continuous-integration-and-deployment`**.
```

After responding to the `cookiecutter` prompts, we now have a new directory called `partypy`, with the following structure:

```md
partypy
├── .gitignore
├── .readthedocs.yml
├── CONDUCT.rst
├── CONTRIBUTING.rst
├── docs
│   ├── conduct.rst
│   ├── conf.py
│   ├── contributing.rst
│   ├── index.rst
│   ├── installation.rst
│   ├── make.bat
│   ├── Makefile
│   └── usage.ipynb
├── partypy
│   ├── __init__.py
│   └── partypy.py
├── LICENSE
├── pyproject.toml
├── README.md
└── tests
    ├── __init__.py
    └── test_partypy.py
```

This simple step has given us a boilerplate file and directory structure suitable for building a Python package. While there are quite a few files in our boilerplate, at this point we only need to worry about a few of these to get a working package together. Specifically, we'll be working on:

1. the `pyproject.toml` file that defines our project's metadata and dependencies and how it will eventually be built and distributed;
2. the file where we will write the Python functions that our package will distribute (`partypy/partypy.py`);
3. the file where we will write tests to ensure that our package's functions work as we expect (`tests/test_partypy.py`); and,
4. the directory where we will write our documentation (`docs/`).


Later chapters will focus on the other components of the boilerplate, which can be used to refine your package and packaging process with, for example, quality documentation, extensive testing, continuous integration, version bumping, continuous deployment, etc.

## Putting your project under version control

Before continuing to develop our package it is generally good practice to put your projects under local and remote version control, to better track changes to the project over time and to facilitate collaboration. The tools we recommend using for this are Git & GitHub (which we set up in **Chapter 2: {ref}`02:System-setup`**). 

```{note}
For this book, we assume readers have [basic Git skills](https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository).
```

### Set up local version control

To set up local version control from a terminal, enter the root `partypy` directory, and initialize the project as a repository to be tracked by Git using:

```{prompt} bash
cd partypy
git init
```

```console
Initialized empty Git repository in /Users/tbeuzen/partypy/.git/
```

Next, we need to tell Git which files to track (which will be all of them at this point) and commit these changes locally:

```{prompt} bash
git add .
git commit -m "initial package setup"
```

```console
[master (root-commit) 8b4edcb] initial package setup
 19 files changed, 722 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 .readthedocs.yml
 create mode 100755 CONDUCT.rst
 create mode 100755 CONTRIBUTING.rst
 ...
 create mode 100644 partypy/__init__.py
 create mode 100644 partypy/partypy.py
 create mode 100644 pyproject.toml
 create mode 100644 tests/__init__.py
 create mode 100644 tests/test_partypy.py
```

### Set up remote version control

Now that we have set up our local version control, let's create a repository on [GitHub.com](https://github.com/) and set that as the remote version control home for this project:

```{figure} images/set-up-github-1.png
---
width: 100%
name: 03-set-up-github-1
alt: Creating a new repository in GitHub.
---
Creating a new repository in GitHub.
```

To follow along with this tutorial, select the following options when setting up your GitHub repository: 

1. Give the GitHub.com repository the same name as your Python package;
2. Make the GitHub.com repository public; and,
3. **Do not** initialize the GitHub.com repository with a README file.

```{figure} images/set-up-github-2.png
---
width: 100%
name: 03-github-2
alt: Setting up a new repository in GitHub.
---
Setting up a new repository in GitHub.
```

Next, copy the remote link to your repository and use the following commands to set the remote address locally, and push your project to GitHub.com:

```{prompt} bash
git remote add origin git@github.com:TomasBeuzen/partypy.git
git branch -M main
git push -u origin main
```

```console
Enumerating objects: 23, done.
Counting objects: 100% (23/23), done.
Delta compression using up to 8 threads
Compressing objects: 100% (18/18), done.
Writing objects: 100% (23/23), 9.72 KiB | 2.43 MiB/s, done.
Total 23 (delta 0), reused 0 (delta 0), pack-reused 0
To github.com:TomasBeuzen/partypy.git
 * [new branch]      main -> main
Branch 'main' set up to track remote branch 'main' from 'origin'.
```

```{note}
The example above uses SSH authentication with GitHub. SSH is useful for connecting to GitHub without having to supply your username and password every time. If you're interested in setting up SSH, take a look at the [GitHub documentation](https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh). If you don't have SSH authentication set up, HTTPS authentication works as well; in that case, use this URL in place of the one shown above to set the remote: `https://github.com/TomasBeuzen/partypy.git`.
```

## Creating a virtual environment

Before we get started writing the Python code for our package, we should set up a virtual environment for our project. Recall that a virtual environment will help isolate our package and its dependencies from other software installed on our computer to avoid breaking things. There are several options available when it comes to creating and managing virtual environments, and `poetry` can even take care of this for you, but we've found it simple to use `conda` to manage our virtual environments.

To use `conda` to create a new virtual environment called `partypy` that includes Python 3.9, run the following in your terminal:

```{prompt} bash
conda create --name partypy python=3.9 -y
```

```console
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/miniconda3/envs/partypy

  added / updated specs:
    - python=3.9


The following NEW packages will be INSTALLED:

  ca-certificates    conda-forge/osx-64::ca-certificates-2020.12.5-h033912b_0
  certifi            conda-forge/osx-64::certifi-2020.12.5-py39h6e9494a_1
  libcxx             conda-forge/osx-64::libcxx-11.1.0-habf9029_0
  libffi             conda-forge/osx-64::libffi-3.3-h046ec9c_2
  ncurses            conda-forge/osx-64::ncurses-6.2-h2e338ed_4
  openssl            conda-forge/osx-64::openssl-1.1.1j-hbcf498f_0
  pip                conda-forge/noarch::pip-21.0.1-pyhd8ed1ab_0
  python             conda-forge/osx-64::python-3.9.2-h2502468_0_cpython
  python_abi         conda-forge/osx-64::python_abi-3.9-1_cp39
  readline           conda-forge/osx-64::readline-8.0-h0678c8f_2
  setuptools         conda-forge/osx-64::setuptools-49.6.0-py39h6e9494a_3
  sqlite             conda-forge/osx-64::sqlite-3.34.0-h17101e1_0
  tk                 conda-forge/osx-64::tk-8.6.10-h0419947_1
  tzdata             conda-forge/noarch::tzdata-2021a-he74cb21_0
  wheel              conda-forge/noarch::wheel-0.36.2-pyhd3deb0d_0
  xz                 conda-forge/osx-64::xz-5.2.5-haf1e3a3_1
  zlib               conda-forge/osx-64::zlib-1.2.11-h7795811_1010


Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate partypy
#
# To deactivate an active environment, use
#
#     $ conda deactivate
```

Then activate the environment:

```{prompt} bash
conda activate partypy
```

Anytime you wish to work on your package, you should activate this environment using the command above.

## Adding dependencies

We are going to leverage the `numpy` and `pandas` packages to build our `partypy` package. Thus, before we get started we need to install these dependencies and record them in a useful place so that when we publish our packaged code this important information will be shipped along with it. We will use the command `poetry add` to add dependencies to our package. This command will update the `[tool.poetry.dependencies]` section of the `pyproject.toml` file which currently only lists Python as a project dependency:

```toml
[tool.poetry.dependencies]
python = "^3.9"
```

Let's add `numpy` and `pandas` as dependencies now by running the following in the terminal:

```{prompt} bash
poetry add numpy pandas
```

```console
Using version ^1.20.1 for numpy
Using version ^1.2.3 for pandas

Updating dependencies
Resolving dependencies... (0.2s)

Writing lock file

Package operations: 5 installs, 0 updates, 0 removals

  • Installing six (1.15.0)
  • Installing numpy (1.20.1)
  • Installing python-dateutil (2.8.1)
  • Installing pytz (2021.1)
  • Installing pandas (1.2.3)
```

Now if we view our `pyproject.toml` file we see that `numpy` and `pandas` are listed as dependencies:

```toml
[tool.poetry.dependencies]
python = "^3.9"
numpy = "^1.20.1"
pandas = "^1.2.3"
```
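The caret (`^`) in the constraints above tells `poetry` to accept later releases that do not change the left-most non-zero part of the version number, so `^1.20.1` allows versions from 1.20.1 up to, but not including, 2.0.0. The sketch below illustrates that rule for plain numeric version strings. The `caret_allows` helper is hypothetical, written only for this example; it is not part of `poetry` and it ignores pre-releases and other edge cases.

```python
def parse(version):
    """Turn a version string like '1.20.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def caret_allows(constraint, candidate):
    """Return True if `candidate` satisfies the caret `constraint`."""
    lower = parse(constraint.lstrip("^"))
    # The upper bound bumps the left-most non-zero component
    # and zeroes out everything to its right.
    for i, part in enumerate(lower):
        if part != 0:
            upper = lower[:i] + (part + 1,) + (0,) * (len(lower) - i - 1)
            break
    else:
        raise ValueError("all-zero versions are not handled in this sketch")
    return lower <= parse(candidate) < upper


print(caret_allows("^1.20.1", "1.21.0"))  # True
print(caret_allows("^1.20.1", "2.0.0"))   # False
```

In practice this means `poetry` can pick up compatible bug-fix and minor releases of a dependency while avoiding a new major version that might break our code.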

Running `poetry add` actually changed two files, `pyproject.toml` (which we showed above) and `poetry.lock` (a record of all the packages and exact versions of them that `poetry` downloaded for this project). These changes are important for our package, so let's commit them to version control:

```{prompt} bash
git add pyproject.toml poetry.lock
git commit -m "add numpy and pandas as dependencies"
```

```{note}
For readers who have used `requirements.txt` before with `pip` or `environment.yaml` with `conda`, you can think of `poetry.lock` as the `poetry` equivalent of those files.
```

## Your first package code

We're now ready to write some code for our package. Recall that the package we want to create will simulate guest attendance at a party. The core idea is to assign a "probability of attendance" to each guest and then simulate their attendance as a Bernoulli random variable. In layman's terms, that means modelling each guest's attendance by flipping a coin with two sides, "won't attend" and "attend", where we can specify the probability of the coin landing on "attend".
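As a rough illustration of this weighted coin flip, here is a minimal sketch that uses only Python's standard library. The helper name `flip_attendance_coin` is hypothetical and is not part of `partypy`.

```python
import random


def flip_attendance_coin(p, rng=random):
    """Return 1 ("attend") with probability p, otherwise 0 ("won't attend")."""
    # random() draws uniformly from [0, 1), so the comparison
    # is True with probability p.
    return 1 if rng.random() < p else 0


rng = random.Random(2021)  # seeded so the run is repeatable
flips = [flip_attendance_coin(0.9, rng) for _ in range(10_000)]
print(sum(flips) / len(flips))  # close to 0.9
```

Over many flips, the fraction of "attend" outcomes settles near the probability we specified.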

A Bernoulli random variable is the same as a Binomial random variable with a single trial, so we can run a Bernoulli simulation using NumPy's `binomial()` function, with the argument `n=1`. Imagine we have a guest that we believe will attend our party with a probability of 0.9 (90%). We can simulate the attendance of that guest by first opening up an interactive Python session:

```{prompt} bash
python
```

```console
Python 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:20) 
[Clang 11.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 
```

Then running the following code:

```python
import numpy as np

np.random.binomial(n=1, p=0.9)

1
```

Using the `size` argument of the `binomial()` function, we can repeat this simulation as many times as we want. Let's run it 10 times:

```python
simulations = 10
probability = 0.9
results = np.random.binomial(n=1, p=probability, size=simulations)
results

array([1, 1, 0, 1, 1, 1, 1, 0, 1, 1])
```

Now imagine we have three guests who we believe will attend our party with probabilities 0.3, 0.5 and 0.9. We can easily simulate the attendance of each guest across 10 simulations:

```python
probability = [0.3, 0.5, 0.9]
results = np.random.binomial(n=1, p=probability, size=(simulations, len(probability)))
results

array([[1, 1, 1],
       [1, 1, 1],
       [0, 0, 1],
       [0, 1, 1],
       [0, 0, 1],
       [0, 0, 1],
       [0, 0, 1],
       [1, 1, 1],
       [0, 0, 1],
       [1, 1, 1]])
```

The above array represents 10 simulations of a three-guest party. We want to know how many guests attended each simulated party, so we should take the sum of each simulation:

```python
results.sum(axis=1)

array([3, 3, 1, 2, 1, 1, 1, 3, 1, 3])
```
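A useful sanity check on simulations like this is that the average number of attendees should approach the sum of the individual attendance probabilities (0.3 + 0.5 + 0.9 = 1.7 here) as the number of simulations grows. The sketch below checks this with the standard library rather than NumPy; `simulate_totals` is a hypothetical helper written only for this illustration.

```python
import random


def simulate_totals(probabilities, simulations, seed=None):
    """Return the number of attending guests in each simulated party."""
    rng = random.Random(seed)
    return [
        # Each comparison is True (counted as 1) with probability p.
        sum(rng.random() < p for p in probabilities)
        for _ in range(simulations)
    ]


probabilities = [0.3, 0.5, 0.9]
totals = simulate_totals(probabilities, simulations=20_000, seed=42)
average = sum(totals) / len(totals)
print(f"simulated average: {average:.2f}, expected: {sum(probabilities):.2f}")
```

With only 10 simulations the average can sit well away from 1.7, which is one reason to run many more simulations in practice.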

Finally, it would be nice to display this information in a `pandas` dataframe:

```python
import pandas as pd

(pd.DataFrame({"Total guests": results.sum(axis=1),
               "Simulation": np.arange(simulations) + 1})
   .set_index("Simulation")
)

            Total guests
Simulation
1                      3
2                      3
3                      1
4                      2
5                      1
6                      1
7                      1
8                      3
9                      1
10                     3
```


Let's turn this simulation code into a function called `simulate_party()`:

```{note}
This book assumes you know how to write and document functions in Python. To learn more about this see [Think Python, Chapter 3: Functions](http://greenteapress.com/thinkpython/html/thinkpython004.html) by Allen Downey.
```

```python
def simulate_party(p, simulations: int = 5000):
    """Simulate guest attendance at a party.

    The attendance of each guest is treated as a Bernoulli random variable
    with probability of attendance `p`. The total number of attending guests
    is summed up for each simulation.

    Parameters
    ----------
    p : float or array_like of floats
        Probability of guest attendance, >= 0 and <= 1.
    simulations : int, optional
        Number of simulations to run.

    Returns
    -------
    pandas.DataFrame
        DataFrame with total number of guests per simulation.

    Examples
    --------
    >>> simulate_party([0.1, 0.5, 0.9], simulations=5)
                Total guests
    Simulation
    1                      1
    2                      2
    3                      2
    4                      1
    5                      1
    """
    # Accept a single probability as well as a sequence of them
    p = np.atleast_1d(p)
    result = np.random.binomial(n=1, p=p, size=(simulations, len(p))).sum(axis=1)
    return pd.DataFrame(
        {"Total guests": result, "Simulation": np.arange(simulations) + 1}
    ).set_index("Simulation")
```

```python
results = simulate_party(p=[0.3, 0.5, 0.9], simulations=10)
results

            Total guests
Simulation
1                      3
2                      2
3                      2
4                      2
5                      1
6                      1
7                      2
8                      2
9                      2
10                     2
```



Now we have a working function, where should we save it if we want it to be a part of our `partypy` package? Let's review the structure of our Python project:

```md
partypy
├── .gitignore
├── .readthedocs.yml
├── CONDUCT.rst
├── CONTRIBUTING.rst
├── docs
│   ├── conduct.rst
│   ├── conf.py
│   ├── contributing.rst
│   ├── index.rst
│   ├── installation.rst
│   ├── make.bat
│   ├── Makefile
│   └── usage.ipynb
├── partypy
│   ├── __init__.py
│   └── partypy.py
├── LICENSE
├── pyproject.toml
├── README.md
└── tests
    ├── __init__.py
    └── test_partypy.py
```

All the code that we would like the user to run as part of our package should live inside the `partypy` directory. For a relatively small package with just a few functions, we would house them inside a single Python module (i.e., a `.py` file). Our template project directory structure already created and named such a module for us: `partypy/partypy.py`. Let's save our function there. Because our function depends on `numpy` and `pandas`, we should import them at the top of the file. Here's what `partypy.py` should now look like:

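```python
import numpy as np
import pandas as pd


def simulate_party(p, simulations: int = 5000):
    """Simulate guest attendance at a party (full docstring as shown above)."""
    p = np.atleast_1d(p)  # accept a single probability or a sequence
    result = np.random.binomial(n=1, p=p, size=(simulations, len(p))).sum(axis=1)
    return pd.DataFrame(
        {"Total guests": result, "Simulation": np.arange(simulations) + 1}
    ).set_index("Simulation")
```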

## Test drive your package code

The whole point of creating a package is so that we can easily reuse our code without having to redefine it every time we start a Python session. To test drive our minimalist package we can try installing it locally using `poetry install` from the root package directory:

```{prompt} bash
poetry install
```

```console
Installing dependencies from lock file

No dependencies to install or update

Installing the current project: partypy (0.1.0)
```

```{note}
Be sure to run the above command while in your virtual environment. Recall that you can enter your virtual environment by running `conda activate partypy` in the terminal.
```

Now, inside the root project directory we can open an interactive Python session (by typing `python` at the command line) and import the `simulate_party` function from our `partypy` module as shown:

```python
from partypy.partypy import simulate_party
```

```{note}
The above syntax is telling Python to import the function `simulate_party` from the `partypy` module of the `partypy` package. There are various other ways to import code from Python packages, which we'll explore more in **Chapter 4: {ref}`04:Package-structure-and-state`**.
```
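As a quick illustration of the difference between import styles, the sketch below uses the standard library's `os.path` module as a stand-in, since it is available in every Python installation. The same patterns apply to your own package.

```python
# Style 1: import the module and reach the function with dot notation.
import os.path

print(os.path.basename("partypy/partypy.py"))

# Style 2: import just the function into the current namespace,
# so no dot notation is needed when calling it.
from os.path import basename

print(basename("partypy/partypy.py"))
```

Both calls run exactly the same function; the only difference is the name you type to reach it.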

The `simulate_party` function has now been imported into the current session's namespace, so we can call it directly (note that if you had instead imported the whole `partypy` module with `from partypy import partypy`, you would access the function using dot notation: `partypy.simulate_party`). Let's try using the function to simulate attendance at a three-guest party:

```python
simulate_party(p=[0.3, 0.5, 0.9], simulations=10)
```

You should see a dataframe of ten simulated attendance totals like the one we produced earlier (the exact values will differ because the simulation is random).

Hurray again! This seems to work as expected! Now that we have something working, you can exit your Python session (by typing `exit()`) and commit changes to version control:

```{prompt} bash
git add .
git commit -m "First working version of simulate_party function"
```

## Your second package code

For simple packages, you may choose to add all your reusable code into `partypy.py`, and that's often suitable for reusing code between your own packages. But for more complex packages, your package will benefit from better compartmentalisation and organisation. We are now going to add a plotting function to our package. You could add the following code to `partypy.py`, but we are going to rename that file to `simulate.py` and add a new file, `plotting.py`, splitting our package into two modules:

1. `partypy.simulate`
2. `partypy.plotting`

This leaves room for expansion later on (which we will do), and we find it's always useful to split packages up like this.

It would be useful to plot a histogram of simulation results instead of just looking through a dataframe. We'll use `altair` to make our plot (but you could of course use any plotting library you like). Let's add `altair` as a dependency of our package now:

```{prompt} bash
poetry add altair
```

```console
Using version ^4.1.0 for altair

Updating dependencies
Resolving dependencies... (2.1s)

Writing lock file

Package operations: 8 installs, 0 updates, 0 removals

  • Installing attrs (20.3.0)
  • Installing markupsafe (1.1.1)
  • Installing pyrsistent (0.17.3)
  • Installing entrypoints (0.3)
  • Installing jinja2 (2.11.3)
  • Installing jsonschema (3.2.0)
  • Installing toolz (0.11.1)
  • Installing altair (4.1.0)
```

Now, we'll add the following code into `plotting.py`:

```python
import altair as alt


def plot_simulation(results):
    """Plot a histogram of simulated party attendance.

    Parameters
    ----------
    results : pandas.DataFrame
        DataFrame of simulation results from `partypy.simulate.simulate_party()`

    Returns
    -------
    altair.Chart
        Histogram of simulation results.

    Examples
    --------
    >>> from partypy.simulate import simulate_party
    >>> from partypy.plotting import plot_simulation
    >>> results = simulate_party([0.1, 0.5, 0.9])
    >>> plot_simulation(results)
    altair.Chart
    """

    histogram = (
        alt.Chart(results)
        .mark_bar()
        .encode(
            x=alt.X(
                "Total guests",
                bin=alt.Bin(maxbins=30),
                axis=alt.Axis(format=".0f"),
                title="Attendees",
            ),
            y="count()",
            tooltip="count()",
        )
    )

    return histogram
```
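A histogram like this one essentially counts how many simulations produced each attendance total. As a minimal sketch of that counting step (independent of `altair`, and reusing the ten totals from the small example earlier in the chapter), the standard library's `collections.Counter` gives the bar heights directly:

```python
from collections import Counter

# Attendance totals from the ten simulated three-guest parties shown earlier.
totals = [3, 3, 1, 2, 1, 1, 1, 3, 1, 3]

# Count how many simulations produced each total; these counts
# are the bar heights a histogram would display.
counts = Counter(totals)
for attendees in sorted(counts):
    print(f"{attendees} attendees: {'#' * counts[attendees]}")
```

The `altair` chart does the same job on the full set of simulation results, with binning and axis formatting handled for us.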

Let's give our new code a spin by installing the package and testing it out.

```{prompt} bash
poetry install
```

```console
Installing dependencies from lock file

No dependencies to install or update

Installing the current project: partypy (0.1.0)
```

Now, inside the root project directory we can open an interactive Python session (by typing `python` at the command line), import our two functions, and plot the results of a simulation:

```python
import numpy as np
from partypy.simulate import simulate_party
from partypy.plotting import plot_simulation

results = simulate_party(np.linspace(0, 1, 100))
plot_simulation(results)
```

## Package documentation

### Reading and rendering documentation locally

For the users of your code (including your future self) we need to have readable and accessible documentation expressing how to install your package, and how to use the functions within it. We'll discuss documentation in detail in **Chapter 6: {ref}`06:Documentation`**, but for now, we will demonstrate the basic steps required to get your documentation up-and-running quickly.

The Python packaging ecosystem has a tool to help you easily make documentation: [Sphinx](https://docs.readthedocs.io/en/stable/intro/getting-started-with-sphinx.html). In the Cookiecutter template we used to define our package's directory structure, there is a basic docs template that the Cookiecutter program filled in with the information you entered interactively when you ran `cookiecutter https://github.com/UBC-MDS/cookiecutter-ubc-mds.git`. These files live in the `docs` directory and are of the `.rst` (reStructuredText markup language) filetype. This is a lightweight markup language that works similarly to Markdown but uses different syntax. The templates provided to you here are fairly well formatted already, so you do not have to change the `.rst` formatting. However, if you are interested in doing so, you can see the [Sphinx documentation](https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html) to get started.

First, we need to install `sphinx`, `nbsphinx`, and `ipykernel` as development dependencies using `poetry`. 

```{prompt} bash
poetry add --dev sphinx nbsphinx ipykernel
```

```{note}
The use of `--dev` specifies a development dependency, rather than a package function dependency. A development dependency is a package that is not required by a user to use your package, but is required for development purposes. If you look in `pyproject.toml` you will see that `sphinx` gets added under the `[tool.poetry.dev-dependencies]` section as opposed to the `[tool.poetry.dependencies]` section.
```

Next, to render the help documents locally from `.rst` to `.html` we need to navigate into the `docs` directory and then run the `Makefile` there, directing it to run the `html` target:

```{prompt} bash
cd docs
poetry run make html
```

```{note}
We prepend `poetry run` to most of our Unix shell commands in this Python package workflow to ensure our commands are executed within our project's virtual environment and use only the software tools we have specifically installed there.
```

```{attention}
You may see some red warnings while your docs are rendering, but these can be ignored and are typically just suggestions on how to improve your docs if you wish.
```

If we now look inside our `docs` directory we see that it has expanded, and the rendered `.html` files live in `_build/html`. We can open `_build/html/index.html` to view our docs locally on our laptop. They should look something like this:

```{figure} images/documentation-1.png
---
width: 100%
name: 03-documentation-1
alt: The rendered docs homepage.
---
The rendered docs homepage.
```

If we click on the "Module Index" link under the heading "Indices and tables" at the bottom of the page we get a "Your file was not found" message:

```{figure} images/documentation-2.png
---
width: 100%
name: 03-documentation-2
alt: File not found error!
---
File not found error!
```

This is because we haven't yet generated any documentation from our package's code. Sphinx can build these pages from docstrings, so each function we want documented needs one. We already wrote a `NumPy`-style docstring for the `simulate_party` function in `partypy/simulate.py`, shown again below (we'll discuss docstring style more in **Chapter 6: {ref}`06:Documentation`**):

```python
import numpy as np
import pandas as pd


def simulate_party(p, simulations: int = 5000):
    """Simulate guest attendance at a party.

    The attendance of each guest is treated as a Bernoulli random variable
    with probability of attendance `p`. The total number of attending guests
    is summed up for each simulation.

    Parameters
    ----------
    p : float or array_like of floats
        Probability of guest attendance, >= 0 and <= 1.
    simulations : int, optional
        Number of simulations to run.

    Returns
    -------
    pandas.DataFrame
        DataFrame with total number of guests per simulation.

    Examples
    --------
    >>> from partypy.simulate import simulate_party
    >>> simulate_party([0.1, 0.5, 0.9], simulations=5)
                Total guests
    Simulation
    1                      1
    2                      2
    3                      2
    4                      1
    5                      1
    """
    p = np.atleast_1d(p)  # accept a single probability or a sequence
    result = np.random.binomial(n=1, p=p, size=(simulations, len(p))).sum(axis=1)
    return pd.DataFrame(
        {"Total guests": result, "Simulation": np.arange(simulations) + 1}
    ).set_index("Simulation")

```

Now we can use a `sphinx` extension (`napoleon`) to render our `NumPy`-styled docstring into a modules page on our docs. To do this we need to install `napoleon` as a dev dependency:

```{prompt} bash
poetry add --dev sphinxcontrib-napoleon
```

```{note}
Normally to use this extension, we would also have to add `extensions = ['sphinx.ext.napoleon']` in the `conf.py` file in the `docs` directory, but we have taken care of this for you already with our Cookiecutter template.
```

Now we can change back to our root `partypy` directory, and use `sphinx-apidoc` and `poetry` to re-render our docs:

```{prompt} bash
cd ..
poetry run sphinx-apidoc -f -o docs/source partypy
cd docs
poetry run make html
```

Now when we click on the "Module Index" link under the heading "Indices and tables" we see a webpage that has links to our modules, including `partypy.simulate`:

```{figure} images/documentation-3.png
---
width: 100%
name: 03-documentation-3
alt: The rendered docs module index.
---
The rendered docs module index.
```

And we can click on that to see the docs for `partypy.simulate.simulate_party`, which should look roughly like this:

```{figure} images/documentation-4.png
---
width: 100%
name: 03-documentation-4
alt: Our function documentation.
---
Our function documentation.
```

Another hurray! 🎉🎉🎉 Let's commit this to version control and push to our remote:

```{prompt} bash
cd ..
git add .
git commit -m "generated and rendered docs for local viewing"
git push
```

### Reading and rendering documentation remotely

To share these docs online, we need to link our GitHub repository to [Read the Docs](https://readthedocs.org/) (where we will build and host our docs remotely). To do this:

1. Visit <https://readthedocs.org/> and click on "Sign up";
2. Select "Sign up with GitHub";
3. Click "Import a Project";
4. Click "Import Manually";
5. Fill in the project details by providing a package name (this must be a unique name, so if you chose a name like "partypy_[your initials]" for your package, use that here), the repository URL, **set the default branch to "main"**, and leave the rest as is. Click "Next"; and,
6. Click "Build version".

After following the steps above, your docs should get successfully built on [Read the Docs](https://readthedocs.org/) and you should be able to access them via the "View Docs" button on the build page, or from the link that Cookiecutter created for you in your repository's `README.md` file.

```{note}
For [Read the Docs](https://readthedocs.org/) to work with the `poetry` package workflow you need to have a `.readthedocs.yml` in the root of your Python package. We have created this for you using Cookiecutter and you can view it [here](https://github.com/UBC-MDS/cookiecutter-ubc-mds/blob/main/%7B%7Bcookiecutter.project_slug%7D%7D/.readthedocs.yml).
```

## Writing tests

We have interactively taken `catbind` for a test drive, but to prove to our future selves and others that our code does in fact do what it is supposed to do, let's write some formal unit tests. We'll discuss testing in detail in **Chapter 5: {ref}`05:Testing`**, but will go over the key steps here. In Python packages, tests live inside the `tests` directory, typically in a file called `test_<module_name>.py`. For this package, that file is `tests/test_pypkgs.py`. Let's add a unit test (as a function named `test_catbind`) for our `catbind` function there now:

```python
from pypkgs import __version__
from pypkgs import pypkgs
import pandas as pd


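def test_version():
    # The package should report the version we expect
    assert __version__ == '0.1.0'


def test_catbind():
    # Two categoricals with no categories in common
    a = pd.Categorical(["character", "hits", "your", "eyeballs"])
    b = pd.Categorical(["but", "integer", "where it", "counts"])
    result = pypkgs.catbind(a, b)
    # Each code is the position of a value in the combined categories
    assert (result.codes == [1, 4, 7, 3, 0, 5, 6, 2]).all()
    # The combined categories are the alphabetically sorted union of both inputs
    assert (result.categories == ["but", "character", "counts", "eyeballs",
                                  "hits", "integer", "where it", "your"]).all()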
```

```{note}
Given that we use `pd.Categorical` to create objects to test on, we have to import the `pandas` package at the top of our test file.
```

While we could run our test functions by starting a Python session, importing them, and calling them one by one, it is much more efficient to automate the testing workflow. In the Python package ecosystem, one way we can do this is to use `pytest`. By default, a single call to `pytest` from the root of a project will search the project (including the `tests` directory) for files named `test_*.py` or `*_test.py`, import them, and then call every function whose name is prefixed with `test`. Pretty great!

To try this out, we first add `pytest` as a dev dependency via `poetry`:

```{prompt} bash
poetry add --dev pytest
```

Then to run the tests, we use:

```{prompt} bash
poetry run pytest
```

```console
============================= test session starts ==============================
platform darwin -- Python 3.7.6, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /Users/tbeuzen/GitHub/py-pkgs/pypkgs
collected 2 items                                                              

tests/test_pypkgs.py ..                                                  [100%]

============================== 2 passed in 0.56s ===============================
```

No errors are reported, which means our tests passed. Hurray! This suggests that the code we wrote is correct (at least according to our test specifications). Let's now put our tests under local and remote version control:

```{prompt} bash
git add .
git commit -m "added unit tests for catbind"
git push
```

## Building and publishing your package

### TestPyPI

Python packages are generally shared via the [PyPI package index](https://pypi.org/). However, when we are just starting out, or our package is still in development, we typically first check that everything works by publishing to [testPyPI](https://test.pypi.org/). `poetry` has a command called `publish` which we can use to do this, but its default behaviour is to publish to PyPI. So we need to add testPyPI to the list of repositories `poetry` knows about via:

```{prompt} bash
poetry config repositories.test-pypi https://test.pypi.org/legacy/
```

Before we send our package, we first need to build it to source and wheel distributions (the format that PyPI distributes and something you'll learn more about in **Chapter 4: {ref}`04:Package-structure-and-state`**) using `poetry build`:

```{prompt} bash
poetry build
```

```console
Building pypkgs (0.1.0)
 - Building sdist
 - Built pypkgs-0.1.0.tar.gz

 - Building wheel
 - Built pypkgs-0.1.0-py3-none-any.whl
```
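The wheel's filename is worth a closer look because it encodes compatibility information: after the name and version come a Python tag, an ABI tag, and a platform tag. The sketch below splits a filename into those parts; `parse_wheel_filename` and `WheelTags` are hypothetical names written for this illustration, and the parsing is deliberately simplified.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its dash-separated components.

    Simplified, hypothetical helper: real tools should use a dedicated
    library rather than string splitting.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag may sit between the version and Python tag
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return WheelTags(*parts)


tags = parse_wheel_filename("pypkgs-0.1.0-py3-none-any.whl")
print(tags)
```

For our wheel, `py3` means any Python 3 interpreter, `none` means no particular ABI is required, and `any` means it is not tied to a specific platform, which is what we expect for a pure-Python package.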

Finally, to publish to testPyPI we can use `poetry publish`. You will be prompted for your testPyPI username and password, so sign up for an account if you have not already done so:

```{prompt} bash
poetry publish -r test-pypi
```

Now you should be able to visit your package on testPyPI (e.g., <https://test.pypi.org/project/pypkgs/>) and download it from there using `pip` via:

```{prompt} bash
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pypkgs
```

```{note}
By default, `pip install` will search PyPI for the named package. However, we want to search testPyPI because that is where we uploaded our package. The argument `--index-url` points `pip` to the testPyPI index. However, our package `pypkgs` depends on `pandas`, which can't be found on testPyPI (it is hosted on PyPI). So, we need to use the `--extra-index-url` argument to also point `pip` to PyPI so that it can pull any necessary dependencies of `pypkgs` from there.
```
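After installing, one way to confirm which version of the package ended up in your environment is to query its metadata from Python. The sketch below uses the standard library's `importlib.metadata` (available from Python 3.8); the `installed_version` helper is a hypothetical name introduced for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed.

    Hypothetical convenience wrapper around `importlib.metadata.version`.
    """
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


found = installed_version("pypkgs")
if found is None:
    print("pypkgs is not installed in this environment")
else:
    print(f"pypkgs {found} is installed")
```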

### PyPI

When you're at the point where you're happy to officially share your package with the world, you can publish to PyPI by simply typing:

```{prompt} bash
poetry publish
```

Your package will then be available on PyPI (e.g., <https://pypi.org/project/pypkgs/>) and can be installed with `pip`:

```{prompt} bash
pip install pypkgs
```

There are a number of optional arguments you can specify in your `pyproject.toml` file to control the metadata of your package; check them out in the [`poetry` documentation](https://python-poetry.org/docs/pyproject/). For example, you can use your `README.md` file as the description of your package on testPyPI or PyPI. To do this, add the `readme` argument to the `[tool.poetry]` section of your `pyproject.toml` file and point it to your `README.md` file, for example:

```toml
[tool.poetry]
name = "pypkgs"
version = "0.1.0"
description = "Python package that eases the pain of concatenating Pandas categoricals!"
authors = ["Tomas Beuzen <tomas.beuzen@gmail.com>"]
license = "MIT"
readme = "README.md"
```

## Summary

1. Build the directory structure
2. Initialize the package with `poetry init`
3. Add dependencies
4. Write the package code
5. Install locally with `poetry install`
6. Write tests (optional)
7. Render documentation (optional)
8. Publish to testPyPI and PyPI