# A quick introduction to creating packages in Python

## Why Are We Doing This?


One of the most powerful things about coding for the sciences is that it costs nothing to re-use code we’ve written in the past, allowing us to build on past work rather than starting over with every project or paper.

However, one practice we see again and again is copying and pasting code from one project into another. Sometimes it is just a function; other times (`MATLAB`, `NCL`) it is entire files.

Assembling code in packages makes it really easy to re-use old code: all the scripts and functions end up in a central location and can be called and imported from anywhere on the computer, just like the packages `numpy`, `xarray` or `matplotlib`.

## Setting Up

In this tutorial, we’ll walk through the creation of a simple package from some code that’s just lying around. You can view and clone the demo package: https://github.com/ncar-hackathons/hello-cesm-package. 


### 1. The Basics

The most basic directory structure for a Python package looks like this:

```bash
project
|
├── LICENSE
├── README.md
├── my_package
│   ├── __init__.py
│   └── some_module.py
└── setup.py
```

But at the moment, we've just got some flat files.


```bash
project
|
├── statistics.py
└── climatologies.py
```

So, the first step is to move files around. First comes the hardest part: choosing a package name. I’ll call mine `cesm_package`. Create a directory with that name, and move the Python files in there.

```bash
project
└── cesm_package
    ├── climatologies.py
    └── statistics.py
```

There is one more crucial file: `__init__.py` lets the Python interpreter know that there are importable modules in this directory. It is the script that runs when you execute `import cesm_package`. For more about what you can do with modules, see the [Python docs](https://docs.python.org/3/tutorial/modules.html). After adding `__init__.py`, the project directory should be:


```bash
project
└── cesm_package
    ├── __init__.py
    ├── climatologies.py
    └── statistics.py
```
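To see the role of `__init__.py` concretely, the sketch below rebuilds this layout in a temporary directory and imports it. The `anomaly` function and the re-export inside `__init__.py` are illustrative assumptions, not the contents of the demo repository.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway copy of the layout above in a temporary directory.
project = Path(tempfile.mkdtemp())
package_dir = project / "cesm_package"
package_dir.mkdir()

# Hypothetical module content; the real statistics.py will differ.
(package_dir / "statistics.py").write_text(
    "def anomaly(values):\n"
    "    mean = sum(values) / len(values)\n"
    "    return [v - mean for v in values]\n"
)

# __init__.py runs on `import cesm_package`; here it re-exports one function.
(package_dir / "__init__.py").write_text(
    "from .statistics import anomaly\n"
    "__all__ = ['anomaly']\n"
)

# Make the project directory importable, mimicking "being in the same directory".
sys.path.insert(0, str(project))
importlib.invalidate_caches()
import cesm_package

print(cesm_package.anomaly([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

Leaving `__init__.py` empty also works; in that case the function would be reached as `cesm_package.statistics.anomaly` after `import cesm_package.statistics`.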

### 2. `setup.py`

At this point, the library can be imported if we’re in the same directory, but it isn’t an installable package yet. To let `setuptools` and `pip` know how to handle it, we need to add the `setup.py` file.

A very basic version of `setup.py` is:


```python
"""The setup script."""

from setuptools import setup

# Whatever third-party libraries are needed
install_requires = ['xarray', 'numpy', 'matplotlib']

setup(
    author='Alice Doe',
    author_email='alice@example.com',
    description='My CESM analysis package',
    install_requires=install_requires,
    license='MIT',
    long_description='CESM data analysis package',
    keywords='ocean modeling',
    name='cesm-package',
    packages=['cesm_package'],
    url='https://github.com/github-user-name/project-name',
    version='0.1',
    zip_safe=False,
)
```

After adding `setup.py`, the project directory should be:


```bash
project
├── cesm_package
│   ├── __init__.py
│   ├── climatologies.py
│   └── statistics.py
└── setup.py
```

### 3. Installing?

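Technically, you can install the package by running `python setup.py install`. During development, however, it’s much more convenient to install it in development mode, so that edits to the source take effect without reinstalling. Newer versions of `setuptools` deprecate calling `setup.py` directly, and `pip install -e .` is the equivalent there. Run the following command from the root directory of your project (the directory in which `setup.py` is located):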

```bash
$ ls
LICENSE      README.md    cesm_package setup.py

$ python setup.py develop
running develop
running egg_info
writing cesm_package.egg-info/PKG-INFO
writing dependency_links to cesm_package.egg-info/dependency_links.txt
writing requirements to cesm_package.egg-info/requires.txt
writing top-level names to cesm_package.egg-info/top_level.txt
reading manifest file 'cesm_package.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'cesm_package.egg-info/SOURCES.txt'
running build_ext
Creating /Users/abanihi/opt/miniconda3/envs/devel/lib/python3.6/site-packages/cesm-package.egg-link (link to .)
cesm-package 0.1 is already the active version in easy-install.pth

Installed /Users/abanihi/devel/ncar-hacks/hello-cesm-package
Processing dependencies for cesm-package==0.1
Searching for matplotlib==3.1.0
Best match: matplotlib 3.1.0
Adding matplotlib 3.1.0 to easy-install.pth file

Using /Users/abanihi/opt/miniconda3/envs/devel/lib/python3.6/site-packages
Searching for numpy==1.16.4
Best match: numpy 1.16.4
Adding numpy 1.16.4 to easy-install
```
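One way to confirm that the install worked is to ask Python which version of the distribution it can see. The sketch below assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library; the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# 'cesm-package' is the `name` passed to setup(), not the import name.
print(installed_version("cesm-package"))
```

If this prints `None`, the package is not installed in the active environment, which usually means the install command was run from a different environment.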

#### Using Git/GitHub

This is my preferred way of quickly sharing, storing, and installing packages. Simply create a git repo from the project directory and host it somewhere, such as GitHub. Then use `pip` to install the package directly from that repo:


```bash 
pip install git+https://github.com/ncar-hackathons/hello-cesm-package
```

An advantage of this is that you can also install from particular branches, tags, and commits. A common practice is to add a tag for each version release, for example `git tag v0.1`. Users of the library can then lock into that particular version:


```bash 
pip install git+https://github.com/ncar-hackathons/hello-cesm-package@v0.1
```

### 4. Documentation

We’ve already begun the documentation by adding a short description, author/maintainer name, and version, but there’s a lot more to add. At a minimum, a README file should be included, but so should a license file, a record of changes between versions, and actual software documentation.

#### README

A readme file summarizes the software. For Python packages, this can be named `README`, `README.rst`, or `README.md`. The recommendation is to use [reST](http://docutils.sourceforge.net/rst.html), as this is the default format on PyPI.
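A common pattern is to reuse the README as the package's long description instead of typing it into `setup.py` by hand. The sketch below is one way to do that; the helper `readme_kwargs` is a made-up name, while `long_description` and `long_description_content_type` are real `setup()` keywords.

```python
from pathlib import Path


def readme_kwargs(project_root, filename="README.md"):
    """Build the long-description arguments for setup() from a README file."""
    readme = Path(project_root) / filename
    # Fall back to an empty description if the README is missing.
    text = readme.read_text(encoding="utf-8") if readme.exists() else ""
    # PyPI treats the text as reST unless told it is Markdown.
    content_type = "text/markdown" if filename.endswith(".md") else "text/x-rst"
    return {
        "long_description": text,
        "long_description_content_type": content_type,
    }


# In setup.py this could be used as (sketch):
#     setup(name='cesm-package', ..., **readme_kwargs(Path(__file__).parent))
```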



#### LICENSE

The usual approach is to put the license you choose in `LICENSE` (or `LICENSE.md`). There are a few choices for licenses. If you’re using GitHub, you can add most standard open source licenses through the website.

To learn more about open source licenses, see https://opensource.guide/legal/

#### CHANGES

Changes between versions are usually tracked in `CHANGELOG.rst`. This is more of a concern if you’re planning to distribute the software.


### 5. Other Things To Think Of

A good package should also include full documentation and testing. I won’t cover this here, but unit tests can be run with the `pytest` library (https://docs.pytest.org/en/latest/), with the tests stored in the `tests` directory.
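As a rough idea of what a file such as `tests/test_statistics.py` might contain, here is a minimal sketch. The `anomaly` function is a hypothetical stand-in defined inline so the example runs on its own; in a real test you would import the function under test from `cesm_package` instead.

```python
import math


def anomaly(values):
    """Hypothetical stand-in for a function that would live in cesm_package/statistics.py."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def test_sample():
    # pytest collects functions whose names start with `test_`.
    result = anomaly([1.0, 2.0, 3.0])
    assert result == [-1.0, 0.0, 1.0]
    # Anomalies are deviations from the mean, so they should sum to zero.
    assert math.isclose(sum(result), 0.0, abs_tol=1e-12)
```

Plain `assert` statements are enough; `pytest` reports the compared values when one fails.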

We can run tests with the following command:

    $ pytest tests -v
    platform darwin -- Python 3.6.7, pytest-3.9.1, py-1.8.0, pluggy-0.12.0 -- /Users/abanihi/opt/miniconda3/envs/devel/bin/python
    cachedir: .pytest_cache
    rootdir: /Users/abanihi/devel/ncar-hacks/hello-cesm-package, inifile:
    plugins: icdiff-0.2, env-0.6.2, cov-2.7.1, print-0.1.2, cpp-1.1.0
    collected 1 item

    tests/test_statistics.py::test_sample PASSED                             [100%]




Many people use Sphinx for documentation, and that can be stored in the `docs` directory.

There’s a further step required for distributing. If you want to include these files in the distribution file of your package, you’ll need a `MANIFEST.in` file.

#### `MANIFEST.in`

```
include *.py
recursive-include docs *.rst
```



### 6. Where To Go From Here?

So, at this point, we’ve got a pretty good project skeleton, and most of the basics are covered. Your package should look something like this:

```bash

project
├── CHANGELOG.rst
├── LICENSE
├── MANIFEST.in
├── README.md
├── cesm_package
│   ├── __init__.py
│   ├── climatologies.py
│   ├── demo.ipynb
│   └── statistics.py
├── docs
│   └── overview.rst
├── setup.py
└── tests
    └── test_statistics.py
```

If you have software that covers all these points, then congratulations! You’re well on your way to a well-maintained software package.


## So Where Do I Actually Go?

To get a detailed look at packaging, check out:

- [the official Python docs](https://docs.python.org/3/distributing/index.html); they’re complete, accessible, and generally more up to date.
- [NCAR pop-tools project](https://github.com/NCAR/pop-tools) for reference