# Packaging Python Projects for Fun and Profit

A Python project (a framework, a tool or an application)

* a collection of modules, packages and resources intended to solve a problem, e.g.

> * provide new functionality, new numerical method or an algorithm
> * process and dispatch requests from connected users
> * perform data collection, transformation and analytics

Goal of packaging
> a self-contained "file" for reliable transfer to, deployment on, and use in a production environment

What we are going to cover:

* organizing a project into a package
* adding external extensions and shipping data with the package
* preparing a package for distribution

<br>

## What is a Python module and package?

A Python **module** is 
* a block of code imported by some other code
* an object that serves as a **basic organizational unit** of code reusability in Python
* a namespace containing arbitrary Python objects or modules

Three types of modules: pure Python modules, extension modules, and packages
* __(pure)__ Python code contained in a single `.py` script

* __(extension)__ dynamically loadable code (`.so`, or a DLL named `.pyd` on Windows) written in a lower-level language (C/C++ for CPython)

A Python **package** is a module which contains pure modules, extensions, or recursively, other packages.
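To make the distinction concrete, the sketch below inspects a few standard-library modules. It relies on the fact that packages carry a `__path__` attribute; the helper name `describe` is ours, chosen only for illustration.

```python
import json   # a package: json/__init__.py plus sub-modules
import math   # an extension (compiled or built-in) module
import types  # a pure Python module: types.py


def describe(module):
    """Classify a module object as a 'package' or a plain 'module'."""
    assert isinstance(module, types.ModuleType)
    # only packages carry `__path__`, the list of places searched for sub-modules
    return "package" if hasattr(module, "__path__") else "module"


print(describe(json), describe(math), describe(types))  # package module module
```

Every one of the three is a module object; a package is just a module that can contain other modules.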

<br>

### Distinctions

At the file system level:

* a `.py` script is a *module*
* a directory with modules and subfolders is a *package*

        package             │  module.py
          ├── __init__.py   │
          ├── foo           │
          ├── bar           │
          │   └── baz.py    │
          └── module.py     │

At the programming level:
* a `.py` module is standalone: everything it imports comes from outside, e.g. from Python's stdlib

* a `package` can also import from within itself

<br>

### Types of package

**namespace package** is a directory that contains a bunch of Python packages or modules, but no `__init__.py`.
* serves **only** as a logical container for packages

        regular             │  namespace
          ├── __init__.py   │    ├── foo
          ├── foo           │    ├── bar
          ├── bar           │    └── baz.py
          └── baz.py        │

**regular package** is a package containing `__init__.py`, which may be empty
* a regular package may even be a lonely `__init__.py`
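The difference can be observed at runtime. The sketch below builds both kinds of package in a temporary directory and imports them; the names `demo_regular` and `demo_namespace` are made up for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# a regular package: a directory with an `__init__.py`
(root / "demo_regular").mkdir()
(root / "demo_regular" / "__init__.py").write_text("answer = 42\n")

# a namespace package: a directory without `__init__.py`
(root / "demo_namespace").mkdir()
(root / "demo_namespace" / "baz.py").write_text("value = 'baz'\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

regular = importlib.import_module("demo_regular")
namespace = importlib.import_module("demo_namespace")
baz = importlib.import_module("demo_namespace.baz")

# the regular package is backed by its `__init__.py` ...
print(regular.__file__)
# ... while the namespace package has no file of its own
print(getattr(namespace, "__file__", None))
print(baz.value)
```

Both import fine, but only the regular package can run initialization code or define objects of its own.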

<br>

### Comparison

A standalone module is compact
* new objects and functionality quickly **bloat** to an unmaintainable mess

Package is better organized
* overzealous obsessive organization may lead to **fragmentation**

Best of both worlds:
* classes that tend to be reused together belong in the same sub-package
* a sub-module has a single well-defined responsibility

<br>

### Importing from within

Within a package it is allowed and **recommended** to use [relative imports](https://docs.python.org/3/reference/import.html#package-relative-imports):
```python
from .bar import base       # object `base` from parent's module `bar`

from .. import func         # object `func` from the grandparent
```
* relative imports work only inside packages; a script that is run directly cannot use them

Here the **dot** specifies a module **relative to the package of the current** module.
Each extra `.` goes one **parent higher** in the module tree.

* `.bar` is the *sibling* module `bar`
* `..foo` is the *sibling* module `foo` of the parent
* `...bar.baz` equals "go up to the *great-grandparent* and get **bar.baz** from there"
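The resolution rule can be checked without creating any files: `importlib.util.resolve_name` turns a relative name into an absolute one, given the package of the importing module. The package name `a.b.c` below is hypothetical.

```python
from importlib.util import resolve_name

# `__package__` of a hypothetical importing module, e.g. `a/b/c/mod.py`
package = "a.b.c"

print(resolve_name(".bar", package))        # a.b.c.bar
print(resolve_name("..foo", package))       # a.b.foo
print(resolve_name("...bar.baz", package))  # a.bar.baz
```

Each leading dot beyond the first strips one trailing component from the package name before the rest of the name is appended.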

**<span style="color:red">IMPORTANT</span>** Watch out for **circular** imports,
i.e. you cannot import from a module, which imports from you!
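A minimal reproduction of the problem, assuming a throwaway package `demo_circular` (a made-up name) whose two modules need each other at import time:

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_circular"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
# `a` needs `y` from `b`, while `b` needs `x` from `a`
(pkg / "a.py").write_text("from .b import y\nx = 1\n")
(pkg / "b.py").write_text("from .a import x\ny = 2\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

try:
    importlib.import_module("demo_circular.a")
except ImportError as exc:
    error = str(exc)
    print("circular import failed:", error)
else:
    error = None
```

The import of `a` is suspended while `b` is loading, so `b` sees a half-initialized `a` in which `x` does not exist yet. A common remedy is to move the shared objects into a third module that both import.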

<br>

### Import-visible objects

When a package is imported, the `__init__.py` file is **implicitly executed**,
and the objects it defines are bound to names in the package's namespace.

These **declared or imported** objects are directly **visible and immediately
accessible**. Otherwise, you need to expose them by **importing** explicitly.

        package
          ├── __init__.py      # from . import foo
          ├── foo              <- `package.foo` visible after `import package`
          ├── bar
          │    └── zoo.py      <- `package.bar.zoo` only visible after `import package.bar.zoo`
          └── baz.py           <- `package.baz` only visible after `import package.baz`

You can import any sub-module or sub-package from a package.
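The visibility rules can be verified on a small scale. The sketch below assumes a temporary package `demo_visible` (a made-up name) whose `__init__.py` imports `foo` but not `baz`; both are plain modules here for brevity.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_visible"
pkg.mkdir()
(pkg / "__init__.py").write_text("from . import foo\n")
(pkg / "foo.py").write_text("")
(pkg / "baz.py").write_text("")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

package = importlib.import_module("demo_visible")
# right after `import demo_visible` only `foo` is bound in the namespace
before = (hasattr(package, "foo"), hasattr(package, "baz"))

importlib.import_module("demo_visible.baz")
# importing the sub-module explicitly binds it as an attribute of the package
after = (hasattr(package, "foo"), hasattr(package, "baz"))

print(before, after)  # (True, False) (True, True)
```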

<br>

## Naming and structure

Decently named modules and objects make the structure and logic of the library clear and transparent.
* keeps code organized, easier to maintain and reuse.

        mtfusion
          ├── __init__.py              <- `.pipeline` refers to `mtfusion.pipeline`
          ├── pipeline.py              <- `.config.json` resolves as `mtfusion.config.json`
          ├── core
          │   ├── __init__.py
          │   ├── load.py
          │   └── model
          │       ├── __init__.py
          │       ├── attn.py          <- `..` refers to `mtfusion.core`
          │       └── lstm.py          <- `...config.json` resolves as `mtfusion.config.json`
          └── config
              ├── __init__.py          <- `.` refers to `mtfusion.config`
              ├── yaml.py
              └── json.py

The single responsibility principle entails that a Python module implements a single
well-defined distinct piece of the solution. In the above example,

* `core.model.attn` -- attention models solving the core problem
* `core.model` -- general models solving the core problem
* `core.load` -- loading dataset for the core problem
* `core` -- solution of the core problem

Implementing the same in a single `module.py` is a **mess**
* difficult to navigate the code when developing
* hard to maintain and extend the functionality
* semi-related functionality is mixed together
* troublesome to communicate to collaborators

<br>

### Running modules as scripts with \_\_main\_\_.py

Any Python module may have an [ifmain](https://docs.python.org/3/library/__main__.html#module-__main__)
section which is executed only if run as a script.

* this makes the module both importable and directly executable.

```python
# ./script.py
def experiment_run(a=1, b="two"):
    pass

if __name__ == '__main__':
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('a', type=int)
    parser.add_argument('b', type=str)

    # vars() turns the parsed argparse Namespace into a dict
    options = vars(parser.parse_args())
    experiment_run(**options)
```
call with `python script.py [arguments]`
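To see what `vars()` produces, the same parser can be fed an explicit argument list instead of the command line. This is a small sketch; the values `'3'` and `'four'` are arbitrary.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('a', type=int)
parser.add_argument('b', type=str)

# pass the arguments explicitly instead of reading them from `sys.argv`
namespace = parser.parse_args(['3', 'four'])
options = vars(namespace)

print(namespace.a, namespace.b)  # 3 four
print(options)                   # {'a': 3, 'b': 'four'}
```

The keys of `options` match the parameter names of `experiment_run`, which is what makes the `**options` call work.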

<br>

In a package this can be done in `__main__.py`

```python
# ./package/__main__.py
import argparse
from .experiment import run  # `from package.experiment import run`

parser = argparse.ArgumentParser()
parser.add_argument('a', type=int)
parser.add_argument('b', type=str)

run(**vars(parser.parse_args()))
```
call with `python -m package [arguments]` (run as module)
* [PEP366](https://www.python.org/dev/peps/pep-0366/) allows relative imports to work correctly from main modules executed with the `-m` switch
* running as script `python package/__main__.py` fails, because the relative import has no parent package to resolve against
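Both ways of launching can be compared side by side. The sketch below writes a tiny package `demo_main` (a made-up name) into a temporary directory and starts it in a child interpreter, once with `-m` and once as a plain script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_main"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "experiment.py").write_text(
    "def run(a, b):\n"
    "    print(f'a={a} b={b}')\n"
)
(pkg / "__main__.py").write_text(
    "import argparse\n"
    "from .experiment import run\n"
    "\n"
    "parser = argparse.ArgumentParser()\n"
    "parser.add_argument('a', type=int)\n"
    "parser.add_argument('b', type=str)\n"
    "run(**vars(parser.parse_args()))\n"
)


def launch(*args):
    """Run a child interpreter from `root` and capture the outcome."""
    return subprocess.run(
        [sys.executable, *args], cwd=root, capture_output=True, text=True
    )


as_module = launch("-m", "demo_main", "3", "four")
as_script = launch(str(pkg / "__main__.py"), "3", "four")

print(as_module.returncode, as_module.stdout.strip())
print(as_script.returncode, as_script.stderr.strip().splitlines()[-1])
```

The first call succeeds, while the second one stops at the relative import with an `ImportError`.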


[ifmain](https://github.com/ivannz/fds2020-packaging.git)

<br>

# Preparing package distribution

A **distribution** is
* a collection of Python packages bundled into a single downloadable resource
* meant to be installed en masse onto an existing Python installation

A distribution is not a standalone application

As a developer, your responsibilities (apart from writing solid, well-documented and well-tested code, of course!) are:

* write a setup script using `setuptools`

* (optional) write a setup configuration file

* create a source distribution

* (optional) create one or more built (binary) distributions

### `setuptools`

You will actually never need to run the following install
```bash
pip install setuptools
```

* provides support for building and installing additional modules into a Python installation

```python
import setuptools

from setuptools import setup, Extension
```

* a successor to `distutils` from Python's *stdlib* (`distutils` is deprecated and was removed from the stdlib in Python 3.12)

```python
from distutils.core import setup, Extension
```

* use `numpy` enhanced `setup` for scientific distributions (mostly for easier linking and fortran code)

```python
import setuptools

from numpy.distutils.core import setup, Extension
```

<br>

## Essential `setup.py`

Every package distribution needs a setup script to be properly built and installed.

```python
# setup.py
from setuptools import setup, find_packages

setup(
    name="project",            # name of the distribution (may not be the same as the package name)
    version="0.1",             # version of the release
    packages=find_packages(),
)
```

`python setup.py [...]` builds, distributes, and installs modules and packages, lists requirements

<br>

### Name

This is the `name` of the distribution, which does not necessarily coincide with the package name

<br>

### Version

`version` of the package release, e.g. `major.minor[.patch[.sub]]`, helps with tracking changes and incompatibilities
* e.g. automatically upgrading experiment specifications from older versions

[Semantic Versioning](https://semver.org/) uses `major.minor.patch`
* The major number is **0** for initial, experimental releases of software
    * API is **work-in-progress** and not guaranteed to be consistent between minor releases

* The minor number is incremented when important new features are added to the package.

* The patch number is incremented for backwards-compatible bug-fix releases.
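The increment rules can be sketched in a few lines. The helper `bump` below is purely illustrative, handles only plain `major.minor.patch` strings, and is not part of any packaging tool.

```python
def bump(version, part):
    """Return `version` ("major.minor.patch") with `part` incremented."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("0.9.3", "patch"))  # 0.9.4   bug fix
print(bump("0.9.3", "minor"))  # 0.10.0  new feature
print(bump("0.9.3", "major"))  # 1.0.0   first stable release
```

Note that incrementing a number resets everything to the right of it, and that `0.9` is followed by `0.10`, not `1.0`.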

Version may also specify [extra information](https://setuptools.readthedocs.io/en/latest/setuptools.html#specifying-your-project-s-version)
such as pre-release tags, release candidates, post-release tags, and local builds.

        from packaging.version import Version, VERSION_PATTERN

        # VERSION_PATTERN?

<br>

## Content of the distribution

Modules may be either pure Python, or extension modules written in C/C++
* or collections of packages, which include modules both Python and C/C++

### Distributing individual modules

Sometimes a `.py` module with "640 lines of code ought to be [enough for anybody](https://quoteinvestigator.com/2011/09/08/640k-enough/)".


* `py_modules` lists all standalone Python modules specified by module name (w/o `.py`)

        #         root
        #           ├── tiny.py
        #           └── setup.py

```python
from setuptools import setup

setup(
    name="TinyProject",
    version="0.1",
    py_modules=["tiny"],       # module names, without the `.py` extension
)
```

`pip` tracks contents of such small distributions with `name`, so
```bash
pip uninstall TinyProject   # removes tiny.py and the related meta information
```

* **HOWEVER** imports are done like this:
```python
import tiny  # same as in `py_modules`
```
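The split between the two names also shows up at runtime: installation metadata is looked up by the distribution name, while code is imported by the module name. A minimal sketch using the standard `importlib.metadata` (Python 3.8+); the helper name `installed_version` is ours.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# metadata is keyed by the distribution name ("TinyProject"),
# even though the code itself is imported as `tiny`
print(installed_version("TinyProject"))
```

This prints `None` unless `TinyProject` has actually been installed into the current environment.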

[tinyproj](https://github.com/ivannz/fds2020-packaging.git)

<br>

### Distributing whole packages

In order to distribute an organized reusable Python package you need to list its structure in `packages`
* package names correspond to directories in the filesystem
* by default relative to `setup.py`, can be overridden with `package_dir`

For packages with simple directory structure `find_packages` can automatically populate `packages`
* **scans** a specified directory for packages, that include an **\_\_init\_\_.py** file
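The scan can be approximated with the standard library alone. The function below is a simplified illustration of the idea, not the actual `setuptools` code: it ignores the `include` and `exclude` filters, and the directory names are invented.

```python
import os
import tempfile
from pathlib import Path


def find_regular_packages(where="."):
    """List dotted names of directories under `where` that hold an `__init__.py`."""
    found = []
    for folder, subfolders, files in os.walk(where):
        if Path(folder) == Path(where):
            continue  # the root itself is not a package
        if "__init__.py" not in files:
            subfolders[:] = []  # do not descend below a non-package directory
            continue
        relative = Path(folder).relative_to(where)
        found.append(".".join(relative.parts))
    return sorted(found)


root = Path(tempfile.mkdtemp())
for name in ["project", "project/core", "docs"]:
    (root / name).mkdir(parents=True)
for name in ["project", "project/core"]:
    (root / name / "__init__.py").write_text("")

print(find_regular_packages(root))  # ['project', 'project.core']
```

The `docs` directory is skipped because it has no `__init__.py`, which is also why namespace packages are not picked up by this kind of scan.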

        from setuptools import find_packages

        help(find_packages)  # or `find_packages?` in IPython/Jupyter

In some cases, however, it is better to manually specify packages:

        root
          ├── src
          │   ├── __init__.py
          │   └── core.py
          ├── project
          │   ├── __init__.py
          │   └── experiment.py
          └── setup.py

```python
from setuptools import setup

setup(
    name="CharmingCerf",
    version="0.1",
    package_dir={'project.core': 'src'},
    packages=[
        'project',
        'project.core',
    ],
)
```

[charmingcerf](https://github.com/ivannz/fds2020-packaging.git)
[data-n-stuff](https://github.com/ivannz/fds2020-packaging.git)

<br>

### Declaring extension modules

Unlike pure Python modules, which just need to be copied, modules written in C/C++ must be compiled for installation

* `ext_modules` -- a list of `Extension` instances, each describing a single extension module

```python
from setuptools import setup, Extension

setup(
    ...,
    ext_modules=[
        Extension(...),
    ],
)
```

`Extension` arguments
* `name` -- extension's name indicating where in Python’s namespace hierarchy the resulting extension lives

* `sources` -- a list of C/C++/Objective-C source files
* `libraries` -- the libraries to link against when building
* `include_dirs` -- include directories in which to search required header files for compilation

* [other options](https://docs.python.org/3/distutils/setupscript.html#other-options)
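How the dotted `name` relates to a file on disk can be sketched with the standard library. The snippet assumes that a compiled extension ends up under a path derived from its dotted name plus one of the suffixes this interpreter recognizes; the extension name `project.core._fast` is hypothetical.

```python
from importlib.machinery import EXTENSION_SUFFIXES


def extension_path(name):
    """Map a dotted extension name to a file path this interpreter could load."""
    # the first suffix is the most specific one, e.g. it encodes the ABI tag
    return name.replace(".", "/") + EXTENSION_SUFFIXES[0]


print(EXTENSION_SUFFIXES)
print(extension_path("project.core._fast"))
```

The suffixes differ between platforms and Python versions, which is why a compiled extension built for one interpreter is generally not loadable by another.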

#### Cython extensions

Cython code files (`.pyx|.pxd`) should be converted to C/C++ source
* If Cython is installed then `Extension` is smart enough to do it automatically

```python
from setuptools import setup
from Cython.Build import cythonize

setup(
    ...,
    ext_modules=cythonize([...])
)
```

[extension-galore](https://github.com/ivannz/fds2020-packaging.git)

<br>

### Developing package in editable mode

It's common to locally install your project in "editable" mode while you're working on it
* the project is built and installed as a Python package and accessible
* you can alter it at any time, and the changes and updates you make are automatically visible
<!-- * this will also install any dependencies declared with "install_requires", if necessary -->

```bash
# in the directory of `setup.py`
pip install -e .
```

If you change an extension source code, it must be rebuilt:

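```bash
# in the directory of `setup.py`
# `--inplace` puts the compiled files next to the sources, where the editable install looks for them
python setup.py build_ext --inplace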
```

<br>

## Types of distribution

### Source distribution (`sdist`)

Bundles metadata and the **essential** source files needed for building and installing
* `setup.py` setup script, package *resources*, standard README files, e.g. `.md|.txt|.rst`
* Python source files from `py_modules` and `packages`
* C/C++ source files in `ext_modules` or `libraries`
* [and other assets](https://docs.python.org/3/distutils/sourcedist.html#specifying-the-files-to-distribute)

To create an archive with the setup script `setup.py` and package source:

```bash
python setup.py sdist
# python setup.py sdist --help-formats
```

The archive's name is determined by the `name=...` and `version=...` keywords of `setup()`.

[any example](https://github.com/ivannz/fds2020-packaging.git)

<br>

### Wheel distribution (`bdist_wheel`)

**wheel** is a "binary package" containing Python source code and/or compiled extension modules,
designed to distribute Python solutions
* allows installation on the target system without needing to go through the "build" process
* contains files and metadata which only need to be **moved** to the correct location

```bash
# pip install wheel
python setup.py bdist_wheel
```

Other platform-specific *bdist* are being [deprecated](https://www.python.org/dev/peps/pep-0527/) in favour of `wheel` or `sdist`.

Three types of wheels:

* **Pure Python**: runnable on any Python installation with the **same major version as the one used to build the wheel** (no compiled extensions)

* **Platform**: code using compiled extensions, which requires a platform-specific build, e.g. for Linux, macOS, or Windows

* **Universal**: pure Python natively supporting **both** 2 **and** 3 (compiled extensions forbidden)
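The kind of a wheel can be read off its file name, which ends with the Python, ABI and platform tags. The classifier below is a rough sketch of that convention; the file names are invented examples.

```python
def wheel_kind(filename):
    """Classify a wheel by the compatibility tags in its file name."""
    stem = filename[: -len(".whl")]
    # layout: name-version[-build]-python-abi-platform
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    if abi_tag != "none" or platform_tag != "any":
        return "platform"
    pythons = python_tag.split(".")
    if "py2" in pythons and "py3" in pythons:
        return "universal"
    return "pure"


for filename in [
    "project-0.1-py3-none-any.whl",
    "project-0.1-py2.py3-none-any.whl",
    "project-0.1-cp38-cp38-manylinux1_x86_64.whl",
]:
    print(filename, "->", wheel_kind(filename))
```

Installers compare these tags against the target interpreter to decide whether a given wheel can be used at all.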

[any example](https://github.com/ivannz/fds2020-packaging.git)

<br>

### Relationships between Distributions and Packages

A distribution can declare which modules or packages it **requires**, **provides** and **obsoletes**
* useful for dependency manager only, but **does not force** the prerequisites to be met


`setup()` keywords:
* `requires=...` : a list of required dependencies on other Python modules/packages
    * optional comma-separated list of version qualifiers, which must ALL be met (logical AND)


        ["scikit-learn (>=0.19)", "networkx (>=1.0, !=1.11, <2.0)", "cvxopt==3.2.1"]

* `provides=...` a list of Python modules or packages the distribution provides for others

* `obsoletes=...` a list, like `requires`, of other packages that are obsoleted by the package
    * if no qualifiers are given, all versions of a module/package are obsoleted
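The "logical AND" of version qualifiers can be illustrated with a deliberately simplified checker. It only understands plain dotted release numbers; real tools rely on a full implementation such as `packaging.specifiers.SpecifierSet`. The helper names are ours.

```python
import operator

OPERATORS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}


def release(version):
    """Turn "1.11" into the tuple (1, 11), dropping trailing zeros."""
    parts = [int(piece) for piece in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version, qualifiers):
    """Check `version` against comma-separated qualifiers; all must hold."""
    for qualifier in qualifiers.split(","):
        qualifier = qualifier.strip()
        # try two-character operators before their one-character prefixes
        symbol = next(op for op in sorted(OPERATORS, key=len, reverse=True)
                      if qualifier.startswith(op))
        bound = qualifier[len(symbol):].strip()
        if not OPERATORS[symbol](release(version), release(bound)):
            return False
    return True


qualifiers = ">=1.0, !=1.11, <2.0"  # taken from the `networkx` entry above
for version in ["0.9", "1.10", "1.11", "2.0"]:
    print(version, satisfies(version, qualifiers))
```

Only `1.10` passes: `0.9` is too old, `1.11` is explicitly excluded, and `2.0` hits the upper bound.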

<br>

## Additional options

### Force requirements

It is possible to enforce requirements in `setup.py` with `install_requires`
* a list of strings specifying what other distributions should be installed

```python
setup(
    ...,
    install_requires=[
        "scipy",
        "numpy >=1.10.4",
        "cloudpickle >=1.2.0, <1.4.0",
        "gym[atari] >=0.16, !=1.16.1",
        # direct reference to a VCS revision (placeholder URL)
        "package @ git+https://example.com/repo.git@revision",
    ],
    python_requires='>=3.5',
)
```

<br>

### Distribution traits

Sometimes your package can have optional functionality,
e.g. `pip install 'gym[atari,box2d]'` in OpenAI Gym

These optional extras can be specified with `extras_require` dictionary of lists:
```python
setup(
    ...,
    extras_require={
        'atari': ['atari_py~=0.2.0', 'Pillow', 'opencv-python'],
        'box2d': ['box2d-py~=2.3.5'],
        'classic_control': [],
        'mujoco': ['mujoco_py>=1.50, <2.0', 'imageio'],
        'robotics': ['mujoco_py>=1.50, <2.0', 'imageio'],
    }
)
```

<br>

### Distributing Package Data and Resources

Package data covers files relevant to the package's implementation, documentation for the end users of the package, configuration files, and user message catalogs

* `package_data` -- a dict mapping a package to a list of relative paths to include in the package
    * files are expected to be part of the package in the source directories

        #     root
        #       ├── project
        #       │   ├── ...
        #       │   └── samples
        #       │       └── *.dat
        #       └── setup.py
```python
setup(
    ...,
    package_data={
        'project': [
            'samples/*.dat',
        ],
    },
)
```

[data-n-stuff](https://github.com/ivannz/fds2020-packaging.git)
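Once data is shipped inside a package, it is best located relative to the package rather than the working directory. The sketch below uses the standard `importlib.resources.files` (Python 3.9+) on a temporary package; the names `demo_project` and `first.dat` are made up.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

root = Path(tempfile.mkdtemp())
samples = root / "demo_project" / "samples"
samples.mkdir(parents=True)
(root / "demo_project" / "__init__.py").write_text("")
(samples / "first.dat").write_text("1 2 3\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# locate the file through the package, wherever it happens to be installed
data_file = resources.files("demo_project") / "samples" / "first.dat"
content = data_file.read_text()

print(content.split())  # ['1', '2', '3']
```

The same lookup keeps working after the package is installed from a wheel, because it never hard-codes a filesystem path.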

<br>

### Optional Package Meta-Data

[Additional meta-data](https://docs.python.org/3/distutils/setupscript.html#additional-meta-data)
is good to keep filled in, so that the end users of your package know the necessary details such as
* who is the author, and whom to contact for support and bug reports
* [license](https://choosealicense.com/) covering the terms and limitations
* description

```python
setup(
    ...,
    author="Author Name",
    author_email="author@email.io",
 
    description=
"""Our package implements  a framework for end-to-end irc app  integration and"""
""" uses Spring  Boot to serve  the application,  leveraging Core  Annotations"""
""" to manage the user model, model binding and migration code, and additional"""
""" dependencies on Vault and Spring Boot.""",

    keywords="lorem ipsum dolor sit amet consectetur adipiscing elit",
    license="MIT License",
)
```

<br>

# Practical

You are given a file `flamboyant_lamarr.py` which works and does something.

Your goal is to make it into a better organized package that can be run as a script.
* create a couple of files and directories here and there
* move some blocks of code around with the necessary glueing imports
* try to separate code from data and put the latter in some resource file (`csv`, `json`, `yaml`, whatever)
* write a `setup.py`, build a `wheel`, and run

```bash
python -m <the name of your choosing> 100
```

Try to be creative and reasonable with the structure:
* what is the most natural way of grouping code?
* which code is service and tools?
* what is core functionality?

[flamboyant_lamarr](https://github.com/ivannz/fds2020-packaging.git)

<br>

## Glossed over details of `setup()` 

* other [keywords](https://setuptools.readthedocs.io/en/latest/setuptools.html#new-and-changed-setup-keywords)
* specifying setup via [setup.cfg](https://docs.python.org/3/distutils/configfile.html)

https://setuptools.readthedocs.io/en/latest/setuptools.html

<br>


# Trunk

Here is how you export different logic and functionality inside the module.

```python
# module/__init__.py
from .pipeline import run, save  # import `run`, `save` from sub-module `pipeline`
from .config import load         # import `load` from sub-package `config`


# module/core/__init__.py
from .load import load           # ! overrides exposing .load
from .model import build


# module/core/model/__init__.py
from .attn import Attention      # `Attention` from child `attn`
from .lstm import LSTM           # `LSTM` from child `lstm`

def build(options):
    # build model from options
    ...

# module/config/__init__.py
from . import json, yaml         # import sub-modules `json` and `yaml` of this package

def load(filename):
    # dispatch to json.load or yaml.load depending on filename's extension
    ...
```
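The dispatching `load` hinted at in `config/__init__.py` could look roughly like the sketch below. It is one possible design, not the course's reference solution: loaders are kept in a dictionary keyed by file extension, and only the JSON loader is filled in, since YAML needs a third-party library.

```python
import json
import tempfile
from pathlib import Path


def load_json(filename):
    """Read a JSON config file into a dict."""
    with open(filename) as stream:
        return json.load(stream)


# extension -> loader; a YAML loader would be registered the same way
LOADERS = {".json": load_json}


def load(filename):
    """Pick a loader based on the extension of `filename`."""
    extension = Path(filename).suffix.lower()
    try:
        loader = LOADERS[extension]
    except KeyError:
        raise ValueError(f"unsupported config format: {extension!r}") from None
    return loader(filename)


path = Path(tempfile.mkdtemp()) / "config.json"
path.write_text('{"target": "result.pkl"}')
print(load(path))  # {'target': 'result.pkl'}
```

Keeping the mapping in one place means that supporting a new format is a one-line change, and the callers of `load` stay untouched.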

This allows you to write something on the lines of

```python
import module

from module import run                # `run` exposed by `module`
from module import config as cfg      # `config` of `module` aliased as `cfg`
from module.core import build, load   # `build` and `load` from submodule `core`


config = cfg.load("./experiments/config__20200416__grid.json")
config = cfg.fix_defaults(config)

result = run(build(config), load("./data/records.csv", config))

module.save(result, config["target"])
```

<br>

The most basic version is the release number -- a **series of numbers separated by dots**. Each number is compared numerically, ignoring leading zeros, and specifies the ordinal number of the *release*, *subrelease*, *subsub-release* and so on.

Versions are comparable
* `0.9 < 0.10 < 1.10.01 < 1.90.0`

**Release zero** usually indicates that the API of your library is **work-in-progress** and is not guaranteed to be consistent even between subreleases.

        Version("0.0.5") <= Version("0.1.2"), \
        Version("1.3.4") <= Version("1.3.12"), \
        Version("2019.12.31") <= Version("2020.4")

A version consists of an alternating series of release numbers and
pre-release or post-release tags.

* pre-release tags are older than the release number (`a|b|c|rc|alpha|beta|pre|preview` with a number)
    * `1.2a-1` $\leq$ `1.2b` $\leq$ `1.2b-3` $\leq$ `1.2`

* post-release tags, e.g. patches, ports, builds, revisions, or date stamps, are newer than the release (`post|rev|r` or `-[0-9]`)
    * `1.2` $\leq$ `1.2-1` $\leq$ `1.2post-2`

<br>