(04:Package-structure-and-state)=
# Package structure and state
<hr style="height:1px;border:none;color:#666;background-color:#666;" />

The previous chapter provided a high-level overview of the key steps involved in developing a Python package. The first step in that process is to create the file and directory structure of your package. We typically recommend creating this structure from a pre-made template, like we did in the previous chapter with the help of the `cookiecutter` package. However, it's important to know why this particular structure is used in Python packaging. This chapter describes in more detail what packages actually are, why they have this structure, and how we can leverage it for different functionality.

We'll first discuss some useful underlying theory, then dive back into the practicals.

## Packaging fundamentals

Part of the beauty of Python is that it abstracts lower-level implementation details away from users who don't need to understand them. However, to gain a deeper understanding of Python packages, it's useful to have a basic familiarity with some of these lower-level details, which we'll explore in this section.

Firstly, all data in a Python program is represented by objects or by relations between objects. For example, integers and functions are kinds of Python objects. We can find the type of a given object using the built-in `type()` function, as demonstrated in the examples below:

```{prompt} bash \$ auto
$ python
```

```{prompt} python >>> auto
>>> a = 1
>>> type(a)
```

```python
int
```

```{prompt} python >>> auto
>>> def hello_world(name):
        print(f"Hello world! My name is {name}.")
>>> type(hello_world)
```

```python
function
```

In the above code, we created an integer object and a function object, which are mapped to the names `a` and `hello_world` respectively.

Now, the object relevant to our discussion of Python packages is the module object. A module is an object that serves as an organizational unit of Python code. In the simplest case, this code is stored in a file with a .py suffix and is imported using the `import` statement. The created module object’s name is the name of the imported file (excluding the .py suffix). For example, imagine we have a module `greetings.py` in our current directory containing functions to print "Hello World!" in English, German, and Spanish:

```python
def hello_world():
    print("Hello World!")


def hallo_welt():
    print("Hallo Welt!")


def hola_mundo():
    print("Hola Mundo!")
```

We can import that module using the `import` statement and can use the `type()` function to verify that we created a module object which has been mapped to the name "greetings" (the name of the file):

```{prompt} python >>> auto
>>> import greetings
>>> type(greetings)
```

```python
module
```

As mentioned earlier, this module object is an organisational unit of code. We say this because the contents of the module (in this case, the three different "Hello World!" functions) can be accessed via the module name and "dot notation". For example:

```{prompt} python >>> auto
>>> greetings.hello_world()
```

```python
Hello World!
```

>As a real-world analogy, think of your bedroom as a module. Your possessions (like your bed, computer, pet Python, etc.) are contained and organised within your bedroom and can be accessed via your bedroom door (it seems efficient to have your possessions in a single room rather than scattered around the house).

At this point in our discussion, it's useful to mention Python's namespaces. A "namespace" in Python is simply a mapping from names to objects. From the code examples above, we've added the symbolic names `a` (an integer), `hello_world` (a function), and `greetings` (a module) to the current namespace and can use those names to refer to the objects we created. Python provides various tools for introspecting namespaces, one of which is the `dir()` function, which, when called with no arguments, returns a list of names currently defined:

```{prompt} python >>> auto
>>> dir()
```

```python
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
 '__package__', '__spec__', 'a', 'greetings', 'hello_world']
```

In the output above, we can see the names of the three objects we defined: `a`, `hello_world`, and `greetings`. The other names prefixed with double underscores are objects that were initialised automatically when we started the Python interpreter and are implementation details that aren't important to our discussion here, but can be read about in the [Python documentation](https://docs.python.org/3/reference/executionmodel.html?highlight=__builtins__#execution-model). To help focus on just the names we specifically defined, we can ignore those names prefixed with a double underscore using the following list comprehension:

```{prompt} python >>> auto
>>> [name for name in dir() if not name.startswith("__")]
```

```python
['a', 'greetings', 'hello_world']
```

Namespaces are created at different moments, have different lifetimes, and can be accessed from different parts of a Python program - but these details digress from the text and we point interested readers to the [Python documentation](https://docs.python.org/3/tutorial/classes.html#python-scopes-and-namespaces) to learn more. The important point to make here is that, when a module is imported using the `import` statement, a module object is created and it has its own namespace populated by the Python code (i.e., definitions and statements) within that module. We can access that code via the module name and "dot notation" as we did earlier. In this way, the module object isolates a codebase and provides us with a clean, logical, and organised way to access code. We can view all the names a module defines (i.e., its namespace) by passing the module object in as an argument to the `dir()` function:

```{prompt} python >>> auto
>>> [name for name in dir(greetings) if not name.startswith("__")]
```

```python
['hallo_welt', 'hello_world', 'hola_mundo']
```
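To make the idea of a namespace as a "mapping from names to objects" more concrete, here is a minimal sketch. It uses the standard library's `math` module purely for illustration (it is not part of the `greetings` example) and the built-in `vars()` function, which returns a module's namespace as a dictionary:

```python
import math

# A module's namespace is stored as a dictionary mapping names to objects
namespace = vars(math)

# Looking a name up in that dictionary gives the same object as dot notation
same_object = namespace["pi"] is math.pi

# dir() lists the names in the namespace, in sorted order
public_names = [name for name in dir(math) if not name.startswith("__")]

print(same_object)
print("sqrt" in public_names)
```

In other words, "dot notation" is just a convenient way of looking up a name in the module's namespace.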

A final point to stress is that there is no relation between names in different namespaces. For example, the Python session we've been running in this section now gives us access to two `hello_world` functions; one that was defined in our interactive interpreter, and one defined in the `greetings` module. While these functions have the exact same name, there is absolutely no relation between them because they exist in different namespaces; `greetings.hello_world` exists in the `greetings` module namespace and `hello_world` exists in the top-level "global" namespace. So, we can access both with the appropriate syntax:

```{prompt} python >>> auto
>>> hello_world("Tom")
```

```python
Hello world! My name is Tom.
```

```{prompt} python >>> auto
>>> greetings.hello_world()
```

```python
Hello World!
```
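The same behaviour can be seen in the standard library. As a small sketch (the `math` and `cmath` modules are used here only as an illustration), both modules define a function named `sqrt`, but the two names live in different namespaces and refer to unrelated objects:

```python
import cmath
import math

# Same name, different namespaces, different objects
functions_are_identical = math.sqrt is cmath.sqrt
print(functions_are_identical)

# Each function is reached through its own module name
real_root = math.sqrt(4)       # works with real numbers
complex_root = cmath.sqrt(-4)  # works with complex numbers
print(real_root)
print(complex_root)
```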

Now that we have a basic understanding of modules, we can further discuss packages. In fact, it's a pretty intuitive transition from modules to packages: simply speaking, Python packages are just a collection of one or more modules. Packages provide another level of abstraction for our code base and allow us to group and organise modules (as well as non-code files, like data; but more on that later) in one place for easy access and distribution.

A useful analogy to remember the distinction between modules and packages is to think of a file and directory structure on your computer: directories are packages and files within those directories are individual modules. Just like the file system on your computer, a root directory (package) may contain files (modules) and/or subdirectories (which we would call subpackages).

While this analogy holds at the conceptual level, the distinction between modules and packages in Python is a little more vague. In fact, regardless of whether you `import` a single module or a package, Python will create a module object in the current namespace. For example, let's import the `partypy` package we created in **Chapter 3: {ref}`03:How-to-package-a-Python`** and check its type:

```{note}
If you're following on from **Chapter 3: {ref}`03:How-to-package-a-Python`**, recall that we created and installed our `partypy` package in a virtual environment which can be activated by running `conda activate partypy` in the terminal.
```

```{prompt} python >>> auto
>>> import partypy
>>> type(partypy)
```

```python
module
```

Just as before, we can access the contents of a package via "dot notation". For example:

```{prompt} python >>> auto
>>> from partypy.simulate import simulate_party
```

While we get a module object regardless of whether we import a module or a package, the key difference between a module and a package to Python is that packages are module objects that have a `__path__` attribute. This `__path__` attribute basically tells Python where to look when importing the contents (modules or sub-packages) of your package. For example, let's check that the `partypy` module object we just created does indeed have an attribute called `__path__`, but the `greetings` module we imported from `greetings.py` earlier does not:

```{prompt} python >>> auto
>>> partypy.__path__
```

```python
['/Users/tomasbeuzen/GitHub/py-pkgs/partypy/src/partypy']
```

```{prompt} python >>> auto
>>> greetings.__path__
```

```python
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'greetings' has no attribute '__path__'
```
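If you don't have `partypy` or `greetings.py` to hand, the same check can be sketched with the standard library. The example below assumes a typical CPython installation, in which `json` is a package (a directory containing an `__init__.py` file) and `string` is a single-file module:

```python
import json    # a package in the standard library
import string  # a single-file module in the standard library

for module in (json, string):
    # Both are module objects, but only the package has a __path__ attribute
    is_package = hasattr(module, "__path__")
    print(f"{module.__name__}: type={type(module).__name__}, package={is_package}")
```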

>If you're interested in learning more about Python's import system, we recommend checking out the [Python documentation](https://docs.python.org/3/reference/import.html).

We'll talk a bit more about the nuances of importing from packages in the next section, but the final point to stress here once again is that, due to the way Python's namespaces work and by leveraging dot notation, we can easily use different packages that happen to have module or sub-package names in common. For example, the popular `numpy` and `pandas` packages both have a sub-package called `core`, which we are able to access independently using dotted module names:

```{prompt} python >>> auto
>>> import pandas as pd
>>> pd.core
```

```python
<module 'pandas.core' from '/opt/homebrew/Caskroom/miniforge/base/envs/partypy/lib/python3.9/site-packages/pandas/core/__init__.py'>
```

```{prompt} python >>> auto
>>> import numpy as np
>>> np.core
```

```python
<module 'numpy.core' from '/opt/homebrew/Caskroom/miniforge/base/envs/partypy/lib/python3.9/site-packages/numpy/core/__init__.py'>
```

Just as there is no relation between names in different modules, there is no relation between names (modules and sub-packages) in different packages.

## Package structure

In this section we'll talk about some of the more practical considerations of structuring Python packages.

### Types of packages

Python actually supports two kinds of packages: "regular packages" and "namespace packages".

A regular package is what most readers will be familiar with and use. Regular packages are typically implemented as a directory containing an `__init__.py` file. When a regular package is imported, this `__init__.py` file is implicitly executed, and the objects it defines are bound to names in the package's namespace. The `__init__.py` file can contain the same Python code that any other module can contain, and Python will add some additional attributes to the module when it is imported.

```
parent/
    __init__.py
    one/
        __init__.py
    two/
        __init__.py
    three/
        __init__.py
```
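As a minimal sketch of the behaviour described above, the code below builds a throwaway regular package in a temporary directory and imports it. The package name `demo_regular_pkg` and its contents are hypothetical and exist only for this illustration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway regular package: a directory containing an __init__.py
root = Path(tempfile.mkdtemp())
package_dir = root / "demo_regular_pkg"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("greeting = 'Hello from __init__.py'\n")

# Make the temporary directory importable, then import the package
sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_regular_pkg = importlib.import_module("demo_regular_pkg")

# Names defined in __init__.py are now part of the package's namespace
print(demo_regular_pkg.greeting)
# __file__ points at the __init__.py that was executed on import
print(demo_regular_pkg.__file__)
```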

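A namespace package is a package without an `__init__.py` file. Its contents (called "portions") do not have to live in a single directory and can be spread across several locations on your system. Namespace packages are described in detail in the [Python documentation](https://docs.python.org/3/reference/import.html#namespace-packages). For example, if the top-level `__init__.py` file is removed from the layout above, `parent` becomes a namespace package:

```
parent/
    one/
        __init__.py
    two/
        __init__.py
    three/
        __init__.py
```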
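Namespace packages differ from regular packages in that they have no `__init__.py` file. The sketch below, which uses a hypothetical directory called `demo_namespace_pkg` created in a temporary location, shows that such a directory can still be imported as a package, although no initialisation file is executed:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway directory that contains a module but, deliberately,
# no __init__.py file
root = Path(tempfile.mkdtemp())
package_dir = root / "demo_namespace_pkg"
package_dir.mkdir()
(package_dir / "tools.py").write_text("def ping():\n    return 'pong'\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_namespace_pkg = importlib.import_module("demo_namespace_pkg")
tools = importlib.import_module("demo_namespace_pkg.tools")

# The directory is importable as a package: it has a __path__ attribute ...
print(list(demo_namespace_pkg.__path__))
# ... but there is no __init__.py behind it, so it has no file of its own
print(getattr(demo_namespace_pkg, "__file__", None))
print(tools.ping())
```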

### Importing and the \_\_init\_\_.py file

Maybe I can talk about this in the "initialization" section?

The last thing to discuss is how packages and modules are loaded. The `import` statement does two things: it loads a module, and it binds that module to a name in the current namespace. "Loading" a module means running its contents; "loading" a package means running its `__init__.py` file. When we import something from inside a package (e.g., `from partypy.simulate import simulate_party`), Python first loads the parent package(s) and then runs the whole of the requested module, regardless of the form of the `import` statement we use, so the choice of form is largely a matter of preference. Loaded modules are stored in a cache, `sys.modules`, and are only run the first time they are imported (see the [Python documentation](https://docs.python.org/3/reference/import.html#the-module-cache)). A module can therefore be loaded and present in `sys.modules` without being bound to a name in the current namespace.
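The difference between "loaded" and "bound to a name" can be sketched with the standard library's `json` package, used here only as an illustration. Its `__init__.py` imports the `json.decoder` module, so that module is loaded as a side effect of `import json`:

```python
import sys
import json

# Both the package and the sub-module it imports are in the module cache ...
print("json" in sys.modules)
print("json.decoder" in sys.modules)

# ... but only the name "json" was bound in the current namespace
try:
    decoder
except NameError:
    decoder_is_bound = False
else:
    decoder_is_bound = True
print(decoder_is_bound)

# Importing again does not re-run the module; the cached object is reused
import json as json_again
print(json_again is sys.modules["json"])
```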

We can view the namespace of `partypy` which should be composed of two modules, `plotting` and `simulate`:

```{prompt} python >>> auto
>>> [name for name in dir(partypy) if not name.startswith("__")]
```

```python
['plotting', 'simulate']
```

Finally, we can access the code within these modules using dot notation. For example:

```{prompt} python >>> auto
>>> type(partypy.simulate.simulate_party)
```

```python
function
```

### Package and module names

Python package naming guidelines and conventions are described in [Python Enhancement Proposal (PEP) 8 - Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/) and [PEP 423 - Naming conventions and recipes related to packaging](https://www.python.org/dev/peps/pep-0423/). PEP 8 and PEP 423 should be read through at least once, but the fundamental guidelines are that packages/modules should have a single, short, all-lowercase name. Underscores can be used to separate words in a name but are typically discouraged.

From a more practical point of view, we've come up with the "three M's" which recommend names be:
1. **Meaningful**: the name should somewhat reflect the functionality of the package.
2. **Memorable**: the name should be easy for users to find, remember, and relate to other packages.
3. **Manageable**: remember that users of your package will access its contents via dot notation. Make it as quick and easy as possible for them to do this by keeping your package name short and sweet. For example, imagine if we called our `partypy` package something like `simulate_party_attendance`. Every time a user wanted to access the `simulate_party()` function, they'd have to write this: `from simulate_party_attendance.simulate import simulate_party` - yikes!
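The mechanical parts of these guidelines can be checked in code. The function below is a hypothetical helper written for this illustration (it is not part of any packaging tool); it only tests that a name is importable and all-lowercase, and it can't judge whether a name is meaningful, memorable, or manageable:

```python
import keyword


def follows_naming_guidelines(name):
    """Hypothetical helper: rough check of the basic naming guidelines."""
    return (
        name.isidentifier()              # must be a valid Python name
        and not keyword.iskeyword(name)  # reserved words can't be imported
        and name == name.lower()         # all-lowercase
    )


candidates = ["partypy", "simulate_party_attendance", "PartyPy", "party-py"]
for candidate in candidates:
    # Length is printed too, as a reminder that shorter names are easier to type
    print(candidate, len(candidate), follows_naming_guidelines(candidate))
```

Note that `simulate_party_attendance` passes the check even though it is unwieldy, which is why the "three M's" are still worth keeping in mind.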

### Package layout

The package structure we've been using so far looks like this:

```
x
```

This is sometimes called the "src" or "source" layout due to putting code in the `src` folder. You often see packages with this format too:

```
x
```

We recommend the former.

There are of course many different ways to bundle the other parts of your package too. We present the most standard form here, which will be useful to all readers. If a project requires a different structure, you can do that too by following the same principles presented here.


Different kinds of package structures, e.g., src layout vs non src layout

In Chapter 3, we got an error the first time we tested our code. We would not have got this error if using the non-src layout.

### Internal references

- Move binomial function to utils.py
- Reference it from the package using dot notation
- Note that if in a subpackage, use double dot notation
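As a rough sketch of these points, the code below builds a throwaway package in a temporary directory and uses relative imports to reference one module from another. All names here (`demo_internal_pkg`, `utils.py`, `double()`, and so on) are hypothetical stand-ins, not the actual `partypy` code:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout, built in a temporary directory:
# demo_internal_pkg/
#     __init__.py
#     utils.py        defines double()
#     core.py         uses "from .utils import double"  (single dot)
#     sub/
#         __init__.py
#         deep.py     uses "from ..utils import double" (double dot)
root = Path(tempfile.mkdtemp())
package_dir = root / "demo_internal_pkg"
(package_dir / "sub").mkdir(parents=True)
(package_dir / "__init__.py").write_text("")
(package_dir / "sub" / "__init__.py").write_text("")
(package_dir / "utils.py").write_text("def double(x):\n    return 2 * x\n")
(package_dir / "core.py").write_text(
    "from .utils import double\n\n\ndef quadruple(x):\n    return double(double(x))\n"
)
(package_dir / "sub" / "deep.py").write_text(
    "from ..utils import double\n\n\ndef octuple(x):\n"
    "    return double(double(double(x)))\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
core = importlib.import_module("demo_internal_pkg.core")
deep = importlib.import_module("demo_internal_pkg.sub.deep")

print(core.quadruple(1))
print(deep.octuple(1))
```

A single leading dot refers to the package the importing module lives in, and each additional dot moves one package level up.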

### Including non-code files in a package

- the init.py file
- the layout of a package
- importing internals
- adding data

- Include the guest list as a resource in the package
- Publish a new version of the package, it's not breaking, but it's a new feature, so we should bump to 0.2.0 and re-publish
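One possible way to read a non-code file that lives inside a package is the standard library's `importlib.resources` module. The sketch below assumes Python 3.9 or later (for `importlib.resources.files()`), and the package name `demo_data_pkg`, the `data` directory, and the contents of `guest_list.txt` are all made up for the illustration:

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that ships a text file alongside its code
root = Path(tempfile.mkdtemp())
package_dir = root / "demo_data_pkg"
(package_dir / "data").mkdir(parents=True)
(package_dir / "__init__.py").write_text("", encoding="utf-8")
(package_dir / "data" / "guest_list.txt").write_text("Tom\nTiffany\n", encoding="utf-8")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
importlib.import_module("demo_data_pkg")

# Locate the file relative to the package, not relative to the working directory
resource = importlib.resources.files("demo_data_pkg") / "data" / "guest_list.txt"
guests = resource.read_text(encoding="utf-8").splitlines()
print(guests)
```

Locating the file through the package means the code keeps working once the package is installed somewhere other than your working directory.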

## Package state

We'll finish this chapter with a discussion of package states to better understand the different states our package holds as we develop it. Once again, this departs from the practical focus of this book to provide some notable theory which may help the intermediate Python user/packager better grasp the workings of packages.

By "package" here we mean the code that you wish to bundle up and distribute. In Python, your package can be in several different states depending on its complexity, target audience, and stage of development. The ones we'll talk about here are:

- Modules
- Packages
- Source distributions
- Built distributions
- Binary distributions
- Installed packages
- Imported packages
- Python applications

You've already seen some of the commands that put packages into these various states. For example, `poetry build` at the command line or `import` in a Python session. In the following sections, we'll be giving those operations some context.

```{figure} images/packaging-flowchart.png
---
width: 75%
name: 04-package-flowchart-2
alt: The Python packaging workflow.
---
The Python packaging workflow.
```

### Source Code  (probably delete)

#### Modules

A module is any Python `.py` file. A module may consist of Python functions, classes, variables, and/or runnable code. A module that relies *only* on the standard Python library can easily be distributed and used by others (on the appropriate version of Python). In this way, a module can be thought of as a very simple package. For example, consider a module `simple_math.py` that contains the functions `list_range` and `odd_even`:

```python
def list_range(x):
    return max(x) - min(x)
  
def odd_even(x):
    if x % 2:
        print('x is odd.')
    else:
        print('x is even.')
```

If the module `simple_math.py` is in your working directory then you can import the module using:

```python
import simple_math  # imports the entire module. Functions can then be accessed via dot notation, e.g., simple_math.list_range()
from simple_math import list_range  # import only list_range function
from simple_math import odd_even  # import only odd_even function
from simple_math import *  # import all functions
```

Because modules are single files, they can easily be shared with others by, e.g., email, GitHub, Slack, etc. Another user would simply place the module in their working directory to use it. However, this method of distribution does not scale well if your project consists of multiple files, depends on other libraries/packages, or needs a specific version of Python.

#### Python packages

Projects consisting of multiple Python `.py` files (i.e., modules) are, by their nature, harder to distribute. If your project consists of multiple files, it is typical to organise it into a directory structure. Any directory containing Python files can comprise a Python "package". 

While we've been using the term "package" fairly generically so far, it does have a specific meaning in Python and it's important to make clear the distinction between "modules" and "packages". As described in the previous section, any Python `.py` file is a module. In contrast, a package is a directory containing module(s) and/or additional package(s) (sometimes called "nested packages" or "subpackages") along with an `__init__.py` file. An `__init__.py` file is required to make Python treat a directory as a regular package (as opposed to it simply being a plain-old directory of Python files); in the simplest case `__init__.py` is an empty file, but it can also execute initialization code for the package upon import (read more [here](https://docs.python.org/3/tutorial/modules.html#importing-from-a-package)). Packages allow us to structure and organise our Python code and intuitively access it using "dotted module names". Consider having the following two packages in your working directory:

A package containing modules:

```md
pkg1
├── __init__.py
├── simple_math.py
└── advanced_math.py
```

A package containing nested packages:

```md
pkg2
├── __init__.py
├── simple
│   ├── __init__.py
│   └── simple_math.py
└── advanced
    ├── __init__.py
    └── advanced_math.py
```

Modules can be accessed using dot notation. For example:

```python
from pkg1 import simple_math  # import simple_math module from pkg1
from pkg2.simple import simple_math  # import simple_math module from pkg2
```

It would be possible to share a package by transferring all the files that comprise the package (keeping the directory structure intact) to another user, who could then use the package if it were placed in their working directory. However, just like single modules, this method of distribution does not scale well, makes it difficult to support or update your code, and won't work if your code depends on additional libraries, or needs a specific version of Python. We need a more efficient and reliable way to package and distribute our code which leads us to "source distribution packages" and "built distribution packages" which are described below.

### Source distribution packages

A "distribution package" (often referred to simply as a "distribution") is a single archive of the Python packages, modules and other files that make up your project. Having a single archive makes it easier to distribute your code to the world. The fundamental distribution format is called a "source distribution" (`sdist`). An `sdist` is a compressed archive (e.g., `.tar.gz` or `.zip`) of your package. Essentially, an `sdist` provides all of the metadata and source files needed for building and installing your package. You can read more about source distributions [here](https://docs.python.org/3/distutils/sourcedist.html). The standard tool in Python for creating `sdists` (and binary distributions, which we'll explore in the next section) is `setuptools`. 

```{note}
As we saw in **Chapter 3: {ref}`03:How-to-package-a-Python`**, we prefer to use `poetry` to create distribution packages of our Python code, as a simpler and more intuitive alternative to `setuptools`. We'll discuss Poetry a little later.
```

As a very simple example, consider the following directory which now contains a `setup.py` file.

```md
root
├── pkg1
│   ├── __init__.py
│   ├── simple_math.py
│   └── advanced_math.py
└── setup.py
```

The `setup.py` file is a standard file that contains metadata about your project and helps `setuptools` build your `sdist` - in the very simplest case, it may look like this:


```python
# setup() is imported from setuptools (distutils was removed in Python 3.12)
from setuptools import setup


setup(name='pkg1',
      version='0.1.0',
      packages=['pkg1'],
      )
```

We won't talk about `setup.py` too much more as we will advocate for using `poetry` for building and distributing your packages (we'll get to that a little later), but if you see a `setup.py` file somewhere in your packaging journey at least you now know what it's for! If you want to learn more about creating a `setup.py` file, it is described in detail [here](https://docs.python.org/3/distutils/setupscript.html#). If you do decide to use `setuptools` for building your package and you have your `setup.py` file all set up, your `sdist` can be built by changing to the `root` directory of your package and running the following command:

```{prompt} bash
python setup.py sdist
```

This will create an archive file (`.tar.gz` by default) of your project which is your `sdist`. If your code is pure Python then an `sdist` is a perfectly acceptable way to distribute your code, and a user could install it using:

```{prompt} bash
pip install .
```

You could also upload your `sdist` to PyPI, from which a user could install it using `pip install`. It's important to note that installing a package actually adds the package to your default installation directory (more on that later in the chapter) such that it is accessible outside of your working directory - this is a key difference to simply sharing code as a module or package as we explored in the last two sections. We recommend consulting [The Hitchhiker's Guide to Packaging](https://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/creation.html#) and the [Python docs](https://docs.python.org/3/distutils/sourcedist.html) for more information on creating and distributing source distributions. Some notable examples of Python `sdists` include: [Django](https://github.com/django/django), [hyperlink](https://github.com/python-hyper/hyperlink), and [requests](https://github.com/psf/requests).

### Built distribution packages

Source distributions are "unbuilt" and require a build step before they can be installed. This nuance is most relevant in cases where your code relies on non-Python code/libraries that require building (aka "compilation") before they can be used (more on that in the next section). However, even if your package is written in pure Python, a build step is still required to build out the installation metadata. As a result, built distributions are the preferred format for distributing your Python packages. They are packages that have been pre-built and do not require a build step before installation - they only need to be moved to the correct location on your system (as we'll explore more later in the chapter). Like a source distribution, a built distribution is a single artefact, and the main built distribution format used by Python is called a `wheel`.

Python's installer `pip` always prefers installing built distributions (`wheels`) over source distributions (`sdists`) because installation is faster. Building `wheels` is similar to building source distributions with `setuptools` as described in the previous section. We won't go into details here because for most users we recommend the use of `poetry` (described later in this chapter) which handles this build process for you in a simple and intuitive way. However, if you're interested in learning more about using `setuptools` to build a `wheel` of your project we recommend taking a look at the [Python Packaging User Guide tutorial](https://packaging.python.org/tutorials/packaging-projects/).

If your code relies on any non-Python code/libraries, you'll need to use a specific kind of built distribution known as a binary distribution to bundle up your package, which is described in the next section.

### Binary distribution packages (merge with above)

One of the most powerful features of Python is its ability to interoperate with libraries written in other languages, for example, C. Developers sometimes choose to take advantage of this interoperability and include code from other languages in their package to make their code faster, access libraries written in other languages, and generally improve the functionality of their code. While Python is typically referred to as an interpreted language (i.e., your Python code is translated to machine code as it is executed), languages such as C require compilation before they can be used (i.e., your code must be translated into "machine code" *before* it can be executed). Most end-users will probably not have the tools, experience, or time to build packages containing code written in other languages (typically called "extensions"), so in these cases binary distributions are how you make life as easy as possible for installers of your code. Binary distribution packages are simply packages that contain pre-compiled extensions - as an analogy, you can think of your source code as a cake recipe, while a binary distribution is the fully cooked cake.

For example, much of the commonly used Python library `NumPy` is implemented as C extensions. The existence of pre-built `wheels` in Python means that a user can, for example, simply run `pip install numpy` to install `NumPy` from PyPI, as opposed to having to build it from source with the help of a C compiler, amongst other requirements. If you're feeling particularly masochistic you can actually try to build `NumPy` from source following [these instructions from the `NumPy` docs](https://numpy.org/devdocs/user/building.html).

Recall that binary distributions contain compiled code (code that has been translated from human-readable form to machine code), but different platforms (e.g., Windows, macOS, Linux) read machine code differently. As a result, binary distributions are platform specific. For this reason, binary distributions are usually provided with their corresponding source distributions; if you don't upload binary `wheels` of your code for every platform, end-users will still be able to build it from source. Take a look at the downloadable file list of [`NumPy` on PyPI](https://pypi.org/project/numpy/#files) - you'll see `wheels` for most common platforms, as well as the source distribution at the bottom of the list. `wheels` actually come in three flavours (which you can read more about [here](https://packaging.python.org/guides/distributing-packages-using-setuptools/#wheels)):

1. *Universal wheels*: pure Python and support Python 2 and 3. Can be installed anywhere using `pip`.
2. *Pure Python wheels*: pure Python but don’t support both Python 2 and 3
3. *Platform wheels*: binary package distributions specific to certain platforms as a result of containing compiled extensions.

You can tell a lot about a `wheel` from the name itself which follows a [strict naming convention](https://www.python.org/dev/peps/pep-0427/#file-name-convention): `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. For example, the `NumPy` wheel `numpy-1.18.1-cp37-cp37m-macosx_10_9_x86_64.whl` tells us that:

- The distribution is `NumPy v1.18.1`;
- It is made for Python 3.7;
- It is specific to the `macosx_10_9_x86_64` platform (i.e., this is a "platform wheel" because it is platform-specific).
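The naming convention above is regular enough to be split apart with a few lines of code. The function below is a hypothetical helper written for this illustration (not part of `pip` or `wheel`); it is a minimal sketch that relies only on the components being separated by hyphens, as the convention specifies:

```python
def parse_wheel_filename(filename):
    """Hypothetical helper: split a wheel filename into its components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"Not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None  # the build tag is optional
    elif len(parts) == 6:
        distribution, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"Unexpected number of components in: {filename}")
    return {
        "distribution": distribution,
        "version": version,
        "build_tag": build_tag,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


tags = parse_wheel_filename("numpy-1.18.1-cp37-cp37m-macosx_10_9_x86_64.whl")
for key, value in tags.items():
    print(f"{key}: {value}")
```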

Most readers will never deal with building extensions in other languages for their Python package, so this section is intended to be read as general information on Python's packaging ecosystem and the `wheel` format. However, if you are interested in building binary extensions for your package, the [Python Packaging Authority guide](https://packaging.python.org/guides/packaging-binary-extensions/) is a good place to start.

### Poetry and pyproject.toml  (probably move to "packaging tools")

The previous sections gave a high level overview of Python's standard packaging options and tools. However, in **Chapter 3: {ref}`03:How-to-package-a-Python`** we used `poetry` to create a toy Python package - so where does this tool fit into the Python packaging landscape? Well, in the previous sections, we really only touched the tip of the iceberg of Python packaging. When creating a package there's a lot of customisation to think about with your `setup.py` file, and a host of other files we didn't even talk about (e.g., `requirements.txt`, `setup.cfg`, etc)! Needless to say, packaging in Python can be hard to understand, especially for beginners. These words echo the sentiments of `poetry's` creator Sébastien Eustace and the motivation for creating the tool:

> *"Packaging systems and dependency management in Python are rather convoluted and hard to understand for newcomers. Even for seasoned developers it might be cumbersome at times to create all files needed in a Python project: setup.py, requirements.txt, setup.cfg, MANIFEST.in, and the newly added Pipfile. So I wanted a tool that would limit everything to a single configuration file to do: dependency management, packaging and publishing."*

That "single configuration file" is `pyproject.toml` (you can read more about `.toml` files [here](https://www.python.org/dev/peps/pep-0518/)). Essentially, `poetry` is based on all the concepts of `sdists` and `wheels` discussed previously - it just simplifies and streamlines the whole packaging process in an intuitive way. In fact, the `poetry build` command you used previously in **Chapter 3: {ref}`03:How-to-package-a-Python`**, actually creates the `sdist` and `wheel` distributions of your package for you. It really is simple to create and distribute Python packages with `poetry` - go back and check out **Chapter 3: {ref}`03:How-to-package-a-Python`** for our recommended workflow, or check out the [poetry docs](https://python-poetry.org/docs/).

```{figure} images/python-packages.png
---
width: 80%
name: 04-python-packages
alt: Python packaging gamut. Modified after [The Packaging Gradient by Mahmoud Hashemi](https://www.youtube.com/watch?v=iLVNWfPWAC8).
---
Python packaging gamut. Modified after [The Packaging Gradient by Mahmoud Hashemi](https://www.youtube.com/watch?v=iLVNWfPWAC8).
```

### Installed packages

An installed package is a distribution that’s been decompressed, built (in the case of an `sdist`) and then copied to your chosen installation directory. The default "chosen installation directory" varies by platform and by how you installed Python. For example, I installed Python using the [miniconda](https://docs.conda.io/en/latest/miniconda.html) distribution and my default directory for package installation is `/Users/tbeuzen/miniconda3/lib/python3.7/site-packages`.
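If you're not sure where packages are installed for the interpreter you're running, you can ask Python directly. The snippet below is a minimal sketch using the standard library's `sysconfig` module; the path it prints depends on your platform and on how you installed Python.

```python
import sysconfig

# "purelib" is the directory where pure-Python packages are installed
# for the interpreter running this code (typically a site-packages directory).
install_dir = sysconfig.get_paths()["purelib"]
print(install_dir)
```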

"Installing" a package (e.g., by `pip install XXX`) is really a two-step process: 1) building the package, and 2) installing the package. Using `wheels` takes out the first step, meaning we only need to install. The install step is simple, all it really has to do is copy decompressed package files to the appropriate directory. In fact, we can manually install a package ourselves if we want to by manually decompressing a `wheel` and copying the files to their appropriate locations - there's no real reason to do this because it's far more effort than using a single one-liner at the CL, it does not resolve dependencies so could break your installation, and probably has other unwanted side-effects. However, it's a nice way to learn about the package installation process, so if you'd like to give it a go, you can try the following steps (which are based on using the MacOS and [`conda` package manager](https://docs.conda.io/en/latest/)):

1. Create a new virtual environment to act as a safe, test playground. As a `conda` user, the CL command for me to create and then activate a new empty virtual environment called "manualpkg" including Python 3.7 is:

    ```bash
    conda create --name manualpkg python=3.7
    conda activate manualpkg
    ```
2. You can find a toy `wheel` to download in the GitHub repository of this book [here](https://github.com/UBC-MDS/py-pkgs/blob/master/docs/toy-pkg/dist/toy_pkg-0.0.1-py3-none-any.whl) (although you can try this manual installation procedure with a `wheel` downloaded from any source, e.g., PyPI). Download the `wheel` into the `site-packages` directory of the `manualpkg` environment, which for me was located at `/opt/miniconda3/envs/manualpkg/lib/python3.7/site-packages`;
3. From the CL, change to that `site-packages` directory and unzip the wheel:

    ```{prompt} bash
    cd /opt/miniconda3/envs/manualpkg/lib/python3.7/site-packages
    unzip toy_pkg-0.0.1-py3-none-any.whl
    ```
4. You'll now find two new unzipped directories `toy_pkg` and `toy_pkg-0.0.1.dist-info`;
5. From the CL start a Python session by typing `python` and try the following: 

    ```python
    from toy_pkg.toy_module import test_function
    test_function()
    "You manually installed the toy_pkg example! Well done!"
    ```
6. You can remove the `conda` virtual environment if you wish with the following:

    ```{prompt} bash
    conda deactivate
    conda env remove -n manualpkg
    ```
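The same "unzip and copy" idea can be sketched entirely in Python without touching a real environment. The example below is an illustration under stated assumptions, not what `pip` actually does: the package name `manual_demo_pkg`, the function `toy_function`, and the temporary "site-packages" directory are all hypothetical, and the archive is a plain zip file standing in for a `wheel` (a real `wheel` also contains a `.dist-info` directory of metadata).

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # 1. Build a tiny zip archive that stands in for a wheel.
    archive = Path(tmp) / "manual_demo.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("manual_demo_pkg/__init__.py", "")
        zf.writestr(
            "manual_demo_pkg/toy_module.py",
            "def toy_function():\n    return 'Manually installed!'\n",
        )

    # 2. "Install" it: unzip into a directory that stands in for site-packages.
    fake_site_packages = Path(tmp) / "site-packages"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(fake_site_packages)

    # 3. Make that directory importable, then import from the package.
    sys.path.insert(0, str(fake_site_packages))
    importlib.invalidate_caches()
    try:
        from manual_demo_pkg.toy_module import toy_function
        result = toy_function()
    finally:
        sys.path.remove(str(fake_site_packages))

print(result)
```

A real `site-packages` directory is already on Python's search path, which is why step 3 isn't needed in the manual procedure above.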

### Imported packages (probably delete)

We now arrive at our last package state, the "imported package". This state is associated with a command that is familiar to everyone that uses Python:

```python
import somemodule
```

You can read about the import system in detail in the [Python documentation](https://docs.python.org/3/reference/import.html). Briefly, the `import` statement comprises two operations:

1. it searches for the named module; and,
2. then binds the results of that search to a name in the local namespace.

Note that for efficiency, each module is only imported once per interpreter session. If you modify your module, you can't just re-run your `import` statement (as that name in the namespace is already populated and won't be re-loaded). Instead, you have to restart your interpreter or force the import using `importlib.reload()`, but this is inefficient when working with multiple modules.
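To see this caching behaviour in action, the sketch below writes a throwaway module to a temporary directory, imports it, edits the file on disk, and compares a repeated `import` with `importlib.reload()`. The module name `reload_demo` and its `MESSAGE` variable are hypothetical names used only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / "reload_demo.py"
    module_file.write_text("MESSAGE = 'first version'\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import reload_demo
        first = reload_demo.MESSAGE

        # Edit the module on disk (the new source has a different length,
        # so any cached bytecode is treated as stale).
        module_file.write_text("MESSAGE = 'second version, edited on disk'\n")

        # Re-running the import statement does not re-execute the module.
        import reload_demo
        after_reimport = reload_demo.MESSAGE

        # Explicitly reloading the module does.
        importlib.reload(reload_demo)
        after_reload = reload_demo.MESSAGE
    finally:
        sys.path.remove(tmp)

print(first)
print(after_reimport)
print(after_reload)
```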

### Packaging Python applications

In this chapter we've only talked about packaging and distributing reusable Python code, a process which is really aimed at developers and audiences familiar with Python. While it's outside the scope of this book, it's also possible to package and distribute entire Python applications, that is, software that is meant to be used rather than built upon. Some good examples of Python-based applications are Sublime Text, EVE Online, and Reddit. There are a lot of options available for packaging and distributing Python applications and we recommend watching the excellent talk by Mahmoud Hashemi, ["The Packaging Gradient"](https://www.youtube.com/watch?v=iLVNWfPWAC8), to learn more. To give you an idea of the available options, the figure below summarises the different options discussed by Mahmoud for packaging Python applications.

```{figure} images/python-applications.png
---
width: 80%
name: 04-python-applications
alt: Python application packaging gamut. Modified after [The Packaging Gradient by Mahmoud Hashemi](https://www.youtube.com/watch?v=iLVNWfPWAC8).
---
Python application packaging gamut. Modified after [The Packaging Gradient by Mahmoud Hashemi](https://www.youtube.com/watch?v=iLVNWfPWAC8).
```

## Packaging tools

- setuptools
- flit
- poetry

## OLD

### Modules and namespaces

Readers of this book should already know how to import Python modules and packages with the help of the `import` statement - but what's actually going on here? 

When we run the `import` statement, the module (or selected names from it) is bound to a name in the current namespace, making it accessible. So we first need to talk about namespaces and objects.

While part of the beauty of Python is that it abstracts these lower-level details away from users who don't need to understand them (which forms a large portion of the Python user base), to gain a deeper understanding of Python packages, it's important to have a basic understanding of how Python actually imports and accesses the code within packages.

To that end, we first need to briefly talk about objects and namespaces in Python. Firstly, an "object" is Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects. Examples of objects you might be familiar with are integers, functions, or modules (yes, even modules are objects in Python!). Recall that a module is a file with a `.py` suffix containing Python definitions and statements. Imagine we have a module `greetings.py` in our working directory containing functions to print "Hello World!" in English, German, and Spanish:

```python
def hello_world():
    print("Hello World!")


def hallo_welt():
    print("Hallo Welt!")


def hola_mundo():
    print("Hola Mundo!")
    
```

Let's go ahead and start an interactive Python interpreter to create and check the type of some objects. We'll import the `greetings.py` module, create an integer `a`, and define a function `hello_world`:

```{prompt} bash \$ auto
$ python
```

```{prompt} python >>> auto
>>> import greetings
>>> type(greetings)
```

```python
module
```

```{prompt} python >>> auto
>>> a = 1
>>> type(a)
```

```python
int
```

```{prompt} python >>> auto
>>> def hello_world(name):
        print(f"Hello world! My name is {name}.")
>>> type(hello_world)
```

```python
function
```

As you can see from the above code output, we just created three different objects: an integer, a function, and a module.

A "namespace" in Python is simply a mapping from names to objects, similar to a Python dictionary (in fact, most namespaces in Python are currently implemented as dictionaries). In the example above, we've added the symbolic names "a", "hello_world", and "greetings" to the current namespace and can now use those names to refer to the objects we created.
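As a small sketch of the "namespaces are dictionaries" idea, the built-in `globals()` function returns the dictionary behind the current module-level namespace. The example assumes it is run at the top level of a script or interactive session.

```python
a = 1


def hello_world(name):
    print(f"Hello world! My name is {name}.")


# globals() returns the dictionary that implements the module-level namespace.
namespace = globals()

# Looking a name up in that dictionary gives back the very same object.
print(namespace["a"] is a)
print(namespace["hello_world"] is hello_world)
```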

Namespaces are created at different moments and have different lifetimes and there are often multiple namespaces existing at any given time while a Python program is running. The namespace you have access to is determined by the *scope* you are currently in. These details digress from the point we're trying to make here but you can read more about Python namespaces and scope in the [Python documentation](https://docs.python.org/3/tutorial/classes.html#python-scopes-and-namespaces).

Python gives the programmer various tools for introspecting namespaces, one of which is the `dir()` function, which, when called with no arguments, returns a list of names currently defined:

```{prompt} python >>> auto
>>> dir()
```

```python
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'a', 'greetings', 'hello_world']
```

In the output above, we can see the names of the three objects we defined earlier: "a", "hello_world", and "greetings". The other names prefixed with double underscores are objects that were initialised automatically when we started the Python interpreter and are implementation details that aren't important to our discussion here, but can be read about in the [Python documentation](https://docs.python.org/3/reference/executionmodel.html?highlight=__builtins__#execution-model). For clarity moving forward, we'll ignore those names prefixed with double underscore using the following list comprehension:

```{prompt} python >>> auto
>>> [name for name in dir() if not name.startswith("__")]
```

```python
['a', 'greetings', 'hello_world']
```

Now that we're equipped with a fundamental understanding of objects and namespaces, let's further discuss the main object we're interested in for the purpose of this book: the "module" object (`greetings` in the above output). A module object is created when running the `import` statement and it has its own namespace populated by the Python definitions and statements within that module. These are attributes of the module object which can be accessed using dot notation, for example:

```{prompt} python >>> auto
>>> greetings.hello_world()
```

```python
Hello World!
```

We can view all the names a module defines by passing the module object in as an argument to the `dir()` function (we'll use a list comprehension again to ignore all the "implementation details"):

```{prompt} python >>> auto
>>> [name for name in dir(greetings) if not name.startswith("__")]
```

```python
['hallo_welt', 'hello_world', 'hola_mundo']
```

These are the three functions defined in our `greetings.py` module.

An important point to make here is that we now have access to two `hello_world` functions in the current Python session: one that was defined in our interactive interpreter earlier, and one defined in the `greetings` module. While these functions have the same name, there is absolutely no relation between them because they exist in different namespaces. So we can access both with the appropriate syntax:

```{prompt} python >>> auto
>>> hello_world("Tom")
```

```python
Hello world! My name is Tom.
```

```{prompt} python >>> auto
>>> greetings.hello_world()
```

```python
Hello World!
```
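The same point can be reproduced in a single self-contained script. The sketch below builds an in-memory stand-in for the `greetings` module with `types.ModuleType` (an assumption made so the example runs without a `greetings.py` file on disk), and has the functions return strings rather than print them so the results are easy to compare.

```python
import types

# An in-memory stand-in for the greetings module; the chapter's real
# version lives in greetings.py.
greetings = types.ModuleType("greetings")
exec('def hello_world():\n    return "Hello World!"\n', greetings.__dict__)


def hello_world(name):
    return f"Hello world! My name is {name}."


# Same name, two unrelated objects living in two different namespaces.
print(hello_world("Tom"))
print(greetings.hello_world())
print(hello_world is greetings.hello_world)
```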





When developing Python packages, understanding namespaces is useful to understand how users can access your code and how you might want to develop your code in an intuitive and organised manner. When we import an entire module using the `import` statement, that module is run and then bound to a name in the current namespace. However, there are various other ways we can import modules into our session too, for example:

``````{list-table} Different ways of importing the contents of a module
:header-rows: 1
:widths: 25 20 35 20

* - Statement
  - Usage
  - Comments
  - Namespace
* - `import greeting_module`
  - `greeting_module.hello_world()`
  - Anything in the module is available via the `greeting_module` prefix.
  - `greeting_module`
* - `import greeting_module as greet`
  - `greet.hello_world()`
  - `greet` is an "alias" used to refer to the module.
  - `greet`
* - `from greeting_module import *`
  - `hello_world()`
  - Imports everything, which can pollute the namespace.
  - Everything in the module is imported to the current namespace.
* - `from greeting_module import hello_world`
  - `hello_world()`
  - Only the named function is imported.
  - `hello_world`
``````
In all the methods above, the entire module is run regardless (we'll discuss this further in the next section), so there's no real performance gain to be had from one method over another. For example, if we run `from greeting_module import hello_world`, the name `greeting_module` is not bound in the current namespace, but the whole module has still been run and cached. What matters is how many names you'll be adding to the current namespace, and choosing a method that helps you write succinct but interpretable code.
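Here is a minimal sketch of that behaviour. It writes a throwaway module called `cache_demo` (a hypothetical name) to a temporary directory, imports a single function from it, and then checks what was cached in `sys.modules` versus what was bound in the current namespace.

```python
import importlib
import sys
import tempfile
from pathlib import Path

source = (
    "def hello_world():\n"
    "    return 'Hello World!'\n"
    "\n"
    "def hallo_welt():\n"
    "    return 'Hallo Welt!'\n"
)

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "cache_demo.py").write_text(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Import just one name from the module.
        from cache_demo import hello_world
    finally:
        sys.path.remove(tmp)

# The whole module was executed and cached...
module_is_cached = "cache_demo" in sys.modules
other_function_loaded = hasattr(sys.modules["cache_demo"], "hallo_welt")
# ...but only the name we asked for was bound in the current namespace.
module_name_is_bound = "cache_demo" in dir()

print(module_is_cached, other_function_loaded, module_name_is_bound)
```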

In the next section we'll talk more about the difference between modules and packages and what happens when you `import` a package rather than a single module.

When developing Python packages, understanding namespaces and how users might import your code is useful to understand how your code can be accessed. In the next section, we’ll talk more about the difference between modules vs packages, different ways of importing and how to control the way package code is imported.

> The set of attributes that make up a module object also form a namespace, implemented by a dictionary object (this is the dictionary referenced by the `__globals__` attribute of functions defined in the module). Attribute references are translated to lookups in this dictionary, e.g., `m.x` is equivalent to `m.__dict__["x"]`. A module object does not contain the code object used to initialize the module (since it isn’t needed once the initialization is done). `__dict__` is the module’s namespace as a dictionary object.
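These relationships are easy to check for yourself. The sketch below uses the standard library's `json` module purely as a convenient example of a module that defines a Python function (`json.dumps`).

```python
import json

# vars(module) returns the module's namespace dictionary, i.e. module.__dict__.
namespace = vars(json)
print(namespace is json.__dict__)

# Attribute access on a module is a lookup in that dictionary.
print(json.dumps is json.__dict__["dumps"])

# Functions defined in the module reference the same dictionary as their globals.
print(json.dumps.__globals__ is json.__dict__)
```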

So a package is just a collection of modules. What's different about it? `__path__` and `__init__.py`.

We can of course import parts of this package in different ways.

But it doesn't really matter which you choose; it's a personal preference.

Now we know a bit more about packages!

---

Most readers will be aware that modules can be imported in different ways and this affects the kind of object created by Python and mapped to the current namespace.

example code? or table?

In all the methods above, the entire module will be run regardless of whether you import the entire thing or just one object from it, so there's no real performance gain to be had from using one import method over another. For example, if we run `from greeting_module import hello_world`, the name `greeting_module` is not bound, but the module has still been run and cached. What matters is how many names you'll be adding to the current namespace, and choosing a method that helps you write succinct but interpretable code.


Up until now, we've been talking about modules, but what about packages? Packages are just collections of modules. When we `import` a package, the object created by Python is still of type module. For example, let's import the `partypy` package we created in **Chapter 3: {ref}`03:How-to-package-a-Python`**:

```{note}
If you're following on from **Chapter 3: {ref}`03:How-to-package-a-Python`**, recall that we created and installed our `partypy` package in a virtual environment which can be activated by running `conda activate partypy` in the terminal.
```

```{prompt} python >>> auto
>>> import partypy
>>> type(partypy)
```

```python
module
```

So what differentiates a package from a single module, apart from the file system level distinction of one file vs. a collection of files?

- A package is a module with a `__path__` attribute. What does that mean?
- When we import a module with the `import` statement, two things happen: 1) Python searches for the named module, and 2) it binds the result of that search to a name in the local namespace.
- For a module named `spam`, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named `spam.py` in a list of directories given by the variable `sys.path`. `sys.path` is initialized from several locations, including the current directory and the installation's `site-packages` directory.
- A package’s `__path__` attribute is used during imports of its subpackages. Within the import machinery, it functions much the same as `sys.path`, i.e., providing a list of locations to search for modules during import. However, `__path__` is typically much more constrained than `sys.path`.
- So how does a module get a `__path__` attribute? We add an `__init__.py` file.
- So when does `import` result in a module object with a `__path__` attribute (i.e., a package)? There are two types of packages. The first kind is a "regular package" and that's what 99% of the readers of this book are interested in. A regular package is typically implemented as a directory containing an `__init__.py` file.
- For the most part, a package is a directory with an `__init__.py` file which, when imported, initializes with a `__path__` attribute. Consider our `partypy` package from Chapter 3:

```
partypy
|-src
|  - simulate
|  - plotting
```

- In the simplest case, `__init__.py` can just be an empty file, but it can also execute initialization code for the package.
- For example, if we expect users of our package to simply run `import partypy`, then we need to accommodate that in the `__init__.py` file.
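To illustrate the role of `__init__.py`, the sketch below builds a throwaway package in a temporary directory. The package name `init_demo_pkg` and its contents are hypothetical and are not the actual layout of `partypy`; the point is only that code in `__init__.py` runs on import and can expose names at the top level of the package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "init_demo_pkg"
    pkg_dir.mkdir()

    # A module inside the package.
    (pkg_dir / "greetings.py").write_text(
        "def hello_world():\n    return 'Hello World!'\n"
    )
    # __init__.py runs when the package is imported; here it lifts
    # hello_world up into the package's top-level namespace.
    (pkg_dir / "__init__.py").write_text(
        "from init_demo_pkg.greetings import hello_world\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import init_demo_pkg
        greeting = init_demo_pkg.hello_world()
        has_path = hasattr(init_demo_pkg, "__path__")
    finally:
        sys.path.remove(tmp)

print(greeting)
print(has_path)
```

With an empty `__init__.py`, users would instead need to write `init_demo_pkg.greetings.hello_world()` after importing the submodule explicitly.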

### The difference between modules and packages

In this section we aim to answer three questions:
- how is a module different from a package?
- how can a package be imported?
- how can we control import behaviour for our users?

- Module is this


- Show an example of how importing with dotted names still imports the modules, but it caches them, it doesn't bind them a name in the local namespace.


The `import` statement combines two operations: it searches for the named module, then it binds the results of that search to a name in the local scope.

When we import a module named `spam`, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named `spam.py` in a list of directories given by the variable `sys.path`. `sys.path` is initialized from several locations, including the current directory and the installation's `site-packages` directory.

A package’s `__path__` attribute is used during imports of its subpackages. Within the import machinery, it functions much the same as `sys.path`, i.e., providing a list of locations to search for modules during import. However, `__path__` is typically much more constrained than `sys.path`.

How does the `__path__` attribute work?

So we have module objects, what is a package?


Modules are this:

Up until now, we've described packages as a "collection of modules". A more specific definition is that a package is a module with a `__path__` attribute. With that in mind, all packages are modules, but not all modules are packages. Or put another way, packages are just a special kind of module.

All modules have a name. Subpackage names are separated from their parent package name by a dot, akin to Python’s standard attribute access syntax. Thus you might have a module called `sys` and a package called `email`, which in turn has a subpackage called `email.mime` and a module within that subpackage called `email.mime.text`.
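The standard library modules named above can be used to check this distinction directly. In the sketch below, only the packages carry a `__path__` attribute, and importing a dotted name caches every level of the hierarchy in `sys.modules`.

```python
import sys
import email.mime.text

# Plain modules have no __path__; packages do.
print(hasattr(sys, "__path__"))
print(hasattr(email, "__path__"))
print(hasattr(email.mime, "__path__"))
print(hasattr(email.mime.text, "__path__"))

# The module's name records its position in the package hierarchy.
print(email.mime.text.__name__)

# Each level of the dotted name was imported and cached along the way.
cached = [name in sys.modules for name in ("email", "email.mime", "email.mime.text")]
print(cached)
```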

`__init__.py`: https://stackoverflow.com/a/48804718

```python
>>> import numpy
>>> numpy.__path__
['/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/site-packages/numpy']
>>> type(numpy)
module
>>> type(numpy.random)
module
>>> numpy.random.__path__
['/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/site-packages/numpy/random']
```


```{tip}
A related concept here is the use of `if __name__ == "__main__":` to allow your code to be imported as a module or run as a script from the command line. We'll discuss that more in Appendix A.
```

````{note}
As an example of how packages work:

```python
>>> from partypy import simulate
>>> import sys
>>> sys.modules["partypy"].__path__ = "fake"
>>> from partypy import plotting
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'plotting' from 'partypy' (/Users/tomasbeuzen/GitHub/py-pkgs/partypy/src/partypy/__init__.py)
>>> sys.modules["partypy"].__path__ = ["/Users/tomasbeuzen/GitHub/py-pkgs/partypy/src/partypy"] # change back
>>> from partypy import plotting
```
````

To Python, a key difference between a module and a package is that packages are module objects that have a `__path__` attribute. This attribute tells Python where to look when importing the contents of your package: it is used during imports of the package's subpackages and functions much the same as `sys.path`, i.e., it provides a list of locations to search for modules during `import`.

For example, let's check that the `partypy` module object we just created does indeed have an attribute called `__path__`, but the `greetings` module we imported from `greetings.py` earlier does not:

```{prompt} python >>> auto
>>> partypy.__path__
```

```python
['/Users/tomasbeuzen/GitHub/py-pkgs/partypy/src/partypy']
```

```{prompt} python >>> auto
>>> greetings.__path__
```

```python
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'greetings' has no attribute '__path__'
```

```{note}
Some readers may have noticed that the first entry in the `sys.path` list is an empty string. This occurs when the interpreter is invoked interactively and tells Python to search for modules in the current directory first.
```

If you're interested in learning more about Python's import system, we recommend checking out the [official documentation](https://docs.python.org/3/reference/import.html).

#maybe don't need this section...

Without getting into too much detail about Python's import system, when you use the `import` statement to import a module, Python checks a list of paths to see if the thing you want to import exists at any of them. You can view your default paths using the `sys` module:

```{prompt} python >>> auto
>>> import sys
>>> sys.path
```

```python
['',
'/opt/homebrew/Caskroom/miniforge/base/envs/partypy/lib/python39.zip', '/opt/homebrew/Caskroom/miniforge/base/envs/partypy/lib/python3.9', '/opt/homebrew/Caskroom/miniforge/base/envs/partypy/lib/python3.9/lib-dynload', '/opt/homebrew/Caskroom/miniforge/base/envs/partypy/lib/python3.9/site-packages', '/Users/tomasbeuzen/GitHub/py-pkgs/partypy/src']
```
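As a rough sketch of how `sys.path` governs what can be imported, the example below creates a throwaway module in a temporary directory and tries to import it before and after adding that directory to `sys.path`. The module name `path_demo` is hypothetical, and the example assumes no other module with that name exists on your system.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "path_demo.py").write_text("ANSWER = 42\n")

    # The directory is not on sys.path yet, so the import fails.
    try:
        import path_demo
        found_before = True
    except ModuleNotFoundError:
        found_before = False

    # After adding the directory to sys.path, the same import succeeds.
    sys.path.append(tmp)
    importlib.invalidate_caches()
    try:
        import path_demo
        found_after = path_demo.ANSWER == 42
    finally:
        sys.path.remove(tmp)

print(found_before)
print(found_after)
```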