# Packaging and Files

One of my favorite subjects, because it's absolutely critical as soon as you:

* Work on more than one thing
* Share your work with anyone (even if not as a package)
* Work in more than one place
* Upgrade or change anything on your computer

Unfortunately, packaging has a _lot_ of historical cruft: bad practices that have easy solutions today but still get propagated.

We will split our focus into two situations, then pull both ideas together.

## Installing a package

You will see two _very_ common recommendations:
    
```bash
pip install <package>         # Use only in virtual environment!
pip install --user <package>  # Never use
```

Don't use them unless you know exactly what you are doing! The first one will try to install globally, and if you don't have permission, recent versions of pip fall back to your user site packages. In the global site packages, you can get conflicting versions of libraries and you can't tell what you installed for what; it's a mess. User site packages are even worse, because every install of the same Python version on your computer can share them, so you might override and break things you didn't intend to.
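If you are ever unsure which of these locations a given `python` will install into, the standard library can tell you. A minimal sketch using only `site` and `sys`; the `sys.prefix != sys.base_prefix` comparison is the usual way to detect a `venv`-style virtual environment:

```python
import site
import sys


def describe_install_locations() -> dict:
    """Report where this interpreter looks for installed packages."""
    return {
        # True inside a venv: prefix points at the environment,
        # base_prefix at the Python it was created from
        "in_virtual_env": sys.prefix != sys.base_prefix,
        # Global (or environment-level) site-packages directories
        "site_packages": site.getsitepackages(),
        # Per-user directory used by `pip install --user`
        "user_site_packages": site.getusersitepackages(),
        # Whether that user directory is currently added to sys.path
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
    }


if __name__ == "__main__":
    for key, value in describe_install_locations().items():
        print(f"{key}: {value}")
```

From a shell, `python -m site` prints similar information.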

The solution depends on what you are doing:

### Safe libraries

There are likely a few libraries that you just have to install globally. Go ahead, but be careful, and prefer your system package manager when it has them, like [`brew` on macOS](https://brew.sh) or the Windows ones. Linux package managers, on the other hand, tend to ship versions too old to be useful for Python libraries.

Ideas for safe libraries: the other tools you see listed in this lesson! Installing these globally is likely better than trying to bootstrap them some other way.

### Application install

If you are installing an "application", that is, it has a script end-point and you don't expect to import it, use [pipx](https://pipxproject.github.io/pipx/). It will isolate it in a virtual environment, but hide all that for you, and then you'll just have an application you can use with no global/user side effects!

```bash
pip install pipx  # Easier to install like this
pipx install black
black myfile.py
```

Now you have "black", but nothing has changed in your global site packages!


### Environment tools

There are other tools we are about to talk about, like `virtualenv`, `poetry`, `pipenv`, etc., that you could also install this way (either directly or with `pipx`); they are _not too_ likely to interfere with anything or break. But keep it to a minimum.


## Environments

There are several environment systems available for Python, and they generally fall into two categories. The Python Packaging Authority (PyPA) supports PyPI (the Python Package Index), and all the systems except one build on it (usually via pip somewhere under the hood). The lone exception is Conda, which has a completely separate set of packages (often, but not always, with matching names).

### Environment specification

All systems have an environment specification, something like this:

```
requests
rich >=9.8
```

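This is technically a valid `requirements.txt` file. If you wanted to use it, you would do:

```bash
python3 -m venv venv              # create an isolated environment in ./venv
. venv/bin/activate               # on Windows: venv\Scripts\activate
pip install -r requirements.txt   # install the listed requirements into it
```

Use `deactivate` to "leave" the virtual environment.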

These two tools, `venv` to isolate a virtual environment and a requirements file to describe what goes in it, let you set up non-interacting places to work for each project, and you can recreate them anywhere.
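The same isolation is available from Python through the standard-library `venv` module, which is what `python3 -m venv` runs. A minimal sketch that creates an environment in a temporary directory and locates its interpreter; `make_env` is a hypothetical helper name, and `with_pip=False` only keeps the example fast (`python3 -m venv` installs pip by default):

```python
import sys
import tempfile
import venv
from pathlib import Path

IS_WINDOWS = sys.platform == "win32"


def make_env(env_dir: Path, with_pip: bool = False) -> Path:
    """Create a virtual environment at env_dir and return its python executable."""
    # Mirror `python3 -m venv`, which symlinks the interpreter except on Windows
    venv.create(env_dir, with_pip=with_pip, symlinks=not IS_WINDOWS)
    # Executables live in "Scripts" on Windows and "bin" everywhere else
    if IS_WINDOWS:
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = Path(tmp) / "venv"
        python = make_env(env)
        print(python, python.exists())
        # pyvenv.cfg is the marker file that makes a directory a virtual environment
        print((env / "pyvenv.cfg").read_text())
```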

### Locking an environment

Now say you want to share your environment with someone else, and `rich` has just released an update that breaks something. Your environment still works (until you update it), but your friend's fresh install picks up the broken version (this just happened to me with `IPython` and `jedi`, by the way). How do you reproduce a working set of versions without going back to your computer? With a lock file! It looks something like this:

```
requests ==2.25.1
rich ==9.8.0
typing-extensions ==3.7.4
...
```

This file lists all installed packages with exact versions, so now you can restore your environment if you need to. However, managing these by hand is not ideal, and it's easy to forget to update them. If you like this idea, `pipenv` (which was taken over by the PyPA) has a `Pipfile` and a `Pipfile.lock` that do exactly this, and it combines the features of a virtual environment and pip. You can look into it offline, but we are moving on; we'll encounter this idea again.
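You don't have to write pins by hand: `pip freeze` prints every installed distribution with its exact version in this `==` format. To show there's no magic involved, here is a rough sketch of the same idea using `importlib.metadata` (Python 3.8+); unlike `pip freeze`, it does not special-case editable or URL installs, and `pinned_requirements` is just an illustrative name:

```python
from importlib.metadata import distributions


def pinned_requirements() -> list:
    """Return 'name ==version' lines for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken metadata that has no name
            pins.add(f"{name} =={dist.version}")
    # Sort case-insensitively so the output is stable and diff-friendly
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```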

### Dev environments or Extras

Some environment tools have the idea of a "dev" environment, or optional components to the environment that you can ask for. Look for them wherever fine environments are made.

When you install a package via pip or any of the (non-locked) methods, you can also ask for "extras", though you have to know about them beforehand. For example, `pip install "rich[jupyter]"` will add some extra requirements for interacting with notebooks (the quotes keep shells like zsh from treating the brackets specially). *Extras only add requirements*; you can't change the package itself with an extra.
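Under the hood, an extra is just a named set of conditional requirements in the package's core metadata: it is declared with a `Provides-Extra` field, and its dependencies are `Requires-Dist` entries carrying an `extra == "..."` marker. That metadata uses email-header syntax, so the standard library can read it. A sketch on a made-up package (`example-pkg` and its dependencies are hypothetical, not rich's real metadata), with deliberately naive marker handling:

```python
from email.parser import Parser

# Hypothetical METADATA file for an imaginary package
METADATA = """\
Metadata-Version: 2.1
Name: example-pkg
Version: 1.0.0
Provides-Extra: jupyter
Requires-Dist: pygments>=2.6
Requires-Dist: ipywidgets>=7.5.1; extra == "jupyter"
"""


def requirements_by_extra(text: str) -> dict:
    """Group Requires-Dist entries by the extra that enables them.

    The key None holds the requirements that are always installed.
    """
    msg = Parser().parsestr(text)
    groups = {None: []}
    for extra in msg.get_all("Provides-Extra", []):
        groups[extra] = []
    for req in msg.get_all("Requires-Dist", []):
        spec, _, marker = req.partition(";")
        extra = None
        # Naive: only understands a simple `extra == "name"` marker
        if "extra ==" in marker:
            extra = marker.split("extra ==", 1)[1].strip().strip("\"'")
        groups.setdefault(extra, []).append(spec.strip())
    return groups


if __name__ == "__main__":
    print(requirements_by_extra(METADATA))
```

Asking for `example-pkg[jupyter]` would install both groups, but the package's own files are the same either way, which is why an extra can only add requirements.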

### Conda environments

If you use Conda, the environment file is usually called `environment.yml`. The one we are using can be seen here:

<!-- %load environment.yml -->

```yaml
name: level-up-your-python
channels:
  - conda-forge
dependencies:
  - python =3.8
  - pip
  - rich >=9.8
  - pandas >=1.1
  - matplotlib >=3
  - jupyterlab >= 3.0
  - numba >=0.50.0
  - nb_conda_kernels
  - numpy >=1.19
  - jedi <0.18
  - xeus-python
  - pip:
    - jupyter-book>=0.9.1
```

## Packages

Now, let's change gears and look at packaging. If you want to make your code accessible to someone else via `pip install`, you need to make it a package. In fact, as you'll see at the end of this section, even if you just want to develop an application, it's much better to be working in a package. I won't show you the internals of setting up a setuptools package; we'll just go over how you work with one and how it is distributed.

To install a local package, use:

```bash
pip install .
```

This will _copy_ the files into site-packages. If you want to actively develop a module, use this instead (setuptools only, command varies on other tools):

```bash
pip install -e .
```
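If you're unsure whether a package in your environment is an editable install or a copy, recent versions of pip record installs from a local directory or URL in a `direct_url.json` file inside the distribution's metadata (PEP 610). A hedged sketch that reads it; `is_editable` is a hypothetical helper, and packages installed from an index such as PyPI have no such file, so they report `False`:

```python
import json
import sys
from importlib.metadata import distribution


def is_editable(dist_name: str) -> bool:
    """Return True if pip recorded this distribution as an editable install.

    Raises importlib.metadata.PackageNotFoundError if it isn't installed.
    """
    raw = distribution(dist_name).read_text("direct_url.json")
    if raw is None:
        # No PEP 610 record: typically a normal install from an index
        return False
    info = json.loads(raw)
    # Local-directory installs use "dir_info"; "editable" defaults to false
    return bool(info.get("dir_info", {}).get("editable", False))


if __name__ == "__main__":
    # Usage: pass one or more installed distribution names
    for name in sys.argv[1:]:
        print(name, is_editable(name))
```

After `pip install -e .` in your project, this should report `True` for your package; after `pip install .`, `False`.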

If you want to produce an SDist for distributing the source, use


```bash
pip install build
python -m build --sdist
```

If you want to produce a wheel for distributing, use

```bash
python -m build --wheel
```

You'll see old tutorials directly call `python setup.py ...`; if you can possibly avoid doing that, please do! The `setup.py` file is still a good idea for setuptools, but it's not even required there (and doesn't exist for any other packaging software).

## Wheel: fast and simple

A wheel is just a normal zip file with the extension `.whl`. It contains folders that get copied to specific locations, plus a metadata folder.

It _does not_ contain `setup.py`/`setup.cfg`/`pyproject.toml`.
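Because a wheel is an ordinary zip archive, the standard-library `zipfile` module is enough to look inside one. A sketch that lists a wheel's contents and reads the `WHEEL` file from its `*.dist-info` metadata folder; the function name and the filename in the usage comment are placeholders:

```python
import sys
import zipfile


def inspect_wheel(path: str) -> tuple:
    """Return (all member names, text of the dist-info WHEEL file)."""
    with zipfile.ZipFile(path) as whl:
        names = whl.namelist()
        # The WHEEL file sits directly inside the top-level *.dist-info folder
        wheel_files = [n for n in names
                       if n.count("/") == 1 and n.endswith(".dist-info/WHEEL")]
        wheel_text = whl.read(wheel_files[0]).decode("utf-8") if wheel_files else ""
    return names, wheel_text


if __name__ == "__main__":
    # Usage: python inspect_wheel.py dist/my_package-0.1.0-py3-none-any.whl
    for path in sys.argv[1:]:
        names, wheel_text = inspect_wheel(path)
        print("\n".join(names))
        print(wheel_text)
```

In a normal wheel, the listing shows only your package files plus the `.dist-info` folder; no `setup.py`, `setup.cfg`, or `pyproject.toml`.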


Why use wheels?

* Secure installs - arbitrary code does not run
* Fast installs - files are just copied in place
* Reliable - does not depend on pretty much anything being on user's machine, including setuptools!
* Faster first imports - pip makes .pyc files when it installs
* Can be tagged for Python version, OS, and/or architecture (supports binaries!).

See <https://pythonwheels.com>
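The tags from the last bullet are encoded directly in the wheel's filename, which follows the pattern `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl` from the wheel specification (PEP 427). A small parser, shown only to make the format concrete; the filenames below are examples, and for real use the `packaging` library's `packaging.utils.parse_wheel_filename` is the more robust choice:

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Dashes inside the name are escaped to underscores, so a valid
    # filename has 5 parts, or 6 when the optional build tag is present
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelName(name, version, build, py, abi, plat)


if __name__ == "__main__":
    # Pure Python: any Python 3, no ABI dependency, any platform
    print(parse_wheel_name("rich-9.8.0-py3-none-any.whl"))
    # Compiled: CPython 3.8 only, 64-bit Linux only
    print(parse_wheel_name("numpy-1.19.5-cp38-cp38-manylinux2010_x86_64.whl"))
```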

## SDist: Source distribution

This is a `.tar.gz` file holding the files needed to build a wheel. It is often a subset of the files in the GitHub repo, though sometimes it contains generated files, like `version.py` or Cython/SWIG generated source files. If there is no matching wheel for your platform (usually only a concern for projects with binary components), pip downloads the SDist and builds and installs it locally.

## Poetry: A breath of fresh air


Let's look at an all-in-one solution: Poetry. It is a bit young, and somewhat opinionated (like all tools replacing a broken standard, it wants to stand out). There are some caveats:

* Should be pure Python (no compiled extensions in your code)
* Should be PyPI based (no Conda integration AFAIK)
* Updates to `packaging` take a bit longer to get in (since PyPA syncs `packaging` releases with pip, not Poetry)

### Step 1: make a new project

Let's try it out:

* `poetry new tmp_project` to create the project
* `ls tmp_project/` to see the layout it generated
* `cat tmp_project/pyproject.toml` to see its configuration

The following commands I'll demo in a shell, if I have time.

```bash
# Create a virtual environment, start the poetry.lock file
poetry install

# "Enter" the environment
poetry shell

# Run without entering the environment
poetry run ...

# Add a new package (--dev to make it development only)
# Modifies your pyproject.toml
poetry add rich

# Update the environment and lock files
poetry update

# You can use python -m build, or you can use poetry build
# You can publish to PyPI with poetry publish
# And that's package + environment management!
```

When you publish your package, it makes completely normal wheels, so `pip install` works exactly as expected.

New developers can start developing right away by getting your repository and running `poetry install`. They _even get the dev dependencies_ by default! (which was a brilliant choice, IMO). They start with the lock file if it exists, so they always get what you have, and anyone can run `poetry update` if needed.

## Setuptools: Classic, powerful, verbose

The most powerful tool (and originally the only one pip supported) is setuptools. This is a collection of hacks built on top of distutils, which is itself a collection of hacks to build packages; distutils was the standard library tool, deprecated in Python 3.10 and removed in Python 3.12. There are some awful examples of using it around, so look at <https://scikit-hep.org/developer> for a proper example.

The short version:

* Use declarative `setup.cfg` for everything you can
     - Use `file:` to read files
     - Always use `find:` for packages; include or exclude if you need to
     - Always set `python_requires`!
* Logic goes in `setup.py`; often it's just `from setuptools import setup; setup()`
     - Binary extensions go here too
* Always include a `pyproject.toml`; often it's just 5 or so lines
* Check your `MANIFEST.in` to make sure it's not missing things going into the SDist
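A quick way to act on that last point is to build the SDist and check what actually went into it. The standard-library `tarfile` module can read the `.tar.gz`; the sketch below reports which expected files are missing. The helper name and the `wanted` list are assumptions you'd adapt to your own project:

```python
import sys
import tarfile
from typing import Iterable, List


def missing_from_sdist(sdist_path: str, expected: Iterable[str]) -> List[str]:
    """Return expected paths (relative to the SDist root) absent from the archive."""
    with tarfile.open(sdist_path, "r:gz") as tar:
        # Members live under a top-level "<name>-<version>/" folder; strip it
        present = {m.name.split("/", 1)[1] for m in tar.getmembers() if "/" in m.name}
    return [path for path in expected if path not in present]


if __name__ == "__main__":
    # Hypothetical list of files this project wants in every SDist
    wanted = ["pyproject.toml", "setup.cfg", "setup.py", "README.md", "LICENSE"]
    # Usage: pass the file(s) that `python -m build --sdist` wrote to dist/
    for sdist in sys.argv[1:]:
        print(sdist, "missing:", missing_from_sdist(sdist, wanted))
```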