# Data files

Sometimes a package needs more than its source code to work. For example, I once built an interface to a machine learning model; the model itself was saved in `model.pkl`, and that file had to be delivered together with the package.

## Default behaviour

By default, pip doesn't include any files other than `*.py` files when it builds a package. The following example shows this.

The following cell creates a folder layout that mimics a Python package. It puts files with `.py` and `.txt` extensions in the `src/my_files` folder: the `.py` files mimic module source code, and the `.txt` files mimic data files.

In [1]:
%%bash
mkdir data_files
cd data_files
mkdir -p src/my_files
touch src/my_files/data_file{1..5}.txt
touch src/my_files/script_file{1..5}.py
tree

[01;34m.[0m
└── [01;34msrc[0m
    └── [01;34mmy_files[0m
        ├── data_file1.txt
        ├── data_file2.txt
        ├── data_file3.txt
        ├── data_file4.txt
        ├── data_file5.txt
        ├── script_file1.py
        ├── script_file2.py
        ├── script_file3.py
        ├── script_file4.py
        └── script_file5.py

2 directories, 10 files


Now let's add a minimal `pyproject.toml`.

In [2]:
%%writefile data_files/pyproject.toml
[project]
name = "toy_package"
version = "0.0.0"

Writing data_files/pyproject.toml


And finally, install the library.

In [3]:
%%bash
python3 -m venv venv
source venv/bin/activate
cd data_files
pip3 install .

Processing /home/fedor/Documents/knowledge/python/advanced/build_package/pyproject_toml/data_files
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: toy_package
  Building wheel for toy_package (pyproject.toml): started
  Building wheel for toy_package (pyproject.toml): finished with status 'done'
  Created wheel for toy_package: filename=toy_package-0.0.0-py3-none-any.whl size=1674 sha256=c7163e28f727f9d4d086299ebb6fbdfaaa68d12d8c9da784f0b52eba8ef2b1e0
  Stored in directory: /tmp/pip-ephem-wheel-cache-bzp_zlkb/wheels/cc/cb/f7/95fa019d083d40b603f83331a55267702f34437865bcdd8f3e
Successfully built toy_package
Installing collected packages: toy_package
Successfully installed

OK, let's check what the installation added to `site-packages`.

In [4]:
%%bash
cd venv/lib
# The folder names inside venv/lib depend on the Python version,
# but one of them contains site-packages, where the files of
# installed packages are stored. The next line finds the path to
# site-packages regardless of the installed Python version.
lib_path=$(find . -type d -name site-packages)
ls "$lib_path/my_files"

__pycache__
script_file1.py
script_file2.py
script_file3.py
script_file4.py
script_file5.py


As you can see, there are only `.py` files and no `.txt` files.
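
The same check can also be done from Python instead of `find` and `ls`. The sketch below is only an illustration: the helper `list_package_files` is a hypothetical name, not part of pip or setuptools, and it assumes the script is run with the interpreter of the virtual environment where the package was installed.

```python
import sysconfig
from pathlib import Path


def list_package_files(site_packages, package):
    """Return the sorted names of files located directly inside a package folder."""
    package_dir = Path(site_packages) / package
    # Sub-folders such as __pycache__ are skipped; only regular files are listed.
    return sorted(p.name for p in package_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    # "purelib" is the site-packages folder of the interpreter running this script.
    site_packages = sysconfig.get_paths()["purelib"]
    if (Path(site_packages) / "my_files").is_dir():
        print(list_package_files(site_packages, "my_files"))
    else:
        print(f"my_files is not installed in {site_packages}")
```

Using `sysconfig` avoids hard-coding the Python version in the path, which is the same problem the `find` command solves in the bash cell.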

## Add other files

To add files other than `.py`, you need to change `pyproject.toml` and add a `MANIFEST.in` file that specifies which files should be included.

So `pyproject.toml` should contain the `[tool.setuptools]` and `[tool.setuptools.packages.find]` tables, as in the following cell:

In [5]:
%%writefile data_files/pyproject.toml
[project]
name = "toy_package"
version = "0.0.0"

[tool.setuptools]
include-package-data = true

[tool.setuptools.packages.find]
where = ["src"]

Overwriting data_files/pyproject.toml


In `MANIFEST.in`, after the `include` keyword, add patterns for files to be included.

In [6]:
%%writefile data_files/MANIFEST.in
include src/my_files/*.txt

Writing data_files/MANIFEST.in
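
Before reinstalling, it can help to preview which files a pattern like this selects. The sketch below is a rough approximation only: it uses `pathlib` globbing, not the matching code that setuptools applies to `MANIFEST.in`, and `preview_include` is a hypothetical helper name introduced for this illustration.

```python
from pathlib import Path


def preview_include(project_root, pattern):
    """Return project-relative paths of files matched by a glob-style pattern."""
    root = Path(project_root)
    # Path.glob resolves the pattern relative to the project root;
    # "*" here does not cross folder boundaries.
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


if __name__ == "__main__":
    # Assumes the script runs from the folder that contains data_files/.
    for path in preview_include("data_files", "src/my_files/*.txt"):
        print(path)
```

If the preview prints nothing, the pattern is probably written relative to the wrong folder: the paths in `MANIFEST.in` are relative to the project root, where `pyproject.toml` lives.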


Let's try it out: reinstall the library and check which files appeared in the environment.

In [7]:
%%bash
source venv/bin/activate
cd data_files
pip3 install . &> /dev/null

cd ..
cd venv/lib
# The folder names inside venv/lib depend on the Python version,
# but one of them contains site-packages, where the files of
# installed packages are stored. The next line finds the path to
# site-packages regardless of the installed Python version.
lib_path=$(find . -type d -name site-packages)
ls "$lib_path/my_files"

data_file1.txt
data_file2.txt
data_file3.txt
data_file4.txt
data_file5.txt
__pycache__
script_file1.py
script_file2.py
script_file3.py
script_file4.py
script_file5.py


Now the data files (`.txt` files) are installed with the package as well.
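
Once a data file is installed next to the code, the package can read it at runtime. A minimal sketch, assuming Python 3.9 or newer and that the package is importable in the current environment; `read_package_text` is a hypothetical helper, not something the original package defines. Note that the toy `my_files` folder has no `__init__.py`, and support for such namespace packages in `importlib.resources` has differed between Python versions, so a real package would normally include one.

```python
from importlib.resources import files


def read_package_text(package, filename):
    """Read a text data file that ships inside an importable package."""
    return files(package).joinpath(filename).read_text(encoding="utf-8")


if __name__ == "__main__":
    try:
        # The toy data files were created with `touch`, so they are empty.
        print(repr(read_package_text("my_files", "data_file1.txt")))
    except (ModuleNotFoundError, FileNotFoundError) as error:
        print(f"Could not read the data file: {error}")
```

Looking the file up through the package, instead of building a path by hand, keeps the code independent of where `site-packages` is located.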

## Delete trash

Don't forget to delete the folders you've been playing in.

In [8]:
rm -r data_files venv