# Data files

Sometimes a package needs more than its source code to work. For example, I once built an interface to a machine learning model that was stored in `model.pkl`, and that file had to be shipped in the built package together with the code.

## Default behaviour

By default, pip doesn't include any files other than `*.py` files when it builds a package. The following example shows this.

The following cell creates a directory layout that mimics a Python package. It puts files with `.py` and `.txt` extensions in the `my_files` folder: the `.py` files stand in for module source code, and the `.txt` files stand in for data files.

In [1]:
%%bash
mkdir data_files
cd data_files
mkdir -p src/my_files
touch src/my_files/data_file{1..5}.txt
touch src/my_files/script_file{1..5}.py
tree

.
└── src
    └── my_files
        ├── data_file1.txt
        ├── data_file2.txt
        ├── data_file3.txt
        ├── data_file4.txt
        ├── data_file5.txt
        ├── script_file1.py
        ├── script_file2.py
        ├── script_file3.py
        ├── script_file4.py
        └── script_file5.py

2 directories, 10 files


Now let's add a minimal `pyproject.toml`.

In [2]:
%%writefile data_files/pyproject.toml
[project]
name = "toy_package"
version = "0.0.0"

Writing data_files/pyproject.toml


And finally, install the library.

In [3]:
%%bash
python3 -m venv venv
source venv/bin/activate
cd data_files
pip3 install .

Processing /home/fedor/Documents/knowledge/python/advanced/build_package/pyproject_toml/data_files
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: toy_package
  Building wheel for toy_package (pyproject.toml): started
  Building wheel for toy_package (pyproject.toml): finished with status 'done'
  Created wheel for toy_package: filename=toy_package-0.0.0-py3-none-any.whl size=1674 sha256=8b222d6da4cbf1d604459fb9ea3c5537a452cf8d2757ca0ad75d032e6024ea51
  Stored in directory: /tmp/pip-ephem-wheel-cache-316chubl/wheels/cc/cb/f7/95fa019d083d40b603f83331a55267702f34437865bcdd8f3e
Successfully built toy_package
Installing collected packages: toy_package
Successfully installed
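
The log above mentions a wheel file. A wheel is a zip archive, so one way to see what went into it is to count its members by file extension. The block below is only a sketch and not part of the original notebook: `wheel_suffix_counts` is a hypothetical helper, and it assumes you have a wheel at a known path (for example one produced with `pip wheel . -w dist`), because the wheel from the log was stored in a temporary cache directory.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def wheel_suffix_counts(wheel_path):
    """Count the members of a wheel by file extension.

    Hypothetical helper for illustration: a wheel is a zip archive,
    so the standard zipfile module can list its contents.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        names = wheel.namelist()
    # Members without an extension (such as METADATA) are counted under "".
    return Counter(PurePosixPath(name).suffix for name in names)


# Usage (the path is an assumption, adjust it to where your wheel is):
# counts = wheel_suffix_counts("dist/toy_package-0.0.0-py3-none-any.whl")
# counts[".txt"] is 0 when no .txt data files were packaged.
```

If the count for `.txt` is zero, the data files were already dropped when the wheel was built, before anything was installed.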

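Checking the installed `my_files` folder in `site-packages` shows that only the `.py` files made it there (plus the `__pycache__` folder):

In [4]: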
%%bash
cd venv/lib
lib_path=$(find . -type d -name site-packages)
ls $lib_path/my_files

__pycache__
script_file1.py
script_file2.py
script_file3.py
script_file4.py
script_file5.py


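## Add other files

To ship the `.txt` files as well, set `include-package-data = true` in `pyproject.toml` and list the files in a `MANIFEST.in` file placed next to it.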

In [56]:
%%writefile data_files/pyproject.toml
[project]
name = "toy_package"
version = "0.0.0"

[tool.setuptools]
include-package-data = true

[tool.setuptools.packages.find]
where = ["src"]

Overwriting data_files/pyproject.toml


In [57]:
%%writefile data_files/MANIFEST.in
include src/my_files/*.txt

Writing data_files/MANIFEST.in
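
The `include src/my_files/*.txt` line uses a glob-style pattern relative to the project root. To preview which files such a pattern would pick up, it can be approximated with `pathlib`. This is only a rough sketch: `preview_include` is a hypothetical helper, the real matching is done by setuptools when it processes `MANIFEST.in`, and the two may differ in edge cases.

```python
from pathlib import Path


def preview_include(project_root, pattern):
    """List the files under project_root that match a glob pattern.

    Hypothetical helper for illustration. It approximates a MANIFEST.in
    `include` line with pathlib globbing and does not call setuptools.
    """
    root = Path(project_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.glob(pattern)
        if path.is_file()
    )


# Usage, assuming the project lives in the data_files folder:
# preview_include("data_files", "src/my_files/*.txt")
```

For the layout created at the start of this page, the pattern should select the five `data_file*.txt` files and none of the `.py` files.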


In [5]:
rm -r data_files venv