Support for data_files #890

Closed
2 tasks done
delphyne opened this issue Feb 12, 2019 · 38 comments
Labels
kind/feature Feature requests/implementations

@delphyne

  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have searched the documentation and believe that my question is not covered.

Feature Request

Poetry does not currently support the setup(data_files=[...]) argument, which allows you to include data files that live outside the package directory. This functionality is generally used for shipping non-code files that your library needs at runtime, or that other libraries need in order to build. Examples include protobuf .proto files, Avro schemas, Thrift IDL files, etc.

delphyne added a commit to csdisco/poetry that referenced this issue Feb 19, 2019
@bersace

bersace commented Apr 16, 2019

I used data_files to ship a systemd unit file. This is a very important feature!

@nickpresta nickpresta mentioned this issue May 8, 2019
2 tasks
@brycedrennan brycedrennan added the kind/feature Feature requests/implementations label Aug 15, 2019
@stale

stale bot commented Nov 13, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 13, 2019
@bersace

bersace commented Nov 13, 2019

@sdispater I guess this will have to wait until after 1.0, right?

@stale stale bot removed the stale label Nov 13, 2019
@jrabbit

jrabbit commented Dec 10, 2019

This is critical for a (basic) GUI app.

@michaelaye

Do I understand correctly that data files living next to the package modules are supported?

So, this layout should work and config_data.csv will be packaged?

pkgname/
  pyproject.toml
  src/
    pkgname/
      __init__.py
      data_files/
        config_data.csv

@dustinlacewell

Every time I jump head first into a new tool, I smash my face into the bottom of the pool.

@abn
Member

abn commented Apr 12, 2020

This should now be covered by https://python-poetry.org/docs/pyproject/#include-and-exclude. If this is not the case, please feel free to comment here or open a new issue with the specific scenario not covered.

@abn abn closed this as completed Apr 12, 2020
@jrabbit

jrabbit commented Apr 13, 2020

That doesn't let you specify where they should go, does it? How are users supposed to install a .desktop file for desktop environment (DE) integration?

@kalfa

kalfa commented May 29, 2020

There are at least two use cases:

  1. https://docs.python.org/2/distutils/setupscript.html#installing-package-data
  2. https://docs.python.org/2/distutils/setupscript.html#installing-additional-files

AIUI, the include/exclude mechanism matches neither of them; it just adds the files to the package.

In practice, if my package is installed in /some/path/lib/python3.6/site-packages/, those files are installed directly into that directory.

Both use cases are required to be able to move from setuptools, and neither is implemented in Poetry yet.

Note: package_data can be implemented with include, but include does not provide a lot of flexibility: we have to hardcode lots of paths, whereas package_data is package-oriented (as makes sense for something that manages packaging). Also, the empty-package '' use case is extremely important:
{'': ['assets/*']} is extremely expressive for anyone who has lots of files in lots of packages (which can be added and removed); with include, one would need to list them all explicitly.
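To make the '' semantics concrete, here is a simplified model of how such a package_data mapping is expanded against a source tree. It is a sketch, not setuptools' actual code: the helper name expand_package_data is hypothetical, it assumes a flat layout (package a.b lives in a/b/), and it approximates glob matching with fnmatch plus a path-depth check.

```python
import fnmatch


def expand_package_data(packages, package_data, files):
    """Approximate which files a setuptools-style package_data mapping selects.

    Simplified model (assumption): patterns are matched against paths relative
    to each package directory, "*" is kept within one path segment (approximating
    glob), and the '' key applies to every package.
    """
    shared = list(package_data.get("", []))
    result = {}
    for pkg in packages:
        pkg_dir = pkg.replace(".", "/")  # flat layout: package a.b lives in a/b/
        patterns = shared + list(package_data.get(pkg, []))
        matched = []
        for path in files:
            if not path.startswith(pkg_dir + "/"):
                continue
            rel = path[len(pkg_dir) + 1:]
            for pattern in patterns:
                # Same-depth check stops "assets/*" from matching "ui/assets/x".
                if rel.count("/") == pattern.count("/") and fnmatch.fnmatchcase(rel, pattern):
                    matched.append(path)
                    break
        result[pkg] = sorted(matched)
    return result


if __name__ == "__main__":
    tree = ["app/__init__.py", "app/assets/logo.png", "app/ui/assets/icon.svg",
            "X/foo/y.txt", "X/assets/z.css"]
    print(expand_package_data(["app", "app.ui", "X"],
                              {"": ["assets/*"], "X": ["foo/*"]}, tree))
```

The '' entry is applied to app, app.ui, and X alike, so a new package that gains an assets/ directory is picked up without touching the configuration.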

@thejohnfreeman
Contributor

@kalfa, as you later realized, include/exclude matches exactly the first use case you linked, package data. Can you provide a concrete example of package data that is easy to do with setuptools but difficult with Poetry?

@kalfa

kalfa commented May 29, 2020

Package data is less expressive with include/exclude and more difficult to read.
Overall, it is possible to achieve most if not all use cases.

The setuptools approach is more compact and readable:

  • Install the assets directory in each package.
  • Install the foo directory in package X.
package_data={'': ['assets/*'], 'X': ['foo/*']}

With Poetry I have to specify a list of more obscure patterns. But for simple enough projects, it is good enough. As you said, I only understood its potential later.

What is missing is the other use case (data_files), which is what this ticket is about; it has been closed and IMHO should be reopened.

I'm porting setup.py files to pyproject.toml and trying to build the same wheel. I'd be happy to find out I'm wrong and that it's possible.
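To make the verbosity difference concrete, here is a sketch that spells a package_data mapping out as one glob per package, the kind of list you would otherwise write by hand in Poetry's include setting. The helper name package_data_to_poetry_include is hypothetical, and it assumes a flat layout (package a.b lives in ./a/b/) with include entries written relative to the project root.

```python
def package_data_to_poetry_include(packages, package_data):
    """Hypothetical helper: expand package_data into explicit per-package globs.

    Assumes a flat layout and include globs relative to the project root.
    """
    shared = list(package_data.get("", []))
    includes = []
    for pkg in packages:
        pkg_dir = pkg.replace(".", "/")
        # The '' entry has to be repeated for every package explicitly.
        for pattern in shared + list(package_data.get(pkg, [])):
            includes.append(f"{pkg_dir}/{pattern}")
    return includes


print(package_data_to_poetry_include(["app", "app.ui", "X"],
                                     {"": ["assets/*"], "X": ["foo/*"]}))
# ['app/assets/*', 'app/ui/assets/*', 'X/assets/*', 'X/foo/*']
```

Every package added to the project means another hand-written line, which is the maintenance cost described above.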

@jrabbit

jrabbit commented May 29, 2020 via email

@mtkennerly

"data_files" are delivered relative to sys.prefix, whereas "package_data" is delivered to site-packages. I don't think it's possible to deliver files relative to sys.prefix using Poetry's include/exclude options.
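The two locations can be inspected with the standard library's sysconfig module. A minimal sketch, assuming the interpreter's default install scheme: the 'purelib' path is site-packages (package_data lands inside the package directory there), while the 'data' path is the prefix that data_files entries are resolved against. In a virtual environment 'data' normally equals sys.prefix, but some distribution-patched interpreters use a different default scheme, so the two are printed side by side rather than assumed equal.

```python
import sys
import sysconfig


def install_locations():
    """Return the paths relevant to package_data vs. data_files."""
    paths = sysconfig.get_paths()  # default install scheme for this interpreter
    return {
        "sys.prefix": sys.prefix,
        "purelib": paths["purelib"],  # site-packages: package_data ends up here
        "data": paths["data"],        # data_files are installed relative to this
    }


for name, path in install_locations().items():
    print(f"{name:<10} {path}")
```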

@pawamoy

pawamoy commented Sep 22, 2020

Another use-case is the distribution of man pages with the package.

@Ezhvsalate

I tried to move a Unix console app that follows the FHS from setuptools to Poetry, but got stuck on this issue, and it looks like I will have to roll back :(

@kalfa in the post above you wrote that

package_data can be implemented with include, but does not provide a lot of flexibility. we have to hardcode lots of path

Could you please provide an example of how I can achieve this with poetry:

data_files=[('/etc/myapp/', ['myapp.conf'])]

@kalfa

kalfa commented Nov 10, 2020

@Ezhvsalate

@kalfa in the post above you wrote that

package_data can be implemented with include

Could you please provide an example of how I can achieve this with poetry:

data_files=[('/etc/myapp/', ['myapp.conf'])]

I don't think you can yet (unless it has been implemented since I wrote my comments and I'm unaware of it).

Only package_data has an equivalent in Poetry.

What you mentioned is the same use case as desktop files & co.

@Ezhvsalate

@kalfa thank you, got it.

@abn Is there any chance for the feature to be implemented? Maybe there is some way to reopen this issue? I also found pull request #901 with an implementation, but it is closed too.

@sinoroc

sinoroc commented Nov 10, 2020

From my point of view (and many others) using data_files (the one from setuptools) is a bad practice. And I would venture that it is why it is not supported in poetry.

The idea that pip-installing a project could result in files being written to random locations on the local file system is discomforting. Things like: data_files=[('/etc/myapp/', ['myapp.conf'])] get a "no" from me. For one, it would mean pip-installing with sudo, which is also a "no" (there are way too many issues coming with that).

It is true, that there is a need for such things, in particular for applications. But from what I understood Python's packaging ecosystem was initially built with libraries in mind, applications were (and still very much are) some kind of second class citizens in a sense. So packaging applications in order to distribute them on PyPI is still very awkward. There are many other issues showing this divide between libraries and applications all around the Python packaging ecosystem in general, poetry included (but not poetry's fault in any sense, as far as I have seen).

My usual recommendation when something like data_files is needed is to go beyond the standard/common Python packaging techniques and reach for the packaging techniques specific to each operating system. For example, for Linux I would recommend looking into packaging your applications with apt/.deb, yum/.rpm, pacman, AppImage, Snap, etc. Give PyInstaller, BeeWare's Briefcase, or other similar tools a look. Those would probably give you a much better experience for such things.

The setuptools package_data on the other hand, is perfectly fine and encouraged. It only results in files written in the venv/lib/site-packages/mylibrary directory of your own package for the environment. So for poetry, as it was already mentioned, use include and exclude. More often than not, those are sufficient, no need to write files to random places on the file system. Also remember to use importlib.resources to read those files, never rely on paths relative to __file__.
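A minimal sketch of reading such a file with importlib.resources (files() requires Python 3.9+). The package name demo_pkg and the data/config_data.csv layout are made up for the demo; in a real project the file would ship inside your package via Poetry's include, and only read_package_file would be needed.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def read_package_file(package, *parts):
    """Read a text file shipped inside `package`, without relying on __file__."""
    resource = importlib.resources.files(package)  # a Traversable; also works for zip imports
    for part in parts:
        resource = resource.joinpath(part)
    return resource.read_text(encoding="utf-8")


def make_demo_package(root, name="demo_pkg"):
    """Demo only: create a throwaway package on disk and make it importable."""
    pkg = Path(root) / name
    (pkg / "data").mkdir(parents=True)
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "data" / "config_data.csv").write_text("key,value\nmode,fast\n", encoding="utf-8")
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return name


if __name__ == "__main__":
    name = make_demo_package(tempfile.mkdtemp())
    print(read_package_file(name, "data", "config_data.csv"))
```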

I will also add that for things such as configuration files, user data files, cache files, etc. you should have a look at platformdirs.
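platformdirs resolves these locations per operating system (for example via its user_config_dir(appname) function). To show the idea without the dependency, here is a Linux-only sketch of the per-user config directory rule from the XDG Base Directory specification. It is a simplification, not platformdirs' implementation, and the helper name xdg_config_dir is hypothetical.

```python
import os
from pathlib import Path


def xdg_config_dir(appname, environ=None):
    """Simplified, Linux-only per-user config directory (XDG Base Directory spec).

    Uses $XDG_CONFIG_HOME if it is set to an absolute path, else ~/.config.
    platformdirs additionally handles Windows and macOS conventions.
    """
    env = os.environ if environ is None else environ
    base = env.get("XDG_CONFIG_HOME", "")
    if not os.path.isabs(base):
        # The spec says empty or relative values must be ignored.
        base = os.path.join(os.path.expanduser("~"), ".config")
    return Path(base) / appname


print(xdg_config_dir("myapp"))
```

A config file such as myapp.conf would then be read from, or created in, this per-user directory at runtime instead of being installed into /etc by pip.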

So, in short:
If you need data_files, think twice. If you really need data_files, I would recommend relying on something more than just the common Python packaging tools. Go for PyInstaller, Briefcase, etc., or for more heavy-duty tools (apt, yum, pacman, etc.), because you want OS-specific or Linux-distro-specific things anyway. More generally, if you want to distribute applications, you might want to look beyond distributing sdists and wheels; those are not really made for applications.

@thejohnfreeman
Contributor

thejohnfreeman commented Nov 10, 2020

To add onto @sinoroc's excellent explanation, remember that Python is a cross-platform language. People install Python software on Windows, and if you publish an application on PyPI, Windows users might expect that it will work on their system. If you install platform-specific files (e.g. /etc/whatever), then you might need a platform-specific installer.

@thomassross

@sinoroc From what I see in this issue, it looks like the reason data_files is not supported in Poetry currently is that the maintainers do not see the use case for it. It is certainly true that use of data_files should be minimized (libraries almost never need it), but applications in many cases have no other option to bundle assets properly.

The idea that pip-installing a project could result in files being written to random locations on the local file system is discomforting.

Even without data_files, this is the case. Arbitrary code is executed whenever you pip install a package; pip installing a package means you trust that package.

Additionally, data_files does not require that you specify absolute paths for your files to be installed into (in fact, absolute paths are discouraged). Relative paths work (e.g. ('share/applications', ['xyz.desktop'])), and the files will be installed relative to either sys.prefix or site.USER_BASE.
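A minimal sketch of that resolution rule, modelled on the legacy setuptools/distutils install_data behaviour for a plain install: relative directories are joined onto sys.prefix (or the user base for a --user install), absolute ones are used verbatim, and each listed file keeps only its base name. The helper name data_files_destinations is hypothetical, and it ignores --root/--prefix overrides as well as how wheels repackage these files.

```python
import os
import site
import sys


def data_files_destinations(data_files, user=False):
    """Sketch: where setuptools-style data_files entries would be installed."""
    base = site.getuserbase() if user else sys.prefix
    result = []
    for directory, filenames in data_files:
        # Relative directories are resolved against the prefix; absolute ones are kept.
        target_dir = directory if os.path.isabs(directory) else os.path.join(base, directory)
        for filename in filenames:
            # The source path inside the project is dropped; only the base name is kept.
            result.append(os.path.join(target_dir, os.path.basename(filename)))
    return result


print(data_files_destinations([("share/applications", ["xyz.desktop"])]))
print(data_files_destinations([("share/man/man1", ["docs/tool.1"])], user=True))
```

The relative form keeps the file inside the environment (venv or user base), which is why it works without sudo, unlike ('/etc/myapp/', ['myapp.conf']).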

Your recommendation for using tools like pyinstaller, briefcase, Debian packages, etc. isn't really possible for application developers in a lot of cases. If, for example, an application wanted to support only Linux, there are still a lot of different kinds of package formats that the application developer would have to support. For that reason, distribution-specific package formats are usually created and maintained specifically for those distributions by someone on behalf of the distribution, rather than the maintainer of the application. Also, many of these formats take advantage of using the application's setuptools setup to install data files (see for example pybuild).

include is not a replacement for data_files in many cases, as other users have mentioned here (application desktop files, systemd unit files, man pages, etc).

@thejohnfreeman To address your concern of Linux-only applications on PyPI, it is not a requirement that Poetry packages are published to PyPI. A lot of applications won't be. Also, PyPI has classifiers to mark applications as supporting only Linux. Users should not be blindly installing applications from PyPI --- that is a recipe for disaster.

@sinoroc

sinoroc commented Nov 10, 2020

@thomassross I totally understand your point of view. But my point still stands: I do not believe data_files is a good practice for the common use cases. And as far as I understood, one of the big drivers for the development of poetry is to enforce good practices.

There are obviously very legitimate use cases where data_files are helpful and a good solution. For example if the project is only used in controlled environment for private usage, then I have nothing against using data_files.

So I would side on not adding support for data_files in poetry, and I would absolutely encourage a plugin that adds this feature (plugin system is scheduled for v1.2).

@lpsinger

I am interested in data_files support for exactly the same reason as @bersace gave above (#890 (comment)):

I used data_files to ship systemd unit file. This is a very important feature !

Neither package_data nor include/exclude work for this case.

Was this closed because there has been no work on it? Or was it closed because a PR to add this feature would be rejected?

@N-Coder

N-Coder commented Apr 5, 2022

It seems that data_files support is pretty much needed for packaging anything that works with Jupyter, see here and here.
flit has added support for a simplified and constrained version of data_files, which might also end up replacing the deprecated data_files functionality in setuptools. Thanks to this new feature, there is also some work on enabling flit for packaging Jupyter extensions. Having the same possibility in Poetry would be very nice.

@kalfa

kalfa commented Apr 5, 2022

@N-Coder, can you open another issue explicitly about Jupyter that mentions this one, please?

This issue is now closed, but I think it is still worth underlining that a data_files-equivalent feature is still missing (unless it has been added in the meantime, which would be great; even then, a new ticket would be a win if it taught us that).

@N-Coder

N-Coder commented Apr 5, 2022

I guess #4013 describes the issue from the Jupyter side. Or would you want a feature request for replicating the external-data functionality from flit?

@ofek
Contributor

ofek commented May 6, 2022

Hello! I'm trying to assess how this feature would be used generally.

Hatchling supports a shared-data option for wheels. Would that satisfy everyone's use case here?
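For context on what such an option does at the wheel level: Hatchling's shared-data places files under the wheel's {name}-{version}.data/data/ directory, and per the wheel format an installer moves each .data/<key>/ subtree to the install-scheme path for <key>, so data/ lands relative to the environment prefix. A simplified sketch of that mapping follows; wheel_install_path and the mypkg names are hypothetical, it assumes a pure-Python wheel (root contents go to purelib), and it does not handle the 'headers' key.

```python
import os
import sysconfig


def wheel_install_path(archive_path, dist_name, version, scheme=None):
    """Simplified: where would an installer put `archive_path` from a wheel?

    Files under "{dist_name}-{version}.data/<key>/" go to the scheme path for
    <key> (purelib, platlib, scripts, data); "headers" is not handled here.
    Everything else is treated as purelib content (pure-Python wheel assumed).
    """
    paths = scheme if scheme is not None else sysconfig.get_paths()
    data_prefix = f"{dist_name}-{version}.data/"
    if archive_path.startswith(data_prefix):
        key, _, rest = archive_path[len(data_prefix):].partition("/")
        return os.path.join(paths[key], rest)
    return os.path.join(paths["purelib"], archive_path)


if __name__ == "__main__":
    for entry in ("mypkg/__init__.py", "mypkg-1.0.data/data/share/man/man1/mypkg.1"):
        print(entry, "->", wheel_install_path(entry, "mypkg", "1.0"))
```

This is why a man page or shell completion file can be shipped this way without writing outside the environment: it ends up under the prefix (e.g. share/man/man1) rather than in an absolute system path.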

@noctuid

noctuid commented May 6, 2022

In my case, I need to install a manpage and a zsh completion file, so for me, yes.

@WillDaSilva

Hatchling supports a shared-data option for wheels. Would that satisfy everyone's use case here?

@ofek the link you provided results in a 404 error. I think the following links to what you intended: https://hatch.pypa.io/latest/plugins/builder/#options

@ofek
Contributor

ofek commented May 11, 2022

Thanks! Hatch was adopted by the PyPA so the docs site was moved.

@spoorn
Contributor

spoorn commented May 29, 2022

I created a poetry plugin that adds support for data_files in pyproject.toml: https://github.com/spoorn/poeblix, https://pypi.org/project/poeblix/

@N-Coder

N-Coder commented Jun 1, 2022

@spoorn, did you get a Jupyter plugin packaged with Poetry working with your plugin? Do you have an example?

@droserasprout

@spoorn, it's an incredible amount of work you've done! I think you should ask maintainers to mention this plugin in docs somehow.

@spoorn
Contributor

spoorn commented Jun 1, 2022

@N-Coder I got this working with nbconvert template files. Example: https://github.com/spoorn/poeblix/blob/main/test/positive_cases/happy_case_example/pyproject.toml

@droserasprout Thanks! Up to the poetry maintainers

apyrgio added a commit to freedomofpress/dangerzone that referenced this issue Aug 31, 2023
Update our pyproject.toml file to include some non-Python data files,
e.g., our container image and assets. This way, we can use `poetry
build` to create a source distribution / Python wheel from our source
repository.

Note that this list of data files is already defined in our `setup.py`
script. In that script, one can find some extra goodies:

1. We can conditionally include data files in our Python package. We use
   this to include Qubes data only in our Qubes packages.
2. We can specify where the data files will be installed on the end-user's
   system.

The above are non-goals for Poetry [1], especially (2), because modern
Python wheels are not supposed to install files in arbitrary places
within the user's host, nor should the install invocation use sudo.
Instead, this is a task that's better suited for the .deb / .rpm
packages.

So, why do we bother updating our `pyproject.toml` and not use
`setup.py` instead? Because `setup.py` is deprecated [2,3], and the
latest Python packaging RFCs [4], as well as most recent Fedora
guidelines [5] use `pyproject.toml` as the source of truth, instead of
`setup.py`.

In subsequent commits, we will also use just `pyproject.toml` for RPM
packaging.

[1]: python-poetry/poetry#890
[2]: https://peps.python.org/pep-0517/#source-trees
[3]: https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html
[4]: https://peps.python.org/pep-0517/
[5]: https://docs.fedoraproject.org/en-US/packaging-guidelines/Python/
apyrgio added a commit to freedomofpress/dangerzone that referenced this issue Sep 20, 2023

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024