So many ways to include data files goes against PEP20 #1865

Closed
chsasank opened this issue Oct 8, 2019 · 1 comment

Comments

chsasank commented Oct 8, 2019

There are at least three ways to include data files using setup:

  1. package_data
  2. include_package_data=True and MANIFEST.in
  3. data_files

This goes against the Zen of Python (PEP 20), in particular this line:

There should be one-- and preferably only one --obvious way to do it.

Can we please deprecate (with warnings and in the documentation) all of these options but one? It would make the situation a whole lot less complicated for newcomers to Python packaging like me. I'm not alone in this confusion. See the following discussions:

  1. https://stackoverflow.com/q/7522250
  2. https://stackoverflow.com/q/24347450
  3. neither package_data or data_files get installed sampleproject#30

jaraco (Member) commented Oct 8, 2019

Unfortunately, all three of these are needed as they serve different use-cases.

The technique I use most often is 2(b): include_package_data=True with setuptools_scm for finding files (instead of MANIFEST.in). This is the least involved technique, as it automatically includes any files that are present and tracked alongside the source code.
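
A rough sketch of what 2(b) can look like in a setup.py, assuming a hypothetical package named `mypkg` (the name and layout are illustrative, not taken from this thread):

```python
# Sketch of option 2(b): setuptools_scm supplies the file list.
# "mypkg" is a hypothetical package name used only for illustration.
SETUP_KWARGS = dict(
    name="mypkg",
    packages=["mypkg"],
    # setuptools_scm derives the version from source control and
    # registers a file finder for files tracked there.
    use_scm_version=True,
    setup_requires=["setuptools_scm"],
    # Include data files inside packages that the file finder reports.
    include_package_data=True,
)


def main():
    # Imported lazily so the keyword arguments can be inspected
    # without setuptools being installed.
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

In a real setup.py, `main()` would be called at module level; it is left uncalled here so that the arguments can be inspected on their own.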

One step less automated is to use include_package_data with MANIFEST.in, which includes any data files that are declared to be part of the source (here, through the patterns in MANIFEST.in).
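
For illustration, the MANIFEST.in variant might be sketched as below. The file patterns and the package name `mypkg` are assumptions made up for this example:

```python
# Hypothetical MANIFEST.in contents, kept in a string for illustration.
# In a project this text would live in a file named MANIFEST.in
# next to setup.py.
MANIFEST_IN = """\
include mypkg/data/*.json
recursive-include mypkg/templates *.html
"""

# Matching setup() arguments: no globs are listed here, because
# include_package_data picks up in-package files named by MANIFEST.in.
SETUP_KWARGS = dict(
    name="mypkg",
    version="0.1",
    packages=["mypkg"],
    include_package_data=True,
)

# In setup.py: from setuptools import setup; setup(**SETUP_KWARGS)
```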

Another step less automated (more manual and explicit) is to use package_data to specify explicitly which globs should be included in which packages.
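
A minimal sketch of the explicit-glob approach, again with hypothetical package names and patterns:

```python
# Sketch of option 1: package_data maps each package to glob patterns.
# Patterns are interpreted relative to that package's directory.
# All names below are hypothetical.
SETUP_KWARGS = dict(
    name="mypkg",
    version="0.1",
    packages=["mypkg", "mypkg.plugins"],
    package_data={
        "mypkg": ["data/*.json", "templates/*.html"],
        "mypkg.plugins": ["*.cfg"],
    },
)

# In setup.py: from setuptools import setup; setup(**SETUP_KWARGS)
```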

Even more explicit would be to use data_files and disregard altogether the Python package structure when considering package data. In some ways, this technique is discouraged.
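
As a sketch of that last option, data_files takes a list of (target directory, source files) pairs that are independent of any package. The paths here are invented for the example:

```python
# Sketch of option 3: data_files ignores the package structure.
# Each entry is (install directory, [source files]); a relative install
# directory is resolved against the installation prefix, and source
# paths are relative to setup.py. All paths below are hypothetical.
SETUP_KWARGS = dict(
    name="mypkg",
    version="0.1",
    packages=["mypkg"],
    data_files=[
        ("share/mypkg", ["conf/defaults.cfg"]),
        ("share/man/man1", ["docs/mypkg.1"]),
    ],
)

# In setup.py: from setuptools import setup; setup(**SETUP_KWARGS)
```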

In effect, these techniques are layers of increasing sophistication, each built on the other, and should be left available for a user to use as appropriate. In the same way that Python builds on C or the urllib module builds on the socket module, these different hooks have value.

I do agree that there should probably be one preferred, recommended way to include package data, and that's probably the simplest, most sophisticated form: include_package_data (with either MANIFEST.in or file finders). The place to advertise that functionality is the Python Packaging User Guide.

If, however, you can show that one of these techniques obviates another and that one can be eliminated, that would be great. I think you'll find, however, that there's usefulness in each one, and you'll get the most traction working on the Python Packaging User Guide to provide clarity.

jaraco closed this as completed Oct 8, 2019