Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FR] Put both sdist and bdist package data configuration in setup.cfg #2790

Open
1 task done
mcarans opened this issue Sep 13, 2021 · 11 comments
Open
1 task done

[FR] Put both sdist and bdist package data configuration in setup.cfg #2790

mcarans opened this issue Sep 13, 2021 · 11 comments

Comments

@mcarans
Copy link

mcarans commented Sep 13, 2021

What's the problem this feature will solve?

What data files will end up in sdist or bdist is a little confusing as it is not configured in one place. This stackoverflow answer summarises:

"Basically it goes like this:

MANIFEST.in adds files to sdist (source distribution).

include_package_data adds these same files to bdist (built distribution), i.e. it extends the effect of MANIFEST.in to bdist.

exclude_package_data prevents files in sdist to be added to bdist, i.e. it filters the effect of include_package_data.

package_data adds files to bdist, i.e. it adds build artifacts (typically the products of custom build steps) to your bdist and has of course no effect on sdist."

Describe the solution you'd like

My suggestion is to remove the need for MANIFEST.in by putting the configuration into setup.cfg. Maybe it could go under a new key under [options.package_data] or maybe it would be good to move towards something new like have [build_distribution] and [source_distribution] sections to make it clearer what is going where.

Just throwing out some ideas, but the main aim is to put all configuration in setup.cfg and make it obvious what ends up in the source distribution and what in the build one.

Alternative Solutions

No response

Additional context

No response

Code of Conduct

  • I agree to follow the PSF Code of Conduct
@mcarans mcarans added enhancement Needs Triage Issues that need to be evaluated for severity and status. labels Sep 13, 2021
@jaraco
Copy link
Member

jaraco commented Sep 15, 2021

The behavior is heavily dependent on the behavior from distutils, which is currently in the process of being adopted by Setuptools. This request is maybe something we can consider once the two implementations have merged.

In the meantime, you may want to consider using setuptools_scm. Including setuptools_scm obviates the need for a MANIFEST.in (for git-based projects) by automatically including all repo files in the sdist. Thereafter, files to be included in the build/install are configured in setup.cfg through directives like package_data. That's the approach I use and I haven't had to fuss with MANIFEST.in for years. It's not for everybody, but it does substantially simplify and reduce toil (derive from SCM metadata rather than repeat it).

@jaraco jaraco added deferred and removed Needs Triage Issues that need to be evaluated for severity and status. labels Sep 15, 2021
@abravalheri
Copy link
Contributor

Hi @mcarans, considering the latest developments and bug fixes in setuptools, this experiment shows that package_data does affect both sdist and wheel (the previous behaviour was subject to a bug fixed in #2844.

Therefore currently it is possible to control everything from setup.cfg
(although I do agree with Jason that setuptools-scm makes everything much easier to deal with).

Does this solution works for you?

@mcarans
Copy link
Author

mcarans commented Jan 25, 2022

@abravalheri I have to admit that having examined what is included in my wheel, I found that the files I had listed in MANIFEST.in aren't there after all even though I have include_package_data = True.

I've realised though that my problem is more fundamental: I am not sure what should be included in the sdist vs what should be included in a wheel and don't know what gets included in each by default. Having searched the web for clear answers, I have not found anything and tried posting a question on StackOverflow, but it was closed as "opinion-based". I have now posted it as a discussion in this repo: #3051

I can then set up my project accordingly and test that package_data does what is expected.

@abravalheri
Copy link
Contributor

I found that the files I had listed in MANIFEST.in aren't there after all even though I have include_package_data = True.

Hi @mcarans, I had a look in your project, tried to build it locally and it seems that both MANIFEST.in and include_package_data = True are working as expected.

I noticed that you have in your MANIFEST.in:

recursive-include src *.txt
recursive-include src *.yml

So I did:

# Finding out which files would be included in the wheel:
% fd '\.(yml|txt)' src
src/hdx/api/hdx_base_configuration.yml
src/hdx/data/hdx_datasource_topline.yml
src/hdx/data/indicator_resource_view_template.yml

# Find out which files were actually included in the wheel:
% unzip -l dist/hdx_python_api-5.5.2-py2.py3-none-any.whl | grep yml
     2003  2022-01-26 16:32   hdx/api/hdx_base_configuration.yml
      366  2022-01-26 16:32   hdx/data/hdx_datasource_topline.yml
     2828  2022-01-26 16:32   hdx/data/indicator_resource_view_template.yml

and I noticed that all the files that are supposed to be in the wheel, are indeed in the wheel.

Please note that package_data, include_package_data and exclude_package_data consider files insider your package directory, i.e. files under src/hdx. Arbitrary files outside the package are not very well supported in the Python package ecosystem, because they usually don't work as developers would expect (so I don't even recommend them).

By default the folders docs, tests and workingexample will not be included in the wheel only on the sdist, you can do some gymnastics to force them to be included, but I find that it is not worthy...

I will try to share my opinion with you about the files in the discussion 😄

@mcarans
Copy link
Author

mcarans commented Jan 26, 2022

@abravalheri That is extremely helpful. The missing piece of knowledge you've filled for me here is that things outside the package directory (in my case src/hdx) will (intentionally) not be included in the wheel package even though they are specified in the MANIFEST.in. (They will be included in the sdist.)

I will now test package_data but from what you've said, presumably it cannot include in the sdist things outside of src/hdx?

@abravalheri
Copy link
Contributor

Yeah, package_data will not help you with files outside of the package directory...

Think about the following:

  • If you place your files inside the package directory you can use importlib.resources to locate them at runtime and read their contents.
  • If you place your files outside the package directory how can you access them later? Each operating system can place them in a different location. If your files are installed in a virtualenv, it will be a different location. If your files are installed with --user, it would be yet a different location...

Moreover, let's say another package has a workingexample folder, what happens when your package and the other package are installed at the same time? Which package will overwrite the workingexample folder?

Those are just some reasons why I don't recommend relying on files outside the package directory.

@abravalheri
Copy link
Contributor

@abravalheri That is extremely helpful. The missing piece of knowledge you've filled for me here is that things outside the package directory (in my case src/hdx) will (intentionally) not be included in the wheel package even though they are specified in the MANIFEST.in. (They will be included in the sdist.)

This information is supposed to be available in https://setuptools.pypa.io/en/latest/userguide/datafiles.html. Please feel free to submit any improvement suggestions to the docs that you think would make them easier to understand 😄

@mcarans
Copy link
Author

mcarans commented Jan 26, 2022

@abravalheri That is extremely helpful. The missing piece of knowledge you've filled for me here is that things outside the package directory (in my case src/hdx) will (intentionally) not be included in the wheel package even though they are specified in the MANIFEST.in. (They will be included in the sdist.)

This information is supposed to be available in https://setuptools.pypa.io/en/latest/userguide/datafiles.html. Please feel free to submit any improvement suggestions to the docs that you think would make them easier to understand smile

Ok, will do

The issue is then that to have an sdist containing the recommended files like tests etc. requires having a MANIFEST.in (unless using setuptools_scm). It would be great if there were a flag in setup.cfg to say include all in sdist (as setuptools_scm does) and/or a way to specify everything in MANIFEST.in in setup.cfg. Given what you've said should be included in an sdist and what setuptools_scm does, I'd say the flag could be enough. (Meanwhile, I do plan to migrate to setuptools_scm)

(Also I guess one day everything in setup.cfg could be in pyproject.toml simplifying things still further, but that's a separate issue.)

EDIT: I've just tested and it appears to me that by default the sdist does contain everything. I did not have to specify in package_data the YAML files in the src/hdx folder for example for them to be present in the sdist ie. if I don't specify anything (not MANIFEST.in or package_data), the YAML files in src/hdx, test folder, docs, workingexample are in the sdist but not the wheel. (I built the sdist with pypa build as with pyproject.toml, I'm not sure how to do it with setuptools)

@abravalheri
Copy link
Contributor

@mcarans, just to make sure I would recommend you doing a rm -rf build dist src/hdx_python_api.egg-info before trying again.

@abravalheri
Copy link
Contributor

abravalheri commented Jan 27, 2022

The issue is then that to have an sdist containing the recommended files like tests etc. requires having a MANIFEST.in (unless using setuptools_scm). It would be great if there were a flag in setup.cfg to say include all in sdist (as setuptools_scm does) and/or a way to specify everything in MANIFEST.in in setup.cfg. Given what you've said should be included in an sdist and what setuptools_scm does, I'd say the flag could be enough. (Meanwhile, I do plan to migrate to setuptools_scm)

Please note that I was describing my personal approach (and I also recommended using setuptools-scm for that). There are people in the community that may argue that you should only include the files your package need to run.

Regarding your new proposal: it is not trivial to implement a "include all" flag. There are lot of things happening in the developer's machine that create transient files. That is why git needs .gitignore and that is why VCS plugins to setuptools exist.

What would be the main value added to the developers if this feature is implemented? Please consider that:

  • ((include|exclude)_)?package_data are now working as expected, so they cover files inside the package directory (which are the ones you can really rely upon).
  • for simple pure-Python projects the default implementation should work fine
  • MANIFEST.in files have a special syntax that can be hightlighted in code editors and checked using check-manifest.
  • if the developer don't want to maintain separate MANIFEST.in files, there are VCS plugins such as setuptools-scm and setuptools-svn that can be used.
  • Without a MANIFEST.in file or a VCS plugin everything would be added to the distribution, including temporary files generated by your text editor or IDE, __pycache__, *.pyc, coverage files, etc...

PS: There is a nice article about this topic.

@mcarans
Copy link
Author

mcarans commented Jan 27, 2022

@abravalheri Thanks for your helpful reply. I was caught out by failing to manually clean before building (not for the first time :-( ), something I hope will be resolved soon by the tools doing the cleaning (although it seems there is debate about whether it is pypa build, wheel or setuptools that should do the cleaning: pypa/build#206 pypa/wheel#147 (comment))

Having tested again and cleaned, I can confirm that package_data works as you had outlined with sdist and wheel. I am sorry for the confusion.

I see the issue with working out what should be included by an include all flag that I had suggested. Scratch that idea, but I think that it would be nice to be able to specify what's in MANIFEST.in in setup.cfg somehow, although I understand that it might not provide the granularity that the syntax of MANIFEST.in provides. I could imagine just specifying paths to folders (like docs and tests) and/or files (like pyproject.toml and LICENSE), given that package_data covers additional files that are needed in src/hdx.

However, I do intend to use setuptools_scm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants