Include more than one license file or other arbitrary files in metadata #138

Closed
agronholm opened this issue Apr 14, 2015 · 24 comments

Comments

@agronholm
Contributor

Originally reported by: pombredanne NA (Bitbucket: pombredanne, GitHub: pombredanne)


With #47 it is now possible to get a SINGLE license file included in a built wheel by using setup.cfg:

[metadata]
license_file = <some file>

This is however a severe limitation: it works for very simple licenses, but even for a common license such as Apache-2.0 you cannot include both the license file and the NOTICE file.
Beyond this, there could be other metafiles you want to include in a built wheel.
If they are not package-data, you could use data_files, but they would be installed in some generally unpleasant location.

We should generalize the notion of license-file to include more than one and include arbitrary files defined in a setup.cfg as metafiles.


@agronholm
Contributor Author

Original comment by Daniel Holth (Bitbucket: dholth, GitHub: dholth):


Appending them into a single file is not an option?

The main concern here is that an arbitrary file in .dist-info should not share the same filename as any of the machine readable metadata. If you decided to call your arbitrary file 'METADATA' then the resulting wheel would either overwrite your file or it would not work.

@agronholm
Contributor Author

Original comment by pombredanne NA (Bitbucket: pombredanne, GitHub: pombredanne):


See schematics/schematics#420 (comment) on why appending them in a single file is not an (easy) option: for instance a license and authors file.

@agronholm
Contributor Author

Original comment by pombredanne NA (Bitbucket: pombredanne, GitHub: pombredanne):


@dholth you wrote:

The main concern here is that an arbitrary file in .dist-info should not share the same filename as any of the machine readable metadata. If you decided to call your arbitrary file 'METADATA' then the resulting wheel would either overwrite your file or it would not work.

Could the workaround simply be to add these files after all .dist-info files have been generated, and to raise an exception with a clear error message if you try to add a file that already exists in there?
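
The check proposed here could look roughly like the sketch below. The helper name `add_extra_metadata_files` and its parameters are hypothetical and are not part of wheel; the sketch assumes it is called only after METADATA, WHEEL and the other generated files have been written into the .dist-info directory.

```python
import shutil
from pathlib import Path


def add_extra_metadata_files(dist_info_dir, extra_files):
    """Copy extra files into a .dist-info directory, refusing to overwrite.

    Hypothetical helper: it must run after the machine-readable metadata
    has been generated, so that a name collision is detected instead of
    silently replacing a generated file.
    """
    dist_info = Path(dist_info_dir)
    copied = []
    for source in map(Path, extra_files):
        target = dist_info / source.name
        if target.exists():
            raise FileExistsError(
                f"cannot add {source}: {target.name} already exists "
                f"in {dist_info.name}"
            )
        shutil.copyfile(source, target)
        copied.append(target)
    return copied
```

With this ordering, a user file named METADATA fails loudly at build time rather than producing a broken wheel.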

@agronholm
Contributor Author

Original comment by Daniel Holth (Bitbucket: dholth, GitHub: dholth):


I would like the long-term solution to be for wheel to define additional categories of files in addition to purelib and platlib, making it easy to put files in places like /usr/share/doc/

@agronholm
Contributor Author

Original comment by pombredanne NA (Bitbucket: pombredanne, GitHub: pombredanne):


@dholth Can you elaborate a bit on what exactly you mean by categories of files?

@agronholm
Contributor Author

Original comment by pombredanne NA (Bitbucket: pombredanne, GitHub: pombredanne):


Do you see these as sections in the setup.cfg? What would you define in them?
I guess the difficulty for a generic solution would be to be very explicit and to play well with system as well as virtualenv installations, so that there is no guesswork when packaging and no guesswork about where you will find the files once installed.
That is something data_files has been lacking, IMHO.
So ideally, for each "metafile" you need to state the exact source(s) and exactly where it will go.
And should that "where" be somewhere in the .dist-info? But what about your /usr/share/doc case?

@agronholm
Contributor Author

Original comment by Daniel Holth (Bitbucket: dholth, GitHub: dholth):


So the model of installation in wheel is that you have categories of files
that are copied onto destination directories:

package-1.0.data/scripts/
package-1.0.data/platlib/

and so on, with either purelib or platlib going in the root of the wheel
archive instead of in the .data/ subdirectory.

If you added more categories like

package-1.0.data/docs/

then that folder in the wheel archive would be copied onto a configurable
target path, like /usr/share/docs/. Of course it would all be relative to
the virtualenv when using a virtualenv.

In review, the archive contains a set of directory trees each containing a
category of files. Each tree is copied onto a target path in the install
environment. This is how wheel works now but it only knows about a few
distutils-defined categories.
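
As a rough illustration of this model, the sketch below resolves where a single archive member would land. It is an assumption-laden stand-in, not wheel's or pip's actual code: the function name `install_target` is hypothetical, and the paths from `sysconfig.get_paths()` stand in for the installer's configurable target paths.

```python
import os
import sysconfig
from pathlib import PurePosixPath


def install_target(archive_path, data_dir, root_is_purelib=True):
    """Return (category, destination) for one file in a wheel archive.

    Files under `<data_dir>/<category>/` are copied onto the install path
    of that category. Everything else belongs to purelib or platlib,
    whichever one the wheel stores at the root of the archive.
    """
    paths = sysconfig.get_paths()  # assumption: used as the target paths
    parts = PurePosixPath(archive_path).parts
    if len(parts) > 2 and parts[0] == data_dir:
        category, relative = parts[1], parts[2:]
    else:
        category = "purelib" if root_is_purelib else "platlib"
        relative = parts
    if category not in paths:
        # e.g. a new "docs" category that no target path is configured for
        raise KeyError(f"no install path known for category {category!r}")
    return category, os.path.join(paths[category], *relative)
```

A member such as `package-1.0.data/scripts/tool` resolves to the scripts directory of the current environment, while `package-1.0.data/docs/README` raises, which is the gap the additional categories would fill.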

(We also have data_files which is copied to different places depending on
the installer, but generally is copied onto / or the root of the
virtualenv. It has been hard to use correctly for a long time and so is not
very useful.)

Questions remain. Which paths? How do you configure them? How does a
package know where its constituent pieces have been installed? How do you
specify which sources become which "category of data_file"? Is this
sufficiently flexible? Do we include a package-specific path like
/usr/share/docs/packagename-1.0/?

Tried to work this out in a PEP last year but it got stalled.

@agronholm
Contributor Author

Original comment by pombredanne NA (Bitbucket: pombredanne, GitHub: pombredanne):


Daniel you wrote:

Tried to work this out in a PEP last year but it got stalled.

I am game. Do you have the start of some draft?

@agronholm
Contributor Author

Original comment by Daniel Holth (Bitbucket: dholth, GitHub: dholth):


This section but removing any mention of "a new version of wheel" from the pep, rename pep to "fine grained installation of Python packages". https://www.python.org/dev/peps/pep-0491/#install-paths

@agronholm
Contributor Author

Original comment by Daniel Holth (Bitbucket: dholth, GitHub: dholth):


@pombredanne please take a look at https://bitbucket.org/dholth/python-install-paths/src/tip/pep-0491.txt

We have a pull request with most of an implementation of the same for wheel, and probably in /dholth/wheel I have an alternative implementation of a very similar proposal.

@agronholm
Contributor Author

The current plan would seem to be to concatenate all the license-related files into .dist-info/LICENSE.txt. With upcoming packaging standards a better solution might emerge. Any objections?

@pombredanne

@agronholm Thank you for keeping this alive! .dist-info/LICENSE.txt is what we have today, but it is limited and might need combining multiple texts in one (such as in the common Apache + NOTICE text) with some loss of precision in some cases... Yet I have no objection to the status quo, even if not perfect.

webknjaz added a commit to webknjaz/ansible that referenced this issue Mar 26, 2018
It's been misused + doesn't support multiple files (pypa/wheel#138)
webknjaz added a commit to webknjaz/ansible that referenced this issue Apr 6, 2018
It's been misused + doesn't support multiple files (pypa/wheel#138)
@njsmith
Member

njsmith commented Apr 16, 2018

Another case where multiple license files would be useful is for packages with multiple licenses :-). For example, pyca/cryptography and trio are like this. Also any package that includes other vendored packages, like pip or requests.

I understand the concern about cluttering up the .dist-info directory with random files, but... filesystems are hierarchical, and namespaces are one honking great idea :-). Could we let users specify arbitrary files and then put them all in .dist-info/license/?
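
A minimal sketch of that namespacing idea follows. The function name `license_archive_paths` is hypothetical, and keeping each file's project-relative path below the subdirectory is an assumption made here so that vendored packages with identically named license files do not collide; the comment above does not specify a layout.

```python
from pathlib import PurePosixPath


def license_archive_paths(dist_info_name, license_files, subdir="license"):
    """Map project-relative license paths to paths inside the wheel archive.

    Every file lands below `<dist_info_name>/<subdir>/`, so user-chosen
    names can never clash with METADATA, RECORD and friends.
    """
    mapping = {}
    for rel in license_files:
        rel_path = PurePosixPath(rel)
        if rel_path.is_absolute() or ".." in rel_path.parts:
            raise ValueError(f"license path must stay inside the project: {rel}")
        mapping[rel] = str(PurePosixPath(dist_info_name, subdir, rel_path))
    return mapping
```

For example, `license_archive_paths("package-1.0.dist-info", ["LICENSE", "vendor/six/LICENSE"])` keeps the two LICENSE files apart under `package-1.0.dist-info/license/`.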

@pombredanne

pombredanne commented May 27, 2018

@njsmith I am all for this. In the end wheels are today in a weird spot where they do not make it easy to create things that comply with licensing even though a corresponding sdist may be fine.

@agronholm
Contributor Author

The question is, how to implement this? I'd rather not allow users to clutter the .dist-info/ directory freely, but having a licenses directory there is something I could live with. I'd like to run this through distutils-sig to see if any counterproposals pop up.

@agronholm
Contributor Author

I'm leaning towards including LICENSE* from the project root, which should work for most people out of the box. Can someone think of plausible scenarios where this would be undesirable or where more flexibility is required?

@agronholm
Contributor Author

Here are some more thoughts:

  • Allow users to specify the exact file names (via license_files or similar)
  • Allow users to specify a glob pattern

@sersorrel

COPYING is another typical name for license files. Also, the Apache license requires that you include any NOTICE file, if there is one.

@sersorrel

...and people do still occasionally use the UK spelling (LICENCE) too :)

@agronholm
Contributor Author

So it sounds like license_files is the best option, yes?

@njsmith
Member

njsmith commented Jul 8, 2018

license_files seems like a simple way to give people the ability to control what files are included, so +1.

I don't think explicit glob support is necessary – if people want to use a glob and a non-default naming scheme, they can write

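from glob import glob

from setuptools import setup

setup(
    # ... other setup() arguments ...
    # Expand the pattern in setup.py itself; no glob support needed in the option.
    license_files=glob("licenses/*.txt"),
)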

Since such an overwhelming proportion of wheels are currently violating licenses by not having this configured correctly, I think it would be good if the default were fairly inclusive. Maybe: any files matching any of the globs ["LICENSE*", "LICENCE*", "COPYING*", "NOTICE*"]? (Of course the default would be ignored if an explicit license_file or license_files was given.)

(Is there anything missing from my list there? E.g. do any popular licenses mandate including AUTHORS to achieve compliance?)
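
The proposed default could be sketched as below. The function name `find_license_files` and the `explicit` parameter are hypothetical; only the four glob patterns come from the comment above, and matching is as case-sensitive as the underlying filesystem.

```python
from pathlib import Path

# Default patterns as proposed in the comment above.
DEFAULT_LICENSE_GLOBS = ("LICENSE*", "LICENCE*", "COPYING*", "NOTICE*")


def find_license_files(project_root, explicit=None):
    """Return the license files to include for a project.

    An explicit list (even an empty one) replaces the default globs
    entirely; otherwise every regular file in the project root that
    matches one of the default patterns is included.
    """
    root = Path(project_root)
    if explicit is not None:
        return [root / name for name in explicit]
    found = set()
    for pattern in DEFAULT_LICENSE_GLOBS:
        found.update(path for path in root.glob(pattern) if path.is_file())
    return sorted(found)
```

Collecting into a set first avoids duplicates if the pattern list is later extended with overlapping globs.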

@agronholm
Contributor Author

Sounds good. Of course the reason why I'm talking about explicitly supporting globbing is because these days I enter all my metadata in setup.cfg rather than in setup.py and hope that some day we can do away with setup.py entirely.

I will also deprecate license_file in favor of license_files. The latter will also work as a direct replacement for the former, and if both are present, the license_file will just be added to the list of license files.
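
A sketch of how the merge and the setup.cfg glob support might fit together is shown below. Both function names are hypothetical, and reading license_files as a newline-separated list is an assumption borrowed from how list-valued setup.cfg options are usually written; none of this is wheel's actual implementation.

```python
import configparser
import warnings
from pathlib import Path


def license_patterns_from_cfg(cfg_text):
    """Read license_files and the deprecated license_file from [metadata]."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    patterns = []
    if parser.has_option("metadata", "license_files"):
        raw = parser.get("metadata", "license_files")
        patterns.extend(line.strip() for line in raw.splitlines() if line.strip())
    if parser.has_option("metadata", "license_file"):
        warnings.warn(
            "license_file is deprecated; use license_files",
            DeprecationWarning,
            stacklevel=2,
        )
        single = parser.get("metadata", "license_file").strip()
        # The old option is simply appended to the list of license files.
        if single and single not in patterns:
            patterns.append(single)
    return patterns


def expand_license_patterns(project_root, patterns):
    """Expand each pattern relative to the project root, keeping order."""
    root = Path(project_root)
    matched = []
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            if path.is_file() and path not in matched:
                matched.append(path)
    return matched
```

Plain file names pass through the glob step unchanged, so a setup.cfg that only lists exact names behaves as a direct replacement for license_file.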

@njsmith
Member

njsmith commented Jul 9, 2018

I guess supporting globs for setup.cfg is harmless enough if that's something you care about. But I don't see the point myself. setup.cfg will never replace setup.py. Flit or other new systems might replace both of them, but I don't see any future where we have setup.cfg without setup.py.

@pombredanne

You may be interested to chime in on this license-related PEP draft discussion https://discuss.python.org/t/improving-license-clarity-with-better-package-metadata

4 participants