Interaction of hash pinning and subsequent uploading of new artifacts #564
I am pretty religious about using hash pinning in pip requirements files because of its security benefits.
Over the past several months, I've had a number of projects start failing hash checks out of the blue because a new artifact was uploaded to PyPI - usually a wheel. Essentially:
Lately, this has been happening a lot with Python 3.7. The package author initially produced wheels for Pythons up to 3.6, and those hashes were pinned. Under Python 3.7, pip was previously pulling down the source distribution. But as soon as a 3.7 wheel was uploaded, pip started attempting to pull down the 3.7 wheel, and hash verification failed.
This has been pretty frustrating because the behavior isn't deterministic over time. E.g. one day my CI is working fine; the next, a new artifact appears on PyPI and my processes are busted. I have to pin the hashes of the new artifacts to unbust things. This is especially frustrating because it prevents trivial bisection and reproduction of old results.
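To make the failure mode concrete, here is a hedged sketch of the setup being described (the package name and sha256 values are placeholders, not from the thread):

```
# requirements.txt (hypothetical package; sha256 values are placeholders)
# The two pinned hashes cover the sdist and the cp36 wheel.
somepackage==1.2.3 \
    --hash=sha256:aaaa... \
    --hash=sha256:bbbb...
```

If the maintainer later uploads a cp37 wheel for the same release, pip running under Python 3.7 prefers that wheel, finds no matching pinned hash, and aborts the install with a hash-mismatch error.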
An easy solution to this problem is to stop hash pinning. But that undermines security.
Another potential solution is to tell pip to never download wheels, or to force the downloading of a specific artifact (such as the source distribution). But that's annoying too, as some packages are difficult to build from source due to required dependencies. I'm OK with cherry-picking which packages have wheels disabled, but I don't believe a blanket "no wheels" policy is practical.
From an ivory tower, I want to say that the ideal solution is for package maintainers to not dynamically change the set of artifacts for a release. e.g. once you create a release, the set of artifacts is forever frozen and can't be changed. That would prevent a class of problems around non-deterministic behavior over time. But obviously creating a new release is less convenient than "just upload new artifacts [for new Python distributions when you produce them]." And I'm sure teaching PyPI to track the set of artifacts as "frozen" may not be trivial to implement.
Anyway, I wanted to write up my experiences in hopes of generating a discussion. Maybe we can come up with improvements to pip or PyPI.
We occasionally face this circumstance with the dependencies for pypa/warehouse, so first of all: I feel your pain.
I'm curious if you're using any additional tools to manage your dependencies. For Warehouse, we use
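(The name of the tool is cut off in this copy of the thread. Purely as an illustration of the kind of tooling being discussed, pip-tools is one common option: its pip-compile command can emit a fully hash-pinned requirements file.)

```
# requirements.in lists loose requirements; pip-compile pins exact
# versions and records the hashes of every artifact currently on PyPI.
pip-compile --generate-hashes --output-file requirements.txt requirements.in
```

Note that this pins hashes for whatever artifacts exist at compile time, so it doesn't by itself prevent the new-wheel problem, but it does make re-pinning a one-command fix.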
I don't really think I need to explain this to you, but for the sake of discussion: this has one obvious disadvantage, which is that maintainers can't go back and add new distributions to old releases for runtimes that didn't exist when the release was made. The addition of 3.7 wheels is a great example of this: we want people to be able to go back and add built distributions for this runtime now that it exists, which makes the likelihood of such a restriction being adopted relatively slim.
It does. A little-known fact: the syntax of requirements files lets you restrict a requirement to only built distributions, or to only source distributions:
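A hedged reconstruction of what that can look like (the original snippets are missing from this copy of the thread; `--only-binary` and `--no-binary` are real pip requirements-file options, and the package name is a placeholder):

```
# Allow only built distributions (wheels) for every package:
--only-binary :all:

# ...or allow only source distributions, globally or per package:
--no-binary :all:
--no-binary somepackage
```

These lines can sit directly in a requirements file alongside the hash-pinned requirements.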
Note that this wouldn't guarantee that the hashes won't change on you, because the maintainer could still alter which files match.
Another thing you could do is just specify links to individual distributions directly:
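A sketch of such a direct link (the URL path and hash are placeholders; pip requirements files do accept direct archive URLs combined with `--hash`):

```
# Pin the exact wheel file by URL, plus its hash (both placeholders):
https://files.pythonhosted.org/packages/.../somepackage-1.2.3-cp36-cp36m-manylinux1_x86_64.whl \
    --hash=sha256:aaaa...
```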
This would definitely ensure that the hash would never change. It would be quite tedious to do manually, though, and I'm not sure if existing tooling automates it.
The only improvement I can see a tool making here: instead of emitting pinned `name==version` lines, it could emit direct links to the specific files.
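A hedged before/after sketch of that idea (names, versions, and hashes are placeholders):

```
# Instead of a name-and-version pin:
somepackage==1.2.3 \
    --hash=sha256:aaaa...

# ...a tool could emit the direct file URL, so pip can only ever
# consider that exact artifact:
https://files.pythonhosted.org/packages/.../somepackage-1.2.3-py3-none-any.whl \
    --hash=sha256:aaaa...
```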
However, I'm not a
I actually think the release should be immutable. If you decide to "append" a 3.7 wheel after the release was made I think it should not be allowed.
More importantly, and what's been a problem in the past, is when release files are changed. Is that still possible with Warehouse? I think it should be disallowed, because that's exactly the kind of nasty surprise with hashes that @indygreg is alluding to.
These days, I see fewer and fewer occurrences of hashes changing under my feet between local development and eventual CI. In fact, it's been so long that I dare to say I haven't seen it in many months. Granted, I don't use 3.7 in any active projects yet.
This is definitely not possible now, and I'm fairly sure it wasn't possible with legacy PyPI either.
Here we go again! :(
According to https://pypi.org/project/zope.interface/4.5.0/#history version 4.5.0 was released in April 2018.
What appears to have happened is that yesterday they added a bunch of new wheels for Linux: https://pypi.org/project/zope.interface/4.5.0/#files (note the Oct 9, 2018 date).
What @indygreg is bringing up is real. This is just one of those examples. It's annoying and disruptive.
@di you say it's not possible to disallow appending release files to an existing version. Is that a technical thing or a "policy thing"?
I don't even know how the fundamentals of Warehouse work in terms of adding files but I can imagine that deep down in the business logic it probably does something like this:
```python
def upload_release_files(version, files):
    release, created = Release.objects.get_or_create(version)
    logger.info("New release" if created else "Adding to release")
    for file in files:
        release.add_release_file(file)
```
What about changing it to this:
```python
def upload_release_files(version, files):
    release, created = Release.objects.get_or_create(version)
    if not created:
        # Subtracting two datetimes yields a timedelta, so compare
        # against a timedelta rather than a raw number of seconds.
        if datetime.utcnow() - release.date > timedelta(days=1):
            raise BadRequest("Can't add release files to a version older than 1 day")
    logger.info("New release" if created else "Adding to release")
    for file in files:
        release.add_release_file(file)
```
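One subtlety in the sketch above: subtracting two datetimes produces a timedelta, which can't be compared to a plain number of seconds, so the age check needs to compare against a timedelta (or use `.total_seconds()`). A small self-contained illustration, with arbitrarily chosen example dates:

```python
from datetime import datetime, timedelta

release_date = datetime(2018, 4, 1)   # arbitrary example release date
now = datetime(2018, 10, 9)           # arbitrary "current" time

age = now - release_date              # a timedelta, not a number
print(age > timedelta(days=1))        # True: well past the one-day window
print(age.total_seconds() > 60 * 60 * 24)  # True: equivalent check
```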
I didn't say it wasn't possible, I said that it has a disadvantage, which is that maintainers can't go back and add built distributions to old releases for runtimes that didn't exist when the release was first made.
Consider that not every project is able to add every distribution for a new release all in one upload: for very complex builds with distributions for multiple platforms, these may come from multiple CI services, build environments, etc. Some projects don't even have this level of sophistication and instead have separate maintainers who are each responsible for uploading distributions for a specific platform. Telling them that they now need to upload all their distributions at once, or within some arbitrary window of time before the release is "frozen" would add extra maintenance burden for them.
Like I said in #564 (comment), for folks that want to avoid this "annoyance", I think there is sufficient ability to avoid this behavior by more specifically defining their requirements files.
To be clear, there is no concept of "at once" in PyPI's upload API. You can only upload a single file at a time; even a command that appears to upload several distributions at once is really performing a sequence of single-file uploads.
However, I think that disallowing uploads here would be a net loss. It would be silly, for instance, to force every C library that currently publishes wheels to rev its version number just so it can upload 3.7 wheels. It would be crummy if those projects decided not to release 3.7 wheels at all until their next release, and it would just act as another impediment to people upgrading to a new version of Python.
I think that this is a pip problem, and pip can solve it as described in pypa/pip#5874 without adding a restriction to PyPI.