
Interaction of hash pinning and subsequent uploading of new artifacts #564

Closed
indygreg opened this issue Oct 8, 2018 · 7 comments


I am pretty religious about using hash pinning in pip requirements files because of its security benefits.

Over the past several months, I've had a number of projects start failing hash checks out of the blue because a new artifact was uploaded to PyPI - usually a wheel. Essentially:

  1. I generate a requirements.txt with hashes for all available artifacts.
  2. A new wheel is uploaded to PyPI.
  3. pip downloads the new wheel and hash checking fails because the wheel's hash isn't recorded.
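The failure can be sketched as a simple set-membership check (a toy model of hash-checking mode with fabricated package names and contents, not pip's actual code):

```python
import hashlib

# Digests recorded in requirements.txt at lock time: only the sdist and
# the cp36 wheel existed, so only their hashes were pinned (fabricated data).
pinned = {
    hashlib.sha256(b"somepkg-1.0.tar.gz contents").hexdigest(),
    hashlib.sha256(b"somepkg-1.0-cp36 wheel contents").hexdigest(),
}

def verify(artifact_bytes: bytes) -> bool:
    """Mimic pip's check: the downloaded file's digest must be pinned."""
    return hashlib.sha256(artifact_bytes).hexdigest() in pinned

# Installing on 3.6 still succeeds...
assert verify(b"somepkg-1.0-cp36 wheel contents")
# ...but once a cp37 wheel appears, pip prefers it and the check fails.
assert not verify(b"somepkg-1.0-cp37 wheel contents")
```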

Lately, this has been happening a lot with Python 3.7. The package author initially produced wheels for Pythons up to 3.6 and those hashes were pinned. Installs on Python 3.7 were previously pulling down the source distribution, but as soon as a 3.7 wheel was uploaded, pip started attempting to pull down the 3.7 wheel and hash verification failed.

This has been pretty frustrating because the behavior isn't deterministic over time: one day my CI is working fine; the next, a new artifact appears on PyPI and my processes are busted. I have to pin the hashes of the new artifacts to unbust things. This is especially frustrating because it prevents trivial bisection and reproduction of old results.

An easy solution to this problem is to stop hash pinning. But that undermines security.

Another potential solution is to tell pip to never download wheels or to force the downloading of a specific artifact (such as the source distribution). But that's annoying too, as some packages are difficult to build due to required dependencies. I'm OK cherry-picking which packages can have wheels disabled. But I don't believe a requirements.txt allows per-package settings to control wheel vs non-wheel?

From an ivory tower, I want to say that the ideal solution is for package maintainers to not dynamically change the set of artifacts for a release. e.g. once you create a release, the set of artifacts is forever frozen and can't be changed. That would prevent a class of problems around non-deterministic behavior over time. But obviously creating a new release is less convenient than "just upload new artifacts [for new Python distributions when you produce them]." And I'm sure teaching PyPI to track the set of artifacts as "frozen" may not be trivial to implement.

Anyway, I wanted to write about my experiences in hopes of generating a discussion. Maybe pip and/or PyPI could be taught new features to mitigate this general problem? Maybe the user guide could say something about the impact of uploading new artifacts for an old release? I'm not too familiar with the current state of packaging discussions and I don't mean to prescribe any specific solution: I just want to ensure the experts are aware of the problem and can consider potential mitigations. Thanks for all your hard work on packaging!

@di (Member) commented Oct 8, 2018

We occasionally face this circumstance with the dependencies for pypa/warehouse, so first of all: I feel your pain.

I have to pin the hashes of the new artifacts to unbust things.

I'm curious if you're using any additional tools to manage your dependencies. For Warehouse, we use hashin which is an excellent little library and makes it very easy to recompile the hashes of our dependencies when this happens. (cc @peterbe)

From an ivory tower, I want to say that the ideal solution is for package maintainers to not dynamically change the set of artifacts for a release. e.g. once you create a release, the set of artifacts is forever frozen and can't be changed.

I don't really think I need to explain this to you, but for the sake of discussion: this has one obvious disadvantage, which is that maintainers couldn't go back and add new distributions to old releases for runtimes that didn't exist when the release was made. The addition of 3.7 wheels is a great example of this: we want people to be able to go back and add built distributions for this runtime now that it exists, which makes the likelihood of releases ever being frozen relatively slim.

But I don't believe a requirements.txt allows per-package settings to control wheel vs non-wheel?

It does. A little-known fact: the syntax of requirements.txt is the same as the arguments you would pass to the pip install command. So you can restrict a package to only binary distributions:

foo==1.2.3 --hash=sha256:<hash> --only-binary=foo

...or to only source distributions:

foo==1.2.3 --hash=sha256:<hash> --no-binary=foo

Note that even this wouldn't guarantee that the hashes can't change on you, because the maintainer could:

  • delete the .tar.gz source distribution and upload a .zip source distribution (or vice versa)
  • add a universal wheel when there used to be a platform-specific wheel.

Another thing you could do is just specify links to individual distributions directly:

https://files.pythonhosted.org/packages/.../foo-1.2.3-py3-none-any.whl --hash=sha256:<hash>

This would definitely ensure that the hash would never change. It would be quite tedious to do manually though. I'm not sure if hashin has a feature that does this, but it might be a worthwhile thing to add.
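Such a feature could be sketched from PyPI's JSON API, which lists every file of a release under "urls" along with its digests (the function and sample response below are illustrative; the digest is fabricated):

```python
def direct_url_pins(release_json: dict) -> list:
    """Turn a parsed pypi.org/pypi/<name>/<version>/json response into
    requirements lines pinning each file by exact URL and sha256 hash."""
    return [
        f"{f['url']} --hash=sha256:{f['digests']['sha256']}"
        for f in release_json["urls"]
    ]

# Minimal stand-in for a real API response (fabricated digest):
sample = {"urls": [{
    "url": "https://files.pythonhosted.org/packages/.../foo-1.2.3-py3-none-any.whl",
    "digests": {"sha256": "deadbeef"},
}]}
print(direct_url_pins(sample)[0])
# https://files.pythonhosted.org/packages/.../foo-1.2.3-py3-none-any.whl --hash=sha256:deadbeef
```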

The only thing I think that could happen here: instead of pip following this process:

  • find the most ideal distribution
  • check if that distribution's hash is in the list of hashes
  • install if yes, fail if no

...it could do this:

  • find any installable distributions
  • sort them by "idealness" (i.e., platform-specific wheels before source distributions)
  • install the first one that has a matching hash
  • error if it can't install any of them
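That alternative process could be sketched like this (a toy model, not pip internals; "idealness" reduced to a numeric rank and all names and hashes invented):

```python
def pick_installable(candidates, pinned_hashes):
    """Return the most ideal candidate whose hash is pinned, else None.
    candidates: iterable of (filename, rank, sha256); lower rank = more ideal."""
    for name, _rank, digest in sorted(candidates, key=lambda c: c[1]):
        if digest in pinned_hashes:
            return name
    return None  # a real implementation would raise a hash-mismatch error here

candidates = [
    ("foo-1.2.3-cp37-manylinux1.whl", 0, "newhash"),  # new wheel, unpinned
    ("foo-1.2.3.tar.gz", 1, "sdisthash"),             # sdist, pinned
]
# Falls back to the pinned sdist instead of failing on the new wheel:
assert pick_installable(candidates, {"sdisthash"}) == "foo-1.2.3.tar.gz"
```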

However, I'm not a pip maintainer, so I can't say how reasonable/doable that would actually be.

@peterbe commented Oct 9, 2018

[if releases were locked after a release was made] maintainers can't go back and add new distributions to old releases for runtimes that didn't exist when the release was already made.

I actually think the release should be immutable. If you decide to "append" a 3.7 wheel after the release was made, I think it should not be allowed.

More importantly, and what's been a problem in the past, is when the release files are changed. Is that still possible with Warehouse? I think it should be disallowed, because that's exactly the kind of nasty surprise with hashes that @indygreg is alluding to.

These days, I see fewer and fewer occurrences of hashes changing under my feet between local development and eventual CI. In fact, it's been so long that I dare to say I haven't seen it in many months. Granted, I don't use 3.7 in any active projects yet.

@di (Member) commented Oct 9, 2018

More importantly, and what's been a problem in the past, is when the release files are changed. Is that still possible with warehouse? I think it should be disallowed because that's exactly the kinds of nasty surprises with hashes that @indygreg is alluding to.

This is definitely not possible now, and I'm fairly sure it wasn't possible with legacy PyPI either.

@peterbe commented Oct 10, 2018

Here we go again! :(

According to https://pypi.org/project/zope.interface/4.5.0/#history version 4.5.0 was released in April 2018.
But all of a sudden it breaks builds, with the error:

THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    zope.interface==4.5.0 from https://files.pythonhosted.org/packages/a8/1c/ebc5de89de058c7aaf8f5d1013020273449d7069bb4036287c52cea9a25f/zope.interface-4.5.0-cp36-cp36m-manylinux1_x86_64.whl#sha256=727aa737a48dd33d6859296f15edb54e85ccdfa5667513f7a50daf362b3df75b (from -c constraints.txt (line 312)):
        Expected sha256 21506674d30c009271fe68a242d330c83b1b9d76d62d03d87e1e9528c61beea6
        Expected     or 3d184aff0756c44fff7de69eb4cd5b5311b6f452d4de28cb08343b3f21993763
        Expected     or 467d364b24cb398f76ad5e90398d71b9325eb4232be9e8a50d6a3b3c7a1c8789
        Expected     or 57c38470d9f57e37afb460c399eb254e7193ac7fb8042bd09bdc001981a9c74c
        Expected     or 9ada83f4384bbb12dedc152bcdd46a3ac9f5f7720d43ac3ce3e8e8b91d733c10
        Expected     or a1daf9c5120f3cc6f2b5fef8e1d2a3fb7bbbb20ed4bfdc25bc8364bc62dcf54b
        Expected     or e6b77ae84f2b8502d99a7855fa33334a1eb6159de45626905cb3e454c023f339
        Expected     or e881ef610ff48aece2f4ee2af03d2db1a146dc7c705561bd6089b2356f61641f
        Expected     or f41037260deaacb875db250021fe883bf536bf6414a4fd25b25059b02e31b120
             Got        727aa737a48dd33d6859296f15edb54e85ccdfa5667513f7a50daf362b3df75b

What appears to have happened is that yesterday they added a bunch of new wheels for Linux: https://pypi.org/project/zope.interface/4.5.0/#files (note the Oct 9, 2018 date).

What @indygreg is bringing up is real. This is just one of those examples. It's annoying and disruptive.

@di you say it's not possible to disallow appending release files to an existing version. Is that a technical thing or a "policy thing"?

I don't even know how the fundamentals of Warehouse work in terms of adding files but I can imagine that deep down in the business logic it probably does something like this:

def upload_release_files(version, files):
    release, created = Release.objects.get_or_create(version)
    logger.info("New release" if created else "Adding to release")
    for file in files:
        release.add_release_file(file)

What about changing it to this:

def upload_release_files(version, files):
    release, created = Release.objects.get_or_create(version)
    if not created:
        # compare elapsed seconds explicitly: subtracting datetimes gives a timedelta
        if (datetime.utcnow() - release.date).total_seconds() > 60 * 60 * 24:
            raise BadRequest("Can't add release files to a version older than 1 day")
    logger.info("New release" if created else "Adding to release")
    for file in files:
        release.add_release_file(file)

@di (Member) commented Oct 10, 2018

@di you say it's not possible to disallow appending release files to an existing version. Is that a technical thing or a "policy thing"?

I didn't say it wasn't possible, I said that it has a disadvantage: maintainers can't go back and add built distributions to old releases for runtimes that didn't exist when the release was first made.

I don't even know how the fundamentals of Warehouse work in terms of adding files but I can imagine that deep down in the business logic it probably does something like this:

Consider that not every project is able to add every distribution for a new release all in one upload: for very complex builds with distributions for multiple platforms, these may come from multiple CI services, build environments, etc. Some projects don't even have this level of sophistication and instead have separate maintainers who are each responsible for uploading distributions for a specific platform. Telling them that they now need to upload all their distributions at once, or within some arbitrary window of time before the release is "frozen" would add extra maintenance burden for them.

Like I said in #564 (comment), for folks that want to avoid this "annoyance", I think there is sufficient ability to avoid this behavior by more specifically defining their requirements files.

@dstufft (Member) commented Oct 10, 2018

To be clear, there is no concept of "at once" in PyPI's upload API. You can only upload a single file at a time; even a command like twine upload dist/* really uploads one file at a time. So no matter what, anything that limits people's ability to upload new wheels would have to define some arbitrary window of time during which uploads are hypothetically allowed, and a cutoff after which they are disallowed.

However, I think that disallowing uploads here would be a net loss. It would be silly, for instance, to force every C library that currently publishes wheels to rev its version number just to upload 3.7 wheels. It would be crummy if those projects simply decided not to release 3.7 wheels at all until their next release, and it would act as another impediment to people upgrading to a new version of Python.

I think that this is a pip problem, and pip can solve it as described in pypa/pip#5874 without adding a restriction to PyPI.

@theacodes (Member) commented Jan 22, 2019

Since @dstufft indicated this is a pip problem, I'm going to close this issue here. Please let me know if this is a mistake and there is indeed something actionable for this project.
