pip wheel produces a "Hashes are required" when building a wheel from a local sdist #12942

Open
1 task done
alex opened this issue Aug 28, 2024 · 36 comments
Labels
S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior

Comments

@alex
Member

alex commented Aug 28, 2024

Description

An invocation like the following: pip wheel -c constraints-file-with-hashes.txt local-sdist.tar.gz produces an error like:

ERROR: Hashes are required in --require-hashes mode, but they are missing from some requirements. Here is a list of those requirements along with the hashes their downloaded archives actually had. Add lines like these to your requirements files to prevent tampering. (If you did not enable --require-hashes manually, note that it turns on automatically when any package has a hash.)
    file:///D:/a/cryptography/cryptography/cryptography-44.0.0.dev1.tar.gz --hash=sha256:e85c67eb1a045652bb850f443ae24004b618aca6df8c642a8e7a977f90f16afb

Note that the package which is missing the hash is the local sdist.

Expected behavior

pip should enforce hashes for any downloaded/remote packages, but should not require a hash for the local sdist.

pip version

24.2

Python version

3.11.9

OS

All

How to Reproduce

  1. Download a local sdist: pip download --no-binary :all: --no-deps cryptography
  2. Create a constraints file with hashes
  3. pip wheel -c constraints-file-with-hashes.txt cryptography*.tar.gz

Output

No response

Code of Conduct

@alex alex added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Aug 28, 2024
@notatallshaw
Member

Does the constraints file contain --require-hashes?

@alex
Member Author

alex commented Aug 28, 2024 via email

@alex
Member Author

alex commented Aug 29, 2024

Sorry, to be more precise, the constraints file contains hashes. It doesn't have --require-hashes, but I believe those are equivalent.
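As a rough illustration of that equivalence (the error message above says hash-checking mode "turns on automatically when any package has a hash"), the following sketch scans the text of a requirements or constraints file for `--hash=` options. It is an approximation written for this thread, not pip's own parser: the function names are hypothetical, and line continuations and trailing comments are not handled.

```python
import re

# Matches a per-requirement hash option such as --hash=sha256:abc123.
# Assumption: only the "--hash=algo:hexdigest" spelling is considered.
HASH_OPTION = re.compile(r"--hash=[A-Za-z0-9]+:[0-9a-fA-F]+")


def lines_with_hashes(text: str) -> list[str]:
    """Return the non-comment lines that carry at least one --hash option."""
    found = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        if HASH_OPTION.search(line):
            found.append(line)
    return found


def implies_hash_checking(text: str) -> bool:
    """Approximate the rule quoted in the error message: an explicit
    --require-hashes line, or any --hash option, enables hash checking."""
    explicit = any(
        line.strip() == "--require-hashes" for line in text.splitlines()
    )
    return explicit or bool(lines_with_hashes(text))
```

Running `implies_hash_checking` over the constraints file before invoking pip would show up front whether the local sdist is going to need a hash as well.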

@uranusjr
Member

Is it possible to have a self-contained reprod?

@alex
Member Author

alex commented Aug 30, 2024

(tempenv-6e562856703b6) ~/.v/tempenv-6e562856703b6 ❯❯❯ pip download --no-binary :all: --no-deps pretend
Collecting pretend
  File was already downloaded /Users/alex_gaynor/.virtualenvs/tempenv-6e562856703b6/pretend-1.0.9.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Successfully downloaded pretend
(tempenv-6e562856703b6) ~/.v/tempenv-6e562856703b6 ❯❯❯ pip wheel --require-hashes ./pretend-1.0.9.tar.gz
Processing ./pretend-1.0.9.tar.gz
  File was already downloaded /Users/alex_gaynor/.virtualenvs/tempenv-6e562856703b6/pretend-1.0.9.tar.gz
ERROR: Hashes are required in --require-hashes mode, but they are missing from some requirements. Here is a list of those requirements along with the hashes their downloaded archives actually had. Add lines like these to your requirements files to prevent tampering. (If you did not enable --require-hashes manually, note that it turns on automatically when any package has a hash.)
    file:///Users/alex_gaynor/.virtualenvs/tempenv-6e562856703b6/pretend-1.0.9.tar.gz --hash=sha256:c90eb810cde8ebb06dafcb8796f9a95228ce796531bc806e794c2f4649aa1b10

@uranusjr
Member

uranusjr commented Aug 30, 2024

pip wheel --require-hashes ./pretend-1.0.9.tar.gz

In this case the local sdist does not have a hash, so pip’s complaint is not groundless (whether a local sdist needs a hash is another question). I thought in the original issue the local sdist does have a hash but pip fails to recognise it?

@alex
Member Author

alex commented Aug 30, 2024 via email

@notatallshaw
Member

While I can see the argument one way or the other, I have worked in teams where the repository is a network attached storage, and if they were concerned about data integrity and hashed their contents before copying it to the network, they would expect --require-hashes to enforce that check.

How practical this scenario is I don’t know, but usually every feature of pip is relied on by someone.

@alex
Member Author

alex commented Aug 30, 2024

I'm reticent to suggest a flag, since that's just Yet Another Thing for users. But if the existing behavior is desired, then maybe this is a feature request for some way to disable this behavior, and only enforce hashes for PyPI packages.

@pfmoore
Member

pfmoore commented Aug 30, 2024

only enforce hashes for PyPI packages

To clarify what you intend here, do you mean just PyPI, or any index (specified via --index-url and/or --extra-index-url)? What about "informal" repositories specified via --find-links? Also, there's the possibility of requirements specified by (local) file path or URL - would both of those be exempt from hash checks?

I sympathise with the idea that hashes are more important for some sources than for others, but I'm not at all clear where we draw the line - and I don't personally use hashes, so I have to be guided by what our users seem to want, which mostly feels like "hashes enforced everywhere, except for the occasional place that I don't want them to be enforced" [1].

Footnotes

  1. Sorry, that comes across as a bit facetious or dismissive, but it's genuinely hard to pin down a clear rule that people agree with.

@alex
Member Author

alex commented Aug 30, 2024

Sorry, I should have said index-provided.

I don't know what to do about the general case of local file path.

It seems clear (to me at least?) that "the sdist I'm building a wheel out of" is a distinct case from the more general pip install use case.

@notatallshaw
Member

To understand your workflow a bit better, why are you using --require-hashes with pip wheel for a local sdist?

If your goal is to build the sdist and you're not worried about its data integrity, why not just build it without resolving its dependencies? i.e. drop --require-hashes and add --no-deps.

@alex
Member Author

alex commented Aug 30, 2024

Because I want to pin the versions of build-system.requires. In my actual use case, the sdist is produced by a previous step in the CI system. See pyca/cryptography#11500

@notatallshaw
Member

notatallshaw commented Aug 30, 2024

I wasn't aware that pip wheel passed -c constraints to the build requirements, is that correct? It doesn't for pip install, you have to use the environment variable PIP_CONSTRAINT.

If I am reading this correctly, you are building the wheel(s), copying the wheel(s) to another location to be used, and then the venv is not used further? Perhaps as a workaround, you could:

  1. Install your pinned build dependencies: python -m pip install --require-hashes -r ${{ env.BUILD_REQUIREMENTS_PATH }}
  2. Build your wheel(s) offline with no isolation: python -m pip wheel -v --no-deps --no-index --no-build-isolation cryptography*.tar.gz $PY_LIMITED_API -w dist/

I do something very similar for one of my build steps, I don't need build isolation because I can already create a reproducible pinned build environment via the docker steps, and I'm not reusing that environment for anything else. Hope this helps anyway.
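The two steps above can be sketched as a small Python helper that assembles both pip invocations as argument lists. This is only an illustration under stated assumptions: the function name and the file names are hypothetical, the CI-specific `$PY_LIMITED_API` argument is left out, and the commands are printed rather than executed.

```python
import sys


def workaround_commands(build_requirements: str, sdist: str, wheel_dir: str = "dist"):
    """Return the two pip invocations from the workaround as argv lists."""
    # Step 1: install the pinned, hashed build dependencies into this environment.
    install_build_deps = [
        sys.executable, "-m", "pip", "install",
        "--require-hashes", "-r", build_requirements,
    ]
    # Step 2: build the wheel offline, reusing the environment from step 1
    # instead of letting pip create an isolated build environment.
    build_wheel = [
        sys.executable, "-m", "pip", "wheel", "-v",
        "--no-deps", "--no-index", "--no-build-isolation",
        sdist, "-w", wheel_dir,
    ]
    return install_build_deps, build_wheel


if __name__ == "__main__":
    # Hypothetical file names; pass each list to subprocess.run(cmd, check=True)
    # to actually execute the steps.
    for cmd in workaround_commands("build-requirements.txt", "cryptography-44.0.0.dev1.tar.gz"):
        print(" ".join(cmd))
```

Because the sdist is built with `--no-deps` and without hash-checking mode, no hash is demanded for the sdist itself; the hashes are enforced only in the first step.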

@alex
Member Author

alex commented Aug 30, 2024

I suppose it's possible -c doesn't work and I need to use PIP_CONSTRAINT, but that seems orthogonal to this. (It's also a fairly significant foot-gun, but that's also orthogonal!)

I agree that it's possible to work around this by simply not relying on build isolation, but this increases the complexity of the build. It really should be possible to build a wheel from an sdist while exerting precise control over all dependencies to be built. (If there's a better tool than pip for this, I'm happy to hear it, but I'm not aware of another.)

@pfmoore
Member

pfmoore commented Aug 30, 2024

(If there's a better tool than pip for this, I'm happy to hear it, but I'm not aware of another.)

If you're building a wheel, rather than installing, build might be better for you.

@alex
Member Author

alex commented Aug 30, 2024 via email

@notatallshaw
Member

It really should be possible to build a wheel from an sdist while exerting precise control over all dependencies to be built.

I think it is (though I've not tried it with this sdist example), but it requires constructing a constraints.txt which includes a hash of your sdist and pointing the env variable PIP_CONSTRAINT to that file.

@notatallshaw
Member

notatallshaw commented Aug 30, 2024

You probably also need to set the env var for require hashes, I think PIP_REQUIRE_HASHES=1?

@notatallshaw
Member

notatallshaw commented Aug 30, 2024

-c doesn't work and I need to use PIP_CONSTRAINT, but that seems orthogonal to this. (It's also a fairly significant foot-gun, but that's also orthogonal!)

My understanding is that constraints pre-date isolated builds, and further, pip has no user-facing way to find out the required build dependencies of a package. Therefore workflows which involve using pip freeze to generate pinned constraints might break if -c was passed to the isolated build environment, depending on the constraints generated and the options the user is using.

Pip-tools has a way of extracting build dependencies: https://github.com/jazzband/pip-tools?tab=readme-ov-file#maximizing-reproducibility, but relies on the same PIP_CONSTRAINT env variable when you sync your environment.

uv improves the situation by separating out regular constraints and build constraints: https://docs.astral.sh/uv/pip/compatibility/#build-constraints, but it's not clear to me from the docs if the CLI option is applied recursively or you need to use UV_BUILD_CONSTRAINT to ensure a build dependency's build dependencies are pinned (but I don't think uv provides a wheel option).

I've heard Bazel supports reproducible fully pinned Python projects, but I don't understand the tool well enough that looking at their documentation tells me if this is true or not.

@alex
Member Author

alex commented Aug 31, 2024

To take a step back here: My overall goal is to take a local sdist, build a wheel from it, and do so with any downloaded artifacts pinned to a version and hash verified.

The last element is presently an impediment because pip attempts to verify the hash of the sdist itself, which is not in the constraints file.

I have a lack of clarity about whether this is desired behavior by the pip maintainers, so I want to lay out three possible directions here:

  1. Verifying the hash of the local sdist is not intended or desired behavior: pip should stop checking the hash of a local sdist.
  2. Verifying the hash of a local sdist is either a) desired or b) not desired, but now part of the backwards compatibility surface for pip: pip should add a flag to disable performing this verification
  3. Verifying the hash of a local sdist is intended and desired, and there is no interest in allowing it to be disabled: This issue should be wontfixed

Perhaps there's other options too, but I'd be interested in which direction the maintainers prefer.

@pfmoore
Member

pfmoore commented Aug 31, 2024

Perhaps there's other options too, but I'd be interested in which direction the maintainers prefer.

I can't speak for the other maintainers, but my personal view is somewhere between (2) and (3). I think that --require-hashes should mean what it says, and require hashes for everything. We document that --require-hashes "is implied when any package in a requirements file has a --hash option", and while constraints files aren't mentioned explicitly, we don't document much about constraint files in general, and I'd expect people to assume they work similarly to requirement files. So we'd be potentially breaking compatibility to change this, even if we wanted to.

In addition, as I said above, I think that pinning down precisely what the semantics of any potential "disable hash verification for local sdists" are would be both difficult to do, and difficult to document. So even if the consensus was (2), I'm against having an option unless someone can prove me wrong by specifying the behaviour clearly and unambiguously.

Having said all of this, I have little or no experience of actually using hash checking mode, so I'd defer to someone with real world experience if they said otherwise.

@hmc-cs-mdrissi

hmc-cs-mdrissi commented Aug 31, 2024

My experience working at a place where hash checking is strongly recommended by security, but where we also have several local requirements, is that I eventually wound up adding a flag to pip compile to work around this issue. What I do now is this:

I have a requirements.in file that lists the dependencies to install, including local packages. I use pip compile (now uv pip compile) to convert requirements.in to requirements.txt, and I include the flag --exclude-package/--unsafe-package (the name varies between uv and pip tools) to exclude local packages from the .txt file. Then I run pip install --no-dependencies -r requirements.txt and pip install --no-dependencies -r requirements.in (the second one installs the local packages).

It's a little convoluted, but I think the current hash checking mode is mostly annoying when local/editable dependencies are mixed in, and it forces tricks like this. The pip tools issue about --unsafe-package also had other people commenting that they use this kind of trick to work around the --require-hashes behavior.

So my own preference is (1), which would make usage of editable/local packages easier, but today I've found a workable alternative with multiple install commands/files that deals with this issue.

Before I found this solution, the security recommendation boiled down to "we lack a good way to handle this case and only see awkward choices".

edit: Glancing at how other teams in my company deal with this kind of issue, they either use multiple install commands/requirement files or do not use hashes. For the latter, though, I'm unsure whether they avoid hashes because of this issue or are simply unaware of hashes and the workaround paths.

edit 2: Another possible solution that has been suggested is a flag like --no-hashes package-name that can be specified multiple times to explicitly name the packages whose hashes should not be checked. That's roughly how the exclude-package approach works. There would be no special logic for local/editable packages, but the user or script running the install could explicitly mark some packages as fine without a hash.

@notatallshaw
Member

notatallshaw commented Sep 5, 2024

Okay, I tried to create a workflow for OP without any workarounds or using other tools, and it's not clear to me it's even possible to pin build requirements in an isolated build environment with hashes? In short:

  1. Pip can only enforce pinned build dependencies with PIP_CONSTRAINT
  2. Pip will not take the hashes from a constraint file
  3. pyproject.toml build requirements can not include hashes

Example of trying to install an unhashed requirement with a hashed constraint:

  1. Create constraints.txt with the contents:
setuptools==74.1.1 --hash=sha256:fc91b5f89e392ef5b77fe143b17e32f65d3024744fba66dc3afe07201684d766
  2. Run pip install setuptools==74.1.1 -c constraints.txt, and get error:
ERROR: Hashes are required in --require-hashes mode, but they are missing from some requirements. Here is a list of those requirements along with the hashes their downloaded archives actually had. Add lines like these to your requirements files to prevent tampering. (If you did not enable --require-hashes manually, note that it turns on automatically when any package has a hash.)
    setuptools==74.1.1 --hash=sha256:fc91b5f89e392ef5b77fe143b17e32f65d3024744fba66dc3afe07201684d766

Full example of minimal workflow

I'm using bash for this example, you'll need to adapt to whatever shell you use:

  1. mkdir minimal_project
  2. cd minimal_project
  3. Create pyproject.toml with following contents:
[build-system]
requires = ["setuptools==74.1.1", "wheel==0.44.0"]
build-backend = "setuptools.build_meta"

[project]
name = "minimal_project"
version = "0.1.0"
  4. mkdir -p src/minimal_project
  5. touch src/minimal_project/__init__.py
  6. python -m build --sdist
  7. Create build-constraints.txt with the contents:
setuptools==74.1.1 --hash=sha256:fc91b5f89e392ef5b77fe143b17e32f65d3024744fba66dc3afe07201684d766
wheel==0.44.0 --hash=sha256:2376a90c98cc337d18623527a97c31797bd02bad0033d41547043a1cbfbe448f
  8. Create a requirements file like so: echo "file://$(realpath dist/minimal_project-0.1.0.tar.gz) --hash=sha256:$(sha256sum dist/minimal_project-0.1.0.tar.gz | cut -d' ' -f1)" > sdist-requirements.txt
  9. Export build constraints: export PIP_CONSTRAINT="$PWD/build-constraints.txt"
  10. Attempt to build wheel: python -m pip wheel --no-deps -r sdist-requirements.txt and get error:
Processing ./dist/minimal_project-0.1.0.tar.gz
  Installing build dependencies ... error
  error: subprocess-exited-with-error

  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [4 lines of output]
      Collecting setuptools==74.1.1
        Using cached setuptools-74.1.1-py3-none-any.whl (1.3 MB)
      ERROR: Hashes are required in --require-hashes mode, but they are missing from some requirements. Here is a list of those requirements along with the hashes their downloaded archives actually had. Add lines like these to your requirements files to prevent tampering. (If you did not enable --require-hashes manually, note that it turns on automatically when any package has a hash.)
          setuptools==74.1.1 --hash=sha256:fc91b5f89e392ef5b77fe143b17e32f65d3024744fba66dc3afe07201684d766
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Am I missing something? Is it possible to use hashes with pip for build requirements at all? I think if hashes from constraints were accepted, then this workflow would work.

@notatallshaw
Member

notatallshaw commented Sep 5, 2024

On a side note for OP, uv just added uv build and can build from sdists: astral-sh/uv#6898, which combined with UV_BUILD_CONSTRAINT you should be able to get what you want, but I haven't tried it.

@hmc-cs-mdrissi

For build requirements, you can have a separate build_requirements.txt file with hashes, install that file with no-dependencies, and then afterwards do the main install with no-build-isolation as a workaround.

How you even determine build requirements is another problem, as pip compile and similar tools only produce a resolution of install requirements, not build ones. In practice, though, my experience is that the build requirement list is usually very short, so I've just written it manually.

@notatallshaw
Member

For build requirements, you can have a separate build_requirements.txt file with hashes, install that file with no-dependencies, and then afterwards do the main install with no-build-isolation as a workaround.

Yes, I gave an example of that workflow earlier (#12942 (comment)), but OP was unhappy with --no-build-isolation so I was seeing if it was possible to come up with some workflow that could pin build requirements with hashes using build isolation, and I guess the answer is no.

@pfmoore
Member

pfmoore commented Sep 5, 2024

This is basically a standards issue at the core. Hashes are a form of locking, and currently we have no package locking standard. In particular:

  1. The specification for requirements does not include a standardised way to include a file hash.
  2. The specification for pyproject.toml requires you to define the build dependencies as requirements.

The proposed lockfile standard, PEP 751, includes a section for specifying locked build requirements. That may help with this workflow, once the PEP gets approved and implemented.

@notatallshaw
Member

notatallshaw commented Sep 5, 2024

This is basically a standards issue at the core. Hashes are a form of locking, and currently we have no package locking standard

It's only a standards issue in the sense that pip's existing installer features don't currently work in this scenario, but if pip would allow a constraints file to constrain requirements via hashes this would solve this workflow without the need for a new standard.

The specification for requirements does not include a standardised way to include a file hash.

No, but pip documents how to pin a requirement via hashes (https://pip.pypa.io/en/stable/topics/secure-installs/), and it allows a user to specify them in a constraints file, but then it doesn't functionally let the user constrain to those hashes.

The proposed lockfile standard, PEP 751, includes a section for specifying locked build requirements. That may help with this workflow, once the PEP gets approved and implemented.

For a lock file this seems a little under specified. Specifically I would expect an actual standard around locking to let you lock build requirements per requirement, I guess I'll have to chime in on that very very long discuss thread 🙁

@pfmoore
Member

pfmoore commented Sep 5, 2024

It's only a standards issue in the sense that pip's existing installer features don't currently work in this scenario

I guess that's true, yes. Given that pip is tending to add features based on standards these days, rather than innovating functionality, I think I'd rather see a standards-based solution for this, though.

No, but pip documents how to pin a requirement via hashes (https://pip.pypa.io/en/stable/topics/secure-installs/), and it allows a user to specify them in a constraints file, but then it doesn't functionally let the user constrain to those hashes.

As you noted, though, constraints files don't allow hashes. Constraint files have historically been very under-documented and prior to the new resolver implementation, had some odd behaviours. They were streamlined and clarified when we implemented them for the new resolver, to act in the "package finding" phase to limit what files the finder could see. With that design, I'm not sure that including hashes in a constraint file makes sense (hash checking happens much later in the install process, if I recall the details correctly).

Also, the details of what configuration is shared between the main pip process and the (recursive [1]) build environment construction is fairly underspecified, having been based on some quite simplified assumptions and then extended as needed.

With all of that in mind, re-working the build environment creation process to correctly pass through and respect hashes is likely to be a complicated design and implementation task, and with pip's limited maintainer base, I'm not sure it's the most important thing for us to tackle. All of which is why I'd prefer it if we had a standards-based solution, so the design is done for us, up front.

For a lock file this seems a little under specified. Specifically I would expect an actual standard around locking to let you lock build requirements per requirement, I guess I'll have to chime in on that very very long discuss thread 🙁

I view that section of the PEP as indicative that the intention is to cover this area, but it's not something that has had extensive discussion, so yes, it may be under specified. I'd strongly advise you to point out any issues if you think that's the case, as there's a risk otherwise that it'll get missed with all of people's energy having been used up by questions like portability of lockfiles.

Footnotes

  1. It's all very well supplying hashes for all of your build dependencies, but what about your build backend's build dependencies? Will we hit a "hash has not been supplied" error on the next level down?

@notatallshaw
Member

notatallshaw commented Sep 5, 2024

With that design, I'm not sure that including hashes in a constraint file makes sense (hash checking happens much later in the install process, if I recall the details correctly).

I think pip should immediately error out when hashes are included in the constraints file then. The current user experience is poor: it appears to the user that hashes in constraints are allowed, because the error pip produces comes much later than reading the constraints file, and it is not immediately clear why pip produces it.

With all of that in mind, re-working the build environment creation process to correctly pass through and respect hashes is likely to be a complicated design and implementation task, and with pip's limited maintainer base, I'm not sure it's the most important thing for us to tackle. All of which is why I'd prefer it if we had a standards-based solution, so the design is done for us, up front.

How would these issues be any different adopting this standard vs. supporting hashes in constraints files 😉?

@pfmoore
Member

pfmoore commented Sep 5, 2024

I think pip should immediately error out when hashes are included in the constraints file then

I'm surprised that hashes are allowed in constraints files. It's possible I've misremembered and they are somehow used in some places? But if not, then yes, we shouldn't be allowing the user to enter plausible-looking data that we ignore.

How would these issues be any different adopting this standard vs. supporting hashes in constraints files 😉?

I'm hoping that "the community" will thrash out the various design issues, so we don't have to 🙂 In particular, I'm hoping the people participating in the lockfile discussion are more motivated to work through the various questions, particularly around recursive build requirements, than we are here (I'm personally only looking at this as a theoretical problem, I have no need myself for locking build requirements).

@notatallshaw
Member

notatallshaw commented Sep 5, 2024

Okay, so summarising this:

  • OP's use case would not work even if pip ignored required hashes for local sdists
  • Pip currently has no way to specify hashes for build requirements in an isolated build environment
  • This may be solved in the future with PEP 751 if it's accepted and if someone implements it in pip
  • Currently the only way to use pinned hashes for a build environment is to create your own build environment and use --no-build-isolation to build the source distribution
  • Providing a hash in a constraints file should probably at least produce a warning to the user that they can't constrain requirements via hashes (I'll take a look later if this is an easy PR)

Given this, does someone still have a scenario where skipping required hashes for local sdists is required for their workflow? And if so, why does providing a hash as part of a requirements file not work for them? e.g.

echo "file://$(realpath {sdist}.tar.gz) --hash=sha256:$(sha256sum {sdist}.tar.gz | cut -d' ' -f1)" > sdist-requirements.txt
pip {command} -r sdist-requirements.txt
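For hosts without `realpath` or `sha256sum`, the same requirement line can be produced with Python's standard library. This is a minimal sketch of the shell one-liner above; the function name and the `sdist-requirements.txt` output path are illustrative, and the file is read into memory in one go, which should be acceptable for typical sdist sizes.

```python
import hashlib
from pathlib import Path


def hashed_requirement_line(sdist: str) -> str:
    """Build a 'file://... --hash=sha256:...' line for a local sdist."""
    path = Path(sdist).resolve()  # absolute path, like realpath
    digest = hashlib.sha256(path.read_bytes()).hexdigest()  # like sha256sum
    # as_uri() yields file:///abs/path (file:///D:/... on Windows).
    return f"{path.as_uri()} --hash=sha256:{digest}"


def write_sdist_requirements(sdist: str, out: str = "sdist-requirements.txt") -> Path:
    """Write the single hashed requirement line to a requirements file."""
    out_path = Path(out)
    out_path.write_text(hashed_requirement_line(sdist) + "\n", encoding="utf-8")
    return out_path
```

The resulting file is then passed to pip with `-r`, exactly as in the shell version.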

@notatallshaw
Member

It should be noted that, since this issue was opened, uv has gone from not being able to build wheels to fully supporting @alex's workflow (as I believe he is aware):

So, the answer to "How can I use pinned build requirement hashes in an isolated build environment?", is to use uv 0.4.6+.

@alex
Member Author

alex commented Sep 5, 2024 via email

@pfmoore
Member

pfmoore commented Sep 5, 2024

It should be noted, since opening this issue, uv has gone from not being able to build wheels to fully supporting @alex workflow

Is it just me, or is it true that the correct answer to any feature request for pip these days is "wait 15 minutes until uv implements it"? 🙂

(To be clear, I'm not upset by this, if anything I'm just impressed and at the same time painfully aware of how much more it's possible to achieve with proper funding and reduced overhead from backward compatibility concerns).


5 participants