Invalid hashes when running multiple Poetry installs simultaneously #5142

Closed
3 tasks done
hopper-signifyd opened this issue Feb 2, 2022 · 11 comments · Fixed by #6186
Labels
kind/bug Something isn't working as expected


@hopper-signifyd

(Please note that the projects are in sibling directories named project_a and project_b.)

Issue

When running multiple poetry installs simultaneously with a shared Poetry cache directory, the operation commonly fails with the infamous "invalid hashes" error. This is common in CI and monorepo environments. Here's a sample:

RuntimeError

  Invalid hashes (sha256:c7a7026632f45188f4a4548cc308c5c0683d9b8259da5cbfe0301f7527843eb4) for pandas (1.0.5) using archive pandas-1.0.5-cp36-cp36m-manylinux1_x86_64.whl. Expected one of
[omitted the other hashes for the sake of brevity]
sha256:faa42a78d1350b02a7d2f0dbe3c80791cf785663d6997891549d0f86dc49125e.

  at ~/.local/share/pypoetry/venv/lib/python3.9/site-packages/poetry/installation/executor.py:627 in _download_link
      623│                     )
      624│                 )
      625│
      626│             if archive_hashes.isdisjoint(hashes):
    → 627│                 raise RuntimeError(
      628│                     "Invalid hashes ({}) for {} using archive {}. Expected one of {}.".format(
      629│                         ", ".join(sorted(archive_hashes)),
      630│                         package,
      631│                         archive_path.name,

After receiving this error, if I run find . -name pandas-1.0.5-cp36-cp36m-manylinux1_x86_64.whl, and then run a checksum on the file, I usually get a SHA256 from the "expected" list in the error message. If I don't get an expected hash, it seems to be related to another poetry install process that's still running and downloading that artifact.
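The manual check described above can be scripted. This is only a sketch: `sha256_of` and `matches_expected` are hypothetical helper names, not part of Poetry, and the `sha256:<hex>` format is copied from the error message.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the file's digest in the "sha256:<hex>" form shown in the error."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large wheels are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"


def matches_expected(path: Path, expected: set[str]) -> bool:
    """True if the archive's hash is one of the expected hashes."""
    return sha256_of(path) in expected
```

Calling `matches_expected` on the path printed by `find` with the "expected" list from the error message gives the same answer as running a checksum tool by hand.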

My current working theory is something like this:

  • Poetry install process A checks the cache for an arbitrary package (say pandas, since that's what's in the example error above). The process gets a cache miss and starts downloading pandas.
  • Poetry install process B tries to install pandas. It checks the cache and finds A's pandas. However, this download is incomplete, so when process B does the hash check, the hash is wrong.
  • Process B fails
  • Process A finishes the download, checks the cache and checksum and succeeds.
  • I manually check the SHA256 of the file and see that it's correct because Process A has finished and I, as a human, am inherently slower than a computer.
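The sequence in this theory can be replayed deterministically in a single script. This is an illustration under assumptions, not Poetry's code: the file name, the payload, and the "process A" / "process B" steps are stand-ins interleaved by hand.

```python
import hashlib
import tempfile
from pathlib import Path

payload = b"x" * 1_000_000  # stand-in for the wheel's contents
expected = hashlib.sha256(payload).hexdigest()
half = len(payload) // 2

with tempfile.TemporaryDirectory() as cache_dir:
    cached = Path(cache_dir) / "example.whl"  # hypothetical cache entry

    with cached.open("wb") as fh:
        # Process A: the download has started, only the first half is on disk.
        fh.write(payload[:half])
        fh.flush()

        # Process B: sees the file, treats it as a cache hit, and hashes it.
        seen_by_b = hashlib.sha256(cached.read_bytes()).hexdigest()

        # Process A: finishes the download.
        fh.write(payload[half:])

    # The human checking afterwards sees the complete file.
    seen_later = hashlib.sha256(cached.read_bytes()).hexdigest()

hash_mismatch_for_b = seen_by_b != expected
hash_ok_afterwards = seen_later == expected
```

Both flags end up true: B's hash does not match while the file is still being written, and the later check passes.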

Is there a way we can fix this so that multiple Poetry projects with a common cache directory can safely run simultaneously on the same machine? My initial proposed solution is to simply update the download process to download artifacts directly to the system's temp directory and only copy them into the cache once the download is complete. That way, all processes either get a cache miss, or a cache hit with a correct checksum.

Thoughts on this?

@hopper-signifyd hopper-signifyd added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Feb 2, 2022
@ehiggs

ehiggs commented Feb 3, 2022

This can happen if you ^C poetry when it's installing. When you rerun it, it finds the local file in the cache and calculates the hash of the truncated file.

Poetry needs to download to a dotfile, a .download file, or something similar, and then move it into place to avoid this.

> My initial proposed solution is to simply update the download process to download artifacts directly to the system's temp directory and only copy them into the cache once the download is complete.

This can mean copying across file systems, which can be expensive. It would be best to write the temporary file in the same directory as the final download location, so that the 'move into place' is ~atomic.
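A minimal sketch of that "write next to the destination, then rename" idea, assuming a hypothetical helper `write_atomically` (this is not Poetry's implementation). It relies on `os.replace`, which renames within one file system without copying the data.

```python
import contextlib
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_atomically(dest: Path, chunks: Iterable[bytes]) -> None:
    """Write chunks to a temp file next to dest, then rename it into place."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as dest, so the final step is a rename, not a copy.
    fd, tmp_name = tempfile.mkstemp(
        dir=dest.parent, prefix=dest.name + ".", suffix=".part"
    )
    try:
        with os.fdopen(fd, "wb") as fh:
            for chunk in chunks:
                fh.write(chunk)
        os.replace(tmp_name, dest)
    except BaseException:
        # Also covers Ctrl+C: never leave a partial file behind.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

With this shape, other processes either find no file under the final name or find a complete one, and an interrupted download leaves nothing under the final name.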

@radiophysicist

Today I hit the same problem with poetry 1.1.12, without using parallel installs.
Tried to clear cache with poetry cache clear without any success.
After removing file manually, installation has been finished successfully

@NixBiks

NixBiks commented Apr 20, 2022

> After removing file manually, installation has been finished successfully

Removing which file? I have the same issue in a Dockerfile and can't figure out the workaround

@radiophysicist

> > After removing file manually, installation has been finished successfully
>
> Removing which file? I have the same issue in a Dockerfile and can't figure out the workaround

I mean locating the broken file in the cache dir and removing it (poetry cache clear didn't help).
But I don't think that applies when building a Docker image.

@mrkeuz

mrkeuz commented Jun 12, 2022

Same issue, but after interrupting a poetry add while it was downloading.

I found a cached .whl file with the wrong hash in the "artifacts" cache at ~/.cache/pypoetry/artifacts. It seems an incomplete download gives the wrong hash and is not cleaned up after an interrupt.

Workaround that helps me, inspired by, tested on linux:

rm poetry.lock
poetry cache clear . --all
rm -rf "$(poetry env info --path)"
rm -rf "$(poetry config cache-dir)/artifacts"

poetry install

Poetry version 1.1.13
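As a narrower alternative to deleting the whole artifacts directory, the truncated wheels can be located first. This is a rough sketch with a hypothetical helper `find_broken_wheels`. It assumes that an interrupted download leaves a .whl that is no longer a readable zip archive; it does not check hashes against the lock file and it ignores sdists.

```python
import zipfile
from pathlib import Path


def find_broken_wheels(artifacts_dir: Path) -> list[Path]:
    """Return cached .whl files that cannot be read as complete zip archives."""
    broken = []
    for wheel in sorted(artifacts_dir.rglob("*.whl")):
        try:
            with zipfile.ZipFile(wheel) as zf:
                # testzip() returns the first corrupt member's name, or None.
                if zf.testzip() is not None:
                    broken.append(wheel)
        except (zipfile.BadZipFile, OSError):
            # A truncated wheel has no zip central directory at the end.
            broken.append(wheel)
    return broken
```

For the location mentioned above, the call would be `find_broken_wheels(Path.home() / ".cache/pypoetry/artifacts")`; the returned paths are candidates for manual removal.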

@wuyuanyi135

Same issue when the network went down while installing pyside6. I had to remove the artifacts directory to make it work again. poetry cache clear did not help.

@earonesty

earonesty commented Jul 12, 2022

This PR will resolve this issue: #3301. Looks ready to go.

@hoopengo

poetry cache clear pypi --all

@zweger
Contributor

zweger commented Aug 17, 2022

I've opened PR #6186 to address this issue.

@whg517

whg517 commented Aug 18, 2022

Hello, here is another way to reproduce this error. It produces the same Invalid hashes error, but under different conditions.

When adding a dependency with poetry add, for example poetry add pandas, Poetry also downloads NumPy, which pandas depends on and which is about 13MB. When the network is very slow, I might press Ctrl+C to stop the process, then configure a different repository and reinstall.

At that point the NumPy file was only 10% downloaded. When I run the command again, Poetry hashes the incomplete file and the result is Invalid hashes.

I confirmed this by looking at the cache directory: the numpy... WHL package there is only 133KB.

So the idea is that Poetry could download the file to the /tmp directory and move it into the cache once the download has finished.

My Environment:

  • mac m1
  • Python 3.10

neersighted pushed a commit that referenced this issue Sep 15, 2022
…e cache (#6186)

If one poetry installation is writing to the cache while a second
installer is attempting to read that cache file, the second installation
will fail because the in-flight cache file is invalid while it is still
being written to by the first process.

This PR resolves this issue by having Poetry write to a temporary file
in the cache directory first, and then rename the file after it's
written, which is ~atomic.

Resolves: #5142 

I'm not sure how to test this change, as the conditions which cause this
bug to appear are a little hard to reproduce.
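One possible shape for such a test, offered as a sketch rather than as the test that belongs in Poetry: a reader hashes the file as soon as it appears under its final name while a writer is still producing it. Everything here is an assumption for illustration, including the `atomic_write` helper, the file name, and the use of a thread in place of a second Poetry process. The check is probabilistic: a writer that wrote directly to the final path would fail it in most runs, not in every run.

```python
import hashlib
import os
import tempfile
import threading
import time
from pathlib import Path

PAYLOAD = os.urandom(256 * 1024)  # stand-in for a downloaded archive
EXPECTED = hashlib.sha256(PAYLOAD).hexdigest()


def atomic_write(dest: Path, data: bytes, chunk: int = 4096) -> None:
    """Write to a temp file in dest's directory, then rename it into place."""
    fd, tmp = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    with os.fdopen(fd, "wb") as fh:
        for i in range(0, len(data), chunk):
            fh.write(data[i : i + chunk])
            fh.flush()  # make partial content visible, as a real download would
    os.replace(tmp, dest)


def reader_sees_only_complete_file(cache_dir: Path) -> bool:
    """Hash the file the moment it exists under its final name."""
    dest = cache_dir / "example.whl"
    observed = []

    def read_when_visible() -> None:
        while not dest.exists():
            time.sleep(0.001)
        observed.append(hashlib.sha256(dest.read_bytes()).hexdigest())

    reader = threading.Thread(target=read_when_visible, daemon=True)
    reader.start()
    atomic_write(dest, PAYLOAD)
    reader.join(timeout=10)
    return observed == [EXPECTED]
```

Running `reader_sees_only_complete_file` on a fresh temporary directory should return True as long as the final step is a rename within the same directory.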
@mkniewallner mkniewallner removed the status/triage This issue needs to be triaged label Sep 18, 2022

github-actions bot commented Mar 1, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 1, 2024