Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modern installer eats a lot of RAM when installing big wheels #7983

Closed
4 tasks done
ralbertazzi opened this issue May 22, 2023 · 7 comments · Fixed by #7987
Closed
4 tasks done

Modern installer eats a lot of RAM when installing big wheels #7983

ralbertazzi opened this issue May 22, 2023 · 7 comments · Fixed by #7987
Labels
area/installer Related to the dependency installer kind/bug Something isn't working as expected

Comments

@ralbertazzi
Copy link
Contributor

  • Poetry version: 1.5.0
  • Python version: 3.9.8
  • OS version and name: Ubuntu 20.04 (also observed on other platforms)
  • pyproject.toml:
[tool.poetry]
authors = ["Author <foo@bar.com>"]
description = "Project"
name = "project"
version = "1.0.0"

[tool.poetry.dependencies]
python = "~3.9"

torch = { url = "https://download.pytorch.org/whl/cu113/torch-1.11.0%2Bcu113-cp39-cp39-linux_x86_64.whl" }
  • I am on the latest stable Poetry version, installed using a recommended method.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have consulted the FAQ and blog for any relevant entries or release notes.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option) and have included the output below.

Issue

Poetry's modern installer is using a lot of RAM when installing big wheels, such as the PyTorch one (>1 GB) that I provided in the example. The RAM usage starts increasing while Poetry is Installing... the wheel, and it spikes over 10 GB of RAM used while performing the operation. The RAM usage goes back to acceptable values before the Installing... operation is done.

Deactivating the modern installer (poetry config installer.modern-installation false) fixes the problem. The RAM usage does not go beyond 1 GB there.

I haven't debugged deeply enough to tell if it's a Poetry issue or an pypa/installer issue, but for sure Poetry is affected by it. This is bad for CI environments where resources are more limited and currently cause poetry install to go OOM.

cc #6409

@ralbertazzi ralbertazzi added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels May 22, 2023
@ralbertazzi ralbertazzi changed the title Modern installer eats a lot of RAM for when installing big wheels Modern installer eats a lot of RAM when installing big wheels May 22, 2023
@ralbertazzi
Copy link
Contributor Author

Found the culprit being in validate_record. More specifically, it looks like the garbage collector can't stand behind self._zipfile.read(item). The RAM increases while the line is executed but it also seems to go quite down by placing a breakpoint on the next line.

I'm ensure what's the best fix for this, although a quick workaround for Poetry would be to call validate_record(validate_contents=False). By changing that line the RAM increase completely disappears.

@ralbertazzi
Copy link
Contributor Author

ralbertazzi commented May 22, 2023

If I'm not mistaken, Poetry already computes a hash on each wheel record in a more memory-friendly way (whether that's validated or not is yet unclear to me), so it looks like it's executing operations twice. Setting validate_contents=False in fact makes the installation process much faster (as we are not computing hashes twice)!

My proposal is then to permanently set validate_contents=False while possibly making use of the hashes in the WheelDestination.finalize_installation callback

@dimbleby
Copy link
Contributor

dimbleby commented May 22, 2023

I would think that pypa/installer can be taught not to eat all that RAM. It reads the files in the wheel for two reasons

  • to get their size
  • to calculate their hash

The size can be read directly from the zipfile's ZipInfo, there's no need to open the file at all for that.

The hash can be calculated incrementally, there's no need to read the whole file into memory.

@radoering
Copy link
Member

I'm ensure what's the best fix for this, although a quick workaround for Poetry would be to call validate_record(validate_contents=False).

It might be ok to turn it off until there is an improvement in installer. IMO, at the moment validate_record() is only a "nice to have" to print a warning. We do not depend on valid RECORD files and afaik no other installer (first of all pip) even bothers to validate them.

If I'm not mistaken, Poetry already computes a hash on each wheel record in a more memory-friendly way (whether that's validated or not is yet unclear to me), so it looks like it's executing operations twice.

Interesting. IIUC, the calculation is done there to create a RECORD file from the content of the wheel (because we can't rely on the included RECORD file 😉). I suppose it's not validated. Either, we are using installer the wrong way or that's another thing that could be improved at their end.

@ralbertazzi
Copy link
Contributor Author

ralbertazzi commented May 23, 2023

IIUC, the calculation is done there to create a RECORD file from the content of the wheel (because we can't rely on the included RECORD file 😉)

That's exactly what's happening. But I also confirm that the sha256 of files inside the archive is currently computed twice:

  • On validation: to compare it against the hash in the RECORD file
  • On installation: to write the RECORD file

To get the fastest installation, I would keep hash computation disabled on validation and find a way to use the computed hashes during install to raise warnings. It should be possible hacking around the SchemeDictionaryDestination class by overriding its finalize_installation callback. It looks to me like Poetry is already hacking its way around that class anyways 🤷

@dimbleby
Copy link
Contributor

dimbleby commented May 23, 2023

let's prefer to try and propose improvements to pypa/installer that work for poetry. That's playing a longer game, but gets to a better place.

eg perhaps it could expose an API for wheel installation that trusted the RECORD already in the wheel - that would seem a very reasonable thing to do for users who have validated that the RECORD is correct

(Perhaps the second calculation is nearly-free anyway, given that it happens while already copying the bytes around?)

Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/installer Related to the dependency installer kind/bug Something isn't working as expected
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants