Using pip download to fetch package sources seems to trigger building wheels for some packages. #8387

Closed
tvalentyn opened this issue Jun 2, 2020 · 10 comments

Comments

@tvalentyn

Environment

  • pip version: 20.1.1
  • Python version: 3.7
  • OS: Debian Linux

Description

In numpy/numpy#14053 we observed that

python3 -m pip download --dest /tmp numpy==1.18.0 --no-binary :all: --no-use-pep517

finishes immediately, while:

python3 -m pip download --dest /tmp numpy==1.18.0 --no-binary :all:

takes a while to complete.
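
For anyone wanting to reproduce the comparison, here is a minimal timing sketch. The helper name `time_command` and the constants `PEP517_CMD` and `LEGACY_CMD` are made up for illustration; the argument lists simply mirror the two commands above. Absolute timings will depend on the machine, network, and pip's cache state.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, completed.returncode


# The two invocations from the report, expressed as argument lists.
PEP517_CMD = [
    sys.executable, "-m", "pip", "download", "--dest", "/tmp",
    "numpy==1.18.0", "--no-binary", ":all:",
]
LEGACY_CMD = PEP517_CMD + ["--no-use-pep517"]

# Example usage (needs network access, so it is left commented out):
# for label, cmd in [("pep517", PEP517_CMD), ("legacy", LEGACY_CMD)]:
#     print(label, time_command(cmd))
```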

We originally noticed the slowdown in https://issues.apache.org/jira/browse/ARROW-8983.
The command python -m pip download --dest /tmp pyarrow==0.17.0 --no-binary :all: also stalls for a few minutes at "Installing build dependencies ... ", with increased CPU usage, while pip downloads the sources of numpy, a dependency of pyarrow. Interestingly, in the case of pyarrow, adding --no-use-pep517 does not help.

Is this an inefficiency in pip download, or is it known behavior that is considered working as intended (WAI)?

Thank you.

@triage-new-issues triage-new-issues bot added the S: needs triage Issues/PRs that need to be triaged label Jun 2, 2020
@sbidoul
Member

sbidoul commented Jun 3, 2020

I made a quick test in verbose mode, and it appears that part of the slowdown comes from --no-binary :all: which extends to build dependencies. Since build dependencies include cython, pip first builds cython from source before attempting to obtain metadata.

python3 -m pip download --dest /tmp numpy==1.18.0 --no-binary numpy (or --no-binary :all: --only-binary cython) is faster because it installs cython from a binary wheel.

Nevertheless, that still involves some C compilation which does not occur with --no-use-pep517. I'm not sure why yet.

There is ongoing discussion on how/whether --no-binary should apply to build dependencies. I don't think that has been conclusive yet.
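
The interaction between the options mentioned above can be illustrated with a toy model. This is a simplified sketch, not pip's actual code: the function `allowed_formats` and its arguments are made up for illustration, and it assumes that a per-package entry takes precedence over `:all:` and that package names are already normalized to lowercase.

```python
def allowed_formats(name, no_binary, only_binary):
    """Return the distribution formats permitted for package `name`.

    `no_binary` and `only_binary` are sets of package names that may also
    contain the special value ":all:". Illustrative model only.
    """
    formats = {"binary", "source"}
    if name in only_binary:
        formats.discard("source")
    elif name in no_binary:
        formats.discard("binary")
    elif ":all:" in only_binary:
        formats.discard("source")
    elif ":all:" in no_binary:
        formats.discard("binary")
    return formats


# --no-binary :all: also covers the build dependency, so it is built from source.
print(allowed_formats("cython", {":all:"}, set()))

# --no-binary :all: --only-binary cython lets cython come from a wheel.
print(allowed_formats("cython", {":all:"}, {"cython"}))

# --no-binary numpy restricts only numpy itself.
print(allowed_formats("numpy", {"numpy"}, set()))
```

Under this model, the first call permits only source for cython, the second only binary, and the third only source for numpy.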

@jorisvandenbossche

Naive question: why does pip need to build numpy or create its build environment when you only want to download the source code? Is it to get some metadata that is only available from a wheel?

@pfmoore
Member

pfmoore commented Jun 3, 2020

> Is it to get some metadata that is only available from a wheel?

Precisely that.

@jorisvandenbossche

And how do you get that metadata in case of --no-use-pep517?

@sbidoul
Member

sbidoul commented Jun 3, 2020

> And how do you get that metadata in case of --no-use-pep517?

setup.py egg_info

@sbidoul
Member

sbidoul commented Jun 7, 2020

I looked into this a bit further and it appears that setuptools' implementation of PEP 517 prepare_metadata_for_build_wheel() uses setup.py dist_info. In the case of numpy 1.18.0, this involves cython and some compilation (of _configtest.c).

setup.py egg_info runs instantly.

That explains the performance difference, although I can't explain why dist_info and egg_info behave differently, given that the resulting metadata (METADATA vs PKG-INFO) is very similar.
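
To check how similar the two metadata files are, one option is to parse both as RFC 822-style headers (the format both files use) and compare the fields. The sketch below is illustrative only: the helper names are made up, and the two sample strings are hypothetical stand-ins for a real PKG-INFO and METADATA, not numpy's actual output.

```python
from email.parser import Parser


def parse_core_metadata(text):
    """Parse header-style metadata into a dict of field name -> list of values."""
    message = Parser().parsestr(text)
    fields = {}
    for key, value in message.items():
        fields.setdefault(key, []).append(value)
    return fields


def differing_fields(a, b):
    """Return the sorted field names whose values differ between a and b."""
    return sorted(key for key in set(a) | set(b) if a.get(key) != b.get(key))


# Hypothetical sample contents, for illustration only.
PKG_INFO = """\
Metadata-Version: 1.2
Name: example
Version: 1.0
"""

METADATA = """\
Metadata-Version: 2.1
Name: example
Version: 1.0
"""

print(differing_fields(parse_core_metadata(PKG_INFO),
                       parse_core_metadata(METADATA)))
```

With real files, the strings would be read from the .egg-info/PKG-INFO and .dist-info/METADATA produced by the two commands.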

@uranusjr
Member

uranusjr commented Jun 7, 2020

Numpy has a highly customised setup.py that triggers said compilation for any command that is not explicitly excluded. egg_info is one of the excluded commands.

See parse_setuppy_commands() here: https://github.com/numpy/numpy/blob/v1.18.0/setup.py
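
The shape of that logic can be sketched roughly as follows. This is a simplified illustration of the behaviour described above, not numpy's actual code: the function name `should_run_build` and the abbreviated command sets are made up, and it assumes that dist_info is not listed anywhere and therefore falls through to the default. The linked file has the real logic.

```python
def should_run_build(args):
    """Decide whether the expensive build setup should run for these
    setup.py arguments. Illustrative sketch only."""
    # Commands known not to need the build step (abbreviated, hypothetical list).
    skip_build = {"egg_info", "--name", "--version"}
    # Commands known to need it (abbreviated, hypothetical list).
    needs_build = {"sdist", "build", "build_ext", "bdist_wheel", "install"}

    if any(arg in skip_build for arg in args):
        return False
    if any(arg in needs_build for arg in args):
        return True
    # Anything unrecognised falls through to running the build setup.
    return True


print(should_run_build(["egg_info"]))   # explicitly excluded
print(should_run_build(["dist_info"]))  # not listed, so the build setup runs
```

Under this model, egg_info returns False while an unlisted command such as dist_info returns True, which matches the timing difference reported earlier in the thread.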

@zeha

zeha commented Jul 7, 2020

Why does downloading an sdist need metadata from a wheel in the first place?

@zeha

zeha commented Jul 7, 2020

Appears to be related to (duplicate of?) #7995.

@uranusjr
Member

uranusjr commented Jul 9, 2020

Closing this in favour of #7995.

@uranusjr uranusjr closed this as completed Jul 9, 2020
@pradyunsg pradyunsg removed the S: needs triage Issues/PRs that need to be triaged label Feb 12, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 2, 2021

7 participants