Locking is slow (and performs redundant downloads) #2284

Open
colllin opened this Issue May 31, 2018 · 32 comments

colllin commented May 31, 2018

Is this an issue with my installation? It happens on all of my machines... Is there anything I/we can do to speed it up?

I install one package and the locking seems to take minutes.

Locking [packages] dependencies…
$ python -m pipenv.help output

Pipenv version: '2018.05.18'

Pipenv location: '/Users/colllin/miniconda3/lib/python3.6/site-packages/pipenv'

Python location: '/Users/colllin/miniconda3/bin/python'

Other Python installations in PATH:

  • 2.7: /usr/bin/python2.7

  • 2.7: /usr/bin/python2.7

  • 3.6: /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6m

  • 3.6: /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6

  • 3.6: /Users/colllin/miniconda3/bin/python3.6

  • 3.6: /Users/colllin/.pyenv/shims/python3.6

  • 3.6: /usr/local/bin/python3.6

  • 3.6.3: /Users/colllin/miniconda3/bin/python

  • 3.6.3: /Users/colllin/.pyenv/shims/python

  • 2.7.10: /usr/bin/python

  • 3.6.4: /Library/Frameworks/Python.framework/Versions/3.6/bin/python3

  • 3.6.3: /Users/colllin/miniconda3/bin/python3

  • 3.6.4: /Users/colllin/.pyenv/shims/python3

  • 3.6.4: /usr/local/bin/python3

PEP 508 Information:

{'implementation_name': 'cpython',
 'implementation_version': '3.6.3',
 'os_name': 'posix',
 'platform_machine': 'x86_64',
 'platform_python_implementation': 'CPython',
 'platform_release': '17.5.0',
 'platform_system': 'Darwin',
 'platform_version': 'Darwin Kernel Version 17.5.0: Mon Mar  5 22:24:32 PST '
                     '2018; root:xnu-4570.51.1~1/RELEASE_X86_64',
 'python_full_version': '3.6.3',
 'python_version': '3.6',
 'sys_platform': 'darwin'}

System environment variables:

  • TERM_PROGRAM
  • NVM_CD_FLAGS
  • TERM
  • SHELL
  • TMPDIR
  • Apple_PubSub_Socket_Render
  • TERM_PROGRAM_VERSION
  • TERM_SESSION_ID
  • NVM_DIR
  • USER
  • SSH_AUTH_SOCK
  • PYENV_VIRTUALENV_INIT
  • PATH
  • PWD
  • LANG
  • XPC_FLAGS
  • PS1
  • XPC_SERVICE_NAME
  • PYENV_SHELL
  • HOME
  • SHLVL
  • DRAM_ROOT
  • LOGNAME
  • NVM_BIN
  • SECURITYSESSIONID
  • _
  • __CF_USER_TEXT_ENCODING
  • PYTHONDONTWRITEBYTECODE
  • PIP_PYTHON_PATH

Pipenv–specific environment variables:

Debug–specific environment variables:

  • PATH: /Library/Frameworks/Python.framework/Versions/3.6/bin:/Users/colllin/miniconda3/bin:/Users/colllin/.pyenv/plugins/pyenv-virtualenv/shims:/Users/colllin/.pyenv/shims:/Users/colllin/.pyenv/bin:/Users/colllin/.nvm/versions/node/v8.1.0/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
  • SHELL: /bin/bash
  • LANG: en_US.UTF-8
  • PWD: /Users/.../folder

Contents of Pipfile ('/Users/.../Pipfile'):

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
gym-retro = "*"

[dev-packages]

[requires]
python_version = "3.6"

bjarchi commented May 31, 2018

@colllin Have you checked to see whether pip commands that contact the server - like pip search (I think) - are also slow?

I see similar behavior, but it's kind of a known issue and network dependent. For some reason, access to pypi.org from my work network is incredibly slow but it is normally fast from my home network. I think locking does a lot of pip transactions under the hood, so slow access to the server slows the operation a lot.

EDIT: It may also be that you just have a lot of sub-dependencies to resolve - how big is the environment once created (e.g. how many top-level packages in your pipfile, and how many packages returned by pip list once the environment is bootstrapped)?

Author

colllin commented Jun 1, 2018

Thank you for the thoughtful response.

pip search isn't especially fast or slow for me... ~1 second?

Forgive me for my lack of domain knowledge: Does it really need to pip search? Didn't it just install everything? Doesn't it just need to write down what is already installed? Or... since it ensures the existence of the lock file anyway, could it do this as it installs the packages, or before?

I'm guessing... pipenv uses pip under the hood? So the installation process is a black box, and it can't know the dependency graph of what was/will be installed without doing its own pip queries?

EDIT: There is 1 top-level package, and ~65 packages returned by pip list in this particular repo.

bjarchi commented Jun 1, 2018

I'm not a contributor to the project and at the moment I don't know all the specifics, but my understanding is that the locking phase is where all of the dependencies get resolved and pinned. So if you have one top-level package with ~65 dependencies, it's during the locking phase that all of the dependencies of that first package are (recursively) discovered, and then the dependency tree is used to resolve which packages need to be installed and (probably) in what rough order they should be installed in. Not as sure about the last part.
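
As a rough illustration of the recursive discovery described above, here is a minimal sketch over a toy dependency graph. The package names, the edges, and the `discover` helper are hypothetical and only show the idea of walking the dependency tree; this is not pipenv's actual code.

```python
# Toy dependency metadata; package names and edges are hypothetical.
DEPENDS_ON = {
    "app": ["requests", "numpy"],
    "requests": ["urllib3", "idna"],
    "numpy": [],
    "urllib3": [],
    "idna": [],
}


def discover(package, graph, seen=None):
    """Return the set of all packages reachable from `package`, itself included."""
    if seen is None:
        seen = set()
    if package in seen:
        # Already visited: avoids repeated work and infinite loops on cycles.
        return seen
    seen.add(package)
    for dep in graph.get(package, []):
        discover(dep, graph, seen)
    return seen


print(sorted(discover("app", DEPENDS_ON)))
```

In a real resolver each lookup in `graph` may mean a request to the index or a download, which is why a slow connection would make this phase slow.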

If you run pipenv install from a Pipfile without a lockfile present, you'll notice that it does the locking phase before installing the packages into the venv. The same happens if you have a lockfile but it's out of date. I suspect having a lockfile and installing using the --deploy option would be faster, as would the --skip-lock option; in the former case you get an error if the lockfile is out of date, in the latter you lose the logical split between the top-level packages (declared environment) and the actual installed (locked) environment of all packages. At least this is how I understand it.

Whether or not pipenv uses pip under the hood - I think it does - it still needs to get the information from the pypi server(s) about package dependencies and the like, so my question about pip search was more a proxy for how fast or slow your path to the pypi server is than a direct implication about the mechanism by which pipenv does its thing.

An interesting experiment might be to compare the time required for locking the dependency tree in pipenv, and installing requirements into a new venv using pip install -r requirements.txt. I think they should be doing pretty similar things during the dependency resolution phase.

uranusjr added the future label Jun 6, 2018

uranusjr changed the title from "Why is locking so slow?" to "Locking is slow (and performs redundant downloads)" Jun 6, 2018

Member

techalchemy commented Jun 7, 2018

Have we established somewhere that there are redundant downloads happening btw? I suspect that is the case but proving it would be really helpful

FYI comparing pip install -r requirements.txt to the time it takes to lock a dependency graph is not going to be informative as a point of comparison. Pip doesn't actually have a resolver, not in any real sense. I think I can describe the difference. When pip installs your requirements.txt, it follows this basic process:

  • Find the first requirement listed
    • Find all of its dependencies
    • Install them all
  • Find the second requirement listed
    • Find all of its dependencies
    • Install them all
  • Find the third requirement listed
    • Find all of its dependencies
    • Install them all

This turns out to be pretty quick because pip doesn't really care if the dependencies of package 1 conflicted with the dependencies of package 3, it just installed the ones in package 3 last so that's what you get.
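
A simplified model of the "last one wins" behaviour described above, next to the conflict check a resolver would need. The package names and version pins are hypothetical, and this is a sketch of the idea rather than pip's implementation.

```python
# Hypothetical requirements: two top-level packages pin different versions of "six".
REQUIREMENTS = [
    ("package1", {"six": "1.10.0"}),
    ("package3", {"six": "1.11.0"}),
]


def install_sequentially(requirements):
    """Apply each requirement's pins in order; later pins overwrite earlier ones."""
    installed = {}
    for _name, pins in requirements:
        installed.update(pins)
    return installed


def find_conflicts(requirements):
    """Collect every dependency that is pinned to more than one version."""
    wanted = {}
    for _name, pins in requirements:
        for dep, version in pins.items():
            wanted.setdefault(dep, set()).add(version)
    return {dep: versions for dep, versions in wanted.items() if len(versions) > 1}


print(install_sequentially(REQUIREMENTS))  # the last pin silently wins
print(find_conflicts(REQUIREMENTS))        # what a resolver would have to settle first
```

The sequential version never looks back at earlier packages, which is why it is fast and also why it can leave an inconsistent environment.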

Pipenv follows a different process -- we compute a resolution graph that attempts to satisfy all of the dependencies you specify, before we build your environment. That means we have to start downloading, comparing, and often even building packages to determine what your environment should ultimately look like, all before we've even begun the actual process of installing it (there are a lot of blog posts on why this is the case in Python so I won't go into it more here).

Each step of that resolution process is made more computationally expensive by requiring hashes, which is a best practice. We hash incoming packages after we receive them, then we compare them to the hashes that PyPI told us we should expect, and we store those hashes in the lockfile so that in the future, people who want to build an identical environment can do so with the contractual guarantee that the packages they build from are the same ones you originally used.
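
The hash check described above can be sketched with the standard library. The `sha256:` prefix matches the form stored in Pipfile.lock; the helper names and the sample bytes are assumptions made for illustration, not pipenv internals.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the digest in the "sha256:<hex>" form used in lockfiles."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_hashes) -> bool:
    """Accept the artifact only if its digest is one of the expected ones."""
    return sha256_of(data) in expected_hashes


# Stand-in for the bytes of a downloaded package file.
artifact = b"pretend these are the bytes of a downloaded wheel"
expected = [sha256_of(artifact)]

print(verify(artifact, expected))                # True: the bytes match
print(verify(artifact + b"tampered", expected))  # False: any change alters the digest
```

Hashing needs the full file contents, so every candidate artifact has to be downloaded before its hash can be recorded.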

Pip search is a poor benchmark for any of this, in fact any of pip's tooling is a poor benchmark for doing this work -- we use pip for each piece of it, but putting it together in concert and across many dependencies to form and manage environments and graphs is where the value of pipenv is added.

One point of clarification -- once you resolve the full dependency graph, installation order shouldn't matter anymore. Under the hood we actually pass --no-deps to every installation anyway.
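
To make the --no-deps point concrete, here is a sketch that turns a lockfile-shaped dictionary into pinned, hashed requirement lines and a pip command line. The `LOCKED` data is hand-written, the hash is a placeholder, and the helper names are hypothetical; pipenv's real invocation may differ.

```python
# Hand-written fragment in the shape of a Pipfile.lock "default" section.
# The hash value is a placeholder, not a real digest.
LOCKED = {
    "numpy": {"version": "==1.14.5", "hashes": ["sha256:<placeholder>"]},
}


def requirement_lines(locked):
    """Render each locked package as a pinned requirement with its hashes."""
    lines = []
    for name in sorted(locked):
        entry = locked[name]
        hashes = " ".join(f"--hash={h}" for h in entry["hashes"])
        lines.append(f"{name}{entry['version']} {hashes}".strip())
    return lines


def install_command(requirements_path):
    """Build a pip command that installs exactly what is listed, nothing more."""
    return ["pip", "install", "--no-deps", "--require-hashes", "-r", requirements_path]


print("\n".join(requirement_lines(LOCKED)))
print(" ".join(install_command("requirements.txt")))
```

Because every package is already pinned, pip has nothing left to resolve, so the order of the lines does not matter.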

As a small side-note, pip search is currently the only piece of pip's tooling that relies on the now deprecated XMLRPC interface, which is uncacheable and very slow. It will always be slower than any other operation.

jhrmnn commented Jun 17, 2018

Locking numpy (and nothing else) takes 220 s on my machine (see below). Most of the time seems to be spent downloading more than 200 MB of data, which is quite puzzling given that the whole numpy source is 4 MB. Though clearly even if that was instant, there's still 25 s of actual processing, and even that seems excessive to calculate a few hashes. Subsequent locking, even after deleting Pipfile.lock, takes 5 s.

11:46 ~/Co/Ce/torchdft time pipenv install
Creating a virtualenv for this project…
Using /usr/local/Cellar/pipenv/2018.5.18/libexec/bin/python3.6 (3.6.5) to create virtualenv…
⠋Already using interpreter /usr/local/Cellar/pipenv/2018.5.18/libexec/bin/python3.6
Using real prefix '/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6'
New python executable in /Users/hermann/.local/share/virtualenvs/torchdft-mABBUp_t/bin/python3.6
Also creating executable in /Users/hermann/.local/share/virtualenvs/torchdft-mABBUp_t/bin/python
Installing setuptools, pip, wheel...done.

Virtualenv location: /Users/hermann/.local/share/virtualenvs/torchdft-mABBUp_t
Creating a Pipfile for this project…
Pipfile.lock not found, creating…
Locking [dev-packages] dependencies…
Locking [packages] dependencies…
Updated Pipfile.lock (ca72e7)!
Installing dependencies from Pipfile.lock (ca72e7)…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 0/0 — 00:00:00
To activate this project's virtualenv, run the following:
 $ pipenv shell
        7.81 real         6.39 user         1.64 sys
11:46 ~/Co/Ce/torchdft time pipenv install numpy --skip-lock
Installing numpy…
Collecting numpy
  Using cached https://files.pythonhosted.org/packages/f6/cd/b2c50b5190b66c711c23ef23c41d450297eb5a54d2033f8dcb3b8b13ac85/numpy-1.14.5-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Installing collected packages: numpy
Successfully installed numpy-1.14.5

Adding numpy to Pipfile's [packages]…
        4.97 real         2.88 user         1.81 sys
11:46 ~/Co/Ce/torchdft time pipenv lock --verbose
Locking [dev-packages] dependencies…
Using pip: -i https://pypi.org/simple

                          ROUND 1                           
Current constraints:

Finding the best candidates:

Finding secondary dependencies:
------------------------------------------------------------
Result of round 1: stable, done

Locking [packages] dependencies…
Using pip: -i https://pypi.org/simple

                          ROUND 1                           
Current constraints:
  numpy

Finding the best candidates:
  found candidate numpy==1.14.5 (constraint was <any>)

Finding secondary dependencies:
  numpy==1.14.5 not in cache, need to check index
  numpy==1.14.5             requires -
------------------------------------------------------------
Result of round 1: stable, done

Updated Pipfile.lock (4fccdf)!
      219.24 real        25.14 user         5.77 sys
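
The last `time` line can be split into CPU time and everything else with a little arithmetic. This sketch assumes that wall-clock time minus user and sys time is mostly waiting, which is plausibly (but not provably) network download time here.

```python
def waiting_seconds(real, user, sys):
    """Wall-clock time that was not spent executing on the CPU."""
    return real - (user + sys)


# Figures taken from the `pipenv lock --verbose` run above.
real, user, sys_time = 219.24, 25.14, 5.77

waiting = waiting_seconds(real, user, sys_time)
print(f"CPU time:  {user + sys_time:.2f} s")
print(f"Waiting:   {waiting:.2f} s ({waiting / real:.0%} of the run)")
```

With these figures roughly 188 of the 219 seconds are spent off the CPU, which fits the observation that most of the time goes into downloading.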

Member

techalchemy commented Jun 26, 2018

Numpy should be substantially faster now (I have been using your example as a test case in fact!). As of my most recent test, I had it at ~30s on a cold cache on a vm.

Can you confirm any improvements with the latest release?

jhrmnn commented Jun 27, 2018

It has improved substantially for me as well. I'm now sitting on a very fast connection, and got as low as 14 s, but that was when the downloading went at 30 MB/s. What is being downloaded besides a single copy of the source code of numpy?

Member

uranusjr commented Jun 27, 2018

I think we’re downloading redundant wheels (not sure). We’re already evaluating the situation.

abhi-jha commented Jun 29, 2018

I changed my Pipfile.lock drastically by uninstalling a few packages, and now deploying it on a different machine freezes. Is there any particular fix for this?

Member

techalchemy commented Jun 29, 2018

It’s not recommended that you manually edit your lockfile. Without more information it’s not possible to help. Please open a separate issue.

gviot commented Jul 2, 2018

If you want to benchmark the performance of pipenv lock, you should try adding pytmx to your dependencies...
pipenv lock used to take 1 hour or more for us (we have a pretty slow internet connection). After removing pytmx, we got down to about 5 minutes, and pipenv is finally more usable.
I know pytmx is a large package because it's a big monolithic lib and depends on opengl/pygame and other related things, but it shouldn't take 1 hour to pipenv lock no matter how big the package is.

Member

uranusjr commented Jul 2, 2018

It doesn’t take one hour for me

$ cat Pipfile
[packages]
pytmx = "*"

$ time pipenv lock --clear
Locking [dev-packages] dependencies...
Locking [packages] dependencies...
Updated Pipfile.lock (eb50ab)!

real	0m2.827s
user	0m2.287s
sys	0m0.390s

Member

uranusjr commented Jul 2, 2018

Also, PyTMX is less than 20 kB on PyPI and has only one dependency, six (which is super small), so networking shouldn’t be an issue. There is likely something else going on in your environment.

gviot commented Jul 2, 2018

You're right, it's smaller than I thought and does not depend explicitly on pygame and such. I'm not sure why it was taking so long, then!
I will try to find more information, but I have a high-end CPU and an SSD, so I still think the issue is related to our slow internet connection.

abhi-jha commented Jul 4, 2018

@techalchemy I didn't edit the file manually. I uninstalled a lot of dependencies using pipenv uninstall package_name and afterwards ran it on the server. It stayed in the lock state for a very long time.

Member

techalchemy commented Jul 4, 2018

I am not interested in spending energy on this discussion with random shots in the dark. Please provide a reproducible test case.

Mathspy commented Jul 21, 2018

Here's what I hope is a reproducible test case
https://github.com/Mathspy/tic-tac-toe-NN/tree/ab6731d216c66f5e09a4dabbe383df6dc745ba18

Attempting to run
pipenv install
in this lock-less repository has so far downloaded over 700 MB while displaying
Locking [packages] dependencies...

Will give up in a bit and rerun with --skip-lock until it's fixed

Mathspy added a commit to Mathspy/tic-tac-toe-NN that referenced this issue Jul 22, 2018

Locking down packages manually until pipenv matures enough
Sadly, due to several issues with Pipfile.lock, it seems to either hang or take forever, as seen here pypa/pipenv#2284 here pypa/pipenv#1816 and here pypa/pipenv#356, making it extremely unusable
For that reason, instead of switching to an alternative solution, I have decided to pin all the dependencies so that this project doesn't start failing with breaking changes in the future
I will probably come back here, generate a lock, and use semver minors/patches only once pipenv has matured enough, good luck to its developers!


ushuz commented Aug 30, 2018

I noticed that lock was really slow and downloaded a huge amount of data from files.pythonhosted.org: more than 800 MB for a small project that depends on scipy, flask, etc.

So I sniffed the requests made to files.pythonhosted.org, and it turns out that pip or pipenv were doing completely unnecessary downloads, which makes lock painfully slow.

[image: 1535625096148]

For example, the same version of numpy was downloaded in full several times. It also downloaded wheels for Windows and Linux, although I was using a Mac.
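
A captured list of request URLs could be checked for both symptoms with a short script. The URLs below are made up for illustration and the helper names are hypothetical; the only thing relied on is the standard wheel filename layout, where the last three dash-separated fields are the Python tag, ABI tag, and platform tag.

```python
from collections import Counter
from urllib.parse import urlparse


def wheel_tags(filename):
    """Split a wheel filename into (name, version, python, abi, platform)."""
    stem = filename[: -len(".whl")]
    parts = stem.split("-")
    # An optional build tag may sit between the version and the python tag,
    # so the three tags are read from the end of the list.
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return name, version, python_tag, abi_tag, platform_tag


def summarize(urls):
    """Return (files requested more than once, set of platform tags seen)."""
    files = [urlparse(url).path.rsplit("/", 1)[-1] for url in urls]
    repeats = {name: count for name, count in Counter(files).items() if count > 1}
    platforms = {wheel_tags(name)[4] for name in files if name.endswith(".whl")}
    return repeats, platforms


# Hypothetical request log; the paths are not real PyPI URLs.
BASE = "https://files.pythonhosted.org/packages/example/"
REQUESTS = [
    BASE + "numpy-1.14.5-cp36-cp36m-manylinux1_x86_64.whl",
    BASE + "numpy-1.14.5-cp36-none-win_amd64.whl",
    BASE + "numpy-1.14.5-cp36-cp36m-manylinux1_x86_64.whl",
]

repeats, platforms = summarize(REQUESTS)
print("Downloaded more than once:", repeats)
print("Platform tags seen:", sorted(platforms))
```

Any platform tag in the output that does not match the local machine points at a download that was not needed for installation, although hashes for those files may still be wanted in the lockfile.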

My setup:

$ pipenv --version
pipenv, version 2018.05.18

$ pip -V
pip 18.0 from /usr/local/lib/python2.7/site-packages/pip (python 2.7)

Contributor

AlJohri commented Oct 15, 2018

Are additional Pipfiles helpful for debugging here?

Member

techalchemy commented Oct 16, 2018

Most likely, @AlJohri. Any info about running processes, locks, or I/O would also help.

megravity commented Oct 25, 2018

[image: screenshot 2018-10-25 at 12 27 07]

I've been stuck here for about 5 minutes already. At first I thought it might be some sort of pip installation issue, so I reinstalled everything fresh via Homebrew, but the problem is still there. Any ideas why?

megravity commented Oct 25, 2018

Finally finished after about 6 - 7 minutes. Pretty new to Python and Pipenv, so a little help about where to find the necessary files for debugging would be great! :)

vamseekm commented Oct 30, 2018

This is bad enough that I am afraid to install new Python libs or upgrade existing ones.

umgupta commented Dec 12, 2018

After watching one of the talks from the creator, I decided to use pipenv. But it is too slow.

Member

techalchemy commented Dec 13, 2018

Thanks for your insightful feedback.

umgupta commented Dec 13, 2018

@techalchemy If there is something I could do to help fix this, I am very happy to contribute.

samhavens commented Dec 18, 2018

> I noticed that lock was really slow and downloaded huge amount of data from files.pythonhosted.org, more than 800MB for a small project that depends on scipy flask etc.

I have a suspicion, though not conclusive evidence, that scipy is correlated with very long pipenv lock times.

zkhan93 commented Jan 26, 2019

Really painful at times. I am installing PyPDF2 and textract; pipenv took ~10 minutes to lock.

earshinov commented Jan 27, 2019

The slowness of pipenv really hinders the dev process for us. I now advise everyone to stick with pip + virtualenv until this issue is resolved.

black-snow commented Feb 3, 2019

Any news on this? Any way to help?
dupe of #1914

/ edit: btw, why does pipenv install update the versions in the lockfile? o.Ò I just ran it after locking timed out, and now that I look at the new lock file I see pandas was updated from 0.23.4 to 0.24.0, numpy from 1.16.0 to 1.16.1, etc... I didn't expect that to happen unless I ran pipenv update ...

awhillas commented Feb 5, 2019

I find it installs quickly and locks slowly, so as soon as you get the Installation Succeeded message you're good to continue working... unless you want to install something else...

black-snow commented Feb 5, 2019

... or need to push the lock file into some repo.
