Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up unpack_file_url #2196

Merged
merged 1 commit into from Dec 16, 2014
Merged

Conversation

msabramo
Copy link
Contributor

by ignoring ('.tox', '.git', 'build') when doing shutil.copytree.

Fixes: GH-2195

Before
$ time pip install --no-install ~/dev/git-repos/pip
DEPRECATION: --no-install and --no-download are deprecated. See https://github.com/pypa/pip/issues/906.
Processing /Users/marca/dev/git-repos/pip
  Requirement already satisfied (use --upgrade to upgrade): pip==6.0.dev1 from file:///Users/marca/dev/git-repos/pip in /Users/marca/dev/git-repos/pip
pip install --no-install ~/dev/git-repos/pip  2.80s user 5.86s system 50% cpu 17.205 total
After
$ time pip install --no-install ~/dev/git-repos/pip
DEPRECATION: --no-install and --no-download are deprecated. See https://github.com/pypa/pip/issues/906.
Processing /Users/marca/dev/git-repos/pip
  Requirement already satisfied (use --upgrade to upgrade): pip==6.0.dev1 from file:///Users/marca/dev/git-repos/pip in /Users/marca/dev/git-repos/pip
pip install --no-install ~/dev/git-repos/pip  0.80s user 0.87s system 70% cpu 2.392 total

From 17 seconds to less than 3 seconds; not bad! 😄

@msabramo
Copy link
Contributor Author

I wonder if it would be even better to ignore everything that is not in the manifest...? Though we could certainly start with this and add that later...

@tomprince
Copy link

It would perhaps be better to make an sdist, instead of duplicating that logic in pip.

@msabramo
Copy link
Contributor Author

My inclination is to maybe add the sdist thing in another PR and lobby to have this PR merged as-is, as I'm not clear yet on how making the sdist would work and this speeds up things considerably as-is.

@msabramo
Copy link
Contributor Author

@dstufft, @pfmoore: What do you guys think?

@pfmoore
Copy link
Member

pfmoore commented Dec 16, 2014

Seems reasonable, although tbh it makes me slightly nervous for some reason (specifically including "build"). What happens if a project has the structure

foo
|--- setup.py
|--- foo
     |--- __init__.py
     |--- build
          |--- __init__.py

Would foo/build be skipped? (I'm not 100% sure how shutil.ignore_patterns works).

OTOH, I like the idea of building via a sdist. That would be a nice change (but it's more disruptive, so I agree it should be a separate PR).

@dstufft
Copy link
Member

dstufft commented Dec 16, 2014

Honestly the thing that worries me the most about this change is that I don't like the fact that it just hard-codes a bunch of common directories. I mean, probably it'll be ok, especially the .git and .tox rules, but it makes me feel a little gross. I'm not sure if it's gross enough to make me be against merging it though!

@pfmoore
Copy link
Member

pfmoore commented Dec 16, 2014

@dstufft Yeah, that's basically my problem with it (assuming it doesn't have a bug with lower-level directories called build like I said above). But improving the slow copy step is a definite plus...

@msabramo
Copy link
Contributor Author

Yeah, it's definitely a bit gross (though waiting 17 seconds to install something local is gross too 😄).

I wonder if there's some middle ground, like what if I read the list of paths out of .gitignore or .hgignore (but using the .hgignore only if it uses glob patterns; not regexes?)?

@msabramo
Copy link
Contributor Author

         shutil.copytree(
             link_path, location, symlinks=True,
-            ignore=shutil.ignore_patterns('.tox', '.git', 'build'))
+            ignore=shutil.ignore_patterns('.*'))

Less gross?

@pfmoore
Copy link
Member

pfmoore commented Dec 16, 2014

I wonder if it's best to leave it for now (it's been around forever, so there's not that much urgency) and focus more on the idea of building a sdist and installing that. That should be a lot faster, I'd have thought... Actually, I just did a quick test on a random pip checkout - python -m pip install -U . took 53.8s whereas python setup.py sdist; python -m pip install -U dist\pip-6.0.dev1.zip took 15.7s. A very quick hack to test the principle behind this patch (copy the pip directory, delete .tox and .git, run python -m pip install -U .) took 27.1s. Deleting another big file (pip.zip) that I had round brought it down to 25.4s, but there's no way heuristics could do that.

So going via sdist is a major win. The heuristics are not as good, but still significant, but they are ugly and there's a small chance of breakage.

@msabramo
Copy link
Contributor Author

@pfmoore: Thanks for doing that quick little benchmark. Building an sdist sounds even better. It hadn't occurred to me that folks could have files laying around in their directory like pip.zip that simple heuristics don't have a chance of handling.

I'd be up for taking a stab at the sdist -- how are folks envisioning building it? Shelling out to python setup.py sdist? Invoking setuptools/distutils? Assuming the latter would be nicer.

You don't want to merge this PR in the meantime? Now it just filters out .*, which should cover a lot of the big things like .tox, .git, and .hg and I think is unlikely to cause problems, unless people package dotfiles in their packages, which seems doubtful.

@pfmoore
Copy link
Member

pfmoore commented Dec 16, 2014

@msabramo I'd assume we would simply shell out to python setup.py sdist the same way we shell out to python setup.py install for a source install (that is, with some hacking to inject setuptools, etc). Basically, setuptools/distutils don't support in-process building, the shell interface is the only supported one.

I'm not against merging this, just a bit nervous it could cause breakages. I'd like some of the other devs to confirm they are OK with it before doing so. @dstufft @jezdez @qwcode ?

I just have this nervous concern that with the current heuristic of .*, someone might have created a .version.txt file that they read in setup.py to get the package version. I don't mind breaking code like that, personally, but let's see what the others say.

@msabramo
Copy link
Contributor Author

Ah, .version.txt -- that's interesting. I've never seen one but it's a possibility.

@pfmoore
Copy link
Member

pfmoore commented Dec 16, 2014

I don't think it's a sane idea (it's hidden on Unix, after all!) hence why I don't mind breaking it...

@dstufft
Copy link
Member

dstufft commented Dec 16, 2014

I think a .* rule makes me nervous, but a rule for .tox and the VCS directories only makes me feel a little bad, so I'm not opposed to the limited scope one, I think I am opposed to the .* rule though.

by ignoring .tox, .git, .hg, .bzr, and .svn  when doing
`shutil.copytree` in unpack_file_url in pip/download.py.

Fixes: pypaGH-2195
@msabramo
Copy link
Contributor Author

diff --git a/pip/download.py b/pip/download.py
index 2f4a349..50b81b3 100644
--- a/pip/download.py
+++ b/pip/download.py
@@ -634,7 +634,8 @@ def unpack_file_url(link, location, download_dir=None):
             rmtree(location)
         shutil.copytree(
             link_path, location, symlinks=True,
-            ignore=shutil.ignore_patterns('.*'))
+            ignore=shutil.ignore_patterns(
+                '.tox', '.git', '.hg', '.bzr', '.svn'))
         if download_dir:
             logger.info('Link is a directory, ignoring download_dir')
         return

@xavfernandez
Copy link
Member

I guess the best would be to follow the MANIFEST.in by using https://github.com/pypa/pip/blob/develop/pip/_vendor/distlib/manifest.py

@pfmoore
Copy link
Member

pfmoore commented Dec 16, 2014

@msabramo Yep, I'd go with that.

@xavfernandez If we're going to respect MANIFEST.in let's just bite the bullet and do it by building via sdist. After all, it's only a speedup (a big one, certainly, but for a relatively uncommon case) and I'd prefer not to add complexity (and hence risk). Building from sdist, on the other hand, has the advantage of failing only if your sdist is getting built wrongly, which is probably a good thing anyway.

@dstufft
Copy link
Member

dstufft commented Dec 16, 2014

I agree with @pfmoore.

@msabramo
Copy link
Contributor Author

@msabramo Yep, I'd go with that.

OK, that's what I have now (https://github.com/pypa/pip/pull/2196/files) so I guess that this is ready to be merged.

pfmoore added a commit that referenced this pull request Dec 16, 2014
@pfmoore pfmoore merged commit 478f784 into pypa:develop Dec 16, 2014
@msabramo msabramo deleted the speed_up_unpack_file_url branch December 16, 2014 20:10
@fungi
Copy link
Contributor

fungi commented Dec 22, 2014

PBR actually makes use of .git in the build dir, so as to be able to determine versions from Git tags present, generate AUTHORS files from the revision history, et cetera. It will at least need some option to be able to override this behavior.

@rbtcollins
Copy link

Seconding that, please make it opt-in (or better yet, just build an sdist in-place, which will dtrt with pbr afaict).

@dstufft
Copy link
Member

dstufft commented Dec 22, 2014

I'm going to revert this and drop the speed up until it can be done by building a sdist.

On Dec 22, 2014, at 4:17 PM, rbtcollins notifications@github.com wrote:

Seconding that, please make it opt-in (or better yet, just build an sdist in-place, which will dtrt with pbr afaict).


Reply to this email directly or view it on GitHub.

@bukzor
Copy link

bukzor commented Dec 31, 2014

@dstufft: It sounds like there should be an open issue to speed up pip install via an sdist intermediate step. Am I reading correctly?

@dstufft
Copy link
Member

dstufft commented Dec 31, 2014

Yea, sorry I forgot to open an issue about it.

@msabramo
Copy link
Contributor Author

I couldn't find an issue; I probably missed it, but in any case, I reopened #2195

@bdangit
Copy link

bdangit commented Apr 11, 2017

sdist way is not very ideal, especially when users want to use --editable which would allow the py code to be modified while stuff has been "built". Useful during development/debugging.

@dstufft
Copy link
Member

dstufft commented Apr 12, 2017

I don't think we use unpack_file_url in the -e path.

@vladfi1
Copy link

vladfi1 commented May 31, 2017

I'm not sure what the cause is, but pip install -e is super slow for me. It spends minutes waiting on Obtaining file://... even on relatively small directories.

@bukzor
Copy link

bukzor commented May 31, 2017 via email

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 3, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 3, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants