New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] internal wheel cache #1572

Closed
wants to merge 2 commits into
base: develop
from

Conversation

Projects
None yet
10 participants
@qwcode
Contributor

qwcode commented Feb 16, 2014

here's a WIP implementation of an inline wheel build cache, that's opt-in.

here's the new pip install options:

--wheel-cache                  Build and Cache wheels
--wheel-cache-rebuild       Rebuild pre-existing wheels in the cache.
--wheel-cache-dir <dir>     Directory for cached wheels. Defaults to <...>
--wheel-cache-exclude <pkg> A Package to exclude from wheel building and caching.

much of the diff is shuffle. the new logic is primarily in 2 new functions:

  • InstallRequirement.build :this builds wheels into the cache
  • RequirementSet.find_requirement: this knows how to look in the cache first.

I still need to cleanup some things, fix/add tests, update docs, but wanted to put the basic implementation out there for review.

Over email, I mentioned doing the cache in a general way for sdists as well,
but atm, thinking that would just be more complex, and mostly pointless in the long run. (i.e. the --package-cache idea, #878, would go away)

A historial note: an early version of the wheel support, had a --build-wheel=DIR option, but it wasn't a full cache solution. see the discussion in #684 as to how we ended up with pip wheel. btw, I think pip wheel still has a place, after we have wheel caching.

qwcode added some commits Feb 16, 2014

@patcon

This comment has been minimized.

patcon commented Feb 27, 2014

Nice.

@pfmoore

This comment has been minimized.

Member

pfmoore commented Feb 27, 2014

I've not had a chance to try this out yet, but the UI looks good to me. I guess --wheel-cache-rebuild would be for something like a VCS URL or a local development directory, where the code might change with no version change. That's an obscure enough case that it's probably worth explaining in the docs.

Agreed that this doesn't make pip wheel obsolete.

@qwcode

This comment has been minimized.

Contributor

qwcode commented Feb 27, 2014

I still need to come back to this, and work on a few things: logging, tests etc... just been busy.

My high level concerns were:

  • getting feedback on the new options
  • if others had more radical ideas about how to do this that involved more refactor
  • when was the right time to introduce such a major change? it is opt-in, but still, something like this could increase the use of wheels, and maybe wheels still need more time to mature?
@Ivoz

This comment has been minimized.

Member

Ivoz commented Mar 12, 2014

So what exactly is more awesome about this than the existing --download-cache?

@dstufft

This comment has been minimized.

Member

dstufft commented Mar 12, 2014

Download Cache == you rebuild lxml every time you pip install it, Wheel Cache == you build it once and install from wheel from there on out.

@qwcode

This comment has been minimized.

Contributor

qwcode commented Mar 12, 2014

and supposing theoretically, everything you needed was already a wheel on pypi, this is still faster than --download-cache, because the wheel cache is checked prior to walking pypi, which is not true for --download-cache.

it has been a frequent compaint, that --download-cache was strictly a download cache, and not a pre-finder "package cache", like many people wanted (see #878)

# TODO: quiet the failure logging from the finder
cache_finder = PackageFinder(find_links=[self.wheel_cache_dir], index_urls=[])
try:
url = cache_finder.find_requirement(req, upgrade=upgrade)

This comment has been minimized.

@qwcode

qwcode Mar 12, 2014

Contributor

upgrades and non-versioned reqs need to go through the normal finder

This comment has been minimized.

@qwcode

qwcode Mar 12, 2014

Contributor

well, I guess it's debateable whether a non-versioned req should just use what's in the cache, or check for the latest. maybe just use cache, unless --upgrade is set. that matches the way #878 was written up.

This comment has been minimized.

@pfmoore

pfmoore Mar 21, 2014

Member

I'm not sure I follow this. If I do pip install foo and foo 1.2 is the latest version on PyPI (sdist only), I'd expect to build a wheel in the cache and install it. If I later did the same install to a different venv, I'd expect to check PyPI, find there's no newer version, and use the wheel from cache (even if the project had added a 1.2 wheel to PyPI in the meantime). But if there is a newer version, why wouldn't I use that?

Your comment "it's debateable whether a non-versioned req should just use what's in the cache, or check for the latest" implies that you're thinking that pip install foo will get a 1.2 wheel from the cache even if 1.3 has been released on PyPI. I don't think that's right.

Looking at #878 I see that this may well be what you mean. Maybe there's a use case for it, but I do not think it's the right default. A "cache" should speed things up but not change behaviour.

This comment has been minimized.

@qwcode

qwcode Mar 21, 2014

Contributor

noted. I think I agree with you. just trying to cover all the angles. just realize that if foo is installed, pip install foo doesn't check for the latest version, so that's what I was mulling over.

pip install --wheel-cache --no-index foo is probably the answer for re-using cache without an expensive pypi check., or maybe a more explicit --use-cache is needed

@qwcode

This comment has been minimized.

Contributor

qwcode commented Mar 12, 2014

and moreover, the idea is that eventually pip will only know how to install from wheel. this is the opt-in step towards that.

@pfmoore

This comment has been minimized.

Member

pfmoore commented Mar 21, 2014

... which makes it even more important that enabling the cache doesn't result in changed behaviour.

This was referenced Apr 18, 2014

@dstufft dstufft added this to the Improve our User Experience milestone Apr 19, 2014

@pfmoore

This comment has been minimized.

Member

pfmoore commented May 12, 2014

Just a note here, Stefan Krah and Marc-Andre Lemburg have commented in a thread on distutils-sig (the relevant portion of the thread starts at around https://mail.python.org/pipermail/distutils-sig/2014-May/024251.html) that this may be an issue for some projects (that's heavily summarised, read the thread for details if it matters).

In my view:

  1. Not many projects will be affected in practice (only those that hack distutils in setuptools-incompatible ways, and we'll probably break on them already for other reasons)
  2. The feature is opt-in so if it won't work for a particular project, just don't use it.
  3. It's not an issue with this patch, it's an incompatibility between the projects in question and wheels.

But we should probably announce this feature on distutils-sig (I'll write the email) before it goes in, and we should definitely ensure there's chance for a discussion there before it becomes the default behaviour (assuming it does at some point).

@Ivoz

This comment has been minimized.

Member

Ivoz commented May 12, 2014

Not many projects will be affected in practice (only those that hack distutils in setuptools-incompatible ways, and we'll probably break on them already for other reasons)

I've found that although not many do this, only most of the bigger-and-extremely-well-used ones do this, like half of the numpy stack, and twisted :D

@qwcode

This comment has been minimized.

Contributor

qwcode commented May 13, 2014

The feature is opt-in so if it won't work for a particular project, just don't use it.

note that one of the proposed options (from the description) was --wheel-cache-exclude <pkg> for dealing with projects like this.

@dstufft dstufft referenced this pull request May 15, 2014

Closed

Obsolete pip2014.com #1731

4 of 5 tasks complete
# 'pip install --wheel-cache'
elif (url.filename.endswith(wheel_ext)
and self.wheel_cache_dir
):

This comment has been minimized.

@xavfernandez

xavfernandez May 17, 2014

Contributor

Since wheel_cache_dir has a default value, self.wheel_cache should also be checked.
This also asks what should happen when pip install --wheel-cache --download-dir=some_place
Should the wheel go to the default wheel-cache-dir or to "some_place" or to both ?

default_wheel_cache = os.path.join(default_storage_dir, 'wheel_cache')
if not os.path.isdir(default_wheel_cache):
os.makedirs(default_wheel_cache)

This comment has been minimized.

@bukzor

bukzor May 17, 2014

Contributor

Side effects at import time is a smell. It would be better to wait till after the final setting is known

@qwcode

This comment has been minimized.

Contributor

qwcode commented May 17, 2014

closing this to signal that I don't have the time (or motivation) right now to take this from a POC to a ready-to-merge feature. good luck to whoever picks up this idea.

@qwcode qwcode closed this May 17, 2014

@thedrow

This comment has been minimized.

thedrow commented May 18, 2014

:(

@xavfernandez

This comment has been minimized.

Contributor

xavfernandez commented May 21, 2014

I'm quite interested in implementing this feature.
The API proposed by qwcode (with --wheel-cache and co) seems to fit quite nicely.
Any idea of what should happen when combined with --download-dir ?
Should the wheel go to the default wheel-cache-dir or to the specified download-dir or to both ?

@pfmoore

This comment has been minimized.

Member

pfmoore commented Aug 26, 2014

@qwcode the lack of this annoyed me yet again today, so I have some motivation to work on it (maybe not enough to complete it, but who knows? ;-)) What in your view remains to be done? You mention docs and tests in the original comment, and there will obviously be work around any bugs that may appear as tests get written and/or users play with it, but is there anything more that you're aware of?

Also, what's the best way to get the current state of this work? Should I just pull the qwcode:wheel_cache branch and work from there?

@pfmoore

This comment has been minimized.

Member

pfmoore commented Aug 26, 2014

@qwcode Heh. The other key bit is probably "fix all the merge conflicts". Might take a bit longer than I was hoping :-)

@dstufft

This comment has been minimized.

Member

dstufft commented Aug 26, 2014

Hmm, personally (without reading the code at all), I think it would be smart to refactor pip so that pip always builds Wheels if the only thing that is available is a sdist, and then installs from that built Wheel. This has a few benefits:

  1. We logically separate the "build" and "install" steps which should lead to saner, easier to rationalize code.
  2. We reduce the total number of paths for install down to 2 from 3 (sdist, wheel, develop -> wheel, develop).
  3. This minimizes the API contract that a build tool needs to implement, which makes it easier for a future where setuptools is just another optional build tool.

And some downsides:

  1. Not everything will install correctly from Wheels :(
  2. This changes what we expect from a setup.py. Instead of setup.py egg_info and setup.py install <blah blah> we only care about setup.py bdist_wheel.
@qwcode

This comment has been minimized.

Contributor

qwcode commented Aug 26, 2014

@dstufft btw, what you said would be smart, is exactly what the PR was trying to do. : )

as for everything not installing correctly, that was the intent of the --wheel-cache-exclude parameter.

@qwcode

This comment has been minimized.

Contributor

qwcode commented Aug 26, 2014

@pfmoore one specific thing that needed work was the logging, but you would see that once you got into it.

@pfmoore

This comment has been minimized.

Member

pfmoore commented Aug 26, 2014

@dstufft @qwcode the problem with the smart approach is that you don't really get the benefits unless the wheel cache is required, rather than opt-in (if it's opt-in you need to retain the current code paths as well as the simplified one).

I'd very much like to know how things ("half of the numpy stack, and twisted" were the ones @Ivoz mentioned) fail, because I think we should be working on addressing those failures independently of this PR. We'll never switch to a "via wheels only" mode until the failures are sorted, after all. For example, I thought the numpy issues were around architecture tags and things like that, which only really apply for distributed wheels, not for a cache (even if the tags aren't specific enough, the cache will never be used anywhere else anyway).

@dstufft

This comment has been minimized.

Member

dstufft commented Aug 26, 2014

No, you'd still get the benefits because we'd build a wheel regardless of if the wheel cache is enabled or not. You always build wheels and install from that. The cache just controls whether we need to build a wheel everytime someone does pip install lxml or if it builds a wheel the first time, and then caches it.

@dstufft

This comment has been minimized.

Member

dstufft commented Aug 26, 2014

I also think that the wheel cache should be enabled by default, but we might want to wait a release or two to enable it by default so we don't get bad wheels cached that we want to purge since this is a fairly new/big change in steps.

@dstufft

This comment has been minimized.

Member

dstufft commented Aug 26, 2014

@qwcode WELP, I should probably read PRs before I comment, but I just wanted to toss that out there :)

@qwcode

This comment has been minimized.

Contributor

qwcode commented Aug 26, 2014

@dstufft no worries

@pfmoore

This comment has been minimized.

Member

pfmoore commented Aug 26, 2014

Oh, I see. Yes, that's a good idea. I can't tell from reading the diff if that's what the code here does - I'll have to read it in context.

Actually, build-via-wheel is likely the easiest bit (modulo issues with packages that don't play nicely with wheels) because it avoids all the complexities of managing/rebuilding/etc the cache. I'd be tempted to start by adding a --via-wheel option that did the install via a wheel, but made no attempt to cache the wheel (just build it in a temp dir). Then as phase 2, add cache management, and make via wheel the norm. I don't like the temporary --via-wheel option, though.

@dstufft

This comment has been minimized.

Member

dstufft commented Aug 26, 2014

I wouldn't bother with the --via-wheel option, i'd just make it so we always build wheels and maybe add an (temporary?) option to exclude from the Wheel building. Opt in means not many people will use it, opt out per package means that people will use it, and there will be a drive for the projects that can't be successfully built with Wheels to make it possible.

We probably need to figure out how to handle Numpy and friends in that though. If those don't work with the new path we might want to add those to the list of exclusions inside the code, and raise a warning about it that in the future they'll need to exclude them. We could even go one better, depending on what the failure modes look like, we could attempt to build a Wheel, and if that fails fall back to the old way with a deprecation warning?

@pfmoore

This comment has been minimized.

Member

pfmoore commented Aug 27, 2014

Ugh, that code's ugly... I thought of just patching InstallRequirement.install so that if it finds a sdist, it builds a wheel then creates a new InstallRequirement from the wheel and installs that instead. But you can't call install() on a requirement that's a wheel - you have to put it in a RequirementSet and call prepare_files. Because - well because something, I guess :-(

I can't even describe what (conceptually) an InstallRequirement is. It's not "something that can be installed" as I've just found out.

OK, rant over. I'll post this as a record should I get back to looking at this and need a reminder. But I doubt I'll be able to make much progress other than simply rebasing @qwcode's patch and seeing how we go from there. Time to give up on grand plans...

@bukzor

This comment has been minimized.

Contributor

bukzor commented Aug 27, 2014

Whatever is downloaded goes to the download-dir, and a wheel is put in the
wheel cache.

The download-dir may or may not gain a wheel, but the wheel cache always
does.

On May 21, 2014 2:07 PM, "xavfernandez" notifications@github.com wrote:

I'm quite interested in implementing this feature.
The API proposed by qwcode (with --wheel-cache and co) seems to fit quite
nicely.
Any idea of what should happen when combined with --download-dir ?
Should the wheel go to the default wheel-cache-dir or to the specified
download-dir or to both ?


Reply to this email directly or view it on GitHub
#1572 (comment).

@xavfernandez

This comment has been minimized.

Contributor

xavfernandez commented Aug 27, 2014

Yup, pip's code isn't simple...
As a first small step to resume the work of qwcode I tried to refactor some parts of pip.download module in #1822 .
If this can help...

@bukzor

This comment has been minimized.

Contributor

bukzor commented Nov 4, 2014

What was the conclusion here? I can't tell from the commentary.
Do we still want to implement such a thing, but in a different way?

@thedrow

This comment has been minimized.

thedrow commented Nov 4, 2014

@bukzor Yes and we need a volunteer that is able to implement this feature.

@bukzor

This comment has been minimized.

Contributor

bukzor commented Nov 5, 2014

@thedrow Thanks.

It seems like the design has changed since the OP.
If someone could explain what the new option are in the latest design, I could try my hand.

@dstufft Are you decided that the wheel-caching should be always-on, with a (soon deprecated) option to opt-out on a per-package basis?

Is there a reason to use a new wheel-cache-dir rather than re-using the already extant wheel-dir? I don't see that the semantics are significantly different between the two.

@bukzor

This comment has been minimized.

Contributor

bukzor commented Nov 5, 2014

At any rate, we need a new, open ticket for this issue, since there's still
an outstanding action item here.

It would be best if someone that knows the plan (better than I) would write
it.
On Nov 4, 2014 1:50 PM, "Omer Katz" notifications@github.com wrote:

@bukzor https://github.com/bukzor Yes and we need a volunteer that is
able to implement this feature.


Reply to this email directly or view it on GitHub
#1572 (comment).

@msabramo

This comment has been minimized.

Contributor

msabramo commented Mar 21, 2015

Love the idea. Hope this gets revived at some point.

@rbtcollins

This comment has been minimized.

Contributor

rbtcollins commented Mar 30, 2015

We'd still need egg_info because of dependency detection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment