Issue #988: new resolver for pip #2716

Closed
wants to merge 11 commits into
from

Projects

None yet

8 participants

@rbtcollins
Contributor
rbtcollins commented Apr 22, 2015 edited

This is the branch I'm working up on issue #988. Right now its a bunch of preparatory plumbing to make access to many and varied versions of a single requirement possible and cheap - since we're very likely to hit the same thing again and again and again as we try different paths.

Review on Reviewable

@xavfernandez xavfernandez commented on the diff Apr 23, 2015
pip/req/req_install.py
@@ -257,6 +262,15 @@ def link(self, link):
self._link = link
@property
+ def req_cache(self):
+ return self._req_cache
+
+ @req_cache.setter
+ def req_cache(self, req_cache):
+ assert self._req_cache is None
+ self._req_cache = req_cache
+
+ @property
@xavfernandez
xavfernandez Apr 23, 2015 Contributor

This seems to have nothing to do with this PR ?

@rbtcollins
rbtcollins Apr 29, 2015 Contributor

Its adding the req_cache, a set-but-not-change attribute.

@xavfernandez
xavfernandez Apr 30, 2015 Contributor

No, I was pointing the specifier switching to a property :)

@rbtcollins
rbtcollins Apr 30, 2015 Contributor

it was already, its just the diff anchoring oddly.

@xavfernandez xavfernandez commented on an outdated diff Apr 23, 2015
pip/req/req_install.py
@@ -312,7 +331,7 @@ def _correct_build_location(self):
assert self._ideal_build_dir
old_location = self._temp_build_dir
self._temp_build_dir = None
- new_location = self.build_location(self._ideal_build_dir)
+ new_location = self.build_location()
@xavfernandez
xavfernandez Apr 23, 2015 Contributor

since you now have the _ideal_build_dir stored as self.req_cache.path you can completely get rid of self._ideal_build_dir (and update the docstrings explaining the _correct_build_location dance :) )

@xavfernandez
Contributor

I've an issue with the RequirementCache name as it does not look (yet ?) to me like a cache...

@rbtcollins rbtcollins changed the title from Move BuildDirectory to RequirementCache to Issue #988: Resolve requirements from multiple sources Apr 29, 2015
@jamezpolley jamezpolley commented on an outdated diff May 6, 2015
pip/req/req_cache.py
+import logging
+import os.path
+import shutil
+import tempfile
+
+from pip.download import url_to_path
+from pip.utils import display_path, ensure_dir, rmtree, _make_build_dir
+
+
+logger = logging.getLogger(__name__)
+
+
+class RequirementCache(object):
+ """A cache of requirements.
+
+ :attr delete: May be set to False to disable deleting the cache.
@jamezpolley
jamezpolley May 6, 2015 Contributor

The first time I read this I thought "deleting the cache" meant an in-memory object; reading the docstring for __init__ I realised that it actually refers to the on-disk cache artifacts

Maybe "deleting the cached artifacts" or "deleting the cached files" would be clearer here, but I think this is a minor issue.

@jamezpolley jamezpolley commented on the diff May 6, 2015
pip/req/req_cache.py
+ self._urls = None
+
+ def __repr__(self):
+ return "<{} {!r}>".format(self.__class__.__name__, self.path)
+
+ def __enter__(self):
+ if self._path is None:
+ # We realpath here because some systems have their default tmpdir
+ # symlinked to another directory. This tends to confuse build
+ # scripts, so we canonicalize the path by traversing potential
+ # symlinks here.
+ self.path = os.path.realpath(tempfile.mkdtemp(prefix="pip-build-"))
+ # If we were not given an explicit directory, and we were not given
+ # an explicit delete option, then we'll default to deleting.
+ if self.delete is None:
+ self.delete = True
@jamezpolley
jamezpolley May 6, 2015 Contributor

Unless there's something I'm missing that means self._path could be unset in the meantime, this seems to duplicate lines 49-54

@rbtcollins
rbtcollins May 7, 2015 Contributor

See line 36.

@jamezpolley jamezpolley commented on the diff May 6, 2015
pip/req/req_cache.py
+ :raises: KeyError if there is no matching requirement
+ :return: The CachedRequirement object if found.
+ """
+ try:
+ return self._names[name][version]
+ except KeyError:
+ raise KeyError("%s-%s" % (name, version))
+
+
+class CachedRequirement(object):
+ """Cacheable component of InstallRequirement.
+
+ :attr url: The URL for the requirement, if it was specified as a URL or
+ path.
+ :attr name: The canonical name of the requirement or None if it is not yet
+ known (which is for requirements specifiied by URL).
@jamezpolley
jamezpolley May 6, 2015 Contributor

I think "which is for" here could be more clearly expressed as "eg. because the requirement is specified by URL"

Also, not typo in specifiied

@jamezpolley jamezpolley commented on the diff May 6, 2015
pip/req/req_cache.py
+ try:
+ return self._names[name][version]
+ except KeyError:
+ raise KeyError("%s-%s" % (name, version))
+
+
+class CachedRequirement(object):
+ """Cacheable component of InstallRequirement.
+
+ :attr url: The URL for the requirement, if it was specified as a URL or
+ path.
+ :attr name: The canonical name of the requirement or None if it is not yet
+ known (which is for requirements specifiied by URL).
+ :attr version: The version of the requirement or None if it is not yet
+ known (also for URL specified requirements).
+ :attr editable: The editability of a requirement affects the directory
@jamezpolley
jamezpolley May 6, 2015 Contributor

I think this describes what the attr is used for, but it seems to skip the allowable values. I think simply mentioning that it's boolean would suffice?

@rbtcollins
rbtcollins May 7, 2015 Contributor

This is well known in the pip codebase, I think it would not add readability here.

@dstufft dstufft and 1 other commented on an outdated diff May 6, 2015
pip/commands/uninstall.py
@@ -46,11 +47,11 @@ def run(self, options, args):
with self._build_session(options) as session:
format_control = pip.index.FormatControl(set(), set())
wheel_cache = WheelCache(options.cache_dir, format_control)
+ req_cache = RequirementCache()
@dstufft
dstufft May 6, 2015 Member

I know this is WIP, but just wanted to point out: This makes me a little uncomfortable, I think either we should have this be a context manager here too so it gets cleaned up or RequirementSet should take a None for a req_cache.

@rbtcollins
rbtcollins May 6, 2015 Contributor

Yes, this is just oversight.

@harlowja harlowja commented on an outdated diff May 8, 2015
pip/req/req_set.py
"""
# make the wheelhouse
if self.wheel_download_dir:
ensure_dir(self.wheel_download_dir)
+ # Allow for a decent # of reqs
+ # Each dep in the graph is a frame.
+ sys.setrecursionlimit(5000)
@harlowja
harlowja May 8, 2015

Should this just be dynamically computed (since we should know the height of the tree)? Perhaps something for later as this gets optmized more...

@harlowja
harlowja May 8, 2015

Actually nm, we don't know the tree height (if new dependencies keep on getting pulled in); so cancel that question (although we do know a starting/potential height and/or guess).

@harlowja harlowja and 2 others commented on an outdated diff May 8, 2015
pip/req/req_set.py
+ tuple(sorted(set(extras + chosen[name][0]))), chosen[name][1])
+ return self.resolve_constraints(constraints, chosen, new_pending)
+ # Loop over all possible versions in preference order.
+ last_error = None
+ for candidate in self.req_cache.get_versions(name, self.upgrade):
+ logger.debug("Trying %s %s" % (name, candidate))
+ if not specifier.contains(candidate):
+ # Incompatible with existing constraints.
+ msg = "%s %s incompatible with %s" % (
+ name, candidate, specifier)
+ logger.debug(msg)
+ continue
+ chosen[name] = (extras, candidate)
+ requires = self.req_cache.requires(name, extras, candidate)
+ try_pending = set(requires)
+ try:
@harlowja
harlowja May 8, 2015

Be interesting to see how this works out on sets of huge requirements (that expand to quite a large set that eventually does not work out).

@rbtcollins
rbtcollins May 8, 2015 Contributor

I'm testing with https://git.openstack.org/cgit/openstack/requirements/tree/global-requirements.txt which is ~250 requirements, some capped, some open, some pinned, and it resolves in 50seconds of cpu, 2m with cool HTTP cache, 1m with a hot cache.

@harlowja
harlowja May 8, 2015

Very nice, acceptable imho :)

@harlowja
harlowja May 8, 2015

Although I do wonder how pathological cases would work (and how long they would take to declare failure); but hopefully such things don't exist that often....

@rbtcollins
rbtcollins May 8, 2015 Contributor

it's NP-Complete I believe, so they will and they will be bad. On my mental todo list is an iteration counter and a limit (e.g. 1M rounds), plus need to profile a few different styles of immutable data here - e.g. rather than shrinking pending, perhaps build up a completed set, or use a list with careful mutations, or a list of lists or tuple tree.

@harlowja
harlowja May 8, 2015

Ya, some kind of rounds probably would be useful, I don't think such cases would happen to often, but one never knows... The part that sort of sucks, is that someone could maliciously make a release of a package pathological, adding in some crazy requirements chain, after the fact (and that would affect installs that try to resolve).

@dstufft
dstufft May 8, 2015 Member

A malicious author is (mostly) outside of our threat model, because they can do pretty much anything they want anyways (you are after all planning on executing their setup.py, or at least their software itself when you import it). That being said, I don't feel bad about adding a maximum number of rounds.

@harlowja
harlowja May 8, 2015

Fair enough, good point ๐Ÿ˜„

@StephanErb
Contributor

What's the current status of this pull request? Given that the build is fixed and it can get merged, would this be sufficient to close issue #988?

@rbtcollins
Contributor

It needs polish basically. I've just rebased it today, have to fixup tests and the semantic conflict with constraints before it can move forward.

Robert Collins added some commits Apr 22, 2015
Robert Collins Move BuildDirectory to RequirementCache
BuildDirectory is used for wheels as a temporary unpack, and we're
going to need a richer API for backtracking. This is the first step in
exposing it cleanly to RequirementSet.
39d043d
Robert Collins Push more ownership of dirs into InstallRequirement.
This should make it possible to move cleanup ownership to the
requirement cache.
b7073be
Robert Collins Move src dir ownership to the requirement cache. 5342043
Robert Collins Add CachedRequirement cache. 9d67095
Robert Collins Delete cached requirements en mass
Currently we delete succesfully installed requirements and not ones
that failed. Then (most of the time) we delete the requirements cache
that contained them, deleting everything. This means that useful
debugging context is lost (when we don't delete the cache), and we've
got complexity serving no point. Instead, either keep everything or
delete it all.
45c963e
Robert Collins Move source_dir stuff into the RequirementCache.
Since the point of the Cache is to own the directories and all the
metadata about requirements, they need to own this stuff.
3491302
Robert Collins Consolidate editable code a little 3a11e00
Robert Collins Move UI logging out of requirements resolving.
Its useful to know whats happening, so push the existing logging down
to debug, and store the skip reason (which will be constant for a
given req) on the req itself.
45e3288
Robert Collins Simplify now there is only one walking of reqs. 5ef5337
Robert Collins Make all the processing for one req be indented. 2c538aa
Robert Collins Backtracking resolver.
Still a very large commit with lots of unresolved duplication.
2620072
@xavfernandez xavfernandez commented on the diff Oct 1, 2015
pip/req/req_cache.py
+ build_path = os.path.join(self.src_dir, req.name)
+ else:
+ build_path = tempfile.mkdtemp('-build', 'pip-', self.path)
+ if req.editable:
+ ensure_dir(os.path.dirname(build_path))
+ req.build_path = build_path
+ return req.build_path
+
+ def candidate_from_version(self, name, version):
+ """Get the candidate for version of name.
+
+ :param name: The name.
+ :param version: The version that was selected.
+ :return: The InstallationCandidate.
+ """
+ canonical_name = pkg_resources.safe_name(name).lower()
@xavfernandez
xavfernandez Oct 1, 2015 Contributor

You could use pip.utils.canonicalize_name. And there are a bunch of other places also :)

@qwcode qwcode changed the title from Issue #988: Resolve requirements from multiple sources to Issue #988: new resolver for pip Nov 3, 2015
@dstufft dstufft closed this May 18, 2016
@dstufft
Member
dstufft commented May 18, 2016

Accidentally closed this, reopening. Sorry!

@dstufft dstufft reopened this May 18, 2016
@BrownTruck

Hello!

As part of an effort to ease the contribution process and adopt a more standard workflow pip has switched to doing development on the master branch. However, this Pull Request was made against the develop branch so it will need to be resubmitted against master. Unfortunately, this pull request does not cleanly merge against the current master branch.

If you do nothing, this Pull Request will be automatically closed by @BrownTruck since it cannot be merged.

If this pull request is still valid, please rebase it against master (or merge master into it) and resubmit it against the master branch, closing and referencing the original Pull Request.

If you choose to rebase/merge and resubmit this Pull Request, here is an example message that you can copy and paste:

This is the branch I'm working up on issue #988. Right now its a bunch of preparatory plumbing to make access to many and varied versions of a single requirement possible and cheap - since we're very likely to hit the same thing again and again and again as we try different paths.

---

*This was migrated from pypa/pip#2716 to reparent it to the ``master`` branch. Please see original pull request for any previous discussion.*
@BrownTruck

This Pull Request was closed because it cannot be automatically reparented to the master branch and it appears to have bit rotted.

Please feel free to re-open it or re-submit it if it is still valid and you have rebased it onto master or merged master into it.

@BrownTruck BrownTruck closed this May 26, 2016
@pradyunsg
Contributor

@rbtcollins Would you be fine if I reuse the work here? I'm pretty sure you would be... Just confirming.

@rbtcollins
Contributor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment