New Resolver: Candidates need a name and version #7966

pfmoore · 2020-04-02T19:02:14Z

Environment

pip version: Current master (Confirmed issue at commit 6086f71)
Python version: 3.6
OS: Checked using Docker image python:3.6-stretch

Description

Pip tries to prepare the setuptools 0.7.2 sdist, which is very old and uses Python 2 syntax. The prepare therefore fails.

Expected behavior
Don't prepare candidates unnecessarily, and in particular, don't prepare older candidates when newer ones are still options.

How to Reproduce
Start the given docker image.

Install the following requirements.txt file with pip install -r requirements.txt

appdirs==1.4.3
certifi==2018.1.18
docutils==0.14
pip @ https://github.com/pypa/pip/archive/6086f71cde81454fda588c20868a241561809836.zip
plover @ https://github.com/openstenoproject/plover/releases/download/weekly-v4.0.0.dev8%2B66.g685bd33/plover-4.0.0.dev8.66.g685bd33-py3-none-any.whl
plover-plugins-manager==0.5.11
Pygments==2.2.0
PyQt5==5.9.2
pyserial==3.4
python-xlib==0.23
setuptools==41.2.0
sip==4.19.8
six==1.10.0
wcwidth==0.1.7
wheel==0.34.2

Then run pip install --unstable-feature=resolver 'plover_plugins_manager==0.5.14'. The command fails trying to run setup.py egg_info on setuptools 0.7.2.

Debugging the failure shows the following traceback:

(Pdb) w
  /usr/local/bin/pip(11)<module>()
-> load_entry_point('pip', 'console_scripts', 'pip')()
  /app/pip/src/pip/_internal/cli/main.py(75)main()
-> return command.main(cmd_args)
  /app/pip/src/pip/_internal/cli/base_command.py(114)main()
-> return self._main(args)
  /app/pip/src/pip/_internal/cli/base_command.py(199)_main()
-> status = self.run(options, args)
  /app/pip/src/pip/_internal/cli/req_command.py(185)wrapper()
-> return func(self, options, args)
  /app/pip/src/pip/_internal/commands/install.py(333)run()
-> reqs, check_supported_wheels=not options.target_dir
  /app/pip/src/pip/_internal/resolution/resolvelib/resolver.py(65)resolve()
-> self._result = resolver.resolve(requirements)
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(378)resolve()
-> state = resolution.resolve(requirements, max_rounds=max_rounds)
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(275)resolve()
-> failure_causes = self._attempt_to_pin_criterion(name, criterion)
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(195)_attempt_to_pin_criterion()
-> criteria = self._get_criteria_to_update(candidate)
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(187)_get_criteria_to_update()
-> name, crit = self._merge_into_criterion(r, parent=candidate)
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(161)_merge_into_criterion()
-> crit = crit.merged_with(self._p, requirement, parent)
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(78)merged_with()
-> for c in self.candidates
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(79)<listcomp>()
-> if provider.is_satisfied_by(requirement, c)
  /app/pip/src/pip/_internal/resolution/resolvelib/provider.py(48)is_satisfied_by()
-> return requirement.is_satisfied_by(candidate)
  /app/pip/src/pip/_internal/resolution/resolvelib/requirements.py(103)is_satisfied_by()
-> assert candidate.name == self.name, \
> /app/pip/src/pip/_internal/resolution/resolvelib/candidates.py(78)name()
-> self._name = canonicalize_name(self.dist.project_name)

In particular, the issue appears to be that the resolver needs to call requirement.is_satisfied_by on all the candidates returned from find_matches, as part of the criterion merge process. This in turn needs the name and version, which triggers the prepare call.

Solution

I don't think we can reasonably handle candidates (from PyPI, at least) without a name and version, as we'll end up building everything. The best fix I can think of is to use our existing logic¹ to determine the name and version for sdists (and wheels, but I think that already happens) based on the filename. We may then find that the metadata doesn't match the filename, but we can warn/abort at that point (we may have to abort, because the resolve process may be unrecoverably broken if the candidate's name or version changes "on the fly").

I can look at adding that logic - @pradyunsg @uranusjr does this seem like a reasonable approach to take?

¹ At least I assume we have this logic somewhere!

The text was updated successfully, but these errors were encountered:

pradyunsg · 2020-04-02T20:08:32Z

This might be a case of the resolver search order being incorrect -- attempting to choose the oldest versions first instead of newer versions. Can we check if that's not the case here?

pfmoore · 2020-04-02T21:30:11Z

I checked that as part of my debugging session. I'm reasonably certain it's not that.

Basically the resolvelib algorithm merges new requirements into the existing criteria. At that point it does:

For each currently-eligible candidate, check if it satisfies the new requirement.
If not, remove the candidate from the list of potential candidates.
Finally, add the requirement to the list of requirements covered by this criterion.

(You probably already know this, and I'm over-simpifying from memory, but hopefully the above is useful for others checking out this issue).

It's step (1) that's the problem - you can't do a "satisfies" check without name and version, and you're doing this for every current candidate.

I checked this with tracing diagnostics in the resolver, although I didn't track everything down (I got swamped with output 🙁).

I'm going to try the following approach tomorrow:

For LinkCandidate objects, look at the filename in the link, and extract name and version from there. (For wheels this is well-defined by the standard, for sdists it's technically not standardised, but sufficient because the finder relies on it). If the filename isn't in a suitable format (a local directory, for instance) fall back on the current method.
Record the name/version on the candidate.
When we prepare, compare the metadata against the recorded name/version. Error out if there's a discrepancy.

That would mean that we only need to prepare for getting dependencies, or for local directories, VCS URLs, or local files and URLs that have filenames that don't follow the normal sdist format. That seems like it's an acceptable level, but we'll have to confirm that with testing.

I don't think we can do any better than that without modifying the resolvelib algorithm to allow for "unnamed" or "unversioned" candidates, and that would be a much more difficult route to take - not least because it adds a constraint on the resolver that other libraries are unlikely to support without modification.

uranusjr · 2020-04-03T02:19:10Z

Extracting the name and version sounds right to me—in fact IIRC this is exactly what the legacy resolver does. The link parsing is already done by the package finder, and information should be available in the InstallationCandidate. With the recent change (make_candidate_from_ican()), all you need to do should be to pass those info into LinkCandidate’s init.

Due to ambiguity in an sdist’s filename, the parsed name could end up being incorrect in edge cases (e.g. foo-2-1.tar.gz could be parsed as version 1 of foo-2, or version 2-1 of foo). So it would be most ideal if we could reject the candidate (and backtrack) after we find the name does not match. But that’s an obscure enough case, and I think we can choose to add a TODO and handle that in a far distant future, when someone actually reports it as a problem.

pfmoore · 2020-04-03T06:35:31Z

Also, we can avoid that issue by using the fact that we know the project name in most cases (the finder has it from the simple index page it's scraping, and the new resolver has it from the requirement that find_matches was called on). That allows us to disambiguate the parsing,

+1 on keeping the idea of having a mechanism to trigger a backtrack, and for deferring it until later when we have a real example of where we need it.

triage-new-issues bot added the S: needs triage Issues/PRs that need to be triaged label Apr 2, 2020

pfmoore self-assigned this Apr 2, 2020

pfmoore added the C: dependency resolution About choosing which dependencies to install label Apr 2, 2020

triage-new-issues bot removed the S: needs triage Issues/PRs that need to be triaged label Apr 2, 2020

pfmoore mentioned this issue Apr 2, 2020

pip needs a dependency resolver #988

Closed

uranusjr mentioned this issue Apr 3, 2020

Lazy-evaluate is_satisfied_by sarugaku/resolvelib#39

Closed

pfmoore added a commit to pfmoore/pip that referenced this issue Apr 3, 2020

Add a test demonstrating pypa#7966

b9ad549

pfmoore added a commit to pfmoore/pip that referenced this issue Apr 3, 2020

Add a test demonstrating pypa#7966

ae0b486

pfmoore added a commit to pfmoore/pip that referenced this issue Apr 3, 2020

Add a test demonstrating pypa#7966

037791d

pfmoore mentioned this issue Apr 3, 2020

Avoid preparing candidates when all we need is name or version #7970

Merged

pfmoore closed this as completed in #7970 Apr 3, 2020

uranusjr added this to To do in New Resolver Implementation via automation Apr 3, 2020

uranusjr moved this from To do to Done in New Resolver Implementation Apr 3, 2020

lock bot added the auto-locked Outdated issues that have been locked by automation label May 5, 2020

lock bot locked as resolved and limited conversation to collaborators May 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New Resolver: Candidates need a name and version #7966

New Resolver: Candidates need a name and version #7966

pfmoore commented Apr 2, 2020

pradyunsg commented Apr 2, 2020

pfmoore commented Apr 2, 2020

uranusjr commented Apr 3, 2020 •

edited

pfmoore commented Apr 3, 2020

New Resolver: Candidates need a name and version #7966

New Resolver: Candidates need a name and version #7966

Comments

pfmoore commented Apr 2, 2020

pradyunsg commented Apr 2, 2020

pfmoore commented Apr 2, 2020

uranusjr commented Apr 3, 2020 • edited

pfmoore commented Apr 3, 2020

uranusjr commented Apr 3, 2020 •

edited