Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Resolver: Candidates need a name and version #7966

Closed
pfmoore opened this issue Apr 2, 2020 · 4 comments · Fixed by #7970
Closed

New Resolver: Candidates need a name and version #7966

pfmoore opened this issue Apr 2, 2020 · 4 comments · Fixed by #7970
Assignees
Labels
auto-locked Outdated issues that have been locked by automation C: dependency resolution About choosing which dependencies to install

Comments

@pfmoore
Copy link
Member

pfmoore commented Apr 2, 2020

Environment

  • pip version: Current master (Confirmed issue at commit 6086f71)
  • Python version: 3.6
  • OS: Checked using Docker image python:3.6-stretch

Description

Pip tries to prepare the setuptools 0.7.2 sdist, which is very old and uses Python 2 syntax. The prepare therefore fails.

Expected behavior
Don't prepare candidates unnecessarily, and in particular, don't prepare older candidates when newer ones are still options.

How to Reproduce
Start the given docker image.

Install the following requirements.txt file with pip install -r requirements.txt

appdirs==1.4.3
certifi==2018.1.18
docutils==0.14
pip @ https://github.com/pypa/pip/archive/6086f71cde81454fda588c20868a241561809836.zip
plover @ https://github.com/openstenoproject/plover/releases/download/weekly-v4.0.0.dev8%2B66.g685bd33/plover-4.0.0.dev8.66.g685bd33-py3-none-any.whl
plover-plugins-manager==0.5.11
Pygments==2.2.0
PyQt5==5.9.2
pyserial==3.4
python-xlib==0.23
setuptools==41.2.0
sip==4.19.8
six==1.10.0
wcwidth==0.1.7
wheel==0.34.2

Then run pip install --unstable-feature=resolver 'plover_plugins_manager==0.5.14'. The command fails trying to run setup.py egg_info on setuptools 0.7.2.

Debugging the failure shows the following traceback:

(Pdb) w
  /usr/local/bin/pip(11)<module>()
-> load_entry_point('pip', 'console_scripts', 'pip')()
  /app/pip/src/pip/_internal/cli/main.py(75)main()
-> return command.main(cmd_args)
  /app/pip/src/pip/_internal/cli/base_command.py(114)main()
-> return self._main(args)
  /app/pip/src/pip/_internal/cli/base_command.py(199)_main()
-> status = self.run(options, args)
  /app/pip/src/pip/_internal/cli/req_command.py(185)wrapper()
-> return func(self, options, args)
  /app/pip/src/pip/_internal/commands/install.py(333)run()
-> reqs, check_supported_wheels=not options.target_dir
  /app/pip/src/pip/_internal/resolution/resolvelib/resolver.py(65)resolve()
-> self._result = resolver.resolve(requirements)
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(378)resolve()
-> state = resolution.resolve(requirements, max_rounds=max_rounds)
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(275)resolve()
-> failure_causes = self._attempt_to_pin_criterion(name, criterion)
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(195)_attempt_to_pin_criterion()
-> criteria = self._get_criteria_to_update(candidate)
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(187)_get_criteria_to_update()
-> name, crit = self._merge_into_criterion(r, parent=candidate)
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(161)_merge_into_criterion()
-> crit = crit.merged_with(self._p, requirement, parent)
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(78)merged_with()
-> for c in self.candidates
  /app/pip/src/pip/_vendor/resolvelib/resolvers.py(79)<listcomp>()
-> if provider.is_satisfied_by(requirement, c)
  /app/pip/src/pip/_internal/resolution/resolvelib/provider.py(48)is_satisfied_by()
-> return requirement.is_satisfied_by(candidate)
  /app/pip/src/pip/_internal/resolution/resolvelib/requirements.py(103)is_satisfied_by()
-> assert candidate.name == self.name, \
> /app/pip/src/pip/_internal/resolution/resolvelib/candidates.py(78)name()
-> self._name = canonicalize_name(self.dist.project_name)

In particular, the issue appears to be that the resolver needs to call requirement.is_satisfied_by on all the candidates returned from find_matches, as part of the criterion merge process. This in turn needs the name and version, which triggers the prepare call.

Solution

I don't think we can reasonably handle candidates (from PyPI, at least) without a name and version, as we'll end up building everything. The best fix I can think of is to use our existing logic1 to determine the name and version for sdists (and wheels, but I think that already happens) based on the filename. We may then find that the metadata doesn't match the filename, but we can warn/abort at that point (we may have to abort, because the resolve process may be unrecoverably broken if the candidate's name or version changes "on the fly").

I can look at adding that logic - @pradyunsg @uranusjr does this seem like a reasonable approach to take?

1 At least I assume we have this logic somewhere!

@triage-new-issues triage-new-issues bot added the S: needs triage Issues/PRs that need to be triaged label Apr 2, 2020
@pfmoore pfmoore self-assigned this Apr 2, 2020
@pfmoore pfmoore added the C: dependency resolution About choosing which dependencies to install label Apr 2, 2020
@triage-new-issues triage-new-issues bot removed the S: needs triage Issues/PRs that need to be triaged label Apr 2, 2020
@pradyunsg
Copy link
Member

This might be a case of the resolver search order being incorrect -- attempting to choose the oldest versions first instead of newer versions. Can we check if that's not the case here?

@pfmoore
Copy link
Member Author

pfmoore commented Apr 2, 2020

I checked that as part of my debugging session. I'm reasonably certain it's not that.

Basically the resolvelib algorithm merges new requirements into the existing criteria. At that point it does:

  1. For each currently-eligible candidate, check if it satisfies the new requirement.
  2. If not, remove the candidate from the list of potential candidates.
  3. Finally, add the requirement to the list of requirements covered by this criterion.

(You probably already know this, and I'm over-simpifying from memory, but hopefully the above is useful for others checking out this issue).

It's step (1) that's the problem - you can't do a "satisfies" check without name and version, and you're doing this for every current candidate.

I checked this with tracing diagnostics in the resolver, although I didn't track everything down (I got swamped with output 🙁).

I'm going to try the following approach tomorrow:

  1. For LinkCandidate objects, look at the filename in the link, and extract name and version from there. (For wheels this is well-defined by the standard, for sdists it's technically not standardised, but sufficient because the finder relies on it). If the filename isn't in a suitable format (a local directory, for instance) fall back on the current method.
  2. Record the name/version on the candidate.
  3. When we prepare, compare the metadata against the recorded name/version. Error out if there's a discrepancy.

That would mean that we only need to prepare for getting dependencies, or for local directories, VCS URLs, or local files and URLs that have filenames that don't follow the normal sdist format. That seems like it's an acceptable level, but we'll have to confirm that with testing.

I don't think we can do any better than that without modifying the resolvelib algorithm to allow for "unnamed" or "unversioned" candidates, and that would be a much more difficult route to take - not least because it adds a constraint on the resolver that other libraries are unlikely to support without modification.

@uranusjr
Copy link
Member

uranusjr commented Apr 3, 2020

Extracting the name and version sounds right to me—in fact IIRC this is exactly what the legacy resolver does. The link parsing is already done by the package finder, and information should be available in the InstallationCandidate. With the recent change (make_candidate_from_ican()), all you need to do should be to pass those info into LinkCandidate’s init.

Due to ambiguity in an sdist’s filename, the parsed name could end up being incorrect in edge cases (e.g. foo-2-1.tar.gz could be parsed as version 1 of foo-2, or version 2-1 of foo). So it would be most ideal if we could reject the candidate (and backtrack) after we find the name does not match. But that’s an obscure enough case, and I think we can choose to add a TODO and handle that in a far distant future, when someone actually reports it as a problem.

@pfmoore
Copy link
Member Author

pfmoore commented Apr 3, 2020

Also, we can avoid that issue by using the fact that we know the project name in most cases (the finder has it from the simple index page it's scraping, and the new resolver has it from the requirement that find_matches was called on). That allows us to disambiguate the parsing,

+1 on keeping the idea of having a mechanism to trigger a backtrack, and for deferring it until later when we have a real example of where we need it.

pfmoore added a commit to pfmoore/pip that referenced this issue Apr 3, 2020
pfmoore added a commit to pfmoore/pip that referenced this issue Apr 3, 2020
pfmoore added a commit to pfmoore/pip that referenced this issue Apr 3, 2020
@uranusjr uranusjr added this to To do in New Resolver Implementation via automation Apr 3, 2020
@uranusjr uranusjr moved this from To do to Done in New Resolver Implementation Apr 3, 2020
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label May 5, 2020
@lock lock bot locked as resolved and limited conversation to collaborators May 5, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation C: dependency resolution About choosing which dependencies to install
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

3 participants