
pip resolver backtracking deepness option #10235

Open
SMH17 opened this issue Jul 29, 2021 · 8 comments
Labels
state: awaiting PR Feature discussed, PR is needed

Comments

@SMH17

SMH17 commented Jul 29, 2021

The new pip backtracking resolver is not viable in many real scenarios and causes a huge waste of time: when you try to install a package, it tries to resolve the dependencies of every dependency the package requests, and this leads to continuous searches for the most appropriate versions, which in many situations cannot succeed because of incoherent requirements between the package you want to install and the requirements of some dependency.
A flag is needed to specify the maximum level to which the resolver's backtracking is allowed to go when trying to fix dependency conflicts.

@pfmoore
Member

pfmoore commented Jul 29, 2021

To clarify - are you OK with pip failing after it hits the limit you specify? More precisely:

  1. You'd have to specify something like "number of backtracks = N". No, we can't give you any help knowing what a good value for N would be.
  2. If the limit is reached, pip would fail with an error "backtrack limit reached". To make the install work, you need to fix whatever problem is causing the excessive backtracking. Or remove the limit.
  3. We can't give any useful diagnostics to help you work out why the resolver is taking so long. You get the pip output to that point, and that's all.

We've had similar suggestions in the past, but generally people aren't comfortable with one or more of the points noted above, so it's not actually as useful as it seems.

Think of it this way. If a "number of backtracks" limit is useful to you, you can get something similar by killing pip after a certain period of time. If you can explain why that's not suitable, you'll probably have an answer as to why a backtrack limit option isn't useful either...
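The "kill pip after a certain period of time" workaround can be scripted. A minimal sketch in Python (a sleeping child process stands in for the real `pip install` invocation so the snippet is self-contained):

```python
import subprocess
import sys

# In real use, args would be [sys.executable, "-m", "pip", "install", ...].
# A sleeping child stands in here so the demo is self-contained.
args = [sys.executable, "-c", "import time; time.sleep(60)"]

try:
    # subprocess.run kills the child and raises TimeoutExpired once the
    # wall-clock budget (in seconds) is exhausted.
    subprocess.run(args, timeout=2)
    print("finished within budget")
except subprocess.TimeoutExpired:
    print("time budget exceeded; resolution was cut short")
```

After two seconds this takes the `TimeoutExpired` branch, which is exactly the "stop after 20 minutes" behaviour described above, just with a shorter budget for the demo.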

@notatallshaw
Member

notatallshaw commented Jul 29, 2021

@pfmoore Let me describe what I think most people want, and OP can correct me if I'm wrong:

Let's say Package A and Package B are specified in the requirements. Package A has 1000 possible versions (v1 to v1000) and Package B has 10 possible versions (v1 to v10). A valid install is Package A v998 and Package B v8. But for whatever reason pip currently backtracks through Package A first, so it checks Package A v1 to v1000 before backtracking through Package B and eventually finding the valid solution. This feels to the user like a waste of time because they know pip doesn't need to check all 1000 versions of Package A.

The idea would then be that if I set a "maximum backtrack depth" of 5, the behavior changes so that pip only checks Package A from v1000 to v996 and only checks Package B from v10 to v6. If a valid solution exists in that range, pip installs it; if a valid solution doesn't exist, pip reports an error that it couldn't install given the requirements and the limits on backtracking.

Not sure how easy something like that would be to implement in the current pip architecture, but that's what I think most people are asking for.
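As a rough sketch (plain Python, not pip internals), this interpretation of "maximum backtrack depth" amounts to trimming each package's candidate list to its newest N versions before resolution even starts:

```python
# Sketch: a "maximum backtrack depth" of N read as "only ever consider
# the newest N versions of each package". Illustrative, not pip code.

def trim_candidates(all_versions, max_depth):
    """Keep only the newest max_depth versions, newest first."""
    return sorted(all_versions, reverse=True)[:max_depth]

package_a = range(1, 1001)  # v1 .. v1000
package_b = range(1, 11)    # v1 .. v10

print(trim_candidates(package_a, 5))  # [1000, 999, 998, 997, 996]
print(trim_candidates(package_b, 5))  # [10, 9, 8, 7, 6]
```

In the example above the valid pair (A v998, B v8) survives the trim, so resolution succeeds after at most 5 × 5 combinations instead of potentially thousands; if the only valid pair were outside the window, the resolver would fail fast instead.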

FYI conda's default package resolution kind of does something similar automatically, they have a "current_repodata.json" which contains all the metadata for only the latest version of packages (a backtracking depth of 1), and if that fails they then download "repodata.json" and backtrack on that (first using a restrictive search to limit the size of the possible dependency tree, and then if that fails using a more flexible search to search all possible satisfiable solutions to the requirements).

@SMH17
Author

SMH17 commented Jul 29, 2021

@pfmoore It should be added as an optional flag, so if I know that a package I'm going to install could be a subdependency mess,
rather than installing it with the normal behavior (which in many cases cannot finish in acceptable time, and in some scenarios causes more problems than it solves), I can specify the inspection level.
e.g.
Suppose I want to install a package P that has dependencies D1 and D2.
D1 in turn has requirements D11, D12, D13,
and D2 in turn has requirements D21, D22, D23.
If one of D1's requirements, say D12, conflicts with one of D2's requirements, say D23,
then specifying deepness=1 makes backtracking skip conflicts that happen at level 2 of the dependency tree: only the direct dependency conflicts of package P are considered, and conflicts between the requirements of dependencies of dependencies are ignored.
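As a toy illustration of that deepness idea (hypothetical helper, not pip code):

```python
# Hypothetical sketch of "deepness=1": collect only the requirements
# within N levels of the target package P, ignoring anything deeper.

DEPS = {
    "P":  ["D1", "D2"],
    "D1": ["D11", "D12", "D13"],
    "D2": ["D21", "D22", "D23"],
}

def requirements_within(package, max_depth, graph=DEPS):
    """Breadth-first walk of the dependency tree, cut off at max_depth."""
    seen = set()
    frontier = [package]
    for _ in range(max_depth):
        frontier = [dep for pkg in frontier for dep in graph.get(pkg, [])]
        seen.update(frontier)
    return seen

# With deepness=1 the D12/D23 conflict is invisible: only P's direct
# dependencies are in scope.
print(sorted(requirements_within("P", 1)))  # ['D1', 'D2']
```

With `max_depth=2` the walk would also surface D11..D13 and D21..D23, and the D12/D23 conflict would come back into scope.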

The current approach is not viable in many situations, and killing pip is not an alternative. The approach I'm talking about would give more control over the installation procedure, avoiding many of the problems that the new resolver causes with its endless attempts to fix incoherent requirements. In many cases those conflicts arise because dependencies of dependencies have requirements that don't even impact the correct functioning of the target package you want to install, since only very specific features of the dependencies are used, features not affected by the changes introduced in a particular version of the problematic subdependencies.

@pfmoore
Member

pfmoore commented Jul 29, 2021

@notatallshaw Thanks for the explanation. I think you're asking for something different, which is an "only consider the latest N versions of any package" flag. That might be useful, I don't think that's something we've tried. It's probably something you could prototype using a proxy index, reading the PyPI pages and removing all but the latest N versions before forwarding them on. That might be a useful experiment (and might be helpful for people in general, even before anything gets implemented in pip).

@SMH17 I'm still not sure I follow precisely. You may want to research a bit more into how resolvelib works, as your description doesn't quite match the actual process (we don't do a breadth-first search, precisely, so deepness isn't something we can control directly). But it sounds like you're interested in something like the existing try_to_avoid_resolution_too_deep parameter, defined here. It's not exposed to the user, but maybe you could experiment with setting it to different values to see whether exposing it (or something like it) would give the sort of control you want.
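For intuition about what a rounds cap does, here is a self-contained toy resolver in the spirit of resolvelib's `max_rounds` argument (which pip sets from the `try_to_avoid_resolution_too_deep` constant). Everything below is illustrative, not pip's or resolvelib's actual code:

```python
# Toy backtracking resolver with a round limit. The names, the A/B
# version numbers, and the compatibility rule are all made up for
# illustration; only the round-cap idea mirrors resolvelib.

class ResolutionTooDeep(Exception):
    pass

def resolve(candidates, is_valid, max_rounds):
    """Try version combinations newest-first; give up after max_rounds."""
    names = list(candidates)
    rounds = 0

    def backtrack(index, pinned):
        nonlocal rounds
        if index == len(names):
            return dict(pinned)
        name = names[index]
        for version in sorted(candidates[name], reverse=True):
            rounds += 1
            if rounds > max_rounds:
                raise ResolutionTooDeep(max_rounds)
            pinned[name] = version
            if is_valid(pinned):
                result = backtrack(index + 1, pinned)
                if result is not None:
                    return result
            del pinned[name]
        return None

    return backtrack(0, {})

def ok(pinned):
    """Only A v998 together with B v8 is compatible in this toy setup."""
    if "A" in pinned and "B" in pinned:
        return (pinned["A"], pinned["B"]) == (998, 8)
    return True

print(resolve({"A": range(1, 1001), "B": range(1, 11)}, ok, max_rounds=50))
# {'A': 998, 'B': 8}
```

With `max_rounds=50` the solution is found (26 rounds in this setup); with `max_rounds=10` the same call raises `ResolutionTooDeep` instead, which is the failure mode pfmoore describes in point 2 above.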

The current approach is not viable in many situations, and killing pip is not an alternative.

Can you be more precise? We know that extended install times aren't viable, but because dependency resolution is NP-hard and because of the way Python packages store their dependencies (a problem conda avoids, to an extent) we can only really work in terms of trade-offs - we can't, much as we would like to, solve this for every use case - there will always be some that will trigger worst-case behaviour.

Also, if you can clarify why killing pip isn't an alternative, I'd appreciate it. After all, any "stop after you've tried this many options" setting is just as much a way of killing pip before it completes as a "stop after 20 minutes" flag, or an external "run pip but kill after 20 minutes" script. I don't frankly see why any one is worse than another.

endless attempts to fix incoherent requirements

Well, yes. If the requirements are "incoherent", pip will have problems. So will any tool.

requirements that don't even impact the correct functioning of the target package you want to install, since only very specific features of the dependencies are used, features not affected by the changes introduced in a particular version of the problematic subdependencies.

We can't do anything about that. If package A says it needs package B, we have to install B. We can't say "well, it looks like you're not using the part of A that needs B, so we'll just ignore that requirement". If you want finer-grained requirements, talk to the authors of A and get them to add extras that let you say you don't need B. But pip has to accept what you (and the package authors) say you need.
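For illustration, the extras mechanism looks like this on the packaging side. A sketch assuming a hypothetical package A that declares B behind a `full` extra, using setuptools' setup.cfg syntax:

```ini
# Excerpt from the setup.cfg of a hypothetical package A: dependency B
# is only pulled in when the user requests the "full" extra.
[options.extras_require]
full =
    B
```

With this layout, `pip install A` skips B entirely, while `pip install "A[full]"` installs it. But that choice belongs to A's author, not to pip.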

@SMH17
Author

SMH17 commented Jul 29, 2021

@pfmoore

We can't do anything about that. If package A says it needs package B, we have to install B. We can't say "well, it looks like you're not using the part of A that needs B, so we'll just ignore that requirement". If you want finer-grained requirements, talk to the authors of A and get them to add extras that let you say you don't need B. But pip has to accept what you (and the package authors) say you need.

The point is that when checking requirements, the deeper you go in the dependency tree of the target package, the less likely it is that incoherent requirements affect the functions you actually need. So although a perfect solution to dependency incoherence is impossible, control over the depth of inspection can give a better compromise.

@pfmoore
Member

pfmoore commented Jul 29, 2021

control over the depth of inspection can give a better compromise

It sounds to me like you're somehow hoping that if we have a "deepness" flag, when we hit it, pip would still complete the install, rather than failing. I don't see how that would work.

There may be a viable strategy in there, but I can't see what it is, sorry. If someone wants to flesh this out into an actual PR, maybe it would be clearer, but otherwise I'm still confused as to what you're proposing.

@SMH17
Author

SMH17 commented Jul 29, 2021

@pfmoore It's not about hoping for something. Incoherent dependency requirements are a very common situation. Right now you can either let the resolver cycle indefinitely for hours, only to watch it fail to find a configuration that satisfies all constraints, or not let it try to resolve anything at all. Neither is a proper way to handle incoherent requirements.
Currently pip treats package dependencies without any regard for depth in the dependency tree, so the requirements of a package that is a subdependency of another subdependency are weighed exactly like the requirements of a direct dependency of the package you need to install.
You have to rethink the approach in a stratified way, so that pip generates a dependency tree and, at install time, you can control how deep the resolver may dig into subpackages' requirements when addressing conflicts, because a requirement conflict found in a very deep leaf is less likely to directly affect the functioning of the package you are installing than an incoherence found at the nearest levels of the dependency tree.

@pfmoore
Member

pfmoore commented Jul 29, 2021

You have to rethink the approach in a stratified way

Hmm, I don't think pip's going to do that (if I understand you). It sounds like a fundamental change in the assumptions and expectations on which pip is based. But I may be misunderstanding you. Feel free to provide code demonstrating what you mean (either as a pip PR or as a standalone utility/library) if you think it's an approach with a realistic chance of doing better than pip's current approach.

I'm going to add the "awaiting PR" label to this issue to indicate that I think we need working code to make progress here, not just abstract design discussions.
