Consider using BFS instead of DFS for backtracking #10884
Have you tried implementing BFS? Do you have some results of how well it behaves in general? I have a list of problematic requirements here you might want to try it on: #10481 (comment). In my investigation of trying to improve backtracking I initially also thought BFS might be a good solution, but in my PoC code it ended up being slower in most cases (though that could have just been a buggy implementation). I personally think that backjumping might be the way to improve performance, or at least taking some part of it, as I write here. I have a very hacky backjumping implementation here; I tested your requirements and they install very quickly compared to current pip (in fact I can't even install your requirements on pip 22.0.3, because when hacking backtracks to such an old version it errors out generating metadata).
Backjumping seems interesting from the performance point of view. But my main concern is not actually speed; it's consistency and the ability to install more or less fresh versions of all the packages, i.e. not driving one of them to the ground. As for BFS, I didn't try to implement it as I'm not familiar with the code; it's more of a suggestion based on user experience. I understand that BFS may not be optimal if the answer is really at the bottom of the search tree, but I hope that performance stays within sane boundaries for normal cases. And if the performance is not horrible, I, as a user, am willing to sacrifice speed for stability and predictability. We're using pip to install requirements while running automated CI, and it's much more important for CI to be stable than a bit faster. I'm able to install flake8 and hacking with 22.0.3 just fine, even though it takes a few seconds (I only had to install wheel beforehand). And since you mentioned it, that is another problem of DFS and, likely, backjumping too: a good chance of a build failure or some other error while trying to build a very old version of a package. Current pip will abort backtracking at this point, leaving the user with no packages installed at all. BFS, I think, should be better protected from such failures, since it will not go that deep in most cases.
That definitely feels intuitive, however in real-world dependency trees I am not convinced this would be true, because the version depth of one package cannot be compared to the version depth of another: going, say, five versions back may span weeks for one package and years for another. I think the only real way to protect yourself against ancient versions of packages, for stability, is to put in lower bounds, either in your requirements file or constraints file. Pip can't know ahead of time whether the metadata will fail to build on your machine or not. I believe one intention, beyond the primary reason of catching missing system dependencies, of erroring out when a package fails to build metadata is to reveal backtracking-too-far issues and encourage you, and the ecosystem in general, to put in sensible lower bounds. But I am not a pip maintainer; I have just been working a little bit on improving pip's backtracking performance. You should know this project is volunteer driven, and ultimately changes are made by people committing PRs so they can be evaluated in practice.
You summed up the situation quite well, actually. One of the reasons DFS was initially chosen over BFS is that both can have terrible performance in some situations, but it is relatively easy to identify and work around cases where DFS has disastrous performance (by adding a lower bound when you see pip is going too deep for a package), while it is not obvious at all how you could improve the worst-case scenario for BFS (it is possible, but far less obvious without deeper investigation).
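To make the tradeoff concrete, here is a toy sketch (mine, for illustration; not pip's resolver) comparing the two visit orders on an invented two-package problem shaped like the flake8/hacking case: the newest pair conflicts, and a near-newest pair works. The version range and the `is_compatible` rule are both made up.

```python
from itertools import product

VERSIONS = list(range(10, 0, -1))  # 10 = newest, 1 = oldest

def is_compatible(first, second):
    # Invented rule: `second` requires `first` to be exactly one version
    # behind it, so keeping `first` pinned at 10 can never succeed.
    return first == second - 1

def count_tries(order):
    for tries, (first, second) in enumerate(order, start=1):
        if is_compatible(first, second):
            return tries, (first, second)

# DFS-like: pin the first package, drive the second to the ground, repeat.
dfs = product(VERSIONS, VERSIONS)

# BFS-like: visit pairs in order of total downgrade distance from newest.
bfs = sorted(product(VERSIONS, VERSIONS), key=lambda p: 20 - p[0] - p[1])

print("DFS:", count_tries(dfs))  # (11, (9, 10)) -- 11 combinations tried
print("BFS:", count_tries(bfs))  # (3, (9, 10))  -- 3 combinations tried
```

If, instead, the only working pair is very old, both orders degrade to sweeping nearly the whole space, which is part of why neither is universally better.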
Is the package version release date available in metadata? Maybe that could be part of the picture?
It is not. A list of information available at resolution time can be found here: https://packaging.python.org/en/latest/specifications/core-metadata/ Additionally we can obtain the URL a package is downloaded from (obviously) and the artifact hash.
In general I think it would be good to try possible different strategies outside pip. The problem is that so many people use pip that every possible edge case is touched almost immediately after a release is made, and there isn't currently a good way for pip to test any of these. I'm working on a personal project to implement the bare-bones functionality of pip (get, resolve, install) where each section has a clear API and can be easily replaced or extended. I don't know if it'll get to the state where I can share it, but I think someone needs to do something like this so more radical ideas can be tested without impacting millions of developers. For example, one approach is to use the PyPI JSON API, which does include the upload date. However, this API isn't standardized and would only be useful for PyPI and not other indexes, so it's not something that can be added to pip.
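For what it's worth, a minimal sketch of reading upload dates from that PyPI JSON API (`https://pypi.org/pypi/<project>/json`); the project and version queried at the end are just examples:

```python
import json
from urllib.request import urlopen

def upload_dates(project):
    # PyPI-specific endpoint, not a packaging standard -- which is exactly
    # why other indexes (and therefore pip itself) can't rely on it.
    with urlopen(f"https://pypi.org/pypi/{project}/json") as resp:
        data = json.load(resp)
    dates = {}
    for version, files in data["releases"].items():
        if files:  # some releases have had all of their files removed
            dates[version] = min(f["upload_time_iso_8601"] for f in files)
    return dates

print(upload_dates("hacking").get("4.1.0"))  # e.g. an ISO 8601 timestamp
```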
I think that idea could really pay off, not only in allowing easier experimentation but also in helping clarify where it makes sense to draw API boundaries (what things should be decoupled or combined). In the meanwhile, I'm left wondering if it makes sense to make a GitHub project, or at least a label, for resolver/backtracking related things? Tickets like this one would fit under that umbrella.
I think it's fairly unlikely that we'd make such a change in pip, especially given that we don't have the same amount of available time for change management and developer time as we did when we rolled out the resolver. As noted already, there are specific advantages to DFS (notably, that it's easier for an end user to control and to reason about). Switching to BFS does not solve all the issues; it just changes the set of tradeoffs involved, and it's not universally better either. Further, pip already does a first round of breadth-first-style sweep of the top-level requirements, to allow users to have the control that they wish to have on the resolution process -- so we do already have some of the top-level benefits. If folks wish to implement alternative dependency resolution algorithms and feed those into pip, that can be done today, by spewing out a requirements.txt file that's installed with `pip install -r`.
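As a sketch of that workflow (the `my_resolver` function below is a hypothetical stand-in for whatever alternative algorithm you want to try):

```python
import subprocess
import sys

def my_resolver(requirements):
    # Hypothetical: resolve with your own strategy (BFS, backjumping, ...)
    # and return exact pins, e.g. {"flake8": "4.0.1", "hacking": "4.1.0"}.
    raise NotImplementedError

def install(requirements):
    pins = my_resolver(requirements)
    with open("resolved-requirements.txt", "w") as f:
        for name, version in pins.items():
            f.write(f"{name}=={version}\n")
    # Let pip install the already-resolved set; --no-deps keeps it from
    # re-resolving on top of our pins.
    subprocess.check_call([
        sys.executable, "-m", "pip", "install",
        "--no-deps", "-r", "resolved-requirements.txt",
    ])
```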
I'm going to go ahead and close this, based on this rationale. Appreciate the discussion here folks! ^>^
Thanks for the discussion, and sorry for not participating much. I agree that it doesn't look possible to make a significant improvement by changing the algorithm, since any implementation will have its own corner cases. At least, until the actual release date is available during the backtracking, so some meaningful estimations can be made. Closing for now is OK with me.
FYI, I'm still of the opinion that some form of backjumping would significantly improve most current heavy-backtracking cases. My hacky implementation worked very well on your original example and other examples posted to pip's issues. The problem is that backjumping is difficult to implement and to understand how it will affect backtracking in general, so it is not as clear a win as #10481 (comment) was. That, combined with only a few heavy-backtracking cases being posted to pip's issues since 21.3, not personally running into heavy backtracking in my own projects, and good lower bounds on requirements fixing most issues, means I've not been motivated to develop or push for it. But if someone else is very motivated to improve the situation, I would humbly suggest they read my latest thoughts at sarugaku/resolvelib#64 (comment).
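For anyone curious, the core idea can be shown in a few lines (my simplified illustration, not the implementation from the linked work or from resolvelib):

```python
# `pins` is the stack of (package, version) decisions, oldest first;
# `conflict_causes` is the set of package names implicated in the
# failure that just occurred.
def backjump(pins, conflict_causes):
    for i in range(len(pins) - 1, -1, -1):
        if pins[i][0] in conflict_causes:
            # Jump straight back past every unrelated decision. Plain
            # backtracking would only undo pins[-1], even when the real
            # culprit was pinned much earlier.
            return pins[:i]
    return []  # no pinned package is involved: the conflict is inherent

pins = [("hacking", "4.1.0"), ("six", "1.16.0"), ("flake8", "3.8.3")]
print(backjump(pins, {"hacking"}))  # [] -- skips past six and flake8
```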
What's the problem this feature will solve?
Current pip will drive the version of one of the packages to the ground looking for a compatible one, then reduce the version of a second package a little and drive the first package to the ground again, i.e. perform a kind of depth-first search (DFS). This approach has a few issues; among them, the result depends on the order of packages in the `pip install` command.

An example of the behavior:
Here it downloads a huge pile of different versions of hacking, from 4.1 down to 0.5, taking a lot of time and also ending up with a broken combination. To be clear, I'm not blaming pip for the incompatibility of the result; this is probably an issue with the packaging of that very old version of `hacking`. However, time is wasted and the result is a broken flake8 command.

While:
Here we can see that pip started downgrading from the flake8 package and found a working combination almost immediately, with reasonably fresh versions of both packages.
Describe the solution you'd like
A solution might be to use breadth-first search (BFS) instead, i.e. not driving one package to the ground, but gradually lowering the versions of all of them, one at a time, until a good combination is found (see the sketch below).

So, for the `flake8` and `hacking` problem, a solution can be found quickly regardless of the order in which these packages appear on the command line. This search order should also provide better compatibility between installed packages, because we should end up with more or less fresh versions of all of them. And it should help avoid badly packaged, very old versions of some software.
Alternative Solutions
A workaround is to manually limit the versions of some packages, or shuffle them around in the command line until `pip` installs what you need. But that's no fun if you're not a developer of these packages and hardly know what the version requirements should be. And it gets even worse when you need a big number of packages installed and it's unclear which of them is actually causing the backtracking problem. A minimal requirements-file example of the version-limiting workaround follows.
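The `hacking>=3.0` bound below matches the fix described in the next section; the other contents are illustrative:

```
# requirements.txt: the lower bound keeps pip from backtracking into
# ancient, badly packaged hacking releases
flake8
hacking>=3.0
```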
Additional context

Along with the aborted-backtracking problem, we had to spend a significant amount of time -- while trying to fix CI for the openvswitch project -- trying to figure out why `flake8` checks were not performed during the build. We ended up with a `>=3.0` limit for the `hacking` package; otherwise pip installs `hacking-0.5.4` and `flake8` just throws an exception on startup. At this point our configuration script decides that `flake8` is broken and unavailable: openvswitch/ovs@d545300

With the version limit in place, pip installs a working combination.