
Add --no-backtracking option for new resolver #9258

Closed

Conversation

TheKevJames

@TheKevJames TheKevJames commented Dec 11, 2020

Adds a new option --no-backtracking to the new resolver. Off by
default (no change to default behaviour); enabling this option prevents
us from running the backtracking behaviour when pinning fails, and
instead skips directly to the error reporting.

This can be especially useful in a CI situation, where the developer is
interested in knowing about conflicts up front in order to manually
solve them, or is worried about allowing for a potentially large amount
of time spent in backtracking.
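The behaviour change described above can be sketched as a toy, self-contained Python snippet (this is an illustrative assumption of the shape of the change, not pip's or resolvelib's actual code; all names here are hypothetical):

```python
# Hypothetical sketch: when pinning fails, either backtrack to an earlier
# state or fail fast by raising immediately with the collected causes.

class ResolutionImpossible(Exception):
    """Mirrors the name of resolvelib's terminal error; toy implementation."""

    def __init__(self, causes):
        super().__init__(f"{len(causes)} conflicting requirement(s)")
        self.causes = causes


def handle_pin_failure(causes, state_stack, allow_backtracking=True):
    if causes and not allow_backtracking:
        # Fail fast: surface every collected conflict instead of retrying.
        raise ResolutionImpossible(causes)
    state_stack.pop()  # backtrack: discard the most recent pin and retry
    return state_stack


stack = ["pin-a", "pin-b"]
print(handle_pin_failure([], stack))  # ['pin-a']
try:
    handle_pin_failure(["conflict"], stack, allow_backtracking=False)
except ResolutionImpossible as exc:
    print(exc.causes)  # ['conflict']
```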

My goal here is mostly to help with the various issues such as #9254,
#9215, and #9187 -- being able to enable this flag is very useful for helping
in reducing backtracking (as per the recommendations in the guide).
Options 2, 3, and 4 of that guide mostly amount to "determine which
packages have conflicts and update your constraints/requirements files to
avoid those issues". By being able to "fail fast", we allow users who are
interested in using any of those options to take advantage of the awesome
ResolutionImpossible debug output without having to wait for what could
potentially be hours worth of attempts from the resolver (see linked issues).

Still TODO:
@uranusjr
Member

Personally I think “only try the best version available” is a valid strategy, but this particular implementation should not be accepted for two reasons.

First, the --no-backtracking flag hard-couples the user interface into the backtracking nature of the resolver, and would make it difficult for pip to transition into other resolution algorithms in the future. The thing users are looking for is for the resolver to only try one version of each package, and give up if that does not work, and “not backtracking” is only how this is done internally. The flag, therefore, should express that user intention, instead of exposing the internal implementation to the user. IMO the best way toward this would be to expand the meaning of --upgrade-strategy into a more general idea, and make it express “how pip should choose package versions” in general. This fits well with the long-term plan #8115, and also #8085 if we accept it. So I would like to see this added as --upgrade-strategy=require-best (strategy name is up for discussion), and the option is eventually renamed to --strategy.

Second, the patch is suboptimal. Again, the user intention is for the resolver to not try older versions, but this implementation directly introduces a branch in the core of the algorithm, without trying to explore ways to express the idea as a general case. I believe the idea of “try only one version” can be expressed and implemented by merely changing the input of the algorithm, without changing the algorithm itself.
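The "change the input, not the algorithm" idea can be sketched as a provider wrapper that trims the candidate list to one entry, so a failed pin has nothing to fall back on. The method name mirrors resolvelib's `AbstractProvider.find_matches` interface (whose exact signature varies by resolvelib version); everything else here is a self-contained toy:

```python
# Sketch: express "try only the best version" by shrinking the provider's
# candidate list instead of branching inside the resolution algorithm.

class SingleCandidateProvider:
    """Wraps any provider and returns only the newest candidate."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Delegate everything except find_matches to the wrapped provider.
        return getattr(self._wrapped, name)

    def find_matches(self, identifier, requirements, incompatibilities):
        matches = self._wrapped.find_matches(
            identifier, requirements, incompatibilities
        )
        # Keep only the first (best) candidate; with no alternatives left,
        # a failed pin surfaces as ResolutionImpossible immediately.
        return list(matches)[:1]


class ToyProvider:
    def find_matches(self, identifier, requirements, incompatibilities):
        return ["2.0.0", "1.1.0", "1.0.0"]  # candidates, newest first


provider = SingleCandidateProvider(ToyProvider())
print(provider.find_matches("pkg", [], []))  # ['2.0.0']
```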

@TheKevJames
Author

@uranusjr thank you for the insight -- appreciate the prompt response!

Apologies, I was not aware that pip had plans to swap in alternate resolvers in the future; it makes sense to make this flag more general to support that case. I was just under the impression that this state of having multiple resolvers was temporary, eg. that the plan was to deprecate the legacy resolver and move back to having only a single resolver in pip.

I'm not sure --upgrade-strategy=require-best is quite the goal I'm going for here... what do you think about something more along the lines of --exit-on-failure or --fail-fast? Perhaps --resolver-fail-fast or such, if we'd prefer being more narrow/precise? Thinking about the goal of this PR as it might pertain to future resolvers, it is really to give users the ability to say "please don't spend time attempting to fix issues; fail quickly with debugging output so I can do my own debugging". The new resolver just so happens to fail fast by disabling backtracking, but other resolvers may behave differently.

Similarly, I think the branch in the core of the algorithm was the best way to make this change in the upstream algorithm -- I'll add a more specific comment on the other PR for us to chat through there, but I was not attempting to tell the resolver to "only attempt the best option" (though I could see the idea of --upgrade-strategy=require-best indeed being valuable!). resolvelib's failure_causes error reporting is wonderful, and having the list of potential versions which could require backtracking to resolve is very useful for debugging.

As an example, the issue which originally brought me here was my company's CI systems taking >3 hours and eventually timing out when our Renovatebot attempted to build with the new pip version. With this --no-backtracking fix, I get the following output, which let me immediately solve the problem according to our business requirements. My first attempt at submitting this PR instead involved tweaking the existing max_rounds parameter, but that unfortunately did not produce nearly as useful output (and still did some backtracking, which means the debugging dev loop is much longer!).

ERROR: Cannot install -r requirements.txt (line 3), google-cloud-bigquery[bqstorage,pandas]==1.28.0 and INTERNAL-PACKAGE-ONE because these package versions have conflicting dependencies.

The conflict is caused by:
    INTERNAL-PACKAGE-ONE 4.0.18 depends on google-resumable-media==1.1.0
    google-cloud-bigquery 1.28.0 depends on google-resumable-media<2.0dev and >=0.6.0
    google-cloud-bigquery[bqstorage,pandas] 1.28.0 depends on google-resumable-media<2.0dev and >=0.6.0
    INTERNAL-PACKAGE-TWO 4.1.1 depends on google-resumable-media==1.0.0
    INTERNAL-PACKAGE-ONE 4.0.18 depends on google-resumable-media==1.1.0
    google-cloud-bigquery 1.28.0 depends on google-resumable-media<2.0dev and >=0.6.0
    google-cloud-bigquery[bqstorage,pandas] 1.28.0 depends on google-resumable-media<2.0dev and >=0.6.0
    INTERNAL-PACKAGE-TWO 4.0.7 depends on google-resumable-media==1.0.0
    INTERNAL-PACKAGE-ONE 4.0.18 depends on google-cloud-bigquery==1.28.0
    google-cloud-bigquery[bqstorage,pandas] 1.28.0 depends on google-cloud-bigquery 1.28.0 (from https://files.pythonhosted.org/packages/49/18/6aeca7e64efb37f4d4fceaed0cbb7cd861416588240b76c4796fe2e3393c/google_cloud_bigquery-1.28.0-py2.py3-none-any.whl#sha256=9266e989531f290d3b836dc7b308ac22b350c4d664af19325bd0102261231b71 (from https://pypi.org/simple/google-cloud-bigquery/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*))
    INTERNAL-PACKAGE-TWO 4.0.6 depends on google-cloud-bigquery==1.26.1
    INTERNAL-PACKAGE-ONE 4.0.18 depends on google-cloud-bigquery==1.28.0
    google-cloud-bigquery[bqstorage,pandas] 1.28.0 depends on google-cloud-bigquery 1.28.0 (from https://files.pythonhosted.org/packages/49/18/6aeca7e64efb37f4d4fceaed0cbb7cd861416588240b76c4796fe2e3393c/google_cloud_bigquery-1.28.0-py2.py3-none-any.whl#sha256=9266e989531f290d3b836dc7b308ac22b350c4d664af19325bd0102261231b71 (from https://pypi.org/simple/google-cloud-bigquery/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*))
    INTERNAL-PACKAGE-TWO 4.0.5 depends on google-cloud-bigquery==1.26.1
    INTERNAL-PACKAGE-ONE 4.0.18 depends on google-cloud-bigquery==1.28.0
    google-cloud-bigquery[bqstorage,pandas] 1.28.0 depends on google-cloud-bigquery 1.28.0 (from https://files.pythonhosted.org/packages/49/18/6aeca7e64efb37f4d4fceaed0cbb7cd861416588240b76c4796fe2e3393c/google_cloud_bigquery-1.28.0-py2.py3-none-any.whl#sha256=9266e989531f290d3b836dc7b308ac22b350c4d664af19325bd0102261231b71 (from https://pypi.org/simple/google-cloud-bigquery/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*))
    INTERNAL-PACKAGE-TWO 4.0.3 depends on google-cloud-bigquery==1.26.1
    INTERNAL-PACKAGE-ONE 4.0.18 depends on google-cloud-bigquery==1.28.0
    google-cloud-bigquery[bqstorage,pandas] 1.28.0 depends on google-cloud-bigquery 1.28.0 (from https://files.pythonhosted.org/packages/49/18/6aeca7e64efb37f4d4fceaed0cbb7cd861416588240b76c4796fe2e3393c/google_cloud_bigquery-1.28.0-py2.py3-none-any.whl#sha256=9266e989531f290d3b836dc7b308ac22b350c4d664af19325bd0102261231b71 (from https://pypi.org/simple/google-cloud-bigquery/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*))
    INTERNAL-PACKAGE-TWO 4.0.2 depends on google-cloud-bigquery==1.26.1
    INTERNAL-PACKAGE-ONE 4.0.18 depends on google-resumable-media==1.1.0
    google-cloud-bigquery 1.28.0 depends on google-resumable-media<2.0dev and >=0.6.0
    google-cloud-bigquery[bqstorage,pandas] 1.28.0 depends on google-resumable-media<2.0dev and >=0.6.0
    INTERNAL-PACKAGE-TWO 4.0.1 depends on google-resumable-media==0.6

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

@pradyunsg
Member

It’s not so much about having multiple resolvers as it is evolving/iterating on the resolution process and making optimisations to it.

@stonebig
Contributor

stonebig commented Dec 12, 2020

With a "--limit-backtracking 1" option it may be easier to quickly catch which new wheel, or wheel version, derails your CI process.

Some people pay for CI, so they may prefer an early failure; so does the planet.

The constraints.txt file is less practical.

@uranusjr
Member

I was thinking about the same (although to get what you expect, the number of backtracks should be 0 instead of 1?), but "number of backtracks" is even more tightly coupled to the resolver algorithm than "only allow the newest version".

@uranusjr
Member

Similarly, I think the branch in the core of the algorithm was the best way to make this change in the upstream algorithm -- I'll add a more specific comment on the other PR for us to chat through there, but I was not attempting to tell the resolver to "only attempt the best option" (though I could see the idea of --upgrade-strategy=require-best indeed being valuable!). resolvelib's failure_causes error reporting is wonderful, and having the list of potential versions which could require backtracking to resolve is very useful for debugging.

But failure_causes is not affected by find_matches. The causes are pairs of (requirement, parent_candidate); the former is passed to find_matches() to get a list of candidates to try. The content shouldn’t change as long as at least one candidate is returned by the provider. Can you provide an example where your patch would give a good error message, but returning only one candidate from find_matches() does not?
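The shape of the causes described above can be sketched in stand-alone Python. In resolvelib, `RequirementInformation` is a namedtuple pairing a requirement with the candidate that introduced it; the two example entries below are taken from the conflict report earlier in this thread:

```python
from collections import namedtuple

# Mirrors resolvelib's failure-cause shape: each cause pairs a requirement
# with the parent candidate that introduced it (None for user-supplied input).
RequirementInformation = namedtuple(
    "RequirementInformation", ["requirement", "parent"]
)

causes = [
    RequirementInformation(
        "google-resumable-media==1.1.0", "INTERNAL-PACKAGE-ONE 4.0.18"
    ),
    RequirementInformation(
        "google-resumable-media<2.0dev,>=0.6.0", "google-cloud-bigquery 1.28.0"
    ),
]

# This is the information the "The conflict is caused by:" report is built
# from; note that no candidate list appears here, only (requirement, parent).
for cause in causes:
    print(f"{cause.parent} depends on {cause.requirement}")
```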

@stonebig
Contributor

stonebig commented Dec 12, 2020

I took 1 in my example since it's mainly a new version of a wheel that causes pain; a backtracking limit of 1 lets things slip through and should handle the problem well until I look at the logs 1 or 2 days later.

Then, when awake or available, I can do a pass with 0 backtracking to make it fail, if the log is not obvious enough. It's just a suggestion; I didn't look at the resolver code, so it may be a bad or impractical idea.

@uranusjr
Member

I actually think limiting backtracking is a good idea, and would be very useful for debugging conflicts. But it is too internal to be exposed as a pip option. Maybe we can make it an (undocumented) environment variable instead, and “brand” it as a mechanism for debugging when you need it, and not a stable thing for people to depend on in the build chain etc.?

@stonebig
Contributor

stonebig commented Dec 12, 2020

In my case a non-documented or non-stable option would be perfect: time for the ecosystem to mature/adapt and for the resolver to do 3 cycles of refinement. And maybe this idea won't work out in practice.

@TheKevJames
Author

Maybe we can make it an (undocumented) environment variable instead, and “brand” it as a mechanism for debugging when you need it, and not a stable thing for people to depend on in the build chain etc.?

I could definitely get behind the idea of an option like this in the short term, as that would help get me and my team unblocked on the current round of "takes too long to resolve" errors. Would something along the lines of pip --use-feature resolver-fail-fast be what you're thinking of, or are you thinking more along the lines of a change to resolvelib directly which, say, reads from a secret $RESOLVELIB_DISABLE_BACKTRACKING env var?

That said, I think there could be some real value in determining how to do this in a more stable and documented way; I suspect that no matter how good this or any future resolvers get, there will always be cases where users would benefit from being able to force an early exit (eg. short dev loop, avoiding costly CI, porting over old libraries with lots of preexisting dependencies, maybe an occasional e2e test, ...) and some cases where they'll want the full resolver to run (eg. especially in dev mode, to help them add dependencies correctly!).

Maybe this is something worth making explicit in the Resolver contract between pip and any resolvers meant to work with it? Eg. something like a fail_fast: bool = False argument in the Resolver.resolve() interface which, if overridden, a resolver must use to provide debugging information and exit early? In the case of resolvelib, that would happen to line up with disabling/limiting/whatever the backtracking, but that's all implementation details hidden from pip.
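The proposed contract could be sketched roughly as follows. This is an assumption drawn from the suggestion above, not pip's actual resolver interface; `BaseResolver`, `ToyResolver`, and the "!" conflict marker are all invented for illustration:

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of a fail_fast flag in the resolver contract.
class BaseResolver(ABC):
    @abstractmethod
    def resolve(self, requirements, fail_fast=False):
        """Resolve requirements; if fail_fast, surface conflicts
        immediately instead of attempting recovery."""


class ToyResolver(BaseResolver):
    def resolve(self, requirements, fail_fast=False):
        # Toy convention: a trailing "!" marks a requirement as conflicting.
        conflicts = [r for r in requirements if r.endswith("!")]
        if conflicts and fail_fast:
            raise RuntimeError(f"unresolvable conflicts: {conflicts}")
        # Without fail_fast, the toy "recovers" (standing in for
        # backtracking or any other implementation-specific strategy).
        return [r.rstrip("!") for r in requirements]


resolver = ToyResolver()
print(resolver.resolve(["requests", "conflicting-pin!"]))
try:
    resolver.resolve(["conflicting-pin!"], fail_fast=True)
except RuntimeError as exc:
    print(exc)
```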

@uranusjr
Member

I could definitely get behind the idea of an option like this in the short term, as that would help get me and my team unblocked on the current round of "takes too long to resolve" errors. Would something along the lines of pip --use-feature resolver-fail-fast be what you're thinking of, or are you thinking more along the lines of a change to resolvelib directly which, say, reads from a secret $RESOLVELIB_DISABLE_BACKTRACKING env var?

Something in between. The environment variable can be defined in pip, and resolvelib can take it as an int argument max_backtracks, like max_rounds. Eventually, the switch can be made into a pip option, once (if) we figure out how to make it generic without hard-coupling the implementation.
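The in-between approach might look like the following sketch: pip reads an undocumented environment variable and forwards it to the resolver alongside the existing max_rounds argument. The variable name `PIP_RESOLVER_MAX_BACKTRACKS` and the `max_backtracks` keyword are hypothetical (the suggestion above, not a shipped resolvelib API), and the max_rounds value is illustrative:

```python
import os

# Hypothetical: collect keyword arguments for the resolvelib resolve() call.
def resolver_options():
    opts = {"max_rounds": 2_000_000}  # a deliberately generous round limit
    raw = os.environ.get("PIP_RESOLVER_MAX_BACKTRACKS")  # hypothetical knob
    if raw is not None:
        # Undocumented escape hatch for debugging; 0 disables backtracking.
        opts["max_backtracks"] = int(raw)
    return opts


os.environ["PIP_RESOLVER_MAX_BACKTRACKS"] = "0"
print(resolver_options())  # {'max_rounds': 2000000, 'max_backtracks': 0}
```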

@BrownTruck
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Feb 4, 2021
@oberstet

fwiw, what is the latest pip version without this backtracking stuff? I need to get rid of this major PITA now ..

@pradyunsg
Member

pradyunsg commented Feb 26, 2021

@pradyunsg
Member

Closing since this hasn't been updated in a while and has significant merge conflicts. Feel free to open an updated PR! :)

Thanks @TheKevJames for the PR!

@pradyunsg pradyunsg closed this Feb 26, 2021
@oberstet

@pradyunsg sorry for the tone, I am just "upset" because of the huge pain this incurred on my side. in my world, there are zero upsides (we've been fine with pip), but a huge amount of work resulting from the fallout of this. and I am still on it. anyways, I am pinning our OSS projects to pip v19.3.1 now. current pip is going into some endless loop ..

@pradyunsg
Member

Thanks, I understand that and I do empathise.

Please take a look at #9187 (comment), and if possible, provide us with a reproducer for the issue there so that we can investigate why the resolver is doing what it's doing.

@oberstet

oberstet commented Feb 27, 2021

I think this is what happens

fwiw, I don't have time currently to minimize and extract that into a simple reproducing case. The complete pinned deps, however, are here (can't use freeze, since it now uses our upstream forks in deps): https://gist.github.com/oberstet/8c1e9b992f4c6fbbf5b96c649322a6c5

  • we have an app with 130 direct and indirect dependencies
  • different direct dependencies in turn depend on different versions of the same libraries (with different, "incompatible", I guess non-overlapping, permissible ranges)
  • due to this, our dependency DAG (directed acyclic graph) probably cannot be solved formally
  • I call this situation: confluent incompatible dependencies in DAG with version range limits
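The "formally unsolvable" situation in the list above can be illustrated with a stand-alone toy (package names and ranges invented; versions modeled as tuples so only the stdlib is needed): two consumers pin non-overlapping ranges of the same library, so the set of versions satisfying both is empty, no matter how long a resolver searches.

```python
# Toy illustration of "confluent incompatible dependencies": two consumers
# constrain the same library to non-overlapping half-open ranges.

def allowed(spec, versions):
    """Versions v with lo <= v < hi, for spec = (lo, hi)."""
    lo, hi = spec
    return {v for v in versions if lo <= v < hi}


available = {(1, 0), (1, 1), (2, 0)}       # lib versions on the index
consumer_a = ((1, 0), (1, 1))              # needs lib >=1.0,<1.1
consumer_b = ((1, 1), (2, 0))              # needs lib >=1.1,<2.0

solution = allowed(consumer_a, available) & allowed(consumer_b, available)
print(solution)  # set() -- no version satisfies both constraints
```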

now, with the old pip, we just get some warnings; nevertheless, it works. even though the version range limits in the libs down in the DAG are too limiting, things still technically work. or, that is: the version that happens to be installed works nevertheless.

thus: historically, we've been fighting installation order for those 130 deps, and introduced some manually selected lib version down the DAG in the upper app requirements. to pin the "right" version. this was also a pain.

now, the pip resolver can't find a formal solution .. I guess ... because there is none .. and will just work and work, consume CPU, and never actually continue installing deps.

all of that is a big mess.

in my desperation, I now even forked upstream packages, to relax the version limits there.

and finally: I want to simply get rid of a bunch of our deps .. it is too hard to manage.

also, I came to the conclusion that version range limits are particularly bad in community libraries that are widely used upstream, and hence have a high probability of running into the confluent DAG issue.

which is one reason why our own libraries have "open ended" limits .. we only specify a lower version bound in the deps of our libraries.

@stefansjs

IMHO backtracking should be opt-in, not opt-out. I love the idea that pip won't let me install packages that conflict with existing packages, but I cannot use it right now because it can't even resolve that; I'm left with opting out. IMHO, pip should validate that a package can be installed, but not provide a default implementation that undermines installing a single package (which is my situation).

Re flag naming: I overall agree with the sentiment of not coupling user-exposed flags to internal implementation details. I would say that my experience with this feature is that I do need to understand the internal implementation to understand why a resolution strategy is taking SO MUCH LONGER, and to look for a workaround (without much information to go on, #9215). It's actually not the algorithm that takes the extra time, but the data dependencies.

A few suggestions for how to name a flag.

  1. just go for --resolver=strict (by default) vs --resolver=backtracking vs --resolver=warn (here strict is kind of a fail-immediately strategy)
  2. --use-feature=dependency-backtracking (or --use-feature=no-dependency-backtracking)

@uranusjr
Member

uranusjr commented Jun 1, 2021

IMHO backtracking should be opt-in not opt-out.

Could you provide some backing evidence for this proposal? From my understanding, no mainstream package managers do this.

@gordonwatts

I've just encountered this. The error and the cyclical dependency were my fault -- but the hang meant I spent a chunk of time going down the wrong avenues to figure out what went wrong (I'm not a newbie, but I'm far from experienced when it comes to the Python ecosystem).

Running with the latest stable pip (21.1.2, per the pip list output below), on Windows, under Python 3.9:

(.venv) PS I:\gwatts\Code\irishep\pyhep-2021-SX-OpenDataDemo> pip list
Package    Version
---------- -------
pip        21.1.2
setuptools 56.0.0
(.venv) PS I:\gwatts\Code\irishep\pyhep-2021-SX-OpenDataDemo>

My requirements.txt file is:

(.venv) PS I:\gwatts\Code\irishep\pyhep-2021-SX-OpenDataDemo> type .\requirements.txt
# Standard Stuff
jupyterlab

# ServiceX and coffea, etc., related stuff
servicex_clients

# Special things which are in beta - before release these should be removed!
func-adl-servicex==1.1b1

The issue is that servicex_clients pins a different version of func-adl-servicex. These are libraries I control, so it's totally my fault (they are all on PyPI, so the above should work in any environment). But when presented with this, pip just ran and ran and ran and ran. It would be awesome if it failed quickly, so I could quickly figure out what went wrong.

Great tool, btw, and I've done lots of work with DAG's, so I know how painful they can be sometimes!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 29, 2021