
Abort immediately when there's a build failure, instead of backtracking #10655

Closed
pradyunsg opened this issue Nov 12, 2021 · 39 comments · Fixed by #10722
Labels
C: build logic (Stuff related to metadata generation / wheel generation) · C: dependency resolution (About choosing which dependencies to install) · type: enhancement (Improvements to functionality)
Comments

@pradyunsg
Member

If you try to install a package that does not have wheels for your platform and you don't have the relevant build dependencies for it, pip will backtrack until it finds a version that it can build OR it has exhausted all the available versions of that package.

This can be especially painful when you're missing the build dependencies (in case of https://discuss.python.org/t/pip-without-setuptools-could-the-experience-be-improved/11810, it is setuptools) but in other cases, it can be a compiler / C library etc.

Would it make sense to fail immediately, instead of backtracking in these cases?

@pradyunsg pradyunsg added the type: enhancement, C: dependency resolution, state: needs discussion, and C: build logic labels Nov 12, 2021
@pradyunsg
Member Author

This can be especially painful

To elaborate on this: here's a thought experiment -- a Cython-based project my-cython-wrapper that doesn't have wheels for your platform, which fails when trying to link the compiled package because you don't have the required C library. Each failure involves a download, building the isolated environment, metadata generation, doing a Cython compilation, compiling a bunch of C/C++ code and then... it fails. And then pip will go to an older version and try again, going all the way until it exhausts the available versions of my-cython-wrapper.

@DiddiLeija
Member

This could be good default behavior. I have experienced this issue in the past -- and yes, it can be really painful. I want to suggest some ideas:

  • A non-default option to "backtrack instead of crashing", if the user prefers to backtrack and re-build everything. Maybe something like --backtrack-build-failure?
  • Explanatory documentation about this behavior. When the build aborts, we could add a link that redirects to that page, so users won't get confused and open unnecessary issues on GitHub...

@pradyunsg
Member Author

I mean, I think we'd need to have this go through a pass-a-flag-to-opt-out period.

As for the documentation, I think we should be able to get away with a clear-enough error message. With the progress I've made on #10421 so far, I'm pretty confident that we should be able to get most of the way there with just the error messages.

@pfmoore
Member

pfmoore commented Nov 12, 2021

Would it make sense to fail immediately, instead of backtracking in these cases?

But doesn't that mean that if package X has wheels for your platform at version 1.0, and they release 2.0, initially without wheels because building on your platform is a PITA (ahem Windows), then rather than getting the 1.0 wheel you'd start getting failures? That seems like a rather bad regression...

@pradyunsg
Member Author

Yea... the "nice" thing about that though, is that it is an immediate failure where you have a clear actionable step, rather than a long-winded one, or something that subtly hides issues (eg: newest version of $package added a C dependency that you don't have, and you're quietly getting the older version even though you didn't intend to).

It's a much more explicit failure model (and matches what we had prior to the backtracking resolver, in a good way).

@pfmoore
Member

pfmoore commented Nov 12, 2021

I'm still confused. At the moment, using the X example I gave above, if I say pip install X, I get X. Maybe not the latest version, but I get it. You seem to be suggesting that it's better for me for pip install X to suddenly start failing, and... what? I have to say pip install X==1.0 and keep an eye on when X releases a 2.0 wheel so that I know when to remove my pin?

Maybe that works if you're talking about a dependency somewhere deep in a project's requirements, but it seems to me that it sucks for requirements specified directly on the command line.

And yes, IMO it was bad when this happened with the old resolver, too.

@pradyunsg
Member Author

pradyunsg commented Nov 12, 2021

You seem to be suggesting that it's better for me for pip install X to suddenly start failing,

No. I don't think it's "better". I agree that it's still suboptimal. I also don't think we have any "globally optimal" approach here.

To try and rephrase what I said earlier -- we're operating in tradeoffs here, and I think having immediate failures in all of these cases (both the example you've picked as well as the ones I've provided) is better than having some of them work with backtracking while also having long-winded, "resource"-intensive failures in other cases.

@pfmoore
Member

pfmoore commented Nov 12, 2021

OK, I get your point now. However, I don't think either solution is ideal, and given that's the case, I would prefer to stick with what we have for now, until we work out a genuinely better solution. I do think there's a better solution waiting for someone to discover it, so I'm optimistic that "explore options" is not going to be wasted time.

If you try to install a package that does not have wheels for your platform

Maybe we should make --prefer-binary the default, then? That way, if there's no need to build we won't build. Users won't get newer versions unless they opt into building sdists, or the project gets round to releasing wheels, but that seems like a reasonable compromise. The assumption is that projects don't release wheels for one version, and then stop - but that seems like a reasonable assumption to make, TBH. Obviously this would need a transition period - and it would effectively signal that pip considers installing from wheels to be the "normal approach". That's something that I, personally, am in favour of, but it might not match everyone's preference.

Or maybe we only ever do a single sdist build, and if it fails we switch to --only-binary mode, on the assumption that it's a missing build toolchain, rather than a single broken sdist?

As you see, there are lots of plausible (at least plausible to me 🙂) approaches to explore, so if this issue is "Explore alternatives to repeatedly trying to build older and older sdists" then I'm on board, but if it's specifically "Is abort immediately the approach we should use" then my answer is "no, I'd prefer something else".

One fundamental problem here is that we have no way of knowing how likely it is that building a sdist will work. So the best we can do is work on assumptions and heuristics:

  • "Wheels are the main mode of delivering packages" leads to "--prefer-binary as default".
  • "If it fails once, it's likely that it will fail every time" leads to "only do one build".
  • "The users have said what they want and we should do whatever it takes to deliver it" leads to the current behaviour¹.
  • I'm not 100% sure what you'd suggest leads to "abort on build failure" - maybe "users want the latest version or nothing"? But I fear that my biases are showing here, maybe you could come up with a better phrase that explains your approach?

Whatever we do here, we need to be very careful how we report what pip is doing to the user. "No suitable versions found" is a bad message when the user can see there's a valid wheel for version N-1. Or when there's two sdists and the user knows the older one built, but there's a compiler-dependent bug in the new one. We also need to tell the user how to get pip to skip a sdist that won't compile, if we do go with "stop on a build failure" - and I'm not at all clear how we'd do that ("X != 2.0 unless there's a wheel" isn't a valid requirement 🙁).

¹ Of course the underlying problem here is that when people say they want "foo", they probably don't actually mean that - and a 10-year old version of foo wasn't what they were after. Which is just another way of saying "if you lie to the computer, it will get you"... But punishing people for assuming that the computer will behave reasonably isn't a good answer, either.

@pfmoore
Member

pfmoore commented Nov 12, 2021

Also, do we have to have a "one size fits all" solution here? Maybe keep searching (or build once, or prefer binary) if it's a top-level requirement, but abort if it's a dependency somewhere down the tree? That has its own problems but might be a viable compromise.

@pradyunsg
Member Author

One of the problems with prefer-binary is that we'd prefer to get an older version even if a newer version would build and work successfully. :)

@pradyunsg
Member Author

However, I don't think either solution is ideal, and given that's the case, I would prefer to stick with what we have for now, until we work out a genuinely better solution.

I promise I've read the entire post, and... the thing I wanna push back against is the first sentence.

Why is it not better to have eager failures, at a point where we can provide clear information about what the failure is?

  • "A non-Python build dependency is missing", then they have a clear actionable information immeditely -- install it and try again!
  • "the latest version of X doesn't support my system", then they have a clear actionable information immediately -- look up which is supported, and get a version that is supported.

I understand that this means that pip install X won't get you X if the latest version doesn't support your system, and that you'd need to be more specific -- but surely that is not so much worse an experience that it justifies requiring other users (or the same user later!) to wait on building all the sdists of a package. Note that we don't just have interactive users, we also have automated users where there's no easy way to interrupt an execution -- and especially in those cases, an eager failure would be much better than a non-eager one.

We're operating in trade-offs, and I think we do have a better-than-status-quo answer here -- it isn't perfect, but it's certainly better IMO.

@pfmoore
Member

pfmoore commented Nov 12, 2021

Why is it not better to have eager failures, at a point where we can provide clear information about what the failure is?

Why is it not better to have a success rather than a failure? I'm not arguing that the status quo is good. Just that we should look for a better solution than just failing at the first problem.

I get that prefer-binary may be too big of a change. But what's wrong with trying to find a binary that works after a build failure, and only erroring if that doesn't work? "Cannot build foo X.Y, trying to find older wheels that work... none found" -- now they have clear, actionable information and an assurance that they need to make the build work if they want to install this package.

I think we're going to have to disagree here, let's see what others have to say.

@edmorley
Contributor

edmorley commented Nov 12, 2021

I don't think it should be the responsibility of pip to try and protect the user from possible failure, when the user has chosen not to pin to a specific version or range. By running pip install foo the user has chosen to install the latest version (supported by their Python version), and in the spirit of explicit is better than implicit, failing early seems preferable if that version cannot be built.

As an end user, I would much rather get a failure (if I've made the trade-off not to pin/specify a version range), so I know to either (1) start pinning, (2) install a new build dep, (3) report a potential bug to the project.

@notatallshaw
Member

Isn't this the same discussion that's here #9764 ?

Wasn't this the behavior when the "new" resolver was implemented and it was changed because of overwhelming user complaints?

@DiddiLeija
Member

DiddiLeija commented Nov 13, 2021

I think you're right. I think we could restore this behavior, and add a permanent flag to use backtracking instead of failing. Or we could just implement a well-explained error message, as @pradyunsg proposed. Either way, the user complaints should be addressed.

@pradyunsg
Member Author

pradyunsg commented Nov 13, 2021

#9264 seems to be where we introduced this behaviour. I find it amusing that this was a concern I raised then as well, and that both Paul and I are consistent humans. :)

As I said, we're operating in tradeoffs, and what @pfmoore thinks is important doesn't match what I think is important, which is why we disagree! One way we can go about this is gathering user data -- I don't think we've had concrete user feedback on whether the status quo is the "better" behaviour, whether what I'm pushing for here would be "better", or whether there's something else we could do entirely!

I think a round of "ask a representative set of users", with a survey question that we're happy with, can get us closer to answering this! If this sounds reasonable, I can go about drafting a survey.


FWIW, I've not looked carefully, but "lemme look for compatible wheels" seems... complicated to implement, even if it's viable. For that to work, at the point where we're failing, we've already got a pinned link/editable candidate and would somehow need to get direct access to the candidates and format control.

@pfmoore
Member

pfmoore commented Nov 13, 2021

lol I'd forgotten that earlier discussion. Agreed we really need more concrete information.

I know that we've had complaints about pip backtracking through all versions of a package, so that lends support to @pradyunsg's argument that doing so is bad. So I'm willing to accept that blindly trying all sdists is not the right approach, even if it is technically what the user asked for.

There's also #10617 to consider here. @pradyunsg's proposal would have done the right thing for py2exe users on Python 3.10 in that case.

I'm starting to come around to thinking that this idea might be OK. But I'm still very bothered about explaining it:

  1. In the documentation, where I don't believe that currently we even mention that pip needs to build a sdist to extract the dependencies, so it's difficult to even express the problem as things stand ("why is pip even building if it's not the right package version?")
  2. In the error message, which bluntly is all that most people will read, no matter what we do. There's a lot to explain here - why we didn't bother looking any further (when maybe the user can see a perfectly good wheel for the previous version), how to tell pip to use the older version (if it's a dependency, the user might not be able to change the requirement so might need to use a constraint file), how to make building the sdist work (what if it's not a missing compiler? We can helpfully suggest things, but we don't really know and users will come to us saying "I installed a C compiler, and it's still failing").

Even though we say in the docs "the pip team cannot provide support for individual dependency conflict errors", we still get lots of bug reports asking for exactly that. And the support burden of reviewing (and maybe answering, in spite of what we say in the docs) those issues is non-trivial. The issue "pip tried to build hundreds of versions of my package" might be unwelcome for the user, but it's far easier to diagnose and respond to than "pip tried to build the wrong version of my package" (I bet that sometimes, the user won't even mention that the build failed). Certainly, choosing the worse behaviour because it's easier to support is a bad path to take, but is a better behaviour which results in maintainers deciding to take a hard line on "it's not a pip bug, resolve your own dependency conflict error", really better in the long run?

FWIW, I've not looked carefully but "lemme look for compatible wheels" seems... complicated to implement; even if it's viable.

Looking at #9264 and the current source, Factory retains a list of build failures, and _make_candidate_from_link does

if link in self._build_failures:
    # We already tried this candidate before, and it does not build.
    # Don't bother trying again.
    return None

It shouldn't be impossible to extend that to record the name of the package and also skip any other sdists for that package without trying the build. But given what I've said above, I'm no longer going to fight for a more nuanced solution than "abort immediately on build failure". I just wanted to point out that it's possible to do this.
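
To make that concrete, here's a minimal standalone sketch (not pip's actual code -- the class and method names here are hypothetical) of a cache that, once one sdist of a project fails to build, skips every remaining sdist of that project while still allowing its wheels through:

class BuildFailureCache:
    """Hypothetical sketch of extending pip's build-failure tracking."""

    def __init__(self):
        self._failed_links = set()     # exact sdists that failed to build
        self._failed_projects = set()  # projects with any sdist build failure

    def record_failure(self, link, project_name):
        self._failed_links.add(link)
        self._failed_projects.add(project_name.lower())

    def should_skip(self, link, project_name, is_wheel):
        # Skip this exact candidate if it already failed, or any sdist of a
        # project where some other sdist already failed to build. Wheels of
        # older versions are still allowed through.
        if link in self._failed_links:
            return True
        return (not is_wheel) and project_name.lower() in self._failed_projects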

Ultimately, though, this all comes down to the fact that we don't have reliable static metadata for sdists. Getting PEP 643 ("Metadata for Package Source Distributions") implemented, at least in setuptools, would be a huge step forward here. Although it would do nothing for older sdists, we could categorise them as "legacy" at which point a degraded experience is much more justifiable.

@pradyunsg
Member Author

pradyunsg commented Nov 13, 2021

FWIW, I don't know if you've noticed: one of the things I'm doing in #10421 is adding notes like "this is an issue with the package, and not pip" (eg: #10421 (comment), https://github.com/pypa/pip/pull/10538/files#diff-a0b86a9499746602572aca9eef86205c4bd5acedf66f936ad60f4cf14f1f2d38R125). This is borrowed from/inspired by npm, and meant to direct users to investigate why installing the package is failing, rather than assuming that pip itself is broken.

I think that, combined with clarity about what point in the build process the failure happened, should help make it easier for users to understand failures. This doesn't mean that we'd somehow solve bad errors coming out of other tools (or even all instances of that in pip), but I do think clearer errors coming out of package build failures would go a long way here.

@uranusjr
Member

uranusjr commented Dec 9, 2021

Linking in #10719 for a slightly different situation, where the build “succeeds” but produces a bad result.

@hroncok
Contributor

hroncok commented Dec 10, 2021

To elaborate on this: here's a thought experiment -- a Cython-based project my-cython-wrapper that doesn't have wheels for your platform, which fails when trying to link the compiled package because you don't have the required C library. Each failure involves a download, building the isolated environment, metadata generation, doing a Cython compilation, compiling a bunch of C/C++ code and then... it fails. And then pip will go to an older version and try again, going all the way until it exhausts the available versions of my-cython-wrapper.

This can also easily fill all the remaining disk space. When I attempted pip install scipy on Python 3.10 before there were wheels, it only ever ended when my disk was full.

@pradyunsg
Member Author

So... It looks like the code change for this will be easier than testing this would be. :)

@pradyunsg pradyunsg changed the title from "[idea] Abort immediately when there's a build failure, instead of backtracking" to "Abort immediately when there's a build failure, instead of backtracking" Dec 10, 2021
@pradyunsg pradyunsg removed the state: needs discussion label Dec 12, 2021
@notatallshaw
Member

notatallshaw commented Dec 12, 2021

Seems I spammed #10722 and should have put my input here.

Beyond what has already been said by @pfmoore, I would just like to add that in my experience users often don't understand constraints files. So if building the metadata fails for a specific package, then as well as telling the user that the package (and not pip) failed to build on version N and that they should check they have the right prerequisites for that package, pip should probably also tell them that if they want to skip downloading version N, they will need to add an entry to the constraints file (not the requirements file [in fact it's still confusing to me why top level requirements don't act as constraints]).

In a similar fashion, my experience is that most package owners don't know about yanking and the effect it has on pip (probably because pip ignored yanked status until 20.3.1), so it would also be helpful to mention that the package could be yanked by the owner if it's problematic.
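
For illustration, the constraints-file workaround being discussed looks something like this (the package name and version are hypothetical):

$ cat constraints.txt
somepackage!=2.0.0    # skip the version whose sdist fails to build here
$ pip install -c constraints.txt somepackage

The exclusion lives in the constraints file, so it limits which versions may be chosen without itself requesting that somepackage be installed.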

@pradyunsg
Member Author

pip should probably also tell them that if they want to skip downloading version N, they will need to add an entry to the constraints file (not the requirements file).

Fair, but this will not be in the error message. This might make sense in the supporting documentation tho.

it would be also helpful to mention the package could be yanked by the owner if it's problematic.

Agreed, though again, this belongs in the documentation and we shouldn't be printing a wall of text.

[in fact it's still confusing to me why top level requirements don't act as constraints]

What does this mean? What's confusing about them?

$ pip install requests!=2.26.0  # this works, 2.26.0 is the latest version as of writing
$ pip install requests requests!=2.26.0  # this also works

You can absolutely prevent a version from being used by using a requirements file alone; without a constraints file.

@notatallshaw
Member

notatallshaw commented Dec 12, 2021

[in fact it's still confusing to me why top level requirements don't act as constraints]

What does this mean? What's confusing about them?

$ pip install requests!=2.26.0  # this works, 2.26.0 is the latest version as of writing
$ pip install requests requests!=2.26.0  # this also works

You can absolutely prevent a version from being used by using a requirements file alone; without a constraints file.

There are scenarios where adding requests==2.26.0 to the top level user requirements will still download (but not install) other versions of requests. I have seen this many times when testing user requirements for backtracking optimizations. I'll take a look and try and find a specific example if you want.

Which makes sense, though? There must be some reason to have a separate "constraints" concept rather than taking it from the user's top-level requirements? But as a user who has read through the pip docs and contributed code, I still find it confusing.

@jbylund
Contributor

jbylund commented Dec 21, 2021

Say I have a package that requires build dependencies a and b for version 1.2.3, but version 1.2.2 only required a. When installing on a system missing b, backtracking would actually yield a working solution very quickly. It seems like this is suggesting it would be better to fast-fail than to install the older version?

Agree with the previous comment that maybe the focus should be more on how to avoid very expensive exploration than on heuristics for early termination.

@pfmoore
Member

pfmoore commented Dec 21, 2021

It seems like this is suggesting it would be better to fast-fail than install the older version?

Correct, that's the proposal here. And under the proposed solution, you'd be expected to explicitly say pip install package==1.2.2 if you wanted to install that version rather than explicitly install b. Even if you didn't know which version introduced the requirement for b, you'd need to find that out and specify based on that.

But to put this in context, the expectation is that a situation like this would be very rare. If you have a real-world example suggesting that it might be commoner than people are assuming, that would be useful. But I don't know whether a theoretical example will change many minds here 🙁

@pradyunsg
Member Author

pradyunsg commented Dec 21, 2021

Another way to phrase that situation is: you have a constraint which affects what packages can be used but haven't specified it.

Instead of having the dependency resolver stumble upon an answer by chance / find the solution in a suboptimal manner, the proposed behaviour here will force you to specify the additional build constraints, external to the metadata, yourself. This makes resolution quicker, makes installations easier to reason about, and stops masking build-time issues.

If you have a real-world example suggesting that it might be commoner than people are assuming, that would be useful.

I have one example, though it definitely doesn't suggest that it would be commoner: try installing something like numpy on a CentOS 6 VM (manylinux2010?). The newer releases no longer work (they require a newer compiler than the CentOS image has), but an older release will work -- either because it can be compiled with the available compiler or because a compatible wheel was uploaded.

Now, I think it's better to get a failure in 20 seconds stating the compiler error, instead of a working install after 20 minutes of walls of red -- especially since, from pip's perspective, this situation is basically indistinguishable from "you don't even have a compiler installed" or "this package isn't supported on Windows" -- and in both those cases, we get no benefit from backtracking.

@pradyunsg
Member Author

Picking your example: imagine that the package is a security-related project where you do want to stick to the latest version. Of course, this whole scenario requires source builds, so... let's assume that is required here -- either because of no-binary, or because the package author thinks that bundling the underlying system dependencies into a wheel (effectively pinning their versions) isn't a good idea.¹

In this situation, the current behaviour silently gives you an older version of the package. You have no way to affect or detect that. There's no easy way to check that this is happening, other than pip's output, which has no formatting guarantees. There are workarounds, which move you from reactive to proactive (which, idk, may be good or bad): compare the version post-install with a separate validation step, or proactively monitor releases and pin.

In this case, I think it's better to know that your configuration is no longer supported by the latest release (as will probably be evident from the failure output from setup.py egg_info) than it is to get a working configuration because we backtracked on a build failure.

Footnotes

  1. I know from $work that there's at least one such package on PyPI; but I'm gonna avoid naming the specific package here.

@pfmoore
Copy link
Member

pfmoore commented Dec 21, 2021

In this case, I think it's better to know that your configuration is no longer supported by the latest release

(I know we're just going round the same loop again, but...) given that this is quite specific to the case you describe, would it not make more sense to have the behaviour be opt-in? So you do pip install --need-latest $security_project? It seems to me that the case you describe is an entirely valid example of where the proposed behaviour would be useful. But it's not an argument for making it the default.

@edmorley
Contributor

edmorley commented Dec 21, 2021

For me, this comes down to:

  1. When I run pip install <package name> (with no version specified) I pretty much expect the latest version of the package [*], not "latest version that builds with the build deps I happen to have installed".
  2. Explicit vs implicit. I don't see it as a bad thing for pip to require the user to specify their intentions (eg via pinning to older dep, or by installing the missing build dep) more explicitly in ambiguous/higher-chance-of-being-broken situations.

[*] So the exception here is of course when a package defines a minimum supported Python version via python_requires. However I think the python_requires case is different, since:

  • if I'm running EOL Python I can't expect to necessarily get the latest versions of things,
  • using an incompatible Python version is less likely to be something that the user would want to be made aware of so they could immediately correct it (vs a missing build dep, where that would be surprising and likely unintentional)
  • the Python version compatibility decision can be made quickly (vs 20 mins of failed builds for a missing build dep)

@edmorley
Contributor

edmorley commented Dec 21, 2021

Another scenario where backtracking is undesirable:

Imagine a package where the build process has a step that is fallible and can fail intermittently (eg downloading an asset or build tools during package build time). When backtracking in the case of build failures is permitted, a pip install <package> may install different versions of the package depending on when the network connection happened to be flaky or not.

@pfmoore
Member

pfmoore commented Dec 21, 2021

Another scenario where backtracking is undesirable:

I'd class that as "another scenario where unreliable builds are undesirable" :-)

More seriously, I don't think there's any way we can reasonably identify why a build might have failed, so we have to look at the question from the point of view of what's the "right" behaviour if a build fails, and pip has no other information available as to why the build failed (even if the user might, from reading the build output).

The proposal here is that we basically assume that every other version of the package will also fail to build, and furthermore that any wheels available for older versions of the package are unacceptable. I feel that's too extreme. Others feel that it's better in many cases, and making the user explicitly work around the assumption in the cases where it's incorrect is an acceptable price to pay. We don't yet have consensus, but I feel that the general opinion is against me, and I don't feel strongly enough to argue for my position (although I do feel strongly enough to keep wanting to make sure that people understand what they are proposing 😉)

@edmorley
Contributor

I'd class that as "another scenario where unreliable builds are undesirable" :-)

Yeah, my point is more that pip cannot assume that just because the build of a package failed, it might not succeed on an immediate retry of the same version. That is, moving on to try other versions is not necessarily the correct response -- just as there is no guarantee that older versions won't all fail too.

As such, it seems there are just too many unknowns in the "the build failed" case, and so pip shouldn't try and intervene.

@pradyunsg
Member Author

pradyunsg commented Dec 21, 2021

(I know we're just going round the same loop again, but...) given that this is quite specific to the case you describe

Of course it's specific -- I was responding to a specific example. What I did here was take an example of the situation that @jbylund put forward, and made the case that there's good reason for someone to want different behaviour where backtracking makes it infeasible to get the behaviour they'd need unless they do a bunch of extra work.

Yes, a flag would be possible for this situation if that were the only situation where it made sense to not backtrack. It's not though.

The proposal here is that we basically assume that every other version of the package will also fail to build, and furthermore that any wheels available for older versions of the package are unacceptable. I feel that's too extreme.

Hmm... Interesting! While your interpretation is valid¹, I'd argue that this can be flipped the other way too: assuming that an older version could work and still be desirable is also a flawed assumption.

As the person who put this proposal forward: the proposal here is that "Errors should not pass silently" and backtracking on a build failure (specifically, a metadata generation failure or a wheel generation failure) is exactly that.

We don't have any way to know why a package failed to generate metadata/wheel, so we should not assume that an older version could work (which is the operative assumption when backtracking).

Yes, there are situations where backtracking does get a result. However, it is (a) not sufficiently common IME, (b) requires doing work that will necessarily fail in all other scenarios, resulting in significantly degraded performance/behaviour in those cases, (c) silently hides issues that a user might want to be informed about, and (d) makes it impossible to account for certain workflows/use cases.

We have no guarantee that an older version will work. Even in cases where it would, I've argued that it is better to fail early and have the user figure that out themselves. It will provide all users a shorter feedback loop -- thing fails, Google it, realise you need a missing library/different older version etc; fix that and move on. Compare that to "why is pip backtracking for 20 minutes and giving me an older version of pandas" + "why is pip backtracking for 3 hours and still failing to install numpy?"

When you don't know why it failed, failing eagerly means that you have a single short failure that you can share with someone who can help you; instead of 10000+ lines of output where the initial lines are cut off because of limited terminal scrollback and the final failure is on version 0.0.1 from 2006.

Yes, this means that a single class of situations that "just works" today will stop working. Yes, we require users to provide additional information (in the form of additional restrictions), to account for things that pip can't know. Yes, this is more work for the user than having the tool just do it for them. This is in exchange for so many other situations getting a better experience, though.

At the end of the day, it's a balance of tradeoffs as we can perceive them today. We might all change our minds and realise the current behaviour is actually better once we get more feedback or inputs. That's fine too!

Right now though, I strongly feel that we should lean toward failing eagerly instead of trying really hard to get things done. We'd degrade one fairly specific scenario (only some older version actually works for me, but I didn't tell pip about it) while accommodating many more scenarios (eager detection of packaging changes in a new release, less surprising backtracking behaviours, failures don't pass through silently, no backtracking forever when you're missing libfoo, no backtracking forever when you're on the wrong Python version, etc.).

I'm tired of this looping discussion now, and I'm not going to be looking at this until next year now.

Footnotes

  1. actually, no. It's not valid, because we don't magically disallow the user from trying again with an older version. They can still try with pandas < 1.3.3 or whatever.

@pfmoore
Member

pfmoore commented Dec 21, 2021

I'm tired of this looping discussion now, and I'm not going to be looking at this until next year now.

Agreed, we're getting nowhere. In the interests of getting a resolution, I'll note that I don't have any personal need for any particular behaviour here, so I'll withdraw my objection to this PR.

If, after it's implemented, I find that it causes me actual problems, I'll whine incessantly that "I told you so" 🙂 Feel free to ignore me if I do.

@notatallshaw
Member

Is this going to be part of 22.0? I think, given the objection has been withdrawn, it makes sense.

I do think it's worth keeping an eye on pip issues for any regressions, and having a standard explanation of how to handle them (install dependencies based on the install error, or else use a constraints file to prevent certain versions from installing).

@astrojuanlu
Contributor

My 2 cents: I'm wondering how much of this family of problems would be alleviated by

  1. Giving more information on why the backtracking happens when it starts, instead of having to wait. Something like "pip found that package A==1.0.0 is incompatible with previously requested requirement B==2.0.0, trying older versions of A". Whenever I see that pip is backtracking (example) and the message INFO: pip is looking at multiple versions of X to determine which version is compatible with other requirements. This could take a while, I always think "oh damn, how will I find out what the problem was now?"
  2. Try to separate the use cases depending on whether it's a "user installing a package on their laptop" or a "continuous integration service is rebuilding an environment", because they seem quite different to me.

@notatallshaw
Member

1. Giving more information on why the backtracking happens when it starts, instead of having to wait. Something like "pip found that package A==1.0.0 is incompatible with previously requested requirement B==2.0.0, trying older versions of A". Whenever I see that pip is backtracking (example: https://github.com/poliastro/poliastro/issues/1435) and the message INFO: pip is looking at multiple versions of X to determine which version is compatible with other requirements. This could take a while, I always think "oh damn, how will I find out what the problem was now?"

There's an existing PR for this: #10258

Though if this landed, this particular situation would never start backtracking, so it's not as relevant.

@pradyunsg pradyunsg added this to the 22.0 milestone Jan 20, 2022
@pradyunsg
Member Author

I might end up having time to tackle this before the release. There's a decent chance that this misses the 22.0 release window though, since my time is fairly limited.
