Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backtrack if can not activate #5000

Merged
merged 5 commits into from Mar 1, 2018
Merged

backtrack if can not activate #5000

merged 5 commits into from Mar 1, 2018

Conversation

Eh2406
Copy link
Contributor

@Eh2406 Eh2406 commented Feb 1, 2018

This is a fix for #4347
Unfortunately this too regressed error messages for the case that you specified a dependency feature that does not exist.
@alexcrichton advice on improving the message?

@rust-highfive
Copy link

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Copy link
Member

Oh dear that's a bummer! It seems like a lot of roads are pointing towards error messages not being that great...

This one I would say though yeah is even more pressing than links in that this comes up quite frequently (typos and such). In that sense it'd be pretty bad if we regressed this error message :(

@Eh2406
Copy link
Contributor Author

Eh2406 commented Feb 2, 2018 via email

@Eh2406
Copy link
Contributor Author

Eh2406 commented Feb 5, 2018

It will take some work to rebase/merge this with #4834, and that is a much bigger/more common UX improvement. I think this should be put on hold for that to land and for a plan for error messages to be developed.
@alexcrichton You are right, I have been thoroughly convinced that error messages should happen next. You offered some mentership on that, how do you think I should proceed?

@alexcrichton
Copy link
Member

@Eh2406 certainly! Unfortunately I don't really know how to precisely proceed. I do have a feeling, though, for how to proceed generally!

So prior to #4978 we had a custom error message for the links key which indicated. I think that a great end state would be to end up basically as close as we can to an error message like that where you've got a causal chain through packages and a nice description for what's happening.

As to how to get there... I'm not sure. I think it may also be worth taking a step back and looking at the resolution algorithm from a distance too. Right now Cargo in theory attempts all possible crate resolution graphs. I mean like all possible graphs. This can put us in one of two situations:

  • At least one graph is "successful" and satisfies all the constraints. In this situation Cargo should never return an error and eventually reach such a graph.
  • All possible graphs would lead to a resolution failure, so Cargo should return an error.

Here we're most interested in the latter situation, but it also means there's no one error for us to return. In some sense we tried everything and it could have all failed for a number of reasons. I think we could probably optimize to return the "shortest" error perhaps? (for whatever definition "shortest" actually means)

There's also the separate bug that Cargos' backtracking and exploration of the problem space takes forever, but hopefully #4834 can help with that!


Hm so that's unfortunately not super actionable, but is that helpful at least to get started?

@Eh2406
Copy link
Contributor Author

Eh2406 commented Feb 5, 2018

Thanks for the pointer to the links code, I will go and read that. I like the idea of having a good causal chain for the cases when we can.

I think it may be useful to break it down into more situations.

  • The tree is trivially solvable, even if it is big. This is common, For example Servo, when I tested it, does No backtracking at all. -> No error messages.
  • The tree is solvable, given a exponential amount of time. Make resolution backtracking smarter #4834 will make it much more rare, But the problem is NP complete so this is always a possibility. -> We should print some kind of diagnostic to help the programer make it easier for us. Ether at some time into resolution like we do Resolving dependency graph... Or wen we bail after a user configured number of time as you suggested in Abort crate resolution if too many candidates have been tried #4066
  • The tree has no solutions. -> We need to report something useful to the programer.

We should keep in mind that the most common reason for any kind of error is a simple typo.
And that the user's mental model does not include millions of permutations of ancient versions of the dependencies.

The first time we discover that the tree is not trivial we are in the state closest to the user's mental model of just take the newest version of everything. And almost anything we print will be helpful in discovering that it is just a typo. So I think we should print a good causal chain for the backtracking to the user, and continue in the serch.

Another huristick if something deep in our tree is unsolvable, then we probably hit almost the same error message lots of times. For example We depend on A=* (with 100 versions to choose from) which depends on B=* (with 100 versions to choose from) which depends on C=* all versions have been yanked, If we printed all errors (a thout experiment) We will have 10k "can't find a candidate for C ...." and just 100 "can't find a candidate for B ...." and 1 "can't find a candidate for A ...." The most common one is the most helpful. So maybe 10 minutes into resolution we can print "we are still working on finding something that works but, here are the 3 most often encountered difficulties:" each with a full a good causal chain for the backtracking.

All of these thoughts depend on being able to print the message as part of some context, and then continue with the serch. How do I do that? I could make a function that returns the errors as strings. But the words "errors as strings" makes me very nervous.

@alexcrichton
Copy link
Member

All excellent points! I think we should take it as an assumption that if we're going to produce a human readable error then we can reach that conclusion in a reasonable amount of time. In that sense I hope that #4834 will get us closer to that goal but we may have other bugs of "Cargo goes in the weeds too much". I don't think, though, that such a situation should impact the design of the error messages here.

Now I will say we have two other tricks up our sleeve for "gee resolution is taking forever":

  • First we can actually inform the user what's happening. Recently Cargo changed to do this and will print out, after I think a second or so of resolution, "Resolving crate graph...". It's not much but it at least let's the user know we're not stuck in network traffic.
  • Second we should probably implement Cargo just giving up at some point. If Cargo has attempted a million graphs and they're all failing, it's doubtful that the million+1th graph will work. This is sort of covered by Abort crate resolution if too many candidates have been tried #4066 where we should (a) have a hard limit on the amount of time Cargo takes during resolution and (b) when this error happens mention something like "hey this may be a bug in Cargo, just letting you know"

In that sense I think your second and third cases would fold into one another. I definitely agree that mistakes tend to be common typos before we get to serious "oh dear what to do" scenarios. Despite that though if Cargo is indeed successful I don't think that we should print out anything (even if we hit errors along the way). I think that only if we end up failing entirely should we print an error.

We could certainly explore though a more progress-bar like situation for Cargo where it prints incrementally what it's doing for debugging perhaps?

All of these thoughts depend on being able to print the message as part of some context, and then continue with the serch. How do I do that? I could make a function that returns the errors as strings. But the words "errors as strings" makes me very nervous.

I think it's sort of encoded as CargoResult or weird options in resolve today, but I think we should stick with that. An error happening only happens in the "slow" case where "slow" means that you've modified or don't already have a Cargo.lock. On the fast path we'll never hit an error because the Cargo.lock means we'll resolve precisely to the same one as before.

In that sense I wouldn't be too worried about making errors expensive to compute, it hopefully isn't too bad! We can of course profile and test on Servo though to test this out.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Feb 5, 2018

Ok, I think we mostly agree and if we don't it is because we are talking past each other. I think I need to start thinking more "incrementally". I should start with reading the old 'links' code and seeing if I can make the current error messages better. Only then come back to the larger questions of when to print and for witch backtracks.

@Eh2406 Eh2406 mentioned this pull request Feb 14, 2018
@Eh2406 Eh2406 force-pushed the i4347 branch 2 times, most recently from c1abbab to 4ece446 Compare March 1, 2018 04:40
needs a test where we have an activation_error the then try activate something that dose not work and backtrack to where we had the activation_error then:
- Hit fast backtracking that go past the crate with the missing features
- Or give a bad error message that does not mention the activation_error.
The test will pass, but there is code that is not yet justified by tests
... required by package `foo v0.0.1 ([..])`
versions that meet the requirements `*` are: 0.0.1

the package `bar` depends on `bar`, with features: `bar` but it does not have these features.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm shouldn't this read "the package foo" instead of "the package bar"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and it is fixed. Thanks. (Sorry about all the force pushing.)

package `bar v0.0.1 ([..])`
... which is depended on by `foo v0.0.1 ([..])`

all possible versions conflict with previously selected packages.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this clause be possible to remove in this scenario? I think it's more applicable to version conflicts than feature conflicts, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@Eh2406 Eh2406 force-pushed the i4347 branch 2 times, most recently from d0a828d to 26b96a4 Compare March 1, 2018 18:23
@Eh2406
Copy link
Contributor Author

Eh2406 commented Mar 1, 2018

I think this is finally ready to go.

Some things I did not figure out how to test:

  1. Where to clone BacktrackFrame. I think I have this right. I think my initial version could have had O(n^2), but I could not get a test case that proved it.
  2. Adding conflicting_activations to the BacktrackFrame. I think this is needed as I described in the commit, but I could not get a test case that proved it.

@alexcrichton
Copy link
Member

Nah this looks great to me, thanks @Eh2406

@bors: r+

@bors
Copy link
Collaborator

bors commented Mar 1, 2018

📌 Commit 2cbd1dd has been approved by alexcrichton

@bors
Copy link
Collaborator

bors commented Mar 1, 2018

⌛ Testing commit 2cbd1dd with merge 382967a...

bors added a commit that referenced this pull request Mar 1, 2018
backtrack if can not activate

This is a fix for #4347
Unfortunately this too regressed error messages for the case that you specified a dependency feature that does not  exist.
@alexcrichton advice on improving the message?
@bors
Copy link
Collaborator

bors commented Mar 1, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 382967a to master...

@bors bors merged commit 2cbd1dd into rust-lang:master Mar 1, 2018
@Eh2406 Eh2406 mentioned this pull request Mar 2, 2018
bors added a commit that referenced this pull request Mar 2, 2018
missed this important bug

In the PR #5000 I finished and we merged yesterday I missed a bug and left in an outdated comment.

@alexcrichton
@Eh2406 Eh2406 deleted the i4347 branch March 6, 2018 20:46
bors added a commit that referenced this pull request Mar 14, 2018
Faster resolver: Cache past conflicting_activations, prevent doing the same work repeatedly.

This work is inspired by @alexcrichton's [comment](#4066 (comment)) that a slow resolver can be caused by all versions of a dependency being yanked. Witch stuck in my brain as I did not understand why it would happen. If a dependency has no candidates then it will be the most constrained and will trigger backtracking in the next tick. Eventually I found a reproducible test case. If the bad dependency is deep in the tree of dependencies then we activate and backtrack `O(versions^depth)`  times. Even tho it is fast to identify the problem that is a lot of work.

**The set up:**
1. Every time we backtrack cache the (dep, `conflicting_activations`).
2. Build on the work in #5000, Fail to activate if any of its dependencies will just backtrack to this frame. I.E. for each dependency check if any of its cached `conflicting_activations` are already all activated. If so we can just skip to the next candidate. We also add that bad `conflicting_activations` to our set of  `conflicting_activations`, so that we can...

**The pay off:**
 If we fail to find any candidates that we can activate in lite of 2, then we cannot be activated in this context, add our (dep, `conflicting_activations`) to the cache so that next time our parent will not bother trying us.

I hear you saying "but the error messages, what about the error messages?" So if we are at the end `!has_another` then we disable this optimization. After we mark our dep as being not activatable then we activate anyway. It won't resolve but it will have the same error message as before this PR. If we have been activated for the error messages then skip straight to the last candidate, as that is the only backtrack that will end with the user.

I added a test in the vain of #4834. With the old code the time to run was `O(BRANCHING_FACTOR ^ DEPTH)` and took ~3min with DEPTH = 10; BRANCHING_FACTOR = 5; with the new code it runs almost instantly with 200 and 100.
@ehuss ehuss added this to the 1.26.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants