New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for RFC 1977: public & private dependencies, as it relates to the resolver #6129

Open
Eh2406 opened this Issue Oct 3, 2018 · 9 comments

Comments

Projects
None yet
2 participants
@Eh2406
Copy link
Contributor

Eh2406 commented Oct 3, 2018

This is a sub part of the discussion of implementation of RFC 1977, specifically how the resolver could implement this protocol. The main Tracking issue is at rust-lang/rust#44663.

This comment rust-lang/rfcs#1977 (comment) described the properties we want to uphold the way that fits best with my brain.
So to use the termanology, I feel like when we are adding a packedge B we need to walk up the dependency tree to all potential C's that can see the addition to test if there is a conflicting A.
That can be done pretty fast if we have a fast way to walk up dependency tree and a fast way to see all the pub reachable packages for each ancestor. (Neither of which we have at this time.)
But that feels like a lot of state to be cloned for each tick.
Currently we avoid the expensive clones with extensive use of RC.

Is it time to add a im-rs dependency?
Is there a better algorithm?
Is there a small part that would make a good starting PR?

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Oct 5, 2018

Adding new dependencies seems fine by me! My historical strategy with the resolver has been "don't add any fancy representation tricks, profile Servo, add tricks as necessary"

@Eh2406

This comment has been minimized.

Copy link
Contributor

Eh2406 commented Oct 17, 2018

I have a branch that stubs out getting this working. It involves all the ugly things I can think of. The state is a mess of big nested data structures. The algorithm is a bunch of nested loops in the hottest part of the resolver. It is not even checking for redundant work. The code is mostly repeated between the code that check if a candidate can be added and where the the candidate is added. Several existing optimizations are completely removed in the case that a pub-dep has ben looked at. The error messages are just "Todo", and most of the commit messages are "oops".

But it is working! I modified the passes_validation to check for pub-dep conflicts, then modified the registry_strategy to make them. It promptly generated a failing case. Copy the result to a standalone test, hand minimize, and add an explanation. Add something, anything to get that test passing. Then rinse and repeat. Lets just say that the fuzzer is very good at finding every optimization that this can react (badly) with. Eventually it passes!

Next try the limited_independence_of_irrelevant_alternatives and if finds a case of a registry that should work, but doesn't. Then rinse and repeat 3 more times. Finding backjumping optimization that need to be handled more correctly, disabled for now. But with the backjumping optimization disabled, it now finds a case that experiences exponential blowup.

So my plan for the Impl day at RBR is to attempt to get backjumping working, and clean my history into something submittable.

edit: the cleaned up history is at https://github.com/Eh2406/cargo/tree/pub-dep

@Eh2406

This comment has been minimized.

Copy link
Contributor

Eh2406 commented Oct 19, 2018

@Eh2406

This comment has been minimized.

Copy link
Contributor

Eh2406 commented Oct 24, 2018

So at RBR I got the history cleaned up. Alex and I reviewed the progress so far and a plan for how to re enable backjumping without losing correctness.

That backjumping strategy was implemented today. Unfortunately, the fuzz tests are finding kasese of exponential blowup (50_000 ticks). I suspect it is similar to #5213, in which we backtrack to where ConflictReason no longer applies without getting closer to the gole. Not that I really understand it well enough to explain / make a test case.

On a different note I also ran the other resolver tests, we have several that are hitting the 30 sec timeout on this branch. So we will need to work on per tick performance before this can merge.

@Eh2406

This comment has been minimized.

Copy link
Contributor

Eh2406 commented Oct 24, 2018

Good news If you run proptest overnight with --release it does an ok job of minimizing the problem (~100 lines, hand minimized to ~30). Bad news, I no longer think the backjumping strategy that I and Alex came up with is correct. I.E. it will reject cases it should except. In addition to being incomplete, allowing exponential blowup.

@Eh2406

This comment has been minimized.

Copy link
Contributor

Eh2406 commented Oct 29, 2018

I am (for now) ignoring the fact that the current strategy will fail to resolve when there is a valid solution in rare circumstances. I have cleaned up the case of exponential blow up, so I understand why it is happening. What I don't understand is why It is not happening using normal deps. (it is a lot harder to write, but should be possible.) I am investigating that, hoping that I will either find an optimization I forgot to add to pub/priv deps, or a case that can be reproduced on master.

@Eh2406

This comment has been minimized.

Copy link
Contributor

Eh2406 commented Oct 29, 2018

I can now confirm that the normal deps gets saved by #5213, where the pub deps do not.

Unfortunately, a simple solution did not work. I think doing something special for PublicDependency is not going to work. But I worry that doing the obviously correct thing will just move the slowenes around. I guess jump off that bridge when we come to it. here gose.

@Eh2406

This comment has been minimized.

Copy link
Contributor

Eh2406 commented Nov 13, 2018

I implemented the "obviously correct thing", and the fuzz tests pointed out that it was not backtracking correctly. I assumed I had done something silly, but could not find it. Then I got distracted by #6258. Coming back to it the minimized test was correctly pointing out that the "obviously correct thing" will not work. It is possible to make a pub-dep violation, without changing Activations, by changing which previously activated version a dependency resolves to. So any iplomantation of is_conflicting that just looks at is_active will not backtrack correctly in the presence of pub-dep. To complicate things that implementation detail of is_conflicting was just used for a new data structure in #6283, making it much harder to change.

bors added a commit that referenced this issue Nov 21, 2018

Auto merge of #6336 - Eh2406:im-rs, r=alexcrichton
Persistent data structures by im-rs

There has been a long standing "TODO: We should switch to persistent hash maps if we can" in the resolver. This PR introduces a dependency on [im-rs](bodil/im-rs#26) to provide persistent hash maps. It then uses `im_rc::OrdSet` to store the priority list of dependencies, instead of `std::BinaryHeap`, as cloning the `Heap` was one of the most expensive things we did. In Big O terms these changes are very big wins, in practice the improvement is small. This is do to a number of factors like, `std::collections` are very fast, N is usually only in the hundreds, we only clone when the borrow checker insists on it, and we use `RC` and other tricks to avoid cloning.

I would advocate that we accept this PR, as:
- Things are only going to get more complicated in the future. It would be nice to have Big O on our side.
- When developing a new feature finding each place to add `RC` should be an afterthought not absolutely required before merge. (cc #6129 witch is stuck on this kind of per tick performance problem as well as other things).
- As the code gets more complex, making sure we never break any of the tricks becomes harder. It would be easier to work on this if such mistakes are marginal instead of show stoppers. (cc #5130 a bug report of one of these bing removed and affecting `./x.py build` enough to trigger an investigation).
- The resolver is already very complicated with a lot of moving parts to keep in your head at the same time, this will only get worse as we add  features. There are a long list of tricks that are *critical* before and `~5%`  after, it would be nice to consider if each one is worth the code complexity. (I can list some of the ones I have tried to remove but are still small wins, if that would help the discussion).

Overall, I think it is worth doing now, but can see that if is not a slam dunk.
@Eh2406

This comment has been minimized.

Copy link
Contributor

Eh2406 commented Dec 8, 2018

I have a branch where I have implemented both versions simultaneously. Like the "obviously correct thing" it records every PID in the path from the dependency that can see the conflict to both conflicting crates. Like the strategy Alex and I came up with at RBR, it ensures the parents relationship is still active when the Back Jumping occurs. Because it looks at the dependency tree it passes the test the previous strategy failed. Because it will records all the PIDs involved in conflict it can learn like #5213. Unfortunately, because it adds a lot of things to the conflict it is a highly efficient way of creating a test case like #6283. So efficient that, In debug the fuzz test only tell me that it hit the 50k limit, In release the fuzz test only tell me that there are cases that took longer than 90sec to resolve!

I have to say, adding new features to the resolver has gotten a lot harder now that we have fuzz tests making sure that they are correct, complete, and efficient. :-P

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment