Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge incompatibilities smartly #19

Open
Eh2406 opened this issue Oct 6, 2020 · 13 comments
Open

merge incompatibilities smartly #19

Eh2406 opened this issue Oct 6, 2020 · 13 comments

Comments

@Eh2406
Copy link
Member

Eh2406 commented Oct 6, 2020

There is a really cool part of the algorithm that makes things more efficient and makes for better error messages. Unfortunately it is not clear when it is correct to use it. https://github.com/mpizenberg/pubgrub-rs/blob/master/src/internal/incompatibility.rs#L209

cc #14 (comment)

What does the Dart version do? When and how do we want do this?

@mpizenberg
Copy link
Member

For context, I had written this comment about the mutability danger before the incompatibility_store existed. I think now that it exists, it's not a problem to remove incompats anymore. Especially merging NoVersion with NoVersion is what we should always do I think. If we go for such kind of optimizations mentioned there, it will concern at least both that merging function, and the prior_cause one. In particular, in the prior cause function, we could replace in place the incompatibility for maximum efficiency I think. Has to be checked of course!

@Eh2406
Copy link
Member Author

Eh2406 commented Oct 7, 2020

I don't fully understand the distinctions between incompatibilities and incompatibility_store and so dont get how it helps here. Can you elaborate?

@mpizenberg
Copy link
Member

mpizenberg commented Oct 7, 2020

Ah yes. So there are two kind of incompatibilities here (and I'm not talking about external vs derived), the incompatibilities that are in the set of tracked incompatibilities (those described by the PubGrub algorithm, here they exist in both state.incompatibilities and state.incompatibility_store), and the ones that are not (they only exist in incompatibility_store).

All external incompatibilities are tracked but some derived are not. If you pay close attention to the conflict_resolution algorithm in the core module, you'll see that the goal is to find the "root cause" of a conflict. For this, we go back to find the "prior cause" until such cause was a decision, or a derivation where the previous satisfier level and the satisfier level are different. That is a terminal case for the loop and we return that incompatibility as the "root cause" of the conflict. So all the intermediate prior causes built during the loop, that will eventually end in the root cause, are incompatibilities that are not added to the set of incompatibilities of the system. But they must exist somewhere in order to be able to rebuild the derivation tree of that root cause if necessary.

When I was in Elm those lived somewhere managed by the GC. But in Rust they need to live somewhere. So I tried 3 designs. Two of them are using indices in the incompatibility to refer to its two parents. Those indices are their position in the incompatibility store. All incompatibility ever created are pushed into that store, even the intermediate ones generated during non-terminal steps of conflict resolution. That "store" has no equivalent in the pubgrub algorithm since its description does not care about memory management.

As such, I was then saying that it's not an issue to remove incompatibilities in incompatibilities now, since they also live in the store in case we need them for error reporting.

Let me know if it's still not clear

@mpizenberg
Copy link
Member

mpizenberg commented Oct 7, 2020

I'll try to categorize the kind of incompatibility collapse that we can do safely.

New external incompatibility

When adding a new external incompatibility, there are plenty of cases where we can merge it directly to other already existing incompatibilities.

Adding a NoVersion(p)

That happens when pick_version() returns no suitable such version. The corresponding term defines a range in which no version exist. Then instead of adding it directly to incompatibilities, we could merge it with all incompats in incompatibilities referring to that same package p. We just loose the knowledge that no version exist in such range, but the algorithm will complete the same.

A slightly less aggressive collapsing strategy would be to keep no more than 1 NoVersion(p) per package p. And then just collapsing those together when adding such a new NoVersion(p), but not touching any of the other incompats.

Adding a UnavailableDependencies(p)

That happens when getDependencies() returns None. In that case, instead of directly adding this to the incompatibility set, we can do exactly the same optimization that for NoVersion(p) with the same tradeoffs.

Adding a FromDependency(p1, r1, p2, r2)

That happens when we are converting p1 dependencies at a given version v1 (r1 = Range::exact(v1)) into incompatibilities. If p1@v1 depends on p2 in r2 we create that incompatibility kind.

In the case where we decided previously that NoVersion and UnavailableDependencies kinds would be fully merged on insertion, it means that the only existing external incompats in incompatibilities are of the type FromDependency (plus of course the one NotRoot that is used to bootsrap the algorithm). In that case, we should look for another incompatibility of the type FromDependency(p1, r1', p2, r2). If there is one such, we can merge them into FromDependency(p1, r1 union r1', p2, r2).

New derived incompatibility

Those happens at every loop iteration of conflict resolution, with the call to prior_cause(). If previously we decided to completely collapse NoVersion and UnavailableDependencies, the only possible kinds for parents of a prior cause are FromDependency or DerivedFrom (plus again, the one NotRoot). In that case, since we already collapsed pairs of FromDependency, there is no "obvious" collapsing to do.

In the case where we previously did the "less aggressive" collapsing of external incompats, meaning we have maximum 1 NoVersion(p) and UnavailableDependencies(p) per package p, we could do some collapsing now. Such that NoVersion with DerivedFrom is collapsed into the DerivedFrom.

In the case where we didn't do any collapsing for new external incompatibilities, we could do the collapsing lazily here in prior_cause().

@Eh2406
Copy link
Member Author

Eh2406 commented Oct 8, 2020

Thank you for the great write ups! This is a lot clearer now.

@mpizenberg
Copy link
Member

A note on previous write up on merging strategies. I was wrong saying that we can fully merge NoVersion and UnavailableDependencies incompats. If they are the first incompats for a given package, there is no other thing we can merge them with. In addition, the knowledge of no version existing could be important for future incompatibilities referring to that package. So we have to keep one of those NoVersion and UnavailableDependencies per package. Then decide what to do of those when building prior causes, merging or not with the other parent.

@mpizenberg
Copy link
Member

In addition to those "theoretical" improvements, once we have ways to easily generate many large registries producing plenty of backtracking, it would be nice to have statistics on resolution of those, like

  • Final size of incompatibilities
  • Final size of incompatibility_store -> this is probably THE memory hungry thing of this implementation
  • Number of duplicates in incompatibility_store -> NoVersion and FromDependency could very well be added multiple times I think
  • Duplicates in incompatibilities -> check that there are no more after implementing collapsing strategies

@mpizenberg
Copy link
Member

Just did a small analysis of the final incompatibilities for our current large_case benchmark. In the end, the incompatibility_store holds 1526 incompatibilities. Meanwhile, incompatibilities is of length 1482, so this means that very few intermediate steps are made in conflict resolution (44 to be exact).

From those 1482 incompatibilities, 529 are unique, so lots of duplicates in there. All duplicates are external incompatibilities, more precisely, all duplicates are from dependencies.

22 of the unique incompatibilities are derived, 507 are external. These are composed of the root incompatibility, 28 NoVersion incompatibilities (for 26 packages) and 478 from dependencies.

@mpizenberg
Copy link
Member

Some very basic meta analysis of those numbers:


22 of the unique incompatibilities are derived, 507 are external

This is a dependency heavy benchmark, with very few conflicts, so it is not designed to evaluate improvements in the conflict resolution part of the algorithm.

From those 1482 incompatibilities, 529 are unique, so lots of duplicates in there. All duplicates are external incompatibilities, more precisely, all duplicates are from dependencies.

Keeping an internal cache of get_dependencies already called for a given package/version would easily avoid inserting duplicates of incompatibilities.

@Eh2406
Copy link
Member Author

Eh2406 commented Oct 9, 2020

So it sounds like there are big wins by just merging ones that are exactly the same, without doing anything "smartly". Or even better figuring out how not to generate the duplicate in the first place.
And you beat me to it :-)

@Eh2406
Copy link
Member Author

Eh2406 commented Oct 17, 2020

This is functionality we do not yet have. Can we keep it open?

@mpizenberg
Copy link
Member

Ha, sorry yes

@Eh2406
Copy link
Member Author

Eh2406 commented Apr 15, 2021

For future reference, I think the relevant code in dart is _dependencyBounds in package_lister. Although it is not as sophisticated as what we have already described.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants