-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refine the computation of atomic actions from a solver solution #5066
Conversation
This PR refines the computation of atomic actions (and more importantly, their dependencies) computed by opam from a solver solution. It improves it in the sense that it eliminates some cases where opam would complain that a solution is cyclic, even though it would be possible to execute it while preserving dependencies.
5128975
to
54a2507
Compare
What are the properties that we want to enforce on the selected installation order? I don't understand if we want:
One could imagine a package that depends on (B1 | B2), and checks during its installation the installation status of B1 or B2, and does something different when they are present or absent. (For example: the package |
Indeed, that's something I wondered as well, and I don't really know for sure.
|
What we are trying to do here is to determine a "dependency order" from two things: the original dependency formulas, and the satisfying assignment found by the solver. We need to be careful at whether the problem is well-posed:
For example, if |
Looking at an example. I started from the following disjunctions found in opam-repository: The CNF of this formula (note: reasoning modulo CNF conversion makes sense if we assume a "strongly robust" criterion) is as follows, with temporary variables
Let's assume we have a satisfying assignment that installs gtk and sourceview, and also installs gtk3 for unrelated reasons. So: x = gtk = sourceview = gtk3 = true, y = sourceview3 = false. How do we want to compute dependencies?
Clearly the clause
|
My gut feeling is that, as long as we cannot rule out the weird
|
I agree with the reasoning, though I don't know how you computed the CNF formula? I would have instead distributed & over | (IIRC that's what the cudf library does when normalizing this formula to CNF), giving:
I'm not sure how your reasoning works on this one? (though, note that my PR only changes something in the case where one of the packages is already installed in the universe before the query) |
Right. I didn't think that we needed to account for that, but I think I agree with you now. However, I do not agree that this is actually a cycle! For me, there being a cycle would mean that there is no way of co-installing the packages involved in the cycle. But e.g. for dune 2.9.1 it is possible: simply split the install query into two queries, e.g. installing everything but ocamlfind-secondary, then installing ocamlfind-secondary. This will rebuild dune as part of the second query. So my proposal would be to amend this PR to:
|
Ah! I thought vaguely about this but:
|
I would like
to do the same as
(where the latter works with current opam). Note that you write that ocamlfind-secondary depends on dune, but that's not the case: it depends on ocamlfind directly. |
Right, I mixed up the dependencies in my example... But I think I could fix it. Let's start again: A depends on (B|C), C depends on nothing, B depends on A. You suggest to build (C; A; B; A). But the build plan gives:
and at this point the version of B that we have is built against a stale version of A, and it should be rebuilt, and I think we have cycle. Your "real-world cycle" is too large, and the package names are too long, but my impression is that something is wrong in the picture:
The fact that (Granted, the idea that the build of B may be affected by the state of A, and thus depend on whether A itself saw (C) or (C,B) installed is kind of far-fetched. But it can happen in theory, right?) Am I missing something? |
@Armael thanks a lot for this PR! It seems to have some issues though:
|
I think you are right again. Consequently:
I'll thus move to close this PR. Not all hope is lost for the broader issue though, and the diagnostic of @kit-ty-kate in #4541 is correct: the culprit is |
Wow, I am impressed and this is very interesting! But it would probably remain sensible to exclude dependencies belonging to an unsatisfied conjunction (assuming DNF!!) from the action graph ? I don't think this would break any of our invariants 🤔 |
Related: #4541
Solves the issue specifically mentionned at #4541 (comment)
This PR refines the computation of atomic actions (and more
importantly, their dependencies) computed by opam from a solver
solution. It improves it in the sense that it eliminates some cases
where opam would complain that a solution is cyclic, even though it
would be possible to execute it while preserving dependencies.
Background
(this is my limited comprehension of what happens from reading the code, I'm
happy to be corrected!)
When opam invokes a solver with a query for installing new packages, the solver
eventually produces a new "universe" indicating which versions specifically of
which packages can be installed to satisfy the query.
In order to then actually install the packages, opam needs to perform a bit more
work: from knowing 1) the dependency constraints of all packages; and 2) the set
of packages that need to be installed, it needs to deduce in which order to
install the new packages so as to obey dependencies.
What opam does is to compute a graph of the relevant packages, where an edge "A
-> B" in the graph means that action/package A depends on B. It then checks that
this is an acyclic graph, and the install plan can be extracted e.g. by doing a
topological sort of the packages in the graph.
However, the issue is that this graph is an approximation.
Strictly speaking, a given opam package does not have a set of dependencies:
dependencies can be specified as general boolean formulas. Considering that
these formulas are put in CNF, this means that dependencies are of the form:
When considering that package A depends on a set of packages {B1, ..., Bn, C1
... Cm, ...}, this corresponds to approximating "A depends on B1 or B2" as
"A depends on B1 and B2".
Because of this approximation, it may happen that the resulting graph contains a
cycle even though it could be possible to install all the packages while
satisfying the dependencies (in their general form as boolean formulas).
This means that in full generality we should consider the hypergraph of
packages, where, for a package A, instead of edges A->B1, ..., A->Bn ("A depends
on B1" and ... and "A depends on Bn"), we have potato-shaped hyper-edges
A->B1...Bn, A->C1...Cm, etc ("A depends on B1 or ... or Bn" and "A depends on C1
or ... or Cm" and ...).
This gives us a representation that is faithful to the general form of
dependencies. However, it is now hard (NP complete, AFAICT) to find an acyclic
schedule in this hypergraph...
What this PR implements
This PR avoids solving the general problem of finding an acyclic schedule in the
general hypergraph of packages. Instead, it does compute the hypergraph (this is
basically trivial) but uses this representation to improve the approximation in
a simple case. For package A, where:
if there is a Bi that is already installed in the universe (and stays
installed), then we can consider the clause satisfied, and avoid depending on
the other Bis.
For instance, with #4541 (comment), if you put the dependencies of dune.2.9.1 in CNF, there are clauses of the form:
(ocaml >= 4.08 | ocamlfind-secondary)
before this PR, these would result on dune depending on ocamlfind-secondary (even in
a switch with ocaml >= 4.08), eventually resulting in a cycle. With this PR,
these do not result in a dependency, avoiding the cycle.
Of course, in principle there are still cases where opam will report a cycle
that does not exist; but I would suggest to see whether they actually come up in
practice before implementing a more complex approach.