develop a tool to find the maximal consistent set of hackage packages #493

Closed
bos opened this Issue May 24, 2012 · 4 comments

Projects

None yet

2 participants

@bos
bos commented May 24, 2012

(Imported from Trac #500, reported by @dcoutts on 2009-02-14)

To a great extent the point of libraries is code re-use. It follows that is it useful to be able to install and use two packages simultaneously. The extension of this is that we would like to know the maximal set of packages that we can install consistently. Them being consistent implies that we can use any of them in another package and have consistent dependencies.

If we knew a package was within the maximal subset then we can give it a positive mark on hackage. We may also be able to encourage package authors to update their dependencies to as to get their packages within this set.

Calculating the maximal consistent subset is certainly an NP-complete problem. It is at least as hard as working out if any single package is installable and that is already NP-complete. However we do not necessarily have to get an exact maximal set, approximation would probably do. We would also like to influence the choice of the set by the importance of packages, especially when it comes to resolving choices about divisive packages (see #499).

We would also like to influence the choices by manually supplied preferences.

This can probably be implemented as an extension of the standard package dependency resolution algorithm and using the code developed for ticket #499.

@bos
bos commented May 24, 2012

(Imported comment by @michiexile on 2009-02-14)

If we assume that we can have "installable with..." as a primitive operation (and package THAT instance of NP in there), then it should be easy to build a graph with vertices the packages and edges given by an edge package1 -- package2 precisely when there is some set of dependency fulfillment that is fulfilled by all of them.

We can label each edge with the dependency intervals under which it exists.

This graph can be extended to a simplicial complex by inserting a simplex (you can view it as a distinguished clique) whenever all the edges of it have a nontrivial intersection of their labels.

A dependency list, here, will end up being a sequence of intervals - one for each package in the dependency list. Simultaneous dependencies are given as intersections of these intervals; with an absent interval being replaced, when needed, by a full (i.e. [-∞,∞]) interval.

As a worked example, consider packages p1 through p5, and with the following dependencies:

p1: none

p2: 1 in [1.0,1.5]

p3: 1 in [1.0,1.0], 2 in [1.0,2.0]

p4: 1 in [1.5,1.5]

p5: 1 in [1.5,1.5], 4 in [1.0,2.0]

We find all edges between these packages:
12: 1 in [1.0,1.5]

13: 1 in [1.0,1.0], 2 in [1.0,2.0]

14: 1 in [1.5,1.5]

15: 1 in [1.5,1.5], 4 in [1.0,2.0]

23: 1 in [1.0,1.0], 2 in [1.0,2.0]

24: 1 in [1.5,1.5]

25: 1 in [1.5,1.5], 4 in [1.0,2.0]

34: conflicting dependencies for 1 - edge absent

35: conflicting dependencies for 1 - edge absent

45: 1 in [1.5,1.5], 4 in [1.0,2.0]

Now, we can look at 3-cliques in the resulting graph:
123: 1 in [1.0,1.0], 2 in [1.0,2.0]

124: 1 in [1.5,1.5]

125: 1 in [1.5,1.5], 4 in [1.0,2.0]

145: 1 in [1.5,1.5], 4 in [1.0,2.0]

245: 1 in [1.5,1.5], 4 in [1.0,2.0]

And 4-cliques - all of which should avoid any already disqualified edges:
1245: 1 in [1.5,1.5], 4 in [1.0,2.0]

and that's it for this example.

Now, there is a polynomial time clique-finding graph; and intersections of intervals is reasonably easy to do as well - so this looks like a possible way to structure something like this.

@bos
bos commented May 24, 2012

(Imported comment by dbueno on 2009-02-15)

This would fit in as a standard query if one had a pseudo-boolean based package dependency framework, like what I began to develop on my own branch of cabal-install. The idea is to really use a SAT solver (or pseudo-boolean solver if you want to solve optimisation problems like this one). I have that branch on my local machine; there are still some bugs and only SAT support -- but I would like to continue development. If anyone would like to help, even just to discuss how to encode the package constraints as a SAT problem, please mail me. My email is my trac username at gmail.com.

@bos
bos commented May 24, 2012

(Imported comment by @dcoutts on 2009-02-15)

Replying to @michiexile:

If we assume that we can have "installable with..." as a primitive operation (and package THAT instance of NP in there), then it should be easy to build a graph with vertices the packages and edges given by an edge package1 -- package2 precisely when there is some set of dependency fulfillment that is fulfilled by all of them.

Hmm, I'm not sure about this. Solutions do not combine easily here. Just because we can find a way to install package P1 and a way to install package P2 does not mean we can combine those solutions to make a solution for package P1 and P2. If we could then the problem would be polynomial.

Perhaps I don't understand the algorithm you're presenting but it looks like it could also be used to solve the normal problem of package planning in polynomial time. That's a bit suspicious because we know that problem is NP-complete.

Perhaps you can present the idea in more detail somewhere.

@tibbe
Member
tibbe commented May 15, 2014

Closing as there's been no activity in years.

We're cleaning up the bug tracker to make it useful again and are thus closing bugs that haven't seen any activity in a long time. Please re-open (or file a new bug) if the problem reappears.

@tibbe tibbe closed this May 15, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment