Pip needs a dependency resolver #988

Open
cboylan opened this Issue Jun 11, 2013 · 74 comments

@cboylan
Contributor
cboylan commented Jun 11, 2013

pip's resolver logic is currently like so:

  1. for top-level requirements:

    a. only one spec allowed per project, regardless of conflicts or not. otherwise a "double requirement" exception
    b. they override sub-dependency requirements.

  2. for sub-dependencies

    a. "first found, wins" (where the order is breadth first)

While 1b is reasonable (and actually a feature of pip currently), 1a and 2a are not. pip should be attempting to resolve.

NOTE

If 2a is a problem for you now, there is a workaround. Specifically declare what you want the solution to be in a requirements file, or as a top-level pip install argument, and that will be honored.

e.g. if you're installing myproject, but depproject is not ending up with the correctly resolved version, then specify what the correct answer should be like this:

pip install myproject depproject>=1.5,<2.0



@dstufft
Member
dstufft commented Jun 11, 2013

I added this to 1.5 because I think it's a pretty big wart that should be sorted out.

@qwcode
Contributor
qwcode commented Jun 11, 2013

for now, the "pip solution" is to declare in your requirements file (or as an install argument directly) what specifier you want, and it overrides what happens down in the dependency tree.

i.e. in this case, put "foo>=1.0,<=2.0" in your requirements or add it as an install argument.

also, pip compiles everything it's going to install into a set first, then installs.
it's not installing everything as it's discovered, and its discovery order is not depth first.
the order of discovery isn't the issue; the issue is the lack of any resolution as it compiles the set.
Currently, first found wins.

@qwcode
Contributor
qwcode commented Jun 11, 2013

I keep meaning to add a "conflict resolution" section to the "Internal logic" section of the docs that covers this.
practically speaking, pip's override logic is pretty nice IMO, and is all most people need most of the time.

@emonty
Contributor
emonty commented Aug 11, 2013

I believe, based on discussions with @dstufft, that this may be related to the problems people will have if they try to upgrade from old distribute to new setuptools in the same set of software as other things. More specifically: if you depend on a piece of software that depends on distribute with an unbounded upper end, and you are currently running on a pre-merge distribute, dependency discovery will put distribute into the list along with the rest of the things that software A depends on, and then add distribute's dependency, setuptools, to the list later. Then, when installing, distribute will be upgraded, which removes the distribute-provided setuptools code; the next thing in the list is then installed, which is not setuptools, and which fails because setuptools is not installed yet.

@qwcode
Contributor
qwcode commented Aug 11, 2013

@emonty the distribute to setuptools upgrade problem you speak of is described in #1064, but also here http://www.pip-installer.org/en/latest/cookbook.html#importerror-no-module-named-setuptools

but it is not related to the conflict resolution shortcomings described in this issue.

@dracos
dracos commented Aug 29, 2013

Just had this issue installing a project - on a system that already had python-dateutil 1.4 installed - that depended upon python-dateutil (no specific version) and django-tastypie 0.9.16. Even though django-tastypie has a python-dateutil dependency of >=1.5 (and not 2.0), and the parent has no specific version dependency on python-dateutil, python-dateutil remained at 1.4. This seems pretty fundamentally against what a package installer should be doing in such a circumstance :)

@dstufft
Member
dstufft commented Aug 29, 2013

FWIW I've been experimenting with making a real resolver for pip.

@qwcode
Contributor
qwcode commented Aug 29, 2013

@dracos there is a "solution" right now (short of a new resolver for pip; although that would be nice). Specifically declare what you want the python-dateutil requirement to be in a requirements file, or as a top-level pip install argument, and that will be honored.

http://www.pip-installer.org/en/latest/cookbook.html#requirements-files
http://www.pip-installer.org/en/latest/logic.html#requirement-specifiers

pip install myproject python-dateutil>=1.5,<2.0

@dracos
dracos commented Aug 29, 2013

@qwcode Thanks, I know that and will have to do so - but this means anyone working on the project will have to manually check all dependencies. Say someone upgrades django-tastypie and it now requires a later version of python-dateutil, there's no way to detect this besides manual checking of each dependency upgrade/install. Oh well.

@qwcode
Contributor
qwcode commented Aug 29, 2013

@dracos understood. it's a pain point, but I just want others who find this to at least know there is some kind of solution.

@qwcode
Contributor
qwcode commented Aug 29, 2013

@dstufft as you work on a new resolver, keep in mind that top-level requirements (i.e. pip install arguments or requirements file entries) will still have to be considered overrides (or dominant). that's a pip feature. Also, to state the obvious, this change will be "backwards incompatible" in the sense that many installations will turn out differently.

@dracos
dracos commented Aug 31, 2013

I've made a script (it is basically a patch to pip's prepare_files(), but I didn't want to patch pip since I can't control pip everywhere I use it, for example) - available at https://github.com/dracos/check-pip-dependencies - that notes multiple dependency requests for the same package and outputs a list of conflicts, which you can then manually resolve using the methods suggested above.

@benoitbryon

This ticket looks like #174.

@y-p
Contributor
y-p commented Dec 19, 2013

The pydata people at Continuum Analytics have built a packaging toolchain
more suited to scientific packaging (that's good, it can get funky).
Facing the same issues, they've implemented a dependency resolver using a SAT solver,
see here. The paper mentioned there is readable, and they've open-sourced the wrapper package
around the (also open) SAT solver engine.

Basically, it's all there.

@Ivoz
Member
Ivoz commented Feb 18, 2014

@y-p the only problem with that is that it relies on C code, rather than pure python. I presume that it would notionally be unacceptable to expect that everyone installing pip have a C compiler easily available on their system; thus you would have to provide a compiled pip for every system Python hopes to run on. That is a major ask (anyone compiling and distributing for python arm?).

It might be "all there", but it's in an unusable form for pip's general audience.

@merwok
merwok commented Feb 18, 2014

MIT-licensed pure Python: https://github.com/ActiveState/depgraph
(I just know of it, never tested it in the field)

@Ivoz
Member
Ivoz commented Mar 4, 2014

0install used to be based on python (now on OCaml), and also wrote a pure python solver, which can be found in their 1.16 release (also in sat.py); afaik it was py23 compatible.

Also some good notes on sat solving on their page, and an awesome email.

minisat paper; GRASP paper

@Wilfred
Contributor
Wilfred commented Mar 14, 2014

pkglib has recently released a pure Python dependency solver, FWIW: https://github.com/ahlmss/pkglib/blob/master/pkglib/pkglib/setuptools/dependency.py#L596

@dstufft
Member
dstufft commented Mar 14, 2014

Thanks, I'll take a look at it when I take a look at the enthought one as well.

@dstufft dstufft removed this from the 1.6 milestone Jun 16, 2014
@pfmoore
Member
pfmoore commented Jun 30, 2014

I've been reading some of the references. One thing I don't see in any of them is how version dependencies are managed - specifically, if package A depends on B (version >=2.0), how is that encoded? Is each version of a package treated as an entirely separate object for resolution purposes? Does anyone have any pointers to references on how this is handled?

@dstufft
Member
dstufft commented Jun 30, 2014

So for a raw SAT solver, yes. Basically a SAT solver lets you solve a boolean equation, ideally efficiently.

So if you have foo>=2.0 you'd look and see that foo has 1.0, 2.0, 2.1, 2.2, and 2.3, and you'd translate that into an equation like foo_2_0 or foo_2_1 or foo_2_2 or foo_2_3 (it's actually more complicated than that, because it also needs an equation to ensure that only one foo_x_y can be true at a time). The SAT solver will then spit back at you which set of variables should be True/False to satisfy the equation.

The most basic SAT solver can simply set all variables to False and then randomly try setting a single variable to True (recording which combinations it has already tried) to brute-force the equation. Speed-ups come from adding things on top of that, such as backtracking (instead of starting over from the beginning, you undo the last choice and try a different one), simplification (finding subsets of the problem that are the same and collapsing them together), and other techniques.

You can also be smarter about how you choose which variables to try setting to True. In the case above we probably don't want to select one at random, because if the only version specifier is foo>=2.0 we'd end up with a random foo each time; instead we'd want to try the highest version of foo first and then go backwards, or even better, try the currently installed version of foo first and only then resort to trying the highest versions.

Basically, all the SAT solver papers are just defining better ways of guessing which variables to set and of computing/storing the results, to speed up what is essentially a brute-force problem.
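To make the encoding concrete, here is a tiny brute-force sketch of that translation (the foo versions and helper names are made up for illustration; a real SAT solver works on CNF clauses and learned heuristics rather than enumeration):

from itertools import product

# Hypothetical index contents for "foo".
versions = ["1.0", "2.0", "2.1", "2.2", "2.3"]
variables = ["foo_" + v.replace(".", "_") for v in versions]  # foo_1_0, foo_2_0, ...

def matches(version, minimum="2.0"):
    # Stand-in for evaluating the specifier foo>=2.0 against a candidate.
    return tuple(map(int, version.split("."))) >= tuple(map(int, minimum.split(".")))

def is_valid(assignment):
    chosen = [v for v, var in zip(versions, variables) if assignment[var]]
    # Clause 1: at least one candidate matching foo>=2.0 must be True.
    if not any(matches(v) for v in chosen):
        return False
    # Clause 2: at most one foo_x_y may be True at a time.
    return len(chosen) == 1

# Brute force every True/False combination, like the most basic solver would.
solutions = []
for bits in product([False, True], repeat=len(variables)):
    assignment = dict(zip(variables, bits))
    if is_valid(assignment):
        chosen = [v for v, var in zip(versions, variables) if assignment[var]]
        solutions.append(chosen[0])

# Prefer the highest matching version, as described above.
best = max(solutions, key=lambda v: tuple(map(int, v.split("."))))
print(sorted(solutions), "-> prefer", best)   # ['2.0', '2.1', '2.2', '2.3'] -> prefer 2.3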

@Ivoz
Member
Ivoz commented Jun 30, 2014

@pfmoore in addition to dstufft's reply, I would guess that reading my previous link http://0install.net/solver.html would tell you how this is done, it's a great article.

@pfmoore
Member
pfmoore commented Jun 30, 2014

@Ivoz thanks I'd missed that one

@openstack-gerrit openstack-gerrit pushed a commit to openstack/oslo.db that referenced this issue Jul 6, 2014
@zzzeek zzzeek Test for distinct SQLAlchemy major releases
This change presents one way we might include test support
for oslo.db against specific SQLAlchemy major releases, currently
including the 0.7, 0.8, and 0.9 series.  As we will want to
begin including features within oslo.db that target advanced
and in some cases semi-public APIs within SQLAlchemy, it will
be important that we test these features against each major release,
as there may be variances between major revs as well as
version-specific approaches within oslo.

To accomplish this, I was not able to override "deps" alone,
as the SQLAlchemy revision within requirements.txt conflicts
with a hand-entered requirement, and due to pip's lack of
a dependency resolver (see pypa/pip#988
and pypa/pip#56) I instead overrode
"commands".  I don't know that this is the best approach, nor
do I know how the tox.ini file is accommodated by CI servers,
if these CI servers would need their tox invocation altered or
how that works.

This patch may or may not be the way to go, but in any case
I'd like to get input on how we can ensure that more SQLAlchemy-specific
oslo.db features can be tested against multiple SQLAlchemy versions.
Note that even with this change, running the "sqla_07" environment
does in fact produce test failures, see http://paste.openstack.org/show/85263/;
so already oslo.db expects behaviors that are not present in
all SQLAlchemy versions listed in the common requirements.txt.

Change-Id: I4128272ce15b9e576d7b97b1adab4d5027108c7c
a1fd49f
@piotr-dobrogost

The https://github.com/nvie/pip-tools project has a pip-compile command that deals with dependency resolution. At http://nvie.com/posts/better-package-management/ there's this statement:

We’ve created pip-compile to be smart with respect to resolving complex dependency trees

Might be worth finding out what algorithm is used there.

@qwcode qwcode changed the title from Pip needs a real dependency resolver to Pip needs a dependency resolver Feb 3, 2015
@nils-werner

Just for the sake of completeness, see the enthought/depsolver project.

@JustinCappos

tl;dr: There are two candidate solutions for automatic resolution of conflicting dependencies, and they differ slightly in usability and performance. However, it makes sense to first check whether just doing simple dependency resolution is good enough.

Full post:

(I am attempting to summarize much of the discussion on the distutils mailing list with subject: "name of the dependency problem")

The way that I see it, there are three options: simple dependency resolution, backtracking, or SAT solving.

Most Linux package managers do not do backtracking or SAT solving. They simply perform dependency resolution and, in the case of an error, will print a message and halt. I believe they make this design choice due to two factors. First, the occurrence of conflicts is very low, so the added complexity isn't worth the headache. Second, oftentimes the automated handling of a conflict may mask the fact that there is actually some underlying problem with the dependency specifications. In other words, when these dependency issues occur it is often because the existing dependency specifications are incomplete or inaccurate. This is a situation developers want to be made aware of, rather than having the issue solved for them in a potentially incorrect manner.

In situations where it is useful to have an automated way to resolve dependencies, backtracking and SAT solving are the two options I am aware of.

Backtracking dependency resolution basically boils down to saving the state of the dependency resolver whenever you have multiple choices to resolve a dependency. Then, if you reach a later point where there is a conflict, you can backtrack to the point where you made a choice and see if another option would resolve the conflict.

The other option is to turn the dependency resolution problem into a boolean equation and use a SAT solver to find a solution. The SAT solver acts like a black-box that finds a matching set of packages, possibly subject to some sub-constraint (such as installing the minimum number of packages).

Given these two candidate solutions, it is worth understanding their pros and cons.

[Speed]: If the number of conflicting constraints is zero or small, then the algorithms will have similar speed. If the number of conflicting constraints is very large, then a SAT solver will almost certainly be faster than backtracking. I honestly do not know where the cut-off would be, but I would imagine ~3-5 conflicting dependencies in a package is where backtracking would start to be noticeably slower than a SAT solver. In the worst case, both the SAT solver and backtracking techniques will be exponential in their input, because the problem is NP-hard.

[Predictability]: One important issue for many users and developers is understanding why a set of dependencies was chosen. Backtracking dependency resolution algorithms will choose the most recent package version and, in cases where there are options about what to install, will prefer resolving a dependency using the order in which packages are listed when the dependency is specified. Thus the developer has a reasonable measure of control over how dependencies are resolved, and it is easier for users to understand why dependencies are chosen.

The SAT solver finds a solution that resolves dependencies using in-built heuristics. This means that developers specify dependencies, but the resolver decides the choices to make to resolve them. In some cases, this may find solutions that can be counter-intuitive. (I will use a near-trivial example to try to make this as clear as possible. Real world examples are likely to be much larger and more complex.) For example, suppose that package A has a dependency that can be fulfilled by B versions 1.0 to 2.0 and a dependency on package C version 1.0 (which B 2.0 also depends on). Suppose that two systems install A and that those systems are identical except that one system already had C 1.0 installed and the other installed it as part of dependency resolution for A. It is possible that those systems would end up with different versions of B installed. This is because the SAT solver made different choices about what was optimal based upon the difference in initial state and already installed packages. This can be confusing to users because both systems end up with A and C installed, but have a different version of B.

So, to summarize, there are reasons to potentially choose either type of automatic dependency conflict resolution. Despite (re?)inventing backtracking dependency resolution when building Stork, I really do not have a strong opinion on what should be used. Both algorithms behave in a similar manner when there are no conflicts (the normal case). If there are many conflicts, SAT solvers are faster, but backtracking makes it easier for the developer to control how dependencies are resolved and makes the resolution process easier for the user to understand. There is no wrong answer here and the distinctions between them are not major.

The value of automatic dependency conflict resolution depends on how often it is needed and how hard dependencies are to resolve in other ways. Detection could be performed at upload time for packages hosted on Warehouse, to warn the developer of the situations where their users will see conflicts. (This could be a feature regardless of whether a SAT solver, backtracking, or just simple dependency resolution is adopted.)

It is plausible that simply doing simple dependency resolution and raising an error when a conflict occurs (i.e. not choosing either scheme) could also be the most desirable outcome. Perhaps more data and discussion would be useful before deciding on a course of action?

@guettli
guettli commented Apr 17, 2015

Maybe I am missing something. Tell me if I am wrong.

Before deciding how to do the dependency resolution, the data structures, and how they are transferred, should be thought about.

How does a dependency resolution algorithm get the needed data?

I looked at the warehouse code. Things get much more complicated since a dependency has an attribute "kind".

Related: pypa/warehouse#491 (comment)

How can a package depend on "foo installed with the 'build' kind of dependency"?

I think we should focus on the data structures, their relations, and how they are transferred. The algorithm can be very easy if the basics are done right.

@qwcode
Contributor
qwcode commented Apr 18, 2015

@JustinCappos to be clear, can you explain "simple dependency resolution"?

@qwcode
Contributor
qwcode commented Apr 18, 2015

short of a SAT solver or backtracking, it seems pip could improve itself significantly by simply trying to combine requirements (and failing if they can't be combined) as it parses them (in the current algorithm), rather than just accepting the first one.
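As a rough sketch of what "combining as it parses" could mean, using the packaging library's specifier APIs and made-up version numbers (this is an illustration, not how pip currently works):

from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Specifiers for one project, collected while walking the dependency tree
# (hypothetical values).
collected = [SpecifierSet(">=1.5"), SpecifierSet("<2.0"), SpecifierSet("!=1.7")]

# "Combining" is just ANDing the specifiers together.
combined = SpecifierSet()
for spec in collected:
    combined &= spec

# Candidate versions the index offers for this project (also hypothetical).
candidates = [Version(v) for v in ["1.4", "1.6", "1.7", "1.9", "2.1"]]
matching = sorted(combined.filter(candidates), reverse=True)

if matching:
    print("pick", matching[0])                       # -> pick 1.9
else:
    raise SystemExit("conflict: nothing satisfies %s" % combined)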

@JustinCappos

What I'm calling simple dependency resolution is to just greedily take the
first matching thing that fulfills the requirements. Usually, this
involves prioritizing the highest versioned package first.

For example, suppose that you have the following packages w/ dependencies:

foo-2.0 -- depends on bar
foo-1.0
bar-1.1
bar-1.0

If you ask to install foo, the dependency resolver will decide you likely
want foo-2.0 (the latest version). When reading the metadata for foo-2.0,
the dependency resolver will see the dependency on bar and resolve this
with bar-1.1 (the latest) and then select it for installation as well.

However, there may be cases where this will fail to install a package when
there are conflicting dependencies. Suppose we tweak the example above to
look like this:

foo-2.0 -- depends on bar and zap-1.0
foo-1.0
bar-1.1 -- depends on zap-2.0
bar-1.0
zap-2.0
zap-1.0

Simple dependency resolution will go through the same steps to choose
foo-2.0 and bar-1.1 as before. However, following this it will choose
zap-1.0 (to satisfy foo-2.0) and zap-2.0 (to satisfy bar-1.1). The intent
to install different versions of the same package is a conflict and, in the
case of what I am calling simple dependency resolution, the system will
print an error and terminate.

Note that this need not occur when two packages both specify a version
number. It is enough that a single package does so. Suppose that
dependency resolution is done depth first (so that bar-1.1's dependencies
are resolved before the second dependency of foo-2.0). If that dependency
of bar-1.1 simply specifies zap, then zap-2.0 will be chosen by the simple
dependency resolver. This will also cause some resolvers to declare an
error and exit.
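A compact sketch of that "first found wins, error on a second pin" behaviour, using the foo/bar/zap example above as a made-up index (names and data structures are purely illustrative):

# Hypothetical index: project -> newest-first list of (version, dependencies).
INDEX = {
    "foo": [("2.0", ["bar", "zap==1.0"]), ("1.0", [])],
    "bar": [("1.1", ["zap==2.0"]), ("1.0", [])],
    "zap": [("2.0", []), ("1.0", [])],
}

def pick(name, pin=None):
    """Greedily take the newest version, or the pinned one if a pin is given."""
    for version, deps in INDEX[name]:
        if pin is None or version == pin:
            return version, deps
    raise RuntimeError("no candidate for %s==%s" % (name, pin))

def resolve(requirements):
    chosen = {}                        # project -> version selected so far
    queue = list(requirements)
    while queue:
        name, _, pin = queue.pop(0).partition("==")
        version, deps = pick(name, pin or None)
        if name in chosen and chosen[name] != version:
            # Two different versions of the same project were selected:
            # simple dependency resolution just reports the conflict and stops.
            raise RuntimeError("conflict: %s %s vs %s" % (name, chosen[name], version))
        chosen[name] = version
        queue.extend(deps)
    return chosen

print(resolve(["foo"]))    # raises: conflict: zap 1.0 vs 2.0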


@guettli guettli pushed a commit to guettli/virtualenv-for-everybody that referenced this issue Apr 20, 2015
Thomas Güttler New paragraph: pypa/pip#988 dee6805
@qwcode
Contributor
qwcode commented Apr 22, 2015

to be clear, I've seen 2 uses of "back-tracking" in the discussions.

the first is what @JustinCappos mentions: back-tracking to try alternative versions when a conflict is discovered.

the second is simply backtracking to re-walk dependencies as additional constraints are discovered. For example, if you start with "A>=1" but then later discover the constraint "A<2", you may need to re-walk the dependencies of A if the new constraint results in a new version of A.

the second type of "back-tracking" would be used in "simple dependency resolution", where you're still only reporting conflicts and not attempting the first type of "back-tracking" to fix the conflict.

@JustinCappos

I'm not sure that these situations are completely disjoint. As I see it,
the constraints depend on a specific version of a package because they come
from that package's metadata. For example, it is B-1.1 that depends on
A>=1, not B, because B-1.0 may have no dependency on A at all.


@qwcode
Contributor
qwcode commented Apr 22, 2015

not saying they're disjoint, just that "backtracking" (in the 2nd sense) can occur in a "simple dependency resolver". In both cases, it's fundamentally the same process, just for different purposes.

@dstufft
Member
dstufft commented Apr 22, 2015

https://github.com/CocoaPods/Molinillo is something (written in Ruby) that perhaps we could look at for inspiration as well.

@Ivoz
Member
Ivoz commented Apr 29, 2015

@JustinCappos if you read 0install's summary again, I believe it demonstrates an easy way to solve the "get confusing/random choices out of a SAT solver" problem.

@JustinCappos

Thanks for sending this. I'd seen the unoptimized version before from the
OPIUM paper with the version number constraints, but hadn't seen this
tweak. (An aside: it is interesting to me is that the rationale behind the
tweak mostly seems to be to improve the performance, but that it had a side
effect where it made the package selection behavior more predictable to the
end user.)

To summarize, the modified algorithm works as follows:

Take the package you want to resolve and find the highest version number
you can resolve using the SAT solver. Choose that package. Take the first
dependency of that package and assume the package you already selected must
be installed and find the highest version number you can resolve using the
SAT solver. Choose that package. (Continue this process recursively in a
depth first manner, solving for all packages.)

What is interesting to me here is that the 0install folks came up with a
very similar solution to backtracking dependency resolution (which uses
recursion and performs package selection in exactly the same manner). The
difference is that the backtracking dependency resolution algorithm
determines that packages cannot be resolved by trying candidates itself
(and then backtracking), while the 0install solution leverages a SAT solver
to find that candidates will not be resolvable (and thus avoiding them).
As I understand it, both algorithms will come up with the exact same
solution in every case.

So, with that in mind, I suppose it is worth looking at other factors.
Performance and maintainability spring to mind. It would not surprise me
to learn that the SAT solver optimization is faster at pruning than
backtracking dependency resolution. I don't know if pip already requires a
SAT solver. Also, I don't know what performance is likely / acceptable for
either solution.

I have a student, Bing, who is planning to look into the dependency
resolution options over the summer. We can look at 0install's
implementation as well and provide some numbers to help people to compare.

Does anyone still want to advocate for the OPIUM-style SAT solving? From
the 0install post, it seems likely to be both slower and less
understandable than what they implemented. So I would think that it would
not be likely to be selected for inclusion. (If there is little interest
in it, we will prioritize results for the other algorithms instead.)


@dstufft
Member
dstufft commented Apr 29, 2015

Honestly, I don't much care what the actual solution is as long as it will give us a predictable and "ideal" (where ideal == the latest acceptable version that matches all constraints) set of things to install. A constraint we have is that we won't know the full list of dependencies up front; ideally, in the future, more and more of them will be available via an API provided by Warehouse, but for a long time the vast bulk are going to require downloading the files from PyPI and inspecting them (possibly even executing some Python) to determine what the dependencies are.

This says to me that whatever system we use needs to be able to handle incrementally adding new constraints into it and ideally it should attempt to minimize "wrong" guesses because each attempt is high cost for us. This sounds a lot more like the backtracking solution that @JustinCappos mentioned than a traditional SAT solver, but this is really an area I don't have a ton of foundational knowledge in so I'm personally figuring out the "right" answers as I go along.

@guettli
guettli commented Apr 30, 2015

Me too. I think the resolver algorithm should be replaceable. For the first step a simple one is enough. Later, the details could be improved. Optimize later.

For me the big question is: where does the resolving happen? At the client or at the server?

And the next question, which nobody has been able to answer for me so far: how to handle the "kind" attribute of the dependency. Related DB schema of warehouse: https://github.com/pypa/warehouse/blob/54c67bab2b150acad1267d4d0a4b32426adaa62d/warehouse/legacy/tables.py#L471-L485

I think it would be good to keep it simple: remove the "kind" attribute. If a package "foo" has several kinds, create N packages from one git/hg repo: "foo-dev", "foo-with-fancy-feature", ...

Do RPM/DPKG have dependencies which have a "kind" attribute?

Somewhat related: the OpenStreetMap database schema evolved a lot during its first years. Guess what was done? It was simplified: fewer tables, fewer attributes. Special cases were migrated to simple, common graph structures.

@dstufft
Member
dstufft commented Apr 30, 2015

Resolving will happen client side, and the KIND is one of:

  • requires
  • provides
  • obsoletes
  • requires-dist
  • provides-dist
  • obsoletes-dist
  • requires-external
  • project-url.

A lot of these don't make sense, or only make sense in a legacy way; in reality you're only really dealing with something that says it requires certain things, something that says it provides a certain package, and something that says it obsoletes something.

@rbtcollins
Contributor

Ok so, this issue has gotten long and a bit academic. The tl;dr in my opinion is:

  • erroring and stopping isn't very useful.
  • just unioning can only handle the simplest of cases, and certainly none of the ones that the projects I'm involved with (which care about this) care about.
  • I don't care about the implementation approach: backtracking, formulate as SAT, random walk, simulated annealing... meh. We're unlikely to get it right the first time, because PyPI is so many orders of magnitude harder than distro packaging resolution that it's simply not funny.
  • we need something inside pip, because this has to kick in any time conflicts exist with installed requirements or even potentially user pre-stated constraints (there are issues for these; I don't have them offhand, sorry).
  • I've written a backtracking resolver inside pip and am in the final stages of coercing the surrounding refactorings I did to let all the existing tests pass, which will then lead into making the code layout and structure actually pleasant (it's not too bad right now, but that's only relative to what came before....)

Thoughtful comments and kibbitzing on my PR (or an alternative PR that shows a better way) appreciated and solicited. Even just testing it out to see if it eats your dog would be helpful to me. Thanks!

@qwcode
Contributor
qwcode commented May 14, 2015

erroring and stopping isn't very useful.

I think users would prefer an error over what happens now, where conflicts are quietly ignored. With an error, a user could respond and add additional constraints to solve the conflict

just unioning can only handle the most simple of cases

a lot of the cases I've seen in the pip tracker would have been solved with a simple AND of the additional constraints. So that I understand: can you explain the cases you care about?

@rbtcollins
Contributor

Resolving << erroring << today, agreed. But since I have a working resolver branch, I think debate is largely academic and uninteresting at this point. Unless someone wants to contend that erroring << resolving << today.

Unioning fails on transitive situations.

Simple cases like the following work with unioning: A depends on B and C >=1.0. B latest release depends on C >= 1.2. Today you get 1.0 and it's all broken.

Unioning fails on situations like this: A depends on B and C >=1.0, <2.0 (e.g. ~= 1.0). B latest release depends on C>=2.0. B's prior release depends on C with some specifier that permits a C <2.0.
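For illustration, a minimal backtracking sketch over that second example, with concrete version numbers invented where the example leaves them open (this is not the algorithm in the actual PR):

from packaging.requirements import Requirement
from packaging.version import Version

# A needs B (any) and C>=1.0,<2.0; B's latest needs C>=2.0; B's prior
# release is happy with an older C. All version numbers are hypothetical.
INDEX = {
    "A": {"1.0": ["B", "C>=1.0,<2.0"]},
    "B": {"2.0": ["C>=2.0"], "1.0": ["C>=1.0"]},
    "C": {"2.1": [], "1.4": [], "1.0": []},
}

def candidates(req, chosen):
    """Versions of req's project that satisfy req and any existing pin."""
    versions = ([chosen[req.name]] if req.name in chosen
                else sorted(INDEX[req.name], key=Version, reverse=True))
    return [v for v in versions if req.specifier.contains(v)]

def resolve(reqs, chosen=None):
    chosen = dict(chosen or {})
    if not reqs:
        return chosen
    req, rest = Requirement(reqs[0]), reqs[1:]
    for version in candidates(req, chosen):               # newest first
        deps = INDEX[req.name][version]
        result = resolve(deps + rest, dict(chosen, **{req.name: version}))
        if result is not None:
            return result                                  # this choice worked
    return None                                            # dead end: caller backtracks

print(resolve(["A"]))
# {'A': '1.0', 'B': '1.0', 'C': '1.4'} -- B 2.0 is tried first, its C>=2.0
# clashes with A's C<2.0, so the resolver backs up and takes B 1.0.

The extra passes over B and C are the price of the fix; a single-pass union simply stops at the clash.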

@glyph
glyph commented May 14, 2015

@rbtcollins - I am a little confused by the direction of those less-than signs. It reads like you're saying that resolving is worse than erroring is worse than what we have today, but then everything else sounds like you're saying the opposite of that? Would you mind clarifying just a bit?

@rbtcollins
Contributor

Read as evolution not less than. ETIRED, not being all that clear :)

@parkouss parkouss referenced this issue in mozilla/mozilla_ci_tools May 14, 2015
Closed

Do not fix libraries versions in the setup.py #188

@qwcode
Contributor
qwcode commented May 14, 2015

A depends on B and C >=1.0, <2.0 (e.g. ~= 1.0).
B latest release depends on C>=2.0.
B's prior release depends on C with some specifier that permits a C <2.0

I wouldn't describe this as "Unioning fails".
Unioning, as a thing, is not the problem.
The "failure" is that a simple single-pass approach can result in a conflicting union.
With backtracking, it takes more passes to find a non-conflicting union.

@JustinCappos

I'm not sure that I would say that resolving is better than erroring is
better than today.

A different way to look at "erroring" versus "resolving" is to consider
what a conflict is. It is likely that a developer was confused when
packaging software, possibly because different developers did not
synchronize their dependency requirements. They certainly didn't test the
current combination of software described by the requirements on the
repository. As a result, at best, the requirements as specified are
untested, if not outright incorrect.

Now, the question is, what should be done about this issue? Many (most?)
Linux distributions choose the "error" approach because they want a
developer to be aware of the issue, look at it, and fix it. The
alternative "resolve" solution tries to have the package manager decide
what to do without user or developer interaction.

I'm not sure whether erroring is better or worse than resolving. To try to
answer this question a student of mine is starting to look quantitatively
at the problem and the steps that may make sense moving forward. We're
very interested in checking out your PR and seeing how your solution will
compare with others!

Thanks,
Justin


@remram44
Contributor

Interesting resource: https://people.debian.org/~dburrows/model.pdf

Implementing the full backtracking algorithm shouldn't be that much work; this is a solved problem anyway, and people implement much more complex algorithms in Python every day. However, I don't know how much metadata pip can get at this time. If getting the dependencies for a version of package B involves downloading the tarball and running the setup.py, backtracking to find a good solution suddenly becomes really slow.

On another point, if we're discussing doing this right, we probably need "conflict" specifiers as different libraries might provide the same interface (i.e. same top-level package name). This doesn't really add difficulty to the resolution algorithm, but packages would need a way to specify this.

Of course the pip/setuptools/distutils situation is not helping 😢

@rbtcollins
Contributor

@JustinCappos w.r.t. distro packaging: apt-using distros (Debian, Ubuntu, etc.) don't error - they resolve. I don't know what metric you're using for 'most' - # of distinct packaging systems, # of product OSs using a given system (so weighted by vendors), or # of users (so weighted by market success) - but I'm pretty sure apt has a massive dominance if you weight by users, and is still very large weighted by vendors.

@remram44 - Daniel's paper is less relevant to pip than it may seem. Certainly the translation to SAT is in common, but distros aggressively trim the space - there are typically at most 5 versions of any one package being chosen amongst (missing, last-release-of-distro, last-release-of-distro-security, new-release, new-release-of-distro) - and only 3 versions for installation runs being done within a release. Whereas we have dozens or hundreds of versions. Not having conflicts is an advantage in many ways, though I do agree we may need them. We probably want to be a bit conservative about that though. My branch implements full backtracking with pseudo-random probing (which IIRC has been shown to support getting out of bad brute-force scans efficiently), and linear best-first probing of a given dep. I'll be super enthusiastic to see folk putting forward different resolution algorithms: the resolver is 100 lines of code; the rest of my branch is refactoring to deal with the reality of having numerous versions which we need to cache on disk while we're choosing (because we can revisit a given version many times - hundreds or thousands easily).

@grahamc
grahamc commented Jun 3, 2015

I think users would prefer an error over what happens now, where conflicts are quietly ignored. With an error, a user could respond and add additional constraints to solve the conflict

+1, I can't count the number of times invalid dependencies have been put together because an invalid situation was ignored.

@cazino cazino added a commit to scality/ScalitySproxydSwift that referenced this issue Jul 10, 2015
@cazino cazino Can't have mock dependency both in test-requirements.txt and tox.ini,
otherwise pip crashes with 'Double requirement given' error.
That might be fixed in a future version of pip :
see pypa/pip#56
and pypa/pip#988
d7fd934
@cazino cazino added a commit to scality/ScalitySproxydSwift that referenced this issue Jul 10, 2015
@cazino cazino Packages update and install build-essential on Ubuntu
Add swit-2.3.0 target

Avoid to install anything except flake8 which is the only thing needed in the pep8 env
(this is how Swift does it : https://github.com/openstack/swift/blob/49e7c25876cf4d3bc9412443f881e8472e88e827/tox.ini)

Mock 1.1 (latest version as the time of writing) dropped py26 compatibility.
Can't have mock dependency both in test-requirements.txt and tox.ini,
otherwise pip crashes with 'Double requirement given' error.
That might be fixed in a future version of pip :
see pypa/pip#56
and pypa/pip#988
b8f4027
@SMesser
SMesser commented Sep 3, 2015

In the cases where a "fail-fast" approach is taken, there should be more detailed messages about the conflicting requirements. Such messages should list the conflicting requirements and possibly suggest workarounds to whatever extent possible. Current messages list the package whose installation failed, and only one of the requirements, e.g. "pkg_resources.ContextualVersionConflict: (requests 2.5.1 (/usr/local/lib/python2.7/dist-packages), Requirement.parse('requests<2.5.0,>=2.2.1'), set(['docker-py']))".

Clearly docker-py is involved, but the other, conflicting packages are not shown. As a developer, my reaction to this message is "Okay, I understand requests 2.5.1 is what is currently installed, and I understand that docker-py doesn't like requests 2.5.1, but I'm installing docker-py. Why won't pip install what it needs?"

@srkunze
srkunze commented Oct 14, 2015

I think users would prefer an error over what happens now, where conflicts are quietly ignored. With an error, a user could respond and add additional constraints to solve the conflict

+1, I can't count the number of times invalid dependencies have been put together because an invalid situation was ignored.

I agree. Errors should never pass silently.

We would love a solution to this issue: also cf. #3183

@xyzza
xyzza commented Nov 16, 2015

Fellows, may we use the NuGet v3 dependency resolution approach? It's better than nothing, isn't it?
It seems to be viable.

@Juanlu001

I think this is what the conda guys use: https://github.com/ContinuumIO/pycosat

@Ivoz
Member
Ivoz commented Feb 2, 2016

@Juanlu001 unfortunately it's not pure python

@erikrose erikrose added a commit to erikrose/letsencrypt that referenced this issue Mar 3, 2016
@erikrose erikrose Require setuptools>=1.0 in all our packages.
When pip-installing any of our packages, pip hits our permissive, any-version "setuptools" dependency first and then ignores all subsequent, more constrained ones, like cryptography's "setuptools>=1.0". See pypa/pip#988. It thus, on a box with setuptools 0.9.8, sticks with that version. Then, at runtime, letsencrypt crashes because pkg_resources can't satisfy cryptography's setuptools>=1.0 requirement.

This change lets us pip-install our packages and have it work. We'll need to make sure our direct requirements (all of them) satisfy the more constrained requirements of our dependencies. Yes, it is disgusting.
6d8cd24
@erikrose erikrose added a commit to erikrose/letsencrypt that referenced this issue Mar 3, 2016
@erikrose erikrose Require setuptools>=1.0 in all packages that use the cryptography lib.
When pip-installing any of these packages, pip hit our permissive, any-version "setuptools" dependency first and then ignored all subsequent, more constrained ones, like cryptography's "setuptools>=1.0". See pypa/pip#988. It thus, on a box with setuptools 0.9.8, stuck with that version. Then, at runtime, letsencrypt crashed because pkg_resources couldn't satisfy cryptography's setuptools>=1.0 requirement.

This change lets us pip-install our packages and have it work. We'll need to make sure our direct requirements (all of them) satisfy the more constrained requirements of our dependencies. Yes, it is disgusting.
55b63fc
@DemiMarie

One issue is that any pure-python solution will be slow, except perhaps if we are running on PyPy.

However, it seems to me that the better solution is to allow the same package to have two different versions installed, eliminating conflicts.

@FichteFoll

allow the same package to have two different versions installed, eliminating conflicts.

Due to how importing in Python works, I believe this to actually be possible by providing a dynamic Loader for sys.meta_path, which checks for the package's specified dependencies in order to load the correct version, and enabling that for all modules/packages installed via pip. I do not know how one would accomplish a more or less semi-permanent modification of sys.meta_path however.
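A very rough sketch of what such a hook could look like, assuming a completely hypothetical layout where each version of a package is installed into its own directory and some earlier resolution step has produced a version pin per package:

import sys
from importlib.machinery import PathFinder

# Hypothetical layout: /opt/multiversion/<package>/<version>/<package>/...
MULTIVERSION_ROOT = "/opt/multiversion"

# What a runtime resolver would have to produce: one pinned version per package.
PINS = {"liba": "1.4"}

class VersionedFinder:
    """A sys.meta_path hook that redirects a package import to its pinned version."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in PINS:
            return None   # not ours; submodules resolve via the package's own __path__
        versioned_dir = "%s/%s/%s" % (MULTIVERSION_ROOT, fullname, PINS[fullname])
        # Reuse the normal path-based import machinery, restricted to that directory.
        return PathFinder.find_spec(fullname, [versioned_dir])

# The "semi-permanent modification of sys.meta_path": install the hook ahead of
# the default finders, e.g. from a .pth file or sitecustomize.
sys.meta_path.insert(0, VersionedFinder())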

@rbtcollins
Contributor

@drbo the complexity is O(n^e) - so exponential in the size of the graph: CPython performance vs pypy vs C is linear. Further the constant cost of examining a package is high (network transfer of the archive), so yeah - Python performance is insignificant.

@Ivoz
Member
Ivoz commented Apr 7, 2016

However, it seems to me that the better solution is to allow the same package to have two different versions installed, eliminating conflicts.

Not great.

What happens when liba v1.2 emits a dictionary with 3 fields, but libg requires v1.4 of liba which has added an extra field in the dictionary? If these are all allowed to mix in the same runtime, then generally a developer will bump into cryptic errors after doing a normal update of dependencies. Telling them you just won't run two different versions of the same library at the same time eliminates this class of runtime bugs.

@RonnyPfannschmidt
Contributor

also python does not support importing more than one version of the same code

@remram44
Contributor
remram44 commented Apr 7, 2016

What happens when liba v1.2 emits a dictionary with 3 fields, but libg requires v1.4 of liba which has added an extra field in the dictionary?

The approach here wouldn't be to have each library carry a private copy of its dependencies, bringing the kind of madness only the Node.js world could tolerate, but to do dependency resolution across the installed package versions when a program is started. I believe this is the setuptools way; this is why the setuptools-generated entry scripts look like this:

#!python
# EASY-INSTALL-ENTRY-SCRIPT: 'reprounzip==1.0.4','console_scripts','reprounzip'
__requires__ = 'reprounzip==1.0.4'
import sys
from pkg_resources import load_entry_point

if __name__ == '__main__':
    sys.exit(
        load_entry_point('reprounzip==1.0.4', 'console_scripts', 'reprounzip')()
    )

The point of pkg_resources was always to do all kinds of path/packages manipulation but most of that was never widely used.

@DemiMarie

One approach would be to isolate multiple versions from each other in the
same process, but mutable globals keep that from working in Python. It
might work in Haskell or Rust, but not in Python.

@dstufft
Member
dstufft commented Apr 7, 2016

What pkg_resources does doesn't actually remove the need for a dependency resolver; it just pushes the need from install time to runtime (e.g. if you dynamically adjust sys.path to add the needed version, you have to resolve which one of the installed versions you should add, taking into account all of the version specifiers that may be required). This is a bit of a narrower field, since you're unlikely to have every version of every library installed, but on the flip side it means that we inflict our own dependency resolver on other packaging systems that otherwise have no need for it (like when installing with apt-get), never mind the fact that pkg_resources is incredibly slow because it has to issue a ton of stat calls and traverse the entire sys.path looking for installed files, a cost that you pay for every script that uses entry points.

@leohowell

A similar issue to the one @dracos reported:

  1. A==1.0 depends on B==1.0 and C (no specific version)

    B==1.0 depends on C==1.3

  2. In a Python environment that already has package C==1.0 installed.

  3. After installing A:
    A==1.0

    B==1.0

    C==1.0

  4. C won't be updated to 1.3

@odyssey4me

Just to add to the conversation, and for reference to anyone else finding this issue, if you specify a pin in requirements and something else in constraints, the requirement will be ignored and only the constraint used. I've shown an example here: https://gist.github.com/odyssey4me/d8acc9888cc206818e21e059f28b3576

@dstufft
Member
dstufft commented Dec 15, 2016

#4182 has an interesting sub-problem, specifically related to build dependencies.

@xoviat
xoviat commented Jan 24, 2017

Enthought's solver has been moved to enthought/sat-solver and has advanced significantly.

@xoviat
xoviat commented Feb 10, 2017

@westurner The main blocker is that requirements cannot be downloaded reliably from pypi.

@westurner

@xoviat

@westurner The main blocker is that requirements cannot be downloaded reliably from pypi.

In terms of? https://status.python.org/

Would it be helpful to cache the catalog [JSON] dependency metadata?

@xoviat
xoviat commented Feb 10, 2017

Would it be helpful to cache the catalog [JSON] dependency metadata?

Now you are onto it. What if pip downloads a package and then discovers that it downloaded the wrong package? This may happen hundreds of times with a dependency solver enabled. The only reliable solution is to be able to fetch metadata without downloading the entire package. But currently the API that can be used to do this doesn't work half the time (possibly exaggerating, but still generally correct).

@xoviat
xoviat commented Feb 10, 2017

To be clear, the API "doesn't work" not because it's "down" but because it only works for particular packages.
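For anyone following along, the metadata-without-download route being discussed is (today) PyPI's JSON API; a minimal sketch, noting that requires_dist frequently comes back empty or null for older uploads, which is the unreliability being described:

import json
from urllib.request import urlopen

def declared_dependencies(name, version):
    """Fetch a release's declared dependencies without downloading the package."""
    url = "https://pypi.org/pypi/%s/%s/json" % (name, version)
    with urlopen(url) as response:
        info = json.load(response)["info"]
    # Often None: many uploads never provided this metadata, so a resolver
    # would still have to fall back to downloading and inspecting the package.
    return info.get("requires_dist")

print(declared_dependencies("requests", "2.5.1"))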

@cournape

One issue is that any pure-python solution will be slow, except perhaps if we are running on PyPy.

We did not investigate substantially, but in our tests, we found our own simplesat to be ~ as fast as conda's solver using picosat. It is definitely too slow for cases where you have 1000s of packages installed, but those are rather rare I think.

@xoviat
xoviat commented Feb 10, 2017

The issue is definitely not the SAT solver.

@pradyunsg
Contributor

Also some good notes on sat solving on their page, and an awesome email.

@Ivoz The link to the email is broken. Is there some other place I could find it?
