Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topological sorting of subrequirements #2478

Closed
taion opened this issue Mar 4, 2015 · 17 comments
Closed

Topological sorting of subrequirements #2478

taion opened this issue Mar 4, 2015 · 17 comments
Labels
auto-locked Outdated issues that have been locked by automation

Comments

@taion
Copy link

taion commented Mar 4, 2015

As evidenced by #2260 and #2473, the behavior where pip installs packages as a function of the order specified, even when some packages depend on other packages, is very fragile, and occasionally leads to the output of a pip freeze not being valid as input to a pip install -r.

The most straightforward example I can think of is h5py. If I install h5py in a clean environment, my frozen requirements look like:

Cython==0.22
h5py==2.4.0
numpy==1.9.2

However, if I try to pip install -r from these requirements, the current sorting behavior makes these packages install in the order

  1. numpy
  2. h5py
  3. Cython

However, h5py cannot be installed without Cython already being installed first (and Cython is specified in install_requires).

I believe the correct way to handle this is for pip to do a topological sort of the requirements, and ensure that subrequirements are installed before their parent requirements, regardless of the order in which requirements are specified.

@piotr-dobrogost
Copy link

Possible duplicate of #988

@taion
Copy link
Author

taion commented Mar 4, 2015

@piotr-dobrogost

I think this might be an interesting subset of the more general problem.

If I'm not misreading #988, the biggest portion of the work required there is the SAT solver for finding the specific versions of packages that satisfy the requirements given.

However, in this case, when using pip freeze and pip install -r, we've already taken care of that problem by pinning the versions. All that is required is a simple topological sort of the requirements to ensure that they install in an order that works.

I believe this can be tackled separately from the broader issues of dependency version resolution mentioned in #988, and it would solve a number of issues right now where pip freeze gives me output that can't be directly fed into pip install -r.

@piotr-dobrogost
Copy link

You are right.

@dstufft
Copy link
Member

dstufft commented Mar 4, 2015

What's the order that works? Generally you should not depend on the order of install_requires in order for your installation to work or not.

@taion
Copy link
Author

taion commented Mar 4, 2015

@dstufft

To clarify, in my example, both numpy and Cython are in install_requires for h5py. Any order that installs numpy and Cython before installing h5py would work in this case.

The current logic does the correct thing if I do e.g. pip install h5py. The problem is that if I do e.g. pip install numpy h5py, then the current ordering logic will try to install h5py before installing numpy, and fail to install h5py.

What I want is that subrequirements under install_requires are installed (in any order) before the requirement itself, regardless of whether they are top-level arguments to pip install, or of their ordering in the requirements.txt file.

@dstufft
Copy link
Member

dstufft commented Mar 4, 2015

In particular, in the documentation for setuptools, the order of installation is unspecified. It says:

When your project is installed, either by using EasyInstall, setup.py install, or setup.py develop, all of the dependencies not already installed will be located (via PyPI), downloaded, built (if necessary), and installed.

It doesn't state whether things will be installed before, after, or concurrently. If you depend on something while the setup.py is executing then you want setup_requires not install_requires.

@taion
Copy link
Author

taion commented Mar 4, 2015

The current pip code already makes an attempt to install subrequirements before parent requirements, though. It shouldn't be more than a few dozen lines of code in req_set.py to add proper topological sorting of all dependencies.

It's not the case that I need numpy and Cython installed for setup.py for h5py to run. They are exactly requirements for building and installing the package.

The concrete issue is that, as it stands, in cases where packages require other packages for installation, the output of pip freeze > requirements.txt cannot just be fed into pip install -r requirements.txt. Instead you would have to do something like cat requirements.txt | xargs -n1 pip install, which feels silly.

I think it would be a general improvement for RequirementSet to properly order the requirements.

@taion
Copy link
Author

taion commented Mar 4, 2015

Specifically, this is breaking common use case 1 of requirements files here: https://pip.pypa.io/en/latest/user_guide.html#requirements-files

setup_requires would make the install work, but it's not really doing the right thing; if I'm doing an installation, all I need is for the install_requires packages to be installed first. It's really not ideal to e.g. install numpy once through setup_requires, only to then have to install it again properly.

@taion
Copy link
Author

taion commented Mar 4, 2015

I admit I'm a bit confused by the setuptools docs on setup_requires:

A string or list of strings specifying what other distributions need to be present in order for the setup script to run. setuptools will attempt to obtain these (even going so far as to download them using EasyInstall) before processing the rest of the setup script or commands. This argument is needed if you are using distutils extensions as part of your build process; for example, extensions that process setup() arguments and turn them into EGG-INFO metadata files.

For h5py, I don't need Cython or numpy for setup.py egg_info, as suggested above. I do need them for setup.py build, though.

The thing that's really not ideal about putting Cython and numpy in setup_requires is that it makes me build both Cython and numpy twice, which makes the installation take almost twice as long as it otherwise would.

@dstufft
Copy link
Member

dstufft commented Mar 4, 2015

install_requires are not build time requirements and you can't treat them as such. It'll work in a few cases but won't work in others. For example, if you treat install_requires as build requirements, then pip wheel won't work either (nor would setup.py build or setup.py bdist_wheel). Try checking out the h5py code in a clean environment and running setup.py build without first installing cython.

The unfortunate thing is that setuptools doesn't really have "build requirements", the closest thing to them is "setup requirements". There are some hacks that you can do to solve some of the problems (like it being required for setup.py egg_info (see for example, https://github.com/pyca/cryptography/ setup.py). It's not a great situation but it's the situation we're in currently. There are plans to add proper build requirements (as well as to make it easier to have binary builds, and caching binary builds locally to prevent things like double compiles).

The change in ordering was to attempt to install dependencies first, primarily as a hackish way of preventing a problem like pip install h5py from successfully installing h5py, but then failing to install numpy (perhaps because of a missing python-dev header or whatever), fixing that, and then doing pip install h5py again and being told that h5py is already installed. This isn't perfect (as you noticed, you can "trick" it by having two mentions of numpy) but it was a rough approximation until we get something that can build a proper dependency graph.

That being said, I didn't close this issue, because the solution you want would make it better for the case that the current order is attempting to solve, I'm mostly just trying to be clear here that you shouldn't rely on the order of install_requires for what you're doing because the chances are in the future it's going to break if we do anything like parallizing builds.

@taion
Copy link
Author

taion commented Mar 4, 2015

That makes sense. Thank you for clarifying the issues with install_requires. I agree that the solution I propose is not ideal, but I think it produces a pretty good compromise in the near-term that will address some current pain points.

I'll get a PR together when I get a chance. I look forward to the more strategic solutions being deployed in the future.

@jluttine
Copy link

Is there any workaround for this? I have a package which requires, for instance, h5py and numpy. But "pip install mypackage" fails on h5py installation because "no module named numpy". I don't understand why numpy isn't installed before h5py. Any ideas what I can do to my setup.py to get this working?

rbtcollins added a commit to rbtcollins/pip that referenced this issue Mar 25, 2015
This is needed for setup-requires, since without it its possible
to cause installation to fail in sort-circuit scenarios such as
the added functional test case demonstrates.
@rbtcollins
Copy link

Hi jluttine, you should make sure h5py install_requires numpy and that your setup.py can run without numpy installed. If your setup.py cannot run without numpy installed, then you should setup_requires on numpy (and you may need some lazy-import glue to permit that to work at all) - see the discussion on #2603 for more info on that.

rbtcollins added a commit to rbtcollins/pip that referenced this issue Mar 25, 2015
This is needed for setup-requires, since without it its possible
to cause installation to fail in sort-circuit scenarios such as
the added functional test case demonstrates.
rbtcollins added a commit to rbtcollins/pip that referenced this issue Mar 25, 2015
This is needed for setup-requires, since without it its possible
to cause installation to fail in sort-circuit scenarios such as
the added functional test case demonstrates.
@jluttine
Copy link

rbtcollins, thanks for the comment. Yes, h5py install_requires numpy and my setup.py can run without numpy installed. Having numpy in setup_requires for my package didn't help the installation of h5py, because h5py didn't have numpy in setup_requires. Thus, installing my package fails when installing h5py requirement (if numpy isn't installed manually beforehand). I didn't check if this has now changed. Anyway, my understanding is that there's not much I can do: either h5py should fix its setup.py or pip should behave differently. But of course, any workaround for my package would be great. And I'll take a closer look on this again.

rbtcollins added a commit to rbtcollins/pip that referenced this issue Mar 25, 2015
This is needed for setup-requires, since without it its possible
to cause installation to fail in sort-circuit scenarios such as
the added functional test case demonstrates.
@rbtcollins
Copy link

You might as a workaround be able to specify numpy on the command line before your package, but fundamentally the bug here is in h5py: if it requires numpy to run setup.py, it needs to setup_requires it.

rbtcollins added a commit to rbtcollins/pip that referenced this issue Mar 25, 2015
This is needed for setup-requires, since without it its possible
to cause installation to fail in sort-circuit scenarios such as
the added functional test case demonstrates.
rbtcollins added a commit to rbtcollins/pip that referenced this issue Mar 25, 2015
This is needed for setup-requires, since without it its possible
to cause installation to fail in sort-circuit scenarios such as
the added functional test case demonstrates.
rbtcollins added a commit to rbtcollins/pip that referenced this issue Mar 29, 2015
This is needed for setup-requires, since without it its possible
to cause installation to fail in sort-circuit scenarios such as
the added functional test case demonstrates.
rbtcollins added a commit to rbtcollins/pip that referenced this issue Mar 30, 2015
This is needed for setup-requires, since without it its possible
to cause installation to fail in sort-circuit scenarios such as
the added functional test case demonstrates.
rbtcollins added a commit to rbtcollins/pip that referenced this issue Mar 30, 2015
This is needed for setup-requires, since without it its possible
to cause installation to fail in sort-circuit scenarios such as
the added functional test case demonstrates.
rbtcollins added a commit to rbtcollins/pip that referenced this issue Mar 30, 2015
This is needed for setup-requires, since without it its possible
to cause installation to fail in sort-circuit scenarios such as
the added functional test case demonstrates.
rbtcollins added a commit to rbtcollins/pip that referenced this issue Mar 31, 2015
This is needed for setup-requires, since without it its possible
to cause installation to fail in sort-circuit scenarios such as
the added functional test case demonstrates.
rbtcollins added a commit to rbtcollins/pip that referenced this issue Mar 31, 2015
This is needed for setup-requires, since without it its possible
to cause installation to fail in sort-circuit scenarios such as
the added functional test case demonstrates.
dstufft added a commit that referenced this issue Apr 1, 2015
Issue #2478: Topological installation order
jxtx added a commit to bxlab/hifive that referenced this issue Apr 3, 2015
@dstufft
Copy link
Member

dstufft commented Apr 7, 2015

This should be fixed now.

@dstufft dstufft closed this as completed Apr 7, 2015
@taion
Copy link
Author

taion commented Apr 7, 2015

@dstufft @rbtcollins Thanks for fixing this! Sorry I never got around to sending in a PR myself.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation
Projects
None yet
Development

No branches or pull requests

5 participants