
Install build requirements before calling setup.py #4799

Merged
merged 18 commits into pypa:master from pep518 on Mar 1, 2018

Conversation


@ghost ghost commented Oct 20, 2017

  1. Fix the PEP 518 build environment system. The implementation in master does not comply with the specification because it runs egg_info before the build environment is set up. In addition, it had a circular import, indicating that the code needed to be moved to a separate file.

Solution: move the BuildEnvironment into a new file called backend.py and make it a property of the requirement, so that it persists from the first time setup.py is called until the build directory is cleaned up.
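For illustration, a rough sketch of the shape this takes -- the requirement owns a BuildEnvironment, every setup.py invocation happens inside it, and it is only torn down when the build directory is cleaned up. Names are simplified and this is not the exact merged code; the prefix install mirrors the command used elsewhere in this PR:

import os
import shutil
import subprocess
import sys
import tempfile
from sysconfig import get_paths

class BuildEnvironment(object):
    """Isolated prefix holding one requirement's PEP 518 build dependencies."""

    def __init__(self):
        self.prefix = tempfile.mkdtemp(prefix='pip-build-env-')
        # Where 'pip install --prefix' puts pure-Python packages on this platform.
        self.lib_dir = get_paths(vars={'base': self.prefix,
                                       'platbase': self.prefix})['purelib']

    def install_requirements(self, requirements):
        # Install the pyproject.toml build requirements into the prefix.
        subprocess.check_call([
            sys.executable, '-m', 'pip', 'install', '--ignore-installed',
            '--prefix', self.prefix,
        ] + list(requirements))

    def __enter__(self):
        # Expose the prefix to the setup.py subprocess (simplified: the real
        # code also handles scripts/PATH and any pre-existing PYTHONPATH).
        self._saved = os.environ.get('PYTHONPATH')
        os.environ['PYTHONPATH'] = self.lib_dir
        return self

    def __exit__(self, *exc):
        if self._saved is None:
            os.environ.pop('PYTHONPATH', None)
        else:
            os.environ['PYTHONPATH'] = self._saved

    def cleanup(self):
        # Called when the requirement's build directory is cleaned up.
        shutil.rmtree(self.prefix, ignore_errors=True)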

Notably, the first implementation of PEP 518 also did not account for recursion (building build requirements that are themselves only available as source), which was completely untested (that implementation disabled no-binary but did not select only-binary). I had a test failure where pip would essentially try to rebuild all of PyPI, so I simply disabled non-binary build requirements for now.

Ideally, we would use a single process with a "build environment manager" that figures out circular build dependencies, but I decided to wait until we have a resolver (and PEP 517 is completed) before attempting that.

Closes gh-4647.

@ghost ghost mentioned this pull request Oct 20, 2017
from pip._internal.vcs import vcs

logger = logging.getLogger(__name__)


def make_abstract_dist(req):
def make_abstract_dist(req_to_install):
Member

I prefer the req spelling; it's shorter and more to the point. Plus, this is noise in the PR diff.

Author

Will do.

@@ -210,6 +210,7 @@ def test_wheel_package_with_latin1_setup(script, data, common_wheels):
assert 'Successfully built SetupPyUTF8' in result.stdout


@pytest.mark.xfail(reason="PEP 518 does not support links")
Member

Wait... Why?

Author

More specifically, this version does not support dependencies that don't have binary (wheel) files available. The reason why is explained in the initial post.

Author

The current implementation potentially allows pip to spawn multiple sub-processes recursively to install build_dependencies. Solving that problem is too complex until we have the resolver, and the practical benefits will be realized with this implementation (think numpy, which depends on Cython, which has wheels available).

Member

More specifically, this version does not support dependencies that don't have binary (wheel) files available.

OK... so this version isn't mergeable because it doesn't support scenarios pip currently supports? Am I understanding that right? Or are you saying "pip has a bug in situation X", in which case let's fix that bug first rather than proposing code which stops pip working at all in situation X. If necessary, submit an isolated PR that says "pip has a catastrophic bug in situation X - detect that situation and refuse to proceed". Then once that's in, this PR can simply ignore situation X, leaving it as a future extension.

This is another example where you're including too much in one PR. You need to make your PRs much more granular. And yes, that does mean that work on later stages has to wait till earlier stages are thrashed out and merged. But that's better than a huge, un-reviewable mess that never gets merged.

Author

this version isn't mergeable because it doesn't support scenarios pip currently supports

let's fix that bug first rather than proposing code which stops pip working at all in situation X.

How can I say this? Situation X is rare. In fact, I know of no actual scenario where situation X would occur. In addition, situation X is only supported because, at the time, people did not consider the full implications of supporting it. Correctly supporting situation X requires either:

  1. An entirely new refactoring and/or rewrite of key areas (the correct approach).
  2. Recursive process spawning (unmonitored and uncontrolled), which is what the current implementation does (this is horrible).

In contrast, there is another situation: situation Y. Situation Y, in contrast to situation X, is widely used and important.

This PR drops the recursive process spawning (very bad), removes support for situation X (which might, at best, ever be used once), and adds support for situation Y (widely used).

But that's better than a huge, un-reviewable mess that never gets merged.

Now that would be implementing situation X correctly; something that even the original implementer avoided.

Author

More specifically, I can rattle off a list of at least three major projects (scipy, scikit-learn, numpy) that would use situation Y and don't care about situation X and quite a few more minor projects. I can think of none that would need situation X.

Member

Apologies by the way for the abstract "X" and "Y". I didn't really follow the actual problem, so I simplified (or not...)

As I say, if X is rare, then a PR that identifies it and explicitly reports a "we don't handle this" error should be fairly uncontroversial. So submit that, get it accepted, and then this PR can ignore that case in safety. By merging the two steps into one PR, we end up muddling the issues. Also, a PR that says "drop support for X" would have to clearly explain what X was, and I'd be able to avoid all these clumsy circumlocutions :-)

Author

So submit that, get it accepted, and then this PR can ignore that case in safety.

Will do. Communication is good.

@pfmoore
Member

pfmoore commented Oct 20, 2017

OK, deep breath.

I just re-read PEP 518, to clarify in my own mind what's going on. The only thing that PEP specifies is that if a project specifies X, Y, and Z as build dependencies in pyproject.toml, then pip has to install X, Y and Z in order to run the build. There's no constraint on how pip installs these dependencies, but our implementation choice was to create an isolated environment containing those dependencies, and run setup.py in there (post-PEP 517, replace "run setup.py" with "run the build tool").

#4647 says that currently, we're running setup.py to execute the egg-info command before we set up the build environment. That's clearly wrong, according to PEP 518, and so it's completely right that we can't release pip claiming PEP 518 support without addressing this.

This PR talks about "PEP 518 problems", but it's not clear which problems it's fixing. Is it an attempt to fix #4647? You also mention recursion. Specifically I understand that to mean that if project X specifies Y as a build requirement, and project Y is only available as source with project Z as a dependency, then PEP 518 requires pip to:

  1. Create a build environment containing Z.
  2. Build Y in that environment, and install it into a build environment...
  3. which is then used to build X.

I don't see how this relates to #4647, which demonstrates its issue with a single project having a build dependency which is available as a binary. So I'm going to assume that the two problems are independent (at least in principle - if the pip code to implement things is messy and difficult to disentangle, then fair enough but that's just the classic "simple matter of coding"...)

I do see that handling recursive builds (and their build environments) is likely to be complex. I don't think we can unilaterally decide to omit that part of the behaviour just because it's hard, though. What I will say is that it's unlikely the authors of PEP 518 had thought much about the recursion situation. I certainly don't remember it coming up in the distutils-sig discussion. And I'm reasonably comfortable with your assertion that situations like this are rare, whereas binary build dependencies are a lot more common. Is it worth limiting the PEP in some way, such as permitting tools to ignore build dependencies that themselves need building? I don't know. Possibly. Maybe @dstufft @njsmith @brettcannon and @ncoghlan as authors/BDFL-delegate of PEP 518 have a view. Maybe it needs to be discussed on distutils-sig.

I wish you'd clearly stated the problem and asked for a discussion, rather than arbitrarily picking an approach and burying it without explanation in a long and complex PR that you'd been repeatedly asked to simplify. If nothing else, I'd have had some time this evening to do other things :-) But I do think it's good that this has surfaced now. We have a chance to make a reasoned decision, which is much better in the long run.

@pfmoore pfmoore mentioned this pull request Oct 20, 2017
@njsmith
Member

njsmith commented Oct 20, 2017 via email

@brettcannon
Member

We purposefully didn't dive into the details of building dependencies, as that's a build-tool decision; the PEP is just a way to specify dependencies, like requirements files, not a specification of how to actually execute a build or install the dependencies.

For me, if you have a build dependency that requires building, then you need to build that build dependency.

@pfmoore
Member

pfmoore commented Oct 20, 2017

Thanks @brettcannon @njsmith that pretty much seals it for me. It's a requirement of the PEP, so we have to implement it if we want to claim we support the PEP.

Sorry @xoviat - looks like we need to support full recursion, even though it's a rare case. (It's a minor consolation, but loops in the build dependency graph would be a packaging error, and we'd be fine to fail the build if we get those).

@pfmoore
Member

pfmoore commented Oct 20, 2017

Are you fine with an infinite loop in that case? Because that's what currently happens

I'm OK with it. I'd rather we managed to be more user friendly than that, but I won't lose sleep over it, personally.

One other corner case that comes to mind (again, possibly related to something you've mentioned elsewhere). If the user specifies --no-binary :all: and we respect that when installing build dependencies, we become much more likely to trigger this sort of issue. But ignoring the flag is arguably just as unreasonable as enforcing --only-binary :all: which was in effect what you were originally proposing here. So I suspect we just have to accept that if the user forces extra builds, they get what they asked for in terms of complexity.

@pfmoore
Member

pfmoore commented Oct 20, 2017

Actually I think circular dependencies can actually deadlock a Windows system with the current pip. Want a PR with a test for that?

I'd prefer a description of how such a problem would get triggered, and an explanation of why we deadlock (I assume you mean "deadlock" rather than "infinite loop" - but what locking do we do?)

I think at this point we need less code, and more discussion and specifications.

@ghost
Author

ghost commented Oct 20, 2017

When I say deadlock, I mean deadlock. You need to reboot.

@ghost ghost closed this Oct 21, 2017
@ghost ghost deleted the pep518 branch October 21, 2017 04:51
@ncoghlan
Member

Note: I'd consider pip still compliant with PEP 518 even if it considered recursive pyproject.toml builds and --no-binary :all: mutually exclusive for now (i.e. the first iteration of the PEP 518 support could require you to allow binary build dependencies in order to manage the bootstrapping problem).

The other possibility to consider is that build systems like Fedora's koji allow for the notion of a shared buildroot: once something is in the buildroot, future builds can just use it, they don't need to rebuild it each time.

For pip, the equivalent to this would be to have a build cache (distinct from the current wheel cache, which may include both downloaded and locally built wheel files). Given a local build cache, even a --no-binary :all: build could still reasonably use those wheel files (since they were built locally, just earlier on).

@pfmoore
Member

pfmoore commented Oct 21, 2017

With each of these builds, a new process is spawned. So the processes multiply fast

Well, I can see one concurrent build per level of recursion. Surely recursion can't go so deep that it's a problem. If you've hit a massive explosion of processes, that sounds like an implementation bug.

For pip, the equivalent to this would be to have a build cache

That sounds like an entirely reasonable approach. I'd certainly hope we wouldn't try to build any given build dependency multiple times.

Note: I'd consider pip still compliant with PEP 518 even if it considered recursive pyproject.toml builds and --no-binary :all: mutually exclusive for now

I'm happy with that - if for no other reason than that with the "assume setuptools as a build dependency if nothing is explicitly specified" principle, it would be awfully easy to accidentally slip into endlessly building setuptools from source in order to build setuptools... I actually suspect it's entirely reasonable to simply say that --no-binary :all: doesn't apply to installation of build requirements, but I don't know the use cases for --no-binary :all: so someone who does should probably validate that idea.

If we wanted to get into "practicality beats purity" discussions, the key point here is that it's likely that there's a distinction to be had between "build systems" (in the PEP 517 sense - setuptools, flit, enscons, ...) and "build dependencies" (like "numpy because setup.py imports it"). It would be reasonable to assume that the former are available as wheels (because otherwise we hit "build the build system" recursions) but less so for the latter. How we'd leverage that distinction (without changing PEP 518 to for example have two types of build dependency) I don't know, though.

@ncoghlan
Member

For the distro package use case, the option we would want is something along the lines of --no-implicit-build-deps (i.e. we'd want pip to let us take care of setting up the build environment, and just complain if anything was missing, rather than trying to fix it automatically).

@ncoghlan
Member

That approach would actually pair nicely with the implicit build dependency installation ignoring the --no-binary :all: setting, as it would push the responsibility back onto the caller to do:

# Make sure you have pip, setuptools & wheel bootstrapped before you start
pip install --no-implicit-build-deps --no-binary :all: [build dependency specifiers]
pip install --no-implicit-build-deps --no-binary :all: [runtime dependency specifiers]

if they really wanted a "build completely from source" setup.

@pfmoore
Member

pfmoore commented Oct 21, 2017

the option we would want is something along the lines of --no-implicit-build-deps

Hmm, I'm not sure that's something we've even considered. I don't know how that fits alongside the idea of isolated builds, actually (if you manually install the build deps, are they then visible in the actual install?). I'd have to defer to someone who understands the build isolation stuff to comment here - I'm getting out of my depth...

@pfmoore pfmoore mentioned this pull request Oct 21, 2017
@ncoghlan
Member

ncoghlan commented Oct 21, 2017

It may help to know how the RPM build process works, since it's a "generate-and-filter" model:

  1. Inside the build root (which is the chroot), you install all the build dependencies
  2. Then you build & install the component itself
  3. Then you take a specified subset of the files inside the build root, and those become the built binary package

So the chroot will have both the build dependencies and all sorts of other intermediate artifacts (object files, helper apps, etc), but those get discarded by the filters that indicate which outputs should end up in which RPMs.

Related to that, we likely wouldn't use Python level isolation for system package builds - we'd need the system site-packages to be available when building wheels, as we'd often be managing the build dependencies with RPM's BuildRequires, rather than the Python level settings (however, we'd still benefit from the Python level settings - we'd just use them to generate the right RPM level settings, rather than using them directly at build time)

@dstufft
Member

dstufft commented Oct 21, 2017

Sounds like setuptools should just have a pyproject.toml with an empty list for build dependencies, and perhaps we should special case setuptools to force that in older setuptools.

@pfmoore
Member

pfmoore commented Oct 21, 2017

I don't understand your point about the finder, no.

And yes, see my point about "build tools" being different from other build dependencies...

@dstufft
Member

dstufft commented Oct 21, 2017

In a released sdist, setuptools does not require anything besides Python to install itself.

@pradyunsg
Member

Also, does anyone else understand what I'm saying about using the finder to resolve dependencies?

I do.

I don't understand your point about the finder, no.

The finder provides a list of the available versions of a package; the current implementation just uses the latest version in that list. Whatever pip does for dependency resolution today, the build environments are set up without going through that entire section of the codebase.
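In other words (a purely hypothetical helper, not pip's actual API), the shortcut being described amounts to:

from packaging.version import parse as parse_version

def pick_build_dependency(available_versions):
    # Take the newest version the finder reports, with none of the
    # dependency resolution a normal 'pip install' would perform.
    return max(available_versions, key=parse_version)

# e.g. pick_build_dependency(['1.0', '1.2', '2.0']) -> '2.0'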

@pradyunsg
Member

pradyunsg commented Oct 21, 2017

if they really wanted a "build completely from source" setup.

@ncoghlan How about this approach -- build wheels for the underlying dependencies from source, put these wheels into a directory, and use --no-index --find-links <that-directory> when building the main sdist into a wheel. This would not need any new options in pip, I think.

I'm just asking. :)
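For what it's worth, a sketch of that workflow using pip's existing CLI (package names and paths are made up for illustration; invoking pip via subprocess simply mirrors how this PR itself shells out):

import subprocess
import sys

wheel_dir = './build-dep-wheels'                # hypothetical local directory
build_deps = ['setuptools', 'wheel', 'cython']  # hypothetical build dependencies

# 1. Build the build dependencies from source into a local wheel directory.
subprocess.check_call([
    sys.executable, '-m', 'pip', 'wheel',
    '--no-binary', ':all:', '--wheel-dir', wheel_dir,
] + build_deps)

# 2. Build the main sdist, resolving its build dependencies only from there.
subprocess.check_call([
    sys.executable, '-m', 'pip', 'install',
    '--no-index', '--find-links', wheel_dir, './path/to/main-sdist',
])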

@pradyunsg
Member

@xoviat I feel it would be nice to keep #4802 for discussing what pip implements for PEP 518; how pip implements it should probably be discussed here. [1]

I'll respond here, to some implementation related points you made there:

Subprocess gets the requirements list exactly as specified. That way it goes through the resolver.

What about command line arguments? (Off the top of my head, at least these are significant: --index-url, --find-links, --verbose and --isolated.) How exactly do you want to go about this? I think we could try to reuse sys.path or options.

The idea is sort of like this: in install.py, you would have a BuildEnvironmentManager:

I guess we can just move these lines into the BuildEnvironmentManager.

That's sort of a major-ish refactor -- something like this will be needed eventually but probably not in pip 10. (I have some ideas about some refactors of the codebase that should make this easier to do later but that's next year stuff)

If we're going with just wheels, my understanding is that we won't need to manage build environments. We can build a new one for every wheel we're building and then trash it -- which is doable in WheelBuilder itself.


[1] I like to segment approach/implementation discussions into separate issues/PRs for discussions with many parties, so that those who are not familiar with the implementation details are not alienated or left speculating. :)
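A self-contained sketch of that "build a new one for every wheel and then trash it" lifecycle (a hypothetical helper, not pip's actual WheelBuilder code; it uses --target rather than the --prefix approach in this PR purely to keep the PYTHONPATH handling simple):

import os
import shutil
import subprocess
import sys
import tempfile

def build_one_wheel(source_dir, build_requirements):
    # Fresh, throwaway environment for this single wheel build.
    env_dir = tempfile.mkdtemp(prefix='pip-build-env-')
    try:
        # '--target' installs the build requirements flat into env_dir,
        # so pointing PYTHONPATH at it is enough for this sketch.
        subprocess.check_call([
            sys.executable, '-m', 'pip', 'install',
            '--ignore-installed', '--target', env_dir,
        ] + list(build_requirements))
        child_env = dict(os.environ)
        child_env['PYTHONPATH'] = env_dir
        subprocess.check_call(
            [sys.executable, 'setup.py', 'bdist_wheel'],
            cwd=source_dir, env=child_env,
        )
    finally:
        # Trash it: nothing outlives the single wheel build.
        shutil.rmtree(env_dir, ignore_errors=True)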

@pradyunsg
Member

That's in fact the entire reason that this PR exists: to fix this exact assumption made by the original implementer.

Alright. Could you please explain what's wrong with this assumption?

@pradyunsg
Member

Is it because egg_info is called before WheelBuilder is involved somehow?

I don't exactly remember the flow.

@@ -91,17 +94,43 @@ def dist(self, finder):
)
return dist

def prep_for_dist(self):
self.req.run_egg_info()
self.req.assert_source_matches_version()
Author

Specifically here.

@pradyunsg
Copy link
Member

Okay... Now I get it. You've basically wrapped that call inside the BuildEnvironment context and moved the BuildEnvironment into a longer-lived scope -- correct?
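Roughly, the reordering being described looks like this (attribute and method names are illustrative, not the exact ones in the PR; only the two calls shown in the quoted diff are taken from the original code):

def prep_for_dist(self):
    # Old behaviour: egg_info ran before any build environment existed.
    # New behaviour: install the PEP 518 build requirements first, then run
    # egg_info inside the requirement's (now longer-lived) BuildEnvironment.
    self.req.build_env.install_requirements(self.req.pyproject_requires)  # illustrative names
    with self.req.build_env:
        self.req.run_egg_info()
        self.req.assert_source_matches_version()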

@@ -1,6 +1,8 @@
#!/usr/bin/env python
from setuptools import find_packages, setup

import simple # Test gh-4647 regression
Member

Could you make a separate test please?

A single test should only try to test one thing.

Author

This test was incorrect when it was initially written. There is no need to have a broken test and a correct test.

Member

Ah right. I think the comment can be changed to "ensure dependency is installed".

@ghost
Author

ghost commented Jan 27, 2018

@pradyunsg Before I tackle the final points, is there anything else?



class NoOpBuildEnvironment(BuildEnvironment):
def __enter__(self):
Author

Note: I should have overridden the initialization method here as well.
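For context, the no-op variant (including the __init__ override the author mentions) would look roughly like this; a sketch of the intent, not the code as merged:

class NoOpBuildEnvironment(BuildEnvironment):
    # Used when build isolation is not wanted: creates no temporary prefix
    # and changes nothing on enter or exit.

    def __init__(self):
        # Deliberately skip BuildEnvironment.__init__ so no temp directory is created.
        pass

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        pass

    def cleanup(self):
        pass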

@pradyunsg pradyunsg left a comment

I don't see anything that stands out to me on another quick glance.

news/4799.bugfix Outdated
@@ -0,0 +1 @@
Fix situation where build requirements are installed before calling setup.py.
Member

We don't need a news entry since this bug never actually made it to a release.

@ghost
Author

ghost commented Jan 28, 2018

Okay, so can we go ahead and merge this?

@pradyunsg pradyunsg left a comment

It does what it says in the title. LGTM.

I'd like someone else to review this, but if no one has the time to in the coming weeks, I'll merge. (Sorry that's the timescale; it's due to the limited dev time we have.)

]
args = [
sys.executable, '-m', 'pip', 'install', '--ignore-installed',
'--prefix', prefix
Member

nit: trailing comma

@pradyunsg pradyunsg dismissed their stale review January 29, 2018 05:44

Concerns have been addressed.

@ghost
Author

ghost commented Feb 15, 2018

cc @pradyunsg

@ghost ghost closed this Feb 28, 2018
@pfmoore
Member

pfmoore commented Feb 28, 2018

@xoviat why did you close this when @pradyunsg has promised to merge it if no-one else is available to review it?

@pradyunsg
Member

pradyunsg commented Feb 28, 2018 via email

@pradyunsg
Member

Still curious why it was closed tho.

@ghost
Author

ghost commented Mar 1, 2018

Sorry, I closed this because I have too many PRs open at once. If it's going to be merged soon, then I'll reopen it.

@ghost ghost reopened this Mar 1, 2018
@pradyunsg
Member

I think this is good to go. I'll merge once the CI is happy.

@pradyunsg pradyunsg merged commit 163149f into pypa:master Mar 1, 2018
@pfmoore
Member

pfmoore commented Mar 1, 2018

Yay! Thanks guys for getting this done - I know it's been hard work, but it's really appreciated 😄 🎉

@astrojuanlu
Contributor

Thanks all! This moves us closer to pip 10.0 I suppose? :)

@njsmith
Member

njsmith commented Mar 1, 2018

So does this mean that pip now officially, actually supports PEP 518?

@pradyunsg
Member

Thanks all! This moves us closer to pip 10.0 I suppose? :)

Yep! ^>^

So does this mean that pip now officially, actually supports PEP 518?

It's currently limited to binary-only build dependencies. I expect that once we have some feedback on it, we'll know how this limitation fares. I guess by pip 11 this constraint would be lifted.

@njsmith
Member

njsmith commented Mar 1, 2018

To make sure I understand: the build dependencies have to have a wheel available?

@pradyunsg
Member

pradyunsg commented Mar 1, 2018 via email

@ghost ghost deleted the pep518 branch March 1, 2018 15:28

Successfully merging this pull request may close these issues.

PEP 518 support: setup.py is called before build requirements have been installed