Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of constraints for the new resolver #8136

Closed
wants to merge 10 commits into from

Conversation

pfmoore
Copy link
Member

@pfmoore pfmoore commented Apr 25, 2020

No description provided.

@pfmoore pfmoore added the skip news Does not need a NEWS file entry (eg: trivial changes) label Apr 25, 2020
@pfmoore
Copy link
Member Author

pfmoore commented Apr 25, 2020

A thought. Now that a version of the new resolver is released, should we start actually including news items in PRs? Or would we still be cluttering the changelog with too many bits about the resolver work? (I'm inclined to keep marking PRs as trivial, but I worry that just reflects the fact that I don't like writing docs 😉 )

@pradyunsg
Copy link
Member

pradyunsg commented Apr 25, 2020

Yea... The changelog is not the right place to do that. We'd want to communicate about the new resolver via a separate communication channel -- since there's likely specific things we want to call out, things we want users to do, setup alternative communication channels etc.

#8099 would serve as that communication channel, and we'd want to say "we have a beta of pip's next-gen resolver" in the 20.2 changelog. Or if we're happy enough to flip the defaults, just say "we're shipping the new resolver as a default". :)

(fun fact, since #8099 is locked, we don't see cross-references to that issue)

@pfmoore pfmoore marked this pull request as ready for review April 27, 2020 12:48
def __init__(self):
# type: () -> None
self._name = "<Dummy>"
self._version = Version("0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😮 Would it work to raise in the version property instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, maybe? Does it matter, though? I might be missing your point here...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems awkward to me this class is having a property that’s never called, but returns an arbitrary placeholder value. It would be more obvious to raise in version instead to make it clear that we don’t plan to call it, and it’s a clear error if we ever do.

I think an even better way to do this is to re-work the class hierarchy, and let type checking handle this. Maybe DummyCandidate should inherit from something else that does not have version instead, or maybe we should remove version from Candidate. I’m not sure yet, so it’s likely a better idea to merge this first, and work on refactoring later. (I do want to try some of this.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I get what you mean. We need a dummy name, because the resolver uses name to identify things, but version is unused.

I'll switch to a propery that asserts (assuming that mypy doesn't whine at me).

if ireq.constraint:
return ConstraintRequirement(specifier, factory=self)

return specifier
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got confused for a while by this name (and the attribute ConstraintRequirement._specifier). The name yells packaging.specifiers.SpecifierSet but they are totally not 😖

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically it felt like specifier_requirement is too long. But I can't think of a reasonable abbreviation (I'd prefer not to use req as we use that for an InstallRequirement all over the place).

I've changed it to specifier_requirement.

Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One minor comment.

@@ -117,3 +117,36 @@ def is_satisfied_by(self, candidate):
# already implements the prerelease logic, and would have filtered out
# prerelease candidates if the user does not expect them.
return self.specifier.contains(candidate.version, prereleases=True)


class ConstraintRequirement(Requirement):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a docstring here, detailing how this works / "flows" through the resolver.

https://github.com/pypa/pip/projects/5#card-37162340

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?

    """A requirement that represents a constraint.

    This requirement acts just like a SpecifierRequirement, except that
    it doesn't generate any real matches of its own. This is to ensure that
    if the user doesn't actually request that package X gets installed, a
    constraint on X doesn't trigger installation.

    ConstraintRequirement objects generate a singleton "dummy" candidate,
    so that they satisfy the condition that any requirement has at least
    one match - this is a resolvelib condition.
    """

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great, though we're going in a different direction now. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we are. But I suspect you've just not caught up on my various ramblings below yet, as you're doing release management at the moment ;-)

@pfmoore
Copy link
Member Author

pfmoore commented Apr 28, 2020

There's a further complication here still to be addressed.

If resolvelib picks the "constraint" requirement to try first, the dummy candidate will be what goes into the candidate list. When we then check the actual requirement, it complains because the dummy candidate doesn't have the right project name (and we can't just say let it through anyway, as it'll break as soon as we try to get its version).

I think the fix here is to modify the provider's get_preference to never choose a constraint when there's a real requirement to pick first. But I need to think about that, as I don't really understand get_preference yet.

This was exposed by merging this PR with #8061.

@pfmoore
Copy link
Member Author

pfmoore commented Apr 28, 2020

Alternative plan. Just have SpecifierRequirement not match dummy candidates, before we test the name. I may be overthinking things above 🤷

No, that doesn't work. We try the constraint, then try the real requirement, then bomb out because there are no candidates left. I'm not 100% sure why we don't backtrack and try the real requirement first. Backtracking is probably discarding it as the thing that caused the problem.

I feel like there's an unstated constraint on providers in resolvelib, that there has to be a "pool" of candidates, where a requirement must return from find_matches all of the candidates in that pool that satisfy the requirement (and nothing else).

At the very least, this constraint needs to be explicitly stated. And I'd very strongly prefer if it wasn't a constraint, but resolvelib did support more flexible requirements like I'm implementing here, so that my "alternative plan" worked.

I'll try to turn this case into a bug report for resolvelib.

[It's actually really hard to even describe the issue I'm hitting here, as resolvelib explicitly doesn't have a concept of "a set of candidates that we're considering"...]

@uranusjr @pradyunsg thoughts?

@pfmoore
Copy link
Member Author

pfmoore commented Apr 28, 2020

sarugaku/resolvelib#48

@pfmoore
Copy link
Member Author

pfmoore commented Apr 28, 2020

It's fixable by providing the root requirements so that non-constraints come first and constraints come last. But this feels really nasty, as it implies that we'll still have (at least some form of) order-dependency in how the resolver sees the requirements.

I'm going to go ahead and fix it by ordering the root reqs for now, but I'll add a comment explaining the issue.

@pradyunsg
Copy link
Member

Hmm... Semantically, I feel like we should add constraints first. My mental model for them is that they are for constraining what we end up with, and eliminating them earlier would result in avoiding exploration of that part of the search space.

That said, I'm also fine with doing this for now, and leaving this as a "we'll do this later if there's trouble" task.

@pfmoore
Copy link
Member Author

pfmoore commented Apr 28, 2020

Semantically, I feel like we should add constraints first. My mental model for them is that they are for constraining what we end up with, and eliminating them earlier would result in avoiding exploration of that part of the search space.

If we could do that I'd be more than happy. My mental model says that resolve(reqs) shouldn't depend on the order of reqs at all, although certain orderings might help the resolver be more efficient (in theory that should be the job of get_preference, but no-one understands that 😉).

That's why I raised the resolvelib issue as a bug. The fix of putting constraints at the end relies on an undocumented implementation detail of resolvelib. I'd like it to be fixed so order didn't matter, but I'd be OK with it just ending up documented (someone needs to work out how to explain the behaviour, though, and I don't know how to do that without locking down resolvelib's behaviour).

BTW, internally, the error is coming from resolvelib's very first pass that effectively says "are the root requirements consistent?". It's definitely arguable that this implementation of constraints is broken, because a requirement "A" and a constraint "A" are only consistent if you do find_matches on the requirement, and merge in the constraint, but not if you do find_matches on the constraint and then merge the requirement. But the idea of finding matches and merging is an internal detail of resolvelib, and I'm not sure how to express this without discussing the internals.

(Note: This is all only an issue because getting project metadata is costly - we could avoid all of the issues here if we just treated constraints exactly like any other requirement. But that would mean we potentially had to download and prepare a whole load of projects that were only mentioned in constraints, and never in actual requirements.)

@pfmoore
Copy link
Member Author

pfmoore commented Apr 28, 2020

OK, there's still a potential problem even if we put the candidates at the end. Imagine:

Constraint B<3.0
Requirement A==1.0
A 1.0 depends on B

Then we feed requirements to the resolver in the order A==1.0 (non-constraint root), B<3.0 (root, constraint), B (dependency). We still get the constraint first, and that "poisons" the candidate set for B. I just added a test that confirms this issue 🙁

So I think we have to abandon this approach, and accept that we can't avoid treating constraints as "normal" requirements as far as the resolver is concerned. That means we hit the index to get the available candidates, even for constraints that we're never going to install. The best I think we can do is to minimise the impact by returning an empty list for the dependencies - so we only ever hit the index page to get name and version data, but never download an actual file.

OK, my brain is fried for the day. I've written this here so that I don't forget where I've got to, but I'm going to leave it now and work on a replacement implementation tomorrow.

self.factory.make_requirement_from_install_req(r)
for r in root_reqs
]
# Build the list of root requirements, making sure that

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this imply that constraints are only used for the root requirements, and not secondary requirements?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, constraints only come into pip as root requirements (i.e. from a constraints file). They never appear as dependencies - dependencies are always "full" requirements, which will be installed.

Constraints are applied along with normal requirements to determine what are the allowable versions for any project.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think that makes sense.

@techalchemy
Copy link
Member

techalchemy commented Apr 28, 2020

The best I think we can do is to minimise the impact by returning an empty list for the dependencies - so we only ever hit the index page to get name and version data, but never download an actual file.

This aligns with my understanding of how this needs to work and with how I initially implemented resolution in the iteration of the resolver that predates resolvelib. AS far as I understood, constraints need to be permitted at any time during resolution and should result in one of the following:

  1. An existing pool of matching candidates is narrowed, due to the constraint having a more strict specifier;
  2. An existing pool of candidates is left untouched due to a less narrow constraint;
  3. A resolution error (i.e. the constraint is not satisfied by the available candidate pool) leads to backtracking in attempt to find an alternative satisfiable path; or
  4. No candidate pool for the specified constraint exists, so one must be created (i.e. by retrieving it)

I'm not totally sure about how this impacts resolution overall in terms of whether that will need to trigger a restart or not, but this is how I have modeled resolution since I've been working on it

@pfmoore
Copy link
Member Author

pfmoore commented Apr 28, 2020

Hi @techalchemy that fits with my understanding too. Essentially a constraint acts exactly like a requirement as far as the resolver is concerned. The only difference is when it comes to the install phase, when we don't want to install selected candidates that are only defined by constraints and not by "real" requirements.

Ideally the resolver itself would have a first-class "constraint" concept - which restricts selection but doesn't contribute on its own accord to the candidates in the final solution - but it doesn't, so most of the games I am playing are designed to capture that information so that we can strip these unneeded candidates out of the final solve.

@pfmoore
Copy link
Member Author

pfmoore commented Apr 29, 2020

I don't think it's possible to rescue this approach. So I'm going to abandon this PR in favour of a new one (#8170) which takes a different approach. Further discussion should probably be on that PR.

I'll leave this open for now, and close it once a working approach goes in.

@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label May 3, 2020
@pfmoore
Copy link
Member Author

pfmoore commented May 15, 2020

Closing this as an alternative approach has been implemented.

@pfmoore pfmoore closed this May 15, 2020
@pfmoore pfmoore deleted the nr_constraints branch May 15, 2020 13:06
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 15, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
needs rebase or merge PR has conflicts with current master skip news Does not need a NEWS file entry (eg: trivial changes)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants