Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Resolver.resolve take a plain list of InstallRequirement #7571

Closed
chrahunt opened this issue Jan 7, 2020 · 14 comments · Fixed by #7704 or #8061
Closed

Make Resolver.resolve take a plain list of InstallRequirement #7571

chrahunt opened this issue Jan 7, 2020 · 14 comments · Fixed by #7704 or #8061
Assignees
Labels
C: dependency resolution About choosing which dependencies to install type: refactor Refactoring code

Comments

@chrahunt
Copy link
Member

chrahunt commented Jan 7, 2020

What's the problem this feature will solve?

Currently pip's resolver logic is spread between pip._internal.req.req_set.RequirementSet and pip._internal.legacy_resolve.Resolver, among other classes. Regardless of the approach taken for implementing the new resolver, simplifying the interface to the resolve function will help us re-use more code before and after the call to resolve without having to consider the legacy resolution logic currently in RequirementSet.

Describe the solution you'd like

Several steps should let us get rid of RequirementSet as an input to Resolver:

  1. Globally configure and manage whether temporary directories should get deleted
  2. Switch all directories currently handled in InstallRequirement.remove_temporary_source to be globally managed and subject to the configuration from the previous step. Remove InstallRequirement.remove_temporary_source
  3. Remove RequirementSet.cleanup_files since it only calls InstallRequirement.remove_temporary_source
  4. Construct a new RequirementSet in Resolver.resolve from root_reqs and use that during resolution. Return the new RequirementSet, which callers should use for the next step of processing instead of the input RequirementSet they were using previously.
  5. Instead of passing a RequirementSet to Resolver.resolve, pass req_set.unnamed_requirements + list(req_set.requirements.values()) into Resolver.resolve similar to what is currently calculated within Resolver.resolve here.

After these steps, we should be able to fork RequirementSet into one class for use with initial requirement parsing here (which could be eventually simplified to a plain list), and another class for use within Resolver (which could potentially be inlined into Resolver itself).

Alternative Solutions

  • Continue using RequirementSet as the interface to Resolver.resolve and suffer since much of its logic is closely tied to the current method of dependency resolution/preparation, which would conflict with any new resolver.

Additional context

@chrahunt chrahunt added the type: refactor Refactoring code label Jan 7, 2020
@chrahunt
Copy link
Member Author

chrahunt commented Jan 8, 2020

One issue with globally managing the temporary build directory is that InstallCommand and WheelCommand catch a PreviousBuildDirError (see here) and then explicitly set options.no_clean. This prevents us from taking a straightforward approach for configuring global tempdir handling if options.no_clean is set.

One possible way to work around this is to reason about when PreviousBuildDirError could be raised here and eventually get rid of it.

@pradyunsg
Copy link
Member

/cc @uranusjr @pfmoore to flag it to their attention.

@pradyunsg
Copy link
Member

I don't think we need to get rid of all of RequirementSet. It's a useful container class as it stands, with one big "blot" -- other than 1 method, it's basically doing container-object things. I don't view this as a blocker for working on the new resolver, although we would benefit from doing this after the fact (based on learnings from implementing the new resolver).

The main bit that causes entanglement is the add_requirement method. That currently handles markers and "revisit" semantics for the current resolution process (I'm not clear on the details of the latter myself). This is the tight coupling that we want to not have going forward.

I think the approach to take for that, is to not use RequirementSet.add_requirement at all in the new resolver -- this was the reasoning that resulted in me adding the add_*_requirement methods, which do only one small task, as their name suggests. We could add "replace/update_*_requirement" methods if they're needed.

The main benefit of going this route, would be minimizing the amount of "turmoil" needed prior to working on the new resolver. To get rid of RequirementSet entirely (which I do think is a worthwhile goal), we'd have to break out the branches that are used for populating the RequirementSet initially as well as figure out what'd be a good place to do marker handling in the legacy resolver -- both of which are (I think) non-trivial tasks and our energy is probably better spent on getting the new implementation up to speed/polished since that'd either eliminate the need to handle them, or simplify the task significantly.

One change that IMO we should make here would be to start returning a RequirementSet from the resolver, instead of assuming that the initial input was mutated. That'll make it easier for the newer resolver to represent state however it wants and compose a RequirementSet at the end, so that it can be a drop-in replacement.

@pradyunsg
Copy link
Member

PS: I don't think this has anything to do with build logic -- this is almost completely around pip's resolution logic and semantics (what will be downloaded/built vs the actually process of running the build).

@chrahunt
Copy link
Member Author

chrahunt commented Jan 8, 2020

One change that IMO we should make here would be to start returning a RequirementSet from the resolver, instead of assuming that the initial input was mutated. That'll make it easier for the newer resolver to represent state however it wants and compose a RequirementSet at the end, so that it can be a drop-in replacement.

Agreed. Simplifying the interface is the main point.

@pfmoore
Copy link
Member

pfmoore commented Jan 8, 2020

Your items 1, 2 and 3 seem to be more around temporary directory management (disclaimer: I've not looked at the code yet, so this is based solely on your description). They seem like a good change to make in any case.

I'm less sure about 4 and 5, which are directly related to the resolver. At this point in time, I don't think we can really say what the best interface to the resolver will be, and I'm reluctant to make changes without a clear goal in mind.

For example, I'm currently thinking that there are two distinct types of "requirement":

  • Ones that for want of a better term, I'll call "constraints"1 which are a combination of a name and a version restriction. These need to be "resolved" in the sense that they match multiple distribution files and we need to pick one.
  • Things that just need installing (directly named wheel files, source directories, VCS URLs, etc). These don't get resolved as such (there's no choice to be made) but they do act as information for the resolver (a specific wheel means that, for example, foo version 1.0 will be in the final installation set, so the resolver needs to take that into account).

With that in mind, a RequirementSet may end up being a useful container for those two types of object, rather than trying to encode that information into a list of strings.

So basically I'm saying let's hold off on doing anything drastic with RequirementSet until we know what form its replacement will take.

1 I know that has a different, specific meaning to pip, I'm only using the term here for purposes of discussion.

@uranusjr
Copy link
Member

Reading the code (and trying to get an idea how the new resolver can be fitted in), I feel the problem is not the resolver mutates RequirementSet, but the InstallRequirement instances held inside. There are a lot of states in them I am not sure how to replicate. It may even be a good thing that the resolver mutates RequirementSet, since that makes it more obvious InstallRequirement instances inside are stateful. I’d say we refactor out the temp dir stuff (make them globally managed), but otherwise keep things mostly in-place.

Since I haven’t seen it mentioned, RequirementSet also holds successfully_downloaded. This is populated by the resolver, and used when by the download command. Not a big problem, just a thing the new resolver needs to implement.

@chrahunt
Copy link
Member Author

InstallRequirement being mutable is an important issue. I think it is separate from this. Both have implications on how easy it will be to fit in the new resolver. The point of normalizing the interface to Resolver.resolve is to reduce the surface area of the current Resolver (and its tightly-coupled types, like RequirementSet [due to add_requirement]). We wouldn't want to use RequirementSet in the new resolver because the bulk of its code does more than what a plain container would do, and it expects to be holding InstallRequirements.

Since I haven’t seen it mentioned, RequirementSet also holds successfully_downloaded. This is populated by the resolver, and used when by the download command. Not a big problem, just a thing the new resolver needs to implement.

We can avoid implementing this in the new resolver if we move the actual downloaded file copying to pip download instead of operations.prepare.

@uranusjr
Copy link
Member

I agree we don’t want use RequirementSet on the new resolving algorithm. On the other hand though, it is still needed by the legacy resolver before we can retire it, and IMO it is less work to write an adapter for the new resolver to consume a RequirementSet (so pip keeps all the current interface without the algorithm relying on RequirementSet) than trying to tinker the legacy resolver.

This might be because I am far less familiar with how legacy_resolve works than any of the proposed new resolver implementations, and , so I’d be glad to let you make the decision if you think I am overestimating the work required to refactor.

@pradyunsg
Copy link
Member

pradyunsg commented Jan 17, 2020

I’d be glad to let you make the decision if you think I am overestimating the work required to refactor.

I think it'll be as easy as you're thinking it'd be -- add the items using the add_*_requirement methods. :)

@pradyunsg
Copy link
Member

Quoting a past me:

Should build a better abstraction model for RequirementSet (basically, don’t manipulate in-place and create a new “CandidateSet” for tracking chosen candidates)

From #6607 (comment)

@uranusjr
Copy link
Member

Update from the future: Turns out this is still needed because RequirementSet still conflates requirements incorrectly in various cases, and we need to reduce its usage even more for the new resolver to get what it needs.

Issues impacted by this include:

@uranusjr uranusjr added C: dependency resolution About choosing which dependencies to install C: new resolver and removed auto-locked Outdated issues that have been locked by automation labels Apr 16, 2020
@uranusjr uranusjr self-assigned this Apr 16, 2020
@pelson
Copy link
Contributor

pelson commented Jul 2, 2020

Just to say that #7096 (and possibly #8036) were closed in favour of this issue, but this issue has since been closed by #8061 which doesn't solve the issue. I added xfail tests for #7096 in #8059, which are still xfailing. Is there a place we can track those issues, or should this issue be re-opened?

@uranusjr
Copy link
Member

uranusjr commented Jul 2, 2020

@pelson We opted to write an entirely different dependency resolver to solve the issue (among many others) instead. See #8099 for more details.

The new resolver should not hanve those issues (please report if they do!), and the current (legacy) resolver will be retired after the new resolver gains sufficient usage according to the plan laid out in #8371 (comment).

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
C: dependency resolution About choosing which dependencies to install type: refactor Refactoring code
Projects
None yet
5 participants