-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build Logic Refactor #6607
Comments
Just filed #6622 containing the next step - moving logic into SourceDistribution. |
What is the relationship of this issue to the resolver issue (#6536)? |
This cleans up things so that adapting pip to resolvelib's abstractions becomes easier. Plus, I'll be less scared about having broken the general build process while swapping out the resolver logic. 🙃 |
It's not really a design consideration, but the refactoring may be easier if the TODOs and FIXMEs in e.g. |
@cjerdonek I notice that my previous response is super terse, so I'll elaborate a bit. I think this project, is closely related to the actually implementing the resolver. We have to perform the fetch-build-query during our resolution process. Ideally, the resolver should not see that this is happening (i.e. operate on a more "abstract" space, like the resolvelib/zazo abstractions). This, however, also means that we need to make it easier to actually modify the resolution and build logic, which have been entangled in the codebase since I started working on pip. 🙃 In essence, what this project is aimed at doing is make the build logic code easier to reason about and follow, which in turn would make it easier to separate it from our resolution logic and make life easier when we do go ahead with the new resolution logic -- #6536 is for figuring out the communication and overall approach for that. Another thing I'll note is that, this was the main blocker/issue/tricky-thing-that-I-sunk-time-into when I was working on a weekend-prototype (https://github.com/pradyunsg/pip/tree/resolver/make-it-inside). The build logic essentially had to be moved around a fair bit, and I realized that it'll be a worthwhile investment to decouple the two, and refactoring/improving the build logic was a great way to do that. :) |
@chrahunt and I had a couple of voice calls over the past few days, where we discussed this (pip's build logic). We discussed how the builds "flow" and what information we have available at each stage. To that end, @chrahunt has a GitHub Gist containing both a mental model that we came up with to understand how the build logic works currently as well as a structured file, for potentially generating insights from. All this is a little roughly sketched out right now, and we're still figuring things out in greater detail. Inputs are definitely welcome. :) |
I had an itch to scratch w/ graphviz and this problem lends itself perfectly to being visualized with a directed graph. @dreampuf helpfully built GraphvizOnline, which I used to prototype a graph that shows the code flow that we'd want to follow. There are definitely a lot more ways to visualize this but my goal here was to have something I can look at later and make sense of, which this is. :) Notes:
|
@techalchemy and I had a call recently, and we discussed some items related to this and related to the resolver situation. The following are (some of?) the actionable notes I have from our conversation:
Broader notes related to the resolver:
|
Writing a summary/progress report, for folks who haven't been following this super keenly, and (mostly) self-reference later. The "generate metadata" logic has been almost completely moved into With these changes, the current "generate metadata" flow is:
This is quite a few hoops jumped here, with back-and-forth between the My aim for this stage is to get rid of
I was going to work on To avoid duplicated effort, I'll be skipping working on this part of the codebase and progressing to the next "stage", since we can always come back to this if needed. I've filed #7259 for discussing the overall design of what the result of the refactor should look like. That'll make it easier to figure out where stuff should go, since after that discussion, we'd have (mostly) decided what should go to which part of the codebase. Installation code hasn't been touched yet. I have some rough plans w.r.t. how we'd want to handle this. It'll likely be the same approach as metadata generation:
Metadata directory handling also got some much needed attention (#7100, #7087, #7086, #7064). These changes should make how metadata directories are handled, more consistent across PEP 517/"legacy" build logic. Other than that, some less ambiguous names are a nice improvement and make me feel a little better around this bit of the codebase. There's still more work to do here, but I think we can only chip away on that after installation code is also better decoupled. Cleaning up install options + build options, is something that I want to work on eventually but I don't think I can come around to this at this time. I imagine this would take the form of converting CLI options + flags into 2 objects One of the things to consider here is that, how we handle this, very directly affects the Scientific Python stack, since they're very closely tied to our build approaches. |
I would also mention that some progress is being made on breaking up our existing logic for initial creation and preparation of requirements (#7245, #7180, #7176, #7175, #7068, #7066, #7046, #7025). This lays some of the groundwork we'll need for implementing an |
[Arriving from #7589 (comment)] @pradyunsg @chrahunt The diagram from #6607 (comment) is very nice and extremely useful! It would be nice to have it in a place where can do PRs on it, and maintain the diagram across refactorings (for instance when the unpack arrow from local wheel is removed, or if/when the copy arrow from local non-editable directory is removed). For instance one improvement suggestion I would make is to remove "Local editable named VCS" and make the download arrow go directly from "Remote editable named VCS" to "Local editable directory", because for all installation and build purposes, pip does not care if a local editable directory came from a VCS origin. |
Modified as suggested. :) |
#7709 -- seems like we might've made a mistake in "how does pip install from sdist" logic (in the graph above) because pip seems to have variable build behavior, depending on how it got the sdist. |
For future reference, that turned out not to be true. :) |
I believe a bunch of the work this issue has been tracking is now done, but am leaving it open and moving it to the "far future" column in the Resolver Rollout project board, to leave room for future improvements. |
Picking this back up, there's a bunch of new stuff that we'd want to do with our build logic now:
This issue is about refactoring though, so I think we should track those separately. But, I'm noting these here since we'd want to be mindful of how this would fit in with this broader refactor. I started looking at this again yesterday and @pfmoore also expressed interest in helping out with this. Looking at this with somewhat fresh eyes now, it seems to me that we're still untangling the build logic from the rest of the codebase. Obligatory shoutout to @sbidoul for being accomodating with my comments in the PEP 660 PRs! The most obvious next-step here seems to be #7049. I think a good follow up to that would be to remove/clean up the "prepare" verb throughout the codebase -- we're using After that, if I'm understanding correctly, we'd basically be ready to start moving to the immutable-objects-that-transition model. That thing still needs a bunch of thought, since that's where we'd be getting rid of the mutate-me-however-you-want |
Oh, and here's the new-and-updated graph for how our build-system works currently.
Graphviz sourcesdigraph G {
// All the nodes
subgraph user_provided {
node [shape=box, peripheries=2];
subgraph remote {
remote_sdist [label="Remote sdist"];
remote_wheel [label="Remote wheel"];
remote_unnamed_vcs [label="Remote unnamed VCS"];
remote_named_vcs [label="Remote named VCS"];
remote_archive [label="Remote archive"];
}
subgraph local_sources {
local_wheel [label="Local wheel"];
local_sdist [label="Local sdist"];
local_directory [label="Local directory"];
subgraph downloadables {
local_named_vcs [label="Local named VCS"];
local_unnamed_vcs [label="Local unnamed VCS"];
local_archive [label="Local archive"];
}
}
}
subgraph intermediate {
unpacked_sdist [label="Unpacked sdist"];
unpacked_sources [label="Unpacked sources", color=gray, fontcolor=gray];
project_legacy [label="setup.py-based project", color=brown, fontcolor=brown];
project [label="pyproject.toml-based project", color=royalblue, fontcolor=royalblue];
local_wheel_editable[label="Local editable wheel"];
}
// All the clusters
subgraph cluster_remote {
label = "Remote";
style = dashed;
subgraph cluster_remote_one {
style = invis;
label = "";
remote_unnamed_vcs;
remote_named_vcs;
remote_archive;
}
subgraph cluster_remote_two {
style = invis;
label = "";
remote_sdist;
remote_wheel;
}
}
subgraph cluster_installed {
node [shape=box];
label = "Installed";
style = dashed;
installed [label="Installed distribution"];
installed_editable [label="Installed editable distribution"];
}
// All the edges
subgraph connection_download {
edge [label="download", color=green, fontcolor=green];
remote_unnamed_vcs -> local_unnamed_vcs;
remote_named_vcs -> local_named_vcs;
remote_sdist -> local_sdist;
remote_wheel -> local_wheel;
remote_archive -> local_archive;
}
subgraph connection_unpack {
edge [color=violet, fontcolor=violet];
local_sdist -> unpacked_sdist [label = "unpack"];
local_directory -> unpacked_sdist;
local_archive -> unpacked_sdist;
local_unnamed_vcs -> unpacked_sdist;
local_named_vcs -> unpacked_sdist;
}
subgraph connection_future {
edge [color=gray, fontcolor=gray, style=dashed];
local_directory -> unpacked_sources;
local_archive -> unpacked_sources;
local_unnamed_vcs -> unpacked_sources;
local_named_vcs -> unpacked_sources;
unpacked_sources -> local_sdist [label="setup.py sdist"];
unpacked_sources -> local_sdist [label="build_sdist"];
}
subgraph connection_install {
edge [label="install", color=purple, fontcolor=purple];
local_wheel -> installed;
local_wheel_editable -> installed_editable;
}
subgraph connection_modern {
edge [color=royalblue, fontcolor=royalblue];
unpacked_sdist -> project;
project -> local_wheel_editable [label="build_editable_wheel"];
project -> local_wheel [label="build_wheel"];
}
subgraph connection_legacy {
edge [color=brown, fontcolor=brown];
unpacked_sdist -> project_legacy;
project_legacy -> local_wheel [label="setup.py bdist_wheel"];
project_legacy -> installed [label="setup.py install"];
project_legacy -> installed_editable [label="setup.py develop"];
}
} |
@pradyunsg thanks for the update, this diagram remains very useful! A couple of question pop up in my mind:
|
I’ve been thinking about how to implement PEP 658 (metadata-only resolution); it could be useful if Remote metadata is a state in the build logic, I think. |
One point I'd like to try to capture in the model is where we can reasonably apply the rule that all instances of package XXX version X.Y are equivalent. I think the only such cases are "local wheel", "remote wheel", and "installed distribution". The motivation here is to know when it's OK to not build or reinstall, because we know we already have that version. So, for example, we don't need to install a local wheel if it's the same version as the installed distribution. But are we OK to skip building an unpacked sdist if we have a local wheel with the same version? I think if we do this we might want to split the decision making into 2 parts - standards (wheels having the same project name and version MUST be equivalent) and pip policy (pip won't rebuild a sdist if we have the wheel, but will rebuild a VCS checkout). The standards part should be straightforward (packaging standards have pretty much always assumed that name/version uniquely identifies an installable object, just never said so outright) but the pip policy side may be less so (reality gets in the way of theory 😉). This may be too big a question to cram into this one diagram, but I think we have been working on unstated assumptions for too long, and it's something we should be explicit about. |
Whether they contain sdist-related metadata.
Between the first two, no (hence the lack of a label on the arrow, although I think I messed up with the colors). The unpacked sdist would contain sdist-related metadata for the package? |
This is an issue for the broader design discussions for the overall build logic refactor work that I'll be doing.
Related: #5051
Tracking project board: https://github.com/pypa/pip/projects/3
@pypa/pip-committers I was thinking about this a bit and IMO we should try to keep the "broad" discussion here since it'd probably make life easier for me to be able to look back and reference things. I've also created a project board to better organize myself while working on this.
The text was updated successfully, but these errors were encountered: