Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use data-dist-info-metadata (PEP 658) to decouple resolution from downloading #11111

Merged
merged 18 commits into from
Sep 10, 2022

Conversation

cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented May 11, 2022

Problem

#10748 implements several things at once, which makes it large and difficult to review. After a recent discussion on that PR (#10748 (comment)), I realized that support for PEP 658 is totally orthogonal to generating a resolution report.

Solution

  • Implement PEP 658 support on top of the existing --use-feature=fast-deps work from @McSinyx.
    • This will look for and fetch *.metadata files if provided by the index instead of using LazyZipOverHTTP. If --use-feature=fast-deps is enabled, we then fall back to LazyZipOverHTTP if this is unavailable (which is the case for PyPI currently, and will remain the case for any --find-links repos).
  • Add thorough testing for success and failure modes of PEP 658 (this wasn't done in metadata-only resolve with pip download --dry-run --report! #10748 already).

Result

When PyPI implements PEP 658, we will immediately be able to use it to speed up resolves!

cc @uranusjr

@pradyunsg
Copy link
Member

I'm here to say that it's almost 01:11:11 where I am, and I really like this PR's number. :)

@cosmicexplorer cosmicexplorer force-pushed the pep-658-fast-deps branch 4 times, most recently from cfc14e8 to e9205ee Compare May 12, 2022 00:30
@cosmicexplorer
Copy link
Contributor Author

@sbidoul: I was able to disentangle ArchiveInfo and LinkHash here and remove a lot of code in the process (it now avoids touching direct_url.py at all). I think it might make sense to have ArchiveInfo retain an Optional[LinkHash] instance instead of an Optional[str], especially since link.py no longer imports direct_url.py, so there is no import cycle. What do you think?

@cosmicexplorer cosmicexplorer force-pushed the pep-658-fast-deps branch 2 times, most recently from 52cb80c to 69e28eb Compare May 12, 2022 03:45
@pradyunsg
Copy link
Member

Hmm... I don't think the PEP 658 code path needs to be behind the fast-deps flag. It is standard-backed behaviour, which should be implemented in a way that makes things work.

@pradyunsg
Copy link
Member

Beyond that, I have to pack for a trip -- so I won't be able to look at this properly for ~a month (unless I end up looking at this at the airport or whatever). :)

I'm reallly excited that you've done this work though @cosmicexplorer! While I've been a bit busy getting the 21.1 release out, I'm looking forward to getting this PEP landed, hopefully in time for the next release cycle in a couple of months. ^.^

@uranusjr
Copy link
Member

I don’t think this should be behind a flag either. Instead, it should be done in a way that works by default, and falls back gracefully if the server does not provide this information.

Thanks a ton for doing this though! This is something I’ve been thinking about but never had the time for.

@cosmicexplorer cosmicexplorer changed the title use data-dist-info-metadata (PEP 658) when --use-feature=fast-deps is enabled use data-dist-info-metadata (PEP 658) for faster resolves May 15, 2022
@cosmicexplorer
Copy link
Contributor Author

cosmicexplorer commented May 15, 2022

It is standard-backed behaviour, which should be implemented in a way that makes things work.

Instead, it should be done in a way that works by default, and falls back gracefully if the server does not provide this information.

This totally makes sense! Luckily the implementation from @McSinyx was so nice that it was very easy to incorporate this metadata-only resolution without needing to activate --use-feature=fast-deps -- take a look at b36ae5a!

@cosmicexplorer cosmicexplorer force-pushed the pep-658-fast-deps branch 2 times, most recently from e43e167 to b36ae5a Compare May 15, 2022 16:34
@cosmicexplorer cosmicexplorer changed the title use data-dist-info-metadata (PEP 658) for faster resolves use data-dist-info-metadata (PEP 658) to decouple resolution from downloading May 15, 2022
@cosmicexplorer
Copy link
Contributor Author

@pradyunsg:

While I've been a bit busy getting the 21.1 release out, I'm looking forward to getting this PEP landed, hopefully in time for the next release cycle in a couple of months. ^.^

Thanks so much for the timeline!!! I really appreciate these updates. There's no rush on my end, so please just come back to this as you have time. For my part I'll work to make this mergeable so you can hopefully breeze through it.

@github-actions github-actions bot added the needs rebase or merge PR has conflicts with current master label Jun 1, 2022
@sbidoul sbidoul added PEP implementation Involves some PEP type: enhancement Improvements to functionality labels Jun 26, 2022
Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation and tests LGTM. One minor comment. :)

Also, could you rebase this? :)

src/pip/_internal/operations/prepare.py Outdated Show resolved Hide resolved
@pradyunsg pradyunsg added this to the 22.3 milestone Aug 15, 2022
@pfmoore
Copy link
Member

pfmoore commented Aug 15, 2022

Also, could you rebase this?

You should be aware that we have an implementation of PEP 691 now. It's possible that the merge conflicts might not indicate that you might need some parallel logic for the case of a JSON format index.

Also, while it will be great to have this in pip, PyPI still doesn't actually serve any projects with data-dist-info-metadata set. It looks like that's blocked on pypi/warehouse#8254.

@pradyunsg
Copy link
Member

Yea, I'd prefer to take the same approach as PEP 591 -- we implemented the data-yanked handling in pip quite a while before PyPI implemented it on the index server side. :)

@pradyunsg
Copy link
Member

FWIW, I'm also likely going to touch this code as a part of #7049 (which in turn, blocks #6607). I do think it makes sense to defer making my changes until we can land this tho. :)

@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Sep 5, 2022
@pfmoore
Copy link
Member

pfmoore commented Sep 5, 2022

Please let me know if that looks like a nice way to integrate the PEP 691 work.

I've only had time to skim the code, but this looks like a very reasonable refactoring, yes.

@cosmicexplorer
Copy link
Contributor Author

@pfmoore: ah, I just reviewed PEP 691 and saw that it also incorporates the dist-info-metadata attribute. In 8da1bdc I ensure that Link.from_json() uses the same code path for its dist-info-metadata attribute that we do for data-dist-info-metadata in Link.from_element(). I'll add a test for that presently.

@cosmicexplorer
Copy link
Contributor Author

Hm, I'm seeing that there aren't already test cases for PEP 691 unless I'm misreading 6f167b5, so I'm going to do that now but put it in a separate PR to avoid inflating this one further.

@cosmicexplorer
Copy link
Contributor Author

Oh nvm, I see it as test_parse_links_json()! Will stop pinging this thread and use that.

@cosmicexplorer
Copy link
Contributor Author

Ok, this should be ready to review again! cc @uranusjr

Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment on the (internal) interface. Logic looks fine to me.

src/pip/_internal/index/collector.py Outdated Show resolved Hide resolved
src/pip/_internal/metadata/importlib/_dists.py Outdated Show resolved Hide resolved
src/pip/_internal/exceptions.py Outdated Show resolved Hide resolved
src/pip/_internal/metadata/importlib/_dists.py Outdated Show resolved Hide resolved
@pradyunsg pradyunsg changed the title use data-dist-info-metadata (PEP 658) to decouple resolution from downloading Use data-dist-info-metadata (PEP 658) to decouple resolution from downloading Sep 10, 2022
@pradyunsg pradyunsg merged commit bad03ef into pypa:main Sep 10, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
PEP implementation Involves some PEP type: enhancement Improvements to functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants