Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong key name used for PEP 658 metadata files in the JSON index #13705

Closed
chriskuehl opened this issue May 20, 2023 · 12 comments · Fixed by #13706
Closed

Wrong key name used for PEP 658 metadata files in the JSON index #13705

chriskuehl opened this issue May 20, 2023 · 12 comments · Fixed by #13706
Labels

Comments

@chriskuehl
Copy link

Describe the bug

PEP 691 states that the key name for metadata files in the JSON index should be dist-info-metadata:

dist-info-metadata: An optional key that indicates that metadata for this file is available, via the same location as specified in PEP 658 ({file_url}.metadata).

However, warehouse is providing it under the data-dist-info-metadata key instead:

$ curl -H 'Accept: application/vnd.pypi.simple.v1+json' https://pypi.org/simple/fluffy-server/ | jq .files
[...]
  {
    "data-dist-info-metadata": {
      "sha256": "4db99543165cbdeef42ccb6257545911ccd7865d65e304e3e056f383a25f309c"
    },
    "filename": "fluffy_server-1.39.2-py3-none-any.whl",
    [...]

This is causing pip to not use the metadata files as it is looking for the dist-info-metadata key only:
https://github.com/pypa/pip/blob/f25f8fffbbd16fdb13a4f8977946afe9a3248453/src/pip/_internal/models/link.py#L265

Additional context

There are two bugs discovered recently in pip which may make this tricky to fix:

I believe if we simply fix the key name in pypi.org, it will break existing pip versions as it will cause users to encounter these bugs. It may be necessary to coordinate this fix with fixes to the above bugs in pip to avoid disruption?

@ewdurbin
Copy link
Member

Are there versions of pip currently using the incorrect name? This was just shipped last week: #13649

@chriskuehl
Copy link
Author

I don't think any versions of pip are using the incorrect name. pypa/pip@bad03ef#diff-3919ca53335487395177edeffff8b60bd360f1ad18c95593287f6814b9ecefb8R249 is the commit which first introduced the support in pip, and it is using dist-info-metadata already. (It does reference data-dist-info-metadata but that is for the PEP 658 HTML-based index and not the JSON index.)

In my testing I couldn't get pip to use the metadata files at all with pypi.org when using the JSON index. My guess is that the current key is not being widely used at the moment since if it was, other people would probably have encountered the pip bugs I referenced above before I did.

@ewdurbin
Copy link
Member

ewdurbin commented May 20, 2023

Ah, I understand.

For what its worth, pip is not the only consumer of PEP 658 data via the JSON api.

I suppose we will need to coordinate with the pip team though. @pfmoore do you have thoughts on how you'd anticipate such coordination happening?

@pfmoore
Copy link
Contributor

pfmoore commented May 21, 2023

See the discussion in pypa/pip#12042

Specifically, as dist-info-metadata is optional, I think it should be omitted unless there's an actual file (rather than including it with a value false). And the backfill to existing wheels shouldn't be started until we have a fixed pip released, and people have a chance to upgrade.

@ewdurbin
Copy link
Member

Hm, that's an odd split.

Would it make sense to just stop serving the key altogether until pip is able to handle the spec "Where this is present, it MUST be either a boolean to indicate if the file has an associated metadata file, or a dictionary mapping hash names to a hex encoded digest of the metadata’s hash."

@pfmoore
Copy link
Contributor

pfmoore commented May 21, 2023

I don't think it's that odd - the status quo is no key, indicating the legacy situation (technically "no indication if the metadata file is present", but in practice pip will assume it isn't as it's not an optimisation if we have to check) and if the key is present, that means the index has explicitly extracted (or not) the metadata. The fact that there's never a case "dist-info-metadata": false simply reflects the fact that Warehouse never doesn't extract the metadata if it's processed the wheel at all.

Honestly, until I read the spec just before posting here, I hadn't even realised that "dist-info-metadata": false was permitted. And I'm still not sure what value it has to a client over an omitted key.

But I guess I don't really mind if you stop serving the key for now, I'm just not sure how that helps you determine when it's OK to re-introduce it. We're talking about doing an emergency pip release to fix the parsing issue, so there should be a fix available pretty soon, but I've no idea how long it will be before we can safely assume "enough" users will have upgraded. The advantage of omitting for files that haven't been backfilled is that fewer pip users will be impacted immediately, so the message might get out before everything grinds to a halt. I'll be honest, I don't have any feel for how much of an improvement it would be, though.

@ewdurbin
Copy link
Member

I'll wait to hear from pip maintainers before merging #13706.

Honestly, the spec is really easy to implement as-is in warehouse with the False value. I'd rather keep it as is.

@pfmoore
Copy link
Contributor

pfmoore commented May 21, 2023

That sounds fair enough. But I suggest we have a plan in place to do a fast revert just in case it turns out we misjudged when pip users were going to have had enough time to pick up the fixed version 🙁 That may be straightforward, in which case ignore me - I don't know much about how PyPI releases work.

@ewdurbin
Copy link
Member

worst case, rollback is ~15min and is a git revert away.

@hugovk
Copy link
Contributor

hugovk commented May 22, 2023

We're talking about doing an emergency pip release to fix the parsing issue, so there should be a fix available pretty soon, but I've no idea how long it will be before we can safely assume "enough" users will have upgraded.

The pip download stats at PePy suggest the majority of people upgrade to the latest version quickly:

image

For a recent day, of the 5.46m downloads for >= v22, 4.18m (77%) for the latest 23.1.2, and 1.28m for the rest (23%):

image

@dstufft
Copy link
Member

dstufft commented Jun 7, 2023

@di di removed the requires triaging maintainers need to do initial inspection of issue label Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants