Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Post release source dist gets grouped with stable release wheel #15749

Closed
konstin opened this issue Apr 10, 2024 · 11 comments · Fixed by #15795
Closed

Post release source dist gets grouped with stable release wheel #15749

konstin opened this issue Apr 10, 2024 · 11 comments · Fixed by #15795
Labels
blocked Issues we can't or shouldn't get to yet help needed We'd love volunteers to advise on or help fix/implement this.

Comments

@konstin
Copy link
Contributor

konstin commented Apr 10, 2024

Describe the bug

The docutils 0.21 has two files (https://inspector.pypi.io/project/docutils/0.21/, https://pypi.org/project/docutils/0.21/#files):

  • docutils-0.21.post1.tar.gz
  • docutils-0.21-py3-none-any.whl

The source distributions is part of the 0.21 release, while it's actually of version 0.21.post1. Pip agrees that this version is 0.21.post1:

$  virtualenv -p 3.12 --clear -q .venv
$  . .venv/bin/activate
(.venv) $  pip install --no-binary docutils docutils==0.21
ERROR: Could not find a version that satisfies the requirement docutils==0.21 (from versions: 0.3, 0.3.5, 0.3.7, 0.3.9, 0.4, 0.5, 0.6, 0.7, 0.8, 0.8.1, 0.9, 0.9.1, 0.10, 0.11, 0.12, 0.13.1, 0.14rc1, 0.14rc2, 0.14, 0.15, 0.15.1.post1, 0.15.2, 0.16b0.dev0, 0.16rc1, 0.16, 0.17b1, 0.17, 0.17.1b1.dev0, 0.17.1, 0.18b1, 0.18, 0.18.1b0, 0.18.1, 0.19b1, 0.19, 0.20rc1, 0.20, 0.20.1, 0.21rc1, 0.21.post1)
ERROR: No matching distribution found for docutils==0.21
(.venv) $ pip install docutils==0.21
Collecting docutils==0.21
  Downloading docutils-0.21-py3-none-any.whl.metadata (2.7 kB)
Downloading docutils-0.21-py3-none-any.whl (587 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 587.4/587.4 kB 4.6 MB/s eta 0:00:00
Installing collected packages: docutils
Successfully installed docutils-0.21

Under the right circumstances, this can break uv too.

Expected behavior

Either docutils-0.21.post1.tar.gz has its own version 0.21.post1, or source dists with this naming scheme are rejected by pypi.

To Reproduce
See description

My Platform
Ubuntu 22.04, Python 3.12

Additional context
n/a

@konstin konstin added bug 🐛 requires triaging maintainers need to do initial inspection of issue labels Apr 10, 2024
@di
Copy link
Member

di commented Apr 10, 2024

Looks like we have a few of these:

warehouse=> SELECT count(rf.filename)
FROM release_files rf
JOIN releases r ON rf.release_id = r.id
JOIN projects p ON r.project_id = p.id
WHERE rf.filename ILIKE '%post%'
AND p.name NOT ILIKE '%post%'
AND r.version NOT ILIKE '%post%';
 count
-------
   674
(1 row)

They don't happen very often but have been happening relatively consistently:

warehouse=> SELECT DATE_TRUNC('year', rf.upload_time) AS upload_year,
       COUNT(*) AS file_count_per_year
FROM release_files rf
JOIN releases r ON rf.release_id = r.id
JOIN projects p ON r.project_id = p.id
WHERE rf.filename ILIKE '%post%'
AND p.name NOT ILIKE '%post%'
AND r.version NOT ILIKE '%post%'
GROUP BY upload_year
ORDER BY upload_year;
     upload_year     | file_count_per_year
---------------------+---------------------
 2009-01-01 00:00:00 |                   1
 2010-01-01 00:00:00 |                   2
 2015-01-01 00:00:00 |                  52
 2016-01-01 00:00:00 |                  24
 2017-01-01 00:00:00 |                  95
 2018-01-01 00:00:00 |                 115
 2019-01-01 00:00:00 |                  96
 2020-01-01 00:00:00 |                 139
 2021-01-01 00:00:00 |                 111
 2022-01-01 00:00:00 |                  20
 2023-01-01 00:00:00 |                  17
 2024-01-01 00:00:00 |                   2
(12 rows)

We now take the version from the metadata for the file:

canonical_version = packaging.utils.canonicalize_version(meta.version)

If we look at the artifact in question, we see that the version in the metadata is wrong: https://inspector.pypi.io/project/docutils/0.21/packages/67/9a/ff2ff8e922f3b97c4b4864ca6c78d76ca5969bd730560001167b7054ac48/docutils-0.21.post1.tar.gz/docutils-0.21/PKG-INFO#line.3

So the issue is that we aren't rejecting files whose filename versions disagree with their internal metadata, which we can't do until we only accept normalized sdist filenames (PEP 625) because otherwise the filenames can be ambiguous.

Given that, I think this is a duplicate of #12245, albeit with a nice real-world example of this happening.

@dimbleby
Copy link

#12245 is about insisting that all sdists are pep-625 compliant, and looks semi-permanently blocked on the fear of breaking users who are uploading non-compliant sdists.

but it would still have value today to write code that asked: if pep-625 compliant, does the version (and name) match the metadata?

@konstin
Copy link
Contributor Author

konstin commented Apr 11, 2024

Is it possible to check after you know the package name, or is that what #12245 is about? Something like this:

from packaging.version import Version, InvalidVersion


def validate(name: str, version: Version, filename: str) -> bool:
    after_name = filename[len(name):].removesuffix(".tar.gz").removesuffix(".zip")
    if not after_name.startswith("-"):
        return False
    file_version = after_name[1:]
    try:
        file_version = Version(file_version)
    except InvalidVersion:
        return False
    return file_version == version


if __name__ == '__main__':
    print(validate("docutils", Version("0.21"), "docutils-0.21.tar.gz"))
    print(validate("docutils", Version("0.21"), "docutils-0.21.post1.tar.gz"))
    # Not sensitive to name normalization ...
    print(validate("flit_core", Version("0.1"), "flit-core-0.1.tar.gz"))
    print(validate("flit_core", Version("0.1"), "flit_core-0.1.tar.gz"))
    print(validate("flit_core", Version("0.1"), "flit.core-0.1.tar.gz"))
    # ... or to version normalization
    print(validate("flit_core", Version("0.1a1"), "flit-core-0.1.alpha1.tar.gz"))
    print(validate("flit_core", Version("0.1a1"), "flit_core-0.1alpha1.tar.gz"))
    print(validate("flit_core", Version("0.1a1"), "flit.core-0.1a1.tar.gz"))

This is basically what we do in uv currently (https://github.com/astral-sh/uv/blob/8d721830db8ad75b8b7ef38edc0e346696c52e3d/crates/distribution-filename/src/source_dist.rs#L73-L157)

@dimbleby
Copy link

dimbleby commented Apr 11, 2024

just splitting on the last hyphen should be good enough to extract the version, even in the face of non-PEP625 distributions

edit to expand on this a bit. Here is what PEP625 said (a few years ago but changes since then are likely to have been in the right direction):

a review of the filenames on PyPI determined that 37% of files are obviously legacy (because they contain multiple or no hyphens) and of the remainder, parsing according to this PEP gives the correct answer in all but 0.004% of cases.

so two possible approaches that are not blocked on #12245

  • no new check for new uploads that fall in the "multiple or no hyphens" category, but check correctness per PEP625 when seeing one hyphen. ie reject new uploads that are contributing to the 0.004%

  • or, also make checks for the "multiple hyphens" category, requiring that the part after the last hyphen be the version

    • packaging.utils.parse_sdist_filename() works this way which I offer as evidence that there can be very little out there in the wild where this goes wrong, but this approach of course rejects strictly more uploads than the first

@dimbleby
Copy link

dimbleby commented Apr 11, 2024

@konstin it occurs to me that the code you posted might be wrong because normalization eg of foo__bar would result in canonical name foo-bar and sdist name foo_bar-1.0.0.tar.gz. That is, length is not preserved by normalization.

It is not clear to me whether you are comparing normalized things throughout, I did not check whether this is a problem in uv.

@konstin
Copy link
Contributor Author

konstin commented Apr 11, 2024

You are correct, the name input should be the normalized name. I think our parsing in uv (which uses the normalized name) works in practice because we don't have source dists with more than one consecutive [-_.], at least i haven't seen any.

@miketheman miketheman added help needed We'd love volunteers to advise on or help fix/implement this. requires triaging maintainers need to do initial inspection of issue blocked Issues we can't or shouldn't get to yet bug 🐛 and removed requires triaging maintainers need to do initial inspection of issue bug 🐛 labels Apr 12, 2024
@miketheman
Copy link
Member

Looks like the docutils maintainers has recognized the issue, and have corrected it in 0.21.1
Refs: https://sourceforge.net/p/docutils/bugs/483/#f648

Another command that can show the actual rejection message:

$ pip install --no-binary docutils "docutils>=0.21,<0.21.1"
Collecting docutils<0.21.1,>=0.21
  Using cached docutils-0.21.post1.tar.gz (2.2 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Discarding https://files.pythonhosted.org/packages/67/9a/ff2ff8e922f3b97c4b4864ca6c78d76ca5969bd730560001167b7054ac48/docutils-0.21.post1.tar.gz (from https://pypi.org/simple/docutils/) (requires-python:>=3.9): Requested docutils<0.21.1,>=0.21 from https://files.pythonhosted.org/packages/67/9a/ff2ff8e922f3b97c4b4864ca6c78d76ca5969bd730560001167b7054ac48/docutils-0.21.post1.tar.gz has inconsistent version: expected '0.21.post1', but metadata has '0.21'
ERROR: Could not find a version that satisfies the requirement docutils<0.21.1,>=0.21 (from versions: 0.3, 0.3.5, 0.3.7, 0.3.9, 0.4, 0.5, 0.6, 0.7, 0.8, 0.8.1, 0.9, 0.9.1, 0.10, 0.11, 0.12, 0.13.1, 0.14rc1, 0.14rc2, 0.14, 0.15, 0.15.1.post1, 0.15.2, 0.16b0.dev0, 0.16rc1, 0.16, 0.17b1, 0.17, 0.17.1b1.dev0, 0.17.1, 0.18b1, 0.18, 0.18.1b0, 0.18.1, 0.19b1, 0.19, 0.20rc1, 0.20, 0.20.1, 0.21rc1, 0.21.post1, 0.21.1)
ERROR: No matching distribution found for docutils<0.21.1,>=0.21

Looks like PEP625 implementation is the answer here.

@miketheman miketheman removed bug 🐛 requires triaging maintainers need to do initial inspection of issue labels Apr 12, 2024
@dimbleby
Copy link

Looks like PEP625 implementation is the answer here.

just want to reiterate that real improvement could be made first, without being blocked on full PEP625

@miketheman
Copy link
Member

just want to reiterate that real improvement could be made first, without being blocked on full PEP625

@dimbleby please send a PR for review, happy to take a look.

@dimbleby
Copy link

can you point at roughly where such a check should be made? thanks

@miketheman
Copy link
Member

@dimbleby Per the dev docs, you might want to look into forklift to start: https://warehouse.pypa.io/application.html#file-and-directory-structure

Fokko added a commit to apache/iceberg-python that referenced this issue Apr 18, 2024
This release is bodged, and causes Poetry to fail when it tries
to fetch the tar:

python-poetry/poetry#9293 (comment)

It is being tracked:
pypi/warehouse#15749

A fix is inbound here:
pypi/warehouse#15795

This is a hotfix for this particular package, until the fixes from `pypi/warehouse` are out. The `post` notation is very unique: pypi/warehouse#15749 (comment) and is being used in 0.004% of the releases.
Fokko added a commit to apache/iceberg-python that referenced this issue Apr 18, 2024
This release is bodged, and causes Poetry to fail when it tries
to fetch the tar:

python-poetry/poetry#9293 (comment)

It is being tracked:
pypi/warehouse#15749

A fix is inbound here:
pypi/warehouse#15795

This is a hotfix for this particular package, until the fixes from `pypi/warehouse` are out. The `post` notation is very unique: pypi/warehouse#15749 (comment) and is being used in 0.004% of the releases.
Fokko added a commit to Fokko/iceberg-python that referenced this issue Apr 18, 2024
This release is bodged, and causes Poetry to fail when it tries
to fetch the tar:

python-poetry/poetry#9293 (comment)

It is being tracked:
pypi/warehouse#15749

A fix is inbound here:
pypi/warehouse#15795

This is a hotfix for this particular package, until the fixes from `pypi/warehouse` are out. The `post` notation is very unique: pypi/warehouse#15749 (comment) and is being used in 0.004% of the releases.
HonahX pushed a commit to HonahX/iceberg-python that referenced this issue Apr 18, 2024
This release is bodged, and causes Poetry to fail when it tries
to fetch the tar:

python-poetry/poetry#9293 (comment)

It is being tracked:
pypi/warehouse#15749

A fix is inbound here:
pypi/warehouse#15795

This is a hotfix for this particular package, until the fixes from `pypi/warehouse` are out. The `post` notation is very unique: pypi/warehouse#15749 (comment) and is being used in 0.004% of the releases.
HonahX pushed a commit to apache/iceberg-python that referenced this issue Apr 18, 2024
This release is bodged, and causes Poetry to fail when it tries
to fetch the tar:

python-poetry/poetry#9293 (comment)

It is being tracked:
pypi/warehouse#15749

A fix is inbound here:
pypi/warehouse#15795

This is a hotfix for this particular package, until the fixes from `pypi/warehouse` are out. The `post` notation is very unique: pypi/warehouse#15749 (comment) and is being used in 0.004% of the releases.
@di di closed this as completed in #15795 Apr 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Issues we can't or shouldn't get to yet help needed We'd love volunteers to advise on or help fix/implement this.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants