Meta: Warehouse's handling and validation of distribution filenames #12316

Open
woodruffw opened this issue Oct 4, 2022 · 2 comments
Labels
meta Meta issues (rollouts, etc)

Comments

@woodruffw
Member

This is a meta-issue, filed to track multiple independent problems and potential solutions to Warehouse's handling of distribution filenames (i.e., sdist and wheel filenames). I'm going to attempt to index all of them, but I'll almost certainly miss one or more.

Background material

Key PEPs and PyPA standards:

  • PEP 427 defines the wheel distribution format, including the wheel filename format. PEP 427 is unfortunately internally inconsistent about distribution name normalization, as mentioned in this comment.
  • PEP 625 is the most recent sdist filename PEP. It punts to PEP 427 for distribution name normalization, meaning that it carries some of the same ambiguity.
  • PyPA's Binary Distribution Format Spec is the living standard copy of PEP 427. It eliminates the ambiguity in the original PEP, making it clear that the normalization only applies to the distribution name and is strictly equivalent to PEP 503 normalization, followed by replacing - with _.
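As a rough illustration of the living spec's rule, the sketch below applies PEP 503 normalization and then swaps `-` for `_`. The helper names (`pep503_normalize`, `wheel_dist_name`, `sdist_filename`) are hypothetical, not Warehouse's or `packaging`'s API, and `sdist_filename` assumes the version string has already been normalized.

```python
import re


def pep503_normalize(name: str) -> str:
    """PEP 503: collapse each run of '-', '_' and '.' into one '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def wheel_dist_name(name: str) -> str:
    """Binary distribution format spec: PEP 503 normalization, then '-' -> '_'."""
    return pep503_normalize(name).replace("-", "_")


def sdist_filename(name: str, version: str) -> str:
    """PEP 625-style sdist filename (hypothetical helper; `version` assumed normalized)."""
    return f"{wheel_dist_name(name)}-{version}.tar.gz"


if __name__ == "__main__":
    print(pep503_normalize("package.foo"))       # package-foo
    print(wheel_dist_name("package.foo"))        # package_foo
    print(sdist_filename("Package.Foo", "1.0"))  # package_foo-1.0.tar.gz
```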

Key discussions:

Outstanding issues and PRs:

Outstanding issues

Warehouse does not support normalized namespace package names

Per both the discuss thread and #10030: namespace packages are commonly denoted as package.foo, which gets normalized to package-foo (PEP 503) and package_foo (wheel-style distribution name).

Warehouse should therefore accept wheels and sdists whose filenames start with package_foo for the package.foo project. It currently doesn't, and rejects them with a mismatched-prefix error instead.

The relevant code:

# Make sure that our filename matches the project that it is being uploaded
# to.
prefix = pkg_resources.safe_name(project.name).lower()
if not pkg_resources.safe_name(filename).lower().startswith(prefix):
    raise _exc_with_message(
        HTTPBadRequest,
        "Start filename for {!r} with {!r}.".format(project.name, prefix),
    )
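To see why package_foo uploads fail this check, the sketch below re-implements `pkg_resources.safe_name` locally rather than importing the deprecated `pkg_resources`; its behavior is to replace each run of characters other than ASCII letters, digits, and `.` with a single `-`. Because `safe_name` keeps `.` but rewrites `_` to `-`, the prefix `package.foo` can never match a filename that starts with `package_foo`. `dist_name_matches` is a hypothetical alternative, not Warehouse code, and it assumes the distribution name is everything before the first `-`, which holds for filenames following the normalization rules but not for some legacy sdists.

```python
import re


def safe_name(name: str) -> str:
    """Local copy of pkg_resources.safe_name's behavior."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def current_prefix_check(project_name: str, filename: str) -> bool:
    """Mirrors the excerpted Warehouse check, returning a bool instead of raising."""
    prefix = safe_name(project_name).lower()
    return safe_name(filename).lower().startswith(prefix)


def pep503_normalize(name: str) -> str:
    return re.sub(r"[-_.]+", "-", name).lower()


def dist_name_matches(project_name: str, filename: str) -> bool:
    """Hypothetical fix: compare PEP 503-normalized names instead of raw prefixes.

    Assumes the distribution name is the filename text before the first '-'.
    """
    dist_name = filename.split("-", 1)[0]
    return pep503_normalize(dist_name) == pep503_normalize(project_name)


if __name__ == "__main__":
    wheel = "package_foo-1.0-py3-none-any.whl"
    print(current_prefix_check("package.foo", wheel))  # False
    print(dist_name_matches("package.foo", wheel))     # True
```

Comparing the whole name component rather than a prefix also means that, in this sketch, `package.foo` does not match a `package.foobar-1.0.tar.gz` filename, whereas the prefix check does.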

Warehouse accepts invalid wheel filenames

Separately, Warehouse's current wheel filename validation is probably overly permissive.

This happens in a few different places:

  • _is_valid_dist_file fails open rather than closed. In particular, anything that ends with .whl and contains a WHEEL file is treated as valid, even if it does not have all of the PyPA/PEP 427 required filename components.

  • Extended wheel filename validation uses a regular expression, but doesn't actually check all parts of the resulting match:

    # Check that if it's a binary wheel, it's on a supported platform
    if filename.endswith(".whl"):
        wheel_info = _wheel_file_re.match(filename)
        plats = wheel_info.group("plat").split(".")
        for plat in plats:
            if not _valid_platform_tag(plat):
                raise _exc_with_message(
                    HTTPBadRequest,
                    "Binary wheel '{filename}' has an unsupported "
                    "platform tag '{plat}'.".format(filename=filename, plat=plat),
                )

    In particular, the build, pyver, and abi components are never checked, meaning that they might be missing entirely.

    As a result, there is at least one invalid wheel filename (pyffmpeg-2.0.5-cp35.cp36.cp37.cp38.cp39-macosx_10_14_x86_64.whl) already present on PyPI. It has only four of the five required components, and its correspondingly invalid metadata is visible via the JSON API (note that the python_version field holds the release version 2.0.5 rather than a Python tag):

        {
          "comment_text": "",
          "digests": {
            "md5": "d8a9fddd534dc56bfad1343c0f4d0cec",
            "sha256": "962c2d87ee264cfedace8cd1186efe6d898095b74783e9bdba356d15ccd91f64"
          },
          "downloads": -1,
          "filename": "pyffmpeg-2.0.5-cp35.cp36.cp37.cp38.cp39-macosx_10_14_x86_64.whl",
          "has_sig": false,
          "md5_digest": "d8a9fddd534dc56bfad1343c0f4d0cec",
          "packagetype": "bdist_wheel",
          "python_version": "2.0.5",
          "requires_python": null,
          "size": 11052093,
          "upload_time": "2021-05-15T16:29:06",
          "upload_time_iso_8601": "2021-05-15T16:29:06.646801Z",
          "url": "https://files.pythonhosted.org/packages/56/2c/e25e4322c12a75e9f478106b8919c0a011b28edb32171fa21ebe14513022/pyffmpeg-2.0.5-cp35.cp36.cp37.cp38.cp39-macosx_10_14_x86_64.whl",
          "yanked": false,
          "yanked_reason": null
        },
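A fail-closed alternative could split the filename against the spec's `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl` layout and reject anything that does not fit, rather than calling `.group()` on a match that may be missing. This is a minimal hypothetical sketch (`parse_wheel_filename_strict`, `WheelFilename`, and `InvalidWheelFilename` are not Warehouse names): it checks the component count, that no component is empty, and that any build tag starts with a digit. It does not validate versions against PEP 440 or tags against known values.

```python
import re
from typing import List, NamedTuple, Optional


class InvalidWheelFilename(ValueError):
    """Raised when a filename does not fit the wheel naming layout."""


class WheelFilename(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python_tags: List[str]
    abi_tags: List[str]
    platform_tags: List[str]


def parse_wheel_filename_strict(filename: str) -> WheelFilename:
    """Fail closed: anything that doesn't match the layout raises."""
    if not filename.endswith(".whl"):
        raise InvalidWheelFilename(f"{filename!r} does not end in .whl")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise InvalidWheelFilename(
            f"{filename!r} has {len(parts)} '-'-separated components, expected 5 or 6"
        )
    if not all(parts):
        raise InvalidWheelFilename(f"{filename!r} has an empty component")
    if len(parts) == 6:
        name, version, build, pyver, abi, plat = parts
        # The spec requires a build tag to start with a digit.
        if not re.match(r"[0-9]", build):
            raise InvalidWheelFilename(f"build tag {build!r} must start with a digit")
    else:
        name, version, pyver, abi, plat = parts
        build = None
    # Compressed tag sets (e.g. "cp35.cp36") are split on ".".
    return WheelFilename(
        name, version, build, pyver.split("."), abi.split("."), plat.split(".")
    )


if __name__ == "__main__":
    print(parse_wheel_filename_strict("package_foo-1.0-py3-none-any.whl"))
    try:
        parse_wheel_filename_strict(
            "pyffmpeg-2.0.5-cp35.cp36.cp37.cp38.cp39-macosx_10_14_x86_64.whl"
        )
    except InvalidWheelFilename as exc:
        print(f"rejected: {exc}")
```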
@woodruffw
Member Author

So, to summarize: measured against the current PEPs and living PyPA specifications, PyPI incorrectly accepts some distribution filenames and incorrectly rejects others.

Separately, there's a whole rats' nest of presentation issues and ambiguity between package names, distribution names, etc. I think these are mostly separate from the question of distribution filename acceptance and validation, but they'll be important to consider as well.

@woodruffw
Member Author

xref #12245 as the PEP 625 tracking issue.

2 participants