Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support a Custom Validation Policy for Metadata #732

Open
dstufft opened this issue Oct 8, 2023 · 1 comment
Open

Support a Custom Validation Policy for Metadata #732

dstufft opened this issue Oct 8, 2023 · 1 comment

Comments

@dstufft
Copy link
Member

dstufft commented Oct 8, 2023

I'm working on porting Warehouse over to using packaging.metadata to drive our validation, and as I'm teasing apart the existing code it's become clear to me that our metadata validation in Warehouse falls into two categories:

  • Validation that is required by the core metadata spec for the metadata to be valid at all.
  • Validation that Warehouse adds on top of the core metadata spec for various application specific policies.

I'm currently implementing this in Warehouse by basically doing this:

from packaging.metadata import Metadata, InvalidMetadata

def parse_metadata(data: bytes) -> Metadata:
    metadata = Metadata.from_email(data)

    errors: list[InvalidMetadata] = []

    # Additional checks

    if errors:
        raise ExceptionGroup("invalid metadata", errors)

    return metadata

This of course works, but it has a few negatives:

  • The metadata validation natively uses ExceptionGroup to collect all of the validation errors and raising them all at once, however there's not a clear and easy way to do that with our additional validation, so we don't see the extra validation errors if any "built in" validation errors happen.
  • We went to great lengths to support on demand validation of fields, allowing applications to have partial validation of only the data that they personally care about. However, implementing on demand "extra" validation would require either a wrapper around Metadata that dispatches to the underlying Metadata, fetches the value, then adds the extra validation.

An idea that popped into my mind is allowing passing in a "validation policy" of some sort, that would be used to inform the Metadata class of these additional validation checks, and have them handled inline with all of the other validation.

A policy could look something like:

import readme

class MetadataValidationPolicy:
    # TODO: These returns an InvalidMetadata, mostly to keep it similar to full_validate, however
    #            it's probably more ergonomic/expected to raise it? Maybe even full_vallidate should
    #            raise and when the metadata class calls it, it supports either raising an InvalidMetadata
    #            or a ExceptionGroup containing InvalidMetadata.

    def validate_metadata_version(self, metadata_version: str) -> InvalidMetadata | None:
        # PyPI currently only supports *some* Metadata versions, but not all of them, and even if
        # we did support all current ones, new ones require integration work so we don't want to
        # start accepting them implicitly.
        if metadata_version not in {"1.0", "1.1", "1.2", "2.0", "2.1"}:
            return InvalidMetadata("metadata-version", f"{metadata_version!r} is an unsupported metadata version")

    def validate_version(self, version: Version) -> InvalidMetadata | None:
        if version.local:
            return InvalidMetadata("version", f"The use of local versions in {version!r} is not allowed.")

    def full_validate(self, metadata: Metadata) -> list[InvalidMetadata]:
        errors: list[InvalidMetadata] = []

        # This is sort of a bad example, because we wouldn't put this in here because we wouldn't
        # want to throw away the rendered value, but just as an example.
        if metadata.description:
            description_content_type = metadata.description_content_type or "text/x-rst"
            rendered = readme.render(metadata.description, description_content_type, use_fallback=False)
            if rendered is None:
                errors.append(InvalidMetadata("description", f"Failed to render the description using {description_content_type!r}"))

        return errors

Given a policy, the question then becomes how would we inform the Metadata class about it.

The easiest mechanism for doing so I think would be as an additional keyword parameter to the Metadata.from_* methods, so something like:

metadata = Metadata.from_email(b"...", validation_policy=MetadataValidationPolicy())
metadata.metadata_version

We don't currently support creating a Metadata instance dynamically other than through the from_* methods, but you could imagine if we did then we could still use a keyword argument to handle that:

metadata = Metadata(..., validation_policy=MetadataValidationPolicy())
metadata.metadata_version

Of course that would mean we can't ever have an core metadata field named validation_policy that we want to set through the constructor, but I think it's unlikely that we ever do (and of course if we are worried about it there are other tricks like using _validation_policy or something.

We could even support subclassing Metadata and setting a subclass wide validation policy, if a project wants that rather than manually passing it in to each invocation, using something like:

from packaging.metadata import Metadata as _Metadata


class Metadata(_Metadata, validation_policy=MetadataValidationPolicy):
    pass

Of course, supporting this use case does add a non trivial amount of additional complexity to packaging.metadata, and since these validations are purely additive it is entirely possible to implement this outside of the metadata library with relatively minor downsides, so I think it would be perfectly reasonable to declare this out of scope. I wanted to raise it as an idea though, as it kind of surfaced during my work of integrating packaging.metadata into Warehouse.

@brettcannon
Copy link
Member

It's at least out of scope of my free time. 😉 But I don't inherently object to this idea either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants