Public Dataset for distribution metadata #7403

@di

As requested in pypa/packaging-problems#323, we should explore publishing the metadata for each released distribution in a public dataset via BigQuery.

I'm imagining that each row would contain all the core metadata fields included in each release, as well as filename, digests, file size, upload time, URL to the distribution, etc. Essentially everything in the "Release" JSON API, with the per-release info field included for every individual distribution.
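
To make the row shape concrete, here is a rough sketch of flattening one release's JSON into one row per distribution. It assumes the release JSON has an `info` object and a `urls` list of per-file entries. `distribution_rows` and the sample data are hypothetical, and the actual column set would still need to be decided.

```python
from typing import Any, Dict, Iterator


def distribution_rows(release: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per distribution file in a release JSON document.

    Assumption: `release` has an "info" dict (release-level metadata)
    and a "urls" list (one entry per uploaded file).
    """
    info = release.get("info", {})
    for dist in release.get("urls", []):
        # Copy the release-level metadata into every row, then add
        # the per-file fields on top of it.
        row = dict(info)
        row.update(
            filename=dist.get("filename"),
            digests=dist.get("digests"),
            size=dist.get("size"),
            upload_time=dist.get("upload_time"),
            url=dist.get("url"),
        )
        yield row


# Hand-written sample shaped like a release JSON response (not real data).
sample = {
    "info": {"name": "example", "version": "1.0", "summary": "An example"},
    "urls": [
        {
            "filename": "example-1.0.tar.gz",
            "digests": {"sha256": "0" * 64},
            "size": 1234,
            "upload_time": "2020-01-01T00:00:00",
            "url": "https://files.example.invalid/example-1.0.tar.gz",
        },
        {
            "filename": "example-1.0-py3-none-any.whl",
            "digests": {"sha256": "1" * 64},
            "size": 2345,
            "upload_time": "2020-01-01T00:00:05",
            "url": "https://files.example.invalid/example-1.0-py3-none-any.whl",
        },
    ],
}

rows = list(distribution_rows(sample))
```

Each row repeats the release-level fields, so a query over the table would not need a join to get from a file back to its metadata, at the cost of some duplication.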

Once we're publishing to the dataset on upload, we'd also need to backfill prior distributions.

Not entirely sure what we'd name it. Does the-psf:pypi.distributions make sense?
