As requested in pypa/packaging-problems#323, we should explore publishing the metadata for each released distribution in a public dataset via BigQuery.
I'm imagining that each row would contain all the core metadata fields included in each release, as well as filename, digests, file size, upload time, URL to the distribution, etc. Essentially everything in the "Release" JSON API, with the per-release `info` field included for every individual distribution.
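To make the row shape concrete, here's a rough sketch of flattening one "Release" JSON API response into one row per distribution. It assumes the response has a top-level `info` object and a `urls` list of per-file entries; the function name `distribution_rows` and the sample values are made up for illustration, and the real schema would still need to be decided.

```python
from typing import Any


def distribution_rows(release_json: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten one release's JSON into one row per distribution file.

    Hypothetical helper: assumes an "info" object and a "urls" list.
    """
    info = release_json.get("info", {})
    rows = []
    for dist in release_json.get("urls", []):
        # Per-release metadata is repeated on every distribution's row.
        row = dict(info)
        # Per-file fields named in the proposal.
        row.update(
            filename=dist.get("filename"),
            digests=dist.get("digests"),
            size=dist.get("size"),
            upload_time=dist.get("upload_time"),
            url=dist.get("url"),
        )
        rows.append(row)
    return rows


# Made-up payload with one sdist and one wheel.
sample = {
    "info": {"name": "example", "version": "1.0", "summary": "An example"},
    "urls": [
        {
            "filename": "example-1.0.tar.gz",
            "digests": {"sha256": "aaaa"},
            "size": 1234,
            "upload_time": "2020-01-01T00:00:00",
            "url": "https://files.example.invalid/example-1.0.tar.gz",
        },
        {
            "filename": "example-1.0-py3-none-any.whl",
            "digests": {"sha256": "bbbb"},
            "size": 2345,
            "upload_time": "2020-01-01T00:00:05",
            "url": "https://files.example.invalid/example-1.0-py3-none-any.whl",
        },
    ],
}

rows = distribution_rows(sample)
```

Each resulting row carries the full release metadata plus that file's own fields, so the table could be queried per distribution without a join.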
Once we're publishing to the dataset on upload, we'd also need to backfill prior distributions.
Not entirely sure what we'd name it; does `the-psf:pypi.distributions` make sense?