PyPi metadata database #2912

Closed
rth opened this issue Feb 10, 2018 · 5 comments

Comments

@rth

rth commented Feb 10, 2018

Hello,

I'm trying to build a PyPI dependency graph and was wondering whether the package metadata is available somewhere.

It can be accessed via the XML-RPC service with:

import xmlrpc.client

client = xmlrpc.client.ServerProxy('https://pypi.python.org/pypi')

package_name = "some_package"

releases = client.package_releases(package_name)
metadata = client.release_data(package_name, releases[0])  # take the latest release

but that requires two API calls per package, so it doesn't scale.

Are there any plans, e.g., to export the metadata to Google BigQuery, as is done for download statistics? (Possibly after removing the author_name and author_email fields for privacy reasons, and the description field to keep the size manageable.)

If the XML-RPC is the only way to get the metadata in the foreseeable future, what would be a reasonable rate limit? The documentation doesn't say anything about it. Thanks.

@di
Member

di commented Feb 10, 2018

Hi @rth,

> Are there any plans e.g. to export the metadata to Google BigQuery as it's done for download statistics?

I don't think there are any plans to do this at the moment (see below). I'll leave this issue open as a feature request.

> If the XML-RPC is the only way to get the metadata in the foreseeable future, what would be a reasonable rate limit?

There is also a JSON API, e.g.:

>>> import requests
>>> resp = requests.get('https://pypi.org/pypi/twine/json').json()
>>> resp['info']['requires_dist']
["keyring; extra == 'keyring'", "pyblake2; extra == 'with-blake2' and python_version < '3.6'", "argparse; python_version == '2.6'", 'setuptools >= 0.7.0', 'pkginfo >= 1.0', 'requests-toolbelt >= 0.8.0', 'requests >= 2.5.0, != 2.15, != 2.16', 'tqdm >= 4.11']

However, for a number of reasons, this is not guaranteed to actually contain the dependencies of a package. See pypi/legacy#622 and #474.
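For building a dependency graph from this field, a minimal sketch of reducing `requires_dist` entries to bare project names might look like the following. The helper `dependency_names` is hypothetical (it is not part of any PyPI API), it deliberately ignores version specifiers, extras, and environment markers, and it treats a missing (`None`) field as "no declared dependencies", which, per the caveat above, does not prove the package has none.

```python
import re

# Leading project name of a requirement string: letters and digits,
# with ".", "_" and "-" allowed in the middle (the PEP 508 name rule).
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")


def dependency_names(requires_dist):
    """Return sorted, de-duplicated project names from a requires_dist list."""
    names = set()
    for requirement in requires_dist or []:  # the field may be None
        match = _NAME_RE.match(requirement)
        if match:
            # Normalize so that "Foo_Bar" and "foo-bar" compare equal.
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return sorted(names)


# The requires_dist value shown above for twine.
sample = [
    "keyring; extra == 'keyring'",
    "pyblake2; extra == 'with-blake2' and python_version < '3.6'",
    "argparse; python_version == '2.6'",
    "setuptools >= 0.7.0",
    "pkginfo >= 1.0",
    "requests-toolbelt >= 0.8.0",
    "requests >= 2.5.0, != 2.15, != 2.16",
    "tqdm >= 4.11",
]

print(dependency_names(sample))
```

Note that entries guarded by `extra == ...` or a `python_version` marker are conditional dependencies; a real graph would probably want to keep that information rather than drop it as this sketch does.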

As far as a rate limit goes, if you're using pypi.org, these endpoints are not rate-limited. I can't really speak for pypi.python.org.
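Compared with the two XML-RPC calls per package, the JSON API needs a single request per package. A rough sketch of collecting the raw `requires_dist` lists for several packages is shown below; the function names `fetch_metadata` and `collect_requires_dist` are made up for illustration, only the standard library is used, and error handling (unknown packages, network failures) is left out.

```python
import json
import urllib.request


def fetch_metadata(package_name):
    """Fetch one package's metadata from the JSON API (one HTTP request)."""
    url = f"https://pypi.org/pypi/{package_name}/json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def collect_requires_dist(package_names, fetch=fetch_metadata):
    """Map each package name to its raw requires_dist list (possibly empty).

    `fetch` is injectable so the function can be exercised without network
    access, e.g. with a stub that returns canned metadata.
    """
    result = {}
    for name in package_names:
        info = fetch(name)["info"]
        # requires_dist can be missing or None, so fall back to an empty list.
        result[name] = info.get("requires_dist") or []
    return result
```

Calling `collect_requires_dist(["twine"])` would issue one request to pypi.org. Even without a rate limit, a full crawl should presumably be paced considerately.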

> The documentation doesn't say anything about it. Thanks.

I've added #2913 to address the need for additional documentation, thanks for the report!

@rth
Author

rth commented Feb 10, 2018

Thanks a lot for these explanations @di !

@brainwane
Contributor

Thank you for the suggestion! As I believe you know, on our development roadmap the most urgent task is to improve Warehouse to the point where we can redirect pypi.python.org to pypi.org so the site is more sustainable and reliable. Since this feature isn't something that's on that immediate path, I've moved it to a future milestone.

Thanks and sorry for the wait.

@rth
Author

rth commented Mar 3, 2018

Thanks @brainwane !

Actually, I will close this because the existence of the JSON API addresses my initial question. The API was documented as part of #2913, which has been fixed, while #474 tracks the situation with missing dependencies.

@pradyunsg
Contributor

pradyunsg commented May 6, 2021

Folks reading this would also be interested in #8254.
