API to get dependencies without full download #474

guettli · 2015-04-08T08:03:43Z

Here is the issue for the discussion on python-distutils: http://code.activestate.com/lists/python-distutils-sig/25409/

To get a dependency resolver for Python, there needs to be a way to get the dependencies of a package. To avoid useless network traffic, the dependencies of a package ("install_requires" in setup.py) need to be accessible via an API.

domenkozar · 2015-04-12T00:36:30Z

This is going to be possible once PEP426 is in place.

guettli · 2015-04-13T06:25:47Z

Off topic: How does PEP426 get developed? It is soon three years old. What can I do to get it implemented?

domenkozar · 2015-04-13T13:55:55Z

1. Finish the draft, here are some issues: https://bitbucket.org/pypa/pypi-metadata-formats/issues?status=new&status=open&component=Metadata%202.x (note that issues are in bitbucket but the code is now in github)
2. Implement PEP 426 for pip

nealmcb · 2017-04-13T17:24:37Z

It looks like https://github.com/python/peps/blob/master/pep-0426.txt is the current PEP 426 draft, right?
@domenkozar, I don't see anything that looks like a current issue at that bitbucket link. Have they been addressed?
Any other current issues?
Thanks :)

nealmcb · 2017-04-13T17:53:04Z

Is there anything currently available and up-to-date that's better than downloading all the metadata as json via the pypi-data app, and doing

jq '{name: .info.name, requires: .info.requires_dist}'  */* > requires.json

I note some issues with pypi-data at nathforge/pypi-data#2

aaron-prindle · 2017-11-21T23:35:26Z

Is there any update on this? Wondering if it is now possible to get the list of a package's dependencies without a full download of the package.

dstufft · 2017-11-22T06:31:28Z

It is not in the general case, because of limitations in the packaging formats.

rth · 2018-02-11T15:01:53Z

From quickly analyzing the package metadata using the JSON API, it looks like out of ~120k packages in the PyPI index, only ~17k have a non-null info->requires_dist field. While some packages indeed have no dependencies, I imagine most do. This means that this field currently cannot be relied upon for dependency resolution.

I saw that PEP 426 has a deferred status and was wondering whether there are any open issues that aim to improve the situation with requires_dist in the current system, without necessarily doing the in-depth redesign of the metadata API discussed in PEP 426. Thanks.

brainwane · 2018-02-16T23:13:40Z

Thanks for bringing up and discussing this issue, and sorry for the slow response! (Context, for those who don't already know it: Warehouse needs to get to the point where we can redirect pypi.python.org to pypi.org so the site is more sustainable and reliable. Towards that end, the folks working on Warehouse have gotten limited funding to concentrate on improving and deploying it, and have kicked off work towards our development roadmap. Along the way we've been able to reply to some of the older issues in the repo as well.)

Since this feature isn't something that the legacy site has, and we're prioritizing replacing the old site, I've moved it to a future milestone.

@ncoghlan am I right that @rth should be looking at pypa/packaging-problems#102 and pypa/packaging-problems#54?

Thanks and sorry again for the wait.

rth · 2018-02-17T00:10:39Z

Thanks for the detailed response @brainwane and for linking to those issues!

I know that there are higher priority issues with the migration to Warehouse (and thank you for working on that!). I just commented for future reference while experimenting with the PyPI JSON API...

brainwane · 2018-02-26T16:24:35Z

Glad to help, @rth.

For reference: PEP 426 has been withdrawn.

As you're experimenting with the JSON API, check out the other API/feeds issues in case any of them have questions you can answer! And if you have questions, please feel free to ask them here, on #pypa-dev on Freenode, or on the pypa-dev mailing list.

ncoghlan · 2018-02-26T21:06:11Z

Note that while PEP 426 (metadata 2.0) has been withdrawn, PEP 566 (metadata 2.1) has been accepted, and that includes a canonical conversion from the baseline key:value representation to a JSON compatible representation: https://www.python.org/dev/peps/pep-0566/#json-compatible-metadata

This means that at least for projects that upload wheel files, it will be feasible for Warehouse to extract and publish the corresponding dependency metadata in a standards-backed way (since the conversion rules can also be applied to metadata 1.2).

rth · 2018-02-26T21:09:42Z

> PEP 566 (metadata 2.1) has been accepted,
> This means that at least for projects that upload wheel files, it will be feasible for Warehouse to extract and publish the corresponding dependency metadata in a standards-backed way

That's really good news. Thank you for the explanations!

wimglenn · 2018-03-05T02:14:17Z

What determines the value of "requires_dist" given in the json api response? When I look at one of my uploads it's there, e.g. https://pypi.org/pypi/oyaml/0.2/json which correctly says requires_dist=[pyyaml]. But then on boto3 https://pypi.org/pypi/boto3/1.6.3/json it's got requires_dist=[] and that's not right, it should have botocore, jmespath, s3transfer..
Both of these projects are specifying the metadata the same way, by passing install_requires in the setup.py:setup kwargs. I heard somewhere that it's related to whether you upload a wheel or an sdist first, but this explanation doesn't make much sense to me..?

ncoghlan · 2018-03-05T07:00:56Z

Metadata extraction currently only happens for the first uploaded artifact, and unlike wheel archives, sdists aren't required to contain metadata in a format that an index server knows how to read.

Allowing subsequent wheel uploads to supplement the metadata extracted from an sdist would be a nice Warehouse enhancement (but is separate from this issue).

ncoghlan · 2018-03-06T23:59:45Z

After checking with @dstufft in relation to pypa/packaging.python.org#450, it seems recent versions of twine and setuptools should be uploading full project metadata regardless of the nature of the first uploaded artifact (sdist or wheel).

So the most likely cause of incomplete metadata now is the use of older upload clients (and older releases will be missing this data as well, since it needs to be generated client side and then delivered to PyPI as part of the release publication process).

wimglenn · 2018-03-07T00:06:52Z

@ncoghlan It would be a different story if the "requires_dist" key was not returned at all, which would be PyPI saying "I don't have this information". But my issue is that it's actually returning incorrect data, i.e. the "requires_dist" key is there, and it has a value (empty array):

$ curl -s "https://pypi.org/pypi/boto3/1.6.3/json" | jq ".info.requires_dist"
[]

Users don't seem to have a way to tell the difference between a package which genuinely has no 3rd-party requirements, and one with incorrectly parsed requirements, apart from downloading the distribution.
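
The ambiguity described above can be sketched in a few lines of Python. This is only an illustration, assuming the JSON API layout used in the curl example (`info.requires_dist`); the helper names `classify_requires_dist` and `fetch_release_json` are made up for this sketch and are not part of any PyPI client.

```python
import json
import urllib.request


def classify_requires_dist(release_json: dict) -> str:
    """Describe what the JSON API response says about a release's dependencies."""
    info = release_json.get("info", {})
    value = info.get("requires_dist")
    if value is None:
        # Key missing or null: the index holds no dependency data.
        return "unknown"
    if not value:
        # Empty list: either no dependencies, or metadata was never extracted.
        return "ambiguous"
    return "declared"


def fetch_release_json(project: str, version: str) -> dict:
    """Fetch the per-release JSON document (same URL shape as the curl call)."""
    url = f"https://pypi.org/pypi/{project}/{version}/json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

With the response shown above for boto3 1.6.3 (an empty array), `classify_requires_dist` would report "ambiguous", which is exactly the case a client cannot resolve without downloading the distribution.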

I think perhaps you should backfill these on all existing distributions, or at least all existing distributions which have a bdist present in index, so it no longer returns incorrect data. That should be easy enough to script and run as a once-off. Thoughts?

ncoghlan · 2018-03-07T01:10:31Z

@wimglenn I think that's 3 separate questions:

1. Distinguishing between "metadata never extracted" (requires_dist not set) and "no dependencies" (empty list). This gets back to the problem where this information is provided by clients as part of the component upload [1], so from PyPI's point of view, those two cases currently look the same. (This kind of limitation in the current upload API is a big part of why Warehouse's emulation of the pypi.python.org upload API has legacy in the name: it needs to be replaced with something more robust, but doing so becomes yet another compatibility migration to manage for index server implementations and upload clients).
2. Retrofitting metadata for existing releases that have uploaded wheel or egg files. This seems plausible, since metadata can be extracted from those without running arbitrary Python code (and explicit rules could be defined for handling the cases where different binary artifacts include different metadata files). The constraint is then a combination of developer time, compute resources, and privileged backend database access, so it seems unlikely that will happen without specific funding from a user organisation or redistributor that wants to see it happen (or a successful grant application from the PSF Packaging Working Group).
3. Retrofitting metadata for existing source-only releases. This is a similar problem to 2, but with the extra difficulty that doing it reliably requires running arbitrary Python code (and the output of that code may be platform dependent).

[1] For example, see https://github.com/pypa/twine/blob/master/twine/repository.py#L122 for upload, https://github.com/pypa/twine/blob/master/twine/package.py#L83 for extraction

dstufft · 2018-03-07T04:30:28Z

Yea, we don't have a way to differentiate these cases currently. The other case we can't differentiate is "has the metadata, but the artifact was created with an older tool that didn't understand it". That mostly only applies to sdists though.
Not only platform dependent, but also dependent on environment variables, CLI flags, the state of the current system, and sometimes the state of external systems! Since setup.py is just Python, people can and do put a lot of stuff in there.

brainwane · 2020-01-27T17:24:14Z

The pip maintainers would like this because it would really help with the resolver improvements and automated testing improvements they're making over the next few months.

dstufft · 2020-01-27T17:35:59Z

PyPI's JSON API does not come from a PEP, so we're either stuck trying to add this to the existing simple API, standardizing PyPI's JSON API, or defining an entirely new replacement to the simple API. Personally I'd lean towards the last option there, but if we're going to do that, then we probably want to spend some time figuring out exactly which problems with the simple API we're trying to solve.

brainwane · 2020-04-03T21:46:49Z

@pradyunsg @uranusjr @pfmoore As we work to roll out and test the new resolver pypa/pip#6536, or think about future versions, how much would you benefit from even a prototype or minimal version of this feature?

pfmoore · 2020-04-04T10:50:35Z

It would help, but it would need to be an extension to the standard for the simple API to be of significant benefit. We definitely don't want to end up special-casing PyPI/Warehouse in pip's code, and while we could add a test for whether an index supports a new API, I'd want that to be standardised (at least provisionally) or we're going to hit all sorts of backward compatibility and maintenance issues down the line.

Also, this would only be of minimal benefit unless it exposed metadata for sdists, which is a hard problem. If it only handled wheels, the only benefit would be reduced download volumes for wheels that ended up not being used. And pip's caches probably make that cost relatively small.

Personally, unless it was a standardised feature that provided sdist metadata¹ I feel that the benefits would be marginal enough that I'd expect us to defer any work to use it until after the release of the new resolver, as "post-release performance tuning" work.

¹ Or better still, metadata based on project and version alone, but that's not realistically achievable in the timescales we're looking at.

pradyunsg · 2020-11-19T16:22:51Z

#8254 is a proposal that would address this as well.

pfmoore · 2020-11-19T16:45:36Z

Assuming PEP 643 gets approved, we will have reliable metadata available for wheels and (increasing numbers of) sdists. Extracting that metadata and exposing it via PyPI becomes an even more attractive prospect at that point.

Pip could likely work with either the JSON API or an extension to the simple API, but either one would need standardising first.

astrojuanlu · 2021-07-26T15:09:33Z

PEP 643 got approved! What's not clear to me is how this issue interacts with #8254.

uranusjr · 2021-07-28T06:00:54Z

With PEP 643 and PEP 658 (assuming the latter is accepted as-is), the procedure would be

1. Download and parse https://pypi.org/simple/{project_name}/
2. Find the <a> you want on the page (a wheel or sdist)
3. Append .metadata to the <a> tag’s href to format the URL for the metadata (this is PEP 658)
4. Download the metadata file; each line that starts with Requires-Dist: would be a dependency (this is PEP 643)
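
The steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not how pip does it; the function names (`file_links`, `metadata_url`, `requires_dist`) are hypothetical. It uses the lowercase `.metadata` suffix specified by PEP 658 and strips the `#sha256=...` fragment from the href before appending it.

```python
import posixpath
from email.parser import HeaderParser
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class _HrefCollector(HTMLParser):
    """Collect the href of every <a> tag on a simple-index page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


def file_links(page_html: str, page_url: str) -> dict:
    """Steps 1-2: map each filename on the page to its absolute file URL."""
    collector = _HrefCollector()
    collector.feed(page_html)
    links = {}
    for href in collector.hrefs:
        # Resolve relative links and drop the "#sha256=..." fragment.
        url = urldefrag(urljoin(page_url, href)).url
        links[posixpath.basename(urlsplit(url).path)] = url
    return links


def metadata_url(file_url: str) -> str:
    """Step 3: the metadata file lives at the file URL plus ".metadata"."""
    return urldefrag(file_url).url + ".metadata"


def requires_dist(metadata_text: str) -> list:
    """Step 4: every Requires-Dist header in the metadata is one dependency."""
    return HeaderParser().parsestr(metadata_text).get_all("Requires-Dist") or []
```

Downloading the page and the metadata file is left out on purpose, so the parsing can be tried on saved responses.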

pfmoore · 2021-07-28T07:37:54Z

> each line that starts with Requires-Dist: would be a dependency (this is PEP 643)

Minor clarification: If it's an sdist, and either Metadata-Version is less than 2.2, or Requires-Dist is present in a line starting with Dynamic:, then you cannot rely on this approach to get dependencies, and you must still build a wheel from the sdist.
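
As a rough sketch of that rule, the check could look like the following. The function name `static_requires_dist` and the `is_sdist` flag are assumptions for illustration; the caller is expected to know whether the metadata came from an sdist or a wheel.

```python
from email.parser import HeaderParser


def static_requires_dist(metadata_text: str, is_sdist: bool):
    """Return the Requires-Dist list, or None if it cannot be trusted.

    For sdists the list is only reliable when Metadata-Version >= 2.2
    and Requires-Dist is not listed in a Dynamic field.
    """
    msg = HeaderParser().parsestr(metadata_text)
    if is_sdist:
        raw_version = msg.get("Metadata-Version", "0").strip()
        version = tuple(int(part) for part in raw_version.split("."))
        dynamic = {value.strip().lower() for value in msg.get_all("Dynamic") or []}
        if version < (2, 2) or "requires-dist" in dynamic:
            # The dependencies may change at build time: build a wheel instead.
            return None
    return msg.get_all("Requires-Dist") or []
```

Returning None rather than an empty list keeps "cannot tell" distinct from "no dependencies".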

dstufft · 2023-05-23T04:14:03Z

I'm going to close this, PEP 658 is deployed on PyPI now (though there is a bug with it, but that will be fixed soon) and that's our current path for fetching any artifact's metadata without downloading the entire artifact.

wimglenn · 2023-05-23T05:16:40Z

@dstufft This is a nice enhancement for wheels uploaded, but it's falling a bit short for an API - is it really that programs/tools should parse the simple html and read those href attrs? Index still seems to show requires_dist as null in the json API (https://test.pypi.org/pypi/johnnydep/json uploaded just now for example)

Even for simple, only the wheel got the attr. Sdists don't seem to get it, though the wording in the PEP (second para of the rationale) suggests that standards-compliant sdists are still in scope. Or is it that the sdist was not standards compliant somehow? It was a setuptools build, which put a PKG-INFO file in the sdist and the deps into an .egg-info/requires.txt file.

Thanks

dstufft · 2023-05-23T05:23:36Z

The simple API supports JSON since PEP 691, using something like:

import email.message
import requests

def parse_content_type(header: str) -> str:
    m = email.message.Message()
    m["content-type"] = header
    return m.get_content_type()

# Construct our list of acceptable content types, we want to prefer
# that we get a v1 response serialized using JSON, however we also
# can support a v1 response serialized using HTML. For compatibility
# we also request text/html, but we prefer it least of all since we
# don't know if it's actually a Simple API response, or just some
# random HTML page that we've gotten due to a misconfiguration.
CONTENT_TYPES = [
    "application/vnd.pypi.simple.v1+json",
    "application/vnd.pypi.simple.v1+html;q=0.2",
    "text/html;q=0.01",  # For legacy compatibility
]
ACCEPT = ", ".join(CONTENT_TYPES)


# Actually make our request to the API, requesting all of the content
# types that we find acceptable, and letting the server select one of
# them out of the list.
resp = requests.get("https://pypi.org/simple/", headers={"Accept": ACCEPT})

# If the server does not support any of the content types you requested,
# AND it has chosen to return a HTTP 406 error instead of a default
# response then this will raise an exception for the 406 error.
resp.raise_for_status()


# Determine what kind of response we've gotten to ensure that it is one
# that we can support, and if it is, dispatch to a function that will
# understand how to interpret that particular version+serialization. If
# we don't understand the content type we've gotten, then we'll raise
# an exception.
content_type = parse_content_type(resp.headers.get("content-type", ""))
match content_type:
    case "application/vnd.pypi.simple.v1+json":
        handle_v1_json(resp)
    case "application/vnd.pypi.simple.v1+html" | "text/html":
        handle_v1_html(resp)
    case _:
        raise Exception(f"Unknown content type: {content_type}")

If you don't want to support HTML repositories (if for instance, you only talk to PyPI or are OK not supporting repositories that haven't implemented PEP 691) you can simplify that down to:

import email.message
import requests

def parse_content_type(header: str) -> str:
    m = email.message.Message()
    m["content-type"] = header
    return m.get_content_type()

# Construct our list of acceptable content types, we only accept
# a v1 response serialized using JSON.
CONTENT_TYPES = [
    "application/vnd.pypi.simple.v1+json",
]
ACCEPT = ", ".join(CONTENT_TYPES)


# Actually make our request to the API, requesting all of the content
# types that we find acceptable, and letting the server select one of
# them out of the list.
resp = requests.get("https://pypi.org/simple/", headers={"Accept": ACCEPT})

# If the server does not support any of the content types you requested,
# AND it has chosen to return a HTTP 406 error instead of a default
# response then this will raise an exception for the 406 error.
resp.raise_for_status()


# Determine what kind of response we've gotten to ensure that it is one
# that we can support, and if it is, dispatch to a function that will
# understand how to interpret that particular version+serialization. If
# we don't understand the content type we've gotten, then we'll raise
# an exception.
content_type = parse_content_type(resp.headers.get("content-type", ""))
match content_type:
    case "application/vnd.pypi.simple.v1+json":
        handle_v1_json(resp)
    case _:
        raise Exception(f"Unknown content type: {content_type}")

Support for sdists is blocked on PEP 643 support: #9660

wimglenn · 2023-05-23T05:51:29Z

Thanks, I've just tried accepting application/vnd.pypi.simple.v1+json and it works, but all I could find in result was the sha256 of the METADATA file. Does it mean that tools should make a second request to get the metadata file (if available), and parse that content as an email message? I was wondering about a way to get requires_dist directly in a JSON response without needing an additional request and parse, apologies if I've missed something obvious here.

Ack on the PEP 643 for sdists ("dynamic" field details etc)

dstufft · 2023-05-23T06:04:44Z

Yes, that's the way that PEP 658 exposed that information.

There's a trade off here between making more requests, and making the singular request larger. Putting the metadata in /simple/<project>/ will make those responses much larger, forcing people to download metadata for all releases of a project that they don't care about. Splitting them up into multiple requests lets downloaders scope their requests to just the releases they need. This gets exacerbated by the fact that it's not just different metadata per version, but also each file can have its own independent metadata.

Of course the flip side then is that if all you want is one piece of metadata out of the entire METADATA file, then you're forced to download the entire thing. That's the other trade off in that we force you to download a bit extra, to prevent having to make hundreds of requests just to get the complete metadata.

It is a little crummy to have to parse the content as an email message rather than JSON, but that's the format that our METADATA files take, and it was safer to return that unmodified rather than parsing it in PyPI and uploading it.

It's possible that a future PEP will turn METADATA into a json file, or lift specific fields to be on the base json response. But for now this is our intended API for fetching any metadata for a file without downloading the entire file.
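
A rough sketch of the two-request flow described above, assuming the PEP 691 JSON layout for /simple/<project>/ (a "files" list whose entries carry "filename" and "url"). In that layout the key advertising the metadata file is "dist-info-metadata"; PEP 714 later renamed it "core-metadata", so the sketch checks both. The function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urldefrag


def fetch_project_json(project: str) -> dict:
    """First request: the project page of the simple API, serialized as JSON."""
    req = urllib.request.Request(
        f"https://pypi.org/simple/{project}/",
        headers={"Accept": "application/vnd.pypi.simple.v1+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def metadata_urls(project_json: dict) -> dict:
    """Map filename -> metadata URL for files that advertise a metadata file."""
    urls = {}
    for entry in project_json.get("files", []):
        # The value is either True or a dict of hashes when metadata exists.
        available = entry.get("core-metadata") or entry.get("dist-info-metadata")
        if available:
            urls[entry["filename"]] = urldefrag(entry["url"]).url + ".metadata"
    return urls
```

The second request then fetches one of the returned URLs, and the body is parsed as an email-style message, for example with `email.parser.HeaderParser`.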

nlhkabu added the requires triaging maintainers need to do initial inspection of issue label Jul 2, 2016

di added feature request and removed requires triaging maintainers need to do initial inspection of issue labels Dec 7, 2017

di mentioned this issue Feb 10, 2018

PyPi metadata database #2912

Closed

brainwane added the APIs/feeds label Feb 12, 2018

brainwane added this to the 6. Post Legacy Shutdown milestone Feb 16, 2018

waseem18 mentioned this issue Mar 5, 2018

Advanced search #727

Open

pradyunsg mentioned this issue Mar 5, 2018

Guide request: gracefully dropping support for older Python versions pypa/packaging.python.org#450

Closed

ncoghlan mentioned this issue Mar 7, 2018

RFE: Allow project maintainers to edit Requires-Python metadata directly #3138

Closed

jakirkham mentioned this issue Apr 10, 2018

bump dependencies regro/cf-scripts#22

Open

brainwane mentioned this issue Apr 12, 2018

Package index should list dependencies and dependents pypa/packaging-problems#54

Open

David-OConnor mentioned this issue Aug 6, 2019

Missing dependencies on Matplotlib #6375

Closed

s117 mentioned this issue Mar 24, 2020

Figure out how to access the content of local PyPI mirror werwty/dependency-analysis-pypi#6

Closed

achimnol mentioned this issue Apr 25, 2020

Distributing large binary payloads as separate downloads #7852

Open

pradyunsg mentioned this issue Nov 19, 2020

[2020-resolver] Pip downloads lots of different versions of the same package pypa/pip#8713

Closed

ddelange mentioned this issue Nov 19, 2020

Is there any chance of discovering package dependencies without building wheels? ddelange/pipgrip#40

Closed

brainwane mentioned this issue Dec 3, 2020

New resolver takes a very long time to complete pypa/pip#9187

Closed

This was referenced Apr 23, 2021

New download API for PyPI psf/fundable-packaging-improvements#22

Open

Audit and update package metadata psf/fundable-packaging-improvements#28

Open

pietakio mentioned this issue Jun 24, 2021

Allow other ways to install cadquery CadQuery/cadquery#153

Closed

astrojuanlu mentioned this issue Jul 26, 2021

Resolver preformance regression in 21.2 pypa/pip#10201

Closed

1 task

lorengordon mentioned this issue Jul 30, 2021

Missing metadata causing dependency resolution slowdown aws/aws-cli#5701

Closed

2 tasks

aryarm mentioned this issue Apr 1, 2022

missing requires_dist in PyPI JSON makes installation slow statsmodels/statsmodels#8200

Closed

machow mentioned this issue May 31, 2022

Dependency packages aren't listed in PyPI API machow/siuba#425

Closed

miketheman mentioned this issue Jun 14, 2022

Wrong data reported from Pypi API #11587

Closed

Avasam mentioned this issue Jan 15, 2023

TODO tracking issue typeshed-internal/stub_uploader#65

Open

dstufft closed this as completed May 23, 2023
