RFE: PEP 345 dependency metadata extraction command #4691

Closed
ncoghlan opened this issue Aug 24, 2017 · 9 comments
Labels
C: list/show 'pip list' or 'pip show' state: awaiting PR Feature discussed, PR is needed type: feature request Request for a new feature

Comments

@ncoghlan
Member

In order to work out the dependencies that it needs to download, pip already knows how to interrogate wheel METADATA files and setup.py egg_info invocations. Once PEP 517 reaches an acceptable state, pip will presumably also gain support for extracting this metadata from pyproject.toml based sdists as well.

Currently, there's no straightforward way for a user to request the PEP 345 dependency metadata for a project of interest that takes advantage of features like pip's cache of downloaded sdists and built wheel files.

Independently of any future enhancements to PyPI to make this kind of information available over HTTPS, it could be useful to offer something like a client side pip metadata command that extracts the METADATA info and writes it to stdout.
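As a rough illustration of the kind of extraction described above, here is a minimal sketch that reads the METADATA file out of an already built wheel and returns its Requires-Dist entries. It is not pip's implementation: the function names read_wheel_metadata and requires_dist are hypothetical, and it assumes the wheel has exactly one top-level .dist-info directory.

```python
import zipfile
from email.parser import Parser


def read_wheel_metadata(wheel):
    """Parse the METADATA file of a wheel (a path or a binary file object)."""
    with zipfile.ZipFile(wheel) as zf:
        # A wheel keeps its metadata at <name>-<version>.dist-info/METADATA.
        candidates = [
            name for name in zf.namelist()
            if name.endswith(".dist-info/METADATA") and name.count("/") == 1
        ]
        if len(candidates) != 1:
            raise ValueError("expected exactly one top-level .dist-info/METADATA")
        text = zf.read(candidates[0]).decode("utf-8")
    # METADATA uses RFC 822 style headers, so the email parser can read it.
    return Parser().parsestr(text)


def requires_dist(wheel):
    """Return the Requires-Dist entries of a wheel as a list of strings."""
    return read_wheel_metadata(wheel).get_all("Requires-Dist") or []
```

Calling requires_dist("some-wheel.whl") (a placeholder path) would list the declared dependencies without installing anything.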

@pradyunsg
Member

Would extending pip show be enough for this?

Off topic: Would having this metadata stored statically in the sdist be covered by an sdist 2.0 PEP, or as part of PEP 517?

@pradyunsg pradyunsg added the type: enhancement Improvements to functionality label Aug 26, 2017
@ncoghlan
Member Author

@pradyunsg Yes, I think pip show --metadata <package> would be a decent way of spelling it.

As far as static extraction goes, this info is usually already there as PKG-INFO, but it isn't 100% reliable, since the key requirement for making an installable sdist is to provide a setup.py command that does the right thing.

@dstufft Does Warehouse currently extract file listings from uploaded archives? I'm thinking there are some interesting questions we could ask & answer around the actual formats used to publish sdists based on that data (most notably, setup.py vs setup.cfg vs PKG-INFO, and various combinations thereof).

If Warehouse doesn't have it, I'll see what kind of access I can get to the openshift.io data set (as I know we're extracting full archive manifests, but I'm not sure whether or not it's currently possible for me to run arbitrary queries over that data)

@pradyunsg pradyunsg added state: awaiting PR Feature discussed, PR is needed C: list/show 'pip list' or 'pip show' labels Aug 29, 2017
@pradyunsg
Member

Yes, I think pip show --metadata would be a decent way of spelling it.

Awesome!


2 questions:

  • <package> would be a proper PEP 440 Version Specifier and the distribution whose metadata is shown would be the latest among what would be selected? (yes?)
  • Should pip expose this information in a standard data interchange format (like JSON), since the motivation is to provide this metadata to the user for programmatic use? (probably?)

@ncoghlan
Member Author

pip show is currently limited to querying installed packages, which means I'm now less certain as to the suitability of using it for this purpose and would defer to folks like @dstufft and @pfmoore on that front.

It may make more sense to offer a new pip pkginfo subcommand that uses the selection criterion you describe, downloads (or builds) and caches the wheel file, then emits the parsed metadata.

As far as formatting goes, I agree it would make sense to emit a JSON-ified version of METADATA, rather than the Key:Value format used by PEP 345 and pip show.
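To make the JSON-ified form concrete, here is a small sketch that converts Key:Value metadata text into JSON. The function name metadata_to_json is hypothetical, and the key naming (lowercase, hyphens replaced by underscores) is an assumption of this sketch rather than anything settled in this thread. Fields that PEP 345 marks as multiple-use become lists.

```python
import json
from email.parser import Parser

# Fields that PEP 345 allows to appear more than once (compared in lowercase).
MULTIPLE_USE = {
    "platform", "supported-platform", "classifier", "requires-dist",
    "provides-dist", "obsoletes-dist", "requires-external", "project-url",
}


def metadata_to_json(text):
    """Convert Key:Value metadata text into a JSON string."""
    msg = Parser().parsestr(text)
    data = {}
    for key in dict.fromkeys(msg.keys()):  # unique keys, original order
        values = msg.get_all(key)
        json_key = key.lower().replace("-", "_")  # assumed naming convention
        data[json_key] = values if key.lower() in MULTIPLE_USE else values[0]
    body = msg.get_payload()
    if body.strip():
        # Newer metadata versions may carry the long description as the body.
        data["description"] = body
    return json.dumps(data, indent=2, sort_keys=True)
```

Keeping multiple-use fields as lists even when they occur once means consumers do not need to special-case single-dependency projects.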

@pfmoore
Member

pfmoore commented Aug 30, 2017

I think we should leave pip show as relating to installed packages only. For querying an index, we need a whole extra batch of options (--index-url, --find-links, etc) and they don't make sense on pip show. I'd suggest pip query as the name. We could then have a relatively consistent set of subcommands:

  • pip show: Show info for installed packages
  • pip query: Get info for any package, installed or not
  • pip list: Get summary info for all installed packages
  • pip search: Search the package index

We should strive for a somewhat consistent interface, so (for example) I'd recommend following pip list and using --format=json for getting JSON data from pip query and a default format that's human readable (probably key-value like pip show uses). The human-readable form would be --format=default. This same formatting could later be extended cleanly to pip show if there were any interest.

I don't think any pip command should return JSON format by default - the default output should be for humans. But having an option to emit machine readable data (i.e. JSON) is a great idea.
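The interface described here could be sketched roughly as follows. This is purely illustrative: pip query is only a proposal in this thread, and build_parser, render, and the requirement argument are hypothetical names, not part of pip.

```python
import argparse
import json


def build_parser():
    """Build an argument parser for a hypothetical 'pip query' command."""
    parser = argparse.ArgumentParser(prog="pip query")
    parser.add_argument("requirement", help="project name or requirement specifier")
    parser.add_argument(
        "--format",
        dest="output_format",
        choices=["default", "json"],
        default="default",  # human-readable unless JSON is requested
    )
    return parser


def render(metadata, output_format="default"):
    """Render a metadata dict as key-value lines or as JSON."""
    if output_format == "json":
        return json.dumps(metadata, indent=2, sort_keys=True)
    lines = []
    for key, value in metadata.items():
        # Repeat the key once per value for multi-valued fields.
        values = value if isinstance(value, list) else [value]
        lines.extend(f"{key}: {item}" for item in values)
    return "\n".join(lines)
```

With this shape, the human-readable output stays the default and machine consumers opt in explicitly with --format=json.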

@dstufft
Member

dstufft commented Aug 31, 2017

I was originally against this idea when I first started reading this as a "grab the dependency information" command seemed too niche to really promote to a top level command. However, I think the idea of a pip query command that is similar to pip show, but operates on a repository makes a lot of sense, and the dependency information can be just one of the pieces of information that it shows.

I also agree entirely that pip's interface should default to a human readable one, and it should use a --format=json option to return JSONified data. That is probably something we should try to extend to as many of our commands as possible TBH.

And to answer @ncoghlan's question: No, there is no file content extraction happening in Warehouse.

@ncoghlan
Member Author

Aye, while dependency extraction was the main use case I had in mind (hence the issue title), what I really meant was extraction of all the PEP 345 metadata in a way that's implicitly compatible with pip's local artifact caching, such that folks wanting to do their own automated analysis of PyPI components can more readily do things like:

  1. Use pip query directly to explore already downloaded & cached components (e.g. their own dependencies)
  2. Use bandersnatch + pip query to download and analyse a complete PyPI snapshot

If/when Warehouse does gain this metadata extraction capability for uploaded sdists, running the command as a strictly time limited client operation in a sandboxed environment with no network access also seems like it would be the safest way of actually doing it.

@cjerdonek
Member

This seems related to issue #484.

@pradyunsg
Member

pip inspect now provides this information.
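For anyone wanting to consume that output programmatically, here is a minimal sketch. It assumes the JSON report layout in pip's documentation for pip inspect: a top-level "installed" list whose entries carry a "metadata" object with keys such as "name" and "requires_dist". Note that pip inspect reports on distributions installed in the current environment. The helper names below are hypothetical.

```python
import json
import subprocess
import sys


def run_pip_inspect():
    """Run 'pip inspect' for the current interpreter and parse its JSON report."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "inspect"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def requirements_by_project(report):
    """Map each installed project name to its declared Requires-Dist entries."""
    deps = {}
    for item in report.get("installed", []):
        metadata = item.get("metadata", {})
        name = metadata.get("name")
        if name:
            # Projects without dependencies may omit the key entirely.
            deps[name] = metadata.get("requires_dist", [])
    return deps
```

Typical use would be requirements_by_project(run_pip_inspect()), which needs a pip version that ships the inspect command.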

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 7, 2022