Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to PyPI simple API PEP 691 #629

Closed
JamieMagee opened this issue Jun 26, 2023 · 2 comments
Closed

Migrate to PyPI simple API PEP 691 #629

JamieMagee opened this issue Jun 26, 2023 · 2 comments
Assignees
Labels
detector:pip The pip detector type:refactor Refactoring or improving of existing code

Comments

@JamieMagee
Copy link
Member

JamieMagee commented Jun 26, 2023

Currently our pip detector uses PyPI's JSON API. However this API is deprecated in favour of the 'simple' API outlined in PEP 691. We should migrate our detector from the deprecated API to the simple API, and perform a couple of cleanup items along the way.

Current state

In a simplified description, the current PipComponentDetector first calls PythonResolver.ResolveRootsAsync, which in turn calls PyPiClient.FetchPackageDependenciesAsync which finally makes an HTTP request to https://pypi.org/pypi/{package-name}/json. This happens for each top-level package listed in requirements.txt, and the valid versions are saved in a dictionary.

Once the top-level dependencies are registered, PythonResolver.ProcessQueueAsync walks the top-level dependencies and calls PyPiClient.FetchPackageDependenciesAsync to download a Python wheel. From the Python wheel, the METADATA file is extracted to read the dependencies for the specific package. This process is then repeated for each node in the dependency graph, until the entire graph is resolved.

This currently has 3 major problems:

  1. We're using the old PyPI API, which is deprecated
  2. PyPiClient is doing too much
    • It not only makes HTTP calls to pypi.org, but also extracts files from wheels and adds packages to the graph
    • This makes it very difficult to unit test
    • It needs to be split into a client and a service that calls the client
  3. PyPiClient caches the entire HttpResponseMessage
    • This includes the entire content of the response, the headers, as well as the request
    • Instead the cache should return:
      • A SimpleProject object for calls to https://pypi.org/simple/{package-name}/
      • A Stream for calls to https://files.pythonhosted.org/packages/... (or maybe just a Stream for the METADATA file specifically. TBD)

Step 1: SimplyPyPiClient

Step one of this refactoring involves writing a SimplePyPiClient class to replace the current PyPiClient. It should expose two methods:

  • The first should accept a package name, and return an object representing the JSON response from https://pypi.org/simple/{package-name}/
    • PEP 691 has full details for this API, but do note that the application/vnd.pypi.simple.v1+html content type is required to receive JSON responses
  • The second should accept a path under https://files.pythonhosted.org/packages/ and return a Stream of the wheel file (or perhaps a Stream of the METADATA file in the wheel. TBD.)

Aside from that, a couple of features that the client needs to have:

Step 2: A new service

The current PyPiClient not only makes HTTP calls, but also resolved dependencies, and adds them to the graph. For example in GetReleasesAsync and FetchPackageDependenciesAsync. This makes testing these methods very difficult, as the test setup required is a lot. Therefore, we need to move this into a new service. This can perhaps be combined with the existing logic in PythonResolver into a NewPythonResolver, so we can test both in parallel using experiments.

Step 3: A new detector

In order to allow us to test both approaches at the same time, we'll need to create a NewPipDetector, which implements IExperimentalDetector. We can then register as an experiment.

AB#2140980

@JamieMagee JamieMagee added type:refactor Refactoring or improving of existing code detector:pip The pip detector labels Jun 26, 2023
@JamieMagee
Copy link
Member Author

Side note, can we use pip install report to handle this instead?

pip install \
  --requirement requirements.txt
  --ignore-installed \
  --dry-run \
  --quiet \
  --report - 

This is likely a separate issue.

@cobya
Copy link
Contributor

cobya commented Jan 19, 2024

Closing this issue as #672 has been merged using the PEP 691 API. I'll open a new issue to investigate usage of pip install report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
detector:pip The pip detector type:refactor Refactoring or improving of existing code
Projects
None yet
Development

No branches or pull requests

3 participants