Currently our pip detector uses PyPI's JSON API. However, this API is deprecated in favour of the 'simple' API outlined in PEP 691. We should migrate our detector from the deprecated API to the simple API, and perform a couple of cleanup items along the way.
Current state
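In a simplified description, the current PipComponentDetector first calls PythonResolver.ResolveRootsAsync, which in turn calls PyPiClient.FetchPackageDependenciesAsync, which finally makes an HTTP request to https://pypi.org/pypi/{package-name}/json. This happens for each top-level package listed in requirements.txt, and the valid versions are saved in a dictionary.
Once the top-level dependencies are registered, PythonResolver.ProcessQueueAsync walks the top-level dependencies and calls PyPiClient.FetchPackageDependenciesAsync to download a Python wheel. From the Python wheel, the METADATA file is extracted to read the dependencies for the specific package. This process is then repeated for each node in the dependency graph, until the entire graph is resolved.
This currently has 3 major problems: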
- PyPiClient is doing too much
  - It not only makes HTTP calls to pypi.org, but also extracts files from wheels and adds packages to the graph
  - This makes it very difficult to unit test
  - It needs to be split into a client and a service that calls the client
- PyPiClient caches the entire HttpResponseMessage
  - This includes the entire content of the response, the headers, as well as the request
  - Instead the cache should return:
    - A SimpleProject object for calls to https://pypi.org/simple/{package-name}/
    - A Stream for calls to https://files.pythonhosted.org/packages/... (or maybe just a Stream for the METADATA file specifically. TBD)
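The caching change can be sketched as follows. This is a minimal illustration of the idea in Python, not the detector's C# code: the fields on SimpleProject, the ProjectCache class, and the fetch callback are all assumed names chosen for the example. The point is that the cache holds only the small parsed object, never the raw response.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class SimpleProject:
    """Hypothetical parsed form of a /simple/{package-name}/ response."""
    name: str
    file_urls: Tuple[str, ...]


class ProjectCache:
    """Caches parsed projects by package name instead of whole HTTP responses."""

    def __init__(self, fetch: Callable[[str], SimpleProject]) -> None:
        self._fetch = fetch  # assumed callback that performs the real request
        self._entries: Dict[str, SimpleProject] = {}

    def get(self, package_name: str) -> SimpleProject:
        # Simplified key: lowercasing only, not full PyPI name normalization.
        key = package_name.lower()
        if key not in self._entries:
            # Only the parsed object is retained; headers, request and body are dropped.
            self._entries[key] = self._fetch(key)
        return self._entries[key]


calls = []


def fake_fetch(name: str) -> SimpleProject:
    """Stand-in for the network call; records how often it is invoked."""
    calls.append(name)
    return SimpleProject(name=name, file_urls=())


cache = ProjectCache(fake_fetch)
first = cache.get("Requests")
second = cache.get("requests")  # served from the cache, no second fetch
```

Because the fetch callback is injected, the cache can be exercised in a unit test without any HTTP setup.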
Step 1: SimplePyPiClient
Step one of this refactoring involves writing a SimplePyPiClient class to replace the current PyPiClient. It should expose two methods:
- The first should accept a package name, and return an object representing the JSON response from https://pypi.org/simple/{package-name}/
  - PEP 691 has full details for this API, but do note that the request must send an Accept header of application/vnd.pypi.simple.v1+json to receive JSON responses
- The second should accept a path under https://files.pythonhosted.org/packages/ and return a Stream of the wheel file (or perhaps a Stream of the METADATA file in the wheel. TBD.)
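To make the two calls concrete, here is a rough protocol-level sketch in Python (not the C# client itself). The function names and the sample data are assumptions made for illustration. The JSON field names (name, files, filename, url) come from PEP 691, and the METADATA location and Requires-Dist header come from the wheel and core metadata specifications. Nothing here touches the network: the request is only built, and the wheel is created in memory.

```python
import io
import json
import urllib.request
import zipfile
from email.parser import Parser
from typing import BinaryIO, List

SIMPLE_JSON = "application/vnd.pypi.simple.v1+json"  # JSON content type from PEP 691


def build_project_request(package_name: str) -> urllib.request.Request:
    """Builds (but does not send) the request for a project's file listing."""
    url = f"https://pypi.org/simple/{package_name}/"
    return urllib.request.Request(url, headers={"Accept": SIMPLE_JSON})


def parse_project(body: str) -> dict:
    """Keeps only the project name and the wheel files from a PEP 691 response."""
    data = json.loads(body)
    wheels = [f for f in data["files"] if f["filename"].endswith(".whl")]
    return {"name": data["name"], "wheels": wheels}


def read_requires_dist(wheel: BinaryIO) -> List[str]:
    """Reads the Requires-Dist entries from the METADATA file inside a wheel stream."""
    with zipfile.ZipFile(wheel) as archive:
        path = next(n for n in archive.namelist() if n.endswith(".dist-info/METADATA"))
        text = archive.read(path).decode("utf-8")
    return Parser().parsestr(text).get_all("Requires-Dist") or []


# Made-up sample response in the PEP 691 shape.
sample = json.dumps({
    "meta": {"api-version": "1.0"},
    "name": "demo",
    "files": [
        {"filename": "demo-1.0.tar.gz",
         "url": "https://example.invalid/demo-1.0.tar.gz", "hashes": {}},
        {"filename": "demo-1.0-py3-none-any.whl",
         "url": "https://example.invalid/demo-1.0-py3-none-any.whl", "hashes": {}},
    ],
})
project = parse_project(sample)

# Made-up wheel built in memory, containing only a METADATA file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "demo-1.0.dist-info/METADATA",
        "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n"
        "Requires-Dist: requests (>=2.0)\nRequires-Dist: idna\n",
    )
buffer.seek(0)
requirements = read_requires_dist(buffer)
```

Whether the second method should hand back the whole wheel or only the METADATA entry is still TBD in this proposal. The sketch reads METADATA from a wheel stream, which works with either choice.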
Aside from that, a couple of features that the client needs to have:
- Caching of responses. The current client caches the entire HttpResponseMessage object, but this object is HUGE
- User-Agent headers on every request (see #622)
Step 2: A new service
The current PyPiClient not only makes HTTP calls, but also resolves dependencies and adds them to the graph, for example in GetReleasesAsync and FetchPackageDependenciesAsync. This makes these methods very difficult to test, because they require a lot of setup. Therefore, we need to move this logic into a new service. This can perhaps be combined with the existing logic in PythonResolver into a NewPythonResolver, so we can test both in parallel using experiments.
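One way to picture the split is sketched below in Python, purely as an illustration of the design and not as the proposed C# code. PackageClient, DependencyResolver, and FakeClient are hypothetical names, and version handling is left out entirely. The service owns the graph walk and receives the client as a dependency, so tests can substitute an in-memory fake.

```python
from collections import deque
from typing import Dict, List, Protocol


class PackageClient(Protocol):
    """Hypothetical client interface: network access only, no graph logic."""

    def get_dependencies(self, name: str) -> List[str]: ...


class DependencyResolver:
    """Hypothetical service: walks the graph using whatever client it is given."""

    def __init__(self, client: PackageClient) -> None:
        self._client = client

    def resolve(self, roots: List[str]) -> Dict[str, List[str]]:
        graph: Dict[str, List[str]] = {}
        queue = deque(roots)
        while queue:
            name = queue.popleft()
            if name in graph:
                continue  # already visited; also guards against cycles
            graph[name] = self._client.get_dependencies(name)
            queue.extend(graph[name])
        return graph


class FakeClient:
    """In-memory stand-in used by tests, so no HTTP setup is required."""

    def __init__(self, data: Dict[str, List[str]]) -> None:
        self._data = data

    def get_dependencies(self, name: str) -> List[str]:
        return self._data.get(name, [])


# The fake data contains a cycle (c -> a) to show that the walk terminates.
resolver = DependencyResolver(FakeClient({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
graph = resolver.resolve(["a"])
```

With this shape, the resolver tests need only a dictionary of fake dependencies, and the client tests need only canned HTTP responses.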
Step 3: A new detector
In order to allow us to test both approaches at the same time, we'll need to create a NewPipDetector, which implements IExperimentalDetector. We can then register it as an experiment.
AB#2140980