Really implement locate_file method in pex meta path finder#208
Conversation
The importlib.metadata documentation in Python 3.13 implies that it's fine for a `Distribution`'s implementation of the `locate_file` abstract method to raise an error if the distribution doesn't exist as real files on a real file system, but this error will bubble up when the `files` method is called (because `files` gets a `pathlib.Path`-like object from `locate_file` for each file in `RECORD` and then checks that the path exists by calling `exists()` on each return value). In `locate_file`, construct and return a `zipfile.Path` object pointing to the given file in the distribution in the pex file. `zipfile.Path` takes care of the path existence check itself, so `files` works as expected when running under Python >= 3.12.
| return zipfile.Path( | ||
| self._pex_file, | ||
| at=os.path.join(self._prefix, path) if self._prefix else path, | ||
| ) |
There was a problem hiding this comment.
importlib.metadata's files method is slow - it calls locate_file for every file listed in the distribution's RECORD file, then confirms that each of those paths exists by calling exists on the zipfile.Path that is returned. This triggers a sequential scan for the file's local file header in the pex file for every file in the distribution.
//test:distribution_metadata_test, which calls files for pygments, takes 0.9 seconds to run on Python >= 3.12. In earlier Python versions, importlib.metadata takes it as given that the files listed in RECORD actually exist. I guess this is a cheap operation for real files on a real file system and they weren't thinking of any use case more exotic than that. I'm not sure there's much we can or should do about this here.
The importlib.metadata documentation in Python 3.13 implies that it's fine for a
Distribution's implementation of thelocate_fileabstract method to raise an error if the distribution doesn't exist as real files on a real file system, but this error will bubble up when thefilesmethod is called (becausefilesgets apathlib.Path-like object fromlocate_filefor each file inRECORDand then checks that the path exists by callingexists()on each return value).In
locate_file, construct and return azipfile.Pathobject pointing to the given file in the distribution in the pex file.zipfile.Pathtakes care of the path existence check itself, sofilesworks as expected when running under Python >= 3.12.