Skip to content

Really implement locate_file method in pex meta path finder#208

Merged
chrisnovakovic merged 1 commit intoplease-build:masterfrom
chrisnovakovic:pexdistribution-locate_file-for-real
Nov 8, 2024
Merged

Really implement locate_file method in pex meta path finder#208
chrisnovakovic merged 1 commit intoplease-build:masterfrom
chrisnovakovic:pexdistribution-locate_file-for-real

Conversation

@chrisnovakovic
Copy link
Contributor

The importlib.metadata documentation in Python 3.13 implies that it's fine for a Distribution's implementation of the locate_file abstract method to raise an error if the distribution doesn't exist as real files on a real file system, but this error will bubble up when the files method is called (because files gets a pathlib.Path-like object from locate_file for each file in RECORD and then checks that the path exists by calling exists() on each return value).

In locate_file, construct and return a zipfile.Path object pointing to the given file in the distribution in the pex file. zipfile.Path takes care of the path existence check itself, so files works as expected when running under Python >= 3.12.

The importlib.metadata documentation in Python 3.13 implies that it's
fine for a `Distribution`'s implementation of the `locate_file` abstract
method to raise an error if the distribution doesn't exist as real files
on a real file system, but this error will bubble up when the `files`
method is called (because `files` gets a `pathlib.Path`-like object from
`locate_file` for each file in `RECORD` and then checks that the path
exists by calling `exists()` on each return value).

In `locate_file`, construct and return a `zipfile.Path` object pointing
to the given file in the distribution in the pex file. `zipfile.Path`
takes care of the path existence check itself, so `files` works as
expected when running under Python >= 3.12.
Comment on lines +159 to +162
return zipfile.Path(
self._pex_file,
at=os.path.join(self._prefix, path) if self._prefix else path,
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

importlib.metadata's files method is slow - it calls locate_file for every file listed in the distribution's RECORD file, then confirms that each of those paths exists by calling exists on the zipfile.Path that is returned. This triggers a sequential scan for the file's local file header in the pex file for every file in the distribution.

//test:distribution_metadata_test, which calls files for pygments, takes 0.9 seconds to run on Python >= 3.12. In earlier Python versions, importlib.metadata takes it as given that the files listed in RECORD actually exist. I guess this is a cheap operation for real files on a real file system and they weren't thinking of any use case more exotic than that. I'm not sure there's much we can or should do about this here.

@chrisnovakovic chrisnovakovic merged commit 54a8661 into please-build:master Nov 8, 2024
@chrisnovakovic chrisnovakovic deleted the pexdistribution-locate_file-for-real branch November 8, 2024 11:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants