New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix vendored attrs sys.path
leak.
#2328
Fix vendored attrs sys.path
leak.
#2328
Conversation
Whenever a given Python interpreter on a machine was 1st identified by Pex during the PEX boot process, Pex's own vendored attrs would be leaked onto the hermetic `sys.path` of that interpreter forevermore. This would lead to Pex's vendored attrs not being scrubbed from the `sys.path` during PEX boot handoff to user code.
if not os.path.isabs(entry): | ||
return False | ||
for internal_entry in internal_entries: | ||
if internal_entry == commonpath((internal_entry, entry)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lack of this check was the bug. Something like ~/.pex/isolated/975c556eea71292a09d930db2ca41875066d8be6/pex/vendor/_vendored/attrs
would be (and still is) on the sys.path
during PEX boot, but only the ~/.pex/isolated/975c556eea71292a09d930db2ca41875066d8be6
entry was in PYTHONPATH since attrs
was imported via the pex.third_party
custom importer, which imports from ~/.pex/isolated/975c556eea71292a09d930db2ca41875066d8be6
using the pex.vendor._vendored.attrs
package prefix since attrs
is not third_party.exposed()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix is good, but the comment above has some BS in it I just detected. I'll amend the comment to spell out the situation a bit better in a few hours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, there is a deeper fix that can be made such that the old check would work. The real bug is that third_party.exposed()
is empty when it should include attrs. This is a very old bug in the pex.third_party
custom Importer code. I'll proceed with releasing the current fix and circle back to make the fix cleaner (it'll probably be more ugly / more code, but it will be in the right place).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return True | ||
return False | ||
|
||
sys_path = OrderedSet(entry for entry in sys.path if entry and not is_internal_entry(entry)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just confirming that both this sys.path
and the internal_entries
lists will be relatively small, so the nested loops are unlikely to be a performance problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. But even so, the cross product must be checked anyhow. There is no clever shortcut I can see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it was a problem, I have a suspicion one could build a trie from the components of internal_entries
, and traverse it using the components of entry
, but much better if we can avoid that complexity.
if not os.path.isabs(entry): | ||
return False | ||
for internal_entry in internal_entries: | ||
if internal_entry == commonpath((internal_entry, entry)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like this is conceptually equivalent to to something like:
if internal_entry == commonpath((internal_entry, entry)): | |
if entry.startswith(internal_entry): |
Is that correct?
Is commonpath
version is required to ensure that entry = "/foobar"
, internal_entry = "/foo"
doesn't return an incorrect value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commonpath
function is not commonprefix
and is not subject to that sort of bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, perhaps you sorta got that and we're asking for confirmation. If so, yes, exactly.
# type: (str) -> bool | ||
if entry in internal_entries: | ||
return True | ||
if not os.path.isabs(entry): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might internal_entries
also have relative paths in it (e.g. PYTHONPATH="./foo/bar"
)? Does that need handling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It cannot, no.
@@ -252,7 +288,7 @@ def iter_tags(): | |||
env_markers = MarkerEnvironment(**values.pop("env_markers")) | |||
return cls( | |||
version=cast("Tuple[int, int, int]", version), | |||
pypy_version=pypy_version, | |||
pypy_version=cast("Optional[Tuple[int, int, int]]", pypy_version), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change and the __format_version__
handling related to the attrs fix? If so, can you explain how it connects? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was already logic to bust the cache when a new INTERP-INFO field was added (see the comparison of # entries to 16 on LHS of diff), but in this case, the number of entries did not change (well, it did because I added this new version format field, but ignore that) but the contents of an existing entry, sys_path
, did. The format version field allows busting the cache when no new entries are added, but contents has changed / a bug has been fixed and it's desired to have the old cache entries auto-evicted.
I can confirm this bug exists on my local machine by running:
And seeing some entries have |
@@ -227,12 +241,34 @@ def get(cls, binary=None): | |||
configured_macosx_deployment_target=configured_macosx_deployment_target, | |||
) | |||
|
|||
# Increment this integer version number when changing the encode / decode format or content. | |||
_FORMAT_VERSION = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to just tie this to the current pex version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could, but at the expense of thrashing caches for no reason. The INTERP-INFO format changes rarely relative to Pex releases.
This cleans up pex-tool#2328 by fixing the root cause of the interpreter identification leak of exposed `attrs`. The API to report exposed `sys.path` elements, `third_party.exposed()`, worked at build time, with a full Pex distribution available, but returned an empty iterator at run time, with the subset of Pex available in the runtime `.bootstrap/`.
This cleans up #2328 by fixing the root cause of the interpreter identification leak of exposed `attrs`. The API to report exposed `sys.path` elements, `third_party.exposed()`, worked at build time, with a full Pex distribution available, but returned an empty iterator at run time, with the subset of Pex available in the runtime `.bootstrap/`.
Whenever a given Python interpreter on a machine was 1st identified by
Pex during the PEX boot process, Pex's own vendored attrs would be
leaked onto the hermetic
sys.path
of that interpreter forevermore.This would lead to Pex's vendored attrs not being scrubbed from the
sys.path
during PEX boot handoff to user code.