Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix vendored attrs sys.path leak. #2328

Merged
merged 2 commits into from Jan 16, 2024

Conversation

jsirois
Copy link
Member

@jsirois jsirois commented Jan 15, 2024

Whenever a given Python interpreter on a machine was 1st identified by
Pex during the PEX boot process, Pex's own vendored attrs would be
leaked onto the hermetic sys.path of that interpreter forevermore.
This would lead to Pex's vendored attrs not being scrubbed from the
sys.path during PEX boot handoff to user code.

Whenever a given Python interpreter on a machine was 1st identified by
Pex during the PEX boot process, Pex's own vendored attrs would be
leaked onto the hermetic `sys.path` of that interpreter forevermore.
This would lead to Pex's vendored attrs not being scrubbed from the
`sys.path` during PEX boot handoff to user code.
if not os.path.isabs(entry):
return False
for internal_entry in internal_entries:
if internal_entry == commonpath((internal_entry, entry)):
Copy link
Member Author

@jsirois jsirois Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lack of this check was the bug. Something like ~/.pex/isolated/975c556eea71292a09d930db2ca41875066d8be6/pex/vendor/_vendored/attrs would be (and still is) on the sys.path during PEX boot, but only the ~/.pex/isolated/975c556eea71292a09d930db2ca41875066d8be6 entry was in PYTHONPATH since attrs was imported via the pex.third_party custom importer, which imports from ~/.pex/isolated/975c556eea71292a09d930db2ca41875066d8be6 using the pex.vendor._vendored.attrs package prefix since attrs is not third_party.exposed().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is good, but the comment above has some BS in it I just detected. I'll amend the comment to spell out the situation a bit better in a few hours.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, there is a deeper fix that can be made such that the old check would work. The real bug is that third_party.exposed() is empty when it should include attrs. This is a very old bug in the pex.third_party custom Importer code. I'll proceed with releasing the current fix and circle back to make the fix cleaner (it'll probably be more ugly / more code, but it will be in the right place).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, actually a clean fix. This just required removing an always un-needed check present in the original introduction of the vendoring / pex.third_party mechanism in #624 way back in 2018. See #2330.

return True
return False

sys_path = OrderedSet(entry for entry in sys.path if entry and not is_internal_entry(entry))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming that both this sys.path and the internal_entries lists will be relatively small, so the nested loops are unlikely to be a performance problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. But even so, the cross product must be checked anyhow. There is no clever shortcut I can see.

Copy link
Collaborator

@huonw huonw Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was a problem, I have a suspicion one could build a trie from the components of internal_entries, and traverse it using the components of entry, but much better if we can avoid that complexity.

if not os.path.isabs(entry):
return False
for internal_entry in internal_entries:
if internal_entry == commonpath((internal_entry, entry)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like this is conceptually equivalent to to something like:

Suggested change
if internal_entry == commonpath((internal_entry, entry)):
if entry.startswith(internal_entry):

Is that correct?

Is commonpath version is required to ensure that entry = "/foobar", internal_entry = "/foo" doesn't return an incorrect value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commonpath function is not commonprefix and is not subject to that sort of bug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, perhaps you sorta got that and we're asking for confirmation. If so, yes, exactly.

# type: (str) -> bool
if entry in internal_entries:
return True
if not os.path.isabs(entry):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might internal_entries also have relative paths in it (e.g. PYTHONPATH="./foo/bar")? Does that need handling?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It cannot, no.

@@ -252,7 +288,7 @@ def iter_tags():
env_markers = MarkerEnvironment(**values.pop("env_markers"))
return cls(
version=cast("Tuple[int, int, int]", version),
pypy_version=pypy_version,
pypy_version=cast("Optional[Tuple[int, int, int]]", pypy_version),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change and the __format_version__ handling related to the attrs fix? If so, can you explain how it connects? Thanks.

Copy link
Member Author

@jsirois jsirois Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was already logic to bust the cache when a new INTERP-INFO field was added (see the comparison of # entries to 16 on LHS of diff), but in this case, the number of entries did not change (well, it did because I added this new version format field, but ignore that) but the contents of an existing entry, sys_path, did. The format version field allows busting the cache when no new entries are added, but contents has changed / a bug has been fixed and it's desired to have the old cache entries auto-evicted.

@zmanji
Copy link
Collaborator

zmanji commented Jan 15, 2024

I can confirm this bug exists on my local machine by running:

fd -t f .  /Users/zmanji/.pex/interpreters/ -X 'jq' '.sys_path'

And seeing some entries have /Users/zmanji/.pex/isolated/194b5174b81632b65d29e39489cf61230eecfa7a/pex/vendor/_vendored/attrs which is unexpected.

@@ -227,12 +241,34 @@ def get(cls, binary=None):
configured_macosx_deployment_target=configured_macosx_deployment_target,
)

# Increment this integer version number when changing the encode / decode format or content.
_FORMAT_VERSION = 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to just tie this to the current pex version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could, but at the expense of thrashing caches for no reason. The INTERP-INFO format changes rarely relative to Pex releases.

@jsirois jsirois merged commit 4fb9444 into pex-tool:main Jan 16, 2024
26 checks passed
@jsirois jsirois deleted the PythonInterpreter/fix_sys_path_leak branch January 16, 2024 00:25
jsirois added a commit to jsirois/pex that referenced this pull request Jan 16, 2024
This cleans up pex-tool#2328 by fixing the root cause of the interpreter
identification leak of exposed `attrs`. The API to report exposed
`sys.path` elements, `third_party.exposed()`, worked at build time,
with a full Pex distribution available, but returned an empty iterator
at run time, with the subset of Pex available in the runtime
`.bootstrap/`.
jsirois added a commit that referenced this pull request Jan 16, 2024
This cleans up #2328 by fixing the root cause of the interpreter
identification leak of exposed `attrs`. The API to report exposed
`sys.path` elements, `third_party.exposed()`, worked at build time,
with a full Pex distribution available, but returned an empty iterator
at run time, with the subset of Pex available in the runtime
`.bootstrap/`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants