pex binary hangs on startup at atomic_directory #1119

Closed
mbakhoff opened this issue Dec 4, 2020 · 2 comments · Fixed by #1126

mbakhoff commented Dec 4, 2020

My pex binary hangs on startup. Environment: Python 3.7, pex 2.1.21, Ubuntu 18.04. Here's the stack:

Traceback (most recent call first):
  <built-in method lockf of module object at remote 0x7fdf05a08f50>
  File "/var/zt/zt_consumers/current/.bootstrap/pex/common.py", line 385, in atomic_directory
  <built-in method next of module object at remote 0x7fdf06709d10>
  File "/usr/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/var/zt/zt_consumers/current/.bootstrap/pex/util.py", line 182, in cache_distribution
  File "/var/zt/zt_consumers/current/.bootstrap/pex/environment.py", line 173, in _write_zipped_internal_cache
  File "/var/zt/zt_consumers/current/.bootstrap/pex/environment.py", line 197, in _load_internal_cache
  File "/var/zt/zt_consumers/current/.bootstrap/pex/environment.py", line 227, in _update_candidate_distributions
  File "/var/zt/zt_consumers/current/.bootstrap/pex/environment.py", line 416, in _activate
  File "/var/zt/zt_consumers/current/.bootstrap/pex/environment.py", line 260, in activate
  File "/var/zt/zt_consumers/current/.bootstrap/pex/pex.py", line 103, in _activate
  File "/var/zt/zt_consumers/current/.bootstrap/pex/pex.py", line 444, in execute
  File "/var/zt/zt_consumers/current/.bootstrap/pex/pex_bootstrapper.py", line 360, in bootstrap_pex
  File "/var/zt/zt_consumers/current/__main__.py", line 68, in <module>
  <built-in method exec of module object at remote 0x7fdf06709d10>
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)

It's a bit tricky to reproduce because it's timing-sensitive. The basic steps should be: build a pex binary containing some wheels, then launch multiple copies of the binary in parallel. All processes must use the same pex_root.
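The reproduction steps above could be scripted roughly as follows. This is only a sketch: `launch_in_parallel` is a hypothetical helper, not part of pex, and it assumes the shared pex_root is selected through the `PEX_ROOT` environment variable. Because the hang is timing-sensitive, several runs against a fresh, empty pex_root may be needed before anything shows up.

```python
import os
import subprocess


def launch_in_parallel(argv, copies, pex_root, timeout=60.0):
    """Start `copies` processes that all share one pex_root.

    Returns one entry per process: its exit code, or None if it was still
    running after `timeout` seconds (a possible hang) and had to be killed.
    """
    # Assumption: the pex runtime picks up its root from PEX_ROOT.
    env = dict(os.environ, PEX_ROOT=pex_root)
    # Start every copy before waiting on any of them so they race each other.
    procs = [subprocess.Popen(argv, env=env) for _ in range(copies)]
    results = []
    for proc in procs:
        try:
            results.append(proc.wait(timeout=timeout))
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
            results.append(None)
    return results
```

For example, `launch_in_parallel(["/path/to/app.pex"], copies=8, pex_root=fresh_dir)` with a placeholder path; a `None` in the result marks a process that did not finish in time.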

The hang happens in atomic_directory in pex/common.py. I think the first process takes the file lock for the atomic dir, finalizes the directory and releases the lock. If the timing is right, another process then reaches https://github.com/pantsbuild/pex/blob/v2.1.21/pex/common.py#L390, so it also grabs the lock but never releases it.
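To illustrate the pattern being described, here is a minimal sketch of double-checked locking around a directory where the lock is released on every exit path, including the one where another process already did the work. It is not the pex implementation: `locked_directory` and the lock file name are made up for this example, and the real `atomic_directory` does more (it stages work in a temporary directory before finalizing).

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked_directory(target_dir):
    """Yield True if the caller should create target_dir, False if it exists.

    Hypothetical helper: when True is yielded, the caller holds an exclusive
    lock for the duration of the `with` body.
    """
    if os.path.isdir(target_dir):
        # Fast path: the work is already done, no lock needed.
        yield False
        return

    head, tail = os.path.split(os.path.abspath(target_dir))
    os.makedirs(head, exist_ok=True)
    # lockf needs a descriptor opened for writing to take an exclusive lock.
    lock_fd = os.open(
        os.path.join(head, ".{}.lck".format(tail)), os.O_CREAT | os.O_WRONLY
    )
    try:
        fcntl.lockf(lock_fd, fcntl.LOCK_EX)  # blocks until the lock is free
        # Second check: another process may have created the directory while
        # we were waiting. Either way, the finally clause below still runs.
        yield not os.path.isdir(target_dir)
    finally:
        try:
            fcntl.lockf(lock_fd, fcntl.LOCK_UN)
        finally:
            os.close(lock_fd)
```

Putting the unlock in a `finally` clause means the "lost the race" branch cannot leave the lock held, which is the failure mode described above.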

The issue was introduced in #1062

@jsirois jsirois added the bug label Dec 4, 2020
@jsirois jsirois self-assigned this Dec 4, 2020
Member

jsirois commented Dec 4, 2020

Thanks @mbakhoff. That's a facepalm bug. Thanks for identifying it; I'll get a fix out for this tomorrow.

jsirois added a commit to jsirois/pex that referenced this issue Dec 5, 2020
Previously we would fail to unlock when we lost the atomic directory
creation race.

Fixes pex-tool#1119
jsirois added a commit that referenced this issue Dec 7, 2020
Previously we would fail to unlock when we lost the atomic directory
creation race.

Fixes #1119
Author

mbakhoff commented Dec 7, 2020

@jsirois Thank you for the quick fix! 👍
