Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make_numbered_dir() multi-process-safe #30

Closed
pytestbot opened this issue Sep 17, 2016 · 4 comments
Closed

make_numbered_dir() multi-process-safe #30

pytestbot opened this issue Sep 17, 2016 · 4 comments

Comments

@pytestbot
Copy link

py.path.local.make_numbered_dir() has a race condition that only shows up if more than "keep" processes start up at exactly the same time (which could occur when starting tests in parallel).

The fix I provide here, while not theoretically perfect, should be good enough. It prevents very recent directories from being deleted (even if they don't have (yet?) a .lock file in them). For what "very recent" means, I suppose that any value greater than 1 second is good enough, but I personally like to have enough time to get a chance to copy a directory that I want to keep, so the patch uses 5 minutes as the default.

https://bitbucket.org/pypy/pypy/commits/c6f52c21fe7e0e5385764cb5b1013cfbc39884be

@pytestbot
Copy link
Author

Original comment by @arigo

Ah, good solution. +1

@pytestbot
Copy link
Author

Original comment by @RonnyPfannschmidt

the problem i see is the combination of 2 different issues with just that patch\

the solution to just the race condition is the following

  • create directory with random name using mkdir (ensures atomic directory creation)
  • create lock
  • atomically rename to target (it should be acceptable to replace empty directories on unix, as the make_numbered_dir code will never create a empty target directory thats in use once the rename strategy is used)

the issue with keeping things alive time based is another one

@aurzenligl
Copy link
Contributor

@RonnyPfannschmidt: two issues with mkdir + rename strategy:

@arigo: leaving tmpdirs for 5 minutes by default could make tmp size unacceptable, if test sessions are short and generate lot of filesystem data.

Flock strategy would be best, but has limitations (not all filesystems, not all OS) and demands external library or 500+ lines of code.

My proposition is this:
https://github.com/aurzenligl/study/blob/master/python-pyrace/racefree_by_cookie.py
Removal by cookie is a completely safe solution for normal case and timeout-based solutions for cornercases (process stops and kills):

  • no lock and cookie - completely safe removal (cookie has unique name, only one process will remove),
  • no lock and no cookie - wait "zombie" timeout and then remove, probably safe (happens when cookie is removed manually or dir removal is stopped in the middle),
  • lock outdated - wait by default 2 days and then remove, probably safe (happens when test is killed).

Race problems with different implementations can be reproduced using this tiny suite:
https://github.com/aurzenligl/study/blob/master/python-pyrace/test_repro.py

@RonnyPfannschmidt
Copy link
Member

closing as solved by #143

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants