-
Notifications
You must be signed in to change notification settings - Fork 413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the resource_tracker to unlink shared temporary memmaps #966
Use the resource_tracker to unlink shared temporary memmaps #966
Conversation
Note that in order to work, this PR will need the improvements brought by tomMoral/loky#228. |
17110f0
to
b48c976
Compare
bd9c1d4
to
f08cd13
Compare
rebased with master + vendored last loky version including joblib/loky#228 |
f08cd13
to
c5ebc67
Compare
I pushed the new CI into your branch :) |
Codecov Report
@@ Coverage Diff @@
## master #966 +/- ##
=========================================
Coverage ? 72.36%
=========================================
Files ? 46
Lines ? 6749
Branches ? 0
=========================================
Hits ? 4884
Misses ? 1865
Partials ? 0
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #966 +/- ##
===========================================
- Coverage 94.78% 72.36% -22.42%
===========================================
Files 45 46 +1
Lines 6803 6749 -54
===========================================
- Hits 6448 4884 -1564
- Misses 355 1865 +1510
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #966 +/- ##
=======================================
Coverage 94.19% 94.19%
=======================================
Files 47 47
Lines 6649 6649
=======================================
Hits 6263 6263
Misses 386 386 Continue to review full report at Codecov.
|
The implementation works everywhere on posix systems, and works fine for simple cases on my windows VM, but I still need to debug some more complex tests involving memmaps. I am also going to make a summary of my discussions with @ogrisel and post it here for the record. |
Thanks @pierreglaser and @ogrisel for this! |
I'm running over non-deterministic deadlocks during pool termination in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass of comments. More to come later today or tomorrow.
I merged master to get the fix for the macos CI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is another pass. I think we should never use a global JOBLIB_MMAPS
module variable in the parent.
To track temporary memmap on the worker side (to avoid sending them back as memmap in the backward reducer) we should probably instead use a "_joblib_temporary = True" attribute on them at unpickle time.
joblib/_memmapping_reducer.py
Outdated
def __init__(self, max_nbytes, temp_folder, mmap_mode, verbose=0, | ||
prewarm=True): | ||
def __init__(self, max_nbytes, temp_folder, mmap_mode, | ||
track_memmap_in_child, verbose=0, prewarm=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
track_memmap_in_child
should be renamed to unlink_memmap_on_gc_collect
as ArrayMemmapReducer
has no notion of parent and child by itself. Only the callers do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go with unlink_on_gc_collect
. I like it as an argument of memmap
re-constructors, but for the reducer itself, I like better give_ownership_to_loader
. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the code would be simpler to follow if we had two classes: ArrayMemmapForwardReducer
and a simpler ArrayMemmapBackwardReducer
.
That might induce a bit of code duplication but that would also probably simplify the arguments handling. WDYT?
Yes it's definitely possible to do this instead. I think the most important benefit is that some branches of the
|
@ogrisel I still have to push a few changes before you can review again. |
I have failing tests when running But I guess I should wait for the loky update?
|
This PR should work out of the box - it vendors the Which tests are failing? Can you find an easy explanation of why the CI passes but the test suite fails locally? |
Output of ============================================= FAILURES ============================================== _____________________ test_permission_error_windows_memmap_sent_to_parent[loky] _____________________
backend = 'loky'
@with_numpy
@parametrize("backend", ["multiprocessing", "loky"])
def test_permission_error_windows_memmap_sent_to_parent(backend):
# Second non-regression test for:
# https://github.com/joblib/joblib/issues/806
# previously, child process would not convert temporary memmaps to numpy
# arrays when sending the data back to the parent process. This would lead
# to permission errors on windows when deleting joblib's temporary folder,
# as the memmaped files handles would still opened in the parent process.
cmd = '''if 1:
import os
import numpy as np
from joblib import Parallel, delayed
from testutils import return_slice_of_data
data = np.ones(int(2e6))
if __name__ == '__main__':
slice_of_data = Parallel(n_jobs=2, verbose=5, backend='{b}')(
delayed(return_slice_of_data)(data, 0, 20) for _ in range(10))
'''.format(b=backend)
env = os.environ.copy()
env['PYTHONPATH'] = os.path.dirname(__file__)
p = subprocess.Popen([sys.executable, '-c', cmd],
stderr=subprocess.PIPE,
stdout=subprocess.PIPE, env=env)
p.wait()
out, err = p.communicate()
assert p.returncode == 0, err
assert out == b''
if sys.version_info[:3] not in [(3, 8, 0), (3, 8, 1)]:
# In early versions of Python 3.8, a reference leak
# https://github.com/cloudpipe/cloudpickle/issues/327, holds references
# to pickled objects, generating race condition during cleanup
# finalizers of joblib and noisy resource_tracker outputs.
> assert b'resource_tracker' not in err
E assert b'resource_tracker' not in b"[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.\r\n[Parallel(n_jobs=2)]: Done 10 out of ...kl: FileNotFoundError(2, 'The system cannot find the path specified')\r\n 'resource_tracker: %s: %r' % (name, e))\r\n"
joblib\test\test_memmapping.py:381: AssertionError
________________ test_resource_tracker_silent_when_reference_cycles[multiprocessing] ________________
backend = 'multiprocessing'
@with_numpy
@with_multiprocessing
@parametrize("backend", ["multiprocessing", param("loky", marks=xfail)])
def test_resource_tracker_silent_when_reference_cycles(backend):
# There is a variety of reasons that can make joblib with loky backend
# output noisy warnings when a reference cycle is preventing a memmap from
# being garbage collected. Especially, joblib's main process finalizer
# deletes the temporary folder if it was not done before, which can
# interact badly with the resource_tracker. We don't risk leaking any
# resources, but this will likely make joblib output a lot of low-level
# confusing messages. This test is marked as xfail for now: but a next PR
# should fix this behavior.
# Note that the script in ``cmd`` is the exact same script as in
# test_permission_error_windows_reference_cycle.
cmd = """if 1:
import numpy as np
from joblib import Parallel, delayed
data = np.random.rand(int(2e6)).reshape((int(1e6), 2))
# Build a complex cyclic reference that is likely to delay garbage
# collection of the memmapped array in the worker processes.
first_list = current_list = [data]
for i in range(10):
current_list = [current_list]
first_list.append(current_list)
if __name__ == "__main__":
results = Parallel(n_jobs=2, backend="{b}")(
delayed(len)(current_list) for i in range(10))
assert results == [1] * 10
""".format(b=backend)
p = subprocess.Popen([sys.executable, '-c', cmd], stderr=subprocess.PIPE,
stdout=subprocess.PIPE)
p.wait()
out, err = p.communicate()
assert p.returncode == 0, out.decode()
> assert b"resource_tracker" not in err, err.decode()
E AssertionError: ~\repos\joblib\joblib\externals\loky\backend\resource_tracker.py:303: UserWarning: resource_tracker: ~\AppData\Local\Temp\joblib_memmapping_folder_18632_2571083238816\18632-2571083638264-0a27cd2d120c4a09a79517f41e91bf00.pkl: FileNotFoundError(2, 'The system cannot find the path specified')
E 'resource_tracker: %s: %r' % (name, e))
E
E assert b'resource_tracker' not in b"~\\repos\\joblib\\joblib\\externals\\loky\\backend\\resource_tracker.py:303: UserWarning: resource_track...kl: FileNotFoundError(2, 'The system cannot find the path specified')\r\n 'resource_tracker: %s: %r' % (name, e))\r\n"
joblib\test\test_memmapping.py:433: AssertionError
=================== 2 failed, 28 passed, 6 skipped, 1 xfailed in 61.70s (0:01:01) =================== |
The one that is always failing is |
I just sent you a gitter mesage to work out the details out of this thread. |
So after discussing with @albertcthomas, it looks like in some cases, the As a first remark, I think we should downgrade the logging level of the With regards to addressing the
|
Thanks very much for investigating this @pierreglaser! I just wanted to add that although the test fails because of the warnings associated to |
+1 in favor of a Windows specific retry mechanism in the resource tracker that would only log an error after a bunch of iterations on in retry / sleep loop. |
But I am also in favor of merging this as it is and do a second PR for the noisy error messages caused by the race condition. |
Merged! |
@pierreglaser I let you do the retry mechanism of #966 (comment) in another PR? |
Yes, let’s do this before releasing. |
Would you rather patch |
I get alot of warnings if I use the new patch... I am working with a tkinter program and when I click a button the parallel execution should be performed. The first time it works flawless, but if I click the button again I get a lot of :
and the execution gets really slow. My guess is that the tkinter app still holds on to some of the memory allocated |
Closes #806
Roadmap:
Make use of the new
resource_tracker
API to cleanup shared, temporary resources (e.g memory-mapped files backingnp.memmap
objects) of long lasting process as early as possible.This was done before in an aggressive and not cross-OS manner by deleting the whole temporary folder entry in the file system upon the end of a
Parallel
call. Inlinux
, this would only remove one reference to each underlying temporary file - their collection would be triggered once all other references to this files (open file handles) are closed. InWindows
however, removing an entry from a filesystem is possible only if all file handles to this file are closed. This typically does not happen if:The proposed solution instead relies on the capability of the
resource_tracker
to carry out reference counting (and resource unlinking if the reference count reaches 0) of named resources. Parent and child processes now delegate resource management to theresource_tracker
, which acts as a synchronizer, and will only delete resources when all of their associated file handles are closed.This behavior is only implemented for the
loky
backend, asjoblib
's multiprocessing backend does not have a sharedresource_tracker
between the parent and child processes. It does not matter though asmultiprocessing
-based pools terminate child process at the end of eachpool.map()
call, so all temporary named resources have a limited lifetime anyway.In addition, we decided, in order to limit the complexity of resource management, to have child processes convert temporary
joblib
memmaps back tonumpy
arrays when sending data to the parent process.Regarding the temporary folder (and not files), it is less critical to remove it as soon as possible. In order to limit potentially bad interactions between folder unlinking and the unlinking of the files it contains, we implement a "best effort" deletion attempt of
joblib
's temporary folder if empty at the end of each parallel call, and otherwise register an atexit finalizer to remove the said folder.Add comprehensive tests.