[MRG] Fix auto memmap gc failure #294

Merged
ogrisel merged 4 commits into joblib:master from ogrisel:fix-auto-memmap-gc-failure on Jan 18, 2016


@ogrisel
Contributor

ogrisel commented Jan 11, 2016

The automatic memmap feature of joblib can be misled by the reuse of id() values for recently garbage-collected numpy arrays, as demonstrated in the non-regression test of this PR. This problem is actually quite frequent when generating large numpy arrays with a Python generator.

This PR fixes the issue by systematically hashing the arrays to obtain a robust unique identifier for the filenames of the temporary memmap'ed arrays.
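The failure mode and the fix can be illustrated with a minimal sketch. It uses only the standard library: `bytearray` buffers stand in for numpy arrays, and the helpers `id_based_name` and `content_based_name` are hypothetical stand-ins, not joblib's actual filename logic. In CPython, `id()` is the memory address of the object, so a short-lived buffer produced by a generator may receive the same `id()` as one that was just freed, whereas a content hash is unaffected.

```python
import hashlib


def id_based_name(buf):
    # Hypothetical naming scheme keyed on identity: unsafe, because id()
    # values may be reused once an object has been garbage collected.
    return "%d.pkl" % id(buf)


def content_based_name(buf):
    # Hypothetical naming scheme keyed on content: equal content always
    # yields the same name, different content yields a different one.
    return hashlib.sha256(bytes(buf)).hexdigest() + ".pkl"


def generate_buffers(n):
    # Each buffer becomes garbage as soon as the consumer drops it.
    for i in range(n):
        yield bytearray([i]) * 64


id_names = set()
hash_names = set()
for buf in generate_buffers(50):
    id_names.add(id_based_name(buf))
    hash_names.add(content_based_name(buf))

# The number of distinct id-based names is interpreter-dependent and is
# often far smaller than 50; the content-based names are all distinct here.
print(len(id_names), len(hash_names))
```

A consequence of content-based naming is that two arrays with identical content map to the same temporary file, so a test that expects a second file needs a second array whose content differs from the first.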

I ran the benchmark script varying the parameters and could not find a configuration where hashing could cause a very large performance degradation w.r.t. the current master.

This fixes scikit-learn/scikit-learn#6063.

@ogrisel ogrisel added the bug label Jan 11, 2016
@ogrisel ogrisel changed the title Fix auto memmap gc failure [MRG] Fix auto memmap gc failure Jan 11, 2016
if context_id is not None:
    warnings.warn('context_id is deprecated and ignored in joblib'
                  ' 0.9.4 and will be removed in 0.11',
                  DeprecationWarning)

@GaelVaroquaux

GaelVaroquaux Jan 11, 2016

Member

I think that we should consider making pool.py private to not have to worry about these things.

@ogrisel

ogrisel Jan 13, 2016

Author Contributor

I agree but I would rather not make this code change in the bugfix release to keep the diff readable.

        for i in range(n):
            yield np.ones(10, dtype=np.float32) * i
    # Use max_nbytes=1 to force the use of memory-mapping
    results = Parallel(n_jobs=4, max_nbytes=1)(

@lesteve

lesteve Jan 12, 2016

Member

n_jobs=2 seems more common in tests, not sure whether there is a good reason.

@with_numpy
@with_multiprocessing
def test_auto_memmap_on_arrays_from_generator():
    def generate_arrays(n):

@lesteve

lesteve Jan 12, 2016

Member

Maybe it would be good to explain what is going on here. Short summary + a link to this PR would be enough since this PR description is rather well written.

@lesteve

lesteve Jan 12, 2016

Member

Maybe it would be good to explain what is going on here

By that I meant a comment that explains the intent of the test.

@lesteve

Member

lesteve commented Jan 12, 2016

BTW this AppVeyor failure seems genuine (from there):

======================================================================
ERROR: joblib.test.test_parallel.test_auto_memmap_on_arrays_from_generator
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\Python27\lib\site-packages\joblib\test\test_parallel.py", line 575, in test_auto_memmap_on_arrays_from_generator
    delayed(check_memmap)(a) for a in generate_arrays(100))
  File "C:\Python27\lib\site-packages\joblib\parallel.py", line 819, in __call__
    self._terminate_pool()
  File "C:\Python27\lib\site-packages\joblib\parallel.py", line 549, in _terminate_pool
    self._pool.terminate()  # terminate does a join()
  File "C:\Python27\lib\site-packages\joblib\pool.py", line 575, in terminate
    delete_folder(self._temp_folder)
  File "C:\Python27\lib\site-packages\joblib\pool.py", line 427, in delete_folder
    shutil.rmtree(folder_path)
  File "C:\Python27\lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Python27\lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\appveyor\\appdata\\local\\temp\\1\\joblib_memmaping_pool_2860_48143952\\2860-81511728-03797be121e43221444fc2914fb93286.pkl_01.npy'
@@ -406,7 +391,7 @@ def test_memmaping_on_dev_shm():
# pickling procedure generate a .pkl and a .npy file:
assert_equal(len(os.listdir(pool_temp_folder)), 2)

-    b = np.ones(100, dtype=np.float64)
+    b = np.ones(100, dtype=np.float64) * 2

@GaelVaroquaux

GaelVaroquaux Jan 12, 2016

Member

Why the "* 2"?

@ogrisel

ogrisel Jan 13, 2016

Author Contributor

otherwise the 2 arrays a and b have the same content and I cannot test that b has been memmaped to a new file.

@GaelVaroquaux

GaelVaroquaux via email Jan 13, 2016

Member

@ogrisel

ogrisel Jan 13, 2016

Author Contributor

we are memmaping with r (because of a windows specific constraint). I will add a comment.

@ogrisel ogrisel force-pushed the ogrisel:fix-auto-memmap-gc-failure branch from 41b8fa1 to 7dc1c8a Jan 13, 2016
@ogrisel ogrisel added this to the 0.9.4 milestone Jan 13, 2016
@ogrisel

Contributor Author

ogrisel commented Jan 13, 2016

@lesteve comments addressed. The status is red because coverage has decreased because of the new windows specific code, we can ignore it.

@ogrisel ogrisel force-pushed the ogrisel:fix-auto-memmap-gc-failure branch from fc6dea7 to 82964c7 Jan 13, 2016
            raise
        else:
            i += 1
            sleep(delay)

@lesteve

lesteve Jan 13, 2016

Member

I think you would never raise because i < n_retries. Also a for loop may be a tad simpler.

for i in range(n_retries):
    try:
        if os.path.exists(folder_path):
            shutil.rmtree(folder_path)
        break
    except WindowsError:
        # A worker process might still hold an open file descriptor
        # on one of the memory mapped arrays in the temporary folder:
        # let's wait a bit and retry
        if i >= n_retries - 1:
            raise
        else:
            sleep(delay)
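For reference, the suggestion above can be wrapped into a self-contained function. This is only a sketch under stated assumptions: the name `delete_folder_with_retries` and the default values of `n_retries` and `delay` are hypothetical and not taken from this PR, and it catches `OSError` (of which `WindowsError` is a subclass or alias on Windows) so that it also runs on other platforms.

```python
import os
import shutil
import tempfile
from time import sleep


def delete_folder_with_retries(folder_path, n_retries=3, delay=0.1):
    # Hypothetical helper: the name and default values are illustrative only.
    for attempt in range(n_retries):
        try:
            if os.path.exists(folder_path):
                shutil.rmtree(folder_path)
            return
        except OSError:
            # Another process may still hold a file in the folder open:
            # wait and retry, re-raising only on the last attempt.
            if attempt == n_retries - 1:
                raise
            sleep(delay)


# Usage: create a scratch folder containing one file, then delete it.
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "data.npy"), "wb") as f:
    f.write(b"\x00" * 16)
delete_folder_with_retries(scratch)
```

Using a `for` loop with an explicit check on the last attempt guarantees that the error is eventually propagated instead of being silently swallowed.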

@ogrisel

ogrisel Jan 13, 2016

Author Contributor

good catch

@@ -406,10 +391,13 @@ def test_memmaping_on_dev_shm():
# pickling procedure generate a .pkl and a .npy file:
assert_equal(len(os.listdir(pool_temp_folder)), 2)

b = np.ones(100, dtype=np.float64)
# create a new array with a content that is different from a so that

@lesteve

lesteve Jan 13, 2016

Member

maybe use from 'a' (i.e. quotes around a) to make it clearer that you are talking about the 'a' variable and not the 'a' article.

@ogrisel ogrisel force-pushed the ogrisel:fix-auto-memmap-gc-failure branch from 82964c7 to 6109123 Jan 13, 2016
@lesteve

Member

lesteve commented Jan 13, 2016

Hmmm now AppVeyor fails in case you have missed it ...

@ogrisel

Contributor Author

ogrisel commented Jan 14, 2016

Yes I am looking into it.

@ogrisel ogrisel force-pushed the ogrisel:fix-auto-memmap-gc-failure branch from 6109123 to e51760f Jan 14, 2016
    if not isinstance(a, np.memmap):
        raise TypeError('Expected np.memmap instance, got %r' % type(a))
    return a.copy()  # return a regular array instead of a memmap

@ogrisel

ogrisel Jan 18, 2016

Author Contributor

This was the cause of the previous failure under Windows: passing a temporary memmap instance back to the master process prevents the deletion of the pool temp folder as long as that instance is alive. I changed the delete_folder method to tolerate this (rare) case by warning the user.

Under POSIX, there is no problem: the folder is deleted while the memmap instances are still fully functional until they get garbage collected.
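The platform difference described above can be observed with a small standard-library sketch. It uses `mmap` on a temporary file rather than `np.memmap`, and the function name `read_after_unlink` is hypothetical. On POSIX the removal is expected to succeed while the mapping stays readable; on Windows the removal is expected to fail while the mapping is open, which the sketch tolerates.

```python
import mmap
import os
import tempfile


def read_after_unlink(payload):
    # Hypothetical helper: map a temporary file, try to delete the file
    # while the mapping is still open, then read through the mapping.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        mapped = mmap.mmap(fd, len(payload), access=mmap.ACCESS_READ)
    finally:
        os.close(fd)  # the mapping keeps its own handle on the file
    try:
        os.remove(path)
        removed = True
    except OSError:
        # Expected on Windows: the file is still in use by the mapping.
        removed = False
    data = bytes(mapped[:])
    mapped.close()
    if not removed:
        os.remove(path)  # the deletion works once the mapping is closed
    return removed, data


removed_while_mapped, data = read_after_unlink(b"x" * 16)
print(removed_while_mapped, data)
```

Returning a copy of the data from the worker, as the test above does with `a.copy()`, avoids keeping such a mapping alive in the parent process.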

@ogrisel ogrisel force-pushed the ogrisel:fix-auto-memmap-gc-failure branch from e51760f to 552c514 Jan 18, 2016
@ogrisel

Contributor Author

ogrisel commented Jan 18, 2016

Ok I added a changelog entry, merging this.

ogrisel added a commit that referenced this pull request Jan 18, 2016
[MRG] Fix auto memmap gc failure
@ogrisel ogrisel merged commit b6af20c into joblib:master Jan 18, 2016
3 checks passed
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
coverage/coveralls: Coverage decreased (-0.1%) to 88.42%
yarikoptic added a commit to yarikoptic/joblib that referenced this pull request Sep 30, 2016
* tag '0.9.4': (46 commits)
  Release 0.9.4
  DOC add missing changelog entry for joblib#296
  DOC add entry to changelog for joblib#294
  ENH spare one file descriptor / syscall in automemmap
  FIX auto-memmap gc bug by always hashing arrays
  TST non-regression test for auto-memmap / gc bug
  Add link to github issues for 0.9.4 changelog entries
  Fix my_exceptions._mk_exception when input exception is not inheritable
  add entry in changelog
  fixing hashing with mixed dtype + test
  Use _compat.PY3_OR_LATER where possible
  COSMIT fix some PEP8 horizontal misalignments
  Move definition of PY3_OR_LATER to _compat.py
  Do not use inspect.getargspec
  FIX joblib#295: deadlock between async dispatch and exception handling
  Add section in CHANGES.rst
  TRAVIS use numpy 1.10
  FIX style and pyflakes in test_pool.py
  Fix Parallel hanging with exhausted iterator
  remove useless section about versions of python prior to 2.6
  ...
3 participants