Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Fix auto memmap gc failure #294

Merged
merged 4 commits into from
Jan 18, 2016

Conversation

ogrisel
Copy link
Contributor

@ogrisel ogrisel commented Jan 11, 2016

The automatic memmap feature of joblib can be mislead by the reuse of id() values for recently garbage collected numpy arrays as demonstrated in the non-regression test of this PR. This problem is actually quite frequent when generating large numpy arrays with a Python generator.

This PR fixes the issues by systematically hashing the arrays to find a robust unique identifier for the filenames of the temporary memmap'ed arrays.

I ran the benchmark script varying the parameters and could not find a configuration where hashing could cause a very large performance degradation w.r.t. the current master.

This fixes scikit-learn/scikit-learn#6063.

@ogrisel ogrisel added the bug label Jan 11, 2016
@ogrisel ogrisel changed the title Fix auto memmap gc failure [MRG] Fix auto memmap gc failure Jan 11, 2016
if context_id is not None:
warnings.warn('context_id is deprecated and ignored in joblib'
' 0.9.4 and will be removed in 0.11',
DeprecationWarning)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thhink that we should consider making pool.py private to not have to worry about these things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree but I would rather not make this code change in the bugfix release to keep the diff readable.

@lesteve
Copy link
Member

lesteve commented Jan 12, 2016

BTW this AppVeyor failure seems genuine (from there):

======================================================================
ERROR: joblib.test.test_parallel.test_auto_memmap_on_arrays_from_generator
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\Python27\lib\site-packages\joblib\test\test_parallel.py", line 575, in test_auto_memmap_on_arrays_from_generator
    delayed(check_memmap)(a) for a in generate_arrays(100))
  File "C:\Python27\lib\site-packages\joblib\parallel.py", line 819, in __call__
    self._terminate_pool()
  File "C:\Python27\lib\site-packages\joblib\parallel.py", line 549, in _terminate_pool
    self._pool.terminate()  # terminate does a join()
  File "C:\Python27\lib\site-packages\joblib\pool.py", line 575, in terminate
    delete_folder(self._temp_folder)
  File "C:\Python27\lib\site-packages\joblib\pool.py", line 427, in delete_folder
    shutil.rmtree(folder_path)
  File "C:\Python27\lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Python27\lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\appveyor\\appdata\\local\\temp\\1\\joblib_memmaping_pool_2860_48143952\\2860-81511728-03797be121e43221444fc2914fb93286.pkl_01.npy'

@@ -406,7 +391,7 @@ def test_memmaping_on_dev_shm():
# pickling procedure generate a .pkl and a .npy file:
assert_equal(len(os.listdir(pool_temp_folder)), 2)

b = np.ones(100, dtype=np.float64)
b = np.ones(100, dtype=np.float64) * 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the "* 2"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise the 2 arrays a and b have the same content and I cannot test that b has been memmaped to a new file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are memmaping with r (because of a windows specific constraint). I will add a comment.

@ogrisel ogrisel added this to the 0.9.4 milestone Jan 13, 2016
@ogrisel
Copy link
Contributor Author

ogrisel commented Jan 13, 2016

@lesteve comments addressed. The status is red because coverage has decreased because of the new windows specific code, we can ignore it.

raise
else:
i += 1
sleep(delay)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you would never raise because i < n_retries. Also a for loop may be a tad simpler.

for i in range(n_retries)
    try:
        if os.path.exists(folder_path):
            shutil.rmtree(folder_path)
        break
    except WindowsError:
        # A worker process might still hold an open file descriptors
        # on one of the memory mapped arrays in the temporary folder
        # let's wait a bit an retry
        if i >= n_retries - 1:
            raise
        else:
            sleep(delay)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

@lesteve
Copy link
Member

lesteve commented Jan 13, 2016

Hmmm now AppVeyor fails in case you have missed it ...

@ogrisel
Copy link
Contributor Author

ogrisel commented Jan 14, 2016

Yes I am looking into it.

if not isinstance(a, np.memmap):
raise TypeError('Expected np.memmap instance, got %r',
type(a))
return a.copy() # return a regular array instead of a memmap
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the cause of the previous failure under windows: passing a temporary memmap instance back to the master process prevents the deletion of the pool temp folder as long as. I changed the delete_folder method to tolerate this (rare case) warning the user.

Under POSIX, there is no problem: the folder is deleted while the memmap instances are still fully functional until they get garbage collected.

@ogrisel
Copy link
Contributor Author

ogrisel commented Jan 18, 2016

Ok I added a changelog entry, merging this.

ogrisel added a commit that referenced this pull request Jan 18, 2016
@ogrisel ogrisel merged commit b6af20c into joblib:master Jan 18, 2016
yarikoptic added a commit to yarikoptic/joblib that referenced this pull request Sep 30, 2016
* tag '0.9.4': (46 commits)
  Release 0.9.4
  DOC add missing changelog entry for joblib#296
  DOC add entry to changelog for joblib#294
  ENH spare one file descriptor / syscall in automemmap
  FIX auto-memmap gc bug by always hashing arrays
  TST non-regression test for auto-memmap / gc bug
  Add link to github issues for 0.9.4 changelog entries
  Fix my_exceptions._mk_exception when input exception is not inheritable
  add entry in changelog
  fixing hashing with mixed dtype + test
  Use _compat.PY3_OR_LATER where possible
  COSMIT fix some PEP8 horizontal misalignments
  Move definition of PY3_OR_LATER to _compat.py
  Do not use inspect.getargspec
  FIX joblib#295: deadlock between async dispatch and exception handling
  Add section in CHANGES.rst
  TRAVIS use numpy 1.10
  FIX style and pyflakes in test_pool.py
  Fix Parallel hanging with exhausted iterator
  remove useless section about versions of python prior to 2.6
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KFold on large data yields overlapping train and test sets
3 participants