[MRG] Fix auto memmap gc failure #294

Merged
ogrisel merged 4 commits into joblib:master from ogrisel:fix-auto-memmap-gc-failure on Jan 18, 2016
ogrisel commented Jan 11, 2016

Contributor

The automatic memmap feature of joblib can be misled by the reuse of id() values for recently garbage-collected numpy arrays, as demonstrated in the non-regression test of this PR. This problem is actually quite frequent when generating large numpy arrays with a Python generator.

This PR fixes the issue by systematically hashing the arrays to find a robust unique identifier for the filenames of the temporary memmap'ed arrays.
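
To illustrate the failure mode described above, here is a minimal sketch that uses the standard-library `array.array` as a stand-in for numpy arrays, so it runs without dependencies. The helper names `generate_buffers` and `content_key` are hypothetical and are not part of joblib; the MD5 digest is only one possible content hash, not necessarily the one joblib uses.

```python
import hashlib
from array import array


def generate_buffers(n, size=1000):
    """Yield short-lived buffers; each becomes garbage once the consumer moves on."""
    for i in range(n):
        yield array("d", [float(i)] * size)


def content_key(buf):
    """Hypothetical helper: derive an identifier from the bytes, not the address."""
    return hashlib.md5(buf.tobytes()).hexdigest()


n = 100
ids = []
keys = []
for buf in generate_buffers(n):
    ids.append(id(buf))
    keys.append(content_key(buf))

# On CPython, id() is the object's memory address, so addresses freed by the
# garbage collector are often handed out again. The exact count printed here
# is implementation dependent and is not guaranteed.
print("distinct id() values:", len(set(ids)), "out of", n)

# Every buffer has different content, so the content-based keys never clash.
print("distinct content keys:", len(set(keys)), "out of", n)
```

A cache keyed on `id()` alone could therefore return the file of an earlier, already collected array, whereas a key derived from the content cannot be confused in this way.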

I ran the benchmark script varying the parameters and could not find a configuration where hashing could cause a very large performance degradation w.r.t. the current master.

This fixes scikit-learn/scikit-learn#6063.

@ogrisel ogrisel added the bug label Jan 11, 2016
@ogrisel ogrisel changed the title Fix auto memmap gc failure [MRG] Fix auto memmap gc failure Jan 11, 2016
Comment thread joblib/pool.py

Member

I think that we should consider making pool.py private so that we do not have to worry about these things.

Contributor Author

I agree but I would rather not make this code change in the bugfix release to keep the diff readable.

lesteve commented Jan 12, 2016

Member

BTW this AppVeyor failure seems genuine (from there):

======================================================================
ERROR: joblib.test.test_parallel.test_auto_memmap_on_arrays_from_generator
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\Python27\lib\site-packages\joblib\test\test_parallel.py", line 575, in test_auto_memmap_on_arrays_from_generator
    delayed(check_memmap)(a) for a in generate_arrays(100))
  File "C:\Python27\lib\site-packages\joblib\parallel.py", line 819, in __call__
    self._terminate_pool()
  File "C:\Python27\lib\site-packages\joblib\parallel.py", line 549, in _terminate_pool
    self._pool.terminate()  # terminate does a join()
  File "C:\Python27\lib\site-packages\joblib\pool.py", line 575, in terminate
    delete_folder(self._temp_folder)
  File "C:\Python27\lib\site-packages\joblib\pool.py", line 427, in delete_folder
    shutil.rmtree(folder_path)
  File "C:\Python27\lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Python27\lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\appveyor\\appdata\\local\\temp\\1\\joblib_memmaping_pool_2860_48143952\\2860-81511728-03797be121e43221444fc2914fb93286.pkl_01.npy'

Comment thread joblib/test/test_pool.py Outdated

Member

Why the "* 2"?

Contributor Author

Otherwise the two arrays a and b have the same content and I cannot test that b has been memmapped to a new file.

Member

Contributor Author

We are memmapping in r mode (because of a Windows-specific constraint). I will add a comment.

@ogrisel ogrisel force-pushed the fix-auto-memmap-gc-failure branch from 41b8fa1 to 7dc1c8a Compare January 13, 2016 10:37
@ogrisel ogrisel added this to the 0.9.4 milestone Jan 13, 2016
ogrisel commented Jan 13, 2016

Contributor Author

@lesteve comments addressed. The status is red because coverage has decreased due to the new Windows-specific code; we can ignore it.

@ogrisel ogrisel force-pushed the fix-auto-memmap-gc-failure branch from fc6dea7 to 82964c7 Compare January 13, 2016 13:34
Comment thread joblib/pool.py Outdated

Member

I think you would never raise because i < n_retries. Also a for loop may be a tad simpler.

for i in range(n_retries):
    try:
        if os.path.exists(folder_path):
            shutil.rmtree(folder_path)
        break
    except WindowsError:
        # A worker process might still hold an open file descriptor
        # on one of the memory mapped arrays in the temporary folder:
        # let's wait a bit and retry
        if i >= n_retries - 1:
            raise
        else:
            sleep(delay)
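
For readers who want to try the retry pattern suggested above outside of joblib, here is a self-contained sketch. It is not the code that was merged: the default values of `n_retries` and `delay` and the injectable `rmtree` parameter are assumptions added for illustration, and it catches `OSError` so that it also runs on non-Windows platforms.

```python
import os
import shutil
import tempfile
from time import sleep


def delete_folder(folder_path, n_retries=5, delay=0.1, rmtree=shutil.rmtree):
    """Delete folder_path, retrying when the OS reports it is still in use.

    The rmtree parameter is a hypothetical hook used here to simulate failures.
    """
    for i in range(n_retries):
        try:
            if os.path.exists(folder_path):
                rmtree(folder_path)
            return
        except OSError:
            # WindowsError is a subclass of OSError on Python 2 and an alias
            # of it on Python 3, so this also covers the Windows case.
            if i >= n_retries - 1:
                raise  # last attempt failed: propagate the error
            sleep(delay)


# Simulate a folder that stays busy for the first two attempts.
calls = []


def flaky_rmtree(path):
    calls.append(path)
    if len(calls) < 3:
        raise OSError("still in use")
    shutil.rmtree(path)


tmp = tempfile.mkdtemp()
delete_folder(tmp, n_retries=5, delay=0.01, rmtree=flaky_rmtree)
print("attempts needed:", len(calls))
print("folder removed:", not os.path.exists(tmp))
```

Because the check is `i >= n_retries - 1`, the exception from the final attempt is re-raised instead of being silently swallowed, which is the point of the review remark.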

Contributor Author

good catch

@ogrisel ogrisel force-pushed the fix-auto-memmap-gc-failure branch from 82964c7 to 6109123 Compare January 13, 2016 14:59
lesteve commented Jan 13, 2016

Member

Hmmm now AppVeyor fails in case you have missed it ...

ogrisel commented Jan 14, 2016

Contributor Author

Yes I am looking into it.

@ogrisel ogrisel force-pushed the fix-auto-memmap-gc-failure branch from 6109123 to e51760f Compare January 14, 2016 15:28

Contributor Author

This was the cause of the previous failure under Windows: passing a temporary memmap instance back to the master process prevents the deletion of the pool temp folder as long as that instance is alive. I changed delete_folder to tolerate this (rare) case and warn the user.

Under POSIX, there is no problem: the folder is deleted while the memmap instances are still fully functional until they get garbage collected.
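
The POSIX behaviour described above can be checked with a small sketch. It uses the standard-library `mmap` module as a stand-in for `numpy.memmap`, and the file name `data.bin` is made up for the example. On non-POSIX platforms the sketch closes the mapping before deleting the folder, since deleting first would be expected to fail there, as in the traceback earlier in this thread.

```python
import mmap
import os
import shutil
import tempfile

# Create a temporary folder holding one data file.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "data.bin")
with open(path, "wb") as f:
    f.write(b"joblib" * 1000)

# Map the file read-only; the mapping outlives the file object.
with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

if os.name == "posix":
    # The directory entry is removed, but the data stays reachable
    # through the mapping until it is closed.
    shutil.rmtree(folder)
    still_readable = mapped[:6] == b"joblib"
    mapped.close()
else:
    # Release the mapping first, then delete the folder.
    still_readable = mapped[:6] == b"joblib"
    mapped.close()
    shutil.rmtree(folder)

print("folder removed:", not os.path.exists(folder))
print("mapping was readable:", still_readable)
```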

@ogrisel ogrisel force-pushed the fix-auto-memmap-gc-failure branch from e51760f to 552c514 Compare January 18, 2016 10:29
ogrisel commented Jan 18, 2016

Contributor Author

Ok I added a changelog entry, merging this.

ogrisel added a commit that referenced this pull request Jan 18, 2016
@ogrisel ogrisel merged commit b6af20c into joblib:master Jan 18, 2016
yarikoptic added a commit to yarikoptic/joblib that referenced this pull request Sep 30, 2016
* tag '0.9.4': (46 commits)
  Release 0.9.4
  DOC add missing changelog entry for joblib#296
  DOC add entry to changelog for joblib#294
  ENH spare one file descriptor / syscall in automemmap
  FIX auto-memmap gc bug by always hashing arrays
  TST non-regression test for auto-memmap / gc bug
  Add link to github issues for 0.9.4 changelog entries
  Fix my_exceptions._mk_exception when input exception is not inheritable
  add entry in changelog
  fixing hashing with mixed dtype + test
  Use _compat.PY3_OR_LATER where possible
  COSMIT fix some PEP8 horizontal misalignments
  Move definition of PY3_OR_LATER to _compat.py
  Do not use inspect.getargspec
  FIX joblib#295: deadlock between async dispatch and exception handling
  Add section in CHANGES.rst
  TRAVIS use numpy 1.10
  FIX style and pyflakes in test_pool.py
  Fix Parallel hanging with exhausted iterator
  remove useless section about versions of python prior to 2.6
  ...
Successfully merging this pull request may close these issues.

KFold on large data yields overlapping train and test sets

3 participants