Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v3.0.x: hook madvaise to fix threading memory allocation issues #4041

Merged
merged 3 commits into from
Aug 15, 2017

Conversation

hjelmn
Copy link
Member

@hjelmn hjelmn commented Aug 7, 2017

Original master issue: #3685
Master PR: #4026

The current VMA cache implementation backing rcache/grdma can run into
a deadlock situation in multi-threaded code when madvise is hooked and
the c library uses locks. In this case we may run into the following
situation:

Thread 1:

    ...
    free ()           <- Holding libc lock
    madvice_hook ()
    vma_iteration ()  <- Blocked waiting for vma lock

Thread 2:
    ...
    vma_insert ()     <- Holding vma lock
    vma_item_new ()
    malloc ()         <- Blocked waiting for libc lock

To fix this problem we chose to remove the madvise () hook but that
fix is causing issue open-mpi#3685. This commit aims to greatly reduce the
chance that the deadlock will be hit by putting vma items into a free
list. This moves the allocation outside the vma lock. In general there
are a relatively small number of vma items so the default is to
allocate 2048 vma items. This default is configurable but it is likely
the number is too large not too small.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit d0c4538)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
It is not possible to use the patcher based memory hooks without
hooking madvise (MADV_DONTNEED). This commit updates the patcher
memory hooks to always hook madvise. This should be safe with recent
rcache updates.

References open-mpi#3685. Close when merged into v2.0.x, v2.x, and v3.0.x.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 0176000)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
@hjelmn hjelmn added the bug label Aug 7, 2017
@hjelmn hjelmn added this to the v3.0.0 milestone Aug 7, 2017
@hjelmn hjelmn requested a review from bosilca August 7, 2017 16:31
Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have commented on the master PR. The same comments apply here: lack of documentation in a critical area, and either a confusion comment or a mishap in the code.

@jsquyres jsquyres changed the title v3.0.x fix 3685 v3.0.x: hook madvaise to fix threading memory allocation issues Aug 7, 2017
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit b870d15)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
@hjelmn
Copy link
Member Author

hjelmn commented Aug 7, 2017

@bosilca Removed the erroneous comment. Please review again.

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proxy approval for @bosilca

@bwbarrett bwbarrett merged commit 930ca2c into open-mpi:v3.0.x Aug 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants