[core] use fallocate for fallback allocation to avoid SIGBUS #16824
if (allocated_once && RayConfig::instance().plasma_unlimited()) {
  if (!MAP_POPULATE) {
    RAY_LOG(WARNING)
        << "Fallback allocation: MAP_POPULATE is not available on this platform.";
This might be too spammy since we will call this very often, consider LOG_DEBUG
It might be a bit difficult to unit test, but could you at least manually test and check what happens if the disk is full and we trigger a fallback allocation?
LGTM pending testing. It would also be great to add the workload to the nightly tests, though I'm not sure how easy it would be to replicate the out-of-disk issue. We might consider an alternate workload that specifically stresses disk space.
  RAY_LOG(DEBUG) << "Enable MAP_POPULATE for fallback allocation.";
  flags |= MAP_POPULATE;
}

*pointer = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, *fd, 0);
if (*pointer == MAP_FAILED) {
Maybe we could change this to say out of disk space?
huh SIGBUS still happens with this fix. investigating.
Ah, we must be getting SIGBUS during the populate then, is that right? Otherwise I don't see how you can get SIGBUS if the pages are already allocated.
On Thu, Jul 1, 2021, 8:24 PM Chen Shen wrote:
confirmed SIGBUS still happens even if we have MAP_POPULATE set:
[2021-07-02 03:18:08,731 I 27428 27457] dlmalloc.cc:114: create_and_mmap_buffer(29335560, /tmp/ray/plasmaXXXXXX)
[2021-07-02 03:18:08,731 D 27428 27457] dlmalloc.cc:153: Enable MAP_POPULATE for fallback allocation.
[2021-07-02 03:18:08,745 D 27428 27457] dlmalloc.cc:202: 0x7f4e56891008 = fake_mmap(29335560)
[2021-07-02 03:18:08,745 D 27428 27457] store.cc:373: create object c361e4b8895ba43cffffffffffffffffffffffff0100000023000000 succeeded
and later on worker node:
[2021-07-02 03:18:25,925 D 45086 45254] gcs_server_address_updater.cc:53: Getting gcs server address from raylet.
[2021-07-02 03:18:26,593 E 45086 96377] logging.cc:440: *** Aborted at 1625195906 (unix time) try "date -d @1625195906" if you are using GNU date ***
[2021-07-02 03:18:26,694 E 45086 96377] logging.cc:440: PC: @ 0x0 (unknown)
[2021-07-02 03:18:26,711 E 45086 96377] logging.cc:440: *** SIGBUS ***@***.***) received by PID 45086 (TID 0x7f45657fc700) from
the proper fix might be installing signal handler...
That's a good point. Actually, SIGBUS happens on the worker (client) side, which also calls mmap, but without the MAP_POPULATE flag. I can try enabling it there to see what happens. Update: SIGBUS still happens even when mmap succeeds with the MAP_POPULATE flag.
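One plausible reading of this result is that a successful mmap says nothing about whether the file has disk blocks behind it. The sketch below assumes the backing file is sized with something like ftruncate, which the thread does not state, and compares `st_blocks` before and after a fallocate call. `AllocatedBlocks`, `SparseDemo`, and `CompareReservation` are hypothetical names used only for illustration.

```cpp
#include <fcntl.h>     // fallocate (Linux-specific; g++ defines _GNU_SOURCE)
#include <sys/stat.h>  // fstat
#include <unistd.h>    // ftruncate, close, unlink

#include <cassert>
#include <cstdlib>  // mkstemp

// Number of 512-byte blocks the filesystem has allocated for `fd`.
inline long long AllocatedBlocks(int fd) {
  struct stat st;
  int rc = fstat(fd, &st);
  assert(rc == 0);
  (void)rc;
  return static_cast<long long>(st.st_blocks);
}

struct SparseDemo {
  long long blocks_after_ftruncate;
  long long blocks_after_fallocate;  // -1 if fallocate failed (e.g. unsupported)
};

// Hypothetical demo: create an unlinked temp file, grow it with ftruncate,
// then reserve its blocks with fallocate, recording st_blocks after each step.
inline SparseDemo CompareReservation(off_t size) {
  char path[] = "/tmp/sparse_demoXXXXXX";
  int fd = mkstemp(path);
  assert(fd >= 0);
  unlink(path);  // the file is released once fd is closed

  SparseDemo result{0, -1};
  int rc = ftruncate(fd, size);  // sets the length; typically reserves no blocks
  assert(rc == 0);
  (void)rc;
  result.blocks_after_ftruncate = AllocatedBlocks(fd);

  if (fallocate(fd, 0, 0, size) == 0) {  // reserves space or fails (e.g. ENOSPC)
    result.blocks_after_fallocate = AllocatedBlocks(fd);
  }
  close(fd);
  return result;
}
```

On common Linux filesystems the first count is expected to be near zero for a freshly truncated file, which is why a later write through the mapping can still hit a full disk.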
Doesn't work as expected
Using fallocate instead
LGTM. Can we add a test that fills up disk space and checks the error return?
A clever way to do this nondestructively would be to set the fallback dir to /dev/shm as well.
@clarkzinzow this turns the SIGBUS crash into OOM exceptions, but I'm not sure if the shuffle loader could recover from this OOM. I can confirm ray-project/ray_shuffling_data_loader#14 fixed the issue.
The fix is to use the large disk for spilling. I think this will help you figure out the problem earlier (though we shouldn't raise OOM here, but an out-of-disk-space error instead).
yup I think the behavior is the same.
Tests seem to be failing on
Recently we had a bug where fallback allocation crashes with SIGBUS when /tmp is full. Ideally we'd like to throw an OOM error instead of crashing with SIGBUS.
To address the problem, this PR uses fallocate, which guarantees that subsequent writes to the allocated range won't fail for lack of disk space if the fallocate call succeeds. Note that this only works on Linux.
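The approach described above can be sketched as a minimal, Linux-only example: reserve the space with fallocate before mapping, so that a full disk shows up as an error code instead of a signal on a later write. This is an illustration under assumptions, not the PR's actual change. `CreateReservedBuffer` and `FallbackBuffer` are hypothetical names, and unlinking the temp file right after creation is a choice made for the sketch.

```cpp
#include <fcntl.h>     // fallocate (Linux-specific; g++ defines _GNU_SOURCE)
#include <sys/mman.h>  // mmap
#include <unistd.h>    // close, unlink

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdlib>  // mkstemp

struct FallbackBuffer {
  void *pointer = nullptr;  // nullptr when allocation failed
  int fd = -1;
  int error = 0;  // errno of the failing call, 0 on success
};

// Hypothetical sketch: create an unlinked temp file from `path_template`
// (which must end in "XXXXXX"), reserve `size` bytes with fallocate, then
// mmap it. A full disk surfaces as error == ENOSPC here.
inline FallbackBuffer CreateReservedBuffer(char *path_template, size_t size) {
  assert(size > 0);
  FallbackBuffer buffer;
  buffer.fd = mkstemp(path_template);
  if (buffer.fd < 0) {
    buffer.error = errno;
    return buffer;
  }
  unlink(path_template);

  // Mode 0 allocates the range and extends the file size to `size`.
  if (fallocate(buffer.fd, 0, 0, static_cast<off_t>(size)) != 0) {
    buffer.error = errno;
    close(buffer.fd);
    buffer.fd = -1;
    return buffer;
  }
  void *pointer =
      mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, buffer.fd, 0);
  if (pointer == MAP_FAILED) {
    buffer.error = errno;
    close(buffer.fd);
    buffer.fd = -1;
    return buffer;
  }
  buffer.pointer = pointer;
  return buffer;
}
```

A caller would check `error` and raise its own out-of-space error when it equals ENOSPC. On filesystems that do not support fallocate the call fails with EOPNOTSUPP, so a real implementation would need a policy for that case.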
Another note: there is also posix_fallocate, which falls back to emulation on non-Linux systems. This is also worth considering, but it may end up with different performance behavior depending on whether fallocate or the emulation is used. There are some additional caveats:
Test with shuffler:
in raylet.out log: