[core] Fix POSIX semaphore crash in experimental mutable objects (#62328)#62762
Conversation
There was a problem hiding this comment.
Code Review
This pull request improves POSIX compliance and cross-node support for mutable object semaphores by prepending a leading slash to names and using O_CREAT in the reader path. Feedback suggests updating semaphore name length checks to account for the added prefix and wrapping platform-specific test code in conditional compilation for macOS compatibility.
|
Thanks for the review! Addressed all three points in 4792c0f.
|
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 4792c0f. Configure here.
…-project#62328) `OpenSemaphores()` crashes with `Check failed: RAY_PREDICT_TRUE(_left_ != _right_) 0 vs 0` (`experimental_mutable_object_manager.cc:177`) when registering a reader channel in multi-node setups — e.g. vLLM compiled graphs with tensor parallelism across DGX Spark nodes. This PR addresses two independent bugs: 1. Cross-node reader skips `O_CREAT` (primary bug): The `semaphores_created` atomic in `PlasmaObjectHeader` coordinates initialization within a single machine's shared memory. On a different node, the reader's header can observe `kDone` even though the named POSIX semaphores were never created on its machine. The reader then calls `sem_open(name, 0)` without `O_CREAT` and fails with `ENOENT`, returning `SEM_FAILED`. The `RAY_CHECK_NE` assertion then fires. Fixed by using `O_CREAT` (without `O_EXCL`) in the reader path: opens existing semaphores in the single-node case, creates them locally in the cross-node case, with the correct initial value of 1. 2. Non-POSIX semaphore names: POSIX requires `sem_open()` names to start with `/`. glibc is lenient (strips leading slashes), but other libc implementations (e.g., musl, some older libc) enforce this. Changed `"obj"`/`"hdr"` prefixes to `"/obj"`/`"/hdr"` for portability. The existing `RAY_CHECK_LE(name.size(), PSEMNAMLEN)` bound is preserved since `name` (from `unique_name`) is already capped at `PSEMNAMLEN` in `Init()`. Added two unit tests: - `TestSemaphoreNameHasLeadingSlash`: verifies that after registration the semaphores are reachable via their POSIX-correct `/obj<name>` / `/hdr<name>` paths. - `TestCrossNodeSemaphoreCreation`: pre-sets `semaphores_created=kDone` without creating the underlying semaphores, then registers a reader channel and verifies registration succeeds with semaphores initialized to 1. Related: ray-project#62373 (addresses the same issue with a similar approach). Signed-off-by: tobby168 <tobby168@gmail.com>
4792c0f to
a4746fe
Compare
- GetSemaphoreObjectName / GetSemaphoreHeaderName: keep the existing RAY_CHECK_LE(name.size(), PSEMNAMLEN) as defense-in-depth. An earlier iteration tightened this to check the final prefixed name, but that regresses on 64-bit Linux with pid_max=4194304 where a 7-digit PID produces a 27-byte name and 31-byte ret even though Linux sem_open accepts much longer names. If a name somehow slips through oversized on macOS, sem_open returns SEM_FAILED (ENAMETOOLONG) and the RAY_CHECK_NE below catches it. - TestCrossNodeSemaphoreCreation: guard the sem_getvalue() assertions with #ifdef __linux__. macOS does not implement sem_getvalue() (returns -1 with errno=ENOSYS per Darwin XNU), which would make the test fail on macOS builds. The RegisterChannel success assertion still runs on both platforms. - TestSemaphoreNameHasLeadingSlash comment: drop the inaccurate "Linux enforces strictly via glibc" phrasing in favour of the more accurate "glibc is lenient; musl and some older libcs reject" description, matching the inline comment in the .cc file. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: tobby168 <tobby168@gmail.com>
a4746fe to
a816079
Compare
stephanie-wang
left a comment
There was a problem hiding this comment.
Thanks for this fix!
…-project#62328) (ray-project#62762) ## Why are these changes needed? Closes ray-project#62328. `OpenSemaphores()` in `experimental_mutable_object_manager.cc` crashes with ``` Check failed: RAY_PREDICT_TRUE(_left_ != _right_) 0 vs 0 [src/ray/core_worker/experimental_mutable_object_manager.cc:177] ``` when a reader registers a channel in **multi-node** setups — e.g. vLLM compiled graphs running tensor parallelism across two DGX Spark nodes. This PR addresses **two independent bugs** observed while debugging ray-project#62328. ### 1. Cross-node reader skips `O_CREAT` (primary bug) `PlasmaObjectHeader::semaphores_created` is an atomic flag that sits inside the object's shared-memory header. It coordinates semaphore initialization **within a single machine's shared memory**, not across nodes. On a different node, the reader's mapped header can observe `kDone` (propagated through the object header contents) even though the underlying named POSIX semaphores were never created on the reader's machine — named semaphores are machine-local. The reader then calls ```cpp sem_open(name, 0); // no O_CREAT ``` which fails with `ENOENT` and returns `SEM_FAILED`. The subsequent `RAY_CHECK_NE(sem, SEM_FAILED)` fires and the worker aborts. **Fix:** use `O_CREAT` (without `O_EXCL`) on the reader path. In the single-node case this just opens the existing semaphores; in the cross-node case it creates them locally with the correct initial value of `1`, matching the writer path. ### 2. Non-POSIX semaphore names POSIX specifies that `sem_open()` names must start with `/`. glibc is lenient and silently strips a missing slash, so this works today on most Linux distros. Stricter implementations (musl, some older libcs) reject such names with `EINVAL`. **Fix:** change the prefixes from `"obj"`/`"hdr"` to `"/obj"`/`"/hdr"`, and bound the resulting full name against `PSEMNAMLEN` (not just the raw `name`) so the 4-byte prefix can't push the result over macOS's 30-byte limit. ## Related work ray-project#62373 addresses the same issue with a very similar approach. I wrote this independently while investigating ray-project#62328 on a DGX Spark multi-node setup; the only notable deltas here are: - Two regression tests (`TestSemaphoreNameHasLeadingSlash`, `TestCrossNodeSemaphoreCreation`) pinning both behaviors, including the cross-node `kDone`-without-semaphores scenario. - Updated inline comments explaining the cross-node reasoning and the glibc-vs-other-libc semantics. Happy to close this in favor of ray-project#62373 if @stephanie-wang prefers; flagging both options so whoever reviews can pick whichever they like. ## Related issue number Closes ray-project#62328 Related: ray-project#62373 ## Checks - [x] I've signed off every commit (by using the \`-s\` flag, i.e., \`git commit -s\`) in this PR. - [x] I've run \`scripts/format.sh\` to lint the changes in this PR. - [x] I've included any doc changes needed for https://docs.ray.io/en/master/. - (No doc changes; bug fix only.) - [x] I've added any new APIs to the API Reference. If the API is intended to be private, it has been properly renamed. - (No new APIs.) - [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( cc @stephanie-wang @jackhumphries --------- Signed-off-by: tobby168 <tobby168@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Why are these changes needed?
Closes #62328.
OpenSemaphores()inexperimental_mutable_object_manager.cccrashes withwhen a reader registers a channel in multi-node setups — e.g. vLLM compiled graphs running tensor parallelism across two DGX Spark nodes.
This PR addresses two independent bugs observed while debugging #62328.
1. Cross-node reader skips
O_CREAT(primary bug)PlasmaObjectHeader::semaphores_createdis an atomic flag that sits inside the object's shared-memory header. It coordinates semaphore initialization within a single machine's shared memory, not across nodes.On a different node, the reader's mapped header can observe
kDone(propagated through the object header contents) even though the underlying named POSIX semaphores were never created on the reader's machine — named semaphores are machine-local. The reader then callswhich fails with
ENOENTand returnsSEM_FAILED. The subsequentRAY_CHECK_NE(sem, SEM_FAILED)fires and the worker aborts.Fix: use
O_CREAT(withoutO_EXCL) on the reader path. In the single-node case this just opens the existing semaphores; in the cross-node case it creates them locally with the correct initial value of1, matching the writer path.2. Non-POSIX semaphore names
POSIX specifies that
sem_open()names must start with/. glibc is lenient and silently strips a missing slash, so this works today on most Linux distros. Stricter implementations (musl, some older libcs) reject such names withEINVAL.Fix: change the prefixes from
"obj"/"hdr"to"/obj"/"/hdr", and bound the resulting full name againstPSEMNAMLEN(not just the rawname) so the 4-byte prefix can't push the result over macOS's 30-byte limit.Related work
#62373 addresses the same issue with a very similar approach. I wrote this independently while investigating #62328 on a DGX Spark multi-node setup; the only notable deltas here are:
TestSemaphoreNameHasLeadingSlash,TestCrossNodeSemaphoreCreation) pinning both behaviors, including the cross-nodekDone-without-semaphores scenario.Happy to close this in favor of #62373 if @stephanie-wang prefers; flagging both options so whoever reviews can pick whichever they like.
Related issue number
Closes #62328
Related: #62373
Checks
cc @stephanie-wang @jackhumphries