fix: semaphore open failures in MutableObjectManager cross-node setup#62373
fix: semaphore open failures in MutableObjectManager cross-node setup#62373volcano303 wants to merge 2 commits intoray-project:masterfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request updates the semaphore naming convention by adding a leading slash and modifies sem_open to use O_CREAT, facilitating cross-node semaphore creation. It also includes unit tests for these changes. The review feedback correctly identifies that the semaphore name length checks need to be updated to account for the additional characters in the prefix to ensure they remain within the PSEMNAMLEN limit.
4372bff to
2e58f80
Compare
…ay-project#62328) Signed-off-by: volcano303 <volcano1939@outlook.com>
2e58f80 to
0827b58
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 0827b58. Configure here.
…-project#62328) `OpenSemaphores()` crashes with `Check failed: RAY_PREDICT_TRUE(_left_ != _right_) 0 vs 0` (`experimental_mutable_object_manager.cc:177`) when registering a reader channel in multi-node setups — e.g. vLLM compiled graphs with tensor parallelism across DGX Spark nodes. This PR addresses two independent bugs: 1. Cross-node reader skips `O_CREAT` (primary bug): The `semaphores_created` atomic in `PlasmaObjectHeader` coordinates initialization within a single machine's shared memory. On a different node, the reader's header can observe `kDone` even though the named POSIX semaphores were never created on its machine. The reader then calls `sem_open(name, 0)` without `O_CREAT` and fails with `ENOENT`, returning `SEM_FAILED`. The `RAY_CHECK_NE` assertion then fires. Fixed by using `O_CREAT` (without `O_EXCL`) in the reader path: opens existing semaphores in the single-node case, creates them locally in the cross-node case, with the correct initial value of 1. 2. Non-POSIX semaphore names: POSIX requires `sem_open()` names to start with `/`. glibc is lenient (strips leading slashes), but other libc implementations (e.g., musl, some older libc) enforce this. Changed `"obj"`/`"hdr"` prefixes to `"/obj"`/`"/hdr"` for portability. The existing `RAY_CHECK_LE(name.size(), PSEMNAMLEN)` bound is preserved since `name` (from `unique_name`) is already capped at `PSEMNAMLEN` in `Init()`. Added two unit tests: - `TestSemaphoreNameHasLeadingSlash`: verifies that after registration the semaphores are reachable via their POSIX-correct `/obj<name>` / `/hdr<name>` paths. - `TestCrossNodeSemaphoreCreation`: pre-sets `semaphores_created=kDone` without creating the underlying semaphores, then registers a reader channel and verifies registration succeeds with semaphores initialized to 1. Related: ray-project#62373 (addresses the same issue with a similar approach).
…-project#62328) `OpenSemaphores()` crashes with `Check failed: RAY_PREDICT_TRUE(_left_ != _right_) 0 vs 0` (`experimental_mutable_object_manager.cc:177`) when registering a reader channel in multi-node setups — e.g. vLLM compiled graphs with tensor parallelism across DGX Spark nodes. This PR addresses two independent bugs: 1. Cross-node reader skips `O_CREAT` (primary bug): The `semaphores_created` atomic in `PlasmaObjectHeader` coordinates initialization within a single machine's shared memory. On a different node, the reader's header can observe `kDone` even though the named POSIX semaphores were never created on its machine. The reader then calls `sem_open(name, 0)` without `O_CREAT` and fails with `ENOENT`, returning `SEM_FAILED`. The `RAY_CHECK_NE` assertion then fires. Fixed by using `O_CREAT` (without `O_EXCL`) in the reader path: opens existing semaphores in the single-node case, creates them locally in the cross-node case, with the correct initial value of 1. 2. Non-POSIX semaphore names: POSIX requires `sem_open()` names to start with `/`. glibc is lenient (strips leading slashes), but other libc implementations (e.g., musl, some older libc) enforce this. Changed `"obj"`/`"hdr"` prefixes to `"/obj"`/`"/hdr"` for portability. The existing `RAY_CHECK_LE(name.size(), PSEMNAMLEN)` bound is preserved since `name` (from `unique_name`) is already capped at `PSEMNAMLEN` in `Init()`. Added two unit tests: - `TestSemaphoreNameHasLeadingSlash`: verifies that after registration the semaphores are reachable via their POSIX-correct `/obj<name>` / `/hdr<name>` paths. - `TestCrossNodeSemaphoreCreation`: pre-sets `semaphores_created=kDone` without creating the underlying semaphores, then registers a reader channel and verifies registration succeeds with semaphores initialized to 1. Related: ray-project#62373 (addresses the same issue with a similar approach). Signed-off-by: tobby168 <tobby168@gmail.com>
|
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
) (#62762) ## Why are these changes needed? Closes #62328. `OpenSemaphores()` in `experimental_mutable_object_manager.cc` crashes with ``` Check failed: RAY_PREDICT_TRUE(_left_ != _right_) 0 vs 0 [src/ray/core_worker/experimental_mutable_object_manager.cc:177] ``` when a reader registers a channel in **multi-node** setups — e.g. vLLM compiled graphs running tensor parallelism across two DGX Spark nodes. This PR addresses **two independent bugs** observed while debugging #62328. ### 1. Cross-node reader skips `O_CREAT` (primary bug) `PlasmaObjectHeader::semaphores_created` is an atomic flag that sits inside the object's shared-memory header. It coordinates semaphore initialization **within a single machine's shared memory**, not across nodes. On a different node, the reader's mapped header can observe `kDone` (propagated through the object header contents) even though the underlying named POSIX semaphores were never created on the reader's machine — named semaphores are machine-local. The reader then calls ```cpp sem_open(name, 0); // no O_CREAT ``` which fails with `ENOENT` and returns `SEM_FAILED`. The subsequent `RAY_CHECK_NE(sem, SEM_FAILED)` fires and the worker aborts. **Fix:** use `O_CREAT` (without `O_EXCL`) on the reader path. In the single-node case this just opens the existing semaphores; in the cross-node case it creates them locally with the correct initial value of `1`, matching the writer path. ### 2. Non-POSIX semaphore names POSIX specifies that `sem_open()` names must start with `/`. glibc is lenient and silently strips a missing slash, so this works today on most Linux distros. Stricter implementations (musl, some older libcs) reject such names with `EINVAL`. **Fix:** change the prefixes from `"obj"`/`"hdr"` to `"/obj"`/`"/hdr"`, and bound the resulting full name against `PSEMNAMLEN` (not just the raw `name`) so the 4-byte prefix can't push the result over macOS's 30-byte limit. ## Related work #62373 addresses the same issue with a very similar approach. I wrote this independently while investigating #62328 on a DGX Spark multi-node setup; the only notable deltas here are: - Two regression tests (`TestSemaphoreNameHasLeadingSlash`, `TestCrossNodeSemaphoreCreation`) pinning both behaviors, including the cross-node `kDone`-without-semaphores scenario. - Updated inline comments explaining the cross-node reasoning and the glibc-vs-other-libc semantics. Happy to close this in favor of #62373 if @stephanie-wang prefers; flagging both options so whoever reviews can pick whichever they like. ## Related issue number Closes #62328 Related: #62373 ## Checks - [x] I've signed off every commit (by using the \`-s\` flag, i.e., \`git commit -s\`) in this PR. - [x] I've run \`scripts/format.sh\` to lint the changes in this PR. - [x] I've included any doc changes needed for https://docs.ray.io/en/master/. - (No doc changes; bug fix only.) - [x] I've added any new APIs to the API Reference. If the API is intended to be private, it has been properly renamed. - (No new APIs.) - [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( cc @stephanie-wang @jackhumphries --------- Signed-off-by: tobby168 <tobby168@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…-project#62328) (ray-project#62762) ## Why are these changes needed? Closes ray-project#62328. `OpenSemaphores()` in `experimental_mutable_object_manager.cc` crashes with ``` Check failed: RAY_PREDICT_TRUE(_left_ != _right_) 0 vs 0 [src/ray/core_worker/experimental_mutable_object_manager.cc:177] ``` when a reader registers a channel in **multi-node** setups — e.g. vLLM compiled graphs running tensor parallelism across two DGX Spark nodes. This PR addresses **two independent bugs** observed while debugging ray-project#62328. ### 1. Cross-node reader skips `O_CREAT` (primary bug) `PlasmaObjectHeader::semaphores_created` is an atomic flag that sits inside the object's shared-memory header. It coordinates semaphore initialization **within a single machine's shared memory**, not across nodes. On a different node, the reader's mapped header can observe `kDone` (propagated through the object header contents) even though the underlying named POSIX semaphores were never created on the reader's machine — named semaphores are machine-local. The reader then calls ```cpp sem_open(name, 0); // no O_CREAT ``` which fails with `ENOENT` and returns `SEM_FAILED`. The subsequent `RAY_CHECK_NE(sem, SEM_FAILED)` fires and the worker aborts. **Fix:** use `O_CREAT` (without `O_EXCL`) on the reader path. In the single-node case this just opens the existing semaphores; in the cross-node case it creates them locally with the correct initial value of `1`, matching the writer path. ### 2. Non-POSIX semaphore names POSIX specifies that `sem_open()` names must start with `/`. glibc is lenient and silently strips a missing slash, so this works today on most Linux distros. Stricter implementations (musl, some older libcs) reject such names with `EINVAL`. **Fix:** change the prefixes from `"obj"`/`"hdr"` to `"/obj"`/`"/hdr"`, and bound the resulting full name against `PSEMNAMLEN` (not just the raw `name`) so the 4-byte prefix can't push the result over macOS's 30-byte limit. ## Related work ray-project#62373 addresses the same issue with a very similar approach. I wrote this independently while investigating ray-project#62328 on a DGX Spark multi-node setup; the only notable deltas here are: - Two regression tests (`TestSemaphoreNameHasLeadingSlash`, `TestCrossNodeSemaphoreCreation`) pinning both behaviors, including the cross-node `kDone`-without-semaphores scenario. - Updated inline comments explaining the cross-node reasoning and the glibc-vs-other-libc semantics. Happy to close this in favor of ray-project#62373 if @stephanie-wang prefers; flagging both options so whoever reviews can pick whichever they like. ## Related issue number Closes ray-project#62328 Related: ray-project#62373 ## Checks - [x] I've signed off every commit (by using the \`-s\` flag, i.e., \`git commit -s\`) in this PR. - [x] I've run \`scripts/format.sh\` to lint the changes in this PR. - [x] I've included any doc changes needed for https://docs.ray.io/en/master/. - (No doc changes; bug fix only.) - [x] I've added any new APIs to the API Reference. If the API is intended to be private, it has been properly renamed. - (No new APIs.) - [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( cc @stephanie-wang @jackhumphries --------- Signed-off-by: tobby168 <tobby168@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…n-failures-62328 # Conflicts: # src/ray/core_worker/experimental_mutable_object_manager.cc

Description
Fix two bugs in
MutableObjectManager::OpenSemaphores()that cause aRAY_CHECK_NEcrash (0 vs 0) when using Ray compiled DAGs across multiple nodes (e.g. vLLM multi-node tensor parallel inference).Bug 1 — Missing
/prefix in semaphore names.GetSemaphoreObjectNameandGetSemaphoreHeaderNameproduced names like"obj..."and"hdr..."without a leading/. POSIXsem_openrequires names to start with/. Strict glibc versions (≤2.31, common on DGX Spark / Ubuntu 20.04) enforce this and returnEINVAL.Bug 2 — Cross-node reader fails to create semaphore. When a reader on a different node finds
semaphores_created == kDone(set by the writer on another machine via shared memory), it callssem_open(name, 0)withoutO_CREAT. Since named semaphores are per-machine, this returnsENOENT. Fixed by usingO_CREAT(withoutO_EXCL) so the semaphore is opened if it exists or created if it doesn't.Related issues
Fixes #62328
Additional information
TestSemaphoreNameHasLeadingSlashandTestCrossNodeSemaphoreCreationmutable_object_testtests continue to pass