Skip to content

fix: semaphore open failures in MutableObjectManager cross-node setup#62373

Open
volcano303 wants to merge 2 commits intoray-project:masterfrom
volcano303:fix/semaphore-open-failures-62328
Open

fix: semaphore open failures in MutableObjectManager cross-node setup#62373
volcano303 wants to merge 2 commits intoray-project:masterfrom
volcano303:fix/semaphore-open-failures-62328

Conversation

@volcano303
Copy link
Copy Markdown

Description

Fix two bugs in MutableObjectManager::OpenSemaphores() that cause a RAY_CHECK_NE crash (0 vs 0) when using Ray compiled DAGs across multiple nodes (e.g. vLLM multi-node tensor parallel inference).

Bug 1 — Missing / prefix in semaphore names. GetSemaphoreObjectName and GetSemaphoreHeaderName produced names like "obj..." and "hdr..." without a leading /. POSIX sem_open requires names to start with /. Strict glibc versions (≤2.31, common on DGX Spark / Ubuntu 20.04) enforce this and return EINVAL.

Bug 2 — Cross-node reader fails to create semaphore. When a reader on a different node finds semaphores_created == kDone (set by the writer on another machine via shared memory), it calls sem_open(name, 0) without O_CREAT. Since named semaphores are per-machine, this returns ENOENT. Fixed by using O_CREAT (without O_EXCL) so the semaphore is opened if it exists or created if it doesn't.

Related issues

Fixes #62328

Additional information

  • Added two unit tests: TestSemaphoreNameHasLeadingSlash and TestCrossNodeSemaphoreCreation
  • All existing mutable_object_test tests continue to pass
  • Full cross-node validation requires multi-node hardware (e.g. DGX Spark) as described in the issue

@volcano303 volcano303 requested a review from a team as a code owner April 6, 2026 20:57
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the semaphore naming convention by adding a leading slash and modifies sem_open to use O_CREAT, facilitating cross-node semaphore creation. It also includes unit tests for these changes. The review feedback correctly identifies that the semaphore name length checks need to be updated to account for the additional characters in the prefix to ensure they remain within the PSEMNAMLEN limit.

Comment thread src/ray/core_worker/experimental_mutable_object_manager.cc Outdated
Comment thread src/ray/core_worker/experimental_mutable_object_manager.cc Outdated
@volcano303 volcano303 force-pushed the fix/semaphore-open-failures-62328 branch 2 times, most recently from 4372bff to 2e58f80 Compare April 7, 2026 00:59
@volcano303 volcano303 force-pushed the fix/semaphore-open-failures-62328 branch from 2e58f80 to 0827b58 Compare April 7, 2026 01:02
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit 0827b58. Configure here.

Comment thread src/ray/core_worker/experimental_mutable_object_manager.cc Outdated
@ray-gardener ray-gardener Bot added core Issues that should be addressed in Ray Core community-contribution Contributed by the community labels Apr 7, 2026
tobby168 added a commit to tobby168/ray that referenced this pull request Apr 20, 2026
…-project#62328)

`OpenSemaphores()` crashes with `Check failed: RAY_PREDICT_TRUE(_left_ != _right_) 0 vs 0`
(`experimental_mutable_object_manager.cc:177`) when registering a reader channel
in multi-node setups — e.g. vLLM compiled graphs with tensor parallelism across
DGX Spark nodes. This PR addresses two independent bugs:

1. Cross-node reader skips `O_CREAT` (primary bug): The `semaphores_created`
   atomic in `PlasmaObjectHeader` coordinates initialization within a single
   machine's shared memory. On a different node, the reader's header can observe
   `kDone` even though the named POSIX semaphores were never created on its
   machine. The reader then calls `sem_open(name, 0)` without `O_CREAT` and
   fails with `ENOENT`, returning `SEM_FAILED`. The `RAY_CHECK_NE` assertion
   then fires. Fixed by using `O_CREAT` (without `O_EXCL`) in the reader path:
   opens existing semaphores in the single-node case, creates them locally in
   the cross-node case, with the correct initial value of 1.

2. Non-POSIX semaphore names: POSIX requires `sem_open()` names to start with
   `/`. glibc is lenient (strips leading slashes), but other libc
   implementations (e.g., musl, some older libc) enforce this. Changed
   `"obj"`/`"hdr"` prefixes to `"/obj"`/`"/hdr"` for portability. The existing
   `RAY_CHECK_LE(name.size(), PSEMNAMLEN)` bound is preserved since `name`
   (from `unique_name`) is already capped at `PSEMNAMLEN` in `Init()`.

Added two unit tests:

- `TestSemaphoreNameHasLeadingSlash`: verifies that after registration the
  semaphores are reachable via their POSIX-correct `/obj<name>` / `/hdr<name>`
  paths.
- `TestCrossNodeSemaphoreCreation`: pre-sets `semaphores_created=kDone` without
  creating the underlying semaphores, then registers a reader channel and
  verifies registration succeeds with semaphores initialized to 1.

Related: ray-project#62373 (addresses the same issue with a similar approach).
tobby168 added a commit to tobby168/ray that referenced this pull request Apr 20, 2026
…-project#62328)

`OpenSemaphores()` crashes with `Check failed: RAY_PREDICT_TRUE(_left_ != _right_) 0 vs 0`
(`experimental_mutable_object_manager.cc:177`) when registering a reader channel
in multi-node setups — e.g. vLLM compiled graphs with tensor parallelism across
DGX Spark nodes. This PR addresses two independent bugs:

1. Cross-node reader skips `O_CREAT` (primary bug): The `semaphores_created`
   atomic in `PlasmaObjectHeader` coordinates initialization within a single
   machine's shared memory. On a different node, the reader's header can observe
   `kDone` even though the named POSIX semaphores were never created on its
   machine. The reader then calls `sem_open(name, 0)` without `O_CREAT` and
   fails with `ENOENT`, returning `SEM_FAILED`. The `RAY_CHECK_NE` assertion
   then fires. Fixed by using `O_CREAT` (without `O_EXCL`) in the reader path:
   opens existing semaphores in the single-node case, creates them locally in
   the cross-node case, with the correct initial value of 1.

2. Non-POSIX semaphore names: POSIX requires `sem_open()` names to start with
   `/`. glibc is lenient (strips leading slashes), but other libc
   implementations (e.g., musl, some older libc) enforce this. Changed
   `"obj"`/`"hdr"` prefixes to `"/obj"`/`"/hdr"` for portability. The existing
   `RAY_CHECK_LE(name.size(), PSEMNAMLEN)` bound is preserved since `name`
   (from `unique_name`) is already capped at `PSEMNAMLEN` in `Init()`.

Added two unit tests:

- `TestSemaphoreNameHasLeadingSlash`: verifies that after registration the
  semaphores are reachable via their POSIX-correct `/obj<name>` / `/hdr<name>`
  paths.
- `TestCrossNodeSemaphoreCreation`: pre-sets `semaphores_created=kDone` without
  creating the underlying semaphores, then registers a reader channel and
  verifies registration succeeds with semaphores initialized to 1.

Related: ray-project#62373 (addresses the same issue with a similar approach).
Signed-off-by: tobby168 <tobby168@gmail.com>
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions Bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Apr 21, 2026
edoakes pushed a commit that referenced this pull request Apr 21, 2026
) (#62762)

## Why are these changes needed?

Closes #62328.

`OpenSemaphores()` in `experimental_mutable_object_manager.cc` crashes
with

```
Check failed: RAY_PREDICT_TRUE(_left_ != _right_) 0 vs 0
  [src/ray/core_worker/experimental_mutable_object_manager.cc:177]
```

when a reader registers a channel in **multi-node** setups — e.g. vLLM
compiled graphs running tensor parallelism across two DGX Spark nodes.

This PR addresses **two independent bugs** observed while debugging
#62328.

### 1. Cross-node reader skips `O_CREAT` (primary bug)

`PlasmaObjectHeader::semaphores_created` is an atomic flag that sits
inside the object's shared-memory header. It coordinates semaphore
initialization **within a single machine's shared memory**, not across
nodes.

On a different node, the reader's mapped header can observe `kDone`
(propagated through the object header contents) even though the
underlying named POSIX semaphores were never created on the reader's
machine — named semaphores are machine-local. The reader then calls

```cpp
sem_open(name, 0);   // no O_CREAT
```

which fails with `ENOENT` and returns `SEM_FAILED`. The subsequent
`RAY_CHECK_NE(sem, SEM_FAILED)` fires and the worker aborts.

**Fix:** use `O_CREAT` (without `O_EXCL`) on the reader path. In the
single-node case this just opens the existing semaphores; in the
cross-node case it creates them locally with the correct initial value
of `1`, matching the writer path.

### 2. Non-POSIX semaphore names

POSIX specifies that `sem_open()` names must start with `/`. glibc is
lenient and silently strips a missing slash, so this works today on most
Linux distros. Stricter implementations (musl, some older libcs) reject
such names with `EINVAL`.

**Fix:** change the prefixes from `"obj"`/`"hdr"` to `"/obj"`/`"/hdr"`,
and bound the resulting full name against `PSEMNAMLEN` (not just the raw
`name`) so the 4-byte prefix can't push the result over macOS's 30-byte
limit.

## Related work

#62373 addresses the same issue with a very similar approach. I wrote
this independently while investigating #62328 on a DGX Spark multi-node
setup; the only notable deltas here are:

- Two regression tests (`TestSemaphoreNameHasLeadingSlash`,
`TestCrossNodeSemaphoreCreation`) pinning both behaviors, including the
cross-node `kDone`-without-semaphores scenario.
- Updated inline comments explaining the cross-node reasoning and the
glibc-vs-other-libc semantics.

Happy to close this in favor of #62373 if @stephanie-wang prefers;
flagging both options so whoever reviews can pick whichever they like.

## Related issue number

Closes #62328
Related: #62373

## Checks

- [x] I've signed off every commit (by using the \`-s\` flag, i.e.,
\`git commit -s\`) in this PR.
- [x] I've run \`scripts/format.sh\` to lint the changes in this PR.
- [x] I've included any doc changes needed for
https://docs.ray.io/en/master/.
  - (No doc changes; bug fix only.)
- [x] I've added any new APIs to the API Reference. If the API is
intended to be private, it has been properly renamed.
  - (No new APIs.)
- [x] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [x] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

cc @stephanie-wang @jackhumphries

---------

Signed-off-by: tobby168 <tobby168@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
HLDKNotFound pushed a commit to chichic21039/ray that referenced this pull request Apr 22, 2026
…-project#62328) (ray-project#62762)

## Why are these changes needed?

Closes ray-project#62328.

`OpenSemaphores()` in `experimental_mutable_object_manager.cc` crashes
with

```
Check failed: RAY_PREDICT_TRUE(_left_ != _right_) 0 vs 0
  [src/ray/core_worker/experimental_mutable_object_manager.cc:177]
```

when a reader registers a channel in **multi-node** setups — e.g. vLLM
compiled graphs running tensor parallelism across two DGX Spark nodes.

This PR addresses **two independent bugs** observed while debugging
ray-project#62328.

### 1. Cross-node reader skips `O_CREAT` (primary bug)

`PlasmaObjectHeader::semaphores_created` is an atomic flag that sits
inside the object's shared-memory header. It coordinates semaphore
initialization **within a single machine's shared memory**, not across
nodes.

On a different node, the reader's mapped header can observe `kDone`
(propagated through the object header contents) even though the
underlying named POSIX semaphores were never created on the reader's
machine — named semaphores are machine-local. The reader then calls

```cpp
sem_open(name, 0);   // no O_CREAT
```

which fails with `ENOENT` and returns `SEM_FAILED`. The subsequent
`RAY_CHECK_NE(sem, SEM_FAILED)` fires and the worker aborts.

**Fix:** use `O_CREAT` (without `O_EXCL`) on the reader path. In the
single-node case this just opens the existing semaphores; in the
cross-node case it creates them locally with the correct initial value
of `1`, matching the writer path.

### 2. Non-POSIX semaphore names

POSIX specifies that `sem_open()` names must start with `/`. glibc is
lenient and silently strips a missing slash, so this works today on most
Linux distros. Stricter implementations (musl, some older libcs) reject
such names with `EINVAL`.

**Fix:** change the prefixes from `"obj"`/`"hdr"` to `"/obj"`/`"/hdr"`,
and bound the resulting full name against `PSEMNAMLEN` (not just the raw
`name`) so the 4-byte prefix can't push the result over macOS's 30-byte
limit.

## Related work

ray-project#62373 addresses the same issue with a very similar approach. I wrote
this independently while investigating ray-project#62328 on a DGX Spark multi-node
setup; the only notable deltas here are:

- Two regression tests (`TestSemaphoreNameHasLeadingSlash`,
`TestCrossNodeSemaphoreCreation`) pinning both behaviors, including the
cross-node `kDone`-without-semaphores scenario.
- Updated inline comments explaining the cross-node reasoning and the
glibc-vs-other-libc semantics.

Happy to close this in favor of ray-project#62373 if @stephanie-wang prefers;
flagging both options so whoever reviews can pick whichever they like.

## Related issue number

Closes ray-project#62328
Related: ray-project#62373

## Checks

- [x] I've signed off every commit (by using the \`-s\` flag, i.e.,
\`git commit -s\`) in this PR.
- [x] I've run \`scripts/format.sh\` to lint the changes in this PR.
- [x] I've included any doc changes needed for
https://docs.ray.io/en/master/.
  - (No doc changes; bug fix only.)
- [x] I've added any new APIs to the API Reference. If the API is
intended to be private, it has been properly renamed.
  - (No new APIs.)
- [x] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [x] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

cc @stephanie-wang @jackhumphries

---------

Signed-off-by: tobby168 <tobby168@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…n-failures-62328

# Conflicts:
#	src/ray/core_worker/experimental_mutable_object_manager.cc
@github-actions github-actions Bot added unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. and removed stale The issue is stale. It will be closed within 7 days unless there are further conversation labels Apr 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community core Issues that should be addressed in Ray Core unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] experimental_mutable_object_manager.cc - Check failed: RAY_PREDICT_TRUE(_left_ != _right_) 0 vs 0

1 participant