[core] Make ObjectBufferPool::FreeObjects lock free#57833

Closed
codope wants to merge 1 commit into ray-project:master from codope:FreeObjects_lock_free

Conversation

@codope
Contributor

@codope codope commented Oct 17, 2025

Description

Make ObjectBufferPool::FreeObjects lock-free by no longer acquiring pool_mutex_ inside it. The lock provides no actual synchronization benefit in FreeObjects because:

  1. PlasmaClient is already internally synchronized.
  2. FreeObjects accesses no ObjectBufferPool state; it only calls Delete on store_client_.

See #57550 (comment) for the discussion.
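
For illustration, here is a minimal sketch of the shape of the proposed change. It is not the actual Ray code: FakeStoreClient, PoolSketch, and the string-based ObjectID are hypothetical stand-ins, and the sketch simply assumes, as the description states, that the store client guards its own state.

```cpp
// Illustrative sketch only. FakeStoreClient, PoolSketch, and the simplified
// ObjectID are hypothetical stand-ins, not Ray's real types.
#include <cassert>
#include <mutex>
#include <set>
#include <string>
#include <vector>

using ObjectID = std::string;

// Stand-in for a store client that is internally synchronized:
// every method takes the client's own mutex.
class FakeStoreClient {
 public:
  void Put(const ObjectID &id) {
    std::lock_guard<std::mutex> guard(mu_);
    objects_.insert(id);
  }

  void Delete(const std::vector<ObjectID> &ids) {
    std::lock_guard<std::mutex> guard(mu_);
    for (const auto &id : ids) {
      objects_.erase(id);
    }
  }

  bool Contains(const ObjectID &id) const {
    std::lock_guard<std::mutex> guard(mu_);
    return objects_.count(id) > 0;
  }

 private:
  mutable std::mutex mu_;
  std::set<ObjectID> objects_;
};

class PoolSketch {
 public:
  explicit PoolSketch(FakeStoreClient *client) : store_client_(client) {}

  // Proposed shape: no pool-level mutex is taken here. The method touches
  // no pool state and relies on the client's internal locking.
  void FreeObjects(const std::vector<ObjectID> &object_ids) {
    store_client_->Delete(object_ids);
  }

 private:
  FakeStoreClient *store_client_;
};
```

In this sketch each Delete call is individually safe. Whether it is also safe relative to multi-call sequences elsewhere in the pool is a separate question that the sketch does not address.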

Signed-off-by: Sagar Sumit <sagarsumit09@gmail.com>
@codope codope requested a review from a team as a code owner October 17, 2025 04:34
@codope codope requested a review from dayshah October 17, 2025 04:35
@codope codope added the go add ONLY when ready to merge, run all tests label Oct 17, 2025
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request makes ObjectBufferPool::FreeObjects lock-free by removing an unnecessary absl::MutexLock. The justification provided is sound: PlasmaClient is internally synchronized, and this function does not access any other state of ObjectBufferPool that would require locking. The changes in both the implementation file (.cc) and the header file (.h) are consistent and correct. Removing the lock and the corresponding ABSL_LOCKS_EXCLUDED annotation is appropriate and should lead to a performance improvement by reducing lock contention. The change is well-contained and appears to be safe.

void ObjectBufferPool::FreeObjects(const std::vector<ObjectID> &object_ids) {
  absl::MutexLock lock(&pool_mutex_);
  RAY_CHECK_OK(store_client_->Delete(object_ids));
}

Bug: Concurrency Issue in Object Management

Removing the mutex lock from FreeObjects introduces a race condition. This allows FreeObjects to concurrently delete Plasma objects while other operations, like WriteChunk or AbortCreateInternal, are still active on the same object. This can lead to crashes, undefined behavior, or state corruption.

@ray-gardener ray-gardener Bot added the core Issues that should be addressed in Ray Core label Oct 17, 2025
Contributor

@dayshah dayshah left a comment

looks safe to me

Collaborator

@edoakes edoakes left a comment

It's not immediately obvious that this is safe because other methods in the ObjectBufferPool make multiple calls in sequence into the store_client_. Previously, these calls were guaranteed to execute transactionally with respect to this Delete call, but with this change the Delete can be interleaved.

If there's a strong motivation to remove the lock, then let's audit the other callsites closely and make sure that there are no correctness issues due to the interleaving, else I'd just leave it be for now.
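
To make the interleaving concern concrete, here is a minimal sketch under stated assumptions. None of it is Ray's code: FakeStoreClient, PoolSketch, CreateThenSeal, the between_steps hook, and SealSurvivesConcurrentFree are all hypothetical names invented for the demonstration. The pool method makes two store calls in sequence, and a second thread issues a free in between, once through a variant that takes the pool mutex and once through a variant that does not.

```cpp
// Illustrative sketch only. All types and method names here are
// hypothetical stand-ins, not Ray's real classes.
#include <cassert>
#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <vector>

using ObjectID = std::string;

// Internally synchronized store client: each call is safe on its own.
class FakeStoreClient {
 public:
  void Create(const ObjectID &id) {
    std::lock_guard<std::mutex> guard(mu_);
    objects_.insert(id);
  }
  // Returns false if the object no longer exists when Seal is called.
  bool Seal(const ObjectID &id) {
    std::lock_guard<std::mutex> guard(mu_);
    return objects_.count(id) > 0;
  }
  void Delete(const std::vector<ObjectID> &ids) {
    std::lock_guard<std::mutex> guard(mu_);
    for (const auto &id : ids) objects_.erase(id);
  }

 private:
  std::mutex mu_;
  std::set<ObjectID> objects_;
};

class PoolSketch {
 public:
  explicit PoolSketch(FakeStoreClient *client) : store_client_(client) {}

  // Two store calls made in sequence while holding pool_mutex_.
  // between_steps is a test hook that runs with the lock still held.
  bool CreateThenSeal(const ObjectID &id,
                      const std::function<void()> &between_steps) {
    std::lock_guard<std::mutex> lock(pool_mutex_);
    store_client_->Create(id);
    between_steps();
    return store_client_->Seal(id);
  }

  // Behavior before the change: Delete waits for pool_mutex_.
  void FreeObjectsLocked(const std::vector<ObjectID> &ids) {
    std::lock_guard<std::mutex> lock(pool_mutex_);
    store_client_->Delete(ids);
  }

  // Behavior after the change: Delete can run at any time.
  void FreeObjectsUnlocked(const std::vector<ObjectID> &ids) {
    store_client_->Delete(ids);
  }

 private:
  FakeStoreClient *store_client_;
  std::mutex pool_mutex_;
};

// Reports whether Seal still succeeds when another thread frees the
// object in the middle of the Create/Seal sequence.
bool SealSurvivesConcurrentFree(bool use_pool_lock) {
  FakeStoreClient client;
  PoolSketch pool(&client);
  std::thread deleter;
  bool sealed = pool.CreateThenSeal("obj", [&] {
    if (use_pool_lock) {
      // This thread blocks on pool_mutex_ until CreateThenSeal returns.
      deleter = std::thread([&] { pool.FreeObjectsLocked({"obj"}); });
    } else {
      // Nothing holds this thread back, so the delete lands between
      // Create and Seal. Joining here makes that ordering deterministic.
      deleter = std::thread([&] { pool.FreeObjectsUnlocked({"obj"}); });
      deleter.join();
    }
  });
  if (deleter.joinable()) deleter.join();
  return sealed;
}
```

In this sketch, SealSurvivesConcurrentFree(true) returns true because the delete cannot start until the sequence finishes, while SealSurvivesConcurrentFree(false) returns false because the object is gone before Seal runs. Whether any real ObjectBufferPool sequence is affected in this way is exactly what the audit of the other callsites would need to establish.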

@github-actions

github-actions Bot commented Nov 1, 2025

This pull request has been automatically marked as stale because it has not had any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public Slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions Bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Nov 1, 2025
@codope
Contributor Author

codope commented Nov 7, 2025

ok closing it for now. I will go through the other callsites and revive the PR if needed.

@codope codope closed this Nov 7, 2025