[object_buffer_pool] reduce mutex lock scope in WriteChunk #43434
Merged: fishbone merged 1 commit into ray-project:master from pinterest:sjoshi/object_buffer_pool on Feb 29, 2024
Conversation
Signed-off-by: Saurabh Vishwas Joshi <sjoshi@pinterest.com>
@iycheng, @kevin85421, @jjyao
fishbone approved these changes on Feb 28, 2024
iamyangchen pushed a commit to pinterest/ray that referenced this pull request on Mar 25, 2024
…ct#43434)

Signed-off-by: Saurabh Vishwas Joshi <sjoshi@pinterest.com>

## Why are these changes needed?

Object store network transfer performance is slow, and we observe a periodic burst followed by a gap in the network usage. A burst of inbound network traffic occurs at the beginning of each `ray.get(obj_refs)` call, then there is a wide gap with almost no network traffic, followed by another burst at the next `ray.get(obj_refs)` call. This looks like a processing bottleneck on the Pull side when payloads are received over the network.

We dug into the code in [object_manager](https://github.com/ray-project/ray/tree/91d5af69085897b02d29bc0d15a53849e56eb8e4/src/ray/object_manager) and found the following:

- Objects are transferred in chunks of size 5 MiB.
- When a PushRequest is received for a chunk, it is processed by [ObjectManager::HandlePush](https://github.com/ray-project/ray/blob/7ff3969159d3aeac00415ac26bf96a63f782db86/src/ray/object_manager/object_manager.cc#L562), which internally calls [ObjectManager::ReceiveObjectChunk](https://github.com/ray-project/ray/blob/7ff3969159d3aeac00415ac26bf96a63f782db86/src/ray/object_manager/object_manager.cc#L623). This results in a call to [ObjectBufferPool::WriteChunk](https://github.com/ray-project/ray/blob/91d5af69085897b02d29bc0d15a53849e56eb8e4/src/ray/object_manager/object_buffer_pool.h#L128).
- The WriteChunk function is [mutex guarded](https://github.com/ray-project/ray/blob/91d5af69085897b02d29bc0d15a53849e56eb8e4/src/ray/object_manager/object_buffer_pool.cc#L122) throughout its execution.
- This includes the [std::memcpy](https://github.com/ray-project/ray/blob/91d5af69085897b02d29bc0d15a53849e56eb8e4/src/ray/object_manager/object_buffer_pool.cc#L139) call for a 5 MiB payload.
- This [pool_mutex_](https://github.com/ray-project/ray/blob/91d5af69085897b02d29bc0d15a53849e56eb8e4/src/ray/object_manager/object_buffer_pool.h#L215) lock is shared by all object_ids being received over the network.

This makes us believe that even if chunks for different ObjectIds are received in parallel over the network, they are written sequentially, which would explain why we see a burst in network usage followed by a hole.

### Changes

**WriteChunk**

- Transition the chunk from `REFERENCED` to `SEALED` before releasing the lock.
- Increment / decrement `num_inflight_copies` before / after the copy.
- Perform an unguarded memcpy of the chunk into the buffer.
- Reacquire the mutex lock and perform the object_id-level `Seal` and `Release` decisions.

**AbortCreate**

- Wait until `num_inflight_copies == 0` before allowing the `Release` and `Abort` calls for the `object_id`.
- This check ensures that we do not release the underlying buffer while an unguarded copy is ongoing.

### Tests

#### Before

- Sampled network at 1 second frequency: `speedometer.py -i 1 -m 64424509440 -n 1073741824 -rx eth0`

  <img width="590" alt="Screenshot 2024-02-13 at 10 26 33 AM" src="https://github.com/ray-project/ray/assets/8691593/7a5497dd-b87d-4bec-a51c-62d629c06c58">

- Sampled network at 100 millisecond frequency: `speedometer.py -i 0.1 -m 64424509440 -n 1073741824 -rx eth0`

  <img width="656" alt="Screenshot 2024-02-13 at 10 26 37 AM" src="https://github.com/ray-project/ray/assets/8691593/5ac229e6-4777-4820-b9e7-ebbd63cafcc9">

#### After

- Sampled network at 1 second frequency: `speedometer.py -i 1 -m 64424509440 -n 1073741824 -rx eth0`

  <img width="620" alt="Screenshot 2024-02-21 at 1 22 08 PM" src="https://github.com/ray-project/ray/assets/8691593/f15196d6-01f7-4d3e-b56c-ba09c58be455">

- Sampled network at 100 millisecond frequency: `speedometer.py -i 0.1 -m 64424509440 -n 1073741824 -rx eth0`

  <img width="1674" alt="Screenshot 2024-02-21 at 1 19 10 PM" src="https://github.com/ray-project/ray/assets/8691593/584f64b9-a39f-4441-b034-86234aff4283">

```
Finished benchmark for total_size_MiB: 102400, block_size_MiB: 1024, parallel_block: None
ray.wait(fetch_local=True) Gbps: 54.67178658153866
ray.wait(fetch_local=True) total time s: 15.711823463439941
ray.get() Gbps: 186301.24111362518
ray.get() total time s: 0.004610776901245117
```

## Related issue number
iamyangchen added a commit to pinterest/ray that referenced this pull request on Apr 12, 2024
…ray-project#43150) (#1)

Backport the mutex lock fix in the main Ray repo. See the original PR for details: ray-project#43434

whl: https://ray-ci-artifact-pr-public.s3.amazonaws.com/7692d63f2d32c4b0da4166cf759190ba7f056bee/tmp/artifacts/.whl/.whl/ray-2.9.3-cp38-cp38-manylinux2014_x86_64.whl

Co-authored-by: Saurabh Vishwas Joshi <sjoshi@pinterest.com>
## Checks

- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.