-
Notifications
You must be signed in to change notification settings - Fork 7.1k
[xray] lineage optimization: avoid unnecessary lineage entry allocation & free #2463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This reverts commit 32b181e.
|
Test FAILed. |
stephanie-wang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I left a few small comments!
Also, just out of curiosity, do you know which exact changes led to the most improvement?
src/ray/raylet/lineage_cache.cc
Outdated
| // If the new status is greater, then overwrite the current entry. | ||
| if (current_entry->GetStatus() < status) { | ||
| // If the new status is greater, then overwrite the current entry. | ||
| current_entry->SetStatus(status); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetStatus will return a bool, so you can do if (current_entry->SetStatus(status)) if you'd like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised, thanks! Putting them into a single check would be cleaner:-)
| auto new_entry = LineageEntry(task, GcsStatus::UNCOMMITTED_READY); | ||
| RAY_CHECK(lineage_.SetEntry(std::move(new_entry))); | ||
| entry->SetStatus(GcsStatus::UNCOMMITTED_READY); | ||
| // TaskSepc is immutable, just update TaskExecSpec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TaskSepc -> TaskSpec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.
src/ray/raylet/lineage_cache.h
Outdated
| boost::optional<LineageEntry &> GetEntryMutable(const UniqueID &task_id); | ||
|
|
||
| /// Set an entry in the lineage. If an entry with this ID already exists, | ||
| /// Set an entry in the lineage. If an entry with this ID already exists, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you revert this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
src/ray/raylet/task.h
Outdated
|
|
||
| /// Update the dynamic/mutable information for this task. | ||
| /// \param task Task structure with updated dynamic information. | ||
| void Update(const Task &task); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would probably give this a more descriptive name, since we want to be careful when using this method. Maybe something like CopyTaskExecutionSpec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Update() is a bit too vague. Updated to the suggested name.
|
@stephanie-wang Thanks for reviewing. As I understand, the perf improvement comes from lineage state transactions, with the change it just update the status or task data in-place, without needing to erase the entry from hash and add it back, which involves hash operations and memory alloc/free. I didn't measure the impact of each line though:) |
|
Test FAILed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Can you fix the typo and then I'll merge? The failing tests look unrelated.
src/ray/raylet/lineage_cache.cc
Outdated
| entry->SetStatus(GcsStatus::UNCOMMITTED_READY); | ||
| // TaskSepc is immutable, just update TaskExecSpec. | ||
| entry->TaskDataMutable().Update(task); | ||
| // TaskSepc. is immutable, just update TaskExecSpec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you replace TaskSepc. with TaskSpec? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, updated.
|
Updated. Thanks @stephanie-wang |
|
Test PASSed. |
|
Great, thanks! I'll merge assuming the tests pass. |
core gcs related conflicts
What do these changes do?
With raylet, transferring 100K integers using a python script from two actors running in different dockers takes 3 mintues on my macbook, while previously with legacy ray it only takes 40 seconds. Profiling results show most of time cycles in raylet are spent on lineage stuff, memory allocation & free together take 50% CPU, while serialization/deserialization takes time too.
This change is a first step to optimize lineage related code. It basically avoids unnecessary allocation/free with lineage hash entries. Experiment shows with the change running the same script takes 2 min 10 sec, which indicates an improvement.
For more contexts, kindly refer to
#2403
Related issue number
2403