@zhijunfu
Contributor

What do these changes do?

With raylet, using a Python script to transfer 100K integers between two actors running in different Docker containers takes 3 minutes on my MacBook, while with legacy Ray it previously took only 40 seconds. Profiling shows that raylet spends most of its CPU cycles in lineage-related code: memory allocation and free together take 50% of CPU, and serialization/deserialization also takes time.

This change is a first step toward optimizing the lineage-related code. It avoids unnecessary allocation and freeing of lineage hash entries. In an experiment, running the same script with this change takes 2 min 10 sec instead of 3 minutes (roughly a 28% reduction), which indicates an improvement.
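To illustrate where the allocation/free cost comes from, here is a minimal sketch comparing an erase-and-reinsert update with an in-place update on a plain std::unordered_map. The counting allocator, Entry, Status and the helper functions are hypothetical and written only for this illustration; they are not Ray's actual lineage types.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <unordered_map>
#include <utility>

// Counts every allocate() call made through CountingAllocator (hypothetical helper).
static std::size_t g_allocations = 0;

// Minimal allocator that forwards to operator new/delete and counts allocations.
template <class T>
struct CountingAllocator {
  using value_type = T;
  CountingAllocator() noexcept {}
  template <class U>
  CountingAllocator(const CountingAllocator<U> &) noexcept {}
  T *allocate(std::size_t n) {
    ++g_allocations;
    return static_cast<T *>(::operator new(n * sizeof(T)));
  }
  void deallocate(T *p, std::size_t) noexcept { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const CountingAllocator<T> &, const CountingAllocator<U> &) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T> &, const CountingAllocator<U> &) noexcept { return false; }

// Simplified, hypothetical stand-ins for a lineage status and entry.
enum class Status { NONE = 0, UNCOMMITTED_WAITING, UNCOMMITTED_READY };
struct Entry {
  int task_id;
  Status status;
};

using EntryMap = std::unordered_map<int, Entry, std::hash<int>, std::equal_to<int>,
                                    CountingAllocator<std::pair<const int, Entry>>>;

// Pattern the PR moves away from: copy the entry, erase it, then insert it back.
void UpdateByReinsert(EntryMap &map, int task_id, Status status) {
  Entry copy = map.at(task_id);
  copy.status = status;
  map.erase(task_id);          // frees the hash node
  map.emplace(task_id, copy);  // allocates a fresh hash node
}

// Pattern the PR moves to: look the entry up and mutate it where it lives.
void UpdateInPlace(EntryMap &map, int task_id, Status status) {
  auto it = map.find(task_id);
  if (it != map.end()) {
    it->second.status = status;
  }
}

// Fills a map with `n` entries, then returns how many allocations were made
// by `rounds` passes of `update` over every entry.
template <class UpdateFn>
std::size_t CountAllocations(UpdateFn update, int n, int rounds) {
  EntryMap map;
  map.reserve(static_cast<std::size_t>(n));  // rules out rehashing during the measurement
  for (int i = 0; i < n; ++i) {
    map.emplace(i, Entry{i, Status::UNCOMMITTED_WAITING});
  }
  g_allocations = 0;
  for (int r = 0; r < rounds; ++r) {
    for (int i = 0; i < n; ++i) {
      update(map, i, Status::UNCOMMITTED_READY);
    }
  }
  return g_allocations;
}
```

With 100 entries updated 3 times each, UpdateByReinsert should make one node allocation per update (300 in total, since reserve() prevents rehashing), while UpdateInPlace makes none.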

For more context, see #2403.

Related issue number

#2403

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6794/

Contributor

@stephanie-wang left a comment

Thanks, I left a few small comments!

Also, just out of curiosity, do you know which exact changes led to the most improvement?

// If the new status is greater, then overwrite the current entry.
if (current_entry->GetStatus() < status) {
// If the new status is greater, then overwrite the current entry.
current_entry->SetStatus(status);
Contributor

SetStatus will return a bool, so you can do if (current_entry->SetStatus(status)) if you'd like.
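A small sketch of the suggested single check, assuming SetStatus only moves the status forward and returns whether it actually changed. Status, Entry and MaybeOverwrite are simplified hypothetical stand-ins for GcsStatus, LineageEntry and the caller, not the real classes.

```cpp
#include <cassert>

// Hypothetical, simplified status ordering; larger values are "further along".
enum class Status { NONE = 0, UNCOMMITTED_WAITING, UNCOMMITTED_READY, COMMITTING };

class Entry {
 public:
  explicit Entry(Status status) : status_(status) {}
  Status GetStatus() const { return status_; }
  // Advances the status only if `new_status` is greater; returns whether it changed.
  bool SetStatus(Status new_status) {
    if (status_ < new_status) {
      status_ = new_status;
      return true;
    }
    return false;
  }

 private:
  Status status_;
};

// Comparison and assignment folded into one check, as suggested in the review.
// Returns true if the entry was overwritten.
bool MaybeOverwrite(Entry &current_entry, Status status) {
  if (current_entry.SetStatus(status)) {
    // The status advanced; any further overwrite work would go here.
    return true;
  }
  return false;
}
```

The design choice is that the "only move forward" rule lives inside SetStatus, so callers cannot accidentally regress an entry's status.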

Contributor Author

Revised, thanks! Putting them into a single check is cleaner. :-)

auto new_entry = LineageEntry(task, GcsStatus::UNCOMMITTED_READY);
RAY_CHECK(lineage_.SetEntry(std::move(new_entry)));
entry->SetStatus(GcsStatus::UNCOMMITTED_READY);
// TaskSepc is immutable, just update TaskExecSpec.
Contributor

TaskSepc -> TaskSpec.

Contributor Author

ok.

boost::optional<LineageEntry &> GetEntryMutable(const UniqueID &task_id);

/// Set an entry in the lineage. If an entry with this ID already exists,
/// Set an entry in the lineage. If an entry with this ID already exists,
Contributor

Can you revert this line?

Contributor Author

sure.


/// Update the dynamic/mutable information for this task.
/// \param task Task structure with updated dynamic information.
void Update(const Task &task);
Contributor

I think I would probably give this a more descriptive name, since we want to be careful when using this method. Maybe something like CopyTaskExecutionSpec.

Contributor Author

Agreed. Update() is a bit too vague. Updated to the suggested name.
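One way the narrower contract of CopyTaskExecutionSpec could look is sketched below, assuming the immutable spec is shared and only the execution spec is per-copy state. TaskSpec, TaskExecSpec and the num_forwards field are hypothetical simplifications, not Ray's actual classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical simplified types: the immutable description of a task and its
// mutable execution state.
struct TaskSpec {
  std::string function_name;
};
struct TaskExecSpec {
  int num_forwards;
};

class Task {
 public:
  Task(std::shared_ptr<const TaskSpec> spec, TaskExecSpec exec_spec)
      : spec_(std::move(spec)), exec_spec_(exec_spec) {}
  const TaskSpec &GetTaskSpecification() const { return *spec_; }
  const TaskExecSpec &GetTaskExecutionSpec() const { return exec_spec_; }
  // Overwrites only the execution spec with `task`'s. The caller must ensure both
  // objects describe the same task, because the immutable spec is neither compared
  // nor copied; the explicit name is meant to make that contract visible at call sites.
  void CopyTaskExecutionSpec(const Task &task) { exec_spec_ = task.exec_spec_; }

 private:
  std::shared_ptr<const TaskSpec> spec_;
  TaskExecSpec exec_spec_;
};
```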

@zhijunfu
Contributor Author

@stephanie-wang Thanks for reviewing. As I understand it, the performance improvement comes from the lineage state transitions: with this change, the status or task data is updated in place, without erasing the entry from the hash map and adding it back, which involves hash operations and memory allocation/free. I didn't measure the impact of each individual change, though. :)
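The in-place update described above can be sketched with an accessor in the spirit of GetEntryMutable. This version returns a raw pointer instead of boost::optional<LineageEntry &> to stay dependency-free; Lineage, Entry, MarkReady and exec_version are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical, simplified lineage entry.
enum class Status { NONE = 0, UNCOMMITTED_WAITING, UNCOMMITTED_READY };
struct Entry {
  Status status;
  int exec_version;
};

class Lineage {
 public:
  // Inserts `entry` if `task_id` is absent; returns whether an insertion happened.
  bool AddEntry(int task_id, const Entry &entry) {
    return entries_.emplace(task_id, entry).second;
  }
  // Returns a pointer into the map so callers can mutate the entry in place,
  // or nullptr if the task is unknown.
  Entry *GetEntryMutable(int task_id) {
    auto it = entries_.find(task_id);
    return it == entries_.end() ? nullptr : &it->second;
  }
  std::size_t Size() const { return entries_.size(); }

 private:
  std::unordered_map<int, Entry> entries_;
};

// Updates status and execution data without erasing and re-adding the hash entry.
bool MarkReady(Lineage &lineage, int task_id, int new_exec_version) {
  Entry *entry = lineage.GetEntryMutable(task_id);
  if (entry == nullptr) {
    return false;
  }
  entry->status = Status::UNCOMMITTED_READY;
  entry->exec_version = new_exec_version;
  return true;
}
```

Because std::unordered_map keeps references to elements stable, the entry keeps its address across the update, and the map performs no erase or insert.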

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6815/

Contributor

@stephanie-wang left a comment

Looks good! Can you fix the typo and then I'll merge? The failing tests look unrelated.

entry->SetStatus(GcsStatus::UNCOMMITTED_READY);
// TaskSepc is immutable, just update TaskExecSpec.
entry->TaskDataMutable().Update(task);
// TaskSepc. is immutable, just update TaskExecSpec.
Contributor

Could you replace TaskSepc. with TaskSpec? Thanks!

Contributor Author

ok, updated.

@zhijunfu
Contributor Author

Updated. Thanks @stephanie-wang

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6857/

@stephanie-wang
Contributor

Great, thanks! I'll merge assuming the tests pass.

@stephanie-wang stephanie-wang merged commit 9ad6a97 into ray-project:master Jul 26, 2018
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Dec 17, 2025
core gcs related conflicts
4 participants