[grpc] Add grpc server to worker #5054

Merged
30 commits merged into ray-project:master on Jul 4, 2019

Contributor

@zhijunfu zhijunfu commented Jun 28, 2019

What do these changes do?

Add a gRPC server to the worker and register it with the raylet (no communication goes through it yet). Also convert RayletClient references to unique_ptr.

For more details, refer to #5039
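
As an illustration of the unique_ptr change described above, here is a minimal C++ sketch, not the PR's actual code. RayletClient, TaskSubmitter, Worker, and Connect are simplified, hypothetical stand-ins. The sketch assumes the client can only be created once the worker's gRPC port is known, which the description does not state explicitly.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the real RayletClient; only ownership matters here.
struct RayletClient {
  explicit RayletClient(int worker_port) : worker_port(worker_port) {}
  int worker_port;
};

// Hypothetical component that holds a reference to the owner's unique_ptr,
// so it can be constructed before the client itself exists.
class TaskSubmitter {
 public:
  explicit TaskSubmitter(std::unique_ptr<RayletClient> &client) : client_(client) {}

  bool Ready() const { return client_ != nullptr; }

  int RegisteredPort() const {
    assert(client_ != nullptr);  // must not be used before the worker connects
    return client_->worker_port;
  }

 private:
  std::unique_ptr<RayletClient> &client_;
};

// Hypothetical worker: the client is created only after the port is known.
class Worker {
 public:
  Worker() : submitter_(raylet_client_) {}

  // Called once the gRPC server has started and its port is available.
  void Connect(int grpc_port) { raylet_client_.reset(new RayletClient(grpc_port)); }

  TaskSubmitter &submitter() { return submitter_; }

 private:
  std::unique_ptr<RayletClient> raylet_client_;  // declared before its users
  TaskSubmitter submitter_;
};
```

Holding a reference to the unique_ptr rather than to the client lets dependent components be wired up first and see the client as soon as the owner creates it.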

Related issue number

#5029

Linter

  • [Y] I've run scripts/format.sh to lint the changes in this PR.

Co-Authored-By: Hao Chen <chenh1024@gmail.com>
@raulchen
Contributor

Could you merge master and resolve conflicts?

@zhijunfu
Contributor Author

Could you merge master and resolve conflicts?

Done, thanks.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14943/
Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14944/
Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14945/
Test PASSed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14947/
Test FAILed.

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14974/
Test PASSed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1403/
Test FAILed.

Contributor

@stephanie-wang stephanie-wang left a comment

The changes to the core worker look okay, but I think we should remove any unused code from this PR before merging. Generally, we should avoid merging unused code completely.

@zhijunfu
Contributor Author

zhijunfu commented Jul 1, 2019

Comments are addressed. This also merges the content of #5062 to avoid future code conflicts.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14996/
Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1418/
Test PASSed.

Contributor

@stephanie-wang stephanie-wang left a comment

I reviewed the unused code for the AssignTask request in this round, but let's please try to avoid merging any unused code in the future.

Also, thanks for merging with #5062. However, because that PR is still under review, I cannot approve this PR until #5062 is merged. There are 2 options: you can either wait for #5062 (and you may have to merge again if it changes) or you can roll back the merge.

}

Status CoreWorkerRayletTaskReceiver::SetTaskHandler(const TaskHandler &callback) {
task_handler_ = callback;
Contributor

Unless there is some reason we'd overwrite this, I think this should be set as part of the constructor to avoid calling an uninitialized task_handler_.

Contributor Author

I removed this function from this PR. The next PR will initialize task_handler_ in the constructor.
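
The suggestion above, setting the handler in the constructor rather than through a setter, can be sketched as follows. TaskReceiver, Receive, and the simplified TaskHandler signature are hypothetical and are not the actual CoreWorkerRayletTaskReceiver API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical, simplified handler type: returns true if the task was accepted.
using TaskHandler = std::function<bool(const std::string &task)>;

class TaskReceiver {
 public:
  // The handler is required at construction, so no code path can observe
  // an unset task_handler_.
  explicit TaskReceiver(TaskHandler handler) : task_handler_(std::move(handler)) {
    assert(task_handler_ && "a task handler must be supplied");
  }

  // Forwards an incoming task to the handler supplied at construction.
  bool Receive(const std::string &task) const { return task_handler_(task); }

 private:
  const TaskHandler task_handler_;  // const: cannot be overwritten later
};
```

Making the member const also documents that the handler is never replaced after construction, which is the case the reviewer's comment assumes.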

///
/// \param[in] request The request message.
/// \param[out] reply The reply message.
/// \param[in] done_callback The callback to be called when the request is done.
Contributor

It seems brittle to require that done_callback be called inside HandleAssignTask. Can we instead pass in a lambda that calls HandleAssignTask, then the done_callback?

Contributor Author

I'm thinking about refactoring ServerCall to call done_callback there instead, but since that would require changing the node manager as well as the object manager, I'd prefer to do it in a later PR.

Contributor

That doesn't seem necessary. I think we can just wrap HandleAssignTask in a lambda.

Contributor Author

Sorry, I'm not quite sure what the suggested change would be. Could you add a code example? Thanks.

Contributor

Sure, I was suggesting that instead of passing &WorkerTaskHandler::HandleAssignTask into ServerCallFactoryImpl, we could pass in something like:

[HandleAssignTask]( ... , done_callback) { HandleAssignTask(...); done_callback(); }

But actually, I think if I'm understanding the code correctly, what you're suggesting about doing it in ServerCall is the same thing. So agreed, let's just do it in a future PR.
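
A compilable version of the lambda idea discussed in this thread might look like the sketch below. The request and reply structs, the handler typedefs, and WithDoneCallback are assumptions made for illustration; the real ServerCallFactoryImpl and HandleAssignTask signatures differ. The wrapper only fits handlers that finish synchronously.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical, simplified message types (the real ones are protobuf messages).
struct AssignTaskRequest {
  std::string task_id;
};
struct AssignTaskReply {
  bool accepted = false;
};

using DoneCallback = std::function<void()>;
// Handler shape expected by the (hypothetical) server-call machinery.
using RpcHandler =
    std::function<void(const AssignTaskRequest &, AssignTaskReply *, DoneCallback)>;
// Handler shape that knows nothing about completion signalling.
using PlainHandler = std::function<void(const AssignTaskRequest &, AssignTaskReply *)>;

// Wraps a plain handler so that done_callback is always invoked after it
// returns, instead of relying on every handler to remember to call it.
inline RpcHandler WithDoneCallback(PlainHandler handler) {
  return [handler](const AssignTaskRequest &request, AssignTaskReply *reply,
                   DoneCallback done_callback) {
    assert(reply != nullptr);
    handler(request, reply);
    done_callback();
  };
}
```

Whether the wrapping happens at the call site or inside ServerCall, the effect is the same: the handler body no longer carries the obligation to signal completion.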

zhijunfu and others added 4 commits July 2, 2019 18:07
Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>
Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15025/
Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15026/
Test PASSed.

@stephanie-wang
Contributor

Thanks, @zhijunfu. However, can you please go ahead and roll back the changes that you took from #5062? Otherwise, we can't merge this as there is a lot of other code in the PR from 5062. Thanks!

@zhijunfu
Contributor Author

zhijunfu commented Jul 3, 2019

Thanks, @zhijunfu. However, can you please go ahead and roll back the changes that you took from #5062? Otherwise, we can't merge this as there is a lot of other code in the PR from 5062. Thanks!

I saw that #5062 has already been merged into master, so I just synced with the latest master. Thanks.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15056/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1443/
Test FAILed.

Contributor

@stephanie-wang stephanie-wang left a comment

Thanks!

@stephanie-wang
Contributor

Btw, looks like there is a build error on Linux and a lint error.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1444/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15090/
Test FAILed.

@zhijunfu
Contributor Author

zhijunfu commented Jul 4, 2019

Btw, looks like there is a build error on Linux and a lint error.

Thanks. Fixed. The remaining errors look unrelated.

@jovany-wang jovany-wang merged commit 54d5969 into ray-project:master Jul 4, 2019