[GCS]GCS adapts to job table pub sub #8145

Merged

Contributor

@ffbin ffbin commented Apr 23, 2020

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/latest/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested (please justify below)

@AmplabJenkins

Can one of the admins verify this patch?

@ffbin ffbin requested a review from raulchen April 23, 2020 10:48
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25096/

@@ -36,6 +38,7 @@ class DefaultJobInfoHandler : public rpc::JobInfoHandler {

 private:
  gcs::RedisGcsClient &gcs_client_;
  const std::shared_ptr<gcs::GcsPubSub> &gcs_pub_sub_;
Contributor

It is better to store std::shared_ptr<gcs::GcsPubSub> gcs_pub_sub_ directly, by value, rather than as a const reference.

Contributor Author

fixed
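
To illustrate the point of this review comment, here is a minimal sketch of why a by-value std::shared_ptr member is the safer choice. PubSub, HandlerByValue, and MakeHandler are hypothetical stand-ins, not the actual Ray classes; the sketch only assumes that the handler may outlive the shared_ptr variable it was constructed from.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for gcs::GcsPubSub; not the real Ray class.
struct PubSub {
  std::string name;
};

// Stores the pointer by value, so the handler shares ownership and the
// PubSub object stays alive for as long as the handler does.
class HandlerByValue {
 public:
  explicit HandlerByValue(std::shared_ptr<PubSub> pub_sub)
      : pub_sub_(std::move(pub_sub)) {
    assert(pub_sub_ != nullptr);
  }

  const std::string &PubSubName() const { return pub_sub_->name; }
  long UseCount() const { return pub_sub_.use_count(); }

 private:
  std::shared_ptr<PubSub> pub_sub_;
};

// Builds a handler from a local shared_ptr that is destroyed on return.
// This is safe only because the handler holds its own copy; a
// `const std::shared_ptr<PubSub> &` member would be left dangling here.
HandlerByValue MakeHandler() {
  auto local = std::make_shared<PubSub>(PubSub{"job_channel"});
  return HandlerByValue(local);
}
```

Calling MakeHandler() and then PubSubName() on the result is well defined in this sketch, because the handler is the sole remaining owner once the local variable is gone.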

if (!status.ok()) {
  RAY_LOG(ERROR) << "Failed to add job, job id = " << job_id
                 << ", driver pid = " << request.data().driver_pid();
} else {
  RAY_LOG(DEBUG) << "Finished adding job, job id = " << job_id
Contributor

It is better to use INFO here, as well as on the first line of this function, since this RPC is called very infrequently.

Contributor Author

fixed
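
The reasoning behind the INFO suggestion can be sketched with a toy severity filter. This is not Ray's logging implementation: Severity, FilteringLogger, and LinesForAddJob are hypothetical names, and the sketch assumes only that messages below a configured threshold are dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical severity levels, ordered from least to most severe.
enum class Severity { kDebug = 0, kInfo = 1, kError = 2 };

// Toy logger that keeps only messages at or above its threshold.
class FilteringLogger {
 public:
  explicit FilteringLogger(Severity threshold) : threshold_(threshold) {}

  void Log(Severity severity, std::string message) {
    assert(!message.empty());
    if (severity >= threshold_) {
      emitted_.push_back(std::move(message));
    }
  }

  const std::vector<std::string> &emitted() const { return emitted_; }

 private:
  Severity threshold_;
  std::vector<std::string> emitted_;
};

// Simulates one successful "add job" RPC and returns how many log lines
// survive the filter when the success message uses `success_level`.
std::size_t LinesForAddJob(Severity threshold, Severity success_level) {
  FilteringLogger logger(threshold);
  logger.Log(success_level, "Finished adding job");
  return logger.emitted().size();
}
```

With an INFO threshold, a success message logged at DEBUG disappears, while the same message at INFO is kept. For an RPC that fires rarely, the extra INFO line costs little and leaves a trace of each job.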

} else {
  RAY_CHECK_OK(gcs_pub_sub_->Publish(JOB_CHANNEL, job_id.Binary(),
                                     job_table_data->SerializeAsString(), nullptr));
  RAY_LOG(DEBUG) << "Finished marking job state, job id = " << job_id;
Contributor

Use INFO here, as well as on the first line of this function.

Contributor Author

fixed

if (!status.ok()) {
  RAY_LOG(ERROR) << "Failed to mark job state, job id = " << job_id;
} else {
  RAY_CHECK_OK(gcs_pub_sub_->Publish(JOB_CHANNEL, job_id.Binary(),
Contributor

Will this publish twice, since AsyncMarkFinished publishes too?

Contributor Author

We also changed the job subscription code, which now subscribes only to messages published through gcs_pub_sub_.
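
The answer above can be pictured with a small in-memory model. This sketch is not the gcs::GcsPubSub API: ChannelBus, the channel names, and CountDeliveries are hypothetical, and modelling the older publish path as a separate channel is an assumption made only for illustration. It shows that two publishes do not produce two deliveries when the subscriber listens on just one of the paths.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory channel bus; not the real gcs::GcsPubSub API.
class ChannelBus {
 public:
  using Callback =
      std::function<void(const std::string &id, const std::string &data)>;

  void Subscribe(const std::string &channel, Callback callback) {
    assert(callback != nullptr);
    subscribers_[channel].push_back(std::move(callback));
  }

  // Delivers the message to every callback registered on `channel` only.
  void Publish(const std::string &channel, const std::string &id,
               const std::string &data) {
    auto it = subscribers_.find(channel);
    if (it == subscribers_.end()) {
      return;  // Nobody is listening on this channel.
    }
    for (const auto &callback : it->second) {
      callback(id, data);
    }
  }

 private:
  std::map<std::string, std::vector<Callback>> subscribers_;
};

// Simulates a job being marked finished while both an older path and the
// new path publish. The subscriber listens on the new channel only.
int CountDeliveries() {
  ChannelBus bus;
  int deliveries = 0;
  bus.Subscribe("JOB", [&deliveries](const std::string &, const std::string &) {
    ++deliveries;
  });
  bus.Publish("LEGACY_JOB_TABLE", "job-1", "finished");  // older path, unheard
  bus.Publish("JOB", "job-1", "finished");               // new path
  return deliveries;
}
```

In this model the subscriber sees exactly one notification per state change, even though two publishes happen.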

@@ -81,6 +81,8 @@ class RAY_EXPORT RedisGcsClient : public GcsClient {
    return redis_client_->GetPrimaryContext();
  }

  std::shared_ptr<RedisClient> GetRedisClient() { return redis_client_; }
Contributor

std::shared_ptr<RedisClient> GetRedisClient() const { return redis_client_; }

Contributor Author

fixed
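
As a small illustration of what the const qualifier buys, the following sketch uses hypothetical stand-ins (RedisClientStub, ClientHolder, PortOf) rather than the real RedisGcsClient. A const-qualified getter can be called through a const reference, which a non-const getter would not allow.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for RedisClient; not the real Ray class.
struct RedisClientStub {
  int port;
};

class ClientHolder {
 public:
  explicit ClientHolder(std::shared_ptr<RedisClientStub> client)
      : redis_client_(std::move(client)) {
    assert(redis_client_ != nullptr);
  }

  // const-qualified: returning a copy of the shared_ptr does not modify the
  // holder, so the getter is usable on const objects and const references.
  std::shared_ptr<RedisClientStub> GetRedisClient() const {
    return redis_client_;
  }

 private:
  std::shared_ptr<RedisClientStub> redis_client_;
};

// Takes the holder by const reference; this compiles only because
// GetRedisClient() is declared const.
int PortOf(const ClientHolder &holder) {
  return holder.GetRedisClient()->port;
}
```

Without the const qualifier, PortOf would fail to compile, because a non-const member function cannot be invoked on a const ClientHolder.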

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25113/

@raulchen raulchen merged commit 713e375 into ray-project:master Apr 24, 2020
@raulchen raulchen deleted the dev_gcs_adapts_to_job_table_pubsub branch April 24, 2020 08:33