Retry REDIS_REPLY_ERROR for RedisClient::GetNextJobID (ray-project#33733)

Encountered the check failure `redis_client.cc:73: Check failed: reply->type == REDIS_REPLY_INTEGER Expected integer, found Redis type 6 for JobCounter`. Redis type 6 is REDIS_REPLY_ERROR, so this PR retries the command when it receives an error reply and also logs the error message.


Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
jjyao authored and scottsun94 committed Mar 28, 2023
1 parent f39ce34 commit aa901f1
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion src/ray/gcs/redis_client.cc
@@ -66,7 +66,19 @@ static int DoGetNextJobID(redisContext *context) {
   redisReply *reply = nullptr;
   bool under_retry_limit = RunRedisCommandWithRetries(
       context, cmd.c_str(), &reply, [](const redisReply *reply) {
-        return reply != nullptr && reply->type != REDIS_REPLY_NIL;
+        if (reply == nullptr) {
+          RAY_LOG(WARNING) << "Didn't get reply for " << cmd;
+          return false;
+        }
+        if (reply->type == REDIS_REPLY_NIL) {
+          RAY_LOG(WARNING) << "Got nil reply for " << cmd;
+          return false;
+        }
+        if (reply->type == REDIS_REPLY_ERROR) {
+          RAY_LOG(WARNING) << "Got error reply for " << cmd << " Error is " << reply->str;
+          return false;
+        }
+        return true;
       });
   RAY_CHECK(reply);
   RAY_CHECK(under_retry_limit) << "No entry found for JobCounter";
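The retry condition in the diff above can be exercised without a Redis server. The following is a minimal sketch under stated assumptions, not Ray's implementation: `FakeReply`, `IsUsableReply`, `RunWithRetries`, and `ScriptedServer` are hypothetical names introduced for illustration. The body of the real `RunRedisCommandWithRetries` is not part of this commit, so its retry limit and any delay between attempts are not modeled here. The reply type codes mirror the hiredis constants (integer = 3, nil = 4, error = 6).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// Type codes mirroring hiredis: REDIS_REPLY_INTEGER, REDIS_REPLY_NIL, REDIS_REPLY_ERROR.
constexpr int kReplyInteger = 3;
constexpr int kReplyNil = 4;
constexpr int kReplyError = 6;

// Hypothetical stand-in for redisReply with only the fields used here.
struct FakeReply {
  int type;
  long long integer;
  std::string str;
};

// Same three rejections as the lambda in the diff: a missing reply, a nil
// reply, and an error reply all return false so that the caller retries.
bool IsUsableReply(const FakeReply *reply) {
  if (reply == nullptr) {
    std::cerr << "Didn't get reply\n";
    return false;
  }
  if (reply->type == kReplyNil) {
    std::cerr << "Got nil reply\n";
    return false;
  }
  if (reply->type == kReplyError) {
    std::cerr << "Got error reply. Error is " << reply->str << "\n";
    return false;
  }
  return true;
}

// Hypothetical retry loop (an assumption, not Ray's code). `send` issues the
// command once and may return nullptr. Returns the first reply that `accept`
// approves, or std::nullopt once max_attempts is exhausted.
std::optional<FakeReply> RunWithRetries(
    const std::function<const FakeReply *()> &send,
    const std::function<bool(const FakeReply *)> &accept,
    int max_attempts) {
  assert(max_attempts > 0);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    const FakeReply *reply = send();
    if (accept(reply)) {
      return *reply;
    }
  }
  return std::nullopt;
}

// Replays a scripted sequence of replies, one per call, then returns nullptr.
struct ScriptedServer {
  std::vector<FakeReply> replies;
  std::size_t next = 0;
  const FakeReply *Send() {
    return next < replies.size() ? &replies[next++] : nullptr;
  }
};
```

If a `ScriptedServer` is loaded with an error reply, then a nil reply, then an integer reply, `RunWithRetries` skips the first two and returns the integer. With the pre-change condition (`reply != nullptr && reply->type != REDIS_REPLY_NIL`), the error reply would have been accepted on the first attempt and then tripped the integer type check quoted in the commit message.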
