[Placement Group] Make the creation of placement group sync #13858
If we'd like to have this change, we should make sure HandleCreatePlacementGroup no longer has this code:
pending_register_iter->second = std::move(callback);
I am not sure if this is necessary.
I think this is necessary. Imagine this case:
That's why we have .wait and .ready, right? Also, again, if we'd like to make this change, we shouldn't store the callback for CreatePlacementGroup. We should always reply right away, as in the actor case. Otherwise, the .wait and .ready APIs are meaningless.
I think the
Currently, the creation reply is sent "after" registration is done. I am not against this idea, but then please fix this issue.
Discussed with @rkooo567 offline: we should return success immediately when
/// Callbacks of pending `RegisterPlacementGroup` requests.
/// Maps placement group ID to placement group registration callbacks, which is used to
/// filter duplicated messages from a driver/worker caused by some network problems.
absl::flat_hash_map<PlacementGroupID, std::vector<StatusCallback>>
Why change it to placement_group_to_register_callbacks_? I'm afraid the design is a bit redundant.
Just like the registration of Actor, there will be a destruction problem if we simply ignore the previous callback. I've confirmed this with @raulchen.
@clay4444 Can you describe the specific problem in detail? thx
Hm... I'm not sure whether I can explain this clearly. The general reason is that the GrpcClient will not be destructed, since we just ignore the callback and no response is sent to the client.
If there is a network exception, the reply cannot be returned either.
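To make the thread above concrete, here is a minimal sketch of keeping one list of callbacks per placement group ID, so that a duplicated request is queued rather than dropped and every request eventually gets a reply. It is not the code from this PR: `RegistrationTable`, `AddRequest`, `Finish`, and the simplified `Status` and `PlacementGroupId` types are hypothetical stand-ins, and `std::unordered_map` is used in place of `absl::flat_hash_map` to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Ray's PlacementGroupID, Status and StatusCallback.
using PlacementGroupId = std::string;
struct Status {
  bool ok;
  std::string message;
};
using StatusCallback = std::function<void(const Status &)>;

class RegistrationTable {
 public:
  // Queues the callback for this ID. Returns true only for the first request,
  // meaning the caller should start the actual registration; duplicates just
  // wait for the result of the registration that is already in flight.
  bool AddRequest(const PlacementGroupId &id, StatusCallback callback) {
    assert(callback && "callback must be callable");
    auto &callbacks = pending_[id];
    callbacks.push_back(std::move(callback));
    return callbacks.size() == 1;
  }

  // Replies to every queued request for the ID and forgets the entry.
  // Returns the number of callbacks that were invoked.
  std::size_t Finish(const PlacementGroupId &id, const Status &status) {
    auto iter = pending_.find(id);
    if (iter == pending_.end()) {
      return 0;
    }
    // Move the callbacks out before invoking them, so a callback that
    // re-enters the table cannot invalidate the vector being iterated.
    std::vector<StatusCallback> callbacks = std::move(iter->second);
    pending_.erase(iter);
    for (const auto &callback : callbacks) {
      callback(status);
    }
    return callbacks.size();
  }

 private:
  std::unordered_map<PlacementGroupId, std::vector<StatusCallback>> pending_;
};
```

With this shape, no callback is silently discarded, which is the concern raised about a client never receiving a response. Whether that outweighs the extra bookkeeping is the design question debated in this thread.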
Lint failure
A couple of last comments! After you address all of them, and if lint passes, I will merge the PR :)!
iter->second(Status::NotFound(stream.str()));
placement_group_to_register_callback_.erase(iter);
}
// The placement group registration is synchronous, so if we found the placement
Great comment!
SchedulePendingPlacementGroups();
} else {
// The placement group registration is synchronous, so if we found the placement
After discussing with @ffbin, we don't need to invoke RAY_CHECK(placement_group_to_register_callback_.count(placement_group_id) == 0) here, because we remove the placement group from registered_placement_groups_ and placement_group_to_register_callbacks_ only in RemovePlacementGroup. @rkooo567
Hmm, can you at least raise the log message to ERROR and say "This is a bug that should be addressed. Please report to Ray Github issue if you see this message"?
I still prefer to have a RAY_CHECK for strong consistency. What's the reason you guys prefer the log message? If we see the check failure, that means we just need to fix the issue rather than ignoring with the log messages.
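The two options being weighed here (fail hard on a broken invariant versus log an error and continue) can be sketched as follows. This is an illustration only: `InvariantPolicy` and `VerifyNoPendingCallback` are hypothetical names, and plain `std::cerr` and `std::abort` stand in for Ray's logging and check macros.

```cpp
#include <cstdlib>
#include <iostream>
#include <set>
#include <string>

// Hypothetical policy switch: abort on violation, or log and keep running.
enum class InvariantPolicy { kFatal, kLogError };

// Verifies that no registration callback is still pending for `id`.
// Returns true if the invariant holds. On violation it always logs an error;
// it then aborts under kFatal, or returns false under kLogError.
bool VerifyNoPendingCallback(const std::set<std::string> &pending_ids,
                             const std::string &id, InvariantPolicy policy) {
  if (pending_ids.count(id) == 0) {
    return true;
  }
  std::cerr << "Placement group " << id
            << " still has a pending registration callback. This is a bug "
               "that should be addressed.\n";
  if (policy == InvariantPolicy::kFatal) {
    std::abort();  // Fail fast so the inconsistency cannot go unnoticed.
  }
  return false;  // Caller decides how to continue after the logged error.
}
```

The fatal variant surfaces the bug immediately at the cost of crashing the process; the logging variant keeps the service up but relies on someone noticing and reporting the message.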
Wait, I probably needed to have an API change approval.
Okay, approved :)
Why are these changes needed?
test_automatic_cleanup_detached_actors
Related issue number
#13859
Checks
scripts/format.sh to lint the changes in this PR.