Support for query heartbeat from coordinator that is shutting down #16322

Merged
merged 1 commit into from
Sep 17, 2021

abhiseksaikia
Contributor

Currently, a query heartbeat fails to register if it comes from a coordinator node that is shutting down. This fix treats a query heartbeat from a shutting-down coordinator as valid.

Test plan - unit test

== NO RELEASE NOTE ==

@@ -141,8 +144,10 @@ public void registerQueryHeartbeat(String nodeId, BasicQueryInfo basicQueryInfo)
    {
        requireNonNull(nodeId, "nodeId is null");
        requireNonNull(basicQueryInfo, "basicQueryInfo is null");
        Stream<InternalNode> activeOrShuttingDownCoordinators = concat(internalNodeManager.getCoordinators().stream(),
                internalNodeManager.getNodes(SHUTTING_DOWN).stream().filter(InternalNode::isCoordinator));
Contributor

It might make sense to expose shutting-down coordinators as a separate method on InternalNodeManager, i.e. `getShuttingDownCoordinator` or something similar.
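For illustration, here is a minimal sketch of what such a helper could look like. `SketchNode`, `SketchNodeState`, and `SketchNodeManager` are hypothetical stand-ins for the real Presto types, reduced to what the example needs, and the method name `getShuttingDownCoordinators` is only a suggestion, not an existing API.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for the node states used in the diff.
enum SketchNodeState { ACTIVE, INACTIVE, SHUTTING_DOWN }

// Hypothetical stand-in for InternalNode: only the fields this sketch needs.
final class SketchNode
{
    final String id;
    final boolean coordinator;
    final SketchNodeState state;

    SketchNode(String id, boolean coordinator, SketchNodeState state)
    {
        this.id = id;
        this.coordinator = coordinator;
        this.state = state;
    }

    boolean isCoordinator()
    {
        return coordinator;
    }
}

// Hypothetical stand-in for InternalNodeManager.
final class SketchNodeManager
{
    private final List<SketchNode> nodes;

    SketchNodeManager(List<SketchNode> nodes)
    {
        this.nodes = List.copyOf(nodes);
    }

    // Returns every node currently in the given state.
    Set<SketchNode> getNodes(SketchNodeState state)
    {
        return nodes.stream()
                .filter(node -> node.state == state)
                .collect(Collectors.toSet());
    }

    // The suggested helper: keeps the SHUTTING_DOWN + isCoordinator filter in one place
    // so callers do not have to repeat it.
    Set<SketchNode> getShuttingDownCoordinators()
    {
        return getNodes(SketchNodeState.SHUTTING_DOWN).stream()
                .filter(SketchNode::isCoordinator)
                .collect(Collectors.toSet());
    }
}
```

With a helper like this, the call site in `registerQueryHeartbeat` would only need to concatenate two coordinator collections instead of filtering inline.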

Contributor

Also wondering what value we get from this check, where a coordinator sends its heartbeat to the RM and we check whether it belongs to the same set of nodes. We don't do the same for the node heartbeat. @tdcmeehan Any specific reason for this check here, or can we remove it entirely?

Contributor Author

I don't see any reason other than a scenario where the RM receives a heartbeat from a coordinator that does not belong to the cluster; I'm not sure whether that is possible. I will wait for Tim to comment.

Contributor

It's more of a check to ensure that we're not making decisions based on heartbeats that are inconsistent with the discovery service. I would like to eventually add a similar check for node heartbeats.
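To make the discussion concrete, here is a rough sketch of the kind of consistency check being described: a heartbeat is accepted only when the sender is a coordinator known to discovery, whether active or shutting down. `SketchHeartbeatRegistry` and its members are hypothetical, node and query identities are reduced to plain strings, and returning `false` for an unknown sender is an assumption; the actual resource manager code may handle that case differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical, simplified model of the resource manager's heartbeat bookkeeping.
final class SketchHeartbeatRegistry
{
    // Coordinator IDs as reported by discovery (assumed to be supplied by the caller).
    private final Set<String> activeCoordinatorIds;
    private final Set<String> shuttingDownCoordinatorIds;
    // Latest query ID seen per coordinator.
    private final Map<String, String> latestQueryByNode = new HashMap<>();

    SketchHeartbeatRegistry(Set<String> activeCoordinatorIds, Set<String> shuttingDownCoordinatorIds)
    {
        this.activeCoordinatorIds = Set.copyOf(activeCoordinatorIds);
        this.shuttingDownCoordinatorIds = Set.copyOf(shuttingDownCoordinatorIds);
    }

    // Accepts the heartbeat only if the sender is an active or shutting-down coordinator.
    // Returning false for unknown senders is an assumption made for this sketch.
    boolean registerQueryHeartbeat(String nodeId, String queryId)
    {
        Objects.requireNonNull(nodeId, "nodeId is null");
        Objects.requireNonNull(queryId, "queryId is null");
        boolean knownCoordinator = Stream.concat(
                        activeCoordinatorIds.stream(),
                        shuttingDownCoordinatorIds.stream())
                .anyMatch(nodeId::equals);
        if (!knownCoordinator) {
            return false;
        }
        latestQueryByNode.put(nodeId, queryId);
        return true;
    }

    boolean hasHeartbeatFrom(String nodeId)
    {
        return latestQueryByNode.containsKey(nodeId);
    }
}
```

In this sketch a draining coordinator keeps reporting its in-flight queries, while a sender that discovery does not know about is ignored.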

@abhiseksaikia force-pushed the asaikia_coord_drain branch 2 times, most recently from 02be719 to 7bc2f97 on June 24, 2021 17:02
3 participants