[Autoscaler] Get_Head_Node should return an up-to-date node #14579
Hi @ijrsvt, I realize this issue is happening, and thanks for taking the initiative here.
Checking that the status is up to date is not going to catch that.
I think the right solution is that if creating a head node fails, we should just terminate it.
@AmeerHajAli The head node is set to
I don't think this is a good idea because it adds a lot of latency to iterating on setup/initialization commands. For example, if a user has a typo in their At its core, this PR is defending against a bad implementation of a NodeProvider (one that does not properly re-use the failed
The issue is that if there was a bug in the setup commands, the head node will stay running and the status tag will be
Oh, wait, @ijrsvt, this is not modifying create_or_update_cluster, right?
Yep, this does not change
Would you mind adding a warning that other head nodes exist in failed states if there are head nodes with a non-STATUS_UP_TO_DATE tag?
@AmeerHajAli I can add that, but let me switch my implementation to only call
LGTM.
Why are these changes needed?
get_head_node can return the failed node. This PR ensures that get_head_node only returns a node in the up-to-date state.
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.
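
For readers skimming the thread, the behavior described under "Why are these changes needed?" and the warning requested in review can be sketched roughly as below. This is not the code from this PR. InMemoryProvider, pick_head_node, the tag keys, and the "update-failed" status string are hypothetical names chosen for illustration; only STATUS_UP_TO_DATE and the up-to-date state come from the discussion above.

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)

# Assumed tag keys and values, for illustration only.
TAG_NODE_KIND = "node-kind"
TAG_NODE_STATUS = "node-status"
NODE_KIND_HEAD = "head"
STATUS_UP_TO_DATE = "up-to-date"


class InMemoryProvider:
    """Hypothetical stand-in for a NodeProvider, backed by a dict of tags."""

    def __init__(self, nodes: Dict[str, Dict[str, str]]) -> None:
        self._nodes = nodes

    def non_terminated_nodes(self, tag_filters: Dict[str, str]) -> List[str]:
        # Return the ids of nodes whose tags match every filter entry.
        return [
            node_id
            for node_id, tags in self._nodes.items()
            if all(tags.get(k) == v for k, v in tag_filters.items())
        ]

    def node_tags(self, node_id: str) -> Dict[str, str]:
        return dict(self._nodes[node_id])


def pick_head_node(provider: InMemoryProvider) -> Optional[str]:
    """Return an up-to-date head node id, or None if there is none."""
    heads = provider.non_terminated_nodes({TAG_NODE_KIND: NODE_KIND_HEAD})
    up_to_date = []
    other = []
    for node_id in heads:
        status = provider.node_tags(node_id).get(TAG_NODE_STATUS)
        (up_to_date if status == STATUS_UP_TO_DATE else other).append(node_id)
    if other:
        # Surface head nodes left behind in a failed or in-progress state.
        logger.warning(
            "Found %d head node(s) not in the %s state: %s",
            len(other), STATUS_UP_TO_DATE, other,
        )
    return up_to_date[0] if up_to_date else None
```

In this sketch a head node whose setup failed is skipped and reported in a warning instead of being returned, which keeps the failed node running for debugging without letting callers pick it up by accident.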