New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1808971: Machine status shows "running" when an instance was terminated #575
Bug 1808971: Machine status shows "running" when an instance was terminated #575
Conversation
@Danil-Grigorev: This pull request references Bugzilla bug 1808971, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR seems to duplicate provider logic https://github.com/openshift/cluster-api-provider-aws/blob/master/pkg/actuators/machine/reconciler.go#L335
This can break the consistency because we set same annotation in 2 different places. However, the idea of setting this in one place is nice. |
@@ -426,9 +426,18 @@ func (r *ReconcileMachine) setPhase(machine *machinev1.Machine, phase string, er | |||
machine.Status.LastUpdated = &now | |||
if phase == phaseFailed && errorMessage != "" { | |||
machine.Status.ErrorMessage = &errorMessage | |||
if machine.Annotations == nil { | |||
machine.Annotations = map[string]string{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could save the if:
annotations := m.GetAnnotations()
annotations["key"] = "value"
m.SetAnnotations(annotations)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will cause a panic on Annotations
being nil, especially in tests. I'm not sure how to avoid nil check here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm missing where exactly would this panic? I'd expect GetAnnotations to give you an empty struct slice, may be I'm wrong. I'm not totally sure what the API machinery defaulting will return here. You're right the check is safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check is needed, apimachinery will happily return a nil map
https://github.com/kubernetes/apimachinery/blob/7ae370969693753028c74c0c876eee17ddb1b15b/pkg/apis/meta/v1/meta.go#L165
pkg/controller/machine/controller.go
Outdated
if machine.Annotations == nil { | ||
machine.Annotations = map[string]string{} | ||
} | ||
machine.Annotations[MachineInstanceStateAnnotationName] = "Unknown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put this into a constant?
pkg/controller/machine/controller.go
Outdated
if machine.Annotations == nil { | ||
machine.Annotations = map[string]string{} | ||
} | ||
machine.Annotations[MachineInstanceStateAnnotationName] = "Unknown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should only set unknown here https://github.com/openshift/machine-api-operator/pull/575/files#diff-b96ce5bf8686fb02d855188000ae1cb4R299-R304
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is another place, where the machine will get into Failed pahse, and the VM would not be monitored/updated anymore:
machine-api-operator/pkg/controller/machine/controller.go
Lines 313 to 316 in 46410db
if isInvalidMachineConfigurationError(err) { | |
if err := r.setPhase(m, phaseFailed, err.Error()); err != nil { | |
return reconcile.Result{}, err | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense, in the other scenario the provider actually have a chance to set a valid state as it goes failed so might good to leave it. Happy to hear others thoughts.
c10a93c
to
96bb756
Compare
@@ -426,9 +426,18 @@ func (r *ReconcileMachine) setPhase(machine *machinev1.Machine, phase string, er | |||
machine.Status.LastUpdated = &now | |||
if phase == phaseFailed && errorMessage != "" { | |||
machine.Status.ErrorMessage = &errorMessage | |||
if machine.Annotations == nil { | |||
machine.Annotations = map[string]string{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check is needed, apimachinery will happily return a nil map
https://github.com/kubernetes/apimachinery/blob/7ae370969693753028c74c0c876eee17ddb1b15b/pkg/apis/meta/v1/meta.go#L165
if err := r.Client.Patch(context.Background(), machine, baseToPatch); err != nil { | ||
klog.Errorf("Failed to update machine %q: %v", machine.GetName(), err) | ||
return err | ||
} | ||
} | ||
if err := r.Client.Status().Patch(context.Background(), machine, baseToPatch); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check to see if there are conflict errors coming through now? Can't remember if Patch needs an up to date resource version or not for the patch call to succeed
@@ -382,7 +382,7 @@ func TestSetPhase(t *testing.T) { | |||
t.Fatal(err) | |||
} | |||
if *lastUpdated != *got.Status.LastUpdated { | |||
t.Errorf("Expected: %v, got: %v", *lastUpdated, *got.Status.LastUpdated) | |||
t.Errorf("Expected: %v, got: %v", *lastUpdated, got.Status.LastUpdated) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this changed? I don't see how it relates 🤔
@@ -401,12 +403,21 @@ func TestSetPhase(t *testing.T) { | |||
} | |||
// validate persisted object | |||
if expecterErrorMessage != *got.Status.ErrorMessage { | |||
t.Errorf("Expected: %v, got: %v", expecterErrorMessage, *got.Status.ErrorMessage) | |||
t.Errorf("Expected: %v, got: %v", expecterErrorMessage, got.Status.ErrorMessage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, why is this deref changed?
t.Fatal("Got phase nil") | ||
} | ||
if *got.Status.Phase != phaseFailed { | ||
t.Errorf("Got: %v, expected: %v", *got.Status.Phase, phaseRunning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of the assertions around here are this way around
t.Errorf("Got: %v, expected: %v", *got.Status.Phase, phaseRunning) | |
t.Errorf("Expected: %v, got: %v", phaseFailed, *got.Status.Phase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I missed the fact that all those fields are pointers :D Thanks for the tip, will revert it.
c1656b4
to
1de9a9b
Compare
Bug 1808971: Machine status shows "running" when an instance was terminated
1de9a9b
to
c38b02f
Compare
@Danil-Grigorev: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
if machine.Annotations == nil { | ||
machine.Annotations = map[string]string{} | ||
} | ||
machine.Annotations[MachineInstanceStateAnnotationName] = unknownInstanceState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might cause skew if the actuators set something in the provider.Status. We might consolidate to consolidate that.
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: enxebre The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
@Danil-Grigorev: All pull requests linked via external trackers have merged: openshift/machine-api-operator#575. Bugzilla bug 1808971 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Setting machine instance state to unknown on machine failure