Add reconcile termination handler daemonSet validations #177
Conversation
…tion handler

This openshift#535 introduced support to manage a daemonSet which runs the termination handler for spot instances. As an event handler is not passed to the daemonSet informer, changes to the resource won't trigger a reconcile. This PR fixes that by passing the event handler to the daemonSet namespaced informer. This will be e2e tested by openshift/cluster-api-actuator-pkg#177
force-pushed the branch from 396c87a to 1405cc5
/hold
force-pushed the branch from 1405cc5 to 260730b
This openshift/machine-api-operator#535 introduced support to manage a daemonSet which runs the termination handler for spot instances. As an event handler is not passed to the daemonSet informer, changes to the resource won't trigger a reconcile. openshift/machine-api-operator#648 fixes that by passing the event handler to the daemonSet namespaced informer. This PR covers that fix with e2e tests.
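For context, the shape of the fix in openshift/machine-api-operator#648 looks roughly like the sketch below. This is a minimal illustration using plain client-go, not the operator's actual code; the namespace, resync period, and workQueueKey are assumptions.

package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

const workQueueKey = "machine-api-operator" // illustrative key

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	// Namespaced shared informer factory, scoped like the "daemonSet
	// namespaced informer" mentioned above (namespace is an assumption).
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 10*time.Minute, informers.WithNamespace("openshift-machine-api"))

	// The crux of the fix: without an event handler registered on the
	// daemonSet informer, changes to the DaemonSet never enqueue a reconcile.
	factory.Apps().V1().DaemonSets().Informer().AddEventHandler(
		cache.ResourceEventHandlerFuncs{
			AddFunc:    func(obj interface{}) { queue.Add(workQueueKey) },
			UpdateFunc: func(old, new interface{}) { queue.Add(workQueueKey) },
			DeleteFunc: func(obj interface{}) { queue.Add(workQueueKey) },
		})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	// ... wait for cache sync and run workers that drain the queue ...
}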
force-pushed the branch from 260730b to f7192cc
// DeleteDaemonSet deletes the specified daemonSet
func DeleteDaemonSet(c client.Client, ds *kappsapi.DaemonSet) error {
	return wait.PollImmediate(RetryShort, WaitShort, func() (bool, error) {
		if err := c.Delete(context.TODO(), ds); err != nil {
Probably need to account for situations when the DaemonSet was not found, as it was already removed.
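One way to address this (a sketch of the suggested change, not necessarily the merged code; it assumes the usual apierrors import, k8s.io/apimachinery/pkg/api/errors):

// DeleteDaemonSet deletes the specified daemonSet, treating NotFound as
// success so the poll also succeeds when the DaemonSet is already gone.
func DeleteDaemonSet(c client.Client, ds *kappsapi.DaemonSet) error {
	return wait.PollImmediate(RetryShort, WaitShort, func() (bool, error) {
		if err := c.Delete(context.TODO(), ds); err != nil {
			if apierrors.IsNotFound(err) {
				// Already removed out of band: nothing left to delete.
				return true, nil
			}
			// Transient error: keep polling until WaitShort elapses.
			return false, nil
		}
		return true, nil
	})
}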
By(fmt.Sprintf("checking got daemonSet spec matches the initial one")) | ||
Expect(framework.IsDaemonSetSynced(client, initialDaemonSet, terminationHandlerDaemonSet, framework.MachineAPINamespace)).To(BeTrue()) | ||
|
||
By(fmt.Sprintf("updating got daemonSet spec")) |
Nit: could give it a separate It to increase test robustness.
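For illustration, the suggested split could look something like this. It is only a sketch: framework.GetDaemonSet is a hypothetical helper, and the mutated field is arbitrary.

// Hypothetical shape of the separate It: the out-of-band update and the
// resync assertion get their own spec, so a failure here is reported alone.
It("reconciles the daemonSet spec after an out-of-band update", func() {
	By("updating the daemonSet spec")
	ds, err := framework.GetDaemonSet(client, terminationHandlerDaemonSet, framework.MachineAPINamespace) // hypothetical helper
	Expect(err).NotTo(HaveOccurred())
	ds.Spec.Template.Spec.ServiceAccountName = "changed-out-of-band" // arbitrary drift
	Expect(client.Update(context.TODO(), ds)).To(Succeed())

	By("checking the daemonSet spec is restored to the initial one")
	Eventually(func() bool {
		return framework.IsDaemonSetSynced(client, initialDaemonSet, terminationHandlerDaemonSet, framework.MachineAPINamespace)
	}).Should(BeTrue())
})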
By(fmt.Sprintf("checking daemonSet is available")) | ||
Expect(framework.IsDaemonSetAvailable(client, terminationHandlerDaemonSet, framework.MachineAPINamespace)).To(BeTrue()) |
What does available here mean? Does it mean that all replicas are running? If so, on a default cluster the daemonset should always be available by virtue of it having no replicas. I think we need to simulate somewhere in the test suite that the daemonset is available and has more than one replica. I'm not really sure what this is testing over the daemonset just existing.
Perhaps we do that in https://github.com/openshift/cluster-api-actuator-pkg/blob/master/pkg/infra/spot.go#L73? Could follow up later.
This is validating that the operator does its job and also that the expectation of having no unavailable replicas is satisfied: https://github.com/openshift/cluster-api-actuator-pkg/pull/177/files#diff-a8166de82f0b6261e02122357a0c6096R40.
On a default cluster the expected number of available replicas happens to be zero. That's circumstantial; this test covers that and literally any other possible scenario. If the default ever changes, or if this runs in parallel with any spot instance, this test must still remain green. This lets us introduce any change while being confident we are not breaking the expectation.
> I think we need to simulate somewhere in the test suite that the daemonset is available and has more than 1 replica

Yes, I'll follow up with PRs to make sure the operator goes degraded if the pod crashloops, and a test for it.
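For reference, an availability check along these lines would capture that expectation regardless of the desired replica count. This is a sketch of an assumed helper, not necessarily the actual framework code:

// "Available" here means: the controller has observed the latest generation
// and no scheduled replica is unavailable. On a default cluster with zero
// desired replicas this is trivially true, and it stays true for any
// non-zero count only if every replica is actually available.
func IsDaemonSetAvailable(c client.Client, name, namespace string) bool {
	key := client.ObjectKey{Namespace: namespace, Name: name}
	err := wait.PollImmediate(RetryShort, WaitShort, func() (bool, error) {
		ds := &kappsapi.DaemonSet{}
		if err := c.Get(context.TODO(), key, ds); err != nil {
			return false, nil
		}
		return ds.Status.ObservedGeneration == ds.Generation &&
			ds.Status.NumberUnavailable == 0, nil
	})
	return err == nil
}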
Ack, SGTM
Should probably address this #177 (comment), but otherwise I'm happy with this PR.

/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
/retest
@enxebre: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@enxebre: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close.

/lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close.

/lifecycle rotten
@enxebre: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen.

/close
@openshift-bot: Closed this PR.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.