
Add reconcile termination handler daemonSet validations #177

Closed

Conversation

@enxebre (Member) commented Jul 20, 2020

openshift/machine-api-operator#535 introduced support for managing a daemonSet which runs the termination handler for spot instances.
Because no event handler is passed to the daemonSet informer, changes to the resource won't trigger a reconcile.
openshift/machine-api-operator#648 fixes that by passing the event handler to the daemonSet namespaced informer.
This PR covers that behaviour with e2e tests.
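For context, here is a minimal sketch of the wiring the fix refers to, using plain client-go rather than the operator's actual code; the namespace, resync period, and log message are illustrative assumptions. The point is that without the AddEventHandler call, edits to the daemonSet are only picked up on periodic resync, so the operator does not reconcile them promptly.

package main

import (
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (illustrative only).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// A namespaced informer factory, scoped the way the operator scopes its own.
	factory := informers.NewSharedInformerFactoryWithOptions(
		clientset, 10*time.Minute,
		informers.WithNamespace("openshift-machine-api"),
	)
	dsInformer := factory.Apps().V1().DaemonSets().Informer()

	// This is the piece that was missing: without an event handler on the
	// informer, daemonSet changes never trigger a reconcile.
	dsInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			ds := newObj.(*appsv1.DaemonSet)
			fmt.Printf("daemonSet %s/%s changed, would enqueue a reconcile\n", ds.Namespace, ds.Name)
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	cache.WaitForCacheSync(stop, dsInformer.HasSynced)
	<-stop
}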

enxebre added a commit to enxebre/machine-api-operator that referenced this pull request Jul 20, 2020
…tion handler

openshift#535 introduced support for managing a daemonSet which runs the termination handler for spot instances.
Because no event handler is passed to the daemonSet informer, changes to the resource won't trigger a reconcile.
This PR fixes that by passing the event handler to the daemonSet namespaced informer.
This will be e2e tested by openshift/cluster-api-actuator-pkg#177.
@enxebre enxebre force-pushed the termination-handler-sync-coverage branch from 396c87a to 1405cc5 on July 20, 2020 at 11:39
@enxebre (Member, Author) commented Jul 20, 2020

/hold
to run this manually on my cluster

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 20, 2020
@enxebre enxebre force-pushed the termination-handler-sync-coverage branch from 1405cc5 to 260730b on July 20, 2020 at 11:42
@enxebre enxebre force-pushed the termination-handler-sync-coverage branch from 260730b to f7192cc on July 20, 2020 at 14:27
// DeleteDaemonSet deletes the specified daemonSet
func DeleteDaemonSet(c client.Client, ds *kappsapi.DaemonSet) error {
	return wait.PollImmediate(RetryShort, WaitShort, func() (bool, error) {
		if err := c.Delete(context.TODO(), ds); err != nil {
			return false, nil // swallow the error and retry until timeout
		}
		return true, nil
	})
}
Contributor commented:
Probably need to account for situations when the DaemonSet was not found, as it was already removed.
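A sketch of the kind of handling being suggested, inside the poll function above (assuming apierrors is k8s.io/apimachinery/pkg/api/errors; this is not the merged fix):

if err := c.Delete(context.TODO(), ds); err != nil {
	// The daemonSet may already be gone; treat NotFound as success
	// so the poll exits instead of retrying until timeout.
	if apierrors.IsNotFound(err) {
		return true, nil
	}
	return false, nil // transient error, retry
}
return true, nil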

By(fmt.Sprintf("checking got daemonSet spec matches the initial one"))
Expect(framework.IsDaemonSetSynced(client, initialDaemonSet, terminationHandlerDaemonSet, framework.MachineAPINamespace)).To(BeTrue())

By(fmt.Sprintf("updating got daemonSet spec"))
Contributor commented:
Nit: could give it a separate It to increase test robustness
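What that suggestion could look like, sketched from the excerpt above (the It descriptions are hypothetical):

It("keeps the daemonSet spec synced with the initial one", func() {
	Expect(framework.IsDaemonSetSynced(client, initialDaemonSet, terminationHandlerDaemonSet, framework.MachineAPINamespace)).To(BeTrue())
})

It("reconciles manual updates to the daemonSet spec", func() {
	// update the daemonSet spec here, then assert the operator restores it
})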

Comment on lines +77 to +78
By(fmt.Sprintf("checking daemonSet is available"))
Expect(framework.IsDaemonSetAvailable(client, terminationHandlerDaemonSet, framework.MachineAPINamespace)).To(BeTrue())
Contributor commented:
What does available here mean? Does it mean that all replicas are running? If so, on a default cluster the daemonset should always be available by virtue of it having no replicas. I think we need to simulate somewhere in the test suite that the daemonset is available and has more than 1 replica. I'm not really sure what this is testing over the daemonset just existing.

Perhaps we do that in https://github.com/openshift/cluster-api-actuator-pkg/blob/master/pkg/infra/spot.go#L73? Could follow up later.

@enxebre (Member, Author) replied Jul 21, 2020:
This is validating that the operator does its job and also that the expectation of having no unavailable replicas is satisfied: https://github.com/openshift/cluster-api-actuator-pkg/pull/177/files#diff-a8166de82f0b6261e02122357a0c6096R40.
On a default cluster the expected available count happens to be zero. That's circumstantial; this test covers that scenario and any other possible one. If the default ever changes, or if this runs in parallel with any spot instance, the test must still remain green. This lets us introduce changes while staying confident we are not breaking the expectation.
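For reference, a check along these lines can be written against the daemonSet status (a sketch only; the real framework.IsDaemonSetAvailable helper may differ, and appsv1 is k8s.io/api/apps/v1):

func isDaemonSetAvailable(ds *appsv1.DaemonSet) bool {
	s := ds.Status
	// All desired replicas are scheduled, updated, and available. On a
	// default cluster DesiredNumberScheduled is zero, so this holds
	// vacuously, which is the "expected available happens to be zero" case.
	return s.UpdatedNumberScheduled == s.DesiredNumberScheduled &&
		s.NumberAvailable == s.DesiredNumberScheduled &&
		s.NumberUnavailable == 0
}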

I think we need to simulate somewhere in the test suite that the daemonset is available and has more than 1 replica

Yes, I'll follow up with PRs to make the operator go degraded if the pod crashloops, and with a test for it.

Contributor replied:
Ack, SGTM

@JoelSpeed (Contributor) commented:

Should probably address this #177 (comment), but otherwise I'm happy with this PR

/approve

@openshift-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2020
@enxebre (Member, Author) commented Jul 23, 2020

/retest

@openshift-ci-robot commented:

@enxebre: The following test failed, say /retest to rerun all failed tests:

Test name: ci/prow/e2e-azure-operator
Commit: f7192cc
Rerun command: /test e2e-azure-operator

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci-robot commented:

@enxebre: PR needs rebase.


@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 26, 2020
@openshift-bot commented:

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 31, 2020
@openshift-bot commented:

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 30, 2020
@openshift-merge-robot (Contributor) commented:

@enxebre: The following test failed, say /retest to rerun all failed tests:

Test name: ci/prow/e2e-vsphere-operator
Commit: f7192cc
Rerun command: /test e2e-vsphere-operator

Full PR test history. Your PR dashboard.


@openshift-bot commented:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot commented:

@openshift-bot: Closed this PR.

In response to this:

/close
