
[OCPCLOUD-803] Run Spot Termination Handler from Machine API Operator #535

Merged
merged 5 commits into openshift:master from run-termination on Mar 26, 2020

Conversation

JoelSpeed
Contributor

This PR enables the Machine API Operator to run the Termination Handler added to the AWS cloud provider in openshift/cluster-api-provider-aws#308, by running a DaemonSet on nodes labelled as interruptible.
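For context, a minimal sketch of how such a DaemonSet targets interruptible nodes via a nodeSelector (the wrapper shape and namespace field are assumptions; the PodSpec values match the fragment reviewed later in this thread):

// Sketch only: pods land exclusively on nodes carrying the
// interruptible-instance label set by the machine controller.
// Selector and template labels omitted for brevity.
ds := &appsv1.DaemonSet{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "machine-api-termination-handler",
		Namespace: config.TargetNamespace, // assumed field name
	},
	Spec: appsv1.DaemonSetSpec{
		Template: corev1.PodTemplateSpec{
			Spec: corev1.PodSpec{
				Containers:         containers,
				PriorityClassName:  "system-node-critical",
				NodeSelector:       map[string]string{machinecontroller.MachineInterruptibleInstanceLabelName: ""},
				ServiceAccountName: "machine-api-termination-handler",
			},
		},
	},
}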

Need to test on a real cluster before this merges

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2020
@JoelSpeed JoelSpeed force-pushed the run-termination branch 2 times, most recently from 5f4f6c8 to 92bdd71 on March 25, 2020 12:04
@JoelSpeed
Contributor Author

/unhold

I've verified this deploys the DaemonSet as expected.

@openshift-ci-robot openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Mar 25, 2020
@@ -41,7 +43,16 @@ func (optr *Operator) syncAll(config *OperatorConfig) error {
	}

	if config.Controllers.TerminationHandler != clusterAPIControllerNoOp {
		if err := optr.syncTerminationHandler(config); err != nil {
Member

Can we make this call within syncClusterAPIController, so that we wrap all the logic only once and report status appropriately based on whatever errors are reported from the inner calls?

Member

Also, within there, maybe invert the logic to: 1) error earlier, 2) log the no-op scenario, i.e.:

if config.Controllers.Provider == config.Controllers.TerminationHandler != clusterAPIControllerNoOp {
    log mao will no op
    return nil
}

logic goes here

Contributor Author

Can do the return earlier with NoOp, that makes sense, though the comparison above doesn't tell us that the Provider or the TerminationHandler aren't NoOp; it would just tell us that one of them isn't, right? We'd still have to check later?

I'd be tempted to just check whether the Provider is NoOp and return early, otherwise continue, and then check later whether the TerminationHandler is NoOp and run it appropriately, something like the sketch below:
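(A sketch of that flow; the structure is an assumption, identifiers taken from the diffs in this thread.)

if config.Controllers.Provider == clusterAPIControllerNoOp {
	// Nothing to reconcile for a NoOp provider.
	return nil
}
// ... sync the machine-api controller components ...
if config.Controllers.TerminationHandler != clusterAPIControllerNoOp {
	if err := optr.syncTerminationHandler(config); err != nil {
		return err
	}
}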

@enxebre (Member) commented Mar 26, 2020

Sorry, that was a typo. I meant:
if config.Controllers.TerminationHandler == clusterAPIControllerNoOp then return

We should probably do the same for line 32, but there's no need for that to go in this PR.
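In Go, that early return would look something like this (a sketch; the function shape and log message are assumptions):

func (optr *Operator) syncTerminationHandler(config *OperatorConfig) error {
	if config.Controllers.TerminationHandler == clusterAPIControllerNoOp {
		glog.V(3).Info("Termination handler is NoOp, nothing to sync")
		return nil
	}
	// ... DaemonSet sync logic goes here ...
	return nil
}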

@@ -39,6 +39,10 @@ func (optr *Operator) syncAll(config *OperatorConfig) error {
		glog.V(3).Info("Synced up all machine-api-controller components")
	}

	if config.Controllers.TerminationHandler != clusterAPIControllerNoOp {
@enxebre (Member) commented Mar 26, 2020

Should we do this within syncClusterAPIController, so it's all covered by the wrapping there and we don't have to duplicate the status* logic?

Contributor Author

Which status logic are you referring to here? Do you mean the waiting for the rollouts? I don't think there's any duplication of status logic apart from the waiting, and that has to differ because the status fields are different for DaemonSets vs Deployments.

As far as I can tell, the status is only updated once this method returns.
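To illustrate why the waiting differs: a rollout-complete check reads different status fields on each type (a sketch using k8s.io/api/apps/v1 field names; the helper names are hypothetical):

// DaemonSets report per-node scheduling counts...
func daemonSetRolledOut(ds *appsv1.DaemonSet) bool {
	return ds.Status.UpdatedNumberScheduled == ds.Status.DesiredNumberScheduled &&
		ds.Status.NumberAvailable == ds.Status.DesiredNumberScheduled &&
		ds.Generation <= ds.Status.ObservedGeneration
}

// ...while Deployments report replica counts against spec.replicas.
func deploymentRolledOut(d *appsv1.Deployment) bool {
	return d.Status.UpdatedReplicas == *d.Spec.Replicas &&
		d.Status.AvailableReplicas == *d.Spec.Replicas &&
		d.Generation <= d.Status.ObservedGeneration
}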

Containers:         containers,
PriorityClassName:  "system-node-critical",
NodeSelector:       map[string]string{machinecontroller.MachineInterruptibleInstanceLabelName: ""},
ServiceAccountName: "machine-api-termination-handler",
Member

const for machine-api-termination-handler?
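Something like this, presumably (constant name hypothetical):

// Hypothetical name for the suggested constant.
const machineAPITerminationHandler = "machine-api-termination-handler"

	ServiceAccountName: machineAPITerminationHandler,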

@enxebre
Member

enxebre commented Mar 26, 2020

Looks great. I dropped some comments suggesting that all the reconciliation work be covered by a single status* workflow. Needs a rebase after #536 got in.

@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 26, 2020
@enxebre
Member

enxebre commented Mar 26, 2020

/approve

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 26, 2020
@elmiko (Contributor) left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 26, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

13 similar comments

@openshift-merge-robot openshift-merge-robot merged commit 4ab7f78 into openshift:master Mar 26, 2020
@openshift-ci-robot
Contributor

@JoelSpeed: The following test failed, say /retest to rerun all failed tests:

Test name: ci/prow/e2e-azure
Commit: c11a68d
Rerun command: /test e2e-azure

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

enxebre added a commit to enxebre/machine-api-operator that referenced this pull request Jul 20, 2020
…tion handler

This openshift#535 introduced support to manage a DaemonSet which runs the termination handler for spot instances.
As an event handler is not passed to the DaemonSet informer, changes to the resource won't trigger a reconcile.
This PR fixes that by passing the event handler to the DaemonSet namespaced informer.
This will be e2e tested by openshift/cluster-api-actuator-pkg#177
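The fix amounts to registering an event handler on the DaemonSet informer so that resource changes enqueue a resync; a sketch with client-go (the queue key and informer wiring are assumptions, see openshift/machine-api-operator#648 for the actual change):

daemonSetInformer := kubeInformers.Apps().V1().DaemonSets().Informer()
daemonSetInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
	// Any change to the managed DaemonSet re-enqueues the sync key.
	AddFunc:    func(obj interface{}) { optr.queue.Add(workQueueKey) },
	UpdateFunc: func(old, new interface{}) { optr.queue.Add(workQueueKey) },
	DeleteFunc: func(obj interface{}) { optr.queue.Add(workQueueKey) },
})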
enxebre added a commit to enxebre/cluster-api-actuator-pkg that referenced this pull request Jul 20, 2020
This openshift/machine-api-operator#535 introduced support to manage a DaemonSet which runs the termination handler for spot instances.
As an event handler is not passed to the DaemonSet informer, changes to the resource won't trigger a reconcile.
PR openshift/machine-api-operator#648 fixes that by passing the event handler to the DaemonSet namespaced informer.
This PR covers that with an e2e test.