[OCPCLOUD-803] Run Spot Termination Handler from Machine API Operator #535
Conversation
Force-pushed from 5f4f6c8 to 92bdd71
/unhold I've verified this deploys the DaemonSet as expected
pkg/operator/sync.go
Outdated
@@ -41,7 +43,16 @@ func (optr *Operator) syncAll(config *OperatorConfig) error {
	}

	if config.Controllers.TerminationHandler != clusterAPIControllerNoOp {
		if err := optr.syncTerminationHandler(config); err != nil {
Can we make this call within `syncClusterAPIController`, so we make sure we wrap all the logic only once and report status appropriately based on whatever errors are reported from the inner calls?
Also, within there, maybe invert the logic to 1 - error earlier, 2 - log the no-op scenario, i.e.:
if config.Controllers.Provider == config.Controllers.TerminationHandler != clusterAPIControllerNoOp {
    log mao will no op
    return nil
}
logic goes here
Can do the return earlier with `NoOp`, that makes sense, though the comparison above doesn't tell us that the `Provider` or `TerminationHandler` aren't `NoOp`; it would just tell us that one of them isn't, right? We'd still have to check later?
I'd be tempted to just check whether the `Provider` is `NoOp` and return early, otherwise continue, and then check later whether `TerminationHandler` is or isn't and run appropriately.
Sorry, that was a typo, I meant:
if config.Controllers.TerminationHandler == clusterAPIControllerNoOp then return
We should probably do the same for line 32, but there's no need for it to go in this PR.
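For reference, a minimal sketch of the early-return shape being discussed; it assumes the `OperatorConfig` / `clusterAPIControllerNoOp` names from the diff above, and `applyTerminationDaemonSet` / `waitForTerminationDaemonSetRollout` are hypothetical helpers, not the merged code:

```go
// Minimal sketch only, within the operator package context (glog, fmt imported there).
func (optr *Operator) syncTerminationHandler(config *OperatorConfig) error {
	// Return early when the termination handler is configured as a no-op,
	// so the call site in syncAll doesn't need its own guard.
	if config.Controllers.TerminationHandler == clusterAPIControllerNoOp {
		glog.V(3).Info("Termination handler is no-op, skipping sync")
		return nil
	}

	// Otherwise reconcile the DaemonSet and wait for it to roll out.
	if err := optr.applyTerminationDaemonSet(config); err != nil {
		return fmt.Errorf("failed to apply termination handler DaemonSet: %v", err)
	}
	return optr.waitForTerminationDaemonSetRollout(config)
}
```

Keeping the no-op check inside the sync function is what the early-return suggestion above boils down to: the decision lives next to the rest of the termination-handler logic rather than in the caller.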
pkg/operator/sync.go
Outdated
@@ -39,6 +39,10 @@ func (optr *Operator) syncAll(config *OperatorConfig) error {
		glog.V(3).Info("Synced up all machine-api-controller components")
	}

	if config.Controllers.TerminationHandler != clusterAPIControllerNoOp {
Should we do this within `syncClusterAPIController`, so it's all covered by the wrapping here and we don't have to duplicate the `status*` logic?
Which status logic are you referring to here? Do you mean the waiting for the rollouts? I don't think there's any duplication of `status*` apart from the waiting, but that has to be different because the fields are different in DaemonSets vs Deployments. As far as I can tell, the status is only updated once this method returns.
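To illustrate why the waits can't be shared verbatim, here is a hedged sketch of rollout checks using the upstream status fields (a sketch only, not the operator's actual wait loops):

```go
package operator

import appsv1 "k8s.io/api/apps/v1"

// Deployments report rollout progress through replica counts...
func deploymentRolledOut(d *appsv1.Deployment) bool {
	return d.Status.ObservedGeneration >= d.Generation &&
		d.Spec.Replicas != nil &&
		d.Status.UpdatedReplicas == *d.Spec.Replicas &&
		d.Status.AvailableReplicas == *d.Spec.Replicas
}

// ...while DaemonSets report it through per-node scheduling counts, so the
// waiting code has to read a different set of fields.
func daemonSetRolledOut(ds *appsv1.DaemonSet) bool {
	return ds.Status.ObservedGeneration >= ds.Generation &&
		ds.Status.UpdatedNumberScheduled == ds.Status.DesiredNumberScheduled &&
		ds.Status.NumberAvailable == ds.Status.DesiredNumberScheduled
}
```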
pkg/operator/sync.go
Outdated
	Containers:         containers,
	PriorityClassName:  "system-node-critical",
	NodeSelector:       map[string]string{machinecontroller.MachineInterruptibleInstanceLabelName: ""},
	ServiceAccountName: "machine-api-termination-handler",
const for `machine-api-termination-handler`?
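Something like this, for example; the constant name is just a suggestion, not what was merged:

```go
// Hypothetical constant so "machine-api-termination-handler" isn't repeated
// as a string literal across the DaemonSet spec and related manifests.
const terminationHandlerServiceAccount = "machine-api-termination-handler"

// Used where the PodSpec is built:
//   ServiceAccountName: terminationHandlerServiceAccount,
```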
Looks great, dropped some comments suggesting all the reconciliation work be covered by only one `status*` workflow. Needs a rebase after #536 got in.
Force-pushed from e4da5a3 to c11a68d
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: enxebre. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/lgtm
/retest Please review the full test history for this PR and help us cut down flakes.
@JoelSpeed: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
…tion handler This openshift#535 introduced support to manage a daemonSet which runs the termination handler for spot instances. As an event handler is not passed to the daemonSet informer, changes to the resource won't trigger a reconcile. This PR fixes that by passing the event handler to the daemonSet namespaced informer. This will be e2e tested by openshift/cluster-api-actuator-pkg#177
This openshift/machine-api-operator#535 introduced support to manage a daemonSet which runs the termination handler for spot instances. As an event handler is not passed to the daemonSet informer, changes to the resource won't trigger a reconcile. PR openshift/machine-api-operator#648 fixes that by passing the event handler to the daemonSet namespaced informer. This PR covers this e2e.
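A rough sketch of the kind of change being described, using the standard client-go informer API; the factory and handler names here are assumptions, not the actual code in openshift/machine-api-operator#648:

```go
// Register the operator's event handler on the namespaced DaemonSet informer
// so changes to the managed DaemonSet enqueue a new sync, instead of only
// reconciling when other watched resources change.
//
// kubeNamespacedInformers is assumed to be a client-go informers.SharedInformerFactory
// scoped to the operator's namespace; handler is the operator's cache.ResourceEventHandler.
daemonSetInformer := kubeNamespacedInformers.Apps().V1().DaemonSets()
daemonSetInformer.Informer().AddEventHandler(handler)

// The sync loop can then read the DaemonSet through the same informer's lister.
_ = daemonSetInformer.Lister()
```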
This PR enables the Machine API Operator to run the Termination Handler added to the AWS cloud provider in openshift/cluster-api-provider-aws#308 by running a DaemonSet on nodes labelled as interruptible.
Need to test on a real cluster before this merges.
/hold
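For context on the mechanism described in the PR description, a hedged sketch of the DaemonSet shape (pod-spec values mirror the diff earlier in this conversation; names, omitted fields, and the `targetNamespace` variable are illustrative, not the merged manifest):

```go
// Sketch only: the termination handler runs as a DaemonSet whose pod template
// selects nodes carrying the interruptible-instance label, so the handler pod
// lands exactly on spot instances. Label selector, volumes, etc. omitted.
// appsv1, corev1 and metav1 are the usual k8s.io/api and apimachinery packages.
ds := &appsv1.DaemonSet{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "machine-api-termination-handler",
		Namespace: targetNamespace, // assumed variable
	},
	Spec: appsv1.DaemonSetSpec{
		Template: corev1.PodTemplateSpec{
			Spec: corev1.PodSpec{
				Containers:         containers, // termination-handler container, assumed built elsewhere
				PriorityClassName:  "system-node-critical",
				NodeSelector:       map[string]string{machinecontroller.MachineInterruptibleInstanceLabelName: ""},
				ServiceAccountName: "machine-api-termination-handler",
			},
		},
	},
}
```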