[OCPCLOUD-802] Add Spot termination notice handler #308
Conversation
ae271ed to 866178c (force-push)
pkg/termination/handler.go
Outdated
switch resp.StatusCode {
case http.StatusNotFound:
	// Instance not terminated yet
	klog.V(2).Infof("Instance not marked for termination")
should we prepend the machine/node here?
Semi tempted to switch this over to the logr style structured logging that CAPI uses, WDYT?
💯
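For illustration, a minimal sketch of what the logr-style version of this branch could look like, assuming a logr.Logger is injected into the handler (h.log and h.nodeName are hypothetical names, not the final wiring):

```go
// Sketch only: assumes the handler carries a logr.Logger (h.log) and the
// node name (h.nodeName); key names are illustrative.
switch resp.StatusCode {
case http.StatusNotFound:
	// Instance not terminated yet; the node travels as a structured
	// key/value pair instead of being formatted into the message.
	h.log.V(2).Info("Instance not marked for termination", "node", h.nodeName)
}
```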
pkg/termination/handler.go
Outdated
}

// Will only get here if the termination endpoint returned 200
klog.V(1).Infof("Instance marked for termination, deleting Machine \"%s/%s\"", machine.Namespace, machine.Name)
nit: maybe prepend with machineName everywhere? See comment above.
pkg/termination/handler.go
Outdated
@@ -90,5 +92,17 @@ func (h *handler) run(ctx context.Context, wg *sync.WaitGroup) error {

// getMachineForNode finds the Machine associated with the Node name given
func (h *handler) getMachineForNode(ctx context.Context) (*machinev1.Machine, error) {
	machineList := &machinev1.MachineList{}
	err := h.client.List(ctx, machineList)
Do you think it would be worth it to instead create a manager that handles the client and informer, so we can index by nodeRef?
We can probably include that later if we want.
https://github.com/kubernetes-sigs/controller-runtime/blob/a7c8a93c1cf395974911ef0ece977a7d27b2bf4b/pkg/manager/manager.go#L320-L335
https://github.com/kubernetes-sigs/controller-runtime/blob/f7e674bbb6c9c122753681a15e7f57c483a5c121/pkg/cache/cache.go#L110-L117
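For illustration, a sketch of what that nodeRef index could look like with recent controller-runtime versions (the index key "status.nodeRef.name" and the function name are assumptions, not the final wiring):

```go
// Sketch only: registers a field index on the manager's cache so Machines
// can be listed by the node they are bound to. Would run once at startup,
// before the cache is started.
func registerNodeRefIndex(ctx context.Context, mgr manager.Manager) error {
	return mgr.GetFieldIndexer().IndexField(ctx, &machinev1.Machine{}, "status.nodeRef.name",
		func(obj client.Object) []string {
			m := obj.(*machinev1.Machine)
			if m.Status.NodeRef == nil {
				return nil
			}
			return []string{m.Status.NodeRef.Name}
		})
}
```

The lookup would then become an indexed, cached List instead of a full scan:

```go
machines := &machinev1.MachineList{}
err := mgr.GetClient().List(ctx, machines,
	client.MatchingFields{"status.nodeRef.name": nodeName})
```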
I don't know if we need a full-blown manager, but a DelegatingClient with the cache as you've suggested could definitely be good.
I do wonder whether that's overkill though, since this program only calls the API twice, once to list and once to delete, so the only benefit of adding the cache is on that list.
That said, I am now also wondering whether we should Get the machine before Deleting it; I'm not sure whether the resource version needs to be up to date for a Delete, will check.
Had a first pass, this looks great.
866178c to 0884b9b (force-push)
0884b9b to 1fc9cf6 (force-push)
/retest
@enxebre I believe I have addressed all of your feedback, can I get another review please?
/unhold
/retest
b978bfd to fc95567 (force-push)
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: enxebre
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/lgtm
/retest
Please review the full test history for this PR and help us cut down flakes.
This adds a spot termination handler to the AWS actuator image.
The binary runs on each node and polls the AWS metadata service to check whether the node has been marked for termination.
Based on the spot-instances proposal.
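For illustration, a minimal sketch of the polling check, assuming the IMDSv1 spot termination-time endpoint (the real handler's URL and poll interval may differ):

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// Sketch only: this IMDSv1 endpoint returns 404 until AWS marks the
// instance for termination, and 200 once a termination time is set.
const terminationURL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

func markedForTermination(ctx context.Context) (bool, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, terminationURL, nil)
	if err != nil {
		return false, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNotFound:
		// Not marked for termination yet.
		return false, nil
	case http.StatusOK:
		// AWS has scheduled this spot instance for termination.
		return true, nil
	default:
		return false, fmt.Errorf("unexpected status %d from metadata service", resp.StatusCode)
	}
}
```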
This serves a similar purpose to the https://github.com/aws/aws-node-termination-handler project, but differs in a number of ways: