New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kubelet health checking #160

Merged
merged 1 commit into from Nov 15, 2018

Conversation

Projects
None yet
7 participants
@rphillips
Contributor

rphillips commented Nov 8, 2018

This PR adds support to the MCD to enable health checking.

/cc @sjenning @derekwaynecarr

@rphillips

This comment has been minimized.

Contributor

rphillips commented Nov 8, 2018

If the kubelet goes into an error state, the PR sets 'degraded' within the MCD. We may want a separate state to propagate a kubelet error state.

@ashcrow

This comment has been minimized.

Member

ashcrow commented Nov 8, 2018

/assign ashcrow

@abhinavdahiya

This comment has been minimized.

Member

abhinavdahiya commented Nov 8, 2018

the go routine might race with other loops in MCD.
i think we so should have a single writer for state...

@rphillips

This comment has been minimized.

Contributor

rphillips commented Nov 8, 2018

Good idea. I'll add a channel to propagate to a writer.

@rphillips

This comment has been minimized.

Contributor

rphillips commented Nov 12, 2018

@ashcrow @abhinavdahiya added the single writer

@sjenning

This comment has been minimized.

Contributor

sjenning commented Nov 12, 2018

Some questions about the MachineConfigDaemonStateAnnotationKey state machine:

  • is a transition from done to degraded allowed?
  • what transitions a node out of degraded state?
@ashcrow

Added notes about missing golang documentation strings. Though this is looking good!

responseChannel chan error
}
type NodeWriter struct {

This comment has been minimized.

@ashcrow

ashcrow Nov 13, 2018

Member

Needs golang docstring.

writer chan message
}
func NewNodeWriter() *NodeWriter {

This comment has been minimized.

@ashcrow

ashcrow Nov 13, 2018

Member

Needs golang docstring.

}
}
func (nw *NodeWriter) Run(stop <-chan struct{}) {

This comment has been minimized.

@ashcrow

ashcrow Nov 13, 2018

Member

Needs golang docstring.

}
}
func (nw *NodeWriter) SetUpdateDone(client corev1.NodeInterface, node string, dcAnnotation string) error {

This comment has been minimized.

@ashcrow

ashcrow Nov 13, 2018

Member

Needs golang docstring.

return <-respChan
}
func (nw *NodeWriter) SetUpdateWorking(client corev1.NodeInterface, node string) error {

This comment has been minimized.

@ashcrow

ashcrow Nov 13, 2018

Member

Needs golang docstring.

return <-respChan
}
func (nw *NodeWriter) SetUpdateDegraded(client corev1.NodeInterface, node string) error {

This comment has been minimized.

@ashcrow

ashcrow Nov 13, 2018

Member

Needs golang docstring.

@ashcrow

This comment has been minimized.

Member

ashcrow commented Nov 13, 2018

Some questions about the MachineConfigDaemonStateAnnotationKey state machine:

  • is a transition from done to degraded allowed?

Yes, but it should be quite rare. The path would be that an update finished but then the end state didn't work as it should. Generally speaking degraded would happen more so when a config can not be applied, or applying failed.

  • what transitions a node out of degraded state?

Human or operator intervention. If the node is degraded it will most likely be re-provisioned. It can be possible for a human to do node surgery to force the node to be compliant, but that is a pattern we are trying to avoid.

@ashcrow

This comment has been minimized.

Member

ashcrow commented Nov 13, 2018

/retest

@rphillips

This comment has been minimized.

Contributor

rphillips commented Nov 13, 2018

@ashcrow Thanks! Added the godocs in c1e56e7.

Edit: updated the hash after a squash

@rphillips

This comment has been minimized.

Contributor

rphillips commented Nov 15, 2018

@ashcrow friendly ping

@ashcrow

This comment has been minimized.

Member

ashcrow commented Nov 15, 2018

Seems like testing is stuck ...

/retest

@abhinavdahiya

This comment has been minimized.

Member

abhinavdahiya commented Nov 15, 2018

@rphillips do you mind compressing the 4 commits to whatever you feel is more appropriate.

@rphillips

This comment has been minimized.

Contributor

rphillips commented Nov 15, 2018

@abhinavdahiya squashed and rebased.

@ashcrow

This comment has been minimized.

Member

ashcrow commented Nov 15, 2018

Looks good, one more rebase and I'm happy to merge.

@rphillips

This comment has been minimized.

Contributor

rphillips commented Nov 15, 2018

@ashcrow rebased to remove the conflicts. thanks!

@ashcrow

This comment has been minimized.

Member

ashcrow commented Nov 15, 2018

/lgtm

Assuming this passes the testing (again) let's get it merged.

@openshift-ci-robot

This comment has been minimized.

openshift-ci-robot commented Nov 15, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow, rphillips

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 40a3f68 into openshift:master Nov 15, 2018

4 checks passed

ci/prow/e2e-aws Job succeeded.
Details
ci/prow/images Job succeeded.
Details
ci/prow/unit Job succeeded.
Details
tide In merge pool.
Details

@rphillips rphillips deleted the rphillips:feat/kubelet_health branch Nov 16, 2018

ashcrow added a commit to ashcrow/machine-config-operator that referenced this pull request Nov 16, 2018

daemon: Drop nodeInformer from onceFrom
The nodeInformer snuck back into daemon.New during
rebasing in openshift#160.

Signed-off-by: Steve Milner <smilner@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment