consolidate node deletion logic between kube-controller-manager and cloud-controller-manager #70344

Merged
merged 1 commit into kubernetes:master from andrewsykim:consolidate-node-delete on Dec 17, 2018

Conversation

@andrewsykim (Member) commented Oct 29, 2018

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
Consolidates node deletion logic between kube-controller-manager and cloud-controller-manager by creating a new "cloud node lifecycle" controller. The new controller pulls the node deletion logic out of both the node lifecycle controller (KCM) and the cloud node controller (CCM), which implement more or less the same logic with a few exceptions. The exceptions are mostly differences in semantics (method names, error messages, using informers instead of the client, etc.), not in actual logic.
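For orientation, the consolidated controller reduces to a small periodic loop. This is a minimal sketch based on the snippets quoted below; the field names are assumptions, not copied from the final diff:

type CloudNodeGCController struct {
	kubeClient        clientset.Interface
	cloud             cloudprovider.Interface
	recorder          record.EventRecorder
	nodeMonitorPeriod time.Duration
}

func (c *CloudNodeGCController) Run(stopCh <-chan struct{}) {
	defer utilruntime.HandleCrash()

	// Periodically reconcile nodes against the cloud provider: delete
	// Nodes whose backing instances are gone, and taint shut-down ones.
	go wait.Until(c.GarbageCollect, c.nodeMonitorPeriod, stopCh)
}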

This PR is needed for several reasons:

  • avoids divergence of node deletion logic between kube-controller-manager (in-tree) and cloud-controller-manager (out-of-tree)
  • removes the dependency on cloud providers in the node lifecycle controller
  • allows us to disable only the node deletion controller, without disabling the node lifecycle controller, which is something we need as part of the in-tree -> out-of-tree migration

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

/sig cloud-provider
/assign @cheftako

@k8s-ci-robot k8s-ci-robot requested review from ingvagabund and luxas Oct 29, 2018

@andrewsykim andrewsykim force-pushed the andrewsykim:consolidate-node-delete branch from 89a5978 to f0c137e Oct 29, 2018

@k8s-ci-robot k8s-ci-robot added size/XXL and removed size/XL labels Oct 29, 2018

@andrewsykim andrewsykim force-pushed the andrewsykim:consolidate-node-delete branch 2 times, most recently from ecfee42 to 5590537 Oct 29, 2018

@andrewsykim andrewsykim changed the title consolidate node deletion logic between node lifecycle and cloud node controller consolidate node deletion logic between kube-controller-manager and cloud-controller-manager Oct 29, 2018

@andrewsykim andrewsykim force-pushed the andrewsykim:consolidate-node-delete branch 2 times, most recently from 0a37f68 to 561fe59 Oct 29, 2018

go wait.Until(c.GarbageCollect, c.nodeMonitorPeriod, stopCh)
}

func (c *CloudNodeGCController) GarbageCollect() {

@andrewsykim (Member), Oct 29, 2018

This code is pulled straight out of KCM/CCM, with the method name changed from MonitorNode to GarbageCollect.

var (
// ErrCloudInstance occurs when the cloud provider does not support
// the Instances API.
ErrCloudInstance = errors.New("cloud provider doesn't support instances")

@andrewsykim (Member), Oct 29, 2018

There's nothing in k8s/k8s that actually checks for this error, so I removed it.

@@ -171,36 +153,6 @@ func MarkAllPodsNotReady(kubeClient clientset.Interface, node *v1.Node) error {
return fmt.Errorf("%v", strings.Join(errMsg, "; "))
}

// ExistsInCloudProvider returns true if the node exists in the
// cloud provider.
func ExistsInCloudProvider(cloud cloudprovider.Interface, nodeName types.NodeName) (bool, error) {

@andrewsykim (Member), Oct 29, 2018

Replaced by ensureNodeExistsByProviderID, which is functionally the same but uses the providerID if it exists and otherwise falls back to InstanceID.
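A sketch of the described fallback, assuming the cloudprovider.Instances interface of that era (the authoritative version is in the diff):

func ensureNodeExistsByProviderID(instances cloudprovider.Instances, node *v1.Node) (bool, error) {
	providerID := node.Spec.ProviderID
	if providerID == "" {
		// Fall back to asking the cloud for an instance ID by node name.
		var err error
		providerID, err = instances.InstanceID(context.TODO(), types.NodeName(node.Name))
		if err != nil {
			if err == cloudprovider.InstanceNotFound {
				return false, nil
			}
			return false, err
		}
	}
	return instances.InstanceExistsByProviderID(context.TODO(), providerID)
}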

@cheftako (Member) commented Oct 29, 2018

/cc @mtaufen as I know he had been looking into working on this.

testName string
node *v1.Node
existsByProviderID bool
shutdown bool

@cheftako (Member), Oct 29, 2018

Sadly, a lot of the cloud providers have not implemented this call yet, so rather than returning true/false they return a not-implemented error. Can we have a test for that case? (For GCE, for instance, it would return an error for shutdown but would correctly return true/false for existsByProviderID.)
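For illustration, one way to tolerate such providers (a sketch using the cloudprovider.NotImplemented sentinel; the helper name is hypothetical and this is not code from this PR):

// Tolerate cloud providers that have not implemented shutdown detection
// and return cloudprovider.NotImplemented instead of true/false.
func instanceShutdownOrRunning(ctx context.Context, instances cloudprovider.Instances, providerID string) (bool, error) {
	shutdown, err := instances.InstanceShutdownByProviderID(ctx, providerID)
	if err == cloudprovider.NotImplemented {
		// The provider answers existence queries but not shutdown state
		// (e.g. GCE at the time); treat the instance as not shut down.
		return false, nil
	}
	return shutdown, err
}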

func shutdownInCloudProvider(ctx context.Context, cloud cloudprovider.Interface, node *v1.Node) (bool, error) {
instances, ok := cloud.Instances()
if !ok {
return false, errors.New("cloud provider does not support instances")

@cheftako (Member), Oct 29, 2018

Wouldn't it make more sense to check this in start/run and just skip the controller entirely in that case? Especially since GarbageCollect will blow up before you get here if cloud.Instances is not ok.

@andrewsykim (Member), Oct 29, 2018

I had the same thought and added it here: https://github.com/kubernetes/kubernetes/pull/70344/files#diff-da5354d582a4b27381b8cee161e46163R82. But then I figured it can't hurt to check here too.

@cheftako (Member), Oct 29, 2018

On that note, does it make sense to actually error out in New? I think that would make the KCM (https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-controller-manager/app/controllermanager.go#L486) crash for any cloud provider that does not support instances and has this controller enabled. I grant that enabling this controller in that case is probably a bug, but I'm not sure it's a bug worth crashing over.

@andrewsykim (Member), Oct 29, 2018

I did what the service controller does, which is to log the error but still return nil so kube-controller-manager doesn't crash. See below:

func startCloudNodeGarbageCollectorController(ctx ControllerContext) (http.Handler, bool, error) {
	cloudNodeGCController, err := cloudcontroller.NewCloudNodeGarbageCollectorController(
		ctx.ClientBuilder.ClientOrDie("cloud-node-gc-controller"),
		ctx.Cloud,
		ctx.ComponentConfig.KubeCloudShared.NodeMonitorPeriod.Duration,
	)
	if err != nil {
		// the controller manager should continue to run if the "Instances" interface is not
		// supported, though it's unlikely for a cloud provider to not support it
		glog.Errorf("failed to start cloud node garbage collector controller: %v", err)
		return nil, false, nil
	}
	go cloudNodeGCController.Run(ctx.Stop)
	return nil, true, nil
}

@cheftako (Member), Oct 30, 2018

Good call

@mtaufen mtaufen self-assigned this Oct 29, 2018

@andrewsykim andrewsykim force-pushed the andrewsykim:consolidate-node-delete branch from 561fe59 to ced3a1e Oct 30, 2018

}
}

// shutdownInCloudProvider returns true if the node is shutdowned in

@cheftako (Member), Oct 30, 2018

Minor quibble, but I believe the past tense of "shut down" is "shut down" (https://en.wiktionary.org/wiki/shut_down), not "shutdowned".

@andrewsykim andrewsykim force-pushed the andrewsykim:consolidate-node-delete branch from ced3a1e to 89a359f Oct 30, 2018

@mtaufen (Contributor) left a comment

Thanks for this! We've wanted to do something like this for a long time.

I gave the control loop a first pass; there are still some pieces I have to review, and I haven't looked at the tests yet.

eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: kubeClient.CoreV1().Events("")})

if kubeClient == nil {
glog.Fatal("kube client is nil while starting cloud node deletion controller")

@mtaufen (Contributor), Oct 30, 2018

Does it make more sense to return an error here as well? I generally prefer to avoid fatal errors in library code, since they aren't self-contained.
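A sketch of the suggested alternative (illustrative only):

// Instead of glog.Fatal, surface the problem to the caller and let the
// controller manager decide how to handle it.
if kubeClient == nil {
	return nil, errors.New("kube client is nil while starting cloud node deletion controller")
}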

eventBroadcaster.StartLogging(glog.Infof)

glog.V(0).Infof("Sending events to api server.")
eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: kubeClient.CoreV1().Events("")})

@mtaufen (Contributor), Oct 30, 2018

Would it make more sense to start this in Run?


eventBroadcaster := record.NewBroadcaster()
recorder := eventBroadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "cloud-node-gc-controller"})
eventBroadcaster.StartLogging(glog.Infof)

@mtaufen (Contributor), Oct 30, 2018

Would it make more sense to start this in Run?
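A sketch of the suggestion, assuming the broadcaster and client are kept as fields on the controller (field names are assumptions, not the PR's final code):

func (c *CloudNodeGCController) Run(stopCh <-chan struct{}) {
	defer utilruntime.HandleCrash()

	// Begin emitting events only once the controller actually starts.
	c.eventBroadcaster.StartLogging(glog.Infof)
	c.eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: c.kubeClient.CoreV1().Events("")})

	go wait.Until(c.GarbageCollect, c.nodeMonitorPeriod, stopCh)
}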

nodeMonitorPeriod time.Duration
}

func NewCloudNodeGarbageCollectorController(

@mtaufen (Contributor), Oct 30, 2018

If we pass a node informer in here, we can use its lister instead of listing directly from the kubeClient; the lister reads from the informer cache, which puts less direct load on the API server. (The current kube node lifecycle controller already does this.)
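Roughly, the informer-based pattern looks like this (a sketch; the constructor signature and field names are assumptions):

func NewCloudNodeGarbageCollectorController(
	nodeInformer coreinformers.NodeInformer,
	kubeClient clientset.Interface,
	cloud cloudprovider.Interface,
	nodeMonitorPeriod time.Duration,
) (*CloudNodeGCController, error) {
	return &CloudNodeGCController{
		kubeClient: kubeClient,
		cloud:      cloud,
		// The lister reads from the informer's local cache, so periodic
		// listing does not hit the API server directly.
		nodeLister:        nodeInformer.Lister(),
		nodeMonitorPeriod: nodeMonitorPeriod,
	}, nil
}

// ...and in GarbageCollect:
// nodes, err := c.nodeLister.List(labels.Everything())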

func (c *CloudNodeGCController) Run(stopCh <-chan struct{}) {
defer utilruntime.HandleCrash()

go wait.Until(c.GarbageCollect, c.nodeMonitorPeriod, stopCh)

@mtaufen (Contributor), Oct 30, 2018

If Run is already called via a goroutine, do we need to call wait.Until in a goroutine as well?

@cheftako is there a convention for whether to put the go in Run or outside Run for the controller manager code?
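For reference, wait.Until itself blocks until stopCh closes, so if callers already do go controller.Run(stop), the inner goroutine is redundant (a sketch, not PR code):

func (c *CloudNodeGCController) Run(stopCh <-chan struct{}) {
	defer utilruntime.HandleCrash()
	// Blocks here until stopCh is closed; no extra goroutine needed
	// when Run is itself launched with `go`.
	wait.Until(c.GarbageCollect, c.nodeMonitorPeriod, stopCh)
}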

}
glog.V(2).Infof("Recording %s event message for node %s", "DeletingNode", node.Name)

c.recorder.Eventf(ref, v1.EventTypeNormal, fmt.Sprintf("Deleting Node %v because it's not present according to cloud provider", node.Name), "Node %s event: %s", node.Name, "DeletingNode")

@mtaufen (Contributor), Oct 30, 2018

nodeutil.RecordNodeEvent can simplify the event creation/recording ("k8s.io/kubernetes/pkg/controller/util/node")
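A sketch of the suggested call; the argument order follows pkg/controller/util/node at the time and is worth double-checking:

nodeutil.RecordNodeEvent(c.recorder, node.Name, string(node.UID),
	v1.EventTypeNormal, "DeletingNode",
	fmt.Sprintf("Deleting Node %v because it's not present according to cloud provider", node.Name))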


// Check with the cloud provider to see if the node still exists. If it
// doesn't, delete the node immediately.
exists, err := ensureNodeExistsByProviderID(instances, node)

@mtaufen (Contributor), Oct 30, 2018

It may be more minimal to just directly call nodeutil.ExistsInCloudProvider(c.cloud, types.NodeName(node.Name)).

@andrewsykim (Member), Oct 30, 2018

Same here: that method was removed in favour of this one.

@mtaufen (Contributor), Oct 31, 2018

Ah, I see, sounds good then

// we need to check this first so that the taint works the same way across all cloud providers;
// the current problem is that shut-down nodes are not handled uniformly, i.e. not all cloud providers
// delete the node from the kubernetes cluster when its instance is shut down; see issue #46442
shutdown, err := shutdownInCloudProvider(context.TODO(), c.cloud, node)

@mtaufen (Contributor), Oct 30, 2018

It may be more minimal to just directly call nodeutil.ShutdownInCloudProvider(context.TODO(), c.cloud, node).

@andrewsykim (Member), Oct 30, 2018

That method was removed and copied into this file (but private) because this is the only controller that uses it now.

@mtaufen (Contributor), Oct 31, 2018

Ah, I see, sounds good then

var currentReadyCondition *v1.NodeCondition
node := &nodes.Items[i]
// Try to get the current node status
// If node status is empty, then kubelet has not posted ready status yet. In this case, process next node

@mtaufen (Contributor), Oct 30, 2018

What if the node was deleted or shut down in the cloud provider before the Kubelet could post a ready status?

It might be more robust to assume v1.ConditionUnknown if there is no ready condition on the node yet, and then call directly into the cloud to see if a VM exists/is shut down. Since in the unknown case we'd then always check the cloud provider directly, and only take action when the cloud reports deletion or shut-down, we can probably also avoid the retries/additional Get requests.

And then if we're also listing from the informer cache, that will help to reduce any additional load on the API server that splitting out these control loops would create.

If we then went a little further and moved the body of this for loop into a helper function, the initial status checks might collapse to something like this, at which point we can un-nest a lot of the subsequent logic:

// Default status to v1.ConditionUnknown (in fact, this is the same as the default "fake" status from tryUpdateNodeHealth in the kube node lifecycle controller)
status := v1.ConditionUnknown
if _, c := v1node.GetNodeCondition(&node.Status, v1.NodeReady); c != nil {
  status = c.Status
}

// Skip healthy nodes.
if status == v1.ConditionTrue {
  return
}

@andrewsykim (Member) commented Oct 30, 2018

@mtaufen thanks for reviewing! I agree with your comments, but my goal here was to introduce the fewest logical changes while breaking out the node deletion part of each controller. I'm happy to address those comments here, but it will be harder to track down bugs later if we merge everything in the same PR. I'd be more than happy to address them in follow-up PRs so we can easily revert the logical changes without undoing everything else. I'll defer to your judgement though; let me know what you think :)

@k8s-ci-robot k8s-ci-robot added the lgtm label Nov 29, 2018

@mtaufen (Contributor) commented Nov 29, 2018

/retest

@mtaufen (Contributor) commented Nov 30, 2018

/assign @deads2k
for approval

@mtaufen approved these changes Dec 3, 2018

@mtaufen (Contributor) commented Dec 3, 2018

/unassign @deads2k
/assign @dchen1107
for approval

@k8s-ci-robot k8s-ci-robot assigned dchen1107 and unassigned deads2k Dec 3, 2018

@mtaufen (Contributor) commented Dec 3, 2018

@andrewsykim FYI, I noticed some rebase artifacts (e.g. >>>>>>>) in your commits; do you mind squashing them together?

@andrewsykim (Member) commented Dec 3, 2018

Sure thing; I didn't squash initially so the commits would be easier to review.

@andrewsykim andrewsykim force-pushed the andrewsykim:consolidate-node-delete branch from eb7f507 to 5329f09 Dec 3, 2018

@k8s-ci-robot k8s-ci-robot removed the lgtm label Dec 3, 2018

@mtaufen (Contributor) commented Dec 3, 2018

/lgtm

@andrewsykim (Member) commented Dec 16, 2018

friendly ping @deads2k @dchen1107 for review :)

@dchen1107 (Member) commented Dec 17, 2018

/approve

@k8s-ci-robot commented Dec 17, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: andrewsykim, dchen1107

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@andrewsykim (Member) commented Dec 17, 2018

/retest

@k8s-ci-robot k8s-ci-robot merged commit e2be7c9 into kubernetes:master Dec 17, 2018

19 checks passed:

  • cla/linuxfoundation: andrewsykim authorized
  • pull-kubernetes-bazel-build: Job succeeded
  • pull-kubernetes-bazel-test: Job succeeded
  • pull-kubernetes-cross: Skipped
  • pull-kubernetes-e2e-gce: Job succeeded
  • pull-kubernetes-e2e-gce-100-performance: Job succeeded
  • pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded
  • pull-kubernetes-e2e-gke: Skipped
  • pull-kubernetes-e2e-kops-aws: Job succeeded
  • pull-kubernetes-e2e-kubeadm-gce: Skipped
  • pull-kubernetes-godeps: Skipped
  • pull-kubernetes-integration: Job succeeded
  • pull-kubernetes-kubemark-e2e-gce-big: Job succeeded
  • pull-kubernetes-local-e2e: Skipped
  • pull-kubernetes-local-e2e-containerized: Skipped
  • pull-kubernetes-node-e2e: Job succeeded
  • pull-kubernetes-typecheck: Job succeeded
  • pull-kubernetes-verify: Job succeeded
  • tide: In merge pool