
fix(daemon): create more expectations when skipping pods #74856

Conversation

draveness
Contributor

@draveness draveness commented Mar 3, 2019

What type of PR is this?

/kind bug

What this PR does / why we need it:

DaemonSetsController creates more expectations than it should when errors happen in the second batch.

https://github.com/draveness/kubernetes/blame/master/pkg/controller/daemon/daemon_controller.go#L1027-L1084

	batchSize := integer.IntMin(createDiff, controller.SlowStartInitialBatchSize)
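	// Slow start: create pods in batches, starting at SlowStartInitialBatchSize
	// and doubling each round (capped at the remaining count); on failure the
	// loop breaks out below.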
	for pos := 0; createDiff > pos; batchSize, pos = integer.IntMin(2*batchSize, createDiff-(pos+batchSize)), pos+batchSize {
		errorCount := len(errCh)
		createWait.Add(batchSize)
		for i := pos; i < pos+batchSize; i++ {
			go func(ix int) {
				defer createWait.Done()
				var err error
				// ...
			}(i)
		}
		createWait.Wait()
		// any skipped pods that we never attempted to start shouldn't be expected.
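		// BUG (fixed by this PR): pos pods from earlier batches were already
		// attempted, so this subtraction over-counts the truly skipped pods.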
		skippedPods := createDiff - batchSize
		if errorCount < len(errCh) && skippedPods > 0 {
			klog.V(2).Infof("Slow-start failure. Skipping creation of %d pods, decrementing expectations for set %q/%q", skippedPods, ds.Namespace, ds.Name)
			for i := 0; i < skippedPods; i++ {
				dsc.expectations.CreationObserved(dsKey)
			}
			// The skipped pods will be retried later. The next controller resync will
			// retry the slow start process.
			break
		}
	}

The expression skippedPods := createDiff - batchSize doesn't account for the pods already created in previous batches (a corrected sketch follows the excerpt below). The job controller, by contrast, decrements diff after each batch's creation, so the same subtraction stays correct there:

https://github.com/draveness/kubernetes/blame/master/pkg/controller/job/job_controller.go#L765-L809

for batchSize := int32(integer.IntMin(int(diff), controller.SlowStartInitialBatchSize)); diff > 0; batchSize = integer.Int32Min(2*batchSize, diff) {
	errorCount := len(errCh)
	wait.Add(int(batchSize))
	for i := int32(0); i < batchSize; i++ {
		go func() {
			defer wait.Done()
			// ...
		}()
	}
	wait.Wait()
	// any skipped pods that we never attempted to start shouldn't be expected.
	skippedPods := diff - batchSize
	if errorCount < len(errCh) && skippedPods > 0 {
		klog.V(2).Infof("Slow-start failure. Skipping creation of %d pods, decrementing expectations for job %q/%q", skippedPods, job.Namespace, job.Name)
		active -= skippedPods
		for i := int32(0); i < skippedPods; i++ {
			// Decrement the expected number of creates because the informer won't observe this pod
			jm.expectations.CreationObserved(jobKey)
		}
		// The skipped pods will be retried later. The next controller resync will
		// retry the slow start process.
		break
	}
	diff -= batchSize
}
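In short, the skipped count in the DaemonSet loop should also subtract the pos pods already attempted in earlier batches. A minimal sketch of the idea (illustrative; see the actual diff for the merged change):

	// pos pods were attempted in earlier batches and batchSize in the current
	// one; only the remainder was never attempted and is truly skipped.
	skippedPods := createDiff - (batchSize + pos)

A concrete run shows the difference (hypothetical numbers, assuming SlowStartInitialBatchSize is 1): with createDiff = 10, the first batch creates 1 pod and the second batch (pos = 1, batchSize = 2) fails. The old code computes skippedPods = 10 - 2 = 8, but only 7 pods were never attempted (10 - 1 - 2), so one extra CreationObserved call is made and the expectation count drifts.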

Does this PR introduce a user-facing change?:

Remove extra pod creation expectations when a DaemonSet fails to create pods in batches.

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 3, 2019
@k8s-ci-robot
Contributor

Hi @draveness. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 3, 2019
@draveness draveness changed the title fix(daemon): create more expectations when skippingy pods fix(daemon): create more expectations when skipping pods Mar 3, 2019
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 3, 2019
@draveness
Contributor Author

/assign @k82cn

@k82cn
Member

k82cn commented Mar 3, 2019

Can you help add an e2e test for this case?

@draveness draveness force-pushed the fix/daemonset-controller-slow-batch-creation branch from 46f21a1 to 1dad9c4 on March 3, 2019 05:15
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 3, 2019
@draveness draveness force-pushed the fix/daemonset-controller-slow-batch-creation branch 2 times, most recently from 97ec9d0 to c45fd23 on March 3, 2019 05:17
@draveness
Contributor Author

draveness commented Mar 3, 2019

Hi @k82cn, I used a unit test to verify the pod creation expectations instead; it checks the same behavior but is simpler. Is that OK?

@draveness draveness force-pushed the fix/daemonset-controller-slow-batch-creation branch 3 times, most recently from e9e2cd4 to 7e1d314 on March 4, 2019 10:26
@draveness
Contributor Author

Hi @k82cn, it's been a while. Could you give me some advice on this? Many thanks.

@dims
Member

dims commented Mar 26, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Mar 26, 2019
@draveness draveness force-pushed the fix/daemonset-controller-slow-batch-creation branch 2 times, most recently from 80607e2 to 8e76760 on April 6, 2019 02:37
@draveness
Contributor Author

draveness commented Apr 6, 2019 via email

@draveness draveness force-pushed the fix/daemonset-controller-slow-batch-creation branch 2 times, most recently from ea35b7a to d8e45d3 on April 7, 2019 07:35
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 7, 2019
@draveness
Contributor Author

The expectations for the controller work just like the remainingCreations that @lavalamp mentioned above. When pods are created without error, a creation expectation is observed when the DaemonSet controller receives the pod-creation notification from the informer:

func (dsc *DaemonSetsController) addPod(obj interface{}) {
	pod := obj.(*v1.Pod)

	// ...
	if controllerRef := metav1.GetControllerOf(pod); controllerRef != nil {
		ds := dsc.resolveControllerRef(pod.Namespace, controllerRef)

		dsKey, err := controller.KeyFunc(ds)
		dsc.expectations.CreationObserved(dsKey)
		dsc.enqueueDaemonSet(ds)
		return
	}

	// ...
}

But FakePodControl does not actually create pods, so we record the creation expectations inside it, which means the DaemonSet's expectations are already satisfied once manage has been called:

func (dsc *DaemonSetsController) syncDaemonSet(key string) error {
	if !dsc.expectations.SatisfiedExpectations(dsKey) {
		return dsc.updateDaemonSetStatus(ds, hash, false) // originally returned from here.
	}

	err = dsc.manage(ds, hash)
	if err != nil {
		return err
	}

	if dsc.expectations.SatisfiedExpectations(dsKey) {
		switch ds.Spec.UpdateStrategy.Type {
		case apps.OnDeleteDaemonSetStrategyType:
		case apps.RollingUpdateDaemonSetStrategyType:
			err = dsc.rollingUpdate(ds, hash) // !! with rolling update strategy, update daemonset status and send an event.
		}
		if err != nil {
			return err
		}
	}

	err = dsc.cleanupHistory(ds, old)
	if err != nil {
		return fmt.Errorf("failed to clean up revisions of DaemonSet: %v", err)
	}

	return dsc.updateDaemonSetStatus(ds, hash, true) // send an event.
}
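To make the expectations mechanism concrete, here is a self-contained toy model (the names and API are illustrative only, not the real controller.ControllerExpectations interface):

package main

import "fmt"

// expectations is a toy stand-in for the controller's expectation counter:
// Expect records how many pod creations we are waiting to observe, and
// Observed decrements that count.
type expectations struct{ pending int }

func (e *expectations) Expect(n int)    { e.pending += n }
func (e *expectations) Observed()       { e.pending-- }
func (e *expectations) Satisfied() bool { return e.pending <= 0 }

func main() {
	e := &expectations{}
	e.Expect(3) // the sync loop plans to create 3 pods

	// In production, the informer's addPod handler calls Observed once per
	// created pod. A fake pod control can call it synchronously instead, so
	// the controller sees its expectations as satisfied right after manage.
	for i := 0; i < 3; i++ {
		e.Observed()
	}
	fmt.Println(e.Satisfied()) // true
}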

So I changed the two test cases TestSufficientCapacityWithTerminatedPodsDaemonLaunchesPod and TestSufficientCapacityNodeDaemonLaunchesPod as follows:

func TestSufficientCapacityWithTerminatedPodsDaemonLaunchesPod(t *testing.T) {
	defer utilfeaturetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.ScheduleDaemonSetPods, false)()
	{
		strategy := newOnDeleteStrategy()
		podSpec := resourcePodSpec("too-much-mem", "75M", "75m")
		ds := newDaemonSet("foo")
		ds.Spec.UpdateStrategy = *strategy
		ds.Spec.Template.Spec = podSpec
		manager, podControl, _, err := newTestController(ds)
		if err != nil {
			t.Fatalf("error creating DaemonSets controller: %v", err)
		}
		node := newNode("too-much-mem", nil)
		node.Status.Allocatable = allocatableResources("100M", "200m")
		manager.nodeStore.Add(node)
		manager.podStore.Add(&v1.Pod{
			Spec:   podSpec,
			Status: v1.PodStatus{Phase: v1.PodSucceeded},
		})
		manager.dsStore.Add(ds)
		syncAndValidateDaemonSets(t, manager, ds, podControl, 1, 0, 1)
	}

	{
		strategy := newRollbackStrategy()
		podSpec := resourcePodSpec("too-much-mem", "75M", "75m")
		ds := newDaemonSet("foo")
		ds.Spec.UpdateStrategy = *strategy
		ds.Spec.Template.Spec = podSpec
		manager, podControl, _, err := newTestController(ds)
		if err != nil {
			t.Fatalf("error creating DaemonSets controller: %v", err)
		}
		node := newNode("too-much-mem", nil)
		node.Status.Allocatable = allocatableResources("100M", "200m")
		manager.nodeStore.Add(node)
		manager.podStore.Add(&v1.Pod{
			Spec:   podSpec,
			Status: v1.PodStatus{Phase: v1.PodSucceeded},
		})
		manager.dsStore.Add(ds)
		syncAndValidateDaemonSets(t, manager, ds, podControl, 1, 0, 2)
	}
}
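The final argument to syncAndValidateDaemonSets is the expected number of events: the OnDelete case still expects 1, while the RollingUpdate case now expects 2, for the reason explained in my follow-up comment below.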

I worked on this for quite a while to get it done, and I have to say it's really hard to test and verify. I'm happy to change it if someone can give me more context and advice. 🤣

@draveness
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 7, 2019
@draveness draveness force-pushed the fix/daemonset-controller-slow-batch-creation branch 2 times, most recently from e339259 to 7a02eb3 on April 7, 2019 07:56
@draveness
Contributor Author

/test pull-kubernetes-kubemark-e2e-gce-big

@draveness
Contributor Author

Kindly ping @k82cn @janetkuo @mikedanese for review and approval.

@mikedanese
Member

mikedanese commented May 3, 2019

Agree that this is way too hard to understand, but the fix looks correct. I have one question about why we are seeing two events in those tests now when we weren't before.

@draveness
Contributor Author

draveness commented May 3, 2019

Agree that this is way too hard to understand, but the fix looks correct. I have one question about why we are seeing two events in those tests now when we weren't before.

I think I explained the two events in the previous comment: when the DaemonSet update strategy is RollingUpdate, syncDaemonSet checks whether the DaemonSet satisfies the expectations:

  • Without recording creation expectations in the fake pod control, the controller only observes a creation when it receives a pod-creation notification from the informer, which never happens in the unit test.
  • With the fake pod control recording creation expectations, fakePodControl observes each creation synchronously, so the controller satisfies its expectations immediately after syncing.

When the expectations are satisfied and the update strategy is RollingUpdate, the logic goes into the rollingUpdate function and sends another event. I know it's a complicated fix; it took me quite a long time to figure out. See the previous comments for more information.

BTW, the previous fix didn't touch any other unit tests, but it involved some math that is hard to understand. You can find it at 0d37ef6 if you are interested in the previous approach.

@mikedanese
Member

Can you address my comments?

@draveness draveness force-pushed the fix/daemonset-controller-slow-batch-creation branch from 7a02eb3 to 5f8dfdc on May 4, 2019 00:16
@draveness
Contributor Author

Hi @mikedanese, I've just resolved the comments, PTAL.

@mikedanese
Member

/lgtm
/approve

I see SlowStart got copied and pasted a few times. I filed #77436 to clean this up.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 4, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: draveness, mikedanese

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 4, 2019
@k8s-ci-robot k8s-ci-robot merged commit d5245b9 into kubernetes:master May 4, 2019
@draveness draveness deleted the fix/daemonset-controller-slow-batch-creation branch May 4, 2019 06:38