Controller changes for perma failed deployments #35691
Conversation
// If a progress deadline is set and there is no Progressing condition yet,
// record that a new replica set was found so progress can be estimated.
if deployment.Spec.ProgressDeadlineSeconds != nil && cond == nil {
	msg := fmt.Sprintf("Found new replica set %q", rsCopy.Name)
	condition := deploymentutil.NewDeploymentCondition(extensions.DeploymentProgressing, api.ConditionTrue, deploymentutil.FoundNewRSReason, msg)
	deploymentutil.SetDeploymentCondition(&deployment.Status, *condition)
How many extra deployment status writes does this PR add (on average) to a deployment creation and a deployment update? I.e., for this scenario:
- Create a new deployment -> deployment reaches success
- Modify deployment -> new code is rolled out
What do the before-and-after REST calls to D and RS look like?
This really depends on the size and the fenceposts of the deployment. The faster a rollout can advance (the bigger (spec.replicas + maxSurge) - (spec.replicas - maxUnavailable) is) and the fewer pods a deployment has, the fewer writes there will be.
- Deployment with 1 pod: I don't notice any difference.
- Deployment with 3 pods: ~7-8 writes in master, ~10 writes with progressDeadline set.
- Deployment with 10 pods: ~40 writes in master, ~50 writes with progressDeadline set.
This PR adds no additional RS writes.
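To make the fencepost math above concrete, here is a hypothetical worked example; the numbers are illustrative, not measurements from this PR:

package main

import "fmt"

func main() {
	// Illustrative values only: a deployment with 10 replicas,
	// maxSurge=3 and maxUnavailable=2.
	replicas, maxSurge, maxUnavailable := 10, 3, 2

	// A rollout may run up to replicas+maxSurge pods while keeping at
	// least replicas-maxUnavailable available, so each step can move at
	// most the difference between those two bounds.
	step := (replicas + maxSurge) - (replicas - maxUnavailable)
	fmt.Println(step) // 5: a larger step means fewer iterations and fewer status writes
}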
Ok
@@ -418,8 +425,8 @@ func testRollingUpdateDeploymentEvents(f *framework.Framework) {
 	newRS, err := deploymentutil.GetNewReplicaSet(deployment, c)
 	Expect(err).NotTo(HaveOccurred())
 	Expect(newRS).NotTo(Equal(nil))
-	Expect(events.Items[0].Message).Should(Equal(fmt.Sprintf("Scaled up replica set %s to 1", newRS.Name)))
-	Expect(events.Items[1].Message).Should(Equal(fmt.Sprintf("Scaled down replica set %s to 0", rsName)))
+	Expect(events.Items[0].Message).Should(Equal(fmt.Sprintf("Created new replica set %q and scaled up to 1", newRS.Name)))
Would this cause version skewed tests to fail?
Are there any other tests outside of this repo that need to be taken into account?
// Create a nginx deployment.
deploymentName := "nginx"
badImage := "nginx:404"
nit: it's not bad, but non-existent
thirty := int32(30)
d := newDeployment(deploymentName, replicas, podLabels, nginxImageName, badImage, extensions.RecreateDeploymentStrategyType, nil)
d.Spec.ProgressDeadlineSeconds = &thirty
framework.Logf("Creating deployment %q with a bad image", deploymentName)
Again: not bad, but non-existent
mention ProgressDeadlineSeconds = 30s or at least mention it's set?
ok
badImage := "nginx:404"
thirty := int32(30)
d := newDeployment(deploymentName, replicas, podLabels, nginxImageName, badImage, extensions.RecreateDeploymentStrategyType, nil)
d.Spec.ProgressDeadlineSeconds = &thirty
Suggest using By to describe test steps where appropriate.
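For reference, a minimal sketch of the suggestion; By is Ginkgo's step-description helper, and the step text here is made up (c, ns, and d come from the surrounding test):

By("creating a deployment with a non-existent image and a 30s progress deadline")
_, err := c.Extensions().Deployments(ns).Create(d)
Expect(err).NotTo(HaveOccurred())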
There is one just below
Expect(err).NotTo(HaveOccurred())

// Check that deployment is created fine.
deployment, err := c.Extensions().Deployments(ns).Get(deploymentName)
Why not just use the returned deployment from Create?
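I.e., roughly the following sketch against the client API already used in this diff, so the extra Get can be dropped:

// Create returns the persisted object, so it can be used directly.
deployment, err := c.Extensions().Deployments(ns).Create(d)
Expect(err).NotTo(HaveOccurred())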
ok
case n < 0.8:
	// pause / resume the deployment
	if deployment.Spec.Paused {
		framework.Logf("%02d: pausing deployment %q", i, deployment.Name)
and scale
Same comment as #35691 (comment)
Expect(err).NotTo(HaveOccurred())

case n < 0.8:
	// pause / resume the deployment
toggling
podList, err := c.Core().Pods(ns).List(opts)
Expect(err).NotTo(HaveOccurred())
if len(podList.Items) == 0 {
	framework.Logf("%02d: no deployment pods", i)
... to delete
default:
	// arbitrarily delete deployment pods
	framework.Logf("%02d: deleting one or more deployment pods for deployment %q", i, deployment.Name)
arbitrarily deleting...
iterations := 20
for i := 0; i < iterations; i++ {
	if r := rand.Float32(); r < 0.6 {
		time.Sleep(time.Duration(float32(i) * r * float32(time.Second)))
What's the reason behind this number?
It's for creating a random wait between actions. The further the test proceeds, the longer the waits become, so we can test various action rates.
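As a worked example of the sleep formula (the values are illustrative, not from an actual test run):

package main

import (
	"fmt"
	"time"
)

func main() {
	// Illustrative values: iteration i = 10, random factor r = 0.5.
	i, r := 10, float32(0.5)
	d := time.Duration(float32(i) * r * float32(time.Second))
	fmt.Println(d) // 5s; early iterations (small i) sleep far less, so the action rate varies
}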
This was fairly effective at exposing races and bad logic in the deployment controller in openshift.
Found the first race in the scaling code: http://pastebin.com/fwwEfJnp
Fast updates can block rollouts because the replica annotations are not updated correctly in the scale code.
I would not have expected the resource quota e2e test flake in https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/35691/pull-kubernetes-e2e-gce-gci/2732/ @derekwaynecarr
@@ -343,6 +343,14 @@ func NewUIDTrackingControllerExpectations(ce ControllerExpectationsInterface) *UIDTrackingControllerExpectations {
 	return &UIDTrackingControllerExpectations{ControllerExpectationsInterface: ce, uidStore: cache.NewStore(UIDSetKeyFunc)}
 }
 
+// Reasons for pod events
+const (
+	FailedCreatePodReason = "FailedCreate"
Godoc
// progress or tried to create a replica set, or resumed a paused deployment and
// compare against progressDeadlineSeconds.
from := condition.LastTransitionTime
delta := time.Duration(*deployment.Spec.ProgressDeadlineSeconds) * time.Second
Add a TODO here that this is clock dependent and should be based on observed time from this controller, not stored time.
What's the difference between stored time that has been set based on observation by this controller (what we currently have) and observed time that is not stored?
The time stored on the object may have been set by a different controller with a different current time, rate of time change, or transient problem. In a distributed system any sort of "store time, then read it later from a different process" action is fundamentally broken. Instead, each component in the cluster needs to maintain its own clock (based on when it sees something happen) and then act only according to the time the local process keeps. That's not 100% safe (the admin on the machine could adjust the time forward or backwards), but it prevents clock skew from having an impact.
Here's an example:
- Controller process 1 observes a new deployment update, sets a condition
- Node that controller process 1 is on dies, controller fails over to node 2 with controller process 2
- Node 2's clock is 30s in the future
- Controller process 2 compares stored time from etcd (recorded by node 1's clock) against now, sees that the progress deadline window has passed, and records "failure", even though the wall clock measurement is ~1-2s from when process 1 actually saw the deployment update.
The correct way to track time is for each controller to maintain its own clock, and it can only act when it observes a change. I.e.:
- Controller process 2 observes progress deadline is set AND condition is set, starts its own clock at now (t0)
- Controller process 2 waits to update condition until at least progressDeadlineSeconds have elapsed since t0.
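A minimal sketch of that local-clock approach; the names and structure here are hypothetical, not the controller's actual code:

package deadline

import (
	"sync"
	"time"
)

// progressClock records, per deployment key, when *this* process first
// observed the Progressing condition, rather than trusting a timestamp
// stored by a different process with a different clock.
type progressClock struct {
	mu       sync.Mutex
	observed map[string]time.Time
}

func newProgressClock() *progressClock {
	return &progressClock{observed: map[string]time.Time{}}
}

// deadlineExceeded starts a local clock on first observation and reports
// failure only once the deadline elapses on this process's own clock.
func (p *progressClock) deadlineExceeded(key string, deadline time.Duration) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	t0, ok := p.observed[key]
	if !ok {
		// First observation by this process: start the local clock now (t0).
		p.observed[key] = time.Now()
		return false
	}
	// Act only on time measured locally, never on stored time.
	return time.Since(t0) >= deadline
}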
Thanks for the explanation. This is not a problem with the main controller running on a single master, since it's the only component that updates conditions. I guess the master going down in an HA setup would break this. In the future, when we support custom controllers that run on nodes, this problem will become more apparent. Dropping a reference in #29229.
// and when new pods scale up or old pods scale down. Progress is not estimated for paused
// deployments or when users don't really care about it, i.e. progressDeadlineSeconds is not
// specified.
// TODO: Look for permanent failures in the new replica set such as image pull errors or
Maybe I'm misunderstanding this TODO, but image pull and crash looping are not perm failures. Crash looping can be because a dependency is not up, and image pull can be because we are waiting for another pod to push an image.
Removed for now - #18568 is the issue for identifying permanent failures
A few comments, but needs tests passing.
@smarterclayton all green
@mfojtik do you want to have a last pass?
if err != nil {
	return err
}
if failed {
remove this if, just keep TODO
ok
}

// If there is no progressDeadlineSeconds set, remove any Progressing condition.
if d.Spec.ProgressDeadlineSeconds == nil {
move this check up?
ok
LGTM
Will wait until #34645 is merged and then I will tag this.
This commit adds support for failing deployments based on a timeout parameter defined in the spec. If there is no progress for the amount of time defined as progressDeadlineSeconds then the deployment will be marked as failed by adding a condition with a ProgressDeadlineExceeded reason in it. Progress in the context of a deployment means the creation or adoption of a new replica set, scaling up new pods, and scaling down old pods.
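For illustration, the failure condition would be built roughly like this, reusing the helpers visible elsewhere in this diff; the message text and the assumption that the condition's status flips to False on failure are mine:

// Sketch only: mark the deployment as failed once progressDeadlineSeconds
// elapses without progress. Helper and type names appear in this PR;
// the message is illustrative.
msg := fmt.Sprintf("Deployment %q has timed out progressing.", deployment.Name)
condition := deploymentutil.NewDeploymentCondition(
	extensions.DeploymentProgressing,
	api.ConditionFalse,
	"ProgressDeadlineExceeded",
	msg,
)
deploymentutil.SetDeploymentCondition(&deployment.Status, *condition)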
@mfojtik ptal in the last commit, annotations weren't updated correctly and the rollout would block. Every time we get into…
Here is the place where we check rs annotations in order to identify scaling events.
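The idea behind that check, as a self-contained sketch: the annotation key matches the one the deployment controller stamps on replica sets, but the function itself is a hypothetical reconstruction, not the actual code:

package main

import (
	"fmt"
	"strconv"
)

// The deployment controller records, on each replica set, the replica
// count the deployment had when the RS was last synced.
const desiredReplicasAnnotation = "deployment.kubernetes.io/desired-replicas"

// isScalingEvent: if the stored desired count no longer matches the
// deployment's spec.replicas, the deployment was scaled rather than rolled
// out, so annotations must be synced first -- otherwise fast updates can
// leave them stale and block the rollout.
func isScalingEvent(rsAnnotations map[string]string, deploymentReplicas int32) bool {
	v, ok := rsAnnotations[desiredReplicasAnnotation]
	if !ok {
		return false
	}
	desired, err := strconv.ParseInt(v, 10, 32)
	if err != nil {
		return false
	}
	return int32(desired) != deploymentReplicas
}

func main() {
	anns := map[string]string{desiredReplicasAnnotation: "3"}
	fmt.Println(isScalingEvent(anns, 5)) // true: deployment was scaled from 3 to 5
}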
/lgtm
This needs a release note; can you update?
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue
…ployments
Automatic merge from submit-queue

kubectl: enhancements for deployment progress deadline

Changes:
* add deployment conditions in the describer
* abort 'rollout status' for deployments that have exceeded their progress deadline

Depends on #35691. @kubernetes/kubectl @kubernetes/deployment Fixes #31319
This PR adds support for reporting failed deployments based on a timeout
parameter defined in the spec. If there is no progress for the amount
of time defined as progressDeadlineSeconds then the deployment will be
marked as failed by a Progressing condition with a ProgressDeadlineExceeded
reason.
Follow-up to #19343
Docs at kubernetes/website#1337
Fixes #14519
@kubernetes/deployment @smarterclayton