
Controller changes for perma failed deployments #35691

Merged
merged 3 commits into kubernetes:master from controller-changes-for-perma-failed on Nov 4, 2016

Conversation

@0xmichalis (Contributor) commented Oct 27, 2016

This PR adds support for reporting failed deployments based on a timeout
parameter defined in the spec. If there is no progress for the amount
of time defined by progressDeadlineSeconds, the deployment will be
marked as failed via a Progressing condition with a ProgressDeadlineExceeded
reason.

Follow-up to #19343

Docs at kubernetes/website#1337

Fixes #14519

@kubernetes/deployment @smarterclayton
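
As an illustration of the description above, here is a minimal, self-contained sketch of the timeout check; the types and helper names are simplified stand-ins, not the actual controller code in this PR.

package main

import (
	"fmt"
	"time"
)

// Simplified stand-ins for the real API types, trimmed to what the check needs.
type progressingCondition struct {
	Reason             string
	LastTransitionTime time.Time
}

type deployment struct {
	ProgressDeadlineSeconds *int32
	Progressing             *progressingCondition // the Progressing condition, if any
}

// timedOut reports whether the deployment has made no progress for longer than
// progressDeadlineSeconds; in that case the controller marks the Progressing
// condition with the ProgressDeadlineExceeded reason.
func timedOut(d deployment, now time.Time) bool {
	if d.ProgressDeadlineSeconds == nil || d.Progressing == nil {
		return false // progress is only estimated when a deadline is requested
	}
	deadline := time.Duration(*d.ProgressDeadlineSeconds) * time.Second
	return d.Progressing.LastTransitionTime.Add(deadline).Before(now)
}

func main() {
	thirty := int32(30)
	d := deployment{
		ProgressDeadlineSeconds: &thirty,
		Progressing:             &progressingCondition{LastTransitionTime: time.Now().Add(-45 * time.Second)},
	}
	fmt.Println(timedOut(d, time.Now())) // true: 45s without progress exceeds the 30s deadline
}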


@k8s-github-robot added the size/XXL (changes 1000+ lines, ignoring generated files) and release-note-label-needed labels Oct 27, 2016
@k8s-github-robot added the kind/api-change label Oct 27, 2016
@0xmichalis added the release-note-none label and removed release-note-label-needed Oct 27, 2016
@k8s-github-robot removed the kind/api-change label Oct 28, 2016
@0xmichalis added this to the v1.5 milestone Oct 28, 2016
if deployment.Spec.ProgressDeadlineSeconds != nil && cond == nil {
msg := fmt.Sprintf("Found new replica set %q", rsCopy.Name)
condition := deploymentutil.NewDeploymentCondition(extensions.DeploymentProgressing, api.ConditionTrue, deploymentutil.FoundNewRSReason, msg)
deploymentutil.SetDeploymentCondition(&deployment.Status, *condition)
Contributor

How many extra deployment status writes does this PR add (on average) to a deployment creation and a deployment update? I.e. for scenario:

  1. Create a new deployment -> deployment reaches success
  2. Modify deployment -> new code is rolled out

What do the before-and-after REST calls to D and RS look like?

Contributor Author

This really depends on the size and the fenceposts of the deployment. The faster a rollout can advance (i.e. the bigger (spec.replicas+maxSurge) - (spec.replicas-maxUnavailable), which reduces to maxSurge+maxUnavailable, is) and the fewer pods under a deployment, the fewer writes there will be.

Deployment with one pod:
I don't notice any difference

Deployment with 3 pods:
~7-8 writes in master
~10 writes with progressDeadline set

Deployment with 10 pods:
~40 writes in master
~50 writes with progressDeadline set

This PR adds no additional RS writes.
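
For intuition on the fencepost point above, here is a tiny illustrative helper (not controller code; the numbers in the comment are only an example):

// The most pods a rolling update can move in one wave is
// (spec.replicas+maxSurge) - (spec.replicas-maxUnavailable) = maxSurge+maxUnavailable.
// Fewer waves mean fewer deployment status writes during a rollout.
func rolloutStep(maxSurge, maxUnavailable int32) int32 {
	return maxSurge + maxUnavailable
}

// Example: with 10 replicas, maxSurge=3 and maxUnavailable=2, up to 5 pods can be
// replaced per wave, i.e. roughly 2 waves instead of 10 with a step of 1.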

Contributor

Ok

@@ -418,8 +425,8 @@ func testRollingUpdateDeploymentEvents(f *framework.Framework) {
newRS, err := deploymentutil.GetNewReplicaSet(deployment, c)
Expect(err).NotTo(HaveOccurred())
Expect(newRS).NotTo(Equal(nil))
Expect(events.Items[0].Message).Should(Equal(fmt.Sprintf("Scaled up replica set %s to 1", newRS.Name)))
Expect(events.Items[1].Message).Should(Equal(fmt.Sprintf("Scaled down replica set %s to 0", rsName)))
Expect(events.Items[0].Message).Should(Equal(fmt.Sprintf("Created new replica set %q and scaled up to 1", newRS.Name)))
Member

Would this cause version-skewed tests to fail?

Contributor Author

Are there any other tests outside of this repo that need to be taken into account?


// Create a nginx deployment.
deploymentName := "nginx"
badImage := "nginx:404"
Member

nit: it's not bad, but non-existent

thirty := int32(30)
d := newDeployment(deploymentName, replicas, podLabels, nginxImageName, badImage, extensions.RecreateDeploymentStrategyType, nil)
d.Spec.ProgressDeadlineSeconds = &thirty
framework.Logf("Creating deployment %q with a bad image", deploymentName)
Member

Again, not "bad" but non-existent

Member

mention ProgressDeadlineSeconds = 30s or at least mention it's set?

Contributor Author

ok

badImage := "nginx:404"
thirty := int32(30)
d := newDeployment(deploymentName, replicas, podLabels, nginxImageName, badImage, extensions.RecreateDeploymentStrategyType, nil)
d.Spec.ProgressDeadlineSeconds = &thirty
Member

Suggest using By to describe test steps where appropriate

Contributor Author

There is one just below

Expect(err).NotTo(HaveOccurred())

// Check that deployment is created fine.
deployment, err := c.Extensions().Deployments(ns).Get(deploymentName)
Member

Why not just use the returned deployment from Create?

Contributor Author

ok

case n < 0.8:
// pause / resume the deployment
if deployment.Spec.Paused {
framework.Logf("%02d: pausing deployment %q", i, deployment.Name)
Member

and scale

Contributor Author

Same comment as #35691 (comment)

Expect(err).NotTo(HaveOccurred())

case n < 0.8:
// pause / resume the deployment
Member

toggling

podList, err := c.Core().Pods(ns).List(opts)
Expect(err).NotTo(HaveOccurred())
if len(podList.Items) == 0 {
framework.Logf("%02d: no deployment pods", i)
Member

... to delete


default:
// arbitrarily delete deployment pods
framework.Logf("%02d: deleting one or more deployment pods for deployment %q", i, deployment.Name)
Member

arbitrarily deleting...

iterations := 20
for i := 0; i < iterations; i++ {
if r := rand.Float32(); r < 0.6 {
time.Sleep(time.Duration(float32(i) * r * float32(time.Second)))
Member

What's the reason behind this number?

Contributor Author

It's for creating a random wait between actions. The further the test proceeds, the longer the waits become, so we can exercise different action rates (for example, at iteration 15 with r = 0.5 the wait is about 7.5 seconds).

Contributor

This was fairly effective at exposing races and bad logic in the deployment controller in openshift.

Contributor Author

Found the first race in the scaling code: http://pastebin.com/fwwEfJnp
Fast updates can block rollouts because the replica annotations are not updated correctly in the scale code.

@smarterclayton (Contributor)

@@ -343,6 +343,14 @@ func NewUIDTrackingControllerExpectations(ce ControllerExpectationsInterface) *U
return &UIDTrackingControllerExpectations{ControllerExpectationsInterface: ce, uidStore: cache.NewStore(UIDSetKeyFunc)}
}

// Reasons for pod events
const (
FailedCreatePodReason = "FailedCreate"
Contributor

Godoc

// progress or tried to create a replica set, or resumed a paused deployment and
// compare against progressDeadlineSeconds.
from := condition.LastTransitionTime
delta := time.Duration(*deployment.Spec.ProgressDeadlineSeconds) * time.Second
Contributor

Add a TODO here that this is clock dependent and should be based on observed time from this controller, not stored time.

Contributor Author

What's the difference between stored time that has been set based on observation by this controller (what we currently have) and observed time that is not stored?

Contributor

The time stored on the object may have been set by a different controller with a different current time, rate of time change, or transient problem. In a distributed system any sort of "store time, then read it later from a different process" action is fundamentally broken. Instead, each component in the cluster needs to maintain its own clock (based on when it sees something happen) and then act only according to the time the local process keeps. That's not 100% safe (the admin on the machine could adjust the time forward or backwards), but it prevents clock skew from having an impact.

Here's an example:

  1. Controller process 1 observes a new deployment update, sets a condition
  2. Node that controller process 1 is on dies, controller fails over to node 2 with controller process 2
  3. Node 2's clock is 30s in the future
  4. Controller process 2 compares stored time from etcd (recorded by node 1's clock) against now, sees that the progress deadline window has passed, and records "failure", even though the wall clock measurement is ~1-2s from when process 1 actually saw the deployment update.

The correct way to track time is for each controller to maintain its own clock, and it can only act when it observes a change (a minimal sketch follows the list below). I.e.:

  1. Controller process 2 observes progress deadline is set AND condition is set, starts its own clock at now (t0)
  2. Controller process 2 waits to update condition until at least progressDeadlineSeconds have elapsed since t0.
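
A minimal sketch of that idea (names and structure are hypothetical, not the controller's actual code):

import (
	"sync"
	"time"
)

// progressClock remembers, per deployment key, when *this* controller process
// first observed the Progressing condition, instead of trusting a timestamp
// written by another process with a possibly skewed clock.
type progressClock struct {
	mu      sync.Mutex
	started map[string]time.Time
}

func (c *progressClock) deadlineExceeded(key string, deadline time.Duration, now time.Time) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	t0, ok := c.started[key]
	if !ok {
		// First observation by this process: start the local clock at now (t0).
		c.started[key] = now
		return false
	}
	// Act only once at least `deadline` has elapsed on the local clock since t0.
	return now.Sub(t0) >= deadline
}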

Contributor Author

Thanks for the explanation. This is not a problem with the main controller running on a single master, since it's the only component that updates conditions. I guess the master going down in an HA setup would break this. In the future, when we support custom controllers that run on nodes, this problem will become more apparent. Dropping a reference in #29229.

// and when new pods scale up or old pods scale down. Progress is not estimated for paused
// deployments or when users don't really care about it ie. progressDeadlineSeconds is not
// specified.
// TODO: Look for permanent failures in the new replica set such as image pull errors or
Contributor

Maybe I'm misunderstanding this TODO, but image pull failures and crash looping are not permanent failures. Crash looping can happen because a dependency is not up, and an image pull can fail because we are waiting for another pod to push the image.

Contributor Author

Removed for now - #18568 is the issue for identifying permanent failures

@smarterclayton (Contributor)

A few comments, but needs tests passing.

@0xmichalis (Contributor Author)

@smarterclayton all green

@0xmichalis (Contributor Author)

@mfojtik do you want to have a last pass?

@0xmichalis added the priority/backlog and area/app-lifecycle labels Nov 3, 2016
if err != nil {
return err
}
if failed {
Contributor

remove this if, just keep TODO

Contributor Author

ok

}

// If there is no progressDeadlineSeconds set, remove any Progressing condition.
if d.Spec.ProgressDeadlineSeconds == nil {
Contributor

move this check up?

Contributor Author

ok

@mfojtik (Contributor) commented Nov 3, 2016

LGTM

@0xmichalis (Contributor Author)

Will wait until #34645 is merged and then I will tag this.

@0xmichalis (Contributor Author)

@k8s-bot e2e test this: #33388

This commit adds support for failing deployments based on a timeout
parameter defined in the spec. If there is no progress for the amount
of time defined by progressDeadlineSeconds, the deployment will be
marked as failed by adding a condition with a ProgressDeadlineExceeded
reason. Progress in the context of a deployment means the creation
or adoption of a new replica set, scaling up new pods, and scaling down
old pods.
@0xmichalis (Contributor Author)

@mfojtik ptal at the last commit: annotations weren't updated correctly and the rollout would block. Every time we get into the scale code we have to update the annotations.

@0xmichalis (Contributor Author)

Here is the place where we check rs annotations in order to identify scaling events.
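
Roughly, that check compares the desired size recorded on the replica set against the deployment's current spec; the sketch below is simplified and the helper name is illustrative, but the annotation key is the one the deployment controller stamps on its replica sets.

import "strconv"

// Recorded by the deployment controller on every replica set it manages.
const desiredReplicasAnnotation = "deployment.kubernetes.io/desired-replicas"

// isScalingEvent (simplified): if a replica set's recorded desired size no longer
// matches the deployment's spec.replicas, the update is treated as a scaling event,
// and the annotations must be refreshed during scaling or later syncs keep seeing
// the stale size and the rollout can block.
func isScalingEvent(specReplicas int32, rsAnnotations map[string]string) bool {
	v, ok := rsAnnotations[desiredReplicasAnnotation]
	if !ok {
		return false
	}
	desired, err := strconv.ParseInt(v, 10, 32)
	if err != nil {
		return false
	}
	return int32(desired) != specReplicas
}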

@smarterclayton (Contributor)

/lgtm

@k8s-ci-robot added the lgtm label Nov 4, 2016
@smarterclayton (Contributor)

This needs a release note, can you update?

@0xmichalis added the release-note label and removed release-note-none Nov 4, 2016
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot merged commit a811515 into kubernetes:master Nov 4, 2016
@0xmichalis deleted the controller-changes-for-perma-failed branch November 6, 2016 13:02
k8s-github-robot pushed a commit that referenced this pull request Nov 7, 2016
…ployments

Automatic merge from submit-queue

kubectl: enhancements for deployment progress deadline

Changes:
* add deployment conditions in the describer
* abort 'rollout status' for deployments that have exceeded their
progress deadline

Depends on #35691.

@kubernetes/kubectl @kubernetes/deployment

Fixes #31319