Add perma-failed deployments API #19343

0xmichalis · 2016-01-06T20:26:48Z

@kubernetes/deployment @smarterclayton

This change is

ironcladlou · 2016-01-07T02:36:27Z

pkg/apis/extensions/types.go

@@ -296,6 +300,17 @@ type DeploymentStatus struct {

 	// Total number of non-terminated pods targeted by this deployment that have the desired template spec.
 	UpdatedReplicas int `json:"updatedReplicas,omitempty"`
+
+	// Stuck indicates if the current deployment has failed.
+	Stuck bool `json:"stuck,omitempty"`


Rather than a boolean, we need to express failed as a condition (i.e. using ConditionStatus).

Thanks, didn't know about ConditionStatus, will switch to it.

After digging in other uses of ConditionStatus in the api I see that everywhere it is wrapped in a higher level <Resource>Condition so I guess we need to follow that pattern and have a DeploymentCondition here.

ironcladlou · 2016-01-07T13:44:36Z

The new timeout/condition handling looks great.

ironcladlou · 2016-01-07T18:08:23Z

api/swagger-spec/v1beta1.json

@@ -4374,7 +4374,7 @@
    "properties": {
     "type": {
      "type": "string",
-      "description": "Type of job condition, currently only Complete."
+      "description": "Type of job condition."


Not entirely clear why you're changing a job resource field in this PR.

ironcladlou · 2016-01-08T14:16:40Z

pkg/controller/deployment/deployment_controller.go

@@ -396,6 +397,23 @@ func (dc *DeploymentController) syncDeployment(key string) error {
 		return nil
 	}

+	// If the deployment times out, we will need to add a failed condition in its
+	// status (if it's not already there), scale down all of its replication
+	// controllers, and finally update its status to reflect zero replicas.


Comment needs updated to reflect the removal of the auto-scaledown behavior

ironcladlou · 2016-01-08T14:21:24Z

Other than one remaining nit this is looking good to me... I'd like to get feedback on the condition clearing approach (cc @nikhiljindal).

0xmichalis · 2016-01-08T14:38:45Z

Hm, after playing around with e2e, I realized that the current condition clearing approach isn't enough. dc.timedOut will fallback in the creationTimestamp of the deployment in case of no found conditions. Imagine a deployment with maxProgressThresholdSeconds=60, failed yesterday, and today someone retried it. It would fail on the spot. Should the retry endpoint add (and update in subsequent calls) a timestamp annotation in the deployment?

0xmichalis · 2016-01-08T14:41:40Z

Or even better, how about adding a condition instead of an annotation?

ironcladlou · 2016-01-08T15:56:01Z

Hm, after playing around with e2e, I realized that the current condition clearing approach isn't enough. dc.timedOut will fallback in the creationTimestamp of the deployment in case of no found conditions. Imagine a deployment with maxProgressThresholdSeconds=60, failed yesterday, and today someone retried it. It would fail on the spot. Should the retry endpoint add (and update in subsequent calls) a timestamp annotation in the deployment?

Thinking about this more, should the controller really be enforcing timeout when no conditions are present? It could be a while between creation and first sync, and during that time I don't think it counts as the time the deployment is actually taking since it hasn't even tried working with the deployment yet.

What if you only check timeouts when there's a condition present?

ironcladlou · 2016-01-08T16:00:08Z

Put another way, the timeout should always be relative to progress, and lack of a condition implies that we haven't even started trying to make progress yet, implying timeouts shouldn't yet be effective.

k8s-ci-robot · 2016-10-26T18:50:10Z

Jenkins verification failed for commit 35cdedf48a5fa6c4489674afadaebbdc01049824. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

bgrant0607 · 2016-10-26T23:08:21Z

pkg/apis/extensions/v1beta1/types.go

 	// The config this deployment is rolling back to. Will be cleared after rollback is done.
 	// +optional
 	RollbackTo *RollbackConfig `json:"rollbackTo,omitempty" protobuf:"bytes,8,opt,name=rollbackTo"`
+
+	// The maximum time in seconds for a deployment to make progress before it
+	// is considered to be failed. Failed deployments are not retried by the


@Kargakis
What does "failed deployments are not retried" mean? Is that still accurate, if we're not implementing auto-rollback yet?

I think the wording here is not accurate enough, will update.

bgrant0607 · 2016-10-26T23:08:57Z

pkg/apis/extensions/v1beta1/types.go

+	// The maximum time in seconds for a deployment to make progress before it
+	// is considered to be failed. Failed deployments are not retried by the
+	// deployment controller and require user action. Failure causes are surfaced
+	// in the deployment status as Conditions. This is not set by default.


Please specify which specific condition type(s). They aren't otherwise discoverable currently.

bgrant0607 · 2016-10-26T23:09:30Z

pkg/apis/extensions/v1beta1/types.go

+
+// These are valid conditions of a deployment.
+const (
+	// DeploymentAvailable means the deployment is available, ie. at least the minimum available


Please refer to these types by the string names not variable names.

bgrant0607 · 2016-10-26T23:10:19Z

@Kargakis API looks ok, thanks. Added a few comments about the documentation/descriptions.

k8s-ci-robot · 2016-10-27T11:34:42Z

Jenkins GKE smoke e2e failed for commit cf9fd31. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

0xmichalis · 2016-10-27T12:21:11Z

@k8s-bot test this issue: #33388

0xmichalis · 2016-10-27T13:41:51Z

@bgrant0607 addressed all of your latest comments. Applying lgtm

0xmichalis · 2016-10-27T15:06:41Z

Marking as P2 to avoid any more rebases for today:)

k8s-github-robot · 2016-10-28T01:46:47Z

Automatic merge from submit-queue

bgrant0607 · 2016-10-28T04:59:58Z

@grodrigues3 @apelisse @fejta Another PR plagued by rebases.

apelisse · 2016-11-01T16:12:22Z

Unfortunately, it's not obvious by looking at the timeline what is the hot-point for rebase.

By looking at the final code though, it looks like there are generated docs that include a timestamp (recipe for disaster), and lots of generated code that is doomed to conflict.

Questions are:

Does it need to be rebased on a regular basis, or can it just wait for the review to be done, then rebase/quick validation/set to highest priority so that it goes quickly through the queue?
How much effort would it be to remove this generated code/docs?
Are there any other source of conflicts @Kargakis ?

0xmichalis · 2016-11-01T16:22:35Z

@apelisse primary source of conflicts was generated docs and protobuf.

smarterclayton · 2016-11-01T16:33:11Z

Protobuf must conflict (because we have to uniquely allocate proto tags).
So nothing I'm aware of we can do there.

Docs are common points of conflict for almost every API change for sure.

On Nov 1, 2016, at 12:22 PM, Michail Kargakis notifications@github.com
wrote:

@apelisse https://github.com/apelisse primary source of conflicts was
generated docs and protobuf.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#19343 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_pw4ArJzbkSrMh_99Ot24wgfvkx2_ks5q52dagaJpZM4G_44l
.

@smarterclayton

…failed Automatic merge from submit-queue Controller changes for perma failed deployments This PR adds support for reporting failed deployments based on a timeout parameter defined in the spec. If there is no progress for the amount of time defined as progressDeadlineSeconds then the deployment will be marked as failed by a Progressing condition with a ProgressDeadlineExceeded reason. Follow-up to #19343 Docs at kubernetes/website#1337 Fixes #14519 @kubernetes/deployment @smarterclayton

stop using factory.OpenshiftClientConfig Origin-commit: df81ecaf3ca02d8de059e5a437a546a49ef53415

k8s-github-robot assigned lavalamp Jan 6, 2016

k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/new-api size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 6, 2016

ironcladlou reviewed Jan 7, 2016
View reviewed changes

0xmichalis changed the title ~~[WIP] Add perma-failed deployments~~ Add perma-failed deployments Jan 7, 2016

ironcladlou reviewed Jan 7, 2016
View reviewed changes

k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 8, 2016

ironcladlou reviewed Jan 8, 2016
View reviewed changes

lavalamp removed their assignment Jan 8, 2016

k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 26, 2016

bgrant0607 reviewed Oct 26, 2016

View reviewed changes

k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 27, 2016

0xmichalis added 2 commits October 27, 2016 12:41

extensions: api changes for perma-failed deployments

7bb68bc

extensions: generated changes for perma-failed deployments

cf9fd31

k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 27, 2016

0xmichalis mentioned this pull request Oct 27, 2016

Controller changes for perma failed deployments #35691

Merged

0xmichalis added lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Oct 27, 2016

k8s-github-robot merged commit b98a990 into kubernetes:master Oct 28, 2016

0xmichalis deleted the perma-failed-deployments branch October 28, 2016 12:11

chentao1596 mentioned this pull request Dec 5, 2016

WIP:kubelet: support multi-headers when getting pod from HTTP source #38089

Closed

janetkuo mentioned this pull request Nov 8, 2017

Add statefulset conditions #55268

Merged

openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Apr 16, 2018

Merge pull request kubernetes#19343 from deads2k/cli-28-slim-interface

10a3857

stop using factory.OpenshiftClientConfig Origin-commit: df81ecaf3ca02d8de059e5a437a546a49ef53415

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add perma-failed deployments API #19343

Add perma-failed deployments API #19343

0xmichalis commented Jan 6, 2016 •

edited

Loading

ironcladlou Jan 7, 2016

0xmichalis Jan 7, 2016

0xmichalis Jan 7, 2016

ironcladlou commented Jan 7, 2016

ironcladlou Jan 7, 2016

0xmichalis Jan 8, 2016

ironcladlou Jan 8, 2016

ironcladlou commented Jan 8, 2016

0xmichalis commented Jan 8, 2016

0xmichalis commented Jan 8, 2016

ironcladlou commented Jan 8, 2016

ironcladlou commented Jan 8, 2016

k8s-ci-robot commented Oct 26, 2016

bgrant0607 Oct 26, 2016

0xmichalis Oct 27, 2016

bgrant0607 Oct 26, 2016

0xmichalis Oct 27, 2016

bgrant0607 Oct 26, 2016

0xmichalis Oct 27, 2016

bgrant0607 commented Oct 26, 2016

k8s-ci-robot commented Oct 27, 2016

0xmichalis commented Oct 27, 2016

0xmichalis commented Oct 27, 2016

0xmichalis commented Oct 27, 2016

k8s-github-robot commented Oct 28, 2016

bgrant0607 commented Oct 28, 2016

apelisse commented Nov 1, 2016

0xmichalis commented Nov 1, 2016

smarterclayton commented Nov 1, 2016

Add perma-failed deployments API #19343

Add perma-failed deployments API #19343

Conversation

0xmichalis commented Jan 6, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ironcladlou commented Jan 7, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ironcladlou commented Jan 8, 2016

0xmichalis commented Jan 8, 2016

0xmichalis commented Jan 8, 2016

ironcladlou commented Jan 8, 2016

ironcladlou commented Jan 8, 2016

k8s-ci-robot commented Oct 26, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bgrant0607 commented Oct 26, 2016

k8s-ci-robot commented Oct 27, 2016

0xmichalis commented Oct 27, 2016

0xmichalis commented Oct 27, 2016

0xmichalis commented Oct 27, 2016

k8s-github-robot commented Oct 28, 2016

bgrant0607 commented Oct 28, 2016

apelisse commented Nov 1, 2016

0xmichalis commented Nov 1, 2016

smarterclayton commented Nov 1, 2016

0xmichalis commented Jan 6, 2016 •

edited

Loading