
Bug 1329138: stop emitting events on update conflicts#8652

Merged
openshift-bot merged 1 commit into openshift:master from 0xmichalis:bug-1329138
May 11, 2016

Conversation

@0xmichalis
Contributor

@0xmichalis 0xmichalis commented Apr 27, 2016

Update conflicts for deployments are pretty common, since deployments are
handled by three different controllers (kube: rc manager; origin: deployer
pod controller and deployment controller) and their events stay attached
to deploymentconfigs, which may confuse users ("My deployment is running,
but I have this event over there talking about an update conflict"). This
commit makes it so those events will be superseded by successful updates.

ref: https://bugzilla.redhat.com/show_bug.cgi?id=1329138
Closes #8630

@ironcladlou @smarterclayton

@ironcladlou
Contributor

LGTM, we could use way more events across the board

@0xmichalis
Contributor Author

@ironcladlou what worries me here is event flooding on the dc

@smarterclayton
Contributor

smarterclayton commented Apr 28, 2016 via email

Why do we emit an event on conflicts?

@0xmichalis
Contributor Author

> Why do we emit an event on conflicts?

Not sure, @ironcladlou ?

@ironcladlou
Contributor

> Why do we emit an event on conflicts?
> Not sure, @ironcladlou ?

We shouldn't do that anymore; it's a transient error.

@0xmichalis 0xmichalis changed the title Bug 1329138: Emit dc events on successful status updates Bug 1329138: stop emitting events on update conflicts Apr 29, 2016
@0xmichalis
Contributor Author

updated to stop emitting events on update failures

@smarterclayton
Contributor

Maybe I missed it, where are we ignoring conflict errors so as not to send events?

@0xmichalis
Contributor Author

> Maybe I missed it, where are we ignoring conflict errors so as not to send events?

https://github.com/openshift/origin/pull/8652/files#diff-fa925ff7d2acc462649ccd349e76d981L206

	} else {
		c.recorder.Eventf(deployment, kapi.EventTypeWarning, "FailedCreate", "Error creating deployer pod for %s: %v", deployutil.LabelForDeployment(deployment), err)
	}
	c.emitDeploymentEvent(deployment, config, kapi.EventTypeWarning, "FailedCreate", fmt.Sprintf("Error creating deployer pod", err))
Contributor

You can still hit a conflict here.

Contributor Author

I am ok with not touching this until we really need to.

Contributor

Not sure what you mean.

An event of the form "you got a conflict" is not something we should ever be sending, unless we are completely giving up. We should only emit events when the failure is meaningful to an end user, or because of the conflict, we will not retry for a very long time. Neither of those apply here, and given the potential for contention, we are generating meaningless events for end users. Those events reduce comprehension and debuggability, not improve it.

So in general, the only conflict error that should result in an event being sent is when we drop an item from the retry queue until the next sync interval due to too many conflicts.

Contributor Author

Do you find other failure modes useful for events here or should I remove emitting altogether?

Contributor

Any of the ones that are transient, probably not, if we generate an event
when they would drop out of the retry queue. Do we do that today?

On Apr 30, 2016, at 7:18 PM, Michail Kargakis wrote:

> In pkg/deploy/controller/deployment/controller.go:
>
> @@ -116,16 +109,12 @@ func (c *DeploymentController) Handle(deployment *kapi.ReplicationController) er
>  	deploymentPod, err := c.podClient.createPod(deployment.Namespace, podTemplate)
>  	// Retry on error.
>  	if err != nil {
> -		if config, decodeErr := c.decodeConfig(deployment); decodeErr == nil {
> -			c.recorder.Eventf(config, kapi.EventTypeWarning, "FailedCreate", "Error creating deployer pod for %s: %v", deployutil.LabelForDeployment(deployment), err)
> -		} else {
> -			c.recorder.Eventf(deployment, kapi.EventTypeWarning, "FailedCreate", "Error creating deployer pod for %s: %v", deployutil.LabelForDeployment(deployment), err)
> -		}
> +		c.emitDeploymentEvent(deployment, config, kapi.EventTypeWarning, "FailedCreate", fmt.Sprintf("Error creating deployer pod", err))
>
> Do you find other failure modes useful for events here or should I remove emitting altogether?

Contributor Author

Nope

On Sun, May 1, 2016 at 2:21 AM, Clayton Coleman wrote:

> Any of the ones that are transient, probably not, if we generate an event when they would drop out of the retry queue. Do we do that today?

@0xmichalis
Contributor Author

@smarterclayton updated to emit events for transient errors when deployments are about to get dropped from the retry loop

	} else {
		c.recorder.Eventf(deployment, kapi.EventTypeWarning, "FailedCreate", "Error getting existing deployer pod for %s: %v", deployutil.LabelForDeployment(deployment), err)
	}
	c.emitDeploymentEvent(deployment, config, kapi.EventTypeWarning, "FailedCreate", fmt.Sprintf("Error getting existing deployer pod: %v", err))
Contributor

Isn't this a transient error?

Contributor Author

Everything that isn't a fatal error will be retried, so yes.

@smarterclayton
Contributor

One comment, then looks good to me.

@smarterclayton
Contributor

[test]

@0xmichalis
Contributor Author

last comment addressed. [merge]

Update conflicts for deployments are pretty common, since deployments are
handled by three different controllers (kube: rc manager; origin: deployer
pod controller and deployment controller) and their events stay attached
to deploymentconfigs, which may confuse users ("My deployment is running,
but I have this event over there talking about an update conflict"). Since
those errors are retried by the controller, there is no reason to emit
events for them.
@openshift-bot
Contributor

Evaluated for origin test up to 68c7073

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/3731/)

@openshift-bot
Contributor

openshift-bot commented May 10, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5877/) (Image: devenv-rhel7_4158)

@smarterclayton
Contributor

Yum mirror failure

[merge]

On Tue, May 10, 2016 at 6:50 PM, OpenShift Bot wrote:

> continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5874/)

@openshift-bot
Contributor

Evaluated for origin merge up to 68c7073

@openshift-bot openshift-bot merged commit c8f5531 into openshift:master May 11, 2016
@0xmichalis 0xmichalis deleted the bug-1329138 branch May 11, 2016 08:21