
Bug 1329138: stop emitting events on update conflicts#8652

Merged
openshift-bot merged 1 commit into openshift:master from 0xmichalis:bug-1329138
May 11, 2016

Conversation

@0xmichalis
Contributor

@0xmichalis 0xmichalis commented Apr 27, 2016

Update conflicts for deployments are pretty common, since deployments are
handled by three different controllers (kube: rc manager; origin: deployer
pod controller and deployment controller) and their events stay attached
to deploymentconfigs, which may confuse users ("My deployment is running,
but I have this event over there talking about an update conflict"). This
commit makes it so those events will be superseded by successful updates.

ref: https://bugzilla.redhat.com/show_bug.cgi?id=1329138
Closes #8630

@ironcladlou @smarterclayton

@ironcladlou
Contributor

LGTM, we could use way more events across the board

@0xmichalis
Contributor Author

@ironcladlou what worries me here is event flooding on the dc

@smarterclayton
Contributor

smarterclayton commented Apr 28, 2016 via email

Why do we emit an event on conflicts?

@0xmichalis
Contributor Author

> Why do we emit an event on conflicts?

Not sure, @ironcladlou ?

@ironcladlou
Contributor

> Why do we emit an event on conflicts?
> Not sure, @ironcladlou ?

We shouldn't do that anymore; it's a transient error.

@0xmichalis 0xmichalis changed the title Bug 1329138: Emit dc events on successful status updates Bug 1329138: stop emitting events on update conflicts Apr 29, 2016
@0xmichalis
Contributor Author

updated to stop emitting events on update failures

@smarterclayton
Contributor

Maybe I missed it, where are we ignoring conflict errors so as not to send events?

@0xmichalis
Contributor Author

> Maybe I missed it, where are we ignoring conflict errors so as not to send events?

https://github.com/openshift/origin/pull/8652/files#diff-fa925ff7d2acc462649ccd349e76d981L206

	} else {
		c.recorder.Eventf(deployment, kapi.EventTypeWarning, "FailedCreate", "Error creating deployer pod for %s: %v", deployutil.LabelForDeployment(deployment), err)
	}
	c.emitDeploymentEvent(deployment, config, kapi.EventTypeWarning, "FailedCreate", fmt.Sprintf("Error creating deployer pod", err))
Contributor

You can still hit a conflict here.

Contributor Author

I am ok with not touching this until we really need to.

Contributor

Not sure what you mean.

An event of the form "you got a conflict" is not something we should ever be sending, unless we are completely giving up. We should only emit events when the failure is meaningful to an end user, or because of the conflict, we will not retry for a very long time. Neither of those apply here, and given the potential for contention, we are generating meaningless events for end users. Those events reduce comprehension and debuggability, not improve it.

So in general, the only conflict error that should result in an event being sent is when we drop an item from the retry queue until the next sync interval due to too many conflicts.

Contributor Author

Do you find other failure modes useful for events here or should I remove emitting altogether?

Contributor

Any of the ones that are transient, probably not, if we generate an event
when they would drop out of the retry queue. Do we do that today?

On Apr 30, 2016, at 7:18 PM, Michail Kargakis wrote:

> In pkg/deploy/controller/deployment/controller.go:
>
> @@ -116,16 +109,12 @@ func (c *DeploymentController) Handle(deployment *kapi.ReplicationController) er
>  	deploymentPod, err := c.podClient.createPod(deployment.Namespace, podTemplate)
>  	// Retry on error.
>  	if err != nil {
> -		if config, decodeErr := c.decodeConfig(deployment); decodeErr == nil {
> -			c.recorder.Eventf(config, kapi.EventTypeWarning, "FailedCreate", "Error creating deployer pod for %s: %v", deployutil.LabelForDeployment(deployment), err)
> -		} else {
> -			c.recorder.Eventf(deployment, kapi.EventTypeWarning, "FailedCreate", "Error creating deployer pod for %s: %v", deployutil.LabelForDeployment(deployment), err)
> -		}
> +		c.emitDeploymentEvent(deployment, config, kapi.EventTypeWarning, "FailedCreate", fmt.Sprintf("Error creating deployer pod", err))
>
> Do you find other failure modes useful for events here or should I remove emitting altogether?

Contributor Author

Nope

On Sun, May 1, 2016 at 2:21 AM, Clayton Coleman wrote:

> Any of the ones that are transient, probably not, if we generate an event when they would drop out of the retry queue. Do we do that today?

@0xmichalis
Contributor Author

@smarterclayton updated to emit events for transient errors when deployments are about to get dropped from the retry loop

	} else {
		c.recorder.Eventf(deployment, kapi.EventTypeWarning, "FailedCreate", "Error getting existing deployer pod for %s: %v", deployutil.LabelForDeployment(deployment), err)
	}
	c.emitDeploymentEvent(deployment, config, kapi.EventTypeWarning, "FailedCreate", fmt.Sprintf("Error getting existing deployer pod: %v", err))
Contributor

Isn't this a transient error?

Contributor Author

Everything that isn't a fatal error will be retried, so yes.

@smarterclayton
Contributor

One comment, then looks good to me.

@smarterclayton
Contributor

[test]

@0xmichalis
Contributor Author

last comment addressed. [merge]

Update conflicts for deployments are pretty common, since deployments are
handled by three different controllers (kube: rc manager; origin: deployer
pod controller and deployment controller) and their events stay attached
to deploymentconfigs, which may confuse users ("My deployment is running,
but I have this event over there talking about an update conflict"). Since
those errors are retried by the controller, there is no reason to emit
events for them.
@openshift-bot
Contributor

Evaluated for origin test up to 68c7073

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/3731/)

@openshift-bot
Contributor

openshift-bot commented May 10, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5877/) (Image: devenv-rhel7_4158)

@smarterclayton
Contributor

Yum mirror failure

[merge]

On Tue, May 10, 2016 at 6:50 PM, OpenShift Bot wrote:

> continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5874/)

@openshift-bot
Contributor

Evaluated for origin merge up to 68c7073

@openshift-bot openshift-bot merged commit c8f5531 into openshift:master May 11, 2016
@0xmichalis 0xmichalis deleted the bug-1329138 branch May 11, 2016 08:21