Bug 1329138: stop emitting events on update conflicts #8652
openshift-bot merged 1 commit into openshift:master from 0xmichalis:bug-1329138
Conversation
LGTM, we could use way more events across the board

@ironcladlou what worries me here is event flooding on the dc

Why do we emit an event on conflicts?

Not sure, @ironcladlou?

We shouldn't do that anymore; it's a transient error.

Updated to stop emitting events on update failures.

Maybe I missed it, where are we ignoring conflict errors so as not to send events?

https://github.com/openshift/origin/pull/8652/files#diff-fa925ff7d2acc462649ccd349e76d981L206
} else {
	c.recorder.Eventf(deployment, kapi.EventTypeWarning, "FailedCreate", "Error creating deployer pod for %s: %v", deployutil.LabelForDeployment(deployment), err)
}
c.emitDeploymentEvent(deployment, config, kapi.EventTypeWarning, "FailedCreate", fmt.Sprintf("Error creating deployer pod", err))
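The diff above replaces two `Eventf` branches with a single `emitDeploymentEvent` helper that attaches the event to the deployment config when one could be decoded, and falls back to the deployment otherwise. A minimal self-contained sketch of that shape, with stub types standing in for `kapi.ReplicationController` and the deployment config (all names here are illustrative, not the real controller API):

```go
package main

import "fmt"

// Hypothetical stand-ins for the real API objects; the actual helper
// takes a *kapi.ReplicationController and a decoded deployment config.
type deployment struct{ name string }
type config struct{ name string }

// emitDeploymentEvent sketches the helper's dispatch rule: prefer the
// config as the event's object when it is available, otherwise use the
// deployment. It returns the formatted event line instead of calling a
// Kubernetes EventRecorder, so the example stays self-contained.
func emitDeploymentEvent(d deployment, c *config, eventType, reason, msg string) string {
	if c != nil {
		return fmt.Sprintf("event on config %s: %s %s: %s", c.name, eventType, reason, msg)
	}
	return fmt.Sprintf("event on deployment %s: %s %s: %s", d.name, eventType, reason, msg)
}

func main() {
	// With a decodable config, the event lands on the config.
	fmt.Println(emitDeploymentEvent(deployment{"frontend-1"}, &config{"frontend"},
		"Warning", "FailedCreate", "Error creating deployer pod"))
	// Without one, it falls back to the deployment itself.
	fmt.Println(emitDeploymentEvent(deployment{"frontend-1"}, nil,
		"Warning", "FailedCreate", "Error creating deployer pod"))
}
```

This mirrors why the consolidation matters for the bug: a single helper gives one place to decide which object carries the event.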
You can conflict here
I am ok with not touching this until we really need to.
Not sure what you mean.
An event of the form "you got a conflict" is not something we should ever be sending, unless we are completely giving up. We should only emit events when the failure is meaningful to an end user, or when, because of the conflict, we will not retry for a very long time. Neither of those applies here, and given the potential for contention, we are generating meaningless events for end users. Those events reduce comprehension and debuggability, not improve it.
So in general, the only conflict error that should result in an event being sent is when we drop an item from the retry queue until the next sync interval due to too many conflicts.
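The policy described above can be sketched as a small retry loop: conflicts are retried silently, a non-conflict failure is reported immediately, and an event fires only when the item is dropped from the retry queue. This is a minimal illustration, assuming a sentinel conflict error; a real controller would use `kerrors.IsConflict` and a workqueue rather than the stubs below.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for an optimistic-concurrency (409 Conflict)
// error; the real check would be kerrors.IsConflict(err).
var errConflict = errors.New("update conflict")

const maxRetries = 3

// handleWithRetry retries fn on conflict. No event is emitted for
// transient conflicts along the way; only a meaningful failure or
// dropping the item after too many conflicts produces one.
func handleWithRetry(fn func() error, emit func(msg string)) {
	for i := 0; i < maxRetries; i++ {
		err := fn()
		if err == nil {
			return // success: the transient conflicts never surface
		}
		if !errors.Is(err, errConflict) {
			emit(fmt.Sprintf("FailedUpdate: %v", err)) // meaningful to the user
			return
		}
	}
	// Giving up until the next sync interval: the one conflict case
	// that does warrant an event.
	emit("FailedRetry: dropping item after repeated update conflicts")
}

func main() {
	// Always conflicting: exactly one event, at the point we give up.
	handleWithRetry(func() error { return errConflict }, func(msg string) {
		fmt.Println(msg)
	})
}
```

The key design point is that the emit callback sits outside the retry loop's happy path, so contention alone can never flood the object with events.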
Do you find other failure modes useful for events here or should I remove emitting altogether?
Any of the ones that are transient, probably not, if we generate an event
when they would drop out of the retry queue. Do we do that today?
Nope
@smarterclayton updated to emit events for transient errors when deployments are about to get dropped from the retry loop
} else {
	c.recorder.Eventf(deployment, kapi.EventTypeWarning, "FailedCreate", "Error getting existing deployer pod for %s: %v", deployutil.LabelForDeployment(deployment), err)
}
c.emitDeploymentEvent(deployment, config, kapi.EventTypeWarning, "FailedCreate", fmt.Sprintf("Error getting existing deployer pod: %v", err))
Isn't this a transient error?
Everything that isn't a fatal error will be retried, so yes.
One comment, then looks good to me.

[test]

last comment addressed. [merge]
Update conflicts for deployments are pretty common since they are
handled by three different controllers (kube: rc manager, origin:
deployer pod controller, deployment controller) and their events
stay attached on deploymentconfigs which may confuse users ("My
deployment is running but I have this event over there talking about
an update conflict"). Since those errors are retried by the controller
there is no reason to emit events for them.
Evaluated for origin test up to 68c7073

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/3731/)

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5877/) (Image: devenv-rhel7_4158)

Yum mirror failure [merge]

Evaluated for origin merge up to 68c7073
Update conflicts for deployments are pretty common since they are
handled by three different controllers (kube: rc manager, origin:
deployer pod controller, deployment controller) and their events
stay attached on deploymentconfigs which may confuse users ("My
deployment is running but I have this event over there talking about
an update conflict"). This commit makes it so those events will be
superseded by successful updates.
ref: https://bugzilla.redhat.com/show_bug.cgi?id=1329138
Closes #8630
@ironcladlou @smarterclayton