fix(logging): add fabulous logging for when pipelines fail to trigger #555
The impetus for this change is to be able to better alert on and understand pipeline trigger failures. We've been bitten by trigger failures that went undiscovered (sometimes for days) until our users complained.
With this change we can reliably alert on the `orca.trigger.errors` metric, which is incremented when kicking off a pipeline fails (unlike the previous metric `orca.error`, which was sometimes triggered erroneously).
When triggering does fail, this change will log, in much more detail, why the trigger failed, as well as log the full payload submitted to `orca` for quick and easy debugging/reproducibility of the issue.
Bonus 1: This change (mostly) removes the reliance on Rx and replaces it with an executor to trigger pipelines.
Bonus 2: Fix up calls to `front50` to allow them to be anonymous.
@@ -27,7 +27,6 @@ dependencies {
implementation "com.netflix.spinnaker.kork:kork-core"
implementation "com.netflix.spinnaker.kork:kork-artifacts"
implementation "com.netflix.spinnaker.kork:kork-web"
implementation "io.reactivex:rxjava"
👏
Honestly, I am still not convinced that the
So I don't see the point... but I am going to keep the executor for now (to maintain behavior as before)

log.warn("Retrying pipeline trigger for {}", pipeline);
log.warn("Retrying pipeline trigger (attempt {}/{}) for {}", attempts, retryCount, pipeline);
try {
attempts++;
return orca.trigger(pipeline);
} catch (RetrofitError e) {
would be nice to log e
well, we log the last one (i.e. if all retries fail). otherwise any transitory errors make it noisy... but i could use splainer style and only log it all when the last one fails?
may be a matter of preference but I'd rather see all exceptions as they happen rather than just the last one. They each are their own event, me thinks.
i will put them in as warnings, then it's clear it's not fatal but we still see the issues. will also add an `orca.trigger.retry` metric
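The approach agreed on in this thread (log every failed attempt as a warning, count retries separately from final failures) could look roughly like the sketch below. It is not the code from this PR: `TriggerRetrySketch`, `triggerWithRetries`, and the two `AtomicInteger` counters are hypothetical stand-ins for the real trigger call and for the `orca.trigger.retry` and `orca.trigger.errors` metrics, and it assumes `retryCount` means the number of retries allowed after the first attempt.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.logging.Logger;

class TriggerRetrySketch {
  private static final Logger log = Logger.getLogger("TriggerRetrySketch");

  // Hypothetical stand-ins for the orca.trigger.retry and orca.trigger.errors metrics.
  static final AtomicInteger retryCounter = new AtomicInteger();
  static final AtomicInteger errorCounter = new AtomicInteger();

  // Runs `call`, retrying up to `retryCount` times after the first failure.
  // Each failure that will be retried is logged as a warning and counted as a retry;
  // only the final failure is counted as an error and rethrown to the caller.
  static <T> T triggerWithRetries(Supplier<T> call, int retryCount, String pipeline) {
    int attempts = 0;
    while (true) {
      try {
        attempts++;
        return call.get();
      } catch (RuntimeException e) {
        if (attempts > retryCount) {
          errorCounter.incrementAndGet();
          throw e;
        }
        retryCounter.incrementAndGet();
        log.warning(String.format(
            "Retrying pipeline trigger (attempt %d/%d) for %s: %s",
            attempts, retryCount, pipeline, e));
      }
    }
  }
}
```

Keeping the two counters separate means an alert on the error counter fires only when all retries are exhausted, while the retry counter still surfaces transient problems.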
dreynaud likes
public enum TriggerSource {
SCHEDULER,
MISSEDSCHEDULER,
EVENT
might be interesting to have a different value for the different types of events
as in jenkins vs. docker, etc?
yep
added to logOrcaErrorMetric
I APPROVE THIS
String orcaResponse = "N/A";

if (e.getResponse() != null && e.getResponse().getBody() != null) {
new String(((TypedByteArray) e.getResponse().getBody()).getBytes());
should this be assigned to something? like `orcaResponse`?
um... yes. thank you! i refactored this and messed up
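For reference, the fix discussed here amounts to assigning the decoded body to `orcaResponse` instead of discarding it. A minimal sketch of that shape follows. It assumes the response body has already been reduced to a `byte[]` (null when absent), so it does not depend on Retrofit; `OrcaResponseSketch` and `describeResponse` are hypothetical names, and the explicit UTF-8 charset is an addition that the snippet under review does not have.

```java
import java.nio.charset.StandardCharsets;

class OrcaResponseSketch {
  // `body` stands in for the bytes of the error response body; null means there was none.
  static String describeResponse(byte[] body) {
    String orcaResponse = "N/A";
    if (body != null) {
      // Assign the decoded body; the reviewed snippet built the String but dropped it.
      orcaResponse = new String(body, StandardCharsets.UTF_8);
    }
    return orcaResponse;
  }
}
```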
@ezimanyi you may be interested in this!
…lines fail to trigger
…ution previous change (spinnaker#555) moved orca invocation to an executor thread without propagating MDC context.
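The problem this follow-up commit describes (work handed to an executor thread no longer sees the caller's logging context) is commonly handled by capturing the context on the submitting thread and restoring it inside the task. The sketch below illustrates that pattern only and is not the code from the commit. To stay self-contained it uses a plain `ThreadLocal` map named `CONTEXT` as a stand-in for SLF4J's MDC; `ContextPropagationSketch` and `withCallerContext` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class ContextPropagationSketch {
  // Stand-in for a thread-local logging context such as SLF4J's MDC.
  static final ThreadLocal<Map<String, String>> CONTEXT =
      ThreadLocal.withInitial(HashMap::new);

  // Wraps `task` so that it runs with a copy of the context of the thread that
  // called this method, then puts the worker thread's own context back.
  static Runnable withCallerContext(Runnable task) {
    // Captured here, on the submitting thread.
    Map<String, String> captured = new HashMap<>(CONTEXT.get());
    return () -> {
      // From here on we are on the executor thread.
      Map<String, String> previous = CONTEXT.get();
      CONTEXT.set(new HashMap<>(captured));
      try {
        task.run();
      } finally {
        CONTEXT.set(previous);
      }
    };
  }
}
```

A caller would submit `withCallerContext(task)` to the executor instead of `task` itself. Restoring the previous context in `finally` matters for pooled threads, which would otherwise carry one request's context into the next task.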