fix(logging): add fabulous logging for when pipelines fail to trigger #555

marchello2000 · 2019-05-20T18:20:38Z

The impetus for this change is to be able to better alert on and understand pipeline trigger failures.
We've been bitten by trigger failures that went undiscovered (sometimes for days) until our users complained.
With this change we can reliably alert on the `orca.trigger.errors` metric, which is incremented when kicking
off a pipeline fails (unlike the previous metric, `orca.error`, which was sometimes triggered erroneously).

When triggering does fail, this change will log, in much more detail, why the trigger failed, as well as
log the full payload submitted to `orca` for quick and easy debugging/reproducibility of the issue.

Bonus 1: This change (mostly) removes the reliance on Rx and replaces it with an executor to
trigger pipelines.

Bonus 2: Fix up calls to `front50` to allow them to be anonymous.
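
For readers skimming the description, here is a rough sketch of the idea: submit the trigger call to an executor, and on failure bump a dedicated error counter and log the full payload. It is not the code in this PR. `TriggerSketch`, `startPipeline`, the `Function`-based client and the `AtomicLong` counter are all hypothetical stand-ins for the real orca client and the `orca.trigger.errors` metric.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Sketch only: every name here is hypothetical, not echo's actual API. */
final class TriggerSketch {
  // Stand-in for the orca.trigger.errors counter.
  final AtomicLong triggerErrors = new AtomicLong();

  // Daemon threads so the sketch never keeps the JVM alive.
  private final ExecutorService executor =
      Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "pipeline-trigger");
        t.setDaemon(true);
        return t;
      });

  // Stand-in for the orca client: takes a payload, returns a response.
  private final Function<String, String> orcaClient;

  TriggerSketch(Function<String, String> orcaClient) {
    this.orcaClient = orcaClient;
  }

  /** Runs the trigger on the executor; completes with true on success, false on failure. */
  CompletableFuture<Boolean> startPipeline(String payload) {
    return CompletableFuture.supplyAsync(() -> {
      try {
        orcaClient.apply(payload);
        return true;
      } catch (RuntimeException e) {
        // Count the failure and log the reason plus the full payload for reproduction.
        triggerErrors.incrementAndGet();
        System.err.println(
            "Failed to trigger pipeline: " + e.getMessage() + ", payload: " + payload);
        return false;
      }
    }, executor);
  }
}
```

Because the counter is only incremented in the failure branch of the trigger call, an alert on it should not fire for unrelated errors, which is the property the description is after.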


dreynaud · 2019-05-20T18:25:56Z

echo-notifications/echo-notifications.gradle

@@ -27,7 +27,6 @@ dependencies {
    implementation "com.netflix.spinnaker.kork:kork-core"
    implementation "com.netflix.spinnaker.kork:kork-artifacts"
    implementation "com.netflix.spinnaker.kork:kork-web"
-    implementation "io.reactivex:rxjava"


marchello2000 · 2019-05-20T18:26:33Z

Honestly, I am still not convinced that the executorService is needed:

- cron triggers already happen on quartz thread
- events are already Rx'd on IOScheduler here

So I don't see the point... but I am going to keep the executor for now (to maintain behavior as before)

dreynaud · 2019-05-20T19:59:24Z

...iggers/src/main/java/com/netflix/spinnaker/echo/pipelinetriggers/orca/PipelineInitiator.java


+      log.warn("Retrying pipeline trigger for {}", pipeline);


log.warn("Retrying pipeline trigger (attempt {}/{}) for {}", attempts, retryCount, pipeline);

dreynaud · 2019-05-20T20:00:11Z

...iggers/src/main/java/com/netflix/spinnaker/echo/pipelinetriggers/orca/PipelineInitiator.java

+      try {
+        attempts++;
+        return orca.trigger(pipeline);
+      } catch (RetrofitError e) {


would be nice to log e

well, we log the last one (i.e. if all retries fail). otherwise any transitory errors make it noisy... but i could use splainer style and only log it all when the last one fails?

may be a matter of preference, but I'd rather see all exceptions as they happen rather than just the last one. They are each their own event, methinks.

i will put them in as warnings, then it's clear it's not fatal but we still see the issues. will also add an orca.trigger.retry metric
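
A minimal sketch of what this thread converges on, assuming a plain retry loop: each failed attempt is recorded as a warning (with the attempt count, as suggested above) and counted, and only the final failure propagates. `RetrySketch`, `withRetry`, the `warnings` list and the `retries` field are hypothetical stand-ins for the real logger and the orca.trigger.retry metric, not the code in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Sketch only: names are hypothetical stand-ins for the logger and metric. */
final class RetrySketch {
  // Stand-in for log.warn(...) output.
  final List<String> warnings = new ArrayList<>();
  // Stand-in for the orca.trigger.retry counter.
  int retries;

  /** Calls the supplier up to maxAttempts times; rethrows the last failure. */
  <T> T withRetry(Supplier<T> call, int maxAttempts) {
    int attempts = 0;
    while (true) {
      attempts++;
      try {
        return call.get();
      } catch (RuntimeException e) {
        if (attempts >= maxAttempts) {
          // Out of attempts: let the caller log this one as an error.
          throw e;
        }
        // Non-fatal: record every intermediate failure as its own warning.
        retries++;
        warnings.add(String.format(
            "Retrying pipeline trigger (attempt %d/%d): %s",
            attempts, maxAttempts, e.getMessage()));
      }
    }
  }
}
```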


dreynaud · 2019-05-20T20:02:18Z

...iggers/src/main/java/com/netflix/spinnaker/echo/pipelinetriggers/orca/PipelineInitiator.java

+  public enum TriggerSource {
+    SCHEDULER,
+    MISSEDSCHEDULER,
+    EVENT


might be interesting to have a different value for the different types of events

as in jenkins vs. docker, etc?

added to logOrcaErrorMetric


dreynaud · 2019-05-21T17:11:02Z

I APPROVE THIS

emjburns · 2019-05-21T17:17:55Z

...iggers/src/main/java/com/netflix/spinnaker/echo/pipelinetriggers/orca/PipelineInitiator.java

+              String orcaResponse = "N/A";
+
+              if (e.getResponse() != null && e.getResponse().getBody() != null) {
+                new String(((TypedByteArray) e.getResponse().getBody()).getBytes());


should this be assigned to something? like orcaResponse?

um... yes. thank you! i refactored this and messed up
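
For reference, the fix implied here is just to assign the decoded body back to `orcaResponse`. The sketch below shows that shape on a plain `byte[]`, so it runs without Retrofit. In the diff the bytes come from `((TypedByteArray) e.getResponse().getBody()).getBytes()`. `ResponseBodySketch` and `bodyOrDefault` are hypothetical names, and the explicit UTF-8 charset is an assumption (the diff uses the platform default).

```java
import java.nio.charset.StandardCharsets;

/** Sketch only: hypothetical helper, not the code in this PR. */
final class ResponseBodySketch {
  /** Returns the response body as text, or "N/A" when there is no body. */
  static String bodyOrDefault(byte[] body) {
    String orcaResponse = "N/A";
    if (body != null) {
      // The original line built the String but discarded it; assign it instead.
      orcaResponse = new String(body, StandardCharsets.UTF_8);
    }
    return orcaResponse;
  }
}
```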

emjburns · 2019-05-21T17:20:56Z

@ezimanyi you may be interested in this!


marchello2000 requested review from robzienert and dreynaud May 20, 2019 18:20

dreynaud reviewed May 20, 2019


marchello2000 added 2 commits May 20, 2019 14:30

fixup! fix(logging): add fabulous logging for when pipelines fail to …

45d6b8f

…trigger

fixup! fixup! fix(logging): add fabulous logging for when pipelines f…

ef0af0d

…ail to trigger

dreynaud self-requested a review May 21, 2019 17:09

dreynaud approved these changes May 21, 2019


emjburns reviewed May 21, 2019


fixup! fixup! fixup! fix(logging): add fabulous logging for when pipe…

51333d9

…lines fail to trigger

marchello2000 merged commit ae23f03 into spinnaker:master May 21, 2019

spinnakerbot added the target-release/1.15 label May 21, 2019

marchello2000 mentioned this pull request May 23, 2019

fix(auth): propagate MDC across the thread boundary for pipeline execution #560

Merged

marchello2000 added a commit that referenced this pull request May 24, 2019

fix(auth): propagate MDC across the thread boundary for pipeline exec…

d2a4076

…ution (#560) previous change (#555) moved orca invocation to an executor thread without propagating MDC context.

marchello2000 deleted the mark/logging_extravaganza branch March 11, 2020 22:35
