fix(peering): Log error metrics more better #3647
Conversation
Not all exceptions were caught, so they would leak out of the `PeeringAgent` without being logged. Additionally, threw a retry around the most egregious SQL query in case it fails due to a timeout.
@@ -3,6 +3,10 @@ package com.netflix.spinnaker.orca.peering
import com.netflix.spinnaker.kork.exceptions.SystemException
import com.netflix.spinnaker.kork.sql.routing.withPool
import com.netflix.spinnaker.orca.api.pipeline.models.ExecutionType
import io.github.resilience4j.retry.Retry
import io.github.resilience4j.retry.RetryConfig
import io.vavr.control.Try
ha, TIL
var hadFailures = false
var orchestrationsDeleted = 0
var pipelinesDeleted = 0
try {
I just noticed the code duplication here. I would suggest the following:
```kotlin
data class DeletionResult(val numDeleted: Int, val hadFailures: Boolean)

private fun delete(executionType: ExecutionType, idsToDelete: List<String>): DeletionResult {
  var numDeleted = 0
  var hadFailures = false

  try {
    numDeleted = destDB.deleteExecutions(executionType, idsToDelete)
    peeringMetrics.incrementNumDeleted(executionType, numDeleted)
  } catch (e: Exception) {
    log.error("Failed to delete some $executionType", e)
    peeringMetrics.incrementNumErrors(executionType)
    hadFailures = true
  }

  return DeletionResult(numDeleted, hadFailures)
}
```
And then in `peerDeletedExecutions`, it becomes simply:
```kotlin
val (orchestrationsDeleted, orchestrationsHadFailures) = delete(ExecutionType.ORCHESTRATION, orchestrationIdsToDelete)
val (pipelinesDeleted, pipelinesHadFailures) = delete(ExecutionType.PIPELINE, pipelineIdsToDelete)
```
thanks, i like it!
@@ -76,7 +75,7 @@ class PeeringAgent(
  override fun tick() {
    if (dynamicConfigService.isEnabled("pollers.peering", true) &&
      dynamicConfigService.isEnabled("pollers.peering.$peeredId", true)) {
      peeringMetrics.recordOverallLag() {
      peeringMetrics.recordOverallLag {
is it the same with and without the parens? I'm confused
yes, i guess spotless did it.. but yeah, i guess that's kotlin convention: https://kotlinlang.org/docs/reference/lambdas.html#passing-a-lambda-to-the-last-parameter
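As a quick illustration of the trailing-lambda rule from the linked docs (the function below is a stand-in, not the real `recordOverallLag` signature): when a lambda is the last argument it can be moved outside the parentheses, and if it is the only argument the parentheses can be dropped entirely.

```kotlin
// Hypothetical function, just to illustrate the syntax in question:
// it takes a single lambda parameter, like recordOverallLag does.
fun recordOverallLag(block: () -> Long) {
    println("lag=${block()}")
}

fun main() {
    recordOverallLag({ 42L })   // lambda passed inside the parentheses
    recordOverallLag() { 42L }  // trailing lambda, empty parens left behind
    recordOverallLag { 42L }    // trailing lambda, parens dropped (what spotless produced)
}
```

All three calls are equivalent; spotless simply removed the redundant empty parentheses.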
} catch (e: Exception) {
  log.error("Failed to delete some pipelines", e)
  log.error("Failed to delete some executions", e)
  peeringMetrics.incrementNumErrors(ExecutionType.ORCHESTRATION)
  peeringMetrics.incrementNumErrors(ExecutionType.PIPELINE)
ah hmmm... it wouldn't be terribad to pass an execution type to `peerDeletedExecutions` and call it twice, right? Then the metrics wouldn't have to lie 😅
it's a bit sucky since there are both pipelines and orchestrations in that `deleted_executions` table...
@@ -204,4 +210,16 @@ open class MySqlRawAccess(
    return persisted
  }

  private fun <T> withRetry(action: () -> T): T {
you can also do this with a retry annotation:
https://github.com/spinnaker/fiat/blob/64021e98f8d55c11a83149dc8aacdb854342c777/fiat-roles/src/main/java/com/netflix/spinnaker/fiat/providers/internal/Front50DataLoader.java#L39
And then make the options configurable/overridable via spring:
https://github.com/spinnaker/fiat/blob/64021e98f8d55c11a83149dc8aacdb854342c777/fiat-roles/src/main/resources/resilience4j-defaults.properties
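For readers unfamiliar with that pattern, here is a rough, hypothetical sketch of the annotation style being suggested. The class, method, and retry-instance names below are invented for illustration and this is not code from this PR or from fiat; it assumes resilience4j's Spring support is on the classpath.

```kotlin
import io.github.resilience4j.retry.annotation.Retry
import org.springframework.stereotype.Component

// Hypothetical Spring component: resilience4j's Spring AOP support wraps the
// annotated method in a retry whose settings come from configuration
// (e.g. resilience4j.retry.instances.peering.maxAttempts=3) rather than code.
// The class and method are open so the proxy can subclass them.
@Component
open class PeeringQueries {

  // "peering" is a made-up retry instance name for this sketch.
  @Retry(name = "peering")
  open fun deletedExecutionIds(): List<String> =
    runSlowSqlQuery() // stand-in for the SQL call that may time out

  private fun runSlowSqlQuery(): List<String> = TODO("placeholder for the real query")
}
```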
now that I say that, the annotation approach would change this from returning a function to executing the retry so never mind
there was also some weird thing that @jonsie and I tried to figure out (the annotation wasn't working somewhere... river maybe?) anyway, it was driving me mad so I didn't want to chance it here :)
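For reference, a minimal sketch of what the programmatic approach can look like with the imports added in this PR (resilience4j `Retry`/`RetryConfig` plus vavr's `Try`). The attempt count, wait duration, and retry name below are illustrative placeholders, not the values used in `MySqlRawAccess`.

```kotlin
import io.github.resilience4j.retry.Retry
import io.github.resilience4j.retry.RetryConfig
import io.vavr.control.Try
import java.time.Duration

// Illustrative sketch only: runs the given action, retrying a few times before
// letting the final failure surface. Attempt count and backoff are placeholders.
private fun <T> withRetry(action: () -> T): T {
  val retryConfig = RetryConfig.custom<T>()
    .maxAttempts(3)
    .waitDuration(Duration.ofMillis(500))
    .build()
  val retry = Retry.of("peering-sql", retryConfig)

  // Decorate the action, run it, and unwrap the result; a failed Try surfaces
  // the underlying exception to the caller.
  return Try.ofSupplier(Retry.decorateSupplier(retry) { action() }).get()
}

// Hypothetical usage: wrap a slow SQL call that may time out.
// val rows = withRetry { runSlowSqlQuery() }
```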