Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(peering): Log error metrics more better #3647

Merged
merged 3 commits into from
May 3, 2020

Conversation

marchello2000
Copy link
Contributor

Not all exceptions were caught and thus would leak out of the PeeringAgent and weren't logged.
Additionally, I threw a retry on the most egregious SQL query in case it fails due to timeout.

Not all exceptions were caught and thus would leak out of the `PeeringAgent` and weren't logged.
Additionally, threw a retry on the most eggregious SQL query in case it fails due to timeout.
@@ -3,6 +3,10 @@ package com.netflix.spinnaker.orca.peering
import com.netflix.spinnaker.kork.exceptions.SystemException
import com.netflix.spinnaker.kork.sql.routing.withPool
import com.netflix.spinnaker.orca.api.pipeline.models.ExecutionType
import io.github.resilience4j.retry.Retry
import io.github.resilience4j.retry.RetryConfig
import io.vavr.control.Try
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha, TIL

var hadFailures = false
var orchestrationsDeleted = 0
var pipelinesDeleted = 0
try {
Copy link
Contributor

@dreynaud dreynaud May 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed the code duplication here. I would suggest the following:

  data class DeletionResult(val numDeleted: Int, val hadFailures: Boolean)

  private fun delete(executionType: ExecutionType, idsToDelete: List<String>): DeletionResult {
    var numDeleted = 0
    var hadFailures = false
    try {
      numDeleted = destDB.deleteExecutions(executionType, idsToDelete)
      peeringMetrics.incrementNumDeleted(executionType, numDeleted)
    } catch (e: Exception) {
      log.error("Failed to delete some $executionType", e)
      peeringMetrics.incrementNumErrors(executionType)
      hadFailures = true
    }

    return DeletionResult(numDeleted, hadFailures)
  }

And then in peerDeletedExecutions, it becomes simply:

    val (orchestrationsDeleted, orchestrationsHadFailures) = delete(ExecutionType.ORCHESTRATION, orchestrationIdsToDelete)
    val (pipelinesDeleted, pipelinesHadFailures) = delete(ExecutionType.PIPELINE, pipelineIdsToDelete)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, i like it!

@@ -76,7 +75,7 @@ class PeeringAgent(
override fun tick() {
if (dynamicConfigService.isEnabled("pollers.peering", true) &&
dynamicConfigService.isEnabled("pollers.peering.$peeredId", true)) {
peeringMetrics.recordOverallLag() {
peeringMetrics.recordOverallLag {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it the same with and without the parens? I'm confused

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i guess spotless did it.. but yeah, i guess that's kotlin convention: https://kotlinlang.org/docs/reference/lambdas.html#passing-a-lambda-to-the-last-parameter

} catch (e: Exception) {
log.error("Failed to delete some pipelines", e)
log.error("Failed to delete some executions", e)
peeringMetrics.incrementNumErrors(ExecutionType.ORCHESTRATION)
peeringMetrics.incrementNumErrors(ExecutionType.PIPELINE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah hmmm... it wouldn't be terribad to pass an execution type to peerDeletedExecutions and call it twice, right? Then the metrics wouldn't have to lie 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit sucky since there are both pipelines and orchestrations in that deleted_executions table...

@@ -204,4 +210,16 @@ open class MySqlRawAccess(

return persisted
}

private fun <T> withRetry(action: () -> T): T {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that I say that, the annotation approach would change this from returning a function to executing the retry so never mind

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there was also some weird thing that @jonsie and I tried to figure out (the annotation wasn't working somewhere... river maybe?) anyway, it was driving me mad so I didn't want to chance it here :)

@marchello2000 marchello2000 added the ready to merge Approved and ready for merge label May 3, 2020
@mergify mergify bot added the auto merged Merged automatically by a bot label May 3, 2020
@mergify mergify bot merged commit d464708 into spinnaker:master May 3, 2020
KathrynLewis pushed a commit to KathrynLewis/orca that referenced this pull request Jan 31, 2021
* fix(peering): Log error metrics more better

Not all exceptions were caught and thus would leak out of the `PeeringAgent` and weren't logged.
Additionally, threw a retry on the most eggregious SQL query in case it fails due to timeout.

* fixup! fix(peering): Log error metrics more better

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto merged Merged automatically by a bot ready to merge Approved and ready for merge target-release/1.21
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants