This repository has been archived by the owner on Jan 12, 2024. It is now read-only.

Improve handling of SLI queries that don't produce a single value #728

Closed
grabnerandi opened this issue Mar 3, 2022 · 3 comments · Fixed by #733
Labels: feature request

Comments

@grabnerandi
Contributor

Currently, when a single metric query fails to deliver any data, the dynatrace-service returns the message "Metrics query result has no data". This is correct, but it doesn't tell the user which metric query returned no data.

This is the only message shown in the bridge to indicate why the evaluation failed. To troubleshoot the issue, the user needs to dig deep into the payload of the get-sli.finished event.

To fix this, I suggest simply listing the names of the failed SLIs in the error message. Here is an example payload with the new message:

{
  "data": {
    "get-sli": {
      "indicatorValues": [
        {
          "metric": "Performance_SLO",
          "success": true,
          "value": 99.48424304709904
        },
        {
          "message": "Metrics query result has no data",
          "metric": "problems",
          "success": false,
          "value": 0
        },
        {
          "metric": "response_time_p90",
          "success": true,
          "value": 1238.0154590423788
        },
        {
          "metric": "response_time_p75",
          "success": true,
          "value": 639.2749136467561
        },
        {
          "message": "Metrics query result has no data",
          "metric": "request_error_rate",
          "success": false,
          "value": 0
        }
      ]
    },
    "message": "Metrics query for request_error_rate, problems resulted in no data",
    "project": "dynatrace",
    "result": "fail",
    "service": "myservice",
    "stage": "quality-gate",
    "status": "succeeded"
  },
  "source": "dynatrace-service",
  "specversion": "1.0",
  "type": "sh.keptn.event.get-sli.finished"
}
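
The summarized message in the payload above could be derived from the individual indicators roughly as follows. This is only a minimal Python sketch, not the dynatrace-service implementation: the function name `summarize_no_data` is hypothetical, and it assumes a failed indicator is recognizable by `success` being false together with the existing "Metrics query result has no data" message.

```python
NO_DATA = "Metrics query result has no data"


def summarize_no_data(indicator_values):
    """Return one message naming every SLI whose query had no data, or None."""
    failed = [
        iv["metric"]
        for iv in indicator_values
        if not iv.get("success") and iv.get("message") == NO_DATA
    ]
    if not failed:
        return None
    return "Metrics query for " + ", ".join(failed) + " resulted in no data"


# Shortened version of the indicatorValues from the example payload.
indicator_values = [
    {"metric": "Performance_SLO", "success": True, "value": 99.48424304709904},
    {"metric": "problems", "success": False, "value": 0, "message": NO_DATA},
    {"metric": "request_error_rate", "success": False, "value": 0, "message": NO_DATA},
]

summary = summarize_no_data(indicator_values)
print(summary)
```

The names appear in the order of the indicators, so the ordering may differ from the hand-written example message.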
@arthurpitman
Collaborator

@grabnerandi I guess the simplest fix would be to include the metric name in the indicator's message field, which would at least propagate up to the message of the event. For this example, it would become "Metrics query result for problems has no data, Metrics query result for request_error_rate has no data".
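
A rough Python sketch of that idea, for illustration only: the helper names `name_metric_in_message` and `event_message` are hypothetical, and joining the indicator messages with ", " is an assumption based on the example text above, not a confirmed detail of how the event message is built.

```python
NO_DATA = "Metrics query result has no data"


def name_metric_in_message(indicator):
    """Return a copy of the indicator whose no-data message names the metric."""
    if indicator.get("success") or indicator.get("message") != NO_DATA:
        return indicator
    message = f"Metrics query result for {indicator['metric']} has no data"
    return {**indicator, "message": message}


def event_message(indicators):
    """Join the messages of all failed indicators (assumed separator: ', ')."""
    return ", ".join(
        iv["message"]
        for iv in indicators
        if not iv.get("success") and iv.get("message")
    )


indicators = [
    {"metric": "Performance_SLO", "success": True, "value": 99.48424304709904},
    {"metric": "problems", "success": False, "value": 0, "message": NO_DATA},
    {"metric": "request_error_rate", "success": False, "value": 0, "message": NO_DATA},
]

renamed = [name_metric_in_message(iv) for iv in indicators]
print(event_message(renamed))
```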

@arthurpitman
Collaborator

Due to the effect this issue will have on the code paths and tests, I'm broadening the scope to include the following functionality:

  • Set the result field of the sh.keptn.event.get-sli.finished event to warning if and only if indicators have success set to false and the sole cause in each case is a query not returning a single value. This corresponds to extending the code to implement "case B" described in #704 (Improve the handling of `sh.keptn.event.get-sli.triggered` events where SLIs don't produce a value).
  • Harmonize the message fields of individual elements in indicatorValues to make them consistent for the dashboard and sli.yaml use cases.
  • Group the indicator messages to form a summarized message in the get-sli event data.
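
The first and third points could look roughly like the Python sketch below. It is a guess at the intended behaviour, not the actual change: the `cause` field and its `no_single_value` marker are hypothetical (the thread does not say how the failure reason is recorded), the `pass` and `fail` outcomes for the remaining cases are assumptions, and the grouped message format is invented for illustration.

```python
from collections import defaultdict

# Hypothetical marker for "query did not return a single value".
NO_SINGLE_VALUE = "no_single_value"


def overall_result(indicators):
    """warning only if every failure is a no-single-value failure."""
    failed = [iv for iv in indicators if not iv["success"]]
    if not failed:
        return "pass"  # assumed outcome when nothing failed
    if all(iv.get("cause") == NO_SINGLE_VALUE for iv in failed):
        return "warning"
    return "fail"  # assumed outcome for any other kind of failure


def grouped_message(indicators):
    """Group failed metrics under their shared message text."""
    groups = defaultdict(list)
    for iv in indicators:
        if not iv["success"]:
            groups[iv.get("message", "unknown error")].append(iv["metric"])
    return "; ".join(f"{msg}: {', '.join(names)}" for msg, names in groups.items())


indicators = [
    {"metric": "response_time_p90", "success": True, "value": 1238.0},
    {"metric": "problems", "success": False, "cause": NO_SINGLE_VALUE,
     "message": "Metrics query result has no data"},
    {"metric": "request_error_rate", "success": False, "cause": NO_SINGLE_VALUE,
     "message": "Metrics query result has no data"},
]

print(overall_result(indicators))
print(grouped_message(indicators))
```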

@arthurpitman changed the title from "Better error message in case get-sli failed due to a single problematic metric" to "Improve handling of SLI queries that don't produce a single value" on Mar 4, 2022
@grabnerandi
Contributor Author

lgtm
