Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APM] Add error grouping key filter in error count rule type #155410

Merged
merged 7 commits into from
Apr 21, 2023

Conversation

kpatticha
Copy link
Contributor

@kpatticha kpatticha commented Apr 20, 2023

part of #152329

Introduces the error grouping key filter in the error count rule type

Screen.Recording.2023-04-20.at.15.58.47.mov

@kpatticha kpatticha requested a review from a team as a code owner April 20, 2023 14:03
@botelastic botelastic bot added the Team:APM All issues that need APM UI Team support label Apr 20, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/apm-ui (Team:APM)

@apmmachine
Copy link
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
apm 3.4MB 3.4MB +2.9KB
Unknown metric groups

ESLint disabled line counts

id before after diff
securitySolution 394 397 +3

Total ESLint disabled count

id before after diff
securitySolution 474 477 +3

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@sorenlouv
Copy link
Member

/oblt-deploy

@sorenlouv
Copy link
Member

sorenlouv commented Apr 21, 2023

@kpatticha With your synthtrace data the error grouping key is a somewhat readable string.

image

However, it will always be a completely unreadable hash. I've updated the synthtrace scenarios in #151280 so you can see what it will really look like if you pull in main.

My point is: I'm not sure if we should expect users to select an error group from the hash alone. We may want to display the short hash together with the error message like:

02fc6: ValueError: Cannot use None as a query value.

We already do that on errors table

image

If this is difficult with the existing suggestion component I'm good with postponing this for later.

Copy link
Member

@sorenlouv sorenlouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Just a comment about how we display the error group

@kpatticha
Copy link
Contributor Author

kpatticha commented Apr 21, 2023

My point is: I'm not sure if we should expect users to select an error group from the hash alone. We may want to display the short hash together with the error message like:

oh, I see! Listing the hash alone is not so user-friendly but the error message it's not guaranteed it's the same for all error events.

For example, error.grouping_key: d6815ce0dd6439c5b9dc1f5afe193be2 can have error messages like

  1. 13 INTERNAL: failed to charge card: could not charge the card: rpc error: code = Unknown desc = Credit card info is invalid
  2. 13 INTERNAL: timeout processing order
  3. 14 UNAVAILABLE: failed to connect to all addresses
  4. ....
Fetching error messages for the error.grouping_key: d6815ce0dd6439c5b9dc1f5afe193be2
GET *apm*/_search
{
"track_total_hits": false,
"size": 0,
"query": {
  "bool": {
    "filter": [
      {
        "exists": {
          "field": "error.grouping_key"
        }
      },
      {
        "term": {
          "error.grouping_key": "d6815ce0dd6439c5b9dc1f5afe193be2"
        }
      }
    ]
  }
},
"aggs": {
  "grouping_keys": {
    "terms": {
      "field": "error.grouping_key",
      "size": 10
    }
  },
  "names": {
    "terms": {
      "field": "error.grouping_name",
      "size": 10
    }
  }
}
}
Results
{
"took": 1102,
"timed_out": false,
"_shards": {
  "total": 16,
  "successful": 16,
  "skipped": 0,
  "failed": 0
},
"hits": {
  "max_score": null,
  "hits": []
},
"aggregations": {
  "names": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "14 UNAVAILABLE: failed to connect to all addresses",
        "doc_count": 1151
      },
      {
        "key": "13 INTERNAL: timeout processing order",
        "doc_count": 51
      },
      {
        "key": "13 INTERNAL: failed to charge card: could not charge the card: rpc error: code = Unknown desc = Credit card info is invalid",
        "doc_count": 8
      },
      {
        "key": "13 INTERNAL: failed to charge card: could not charge the card: rpc error: code = Code(401) desc = Your credit card (ending 0004) expired on 1/19",
        "doc_count": 4
      },
      {
        "key": "13 INTERNAL: failed to charge card: could not charge the card: rpc error: code = Code(401) desc = Your credit card (ending 0454) expired on 1/19",
        "doc_count": 3
      }
    ]
  },
  "grouping_keys": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "d6815ce0dd6439c5b9dc1f5afe193be2",
        "doc_count": 1217
      }
    ]
  }
}
}

in such cases, which one should we display?

In the error details pages we seem to display theerror.grouping_name with the most occurancies
Screenshot 2023-04-21 at 11 49 17

@sorenlouv
Copy link
Member

sorenlouv commented Apr 21, 2023

For example, error.grouping_key: d6815ce0dd6439c5b9dc1f5afe193be2 can have error messages like

Good point. I wonder if people want the ability to filter on error.grouping_name in addition to error.grouping_key. Either way, for now, let's just keep it as-is. We can always improve later.

For grouping I think we should have error.grouping_key, error.grouping_name and transaction.name as options (cc @benakansara )

@kpatticha kpatticha merged commit d401b44 into elastic:main Apr 21, 2023
7 checks passed
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Apr 21, 2023
ApmRuleType.ErrorCount,
serviceName,
environment,
ruleParams.errorGroupingKey,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kpatticha do we need ruleParams.errorGroupingKey to be added in alert ID? It will always be same value for a given rule as it is coming from the filter. Is it ok if we keep this implementation?

Copy link
Contributor Author

@kpatticha kpatticha Apr 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok if we keep this implementation?

@benakansara Yes, let's keep what you've implemented regarding the IDs.

do we need ruleParams.errorGroupingKey to be added in alert ID?
No, we don't since we're planning to add the error.grouping_key as a grouping field

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:enhancement Team:APM All issues that need APM UI Team support v8.8.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants