[Telemetry] Add telemetry around the time it is taking for grabbing the telemetry stats #132233

Merged (5 commits) on May 16, 2022

Conversation

@Bamieh Bamieh commented May 16, 2022

Summary

Adds new telemetry around how long it takes to collect usage data:

  • Total execution duration to grab all collectors
  • Total execution duration to get the isReady state of each collector
  • Total execution duration to get the fetch objects from each collector
  • Breakdown per collector type with details on the execution duration for fetch and isReady

The overall durations show the overall health of the collection mechanism, while the breakdown objects help diagnose specific collectors and improve upon them.
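As a rough illustration of how such measurements could be gathered, here is a minimal sketch. It is not the code from this PR: `HypotheticalCollector`, `timed`, and `collectWithTimings` are made-up names, the collectors run sequentially for simplicity, and the default clock is `Date.now()` (milliseconds), which may differ from the clock and units the real implementation uses. Only the output field names come from the description above.

```typescript
// Hypothetical, simplified collector shape (an assumption, not the Kibana type).
interface HypotheticalCollector {
  type: string;
  isReady: () => Promise<boolean> | boolean;
  fetch: () => Promise<unknown> | unknown;
}

interface DurationEntry {
  name: string;
  duration: number;
}

interface CollectionTimings {
  total_is_ready_duration: number;
  total_fetch_duration: number;
  total_duration: number;
  is_ready_duration_breakdown: DurationEntry[];
  fetch_duration_breakdown: DurationEntry[];
}

type Clock = () => number;

// Runs fn and reports how long it took according to the supplied clock.
async function timed<T>(
  fn: () => Promise<T> | T,
  now: Clock
): Promise<{ result: T; duration: number }> {
  const start = now();
  const result = await fn();
  return { result, duration: now() - start };
}

// Times isReady for every collector and fetch for the ready ones,
// then aggregates the totals and the per-collector breakdowns.
async function collectWithTimings(
  collectors: HypotheticalCollector[],
  now: Clock = () => Date.now()
): Promise<CollectionTimings> {
  const isReadyBreakdown: DurationEntry[] = [];
  const fetchBreakdown: DurationEntry[] = [];

  for (const collector of collectors) {
    const ready = await timed(() => collector.isReady(), now);
    isReadyBreakdown.push({ name: collector.type, duration: ready.duration });
    if (!ready.result) continue; // collectors that are not ready are never fetched

    const fetched = await timed(() => collector.fetch(), now);
    fetchBreakdown.push({ name: collector.type, duration: fetched.duration });
  }

  const sum = (entries: DurationEntry[]) =>
    entries.reduce((total, entry) => total + entry.duration, 0);
  const totalIsReady = sum(isReadyBreakdown);
  const totalFetch = sum(fetchBreakdown);

  return {
    total_is_ready_duration: totalIsReady,
    total_fetch_duration: totalFetch,
    total_duration: totalIsReady + totalFetch,
    is_ready_duration_breakdown: isReadyBreakdown,
    fetch_duration_breakdown: fetchBreakdown,
  };
}
```

Passing the clock as a parameter keeps the sketch testable with a fake clock; the totals are plain sums of the breakdown entries.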

Why is this in telemetry and not in CI?

Adding limits and checks in CI is a good idea for catching early issues. Collecting these metrics via telemetry will also help us identify bottlenecks against real-world use cases from Kibanas in the wild.

Changes

  • Add the following fields to the usage_collector_stats collector:
    • total_is_ready_duration
    • total_fetch_duration
    • total_duration
    • is_ready_duration_breakdown
    • fetch_duration_breakdown
  • Refactor the usage_collector_stats to a Collector with a proper schema, for a more ergonomic codebase and to include the schema automatically into the schema files.
  • Update the telemetry_check to grab the usage_collector_stats collector schema and verify it.
  • Add unit tests for the usage_collector_stats collector
  • Add a README about the collector

What does the usage collector stats look like?

"usage_collector_stats": {
  "not_ready": {
    "count": 1,
    "names": [
      "cloud_provider"
    ]
  },
  "not_ready_timeout": {
    "count": 0,
    "names": []
  },
  "succeeded": {
    "count": 54,
    "names": [
      "task_manager",
      "ui_counters",
      "usage_counters",
      "kibana_stats",
      "kibana",
      ...
    ]
  },
  "failed": {
    "count": 0,
    "names": []
  },
  "total_is_ready_duration": 0.07500024700000003,
  "total_fetch_duration": 0.35939233100000006,
  "total_duration": 0.4343925780000001,
  "is_ready_duration_breakdown": [
    { "name": "task_manager", "duration": 0.001828041 },
    { "name": "ui_counters", "duration": 0.001790625 },
    { "name": "usage_counters", "duration": 0.001778125 },
    { "name": "kibana_stats", "duration": 0.001764709 },
    { "name": "kibana", "duration": 0.001748917 },
    ...
  ],
  "fetch_duration_breakdown": [
    { "name": "task_manager", "duration": 0.011157708 },
    { "name": "ui_counters", "duration": 0.011002625 },
    { "name": "usage_counters", "duration": 0.009945833 },
    { "name": "kibana_stats", "duration": 0.009424458 },
    { "name": "kibana", "duration": 0.009406416 },
    ...
  ]
}
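
To show how the breakdown objects might help diagnose specific collectors, here is a small sketch that ranks collectors by duration. `slowestCollectors` is a hypothetical helper, not part of this PR, and it assumes each breakdown is a list of `{ name, duration }` entries as in the sample above. The values are copied from that sample.

```typescript
interface DurationEntry {
  name: string;
  duration: number;
}

// Returns the `limit` slowest entries, slowest first, without mutating the input.
function slowestCollectors(breakdown: DurationEntry[], limit: number): DurationEntry[] {
  return [...breakdown].sort((a, b) => b.duration - a.duration).slice(0, limit);
}

// Entries taken from the sample fetch_duration_breakdown (deliberately unordered here).
const sampleFetchBreakdown: DurationEntry[] = [
  { name: 'kibana', duration: 0.009406416 },
  { name: 'task_manager', duration: 0.011157708 },
  { name: 'usage_counters', duration: 0.009945833 },
  { name: 'ui_counters', duration: 0.011002625 },
  { name: 'kibana_stats', duration: 0.009424458 },
];

const topTwo = slowestCollectors(sampleFetchBreakdown, 2);
console.log(topTwo.map((entry) => entry.name)); // [ 'task_manager', 'ui_counters' ]
```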

Notes

Closes #119468

@Bamieh Bamieh added release_note:skip Skip the PR/issue when compiling release notes v8.3.0 labels May 16, 2022
@Bamieh Bamieh requested a review from afharo May 16, 2022 11:46
@Bamieh Bamieh requested review from a team as code owners May 16, 2022 11:46

@afharo afharo left a comment

LGTM! Just one small nit.

src/plugins/telemetry/schema/oss_plugins.json (outdated, resolved)
@Bamieh Bamieh enabled auto-merge (squash) May 16, 2022 13:22
@kibana-ci

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #32 / alerting api integration spaces only Alerting bulkEdit should return mapped params after bulk edit

Metrics [docs]

✅ unchanged

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@Bamieh Bamieh merged commit 7e7f862 into elastic:main May 16, 2022
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label May 16, 2022
@Bamieh Bamieh deleted the telemetry/collection_stats branch May 16, 2022 19:07