[Telemetry] [Monitoring] Only retry fetching usage once monitoring bulk upload is successful #54309

Merged 3 commits on Jan 9, 2020

Conversation

Member

@Bamieh Bamieh commented Jan 8, 2020

The bulk uploader in monitoring attempts to bulk insert data into Elasticsearch every 10 seconds (defined by the flag xpack.monitoring.kibana.collection.interval).

To avoid performance issues, we have throttled fetching telemetry usage data to once every 24 hours in the bulk uploader when monitoring is enabled.

The current behavior is to keep fetching and trying to insert usage data on every tick until the insert into ES succeeds. Once it succeeds, usage is fetched only once every 24 hours.
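A minimal sketch of the throttle described above, as it behaved before this change. The names `USAGE_INTERVAL_MS` and `shouldFetchUsage` are hypothetical and chosen for illustration, and `lastFetchWithUsage` is borrowed from the debug log message quoted later in this conversation. This is not the actual bulk uploader code.

```javascript
// Illustrative only: hypothetical names, not the Kibana bulk uploader source.
const USAGE_INTERVAL_MS = 24 * 60 * 60 * 1000; // 24 hours

// lastFetchWithUsage: ms timestamp of the last successful upload that included
// usage data, or null if there has been none (or the last upload failed).
function shouldFetchUsage(lastFetchWithUsage, now) {
  if (lastFetchWithUsage === null) {
    return true; // no successful upload recorded, so usage is fetched again
  }
  return now - lastFetchWithUsage >= USAGE_INTERVAL_MS;
}

// With no successful upload recorded, every tick fetches usage.
console.log(shouldFetchUsage(null, 10000)); // true
```

Under these assumptions, a timestamp that is never set means usage is fetched on every 10-second tick, which is the behavior this PR addresses.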

When monitoring is not enabled, the bulk uploader keeps retrying, since ES returns ignored: true (the index does not exist). This marks the operation as unsuccessful, so usage is fetched again on the next tick.
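The retry loop can be sketched roughly as follows. Only the `ignored: true` flag comes from the description above; the rest of the response shape and both function names are assumptions made for illustration.

```javascript
// Illustrative only: apart from `ignored: true`, the response shape is assumed.
function isUploadSuccessful(response) {
  return response != null && response.ignored !== true;
}

// Pre-fix bookkeeping (hypothetical): a failed upload clears the timestamp,
// so the next tick fetches usage again.
function nextLastFetchWithUsage(response, fetchedUsage, previous, now) {
  if (!isUploadSuccessful(response)) {
    return null;
  }
  return fetchedUsage ? now : previous;
}

// Monitoring disabled: the timestamp is cleared on every tick.
console.log(nextLastFetchWithUsage({ ignored: true }, true, null, 10000)); // null
```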

This affects all 7.x branches and master. It was discovered while running a backport against the 7.5 branch (#54055).

To improve performance when monitoring is not enabled, we can start fetching usage data only once the bulk uploader gets a success on the bulk insert from ES.

The tiny downside to this approach is that we will not get usage data on the first successful insert after enabling monitoring. We will get this data on the second tick (in less than 20 seconds).
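The proposed behavior and its trade-off can be illustrated with a small synchronous simulation. This is a sketch under stated assumptions, not the code merged in this PR: `createUploaderSketch`, `uploadHasSucceeded`, and the callback signatures are hypothetical, the real uploader is asynchronous, and resetting the success flag on a failed upload is an assumption.

```javascript
// Illustrative only: hypothetical names, not the Kibana bulk uploader source.
const USAGE_INTERVAL_MS = 24 * 60 * 60 * 1000; // 24 hours

function createUploaderSketch(upload, fetchUsage) {
  let uploadHasSucceeded = false; // set after a successful bulk insert
  let lastFetchWithUsage = null;  // ms timestamp of last upload that included usage

  return function tick(now) {
    const intervalElapsed =
      lastFetchWithUsage === null || now - lastFetchWithUsage >= USAGE_INTERVAL_MS;
    // Usage is fetched only after a previous upload has succeeded.
    const withUsage = uploadHasSucceeded && intervalElapsed;

    const payload = { stats: {} };
    if (withUsage) {
      payload.usage = fetchUsage();
    }

    const ok = upload(payload);
    if (ok) {
      uploadHasSucceeded = true;
      if (withUsage) {
        lastFetchWithUsage = now;
      }
    } else {
      // Assumption: a failed upload resets both pieces of state.
      uploadHasSucceeded = false;
      lastFetchWithUsage = null;
    }
    return { ok, withUsage };
  };
}

// Simulate two ticks with monitoring disabled, then enable it.
let monitoringEnabled = false;
let usageFetches = 0;
const tick = createUploaderSketch(
  () => monitoringEnabled,
  () => {
    usageFetches += 1;
    return {};
  }
);

tick(0);
tick(10000);
monitoringEnabled = true;
const first = tick(20000);  // first successful insert, no usage yet
const second = tick(30000); // second tick includes usage

console.log(usageFetches, first.withUsage, second.withUsage); // 1 false true
```

In this simulation no usage is fetched while uploads fail, and usage arrives one tick after the first successful insert, matching the trade-off described above.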

CC @aaronjcaldwell

Closes #54294

@Bamieh Bamieh added Feature:Telemetry v8.0.0 release_note:skip Skip the PR/issue when compiling release notes v7.6.0 Team:Pulse labels Jan 8, 2020
@Bamieh Bamieh requested review from kindsun and a team January 8, 2020 22:19
@elasticmachine
Contributor

Pinging @elastic/pulse (Team:Pulse)

@Bamieh Bamieh added the bug Fixes for quality problems that affect the customer experience label Jan 8, 2020
…_uploader.js

Co-Authored-By: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
Contributor

@TinaHeiligers TinaHeiligers left a comment


Code looks fine for what it's doing.
Offline, @Bamieh and I discussed the fix and agreed it's fine as a work-around for not easily being able to detect programmatically if monitoring in Kibana is enabled or not. Optimal solutions will be handled elsewhere.

I ran the code locally with the --verbose flag both before and after enabling Monitoring, and verified that it works.
Before enabling Monitoring, we get the debug log "Resetting lastFetchWithUsage because uploading to the cluster was not successful." After enabling Monitoring, we get "Uploaded bulk stats payload to the local cluster".
LGTM.

@kibanamachine
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Contributor

@kindsun kindsun left a comment


Tested this fix locally, merged into #54055, and it appears to work as expected. Telemetry fetches once initially and then as needed. lgtm!

@Bamieh Bamieh merged commit a27c4c4 into elastic:master Jan 9, 2020
@Bamieh Bamieh deleted the telemetry/monitoring_bulk_upload_fix branch January 9, 2020 00:56
Bamieh added a commit to Bamieh/kibana that referenced this pull request Jan 9, 2020
…tic#54309)

* fix interval and add tests

* Update x-pack/legacy/plugins/monitoring/server/kibana_monitoring/bulk_uploader.js

Co-Authored-By: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>

Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
gmmorris added a commit to gmmorris/kibana that referenced this pull request Jan 9, 2020
* master: (23 commits)
  [Vis: Default editor] Reactify the timelion editor (elastic#52990)
  [Discover] fix histogram min interval (elastic#53979)
  [Telemetry] [Monitoring] Only retry fetching usage once monito… (elastic#54309)
  [docs][APM] Add runtime index config documentation (elastic#53907)
  [SIEM] Detection engine timeline (elastic#53783)
  Filter scripted fields preview field list to source fields (elastic#53826)
  Management - New platform api (elastic#52579)
  Reset region and Account when switching inventory (elastic#54287)
  [SIEM] [Case] Case workflow api schema (elastic#51535)
  Code coverage setup on CI (elastic#49003)
  [ML] DF Analytics Results: adds link to docs (elastic#54189)
  Update schemas boolean, byteSize, and duration to coerce strings (elastic#54177)
  [Metrics UI] Pass relevant shouldAllowEdit capabilities into SettingsPage (elastic#49781)
  [Canvas] Fixes bugs with autoplay and refresh (elastic#53149)
  [ML] DF Analytics Classification: ensure confusion matrix can be fetched (elastic#53629)
  Fix Vega react eslint errors (elastic#54259)
  Remove non existing codeowners (elastic#54274)
  use correct type (elastic#54244)
  [Dashboard] Removing 100% as dshDashboardViewport height (elastic#54263)
  add `examples/` to no-restricted-path config (elastic#54252)
  ...
@timroes timroes added v7.5.2 and removed 7.5.2 labels Jan 10, 2020
@lukeelmers lukeelmers added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Oct 1, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

Labels
bug Fixes for quality problems that affect the customer experience Feature:Telemetry release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.5.2 v7.6.0 v8.0.0
Projects
No open projects
Telemetry 7.5
Awaiting triage
Development

Successfully merging this pull request may close these issues.

[Telemetry] [Monitoring] Only retry fetching usage once monitoring bulk upload is successful
7 participants