[Application Usage] Use Promise.allSettled during rollups #87675

Merged

Conversation

afharo
Member

@afharo afharo commented Jan 7, 2021

Summary

Now that we are running Node v14, we can use Promise.allSettled instead of Promise.all during rollups
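For readers unfamiliar with the difference, here is a minimal sketch of how the two calls behave when one operation fails. `fakeDelete` and `compare` are hypothetical names invented for this illustration and are not part of the PR.

```typescript
// Hypothetical stand-in for a delete request; ids starting with "bad" fail.
async function fakeDelete(id: string): Promise<string> {
  if (id.startsWith('bad')) {
    throw new Error(`conflict deleting ${id}`);
  }
  return id;
}

async function compare(ids: string[]): Promise<{ allError?: string; statuses: string[] }> {
  // Promise.all rejects as soon as any input rejects, so the caller
  // learns about one failure and nothing about the other operations.
  let allError: string | undefined;
  try {
    await Promise.all(ids.map(fakeDelete));
  } catch (err) {
    allError = err instanceof Error ? err.message : String(err);
  }

  // Promise.allSettled never rejects; it waits for every input and
  // resolves with one { status, value | reason } entry per promise.
  const settled = await Promise.allSettled(ids.map(fakeDelete));
  const statuses = settled.map((result) => result.status);

  return { allError, statuses };
}
```

With `Promise.all` the remaining operations still run, but the caller moves on before knowing how they ended; `Promise.allSettled` reports every outcome.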

@afharo afharo added Feature:Telemetry release_note:skip Skip the PR/issue when compiling release notes v7.12.0 labels Jan 7, 2021
@afharo afharo added this to Pending Review in kibana-core [DEPRECATED] via automation Jan 7, 2021
@afharo afharo requested a review from a team as a code owner January 7, 2021 16:43
@afharo afharo requested a review from rudolf January 7, 2021 16:46
@afharo afharo added the v7.11.1 label Jan 7, 2021
- await Promise.all(
+ await Promise.allSettled(
Contributor

As this will no longer be throwing, should we retrieve the result of allSettled to log the potential failures?

Member Author

Good point! I've pushed an update to throw if we find that any promise was rejected
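A minimal sketch of that pattern, assuming the goal is to let every operation finish and then surface the failures in a single error. `settleAllOrThrow` is a hypothetical helper name, not the function added in this PR.

```typescript
// Hypothetical helper: wait for every promise, then throw once if any rejected.
async function settleAllOrThrow<T>(promises: Array<Promise<T>>): Promise<T[]> {
  const results = await Promise.allSettled(promises);

  const values: T[] = [];
  const reasons: unknown[] = [];
  for (const result of results) {
    if (result.status === 'fulfilled') {
      values.push(result.value);
    } else {
      reasons.push(result.reason);
    }
  }

  if (reasons.length > 0) {
    // Collect every failure message so none of them is silently dropped.
    const messages = reasons.map((r) => (r instanceof Error ? r.message : String(r)));
    throw new Error(
      `${reasons.length} of ${results.length} operations failed: ${messages.join('; ')}`
    );
  }
  return values;
}
```

The caller still gets an exception, as it did with `Promise.all`, but only after all operations have settled.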

Contributor

Since this happens very frequently when there are multiple nodes, and users should always just ignore this message when it happens, we decided to lower the log level to debug.

@afharo
Member Author

afharo commented Jan 11, 2021

@elasticmachine merge upstream

@afharo
Member Author

afharo commented Jan 11, 2021

@elasticmachine merge upstream

Contributor

@rudolf rudolf left a comment

I think there's a problem with this algorithm: because we load the transactions before the dailyDoc, there's a race condition where some transactions will be counted twice. Anytime we are unable to delete a document it means that document was counted twice.

The simplest way around this would be to first delete the transactions and then only update the dailyDoc with the values of the transactions that were successfully deleted. We would then need to use incrementCounter to prevent version conflicts (or retry when a version conflict occurs until it succeeds). The downside of this algorithm is that we could lose data when a node is restarted after deleting transactions but before updating the daily doc. This is less likely than counting documents twice, so maybe this is a better tradeoff for application usage?
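A rough sketch of the delete-first idea, using an in-memory stand-in rather than the real saved objects client. `InMemoryStore`, `rollUpDeleteFirst`, and the `numberOfClicks` field are assumptions made for illustration; `increment` only mimics the additive behaviour expected from incrementCounter.

```typescript
interface Transaction {
  id: string;
  appId: string;
  numberOfClicks: number;
}

// Hypothetical in-memory stand-in for the saved objects store.
class InMemoryStore {
  readonly dailyTotals = new Map<string, number>();
  private readonly undeletable: Set<string>;

  constructor(undeletable: string[] = []) {
    this.undeletable = new Set(undeletable);
  }

  // Fails for ids that another node is assumed to have deleted already.
  async delete(id: string): Promise<void> {
    if (this.undeletable.has(id)) {
      throw new Error(`cannot delete ${id}`);
    }
  }

  // Additive update: adds to the stored total instead of overwriting it.
  async increment(appId: string, by: number): Promise<void> {
    this.dailyTotals.set(appId, (this.dailyTotals.get(appId) ?? 0) + by);
  }
}

async function rollUpDeleteFirst(store: InMemoryStore, transactions: Transaction[]): Promise<void> {
  // 1. Try to delete every transaction; each promise resolves to the
  //    transaction itself when its delete succeeds.
  const outcomes = await Promise.allSettled(
    transactions.map(async (t) => {
      await store.delete(t.id);
      return t;
    })
  );

  // 2. Sum only the transactions this node actually deleted.
  const totals = new Map<string, number>();
  for (const outcome of outcomes) {
    if (outcome.status !== 'fulfilled') continue;
    const t = outcome.value;
    totals.set(t.appId, (totals.get(t.appId) ?? 0) + t.numberOfClicks);
  }

  // 3. Apply the sums as increments to the daily totals.
  await Promise.all(Array.from(totals, ([appId, clicks]) => store.increment(appId, clicks)));
}
```

A crash between steps 1 and 3 loses the deleted transactions, which is the tradeoff described above.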

The other option is to do something like:

  1. Load the dailyDoc
  2. Load all transactions for that day
  3. Update the dailyDoc
  4. Delete all transactions for that day
  5. Process the next day

If we only roll up daily docs for completed days (i.e. for days < today), then we can efficiently see which days have already been rolled up.
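The five steps above, restricted to completed days, could look roughly like the following. This is only a sketch against an in-memory store; `DayStore`, `DatedTransaction`, and `rollUpCompletedDays` are hypothetical names, `day` is assumed to be an ISO date string, and a real implementation would make asynchronous saved object calls at each step.

```typescript
interface DatedTransaction {
  id: string;
  day: string; // ISO date, e.g. '2021-01-10', so string comparison orders days
  appId: string;
  numberOfClicks: number;
}

// Hypothetical in-memory store holding raw transactions and daily totals.
class DayStore {
  transactions: DatedTransaction[];
  readonly daily = new Map<string, number>(); // key: `${appId}:${day}`

  constructor(transactions: DatedTransaction[]) {
    this.transactions = transactions;
  }
}

async function rollUpCompletedDays(store: DayStore, today: string): Promise<string[]> {
  // Only days strictly before today are complete; a day with no
  // transactions left has already been rolled up.
  const days = Array.from(
    new Set(store.transactions.map((t) => t.day).filter((day) => day < today))
  ).sort();

  for (const day of days) {
    // Steps 1 and 2: load the daily totals and the day's transactions.
    const dayTransactions = store.transactions.filter((t) => t.day === day);

    // Step 3: update the daily totals.
    for (const t of dayTransactions) {
      const key = `${t.appId}:${day}`;
      store.daily.set(key, (store.daily.get(key) ?? 0) + t.numberOfClicks);
    }

    // Step 4: delete all transactions for that day.
    store.transactions = store.transactions.filter((t) => t.day !== day);

    // Step 5: the loop moves on to the next day.
  }
  return days;
}
```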

If we had saved object aggregations we could do it even faster, but I don't think we have a timeline for that (#64002).

@afharo
Member Author

afharo commented Jan 11, 2021

@rudolf I agree it's a very valid concern. And it'd be ideal if we could run aggregations and _delete_by_query requests in SOs to completely fix this issue.

However, I think we can patiently wait for those features to come because I believe the potential harm is very minor: the transactional documents are sent and stored from the browser every 3 minutes if the tab is on-screen (otherwise the browser itself suspends the intervals). Failing to delete a document only affects the actions that could have happened during those 3 minutes. Not ideal, but I think it has a minor effect on the overall stats.

Also, I would expect the issues to occur only during the upgrade process from pre-7.9.0 (that's when we have plenty of documents to roll up and delete). Once there, we attempt to roll up every 30 minutes. The number of documents accumulated in that period should be manageable.

What do you think?

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@afharo
Member Author

afharo commented Jan 11, 2021

As discussed on Slack with @rudolf, we'll tackle an improvement to overcome the concurrency issues in #87840. I'll still merge this one to minimize the duplicate effect in rollups.

@afharo afharo merged commit dd85399 into elastic:master Jan 11, 2021
kibana-core [DEPRECATED] automation moved this from Pending Review to Done (7.11) Jan 11, 2021
@afharo afharo deleted the application_usage/rollups/promise.allSettled branch January 11, 2021 14:27
afharo added a commit to afharo/kibana that referenced this pull request Jan 11, 2021
…87675)

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
afharo added a commit that referenced this pull request Jan 11, 2021
…87675) (#87845)

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

afharo added a commit that referenced this pull request Jan 11, 2021
…7675) (#87844)

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Labels
Feature:Telemetry release_note:skip Skip the PR/issue when compiling release notes v7.11.1 v7.12.0

5 participants