Improving CSV generation by supporting concurrent background tasks #181064

Open
mikecote opened this issue Apr 17, 2024 · 5 comments
Labels
Feature:Reporting Reporting (PDF, CSV, ..) feature Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience)

Comments

@mikecote
Contributor

mikecote commented Apr 17, 2024

The report:execute task type currently has its concurrency per Kibana node set to 1. When a task type is configured this way, Task Manager prevents more than one task of that type from running on the same node at any given time.

This exposes the following limitations in serverless:

  • Autoscaling will not trigger when many CSV generation tasks are in the queue, because the Kibana node may still have another 9 workers available.
  • Having many CSV generation tasks in the queue makes our users wait a long time to get their files generated.
  • We will get alerts whenever the CSV generation queue is backed up, but we would need to intervene manually to clear it. It would be better to have the autoscaler manage this situation.
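To illustrate the first point, here is a minimal sketch of why a per-type cap of 1 can hide a backlog from an autoscaler. It assumes the scale-out signal is based on worker utilization, which this issue implies but does not spell out. The names WORKERS_PER_NODE, SCALE_OUT_UTILIZATION, busyWorkers and wouldScaleOut are hypothetical and not part of Kibana.

```typescript
// Hypothetical model of a utilization-based scale-out signal.
// The real serverless autoscaler may use different inputs.

// From the discussion: 1 busy worker plus 9 still available.
const WORKERS_PER_NODE = 10;

// Assumed threshold, chosen only for illustration.
const SCALE_OUT_UTILIZATION = 0.8;

// Workers occupied by CSV tasks on an otherwise idle node.
function busyWorkers(queuedCsvTasks: number, maxConcurrency: number): number {
  return Math.min(queuedCsvTasks, maxConcurrency, WORKERS_PER_NODE);
}

// True when utilization reaches the assumed scale-out threshold.
function wouldScaleOut(queuedCsvTasks: number, maxConcurrency: number): boolean {
  const utilization = busyWorkers(queuedCsvTasks, maxConcurrency) / WORKERS_PER_NODE;
  return utilization >= SCALE_OUT_UTILIZATION;
}
```

Under these assumptions, wouldScaleOut(500, 1) stays false however deep the queue gets, because only one worker is ever busy, while wouldScaleOut(500, 10) becomes true.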

What I propose is running CSV generation tasks under a new task type, report:execute-csv, that doesn't have maxConcurrency set in its task definition, while keeping report:execute multi-purpose in case there are still CSV tasks in the queue. This would allow up to 10x throughput per Kibana node for generating CSVs and would benefit serverless, ESS and on-prem users. One thing to keep an eye on: with 10x concurrency we also put 10x the memory / CPU pressure on the node, and I am not familiar with how much of each resource a task needs.
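A rough sketch of the difference between the two task types, assuming an idle node with 10 workers. This is a simplified stand-in and not the real Task Manager API: the ReportTaskDefinition shape, the reportTaskDefinitions registry, the titles and the claimableOnIdleNode helper are all hypothetical. Only the task type names and the maxConcurrency field come from this issue.

```typescript
// Simplified, hypothetical model of per-type concurrency; not the real Task Manager API.
interface ReportTaskDefinition {
  title: string; // illustrative titles only
  // When set, caps how many tasks of this type run on one node at a time.
  maxConcurrency?: number;
}

// Assumed number of workers per Kibana node, taken from the discussion.
const NODE_CAPACITY = 10;

const reportTaskDefinitions: Record<string, ReportTaskDefinition> = {
  // Existing task type: capped at one concurrent task per node.
  'report:execute': { title: 'Reporting: execute job', maxConcurrency: 1 },
  // Proposed task type: no cap, so only node capacity limits it.
  'report:execute-csv': { title: 'Reporting: execute CSV job' },
};

// How many queued tasks of a type an idle node could run at once.
function claimableOnIdleNode(taskType: string, queued: number): number {
  const definition = reportTaskDefinitions[taskType];
  if (!definition) {
    throw new Error(`Unknown task type: ${taskType}`);
  }
  const cap = definition.maxConcurrency ?? NODE_CAPACITY;
  return Math.min(queued, cap, NODE_CAPACITY);
}
```

With 25 tasks queued, claimableOnIdleNode('report:execute', 25) yields 1 and claimableOnIdleNode('report:execute-csv', 25) yields 10, which is where the 10x figure comes from.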

@mikecote mikecote added Feature:Reporting Reporting (PDF, CSV, ..) feature Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) labels Apr 17, 2024
@elasticmachine
Contributor

Pinging @elastic/appex-sharedux (Team:SharedUX)

@kobelb
Contributor

kobelb commented Apr 17, 2024

Assuming my understanding of #108485 is accurate and still relevant, running 10 CSV exports concurrently could cause Kibana to crash due to an OOM. @elastic/appex-sharedux can you all confirm that each CSV export task could use approximately 100MB of memory?

@tsullivan
Member

@kobelb Your understanding of how we currently chunk the reports seems accurate. @vadimkibana has created an issue to hardcode the chunk size to 4MB (#180829), and that will stop reports from causing OOM in 1GB instances.

@kobelb
Contributor

kobelb commented Apr 22, 2024

Thanks, @tsullivan. If we use 4 MB chunks, then I don't have concerns about doing 10 concurrently.
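The arithmetic behind this can be sketched as follows. It is a back-of-the-envelope estimate that assumes each task's peak memory is roughly one chunk and ignores everything else Kibana holds in memory. The helper names chunkMemoryBytes and shareOfInstance are hypothetical.

```typescript
// Back-of-the-envelope estimate only: assumes peak memory per task is about one chunk.
const BYTES_PER_MB = 1024 * 1024;

// 1GB instance size mentioned in the discussion.
const INSTANCE_BYTES = 1024 * BYTES_PER_MB;

// Total chunk memory held by all concurrent CSV tasks.
function chunkMemoryBytes(concurrency: number, chunkSizeMb: number): number {
  return concurrency * chunkSizeMb * BYTES_PER_MB;
}

// Fraction of the instance's memory taken by those chunks.
function shareOfInstance(concurrency: number, chunkSizeMb: number): number {
  return chunkMemoryBytes(concurrency, chunkSizeMb) / INSTANCE_BYTES;
}

// 10 tasks at roughly 100MB each: about 98% of a 1GB instance.
const shareAt100Mb = shareOfInstance(10, 100); // 0.9765625

// 10 tasks at 4MB chunks: about 4% of a 1GB instance.
const shareAt4Mb = shareOfInstance(10, 4); // 0.0390625
```

Under that assumption, ten concurrent exports at around 100MB each would take nearly the whole instance, while ten at 4MB chunks stay at about 40MB in total.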
