Output Processor Multi-Threading not working as expected (Mutex wait) #8245

Closed
drbugfinder-work opened this issue Dec 4, 2023 · 11 comments

Comments

@drbugfinder-work
Contributor

Bug Report

Describe the bug
We've observed an issue with the output multi-threading (workers) option that appears to limit the performance of the processor in our case.
Observations:

  • Output multi-threading does not seem to work as expected.
  • Specifically, each thread appears to spend a significant amount of time in mutexwait inside flb_processor_run instead of running in parallel.

For instance (in our configuration):

  • With 2 output workers, each output thread waits in mutexwait for approximately 50% of the time.
  • With 10 output workers, this waiting time increases to around 90%.
  • With 100 output workers, it almost reaches 99% of the time, indicating a significant serial processing bottleneck.
  • Currently, there is no measurable benefit when using the multi-threading option of the outputs, except for very rare use cases where the log sink is notably slow or has a high response time.
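
The figures above are close to what a fully serialized stage would produce: if only one of N always-busy workers can be inside flb_processor_run at a time, each worker waits for roughly 1 - 1/N of the time. A minimal back-of-the-envelope sketch, assuming (this is not confirmed against the Fluent Bit source) that the whole processor run sits behind a single lock and that time spent outside the lock is negligible. `mutex_wait_fraction` is a hypothetical helper name used only for illustration:

```python
def mutex_wait_fraction(workers: int) -> float:
    """Estimated share of time each worker spends waiting on the lock.

    Assumption: the processor stage is fully serialized behind one mutex
    and every worker always has work queued, so each worker holds the
    lock for 1/workers of the time and waits for the rest.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return 1.0 - 1.0 / workers


# Worker counts taken from the observations above.
for n in (2, 10, 100):
    print(f"{n:>3} workers: ~{mutex_wait_fraction(n):.0%} of time in mutex wait")
```

Under this simple model the estimates for 2, 10 and 100 workers come out at about 50%, 90% and 99%, which matches the measurements and suggests the processor stage is effectively single-threaded regardless of the worker count.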

Expected Behavior:
The output threads (including the processors) should ideally run in parallel, minimizing the waiting time in mutexwait and thus optimizing the overall performance.

To Reproduce
Simplified example configuration: https://gist.github.com/drbugfinder-work/456ef9715db25372a935d2d3a997e049
(You may have to adjust the number of dummy inputs / lua calculation to get similar results on your machine.)

Screenshots

  • 2 workers:
    Screenshot 2023-12-04 at 15 59 43

  • 10 workers:
    Screenshot 2023-12-01 at 11 22 03

  • 100 workers:
    Screenshot 2023-12-01 at 09 13 33

Your Environment

  • Version used: v2.2.0

@leonardo-albertovich
Collaborator

Hi @drbugfinder-work, what you observed is the expected behavior when filters are used in the processor stack in the output stage.

We are working on improving the situation but there is no way around it at the moment.

@drbugfinder-work
Contributor Author

@leonardo-albertovich thanks for the clarification. I was asking because @patrick-stephens mentioned this as a way to parallelize filters (see: #8088 (comment) & #8088 (comment))

@patrick-stephens
Contributor

Yeah I wasn't really thinking of adding multiple workers as well as processors though :)

@drbugfinder-work
Contributor Author

@patrick-stephens
I've added a chunk mode to the lua filter, so with the help of Lua Lanes the Lua script can parallelize its work across a whole chunk:
#8478

@patrick-stephens
Contributor

Sounds interesting, @drbugfinder-work. @tarruda and @agup006 may be interested in that PR.

Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@github-actions github-actions bot added the Stale label May 13, 2024
@drbugfinder-work
Contributor Author

Still open

Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@github-actions github-actions bot added the Stale label Aug 16, 2024
Contributor

This issue was closed because it has been stalled for 5 days with no activity.

@lecaros
Contributor

lecaros commented Sep 25, 2024

@drbugfinder-work what would be the expectation here? To correct the current expected behavior?

@drbugfinder-work
Contributor Author

@lecaros This issue is several months old. I haven't retested output processors in this context, and with input processors it works as expected, so I would keep this issue closed for now.
Thank you!

4 participants