Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

output: make it go zoom #19

Merged
merged 2 commits into from Nov 16, 2023

Conversation

yaauie
Copy link
Contributor

@yaauie yaauie commented Nov 15, 2023

By declaring concurrency :shared, a single shared instance can be used by pipeline workers concurrently instead of the legacy sharing we get by default that shares the instance sequentially.

(labeled as a bug, since this plugin was designed with concurrent use in mind, we simply neglected to declare it)

What does this PR do?

Removes a bottleneck caused by Logstash selecting the "legacy" strategy, which prevented the single shared instance of this plugin from being used concurrently.

Why is it important/What is the impact to the user?

Make it go ZOOM! (improves throughput by allowing use of available resources)

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

  • A pair of pipelines, upstream and downstream:
    • A downstream pipeline with a logstash input that receives events and discards them
    • An upstream pipeline with a java_generator input that floods the queue as quickly as possible and a logstash output that transmits the events to the downstream
    • Note: for simplicity's sake, we add ssl_enabled => false to both to obviate the need to set up certificates.
# pipelines.yml
- pipeline.id: upstream
  pipeline.workers: 16
  config.string: |-
    input { java_generator {} }
    output { logstash { hosts => ["localhost:9803"] ssl_enabled => false } }
- pipeline.id: downstream
  config.string: |-
    input { logstash { port => 9803 ssl_enabled => false } }
    output { sink {} }'
  • Observe that without this patch, changing the number of workers on the upstream has little/no net effect on throughput (because the single shared logstash output is sequentially accessed by the workers)
  • Observe that with this patch, throughput is higher when there is more than one worker in the pipeline

By declaring `concurrency :shared`, a single shared instance can be used by
pipeline workers _concurrently_ instead of the legacy sharing we get by default
that shares the instance _sequentially_.
@yaauie yaauie added the bug Something isn't working label Nov 15, 2023
CHANGELOG.md Outdated Show resolved Hide resolved
Copy link

@jsvd jsvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yaauie yaauie merged commit 466a299 into logstash-plugins:main Nov 16, 2023
2 checks passed
@yaauie yaauie deleted the output-concurrency-shared branch November 16, 2023 01:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants