
Fixes output.kafka bulk sizes #12254

Merged
merged 5 commits into elastic:master from marqc:fix_output.kafka.bulk_max_size on Jun 10, 2019

Conversation

marqc (Contributor) commented May 23, 2019

Currently the "bulk_max_size" property is not used.

output.kafka does not set any sarama Producer.Flush config options, which results in sending bulks as fast as possible. Example bulk sizes observed when starting Filebeat against a file with about 9599 log entries look like this:

count:1 count:1506 count:1712 count:877 count:171 count:1034 count:1709 count:1182 count:2 count:70 count:206 count:984 count:145

This change sets sarama's Producer.Flush.Messages to the configured bulk_max_size property. To avoid waiting forever when no new entries are collected, a new bulk_flush_frequency property of type Duration is introduced.

When these options are set, the bulks observed on the Kafka server look like this:

count:1711 count:1711 count:1709 count:1709 count:1709 count:1050

The default value for bulk_flush_frequency is 0, which keeps the current "bulking" behavior.
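
For illustration, here is a minimal Go sketch (not the actual Beats code) of how these two options map onto sarama's producer flush settings; the helper name and the example values are hypothetical:

```go
package main

import (
	"time"

	"github.com/Shopify/sarama"
)

// newProducerConfig is a hypothetical helper showing how the Beats
// options could translate into sarama's flush settings.
func newProducerConfig(bulkMaxSize int, bulkFlushFrequency time.Duration) *sarama.Config {
	cfg := sarama.NewConfig()
	// Best-effort batch size: sarama tries to flush once this many
	// messages are buffered.
	cfg.Producer.Flush.Messages = bulkMaxSize
	// Upper bound on how long a partial batch may wait before being
	// flushed. The zero value disables the timer, keeping the old
	// "flush as fast as possible" behavior.
	cfg.Producer.Flush.Frequency = bulkFlushFrequency
	return cfg
}

func main() {
	_ = newProducerConfig(2048, 500*time.Millisecond)
}
```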

@marqc marqc requested review from a team as code owners May 23, 2019 14:13
elasticmachine (Collaborator) commented

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

ph (Contributor) commented May 23, 2019

@jsoriano Can you take a look?

marqc force-pushed the fix_output.kafka.bulk_max_size branch from 49ccc89 to f5c6288 on May 24, 2019 06:51
jsoriano (Member) left a comment

@marqc Thanks for this contribution, this looks quite good to me. I only wonder if we should set the hard limit instead of the "best effort" limit, to be consistent with the current documentation.

Could you please also add a changelog entry for this?
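
For context, a short sketch of the distinction in sarama's config between the two limits (the value 2048 is illustrative):

```go
package main

import "github.com/Shopify/sarama"

func main() {
	cfg := sarama.NewConfig()
	// Best-effort trigger: a flush is attempted once this many messages
	// are buffered, but individual batches may still end up smaller.
	cfg.Producer.Flush.Messages = 2048
	// Hard limit: a single batch never contains more than this many messages.
	cfg.Producer.Flush.MaxMessages = 2048
	_ = cfg
}
```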

libbeat/_meta/config.reference.yml (review thread, outdated, resolved)
libbeat/outputs/kafka/config.go (review thread, outdated, resolved)
marqc (Contributor, Author) commented May 30, 2019

@jsoriano changelog entry added

CHANGELOG.next.asciidoc (review thread, outdated, resolved)
jsoriano (Member) commented

jenkins, test this please

jsoriano (Member) commented

jenkins, test this again please

marqc (Contributor, Author) commented Jun 10, 2019

@jsoriano how about merging this? All is green.

@jsoriano jsoriano merged commit ff6c007 into elastic:master Jun 10, 2019
marqc (Contributor, Author) commented Jun 28, 2019

@jsoriano will this be backported to 7.x?

jsoriano (Member) commented

@marqc yes, in principle this will be released in 7.3.0.

DStape pushed a commit to DStape/beats that referenced this pull request Aug 20, 2019
Set sarama `Producer.Flush` params based on the `bulk_max_size` and
`bulk_flush_frequency` config options to control the bulk size when
the Kafka output is used.