telegraf logparser too slow #3539

Open
kingluo opened this Issue Dec 5, 2017 · 12 comments

kingluo commented Dec 5, 2017

Bug report

I use telegraf to collect nginx logs and ship them to kafka, but it's too slow. The kafka hosts are in the same data center as the telegraf host, yet the data arrives one to two hours late according to the log timestamps!

Relevant telegraf.conf:

[agent]
  interval=3
  flush_interval=3

[[inputs.logparser]]
  name_override = "foobar"
  files = ["/var/logs/foobar.service.foobar.com-access_log"]
  from_beginning = true
  [inputs.logparser.grok]
    patterns = ["%{IP:remote_addr}\t%{USER:remote_user}\t\\[%{HTTPDATE:time_local:ts-httpd}\\]\t\"%{WORD:method} %{NOTSPACE:url}(?: HTTP/%{NUMBER:httpversion})\"\t%{NUMBER:resp_code}\t(?:%{NUMBER:resp_bytes:int}|-)\t%{QS:referrer}\t%{QS:agent}\t(?:%{IP:client_ip}|-)\t%{HOSTNAME:http_host:tag}\t%{NUMBER:request_time}\t(?:%{IP:upstream_addr}|-)\t(?:%{NUMBER:upstream_status}|-)\t(?:%{NUMBER:upstream_response_time}|-)\t(?:%{NUMBER:upstream_response_length}|-)\t(?:%{QS:mid}|-)"]

[[outputs.kafka]]
  brokers = ["kafkahost1:9092", "kafkahost2:9092", "kafkahost3:9092"]
  topic = "test"
  routing_tag = "host"
  required_acks = -1
  max_retry = 3
  ## format
  data_format = "json"
  fieldpass = [ "client_ip", "mid", "url" ]
  taginclude = [ "host" ]
  [outputs.kafka.tagpass]
    http_host = [ "foobar.service.foobar.com" ]

System info:

telegraf 1.4.5, centos 6.5

danielnelson (Contributor) commented Dec 5, 2017

Can you try setting the interval and flush_interval to "10s"?
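
For reference, a minimal sketch of that suggestion in telegraf.conf, assuming telegraf's duration-string syntax for intervals:

[agent]
  ## Collect and flush every 10 seconds, as suggested above.
  interval = "10s"
  flush_interval = "10s"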

kingluo commented Dec 6, 2017

I changed the interval and flush_interval to "1s" (10s is slower).

I also found I had set a wrong option:

from_beginning = true

It should be false.

But collection is still slow; it lags by at least 3 minutes.
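
For reference, the adjusted settings described above would look roughly like this (paths and names as in my original config):

[agent]
  interval = "1s"
  flush_interval = "1s"

[[inputs.logparser]]
  name_override = "foobar"
  files = ["/var/logs/foobar.service.foobar.com-access_log"]
  ## Tail only new lines instead of re-reading the whole file.
  from_beginning = false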

danielnelson (Contributor) commented Dec 6, 2017

Please try with 10s; this will introduce more latency but may help with throughput. 3 minutes is not too long. Does it fall further behind the longer it runs?

kingluo commented Dec 6, 2017

> Please try with 10s; this will introduce more latency but may help with throughput. 3 minutes is not too long.

I tried, but it's slower!

> Does it fall further behind the longer it runs?

Yes.

danielnelson (Contributor) commented Dec 6, 2017

Can you try replacing the kafka output with a file output and check if logparser still falls behind?
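
A minimal sketch of that swap, assuming an arbitrary local output path, to take kafka out of the picture:

[[outputs.file]]
  ## Write metrics to a local file (or "stdout") instead of kafka.
  files = ["/tmp/telegraf-logparser-test.out"]
  data_format = "influx"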

kingluo commented Dec 7, 2017

Of course. I tried the file output with the influx data format, but it's still slow.

danielnelson (Contributor) commented Dec 7, 2017

How many logfile lines are generated per minute?

kingluo commented Dec 8, 2017

About 2000 lines per second.

FYI, I tried filebeat and there is no delay there, although, as is known, filebeat is only a log shipper and does no grok parsing. But (near-)real-time delivery is important for log collection; I could do the grok or regexp parsing at the backend.

Has anyone benchmarked telegraf with a config similar to mine?

agnivade (Contributor) commented Dec 19, 2017

@kingluo - It looks like you are using a custom pattern. Can you post a few examples of your log lines? I would like to give this a shot.

jberry-suse commented Jun 6, 2018

I can confirm this: I am only able to parse at most 1000 entries per second. It is easy to reproduce: just create or copy an apache access log with 100,000 entries and watch how slowly it is parsed. Even with single-threaded, from-scratch regex parsing I can achieve at least a million entries per second.

There is no need for a custom log format; the standard format is enough. Based on CPU usage, one core is utilized but does not seem to be running at 100%, almost as if telegraf is throttling itself.
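
A minimal reproduction sketch along those lines, assuming a copied access log at an arbitrary path and telegraf's built-in COMBINED_LOG_FORMAT grok pattern for the standard combined format:

[[inputs.logparser]]
  ## Point at a copied access log with ~100,000 entries.
  files = ["/tmp/access_log"]
  from_beginning = true
  [inputs.logparser.grok]
    patterns = ["%{COMBINED_LOG_FORMAT}"]

[[outputs.file]]
  ## Discard output so only parsing speed is measured.
  files = ["/dev/null"]
  data_format = "influx"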

jberry-suse commented Jun 20, 2018

I ended up writing a simple tool which achieves ~500,000/s per core.

russorat commented Jul 16, 2018

  • check this against the tail plugin
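
A rough sketch of an equivalent tail-based config, assuming a newer telegraf release in which the tail input supports the grok data format:

[[inputs.tail]]
  name_override = "foobar"
  files = ["/var/logs/foobar.service.foobar.com-access_log"]
  from_beginning = false
  ## Grok parsing via the data_format setting instead of the logparser plugin.
  data_format = "grok"
  grok_patterns = ["%{COMBINED_LOG_FORMAT}"]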