
td-agent memory usage gradually creeping upwards #1414

Closed
hjet opened this issue Jan 10, 2017 · 18 comments

hjet commented Jan 10, 2017

See also #1384 (seems similar):

- fluentd or td-agent version.
fluentd-0.14.10

- Environment information, e.g. OS.
Debian GNU/Linux 8.6 (jessie)
3.16.0-4-amd64

- Your configuration
(please excuse the possibly poor or nonstandard config; I inherited it from another developer and am new to fluentd)

<source>
  @type monitor_agent
  bind 0.0.0.0
  port 65000
</source>
<source>
  @type syslog
  port 65001
  tag system
</source>
<source>
  @type tail
  read_from_head true
  read_lines_limit 250
  refresh_interval 20
  pos_file /var/log/td-agent/tmp/stderr.log.pos
  path /tmp/mesos/slaves/*/frameworks/*/executors/*.*/runs/latest/stderr
  exclude_path ["/tmp/mesos/slaves/*/frameworks/*/executors/*job-scheduler*.*/runs/latest/stderr"]
  tag mesos.*
  format multiline_grok
  multiline_start_regexp /^[^\s]/
  custom_pattern_path /etc/td-agent/custom_patterns
  <grok>
    pattern %{GREEDYDATA:logger_info}%{LEVEL}%{GREEDYDATA:log_message}
  </grok>
  <grok>
    pattern %{GREEDYDATA:message}
  </grok>
</source>
<source>
  @type tail
  read_from_head true
  read_lines_limit 250
  refresh_interval 20
  pos_file /var/log/td-agent/tmp/stdout.log.pos
  path /tmp/mesos/slaves/*/frameworks/*/executors/*.*/runs/latest/stdout
  exclude_path ["/tmp/mesos/slaves/*/frameworks/*/executors/*job-scheduler*.*/runs/latest/stdout"]
  tag mesos.*
  format multiline_grok
  multiline_start_regexp /^[^\s]/
  custom_pattern_path /etc/td-agent/custom_patterns
  <grok>
    pattern %{GREEDYDATA:logger_info}%{LEVEL}%{GREEDYDATA:log_message}
  </grok>
  <grok>
    pattern %{GREEDYDATA:message}
  </grok>
</source>
<source>
  @type tail
  read_from_head true
  read_lines_limit 250
  refresh_interval 20
  path /tmp/mesos/slaves/*/frameworks/*/executors/*.apps/runs/*/logs/*.log
  pos_file /var/log/td-agent/tmp/apps.stdout.log.pos
  tag apps.*
  format multiline_grok
  multiline_start_regexp /^[^\s]/
  custom_pattern_path /etc/td-agent/custom_patterns
  <grok>
    pattern %{GREEDYDATA:logger_info}%{LEVEL}%{GREEDYDATA:log_message}
  </grok>
  <grok>
    pattern %{GREEDYDATA:message}
  </grok>
</source>
<source>
  @type tail
  read_from_head true
  read_lines_limit 250
  refresh_interval 20
  pos_file /var/log/td-agent/tmp/job-scheduler.stderr.log.pos
  path /tmp/mesos/slaves/*/frameworks/*/executors/*job-scheduler*.*/runs/latest/stderr
  tag mesos.*
  format multiline_grok
  multiline_start_regexp /^[^\s]/
  custom_pattern_path /etc/td-agent/custom_patterns
  <grok>
    pattern %{GREEDYDATA:logger_info}%{LEVEL}%{GREEDYDATA:log_message}
  </grok>
  <grok>
    pattern %{GREEDYDATA:message}
  </grok>
</source>
<source>
  @type tail
  read_from_head true
  read_lines_limit 250
  refresh_interval 20
  pos_file /var/log/td-agent/tmp/job-scheduler.stdout.log.pos
  path /tmp/mesos/slaves/*/frameworks/*/executors/*job-scheduler*.*/runs/latest/stdout
  tag mesos.*
  format multiline_grok
  multiline_start_regexp /^[^\s]/
  custom_pattern_path /etc/td-agent/custom_patterns
  <grok>
   ----- OMITTED -------
  </grok>
  <grok>
   ----- OMITTED -------
  </grok>
  <grok>
    pattern %{GREEDYDATA:logger_info}%{LEVEL}%{GREEDYDATA:log_message}
  </grok>
  <grok>
    pattern %{GREEDYDATA:message}
  </grok>
</source>
# Job-Scheduler: Launchers  e.g. framework ct:1473451200000:0:bfe92d4217_launcher:
<source>
  @type tail
  read_from_head true
  read_lines_limit 250
  refresh_interval 20
  pos_file /var/log/td-agent/tmp/job-scheduler_launchers.stderr.log.pos
  path /tmp/mesos/slaves/*/frameworks/*/executors/ct:*/runs/latest/stderr
  tag launcher_scheduler.*
  format multiline_grok
  multiline_start_regexp /^[^\s]/
  custom_pattern_path /etc/td-agent/custom_patterns
  <grok>
    pattern %{GREEDYDATA:logger_info}%{LEVEL}%{GREEDYDATA:log_message}
  </grok>
  <grok>
    pattern %{GREEDYDATA:message}
  </grok>
</source>
<source>
  @type tail
  read_from_head true
  read_lines_limit 250
  refresh_interval 20
  pos_file /var/log/td-agent/tmp/job-scheduler_launchers.stdout.log.pos
  path /tmp/mesos/slaves/*/frameworks/*/executors/ct:*/runs/latest/stdout
  tag launcher_scheduler.*
  format multiline_grok
  multiline_start_regexp /^[^\s]/
  custom_pattern_path /etc/td-agent/custom_patterns
  <grok>
   ------ OMITTED -------
  </grok>
  <grok>
   ------ OMITTED -------
  </grok>
  <grok>
    pattern %{GREEDYDATA:logger_info}%{LEVEL}%{GREEDYDATA:log_message}
  </grok>
  <grok>
    pattern %{GREEDYDATA:message}
  </grok>
</source>
<filter **>
  @type elasticsearch_timestamp_check
</filter>
<filter>
  @type record_transformer
  <record>
    hostname ${hostname}
  </record>
</filter>
<filter mesos.**>
  @type record_transformer
  <record>
    site-id ${tag_parts[8]}
  </record>
</filter>
<filter mesos.**>
  @type record_transformer
  <record>
    task ${tag_parts[9]}
  </record>
</filter>
<filter apps.**>
  @type record_transformer
  enable_ruby true
  <record>
    site-id ${tag_parts[8]}
  </record>
  <record>
    app ${tag_parts[13].split('_')[1]}
  </record>
</filter>
<match *.**>
  @type secure_forward
  self_hostname ${hostname}
  shared_key xxxxxxxxx
  ca_cert_path /etc/td-agent/fluentd-ssl/ca_cert.pem
  secure yes
  enable_strict_verification yes
  num_threads 2
  <server>
    host xxxxxxx
    port xxxxxxx
  </server>
</match>

- Your problem explanation. If you have error logs, include them as well.

We have td-agent running on several Mesos agents, tailing various log files, etc. (from the config you can see that we use the * wildcard in path). It is usually tailing a large number of files that are frequently created and then deleted (but never rotated).

On some agents, td-agent memory usage mysteriously grows over the course of several days, to the point where it consumes a ridiculous amount of memory and needs to be killed. Sending a SIGTERM via a service restart usually works but takes some time (~10 minutes), and memory usage returns to normal after the restart. I can recreate the problem (it is currently occurring), so let me know what further diagnostic information to provide and I will be happy to help. Also, once again, please forgive the poor configuration; as I said earlier, I am inheriting this from someone else.

I am including logs and diagnostic information for both a low mem usage (normal operation) td-agent and a high mem usage (faulty operation) td-agent for ease of comparison:

low mem td-agent.log:
https://gist.github.com/hjet/329ff5abe38efbf5b68c55328a6925a1

high mem td-agent.log:
https://gist.github.com/hjet/cf39a32146edffbf61b651dff482e6e5

monitor_agent (for both):
https://gist.github.com/hjet/769e581918b69a3031657a7bfbf7dfb3

sigdumps (low mem usage):
https://gist.github.com/hjet/6b46311466d2487592db3f9feb1e0279

sigdumps (high mem usage):
https://gist.github.com/hjet/1fbead9a120fc5efb05fdc66cefee8c6

strace (low mem usage):
https://gist.github.com/hjet/b6c969196ea1a6488559c84aeb442175

strace (high mem usage):
https://gist.github.com/hjet/866c17ab9e8891dcb00c446f60ca3c28

perf (low mem usage):
[screenshot]

perf (high mem usage):
[screenshot]

pid2line.rb (low mem usage):
[screenshot]

pid2line.rb (high mem usage):
[screenshot]

Please let me know what other information I can provide to help!

Also, I'm not sure whether the error="no one nodes with valid ssl session" is part of the problem (flushing the buffer did not alleviate memory pressure), so some insight there would be appreciated as well (I am planning to fix that issue at the same time).

Thank you!


repeatedly commented Jan 11, 2017

Also not sure if the error="no one nodes with valid ssl session" is part of the problem

Does it mean out_secure_forward can't flush its buffer to the destination, and that this is what causes the growing memory usage?
Or is the buffer length low while memory usage is high?

And how about dentry cache?


hjet commented Jan 11, 2017

Does it mean out_secure_forward can't flush its buffer to the destination, and that this is what causes the growing memory usage?

I have confirmed that force flushing the buffer (sending SIGUSR1) doesn't reduce memory usage.
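
For reference, sending SIGUSR1 looks roughly like this (the pid file path assumes a stock td-agent install and may differ on your host):

  kill -USR1 "$(cat /var/run/td-agent/td-agent.pid)"   # asks fluentd to flush all buffered events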

Or is the buffer length low while memory usage is high?

Can you clarify what you mean by this? This is precisely the problem. Memory usage increases gradually over the course of several days to the point of consuming most of the memory on the server.

And how about dentry cache?

How do I check this?

Thanks for your quick response!


repeatedly commented Jan 11, 2017

I have confirmed that force flushing the buffer (sending SIGUSR1) doesn't reduce memory usage.

When did you flush the buffer?
If secure_forward has error="no one nodes with valid ssl session" errors, force flushing doesn't work because there is no valid destination.

Can you clarify what you mean by this?

Your secure_forward setting uses a memory buffer. That means that when secure_forward can't flush its buffer chunks, memory usage keeps growing, unlike with a file buffer.
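
For illustration only, a minimal sketch of what switching that match to a file buffer could look like, using the same v0.12-style buffer parameter names as the rest of this config (the buffer_path below is just a placeholder):

<match *.**>
  @type secure_forward
  # ... existing secure_forward settings unchanged ...
  buffer_type file
  buffer_path /var/log/td-agent/buffer/secure_forward  # placeholder; any writable path
  buffer_chunk_limit 8m
  buffer_queue_limit 64
</match>

With a file buffer, unflushed chunks accumulate on disk instead of in the Ruby heap.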

How do I check this?

Googling it will be faster than my comment ;) I forget the actual commands.
If an application touches a lot of files, the dentry cache also grows. I'm not sure whether this is part of the problem.
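
For reference, a couple of generic checks (run as root; not specific to fluentd):

  grep -i slab /proc/meminfo            # total slab memory; dentry/inode caches live here
  slabtop -o | grep -E 'dentry|inode'   # per-cache breakdown, if slabtop is installed
  echo 2 > /proc/sys/vm/drop_caches     # optionally drop dentry/inode caches to compare

Note that slab memory is kernel memory and does not show up in the td-agent process RSS.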


hjet commented Jan 11, 2017

I have just force flushed the buffer:

[screenshot]

No error was thrown. Memory usage is at ~2.17 GB.

monitor_agent shows the following:

      {
            "buffer_queue_length": 0,
            "buffer_total_queued_size": 411256,
            "config": {
                "@type": "secure_forward",
                "ca_cert_path": "/etc/td-agent/fluentd-ssl/ca_cert.pem",
                "enable_strict_verification": "yes",
                "num_threads": "2",
                "secure": "yes",
                "self_hostname": "xxxxx",
                "shared_key": "xxxxxx"
            },
            "output_plugin": true,
            "plugin_category": "output",
            "plugin_id": "object:3ff5798316f8",
            "retry": {},
            "retry_count": 118,
            "type": "secure_forward"
        },

The "buffer_total_queued_size": 411256, confirms your theory I believe. I guess there's no error thrown because it's using the exponential backoff retry mechanism (and has failed many subsequent times)?

So that then prompts another question: what is usually the cause of error="no one nodes with valid ssl session" and how do I diagnose/fix this? Some of the logs clearly seem to be shipped correctly (lots of retry succeeded. chunk_id="....." messages) – but then sometimes it fails.
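
As a generic first check of the TLS side (host and port are the masked values from the <server> section above; the CA path is the one from the config):

  openssl s_client -connect <host>:<port> -CAfile /etc/td-agent/fluentd-ssl/ca_cert.pem </dev/null
  # a healthy handshake ends with "Verify return code: 0 (ok)"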

I guess this would probably be more of an issue for https://github.com/tagomoris/fluent-plugin-secure-forward

I just want to confirm that the root cause of the issue is secure_forward not being able to find a valid destination to flush the chunks to, and nothing else.

Thanks again!

repeatedly commented:

So I guess that prompts another question, what is usually the cause of error="no one nodes with valid ssl session"? How do I fix this?

This is a very hard question. It would be better to check the secure-forward plugin's issues.


hjet commented Jan 11, 2017

Ok great, thanks again.

hjet closed this as completed Jan 11, 2017
repeatedly commented:

BTW, if you want to reduce Ruby's memory usage itself, setting the RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR environment variable to 0.9 may help.
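
A sketch of one way to set that for td-agent on Debian, assuming the init script sources /etc/default/td-agent (verify for your install):

  # as root
  echo 'export RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9' >> /etc/default/td-agent
  service td-agent restart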


hjet commented Jan 11, 2017

Thanks!

hjet reopened this Feb 3, 2017

hjet commented Feb 3, 2017

This is recurring, and I have disabled secure-forward (now using the elasticsearch output plugin with no SSL). Furthermore, the logs no longer contain any error="no one nodes with valid ssl session" messages, nor any other connection-related messages. Everything seems normal, except for a [info]: flushing all buffer forcedly roughly ~9 hours after starting td-agent.

I am restarting with verbose logging; the issue is reproducible and will almost certainly recur, so please let me know how to proceed and which logs and diagnostic info to provide. If possible, can we move this to a more private channel, or can I email you the full logs? Some may contain sensitive information, and I would like to provide a full set of logs instead of the excerpts I was able to post above.

Thanks again.


hjet commented Feb 3, 2017

The new match section looks as follows:

<match *.**>
  @type elasticsearch
  host xxxxxx
  port 9200
  include_tag_key true
  tag_key @log_name
  logstash_format true
  buffer_type memory
  buffer_chunk_limit 64m
  buffer_queue_limit 175
  flush_interval 20
  disable_retry_limit false
  retry_limit 15
  retry_wait 2
  request_timeout 30
  reload_connections false
</match>

and from the logs:

<match *.**>
    @type elasticsearch
    host "xxxxxx"
    port 9200
    include_tag_key true
    tag_key "@log_name"
    logstash_format true
    buffer_type "memory"
    buffer_chunk_limit 64m
    buffer_queue_limit 175
    flush_interval 20
    disable_retry_limit false
    retry_limit 15
    retry_wait 2
    request_timeout 30
    reload_connections false
    <buffer tag>
      flush_mode interval
      retry_type exponential_backoff
      @type memory
      flush_interval 20
      retry_forever false
      retry_max_times 15
      chunk_limit_size 64m
      queue_length_limit 175
    </buffer>
    <inject>
      tag_key @log_name
    </inject>
  </match>
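
(Aside: assuming standard v0.12 buffer semantics, this memory buffer can queue up to buffer_chunk_limit × buffer_queue_limit = 64 MB × 175 ≈ 11 GB before overflowing, so a multi-gigabyte RSS can be explained by the buffer alone if flushes fall behind.)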


hjet commented Feb 3, 2017

Also from the logs, the only interesting messages are as follows:

2017-02-02 21:21:28 +0000 [info]: following tail of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/51d8d228-78f8-4201-af98-e275f9898cbb-4800/executors/ct:1486011874000:0:fa06769a7b_launcher:/runs/latest/stdout
2017-02-02 21:21:28 +0000 [info]: following tail of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/51d8d228-78f8-4201-af98-e275f9898cbb-4800/executors/ct:1486013027000:0:e9fe5590b4_launcher:/runs/latest/stdout
2017-02-02 21:21:28 +0000 [info]: following tail of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/51d8d228-78f8-4201-af98-e275f9898cbb-4800/executors/ct:1486014299000:0:0fbd3568d2_launcher:/runs/latest/stdout
2017-02-02 21:21:28 +0000 [info]: following tail of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/51d8d228-78f8-4201-af98-e275f9898cbb-4800/executors/ct:1486014376000:0:89625e870d_launcher:/runs/latest/stdout
2017-02-02 21:21:28 +0000 [info]: following tail of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/51d8d228-78f8-4201-af98-e275f9898cbb-4800/executors/ct:1486018554000:0:b5a3cab503_launcher:/runs/latest/stdout
2017-02-02 21:21:28 +0000 [info]: following tail of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/51d8d228-78f8-4201-af98-e275f9898cbb-4800/executors/ct:1486021314000:0:80cb94ee7c_launcher:/runs/latest/stdout
2017-02-02 21:21:28 +0000 [info]: following tail of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/51d8d228-78f8-4201-af98-e275f9898cbb-4800/executors/ct:1486022453000:0:afb0546592_launcher:/runs/latest/stdout
2017-02-02 21:21:28 +0000 [info]: following tail of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/51d8d228-78f8-4201-af98-e275f9898cbb-4800/executors/ct:1486024200000:0:99cbcd2263_launcher:/runs/latest/stdout
2017-02-02 21:21:28 +0000 [info]: following tail of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/51d8d228-78f8-4201-af98-e275f9898cbb-4800/executors/ct:1486024822000:0:2232e104dd_launcher:/runs/latest/stdout
2017-02-02 21:21:28 +0000 [info]: following tail of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/51d8d228-78f8-4201-af98-e275f9898cbb-4800/executors/ct:1486027491000:0:58f22e2cea_launcher:/runs/latest/stdout

etc...

and


2017-02-02 21:21:29 +0000 [info]: listening syslog socket on 0.0.0.0:65001 with udp
2017-02-02 21:21:29 +0000 [info]: fluentd worker is now running
2017-02-02 21:21:29 +0000 [info]: disable filter chain optimization because [Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.
2017-02-02 21:21:43 +0000 [info]: disable filter chain optimization because [Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.
2017-02-02 21:21:46 +0000 [info]: disable filter chain optimization because [Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.
2017-02-02 21:21:46 +0000 [info]: disable filter chain optimization because [Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.
2017-02-02 21:21:49 +0000 [info]: Connection opened to Elasticsearch cluster => {:host=>"xxxxx", :port=>9200, :scheme=>"http"}
2017-02-02 21:22:01 +0000 [info]: disable filter chain optimization because [Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.
2017-02-02 21:26:44 +0000 [info]: detected rotation of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/27ffe398-bec5-46e4-b644-d2132dbdd54b-93820/executors/fa06769a7b.metadata/runs/latest/stderr; waiting 5 seconds
2017-02-02 21:26:44 +0000 [info]: detected rotation of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/27ffe398-bec5-46e4-b644-d2132dbdd54b-93820/executors/fa06769a7b.metadata/runs/latest/stdout; waiting 5 seconds
2017-02-02 21:29:43 +0000 [info]: detected rotation of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/51d8d228-78f8-4201-af98-e275f9898cbb-4800/executors/ct:1486011874000:0:fa06769a7b_launcher:/runs/latest/stdout; waiting 5 seconds
2017-02-02 21:29:43 +0000 [info]: detected rotation of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/

and then a

2017-02-03 06:25:02 +0000 [info]: force flushing buffered events

(notice the time of the message)

and then more of the above sorts of messages (no errors or anything).


hjet commented Feb 5, 2017

Memory usage is still growing. Here is an error that came up in the logs:

2017-02-05 01:18:50 +0000 [info]: detected rotation of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/51d8d228-78f8-4201-af98-e275f9898cbb-4800/executors/ct:1486204744000:0:e2f8698e84_launcher:/runs/latest/stdout; waiting 5 seconds
2017-02-05 01:18:50 +0000 [info]: following tail of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/27ffe398-bec5-46e4-b644-d2132dbdd54b-98010/executors/3bd13b6d67.parquet-cds-customer/runs/latest/stderr
2017-02-05 01:18:50 +0000 [info]: disable filter chain optimization because [Fluent::Plugin::RecordTransformerFilter, Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.
2017-02-05 01:18:50 +0000 [info]: following tail of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/27ffe398-bec5-46e4-b644-d2132dbdd54b-98010/executors/3bd13b6d67.analysis-campaigns/runs/latest/stderr
2017-02-05 01:18:50 +0000 [info]: disable filter chain optimization because [Fluent::Plugin::RecordTransformerFilter, Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.
2017-02-05 01:18:55 +0000 [error]: closed stream
  2017-02-05 01:18:55 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.11/lib/fluent/plugin/in_tail.rb:585:in `readpartial'
  2017-02-05 01:18:55 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.11/lib/fluent/plugin/in_tail.rb:585:in `on_notify'
  2017-02-05 01:18:55 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.11/lib/fluent/plugin/in_tail.rb:455:in `detach'
  2017-02-05 01:18:55 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.11/lib/fluent/plugin/in_tail.rb:283:in `detach_watcher'
  2017-02-05 01:18:55 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.11/lib/fluent/plugin/in_tail.rb:293:in `block in detach_watcher_after_rotate_wait'
  2017-02-05 01:18:55 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.11/lib/fluent/plugin_helper/timer.rb:77:in `call'
  2017-02-05 01:18:55 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.11/lib/fluent/plugin_helper/timer.rb:77:in `on_timer'
  2017-02-05 01:18:55 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.4.5/lib/cool.io/loop.rb:88:in `run_once'
  2017-02-05 01:18:55 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.4.5/lib/cool.io/loop.rb:88:in `run'
  2017-02-05 01:18:55 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.11/lib/fluent/plugin_helper/event_loop.rb:77:in `block in start'
  2017-02-05 01:18:55 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.11/lib/fluent/plugin_helper/thread.rb:66:in `block in thread_create'
2017-02-05 01:18:55 +0000 [error]: closed stream
  2017-02-05 01:18:55 +0000 [error]: suppressed same stacktrace
2017-02-05 01:18:55 +0000 [info]: disable filter chain optimization because [Fluent::Plugin::RecordTransformerFilter, Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.
2017-02-05 01:23:46 +0000 [info]: detected rotation of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/27ffe398-bec5-46e4-b644-d2132dbdd54b-97178/executors/3ed3380e4c.events/runs/latest/stdout; waiting 5 seconds
2017-02-05 01:23:46 +0000 [info]: detected rotation of /tmp/mesos/slaves/27ffe398-bec5-46e4-b644-d2132dbdd54b-S31/frameworks/27ffe398-bec5-46e4-b644-d2132dbdd54b-97178/executors/3ed3380e4c.events/runs/latest/stderr; waiting 5 seconds


hjet commented Feb 6, 2017

@tagomoris may be related to #1434?


hjet commented Feb 6, 2017

Also, a curl http://localhost:65000/api/plugins.json takes around 10-15 seconds...
Memory usage is currently at ~32% (~22 GB).

repeatedly commented:

#1467 may fix this problem.
Could you try this patch?


hjet commented Feb 24, 2017

I can and will try it soon. Sorry for the delay; I have been very busy lately.

Does that same issue exist in 0.12?

repeatedly commented:

Does that same issue exist in 0.12?

I'm not sure, but I haven't received such reports from users.

repeatedly commented:

Closed. If you have the same problem with the latest fluentd v0.14, please reopen it.
