File descriptors are leaked when using HTTP #1604

Closed
ryanbrink opened this Issue Aug 6, 2014 · 26 comments

@ryanbrink

ryanbrink commented Aug 6, 2014

Several of Logstash's input and output modules use HTTP via the FTW gem (https://github.com/jordansissel/ruby-ftw). In my experience, this gem leaks file descriptors and eventually causes Logstash to effectively die, as it ends up repeatedly trying and failing to open a file descriptor for whatever it needs to do. For example, here is a trace I got after running Logstash for a couple of days in my setup:

{:timestamp=>"2014-08-06T11:13:07.128000+0000", 
:message=>"Failed to flush outgoing items", 
:outgoing_count=>1, 
:exception=>#<SocketError: initialize: java.net.SocketException: Too many open files>, 
:backtrace=>[
    "org/jruby/ext/socket/RubySocket.java:190:in `initialize'", 
    "org/jruby/RubyIO.java:852:in `new'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:146:in `connect'", 
    "org/jruby/RubyArray.java:1613:in `each'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:139:in `connect'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:406:in `connect'", 
    "org/jruby/RubyProc.java:271:in `call'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/pool.rb:48:in `fetch'", "/usr/local/liveview/shipping/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:403:in `connect'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:319:in `execute'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:217:in `post!'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/lib/logstash/outputs/elasticsearch_http.rb:215:in `post'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/lib/logstash/outputs/elasticsearch_http.rb:210:in `flush'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:219:in `buffer_flush'", 
    "org/jruby/RubyHash.java:1339:in `each'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:216:in `buffer_flush'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:193:in `buffer_flush'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:159:in `buffer_receive'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/lib/logstash/outputs/elasticsearch_http.rb:188:in `receive'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/lib/logstash/outputs/base.rb:86:in `handle'", 
    "(eval):335:in `initialize'", 
    "org/jruby/RubyProc.java:271:in `call'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/lib/logstash/pipeline.rb:266:in `output'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/lib/logstash/pipeline.rb:225:in `outputworker'", 
    "/usr/local/liveview/shipping/logstash-1.4.2/lib/logstash/pipeline.rb:152:in `start_outputs'"
    ], 
:level=>:warn}

This will repeat until I restart the process, after which Logstash will run ok until it again eats up all available file descriptors on the system.

You can check the file descriptors in use by running sudo lsof | grep java | wc -l. On a busy system (like mine) Logstash is leaking about 10 file descriptors a minute. The number goes up and down as some are released back into the wild, but grows consistently over time.
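For reference, here is a minimal monitoring sketch (assumptions: Linux, where /proc/<pid>/fd holds one entry per open descriptor, and a hypothetical fd_watch.rb script that you point at Logstash's PID yourself; this is not part of Logstash):

# fd_watch.rb - hypothetical helper: print the number of open descriptors
# for a given PID once a minute so steady growth is easy to spot.
require "time"

pid = ARGV.fetch(0) { abort "usage: ruby fd_watch.rb <logstash_pid>" }

loop do
  count = Dir.glob("/proc/#{pid}/fd/*").size
  puts "#{Time.now.utc.iso8601} open_fds=#{count}"
  sleep 60
end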

This issue seems similar to https://logstash.jira.com/browse/LOGSTASH-892.

tl;dr: FTW is leaking file descriptors, causing any and all HTTP-based plugins to eventually stall Logstash

Note: this error can also manifest as a DNS lookup failure (#<SocketError: gethostbyname: name or service not known>), but I believe this is also fundamentally due to a lack of available file descriptors.

@jordansissel

jordansissel (Contributor) commented Aug 19, 2014

I've never reproduced the file descriptor leak, but enough folks have reported this in the past that I am confident there is a bug.

@jordansissel

jordansissel (Contributor) commented Aug 19, 2014

I also agree the bug is in ftw

@jordansissel jordansissel added confirmed and removed unconfirmed labels Aug 19, 2014

@tonybruess

tonybruess commented Aug 28, 2014

Confirmed, just happened on some of our production servers. Relevant output: http://pastebin.com/raw.php?i=0T7n73ZU

@MLnick

MLnick commented Oct 16, 2014

I can confirm I'm experiencing this issue too: https://gist.github.com/MLnick/128d2d16c14f76b2e068

@jordansissel

jordansissel (Contributor) commented Oct 16, 2014

@jsvd's been working on this. Assigning his way.

@MLnick

MLnick commented Oct 18, 2014

Thanks - does this impact all versions? Currently I cannot even run logstash for more than about an hour. If in the meantime I can use an earlier version, that would be great (using latest 1.4.2 currently).

@jsvd

jsvd (Member) commented Oct 20, 2014

@MLnick There has been a fix applied to FTW that is included in the next version; would you mind applying it to your deployed environment?

Essentially edit:

  • /home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb
  • /home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/pool.rb

And make these two small changes:
jordansissel/ruby-ftw@b065c3c

Since you're able to "reproduce" the problem in about an hour, it would provide great feedback.
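For anyone curious, here is a minimal sketch of the general pattern such a fix follows; this is a hypothetical illustration, not the actual ruby-ftw commit, and connect_or_cleanup is an invented name. The essential idea is that a socket opened during a failed connect attempt must be closed before the error propagates, otherwise its descriptor is never reclaimed:

# Hypothetical illustration only, not the ruby-ftw patch itself: always close
# the socket when a connect attempt fails so its descriptor is not leaked.
require "socket"

def connect_or_cleanup(host, port)
  socket = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
  sockaddr = Socket.sockaddr_in(port, host)
  begin
    socket.connect(sockaddr)
    socket                               # success: the caller now owns the socket
  rescue StandardError => error
    socket.close unless socket.closed?   # failure: reclaim the descriptor before re-raising
    raise error
  end
end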


@MLnick

MLnick commented Oct 20, 2014

Cool - I've made those changes and restarted logstash - will report back tomorrow with the result.

Thanks!

@MLnick

MLnick commented Oct 21, 2014

This does seem to have solved the problem.

However, I am seeing a few of these (not many) - not sure if this is related:

Failed to flush outgoing items {:outgoing_count=>1, :exception=>#<Errno::EBADF: Bad file descriptor - Bad file descriptor>, :backtrace=>["org/jruby/RubyIO.java:2097:in `close'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:173:in `connect'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:168:in `connect'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:156:in `connect'", "org/jruby/RubyArray.java:1613:in `each'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:139:in `connect'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:406:in `connect'", "org/jruby/RubyProc.java:271:in `call'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/pool.rb:49:in `fetch'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:403:in `connect'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:319:in `execute'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:217:in `post!'", "/home/ec2-user/logstash-1.4.2/lib/logstash/outputs/elasticsearch/protocol.rb:106:in `bulk_ftw'", "/home/ec2-user/logstash-1.4.2/lib/logstash/outputs/elasticsearch/protocol.rb:80:in `bulk'", "/home/ec2-user/logstash-1.4.2/lib/logstash/outputs/elasticsearch.rb:315:in `flush'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:219:in `buffer_flush'", "org/jruby/RubyHash.java:1339:in `each'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:216:in `buffer_flush'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:193:in `buffer_flush'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:159:in `buffer_receive'", "/home/ec2-user/logstash-1.4.2/lib/logstash/outputs/elasticsearch.rb:311:in `receive'", "/home/ec2-user/logstash-1.4.2/lib/logstash/outputs/base.rb:86:in `handle'", "(eval):109:in `initialize'", "org/jruby/RubyProc.java:271:in `call'", "/home/ec2-user/logstash-1.4.2/lib/logstash/pipeline.rb:266:in `output'", "/home/ec2-user/logstash-1.4.2/lib/logstash/pipeline.rb:225:in `outputworker'", "/home/ec2-user/logstash-1.4.2/lib/logstash/pipeline.rb:152:in `start_outputs'"], :level=>:warn}

@wspi

wspi commented Oct 27, 2014

I believe this happens with the elasticsearch_http output as well. After changing to this output, I usually get "too many open files", and in lsof I see a lot of this:

java 22147 root 1609u sock 0,7 0t0 2066701787 can't identify protocol
java 22147 root 1610u sock 0,7 0t0 2066702739 can't identify protocol
java 22147 root 1611u sock 0,7 0t0 2066704918 can't identify protocol
java 22147 root 1612u sock 0,7 0t0 2066702383 can't identify protocol
java 22147 root 1613u sock 0,7 0t0 2066705250 can't identify protocol
java 22147 root 1614u sock 0,7 0t0 2066706342 can't identify protocol

@jsvd

jsvd (Member) commented Oct 27, 2014

@wspi yes it might, since both use ftw <= 0.0.39. Could you also try the patch above, please?

@wspi

wspi commented Oct 27, 2014

@jsvd I'll try it on our testing environment and let you know, thanks

@jsvd

jsvd (Member) commented Oct 27, 2014

@wspi thank you.

The elasticsearch_http output is going to be deprecated and the elasticsearch output plugin is moving towards the manticore gem for http.

@wspi

wspi commented Oct 27, 2014

@jsvd As far as I can tell, it worked; the number of open files has stabilized. Thanks for the help.

I'll let you know if anything changes.

@bpesics

bpesics commented Oct 28, 2014

I have a long-standing problem with logstash pipelines jamming up until eventually the JVM runs out of heap space (it took about 25 days, though I think messages started getting stuck after ca. 15 days). I think it might be related to this ftw issue, but it might be something completely different.

java.lang.OutOfMemoryError: Java heap space
Dumping heap to /tmp/java_pid1430.hprof ...

I don't really know how to check this, so here is the heap dump if anyone is interested: https://dl.dropboxusercontent.com/u/107643447/java_pid1430.zip

Thanks,
/Bela

@jsvd

jsvd (Member) commented Oct 29, 2014

FTW will be replaced by manticore in #1777

@bpesics

bpesics commented Oct 29, 2014

Are all plugins switching away from ftw?

@jsvd

jsvd (Member) commented Oct 29, 2014

We have no plans for that, but it does make sense to reduce the number of http libraries, indeed.

@bpesics

bpesics commented Oct 29, 2014

I'm just asking because lots of plugins use ftw, and so all of them were affected by this problem before the ftw fix.

@jsvd

jsvd (Member) commented Oct 29, 2014

True, this has been fixed in ftw 0.0.40 and logstash 1.5 will ship with it.

@jsvd

jsvd (Member) commented Nov 10, 2014

All the plugins that use ftw have been bumped to 0.0.40.

@jsvd jsvd closed this Nov 10, 2014

@vanga

vanga commented Feb 25, 2015

I have updated the two files mentioned here: jordansissel/ruby-ftw@b065c3c
I still don't see the issue fixed; the file descriptor count still climbs to the maximum.
I am using logstash-1.4.2.
Is there any way for me to use ftw 0.0.40 without having to update the logstash version?

@vasubesimple

vasubesimple commented Mar 24, 2015

{:timestamp=>"2015-03-24T13:41:07.097000+0000", :message=>"Failed to flush outgoing items", :outgoing_count=>1, :exception=>java.lang.OutOfMemoryError: Java heap space, :backtrace=>[], :level=>:warn}

I have 103 GB of free memory! I am using version 1.4.2-1-2c0f5a1. It seems the issue was not solved :) Any recommendations?

@chiriaev

chiriaev commented Mar 24, 2015

Hi guys. I can also see this on my 1.5 RC2. I currently have 3500+ open file descriptors on my Logstash TCP input, port 12201, for about 400 real servers (i.e. the value I have right after a restart).

Also, I have a case where my TCP input crashed (see the [B] output below).

Happy to provide you with more information; please ask.

Regards, Andrei

P.S.: [A] is from a working/running instance with a socket leak; [B] is from a similar instance where the TCP input also crashed.

+++
[A]
...
{:timestamp=>"2015-03-23T20:21:40.316000+0000", :message=>"An error occurred. Closing connection", :client=>"10.51.4.5:59480", :exception=>#<TypeError: can't convert Array into String>, :backtrace=>["org/jruby/RubyString.java:1156:in `+'", "/opt/logstash/lib/logstash/inputs/base.rb:114:in `decorate'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.3/lib/logstash/inputs/tcp.rb:119:in `handle_socket'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-json_lines-0.1.6/lib/logstash/codecs/json_lines.rb:40:in `decode'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-json_lines-0.1.6/lib/logstash/codecs/json_lines.rb:37:in `decode'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-line-0.1.5/lib/logstash/codecs/line.rb:36:in `decode'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-line-0.1.5/lib/logstash/codecs/line.rb:35:in `decode'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-json_lines-0.1.6/lib/logstash/codecs/json_lines.rb:35:in `decode'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.3/lib/logstash/inputs/tcp.rb:116:in `handle_socket'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.3/lib/logstash/inputs/tcp.rb:145:in `client_thread'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.3/lib/logstash/inputs/tcp.rb:143:in `client_thread'"], :level=>:error}
...

[B]
...
{:timestamp=>"2015-03-24T14:47:15.764000+0000", :message=>"An error occurred. Closing connection", :client=>"10.51.4.5:33399", :exception=>#<IOError: Too many open files>, :backtrace=>["org/jruby/RubyIO.java:2994:in sysread'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.3/lib/logstash/inputs/tcp.rb:161:inread'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.3/lib/logstash/inputs/tcp.rb:110:in handle_socket'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.3/lib/logstash/inputs/tcp.rb:145:inclient_thread'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.3/lib/logstash/inputs/tcp.rb:143:in `client_thread'"], :level=>:error}
{:timestamp=>"2015-03-24T14:47:16.837000+0000", :message=>"A plugin had an unrecoverable error. Will restart this plugin.\n Plugin: <LogStash::Inputs::Tcp type=>"nxlog_json", tags=>["input_nxlog_json"], host=>"0.0.0.0", mode=>"server">\n Error: Too many open files", :level=>:error}
...


@f-ld

f-ld commented Apr 17, 2015

Hi,

I am not using the http output plugin, but I had a look at it while building a kind of http input plugin and noticed the same "descriptor leak" issue with ftw.

I made a "quick'n'dirty" workaround in my input plugin by adding an @agent.shutdown call after my request, and it works as expected.

FYI, agent.shutdown is implemented like this:

# shutdown this agent.
#
# This will shutdown all active connections.
def shutdown
  @pool.each do |identifier, list|
    list.each do |connection|
      connection.disconnect("stopping agent")
    end
  end
end # def shutdown

So this is basically forcing ftw to close all the connections it has.
Fixing the leak first, optimization later.
No change needed in the ftw source code.

I guess you can do the same thing inside the http output plugin, at the end of the receive method (just before "end # def receive"). Since I am not using this plugin, I cannot quickly test whether this helps.
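A minimal sketch of what that could look like follows; the request shape, @url, and logging here are assumptions for illustration, not the actual output plugin code:

# Hypothetical sketch only: drop all pooled ftw connections after every request,
# trading keep-alive reuse for a guarantee that no descriptors linger.
def receive(event)
  return unless output?(event)
  begin
    @agent.post!(@url, :body => event.to_json)   # request details assumed for illustration
  rescue => e
    @logger.warn("Failed to send event over HTTP", :exception => e)
  ensure
    @agent.shutdown   # close every connection ftw is currently holding open
  end
end # def receive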

For me this is only a workaround, but it keeps our logstash instances from crashing.

Cheers,
Fabien
