
Infinite hang when Elasticsearch restarts #1655

Closed
mdconner opened this issue Aug 21, 2014 · 9 comments
@mdconner

Logstash appears to hang indefinitely when events are processed after Elasticsearch restarts.
Reproduced with Logstash 1.4.1 and Elasticsearch 1.1.1 on RHEL 6.5 using the following steps (also reproduced with the file input):

  1. Start Elasticsearch from command line and wait for "started" message
  2. Start Logstash from the command line as an agent with the following config file:
    input { stdin { } }
    output {
      stdout { }
      elasticsearch { cluster => "elasticsearch" }
    }
  3. Send test events (stdin)
  4. Stop Elasticsearch
  5. Send test events (stdin)
  6. Start Elasticsearch
  7. Send test events (stdin)

Logstash seems to recover successfully when no events are sent while Elasticsearch is down (i.e., when step 5 is skipped). Here is the entire console output (the exception occurs after Test 5):

[root@lnx65 bin]# ./logstash agent --verbose -f ../mdc_stdin.conf
Pipeline started {:level=>:info}
log4j, [2014-08-21T20:10:44.283] INFO: org.elasticsearch.node: [logstash-lnx65.localdomain-17807-2010] version[1.1.1], pid[17807], build[f1585f0/2014-04-16T14:27:12Z]
log4j, [2014-08-21T20:10:44.283] INFO: org.elasticsearch.node: [logstash-lnx65.localdomain-17807-2010] initializing ...
log4j, [2014-08-21T20:10:44.316] INFO: org.elasticsearch.plugins: [logstash-lnx65.localdomain-17807-2010] loaded [], sites []
log4j, [2014-08-21T20:10:47.029] INFO: org.elasticsearch.node: [logstash-lnx65.localdomain-17807-2010] initialized
log4j, [2014-08-21T20:10:47.029] INFO: org.elasticsearch.node: [logstash-lnx65.localdomain-17807-2010] starting ...
log4j, [2014-08-21T20:10:47.141] INFO: org.elasticsearch.transport: [logstash-lnx65.localdomain-17807-2010] bound_address {inet[/0:0:0:0:0:0:0:0:9301]}, publish_address {inet[/10.0.2.15:9301]}
log4j, [2014-08-21T20:10:50.358] INFO: org.elasticsearch.cluster.service: [logstash-lnx65.localdomain-17807-2010] detected_master [lnx65][El3W2hxoQvqkvIWo0fT4sg][lnx65.localdomain][inet[/10.0.2.15:9300]], added {[lnx65][El3W2hxoQvqkvIWo0fT4sg][lnx65.localdomain][inet[/10.0.2.15:9300]],}, reason: zen-disco-receive(from master [[lnx65][El3W2hxoQvqkvIWo0fT4sg][lnx65.localdomain][inet[/10.0.2.15:9300]]])
log4j, [2014-08-21T20:10:50.368] INFO: org.elasticsearch.discovery: [logstash-lnx65.localdomain-17807-2010] elasticsearch/SeDbmuOZTlq5DxxCnENasQ
log4j, [2014-08-21T20:10:50.373] INFO: org.elasticsearch.node: [logstash-lnx65.localdomain-17807-2010] started
New Elasticsearch output {:cluster=>"elasticsearch", :host=>nil, :port=>"9300-9305", :embedded=>false, :protocol=>"node", :level=>:info}
Automatic template management enabled {:manage_template=>"true", :level=>:info}
Using mapping template {:template=>"{ "template" : "logstash-*", "settings" : { "index.refresh_interval" : "5s" }, "mappings" : { "_default_" : { "_all" : {"enabled" : true}, "dynamic_templates" : [ { "string_fields" : { "match" : "*", "match_mapping_type" : "string", "mapping" : { "type" : "string", "index" : "analyzed", "omit_norms" : true, "fields" : { "raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256} } } } } ], "properties" : { "@version": { "type": "string", "index": "not_analyzed" }, "geoip" : { "type" : "object", "dynamic": true, "path": "full", "properties" : { "location" : { "type" : "geo_point" } } } } } }}", :level=>:info}

2014-08-21T18:10:53.394+0000 lnx65.localdomain
Test 1
2014-08-21T18:10:58.896+0000 lnx65.localdomain Test 1
Test 2
2014-08-21T18:11:01.376+0000 lnx65.localdomain Test 2
log4j, [2014-08-21T20:11:06.351] INFO: org.elasticsearch.discovery.zen: [logstash-lnx65.localdomain-17807-2010] master_left [[lnx65][El3W2hxoQvqkvIWo0fT4sg][lnx65.localdomain][inet[/10.0.2.15:9300]]], reason [transport disconnected (with verified connect)]
log4j, [2014-08-21T20:11:06.353] WARN: org.elasticsearch.discovery.zen: [logstash-lnx65.localdomain-17807-2010] master_left and no other node elected to become master, current nodes: {[logstash-lnx65.localdomain-17807-2010][SeDbmuOZTlq5DxxCnENasQ][lnx65.localdomain][inet[/10.0.2.15:9301]]{client=true, data=false},}
log4j, [2014-08-21T20:11:06.354] INFO: org.elasticsearch.cluster.service: [logstash-lnx65.localdomain-17807-2010] removed {[lnx65][El3W2hxoQvqkvIWo0fT4sg][lnx65.localdomain][inet[/10.0.2.15:9300]],}, reason: zen-disco-master_failed ([lnx65][El3W2hxoQvqkvIWo0fT4sg][lnx65.localdomain][inet[/10.0.2.15:9300]])

2014-08-21T18:11:14.608+0000 lnx65.localdomain

Test 3
Test 4
Test 5
Exception in thread "elasticsearch[logstash-lnx65.localdomain-17807-2010][generic][T#2]" org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:138)
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:128)
at org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:197)
at org.elasticsearch.action.bulk.TransportBulkAction.access$000(TransportBulkAction.java:65)
at org.elasticsearch.action.bulk.TransportBulkAction$1.onFailure(TransportBulkAction.java:143)
at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$2.run(TransportAction.java:117)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
log4j, [2014-08-21T20:11:45.493] INFO: org.elasticsearch.cluster.service: [logstash-lnx65.localdomain-17807-2010] detected_master [lnx65][7vpAU211RHemHzNw1u9EyQ][lnx65.localdomain][inet[/10.0.2.15:9300]], added {[lnx65][7vpAU211RHemHzNw1u9EyQ][lnx65.localdomain][inet[/10.0.2.15:9300]],}, reason: zen-disco-receive(from master [[lnx65][7vpAU211RHemHzNw1u9EyQ][lnx65.localdomain][inet[/10.0.2.15:9300]]])

Test 6
Test 7
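
The ClusterBlockException above is thrown on an Elasticsearch client thread (note the thread name in the trace), not inside Logstash's output. One hazard with that pattern, sketched below as a hypothetical illustration (this is not Logstash or Elasticsearch source), is that if the failure dies on the background thread instead of being reported back, whichever caller is waiting for a response waits forever, which would match the observed hang:

```ruby
# Hypothetical sketch: a worker whose error handler swallows the failure
# instead of pushing a result, leaving a would-be caller blocked forever.
result_q = Queue.new

worker = Thread.new do
  begin
    raise "blocked by: [SERVICE_UNAVAILABLE/2/no master]"
  rescue => e
    # Buggy path: the exception is dropped on this thread. Nothing is
    # pushed to result_q, so a caller blocked on result_q.pop never wakes.
    # result_q.push(e)   # <- the missing step that would unblock the caller
  end
end
worker.join

hung = result_q.empty?   # true: no success *or* failure was ever delivered
```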

@suyograo suyograo self-assigned this Aug 29, 2014
@hoelzro

hoelzro commented Sep 29, 2014

I've been seeing this issue as well.

@benediktarnold

I am seeing this issue with logstash 1.4.2 and es 1.1.1 too.

@tiagobnobrega

I am also seeing this issue. I have to restart Logstash as well to make everything work again (ES 1.1.1, Logstash 1.4.2).

@sbitpdc

sbitpdc commented Dec 9, 2014

I've been seeing this issue as well.

@cisco-sro

Same here with Logstash 1.4.2 and ES 1.4.2

@wangjingbomail

Yes, I have to restart every Logstash instance whenever I restart Elasticsearch, which is tedious.

@matthewparsons

I've got a suspicion the filter queue is blocking and not clearing again. Is there any way to better debug what is going on?
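
That suspicion can be illustrated with a minimal sketch (this is not Logstash's actual pipeline code): Logstash 1.4 connects its input, filter, and output stages with small SizedQueues, so if the output stops consuming (e.g. it is stuck waiting on Elasticsearch), pushes into the queue block and the backpressure propagates all the way back to stdin, which looks like a total hang:

```ruby
# Toy model of stage-to-stage backpressure via a bounded queue.
queue = SizedQueue.new(2)   # Logstash 1.4 reportedly used a capacity of 20

producer = Thread.new do
  5.times { |i| queue.push("event #{i}") }  # blocks once the queue is full
  :done
end

sleep 0.2
blocked_while_output_stalled = producer.alive?  # stuck on push: looks "hung"

# A recovered consumer drains the queue and the producer unblocks.
consumer = Thread.new { 5.times { queue.pop } }
producer.join
consumer.join
```

If this model is right, the pipeline is not dead, just wedged behind a full queue; it should resume the moment the output starts draining again, which is why the reports above say only a Logstash restart helps when the output never recovers.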

@skikd636

skikd636 commented May 1, 2015

I get similar behavior here, except it's not when ES restarts; it's when I have what I'll call a connection "hiccup" between Logstash and ES. The Logstash server goes into a continuous loop. I get no errors on the ES side, which I find strange. Here's the error I see in the Logstash log file:

{:timestamp=>"2015-05-01T06:41:17.479000-0400", :message=>"Failed to flush outgoing items", :outgoing_count=>2, :exception=>#<Errno::ECONNREFUSED: Connection refused - Connection refused>, :backtrace=>[
  "org/jruby/ext/socket/RubySocket.java:201:in `connect_nonblock'",
  "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:156:in `connect'",
  "org/jruby/RubyArray.java:1613:in `each'",
  "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:139:in `connect'",
  "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:406:in `connect'",
  "org/jruby/RubyProc.java:271:in `call'",
  "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/pool.rb:48:in `fetch'",
  "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:403:in `connect'",
  "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:319:in `execute'",
  "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:217:in `post!'",
  "/opt/logstash/lib/logstash/outputs/elasticsearch_http.rb:228:in `post'",
  "/opt/logstash/lib/logstash/outputs/elasticsearch_http.rb:223:in `flush'",
  "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:219:in `buffer_flush'",
  "org/jruby/RubyHash.java:1339:in `each'",
  "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:216:in `buffer_flush'",
  "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:193:in `buffer_flush'",
  "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:112:in `buffer_initialize'",
  "org/jruby/RubyKernel.java:1521:in `loop'",
  "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:110:in `buffer_initialize'"
], :level=>:warn}
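
The continuous loop is consistent with the retry behavior visible in this backtrace (`buffer_flush` looping inside `buffer_initialize`). A rough sketch of that pattern, written as an illustration rather than the actual Stud::Buffer source: on any flush error the buffer backs off briefly and retries the same batch, so a persistent ECONNREFUSED turns into an endless retry loop with no error ever surfacing on the ES side:

```ruby
# ToyBuffer is a hypothetical stand-in for a Stud::Buffer-style flusher.
class ToyBuffer
  attr_reader :attempts

  def initialize(&flusher)
    @flusher = flusher
    @attempts = 0
  end

  def flush(batch)
    begin
      @attempts += 1
      @flusher.call(batch)          # e.g. an HTTP POST of the bulk payload
    rescue StandardError
      sleep 0.01                    # brief backoff...
      retry                         # ...then retry the SAME batch, forever
    end
  end
end

# Simulate a connection "hiccup": three refusals, then success.
failures_left = 3
buffer = ToyBuffer.new do |batch|
  raise Errno::ECONNREFUSED if (failures_left -= 1) >= 0
  batch.size
end

buffer.flush(%w[a b])   # spins through the refusals, succeeds on attempt 4
```

Under this model the loop is by design (retry until the endpoint comes back); the bug reports above suggest that after certain failures the flush never succeeds again even once ES is reachable.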

@jordansissel
Contributor

For Logstash 1.5.0, we've moved all plugins to individual repositories, so I have moved this issue to logstash-plugins/logstash-output-elasticsearch#142. Let's continue the discussion there! :)
