Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logstash 5.4.1 Deadlocks #7588

Closed
ssozonoff opened this issue Jul 4, 2017 · 27 comments
Closed

Logstash 5.4.1 Deadlocks #7588

ssozonoff opened this issue Jul 4, 2017 · 27 comments
Assignees
Labels

Comments

@ssozonoff
Copy link

So we have Logstash deadlocking on us.....

We have a simple configuration using a Kafka input and an HTTP output to ES.
It can run fine for a day or more and then just freeze without warning. Today it has frozen twice !!

Strace shows its frozen on a mutex

[root@ip-10-10-1-169 logstash]# strace -p 15119 Process 15119 attached futex(0x7fe1016479d0, FUTEX_WAIT, 15139, NULL

Thread dump attached.

thread_dump_js.txt

This is of course rather worrying and now we have to babysit it!!

"logstash-input-kafka", "6.3.2"
"logstash-output-http"
logstash 5.4.1

@jsvd
Copy link
Member

jsvd commented Jul 4, 2017

thanks for the report.

did this start happening when logstash was upgraded in production? which version worked fine before?

@jsvd jsvd added the bug label Jul 4, 2017
@ssozonoff
Copy link
Author

Well up until two-three weeks ago we were running a different pipeline and with an older version of Logstash (cant remember which but at least a year old) which worked well for over a year.

A few weeks ago we migrated to Kafka in production and were forced to upgrade Logstash along with it. As of last Friday it has been crashing at least twice per day after having run fine for about two weeks prior. We think it could be "work load" related but as you see its not going to be easy with zero information in the logs or anywhere else!

At this point we have no idea if its the input plugin or output plugin or other. We are currently running with a consumer_count of 1 on the Kafka (input) side.

@ssozonoff
Copy link
Author

Just crashed again this time something interesting in the log file

[2017-07-04T17:14:05,785][INFO ][org.apache.kafka.clients.producer.KafkaProducer] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. [2017-07-04T17:14:05,794][INFO ][io.confluent.monitoring.clients.interceptor.MonitoringInterceptor] Closed monitoring interceptor for client ID=logstash-river-consumer-1 [2017-07-04T17:14:05,884][FATAL][logstash.runner ] An unexpected error occurred! {:error=>#<IOError: Unmatched first part of surrogate pair (0xd83c) (through reference chain: java.util.HashMap["DATA"]->org.logstash.ConvertedMap["message"]->org.logstash.bivalues.StringBiValue["javaValue()"])>, :backtrace=>["org/logstash/ackedqueue/ext/JrubyAckedQueueExtLibrary.java:153:in write'", "/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:56:in push'", "/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:354:in push'", "/usr/share/logstash/logstash-core/lib/logstash/instrument/wrapped_write_client.rb:22:in push'", "/usr/share/logstash/logstash-core/lib/logstash/instrument/wrapped_write_client.rb:38:in record_metric'", "/usr/share/logstash/logstash-core/lib/logstash/instrument/wrapped_write_client.rb:22:in push'", "/usr/share/logstash/vendor/local_gems/34c2b704/logstash-input-kafka-6.3.2/lib/logstash/inputs/kafka.rb:266:in thread_runner'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-plain-3.0.2/lib/logstash/codecs/plain.rb:35:in decode'", "/usr/share/logstash/vendor/local_gems/34c2b704/logstash-input-kafka-6.3.2/lib/logstash/inputs/kafka.rb:256:in thread_runner'", "file:/usr/share/logstash/vendor/jruby/lib/jruby.jar!/jruby/java/java_ext/java.lang.rb:12:in each'", "/usr/share/logstash/vendor/local_gems/34c2b704/logstash-input-kafka-6.3.2/lib/logstash/inputs/kafka.rb:255:in thread_runner'"]}`

@ssozonoff
Copy link
Author

This was different in that Logstash actually crashed and died versus a freeze.

@original-brownbear original-brownbear self-assigned this Jul 4, 2017
@original-brownbear
Copy link
Member

original-brownbear commented Jul 4, 2017

@ssozonoff I looked into this and the freezing I believe is fixed in 5.4.2 and newer by this patch 047291d

The serialization issue (at least the crash looks like one) you're seeing, I'm looking into now. Any details on the type of data you're handling here (non-UTF8 encoding and the like is what I'm looking for) are very welcome if you can and/or want to share anything here.

@original-brownbear
Copy link
Member

@ssozonoff this in particular

Unmatched first part of surrogate pair (0xd83c)

seems to suggest you're dealing with UTF-16 data, is that the case?

@ssozonoff
Copy link
Author

Hi. I think we should probably separate these issues since I believe we are now talking about two different ones. My fault for pasting everything in the same.

Secondly I propose as a starting point to upgrade to 5.4.2 and see if we can resolve the freeze as this is the more re-occurring and urgent issue at the moment.

Regarding UTF-16, normally we are working only with UTF-8 but I can dive into this one a little more after we have fixed the freeze.

@original-brownbear
Copy link
Member

@ssozonoff

Hi. I think we should probably separate these issues since I believe we are now talking about two different ones. My fault for pasting everything in the same.

No worries, the first issue at least with the information I have here seems very straight forward and known. If it's not fixed by 5.4.2 I agree, we should track it down separately from the remaining ( encoding/serialization) issue.

Secondly I propose as a starting point to upgrade to 5.4.2 and see if we can resolve the freeze as this is the more re-occurring and urgent issue at the moment.

Sounds good.

Regarding UTF-16, normally we are working only with UTF-8 but I can dive into this one a little more after we have fixed the freeze.

If you indeed have UTF-16 characters in your data it seems plausible that there may be a step in our serialization strategy that would run into trouble here. I'll review this as well. I'll add a separate issue for this in case I find anything and will ping you there then.

@czka
Copy link

czka commented Jul 5, 2017

Couple of weeks ago in my org we tried upgrading from 5.1.2 straight to 5.4.1. Works fine on most instances, except those with http output. They started getting stuck at random.

I downgraded version-by-version. The latest one free from http output problems is 5.2.1. 5.2.2 is affected for sure. So the problematic change must have been introduced somewhere in between.

@ssozonoff
Copy link
Author

We have not had luck with our upgrade to 5.4.2, we are still freezing somewhere but we are also using http output..... so could be that its fixed but now the http output is harming us. What a nightmare!

@jsvd
Copy link
Member

jsvd commented Jul 5, 2017

can you please take new thread dumps as you attached in the creation of this issue?

@ssozonoff
Copy link
Author

I tried but was unable to get it.... VM unresponsive. I will try again when it happens next time

@czka
Copy link

czka commented Jul 5, 2017

@ssozonoff Can you try 5.2.2 and 5.2.1 to verify my findings?

@czka
Copy link

czka commented Jul 5, 2017

Maybe it's relevant: with 5.2.2 I keep getting the following in logstash-plain.log at logstash service restart. No such thing with 5.2.1:

[2017-07-05T10:13:43,535][WARN ][logstash.runner          ] SIGTERM received. Shutting down the agent.
[2017-07-05T10:13:43,542][WARN ][logstash.agent           ] stopping pipeline {:id=>"main"}
[2017-07-05T10:13:48,572][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:13:48,573][ERROR][logstash.shutdownwatcher ] The shutdown process appears to be stalled due to busy or blocked plugins. Check the logs for more information.
[2017-07-05T10:13:53,559][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:13:58,555][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:14:03,558][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:14:08,559][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:14:13,556][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:14:18,554][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:14:23,554][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:14:28,556][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:14:33,556][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:14:38,555][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:14:43,553][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:14:48,557][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:14:53,552][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:14:58,551][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:15:03,553][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:15:08,554][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}
[2017-07-05T10:15:13,551][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>1537, "stalling_thread_info"=>{"other"=>[{"thread_id"=>21, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>22, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>25, "name"=>"[main]>worker4", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>26, "name"=>"[main]>worker5", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>27, "name"=>"[main]>worker6", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>28, "name"=>"[main]>worker7", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}, {"thread_id"=>30, "name"=>"[main]>worker9", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-http-4.1.0/lib/logstash/outputs/http.rb:141:in `pop'"}]}}

@ssozonoff
Copy link
Author

ssozonoff commented Jul 5, 2017

But my understanding is that with 5.2.1 we will have the other DEADLOCK issue mentioned earlier and fixed in 5.4.2 and I believe that our Kafka input requires something more recent than 5.2.x but not sure on this.

@ssozonoff
Copy link
Author

Thread Dump attached

Archive 2.zip

@original-brownbear
Copy link
Member

@jsvd looking into the http output ...

@jsvd
Copy link
Member

jsvd commented Jul 5, 2017

thanks @czka for doing that regression inspection work! looking at the 5.2.2 vs 5.2.1 I can see the versions use two different http output plugins:

logstash-output-http version 4.1.0 in logstash 5.2.2
logstash-output-http version 3.1.1 in logstash 5.2.1

This major version shouldn't have been included in neither a minor nor patch release of logstash, but it happened..

@ssozonoff can you downgrade the http output plugin and check if the problem still occurs?

The instructions to downgrade require the following calls:

bin/logstash-plugin uninstall logstash-input-http_poller
bin/logstash-plugin uninstall logstash-output-http
bin/logstash-plugin install --no-verify --version "4.0.3" logstash-mixin-http_client
bin/logstash-plugin install --version "3.1.1" logstash-output-http

If you use the http_poller you can reinstall it after these 4 commands with:

bin/logstash-plugin install logstash-input-http_poller

@jsvd
Copy link
Member

jsvd commented Jul 5, 2017

me and @original-brownbear have replicated this embarrassingly easily with:

server side:
bin/logstash -e "input { http { port => 8001 } } output {codec => dots }"

client side:
bin/logstash -e "input { generator {} } output { http { url => 'http://localhost:8001' http_method => post } }"

The client side will stall after 500-1000 events. I'm digging into the code to check what is wrong. until then, the suggested work around is to downgrade the http output with:

bin/logstash-plugin uninstall logstash-input-http_poller
bin/logstash-plugin uninstall logstash-output-http
bin/logstash-plugin install --no-verify --version "4.0.3" logstash-mixin-http_client
bin/logstash-plugin install --version "3.1.1" logstash-output-http

@ssozonoff
Copy link
Author

Can you confirm that we can use the 3.1.1 version of logstash-output-http with Logstash 5.4.2?

@jsvd
Copy link
Member

jsvd commented Jul 5, 2017

@ssozonoff that version works with any logstash version since 2.4 until the latest 5.x

@ssozonoff
Copy link
Author

We have downgraded per instructions and will report back. Thanks

@ssozonoff
Copy link
Author

OK so things are looking a lot better now! Lets give it 24 hours. Thanks for the prompt help guys.

andrewvc added a commit to andrewvc/logstash-output-http that referenced this issue Jul 5, 2017
This plugin could sometimes deadlock when the downstream server responded quickly.
This patch fixes this by making sure the callbacks are declared before the request is sent.

Fixes elastic/logstash#7588 (comment)
@andrewvc
Copy link
Contributor

andrewvc commented Jul 5, 2017

@ssozonoff and @czka Good news! We've found the bug and have a fix for it here which will hopefully be merged by tomorrow: logstash-plugins/logstash-output-http#64

@ssozonoff
Copy link
Author

Thanks again. I will close this issue hoping that we are all good now :-) For now I think we will stick to the version combo currently running ;-)

andrewvc added a commit to logstash-plugins/logstash-output-http that referenced this issue Jul 6, 2017
* Fix deadlock due improper ordering of client API

This plugin could sometimes deadlock when the downstream server responded quickly.
This patch fixes this by making sure the callbacks are declared before the request is sent.

Fixes elastic/logstash#7588 (comment)

* Remove dead method

* Bump to 4.3.1

* Include more detail in changelog and code WRT async bug
@andrewvc
Copy link
Contributor

andrewvc commented Jul 6, 2017

@ssozonoff thanks once again for reporting this! The new http output version 4.3.2 has been released and should fix this.

FWIW, I believe the 4.x series of the HTTP output should be faster due to more efficient resource usage. If you have any perf problems with the old version give the new one a shot.

@czka
Copy link

czka commented Jul 7, 2017

@andrewvc Swiftly done! Thanks a ton. Will 4.3.2 make it to the next Logstash rpm?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants