Logstash 5.4.1 Deadlocks #7588
thanks for the report. did this start happening when logstash was upgraded in production? which version worked fine before? |
Well, up until two or three weeks ago we were running a different pipeline with an older version of Logstash (can't remember which, but at least a year old) which worked well for over a year. A few weeks ago we migrated to Kafka in production and were forced to upgrade Logstash along with it. As of last Friday it has been crashing at least twice per day, after having run fine for about two weeks prior. We think it could be workload related, but as you can see, it's not going to be easy to tell with zero information in the logs or anywhere else! At this point we have no idea if it's the input plugin, the output plugin, or something else. We are currently running with a consumer_count of 1 on the Kafka (input) side. |
Just crashed again, this time with something interesting in the log file:
|
This was different in that Logstash actually crashed and died versus a freeze. |
@ssozonoff I looked into this and the freezing I believe is fixed in The serialization issue (at least the crash looks like one) you're seeing, I'm looking into now. Any details on the type of data you're handling here (non-UTF8 encoding and the like is what I'm looking for) are very welcome if you can and/or want to share anything here. |
@ssozonoff this in particular Unmatched first part of surrogate pair (0xd83c) seems to suggest you're dealing with UTF-16 data, is that the case? |
Hi. I think we should probably separate these issues, since I believe we are now talking about two different ones. My fault for pasting everything in the same place. Secondly, as a starting point I propose to upgrade to 5.4.2 and see if that resolves the freeze, as this is the more frequent and urgent issue at the moment. Regarding UTF-16, normally we are working only with UTF-8, but I can dive into this one a little more after we have fixed the freeze. |
No worries, the first issue at least with the information I have here seems very straight forward and known. If it's not fixed by
Sounds good.
If you indeed have |
Couple of weeks ago in my org we tried upgrading from 5.1.2 straight to 5.4.1. Works fine on most instances, except those with http output. They started getting stuck at random. I downgraded version-by-version. The latest one free from http output problems is 5.2.1. 5.2.2 is affected for sure. So the problematic change must have been introduced somewhere in between. |
We have not had any luck with our upgrade to 5.4.2. We are still freezing somewhere, but we are also using the http output, so it could be that the original deadlock is fixed and now the http output is what's hurting us. What a nightmare! |
can you please take new thread dumps as you attached in the creation of this issue? |
I tried but was unable to get it.... VM unresponsive. I will try again when it happens next time |
@ssozonoff Can you try 5.2.2 and 5.2.1 to verify my findings? |
Maybe it's relevant: with 5.2.2 I keep getting the following in logstash-plain.log at logstash service restart. No such thing with 5.2.1:
|
But my understanding is that with 5.2.1 we will have the other DEADLOCK issue mentioned earlier and fixed in 5.4.2 and I believe that our Kafka input requires something more recent than 5.2.x but not sure on this. |
Thread Dump attached |
@jsvd looking into the http output ... |
thanks @czka for doing that regression inspection work! looking at 5.2.2 vs 5.2.1 I can see the versions use two different http output plugins: logstash-output-http version 4.1.0 in logstash 5.2.2. This major version shouldn't have been included in either a minor or a patch release of logstash, but it happened. @ssozonoff can you downgrade the http output plugin and check if the problem still occurs? The instructions to downgrade require the following calls:
If you use the http_poller you can reinstall it after these 4 commands with:
|
@original-brownbear and I have replicated this embarrassingly easily with: server side: client side: The client side will stall after 500-1000 events. I'm digging into the code to check what is wrong. Until then, the suggested workaround is to downgrade the http output with:
|
Can you confirm that we can use the 3.1.1 version of logstash-output-http with Logstash 5.4.2? |
@ssozonoff that version works with any logstash version since 2.4 until the latest 5.x |
We have downgraded per instructions and will report back. Thanks |
OK, so things are looking a lot better now! Let's give it 24 hours. Thanks for the prompt help guys. |
This plugin could sometimes deadlock when the downstream server responded quickly. This patch fixes this by making sure the callbacks are declared before the request is sent. Fixes elastic/logstash#7588 (comment)
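For anyone wondering why the ordering matters, here is a minimal sketch of this kind of race. It is not the plugin's actual code: `FakeRequest`, `on_success`, and `send` are hypothetical names, and the "fast server" is simulated by completing the request inside `send()`. If the callback is registered only after the request has been dispatched, a fast response can arrive while nobody is listening, and whoever waits on that callback waits forever.

```python
import threading


class FakeRequest:
    """Hypothetical async request (not the real client API).

    The success callback is fired once, at the moment the response arrives.
    """

    def __init__(self):
        self._on_success = None

    def on_success(self, callback):
        # Register the callback to run when the response arrives.
        self._on_success = callback

    def send(self):
        # Simulate a very fast server: the response is already here
        # before send() returns. A callback registered later is never run.
        if self._on_success is not None:
            self._on_success("200 OK")


def send_then_register():
    """Buggy ordering: dispatch first, declare the callback afterwards."""
    done = threading.Event()
    req = FakeRequest()
    req.send()
    req.on_success(lambda resp: done.set())
    # A short timeout stands in for "blocked forever" in a real pipeline.
    return done.wait(timeout=0.1)


def register_then_send():
    """Fixed ordering: declare the callback before dispatching."""
    done = threading.Event()
    req = FakeRequest()
    req.on_success(lambda resp: done.set())
    req.send()
    return done.wait(timeout=0.1)


if __name__ == "__main__":
    print("send then register completed:", send_then_register())
    print("register then send completed:", register_then_send())
```

With a slow server the buggy ordering usually gets away with it, since the callback is in place by the time the response lands, which would be consistent with the stalls showing up only at random.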
@ssozonoff and @czka Good news! We've found the bug and have a fix for it here which will hopefully be merged by tomorrow: logstash-plugins/logstash-output-http#64 |
Thanks again. I will close this issue hoping that we are all good now :-) For now I think we will stick to the version combo currently running ;-) |
* Fix deadlock due to improper ordering of client API
  This plugin could sometimes deadlock when the downstream server responded quickly. This patch fixes this by making sure the callbacks are declared before the request is sent. Fixes elastic/logstash#7588 (comment)
* Remove dead method
* Bump to 4.3.1
* Include more detail in changelog and code WRT async bug
@ssozonoff thanks once again for reporting this! The new http output version 4.3.2 has been released and should fix this. FWIW, I believe the 4.x series of the HTTP output should be faster due to more efficient resource usage. If you have any perf problems with the old version give the new one a shot. |
@andrewvc Swiftly done! Thanks a ton. Will 4.3.2 make it to the next Logstash rpm? |
So we have Logstash deadlocking on us.....
We have a simple configuration using a Kafka input and an HTTP output to ES.
It can run fine for a day or more and then just freeze without warning. Today it has frozen twice!!
Strace shows it's frozen on a mutex
[root@ip-10-10-1-169 logstash]# strace -p 15119
Process 15119 attached
futex(0x7fe1016479d0, FUTEX_WAIT, 15139, NULL
Thread dump attached.
thread_dump_js.txt
This is of course rather worrying and now we have to babysit it!!
"logstash-input-kafka", "6.3.2"
"logstash-output-http"
logstash 5.4.1