Lumberjack server resets all tcp connections when client fails SSL negotiation #160
Comments
jordansissel
Feb 6, 2014
Contributor
I did some digging and couldn't reproduce this. Still sounds like there's a bug somewhere, though
rachin-fab
Feb 21, 2014
The other thing that we are seeing is the following error.
This occurs on the logstash server at almost the same time as the "Read error looking for ack: EOF" on the forwarder, and it causes the logstash server to crash. A fix would be great, but at a minimum, can we handle this error more gracefully?
Exception in thread "LogStash::Runner" org.jruby.exceptions.RaiseException: (ConcurrencyError) Detected invalid array contents due to unsynchronized modifications with concurrent users
at org.jruby.RubyArray.<<(org/jruby/RubyArray.java:1147)
at LogStash::Codecs::Multiline.buffer(file:/opt/logstash/logstash.jar!/logstash/codecs/multiline.rb:159)
at LogStash::Codecs::Multiline.do_previous(file:/opt/logstash/logstash.jar!/logstash/codecs/multiline.rb:180)
at org.jruby.RubyMethod.call(org/jruby/RubyMethod.java:132)
at LogStash::Codecs::Multiline.decode(file:/opt/logstash/logstash.jar!/logstash/codecs/multiline.rb:154)
at RUBY.run(file:/opt/logstash/logstash.jar!/logstash/inputs/lumberjack.rb:46)
at org.jruby.RubyProc.call(org/jruby/RubyProc.java:271)
at Lumberjack::Connection.data(file:/opt/logstash/logstash.jar!/lumberjack/server.rb:241)
at RUBY.run(file:/opt/logstash/logstash.jar!/lumberjack/server.rb:223)
at Lumberjack::Parser.data_field_value(file:/opt/logstash/logstash.jar!/lumberjack/server.rb:182)
at Lumberjack::Parser.feed(file:/opt/logstash/logstash.jar!/lumberjack/server.rb:93)
at Lumberjack::Parser.compressed_payload(file:/opt/logstash/logstash.jar!/lumberjack/server.rb:198)
at Lumberjack::Parser.feed(file:/opt/logstash/logstash.jar!/lumberjack/server.rb:93)
at RUBY.run(file:/opt/logstash/logstash.jar!/lumberjack/server.rb:220)
at RUBY.run(file:/opt/logstash/logstash.jar!/lumberjack/server.rb:59)
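The ConcurrencyError above is JRuby's guard against an Array being mutated from multiple threads without synchronization; with several lumberjack connections feeding one multiline codec, the codec's buffer array can be appended concurrently. A minimal sketch (not the actual codec code; all names are illustrative) of that pattern with the standard remedy, a Mutex around buffer access:

```ruby
# Sketch only: simulates many connections appending to one shared buffer,
# the pattern behind JRuby's ConcurrencyError. The Mutex serializes the
# appends so the Array is never mutated concurrently.

buffer = []
lock = Mutex.new

threads = 10.times.map do |i|
  Thread.new do
    1_000.times do |j|
      lock.synchronize { buffer << "line #{i}-#{j}" }  # guarded append
    end
  end
end
threads.each(&:join)

puts buffer.size  # => 10000
```

Without the `lock.synchronize`, the same loop running on JRuby can raise exactly the ConcurrencyError shown in the trace.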
nicholasf
commented
Feb 26, 2014
This needs to be fixed. I've just run into it.
It'll be fixed soon. Sorry, as always, for the bugs.
nicholasf
commented
Feb 26, 2014
Thanks for the quick response. Looking forward to the fix.
DanielRedOak
Mar 5, 2014
I added some findings from my setup to https://logstash.jira.com/browse/LOGSTASH-1598 but didn't know if I should post them here. They might be useful in reproducing the issue.
Basically: have a client connected to logstash via logstash-forwarder, then open a telnet connection to the logstash-forwarder port, type in some garbage, and hit enter. Watch all legitimate connections that are sending events drop with "Read error looking for ack: EOF".
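The telnet step of that repro can be sketched in Ruby without a real logstash install. This is a stand-in only: a local TCPServer plays the part of the lumberjack input, and the client does what telnet would, sending plaintext garbage where the server expects an SSL handshake. Against a real setup you would point the client at your logstash host and lumberjack port (commonly 5043, an assumption here):

```ruby
require "socket"

# A local TCPServer stands in for the lumberjack input (sketch only).
server = TCPServer.new("127.0.0.1", 0)   # ephemeral port for the demo
port = server.addr[1]

received = nil
t = Thread.new do
  conn = server.accept
  # A real lumberjack server would attempt the SSL handshake here and raise.
  received = conn.gets
  conn.close
end

# The "telnet" step: a plain TCP client sending garbage instead of a TLS ClientHello.
client = TCPSocket.new("127.0.0.1", port)
client.puts("garbage")
client.close
t.join

puts received.inspect    # => "garbage\n"
```

The point of the repro is that this one bad client should only affect its own connection, yet the reports here say healthy forwarder connections drop too.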
nicolas-g
Mar 12, 2014
I've just run into this as well after my SSL certs expired. I have replaced the certs on both the server and the agents, with no luck. How can I fix this? Do I need to uninstall and re-install everything from scratch?
driskell
Mar 16, 2014
Contributor
Hi guys,
I think the cause is that the last successful connection is closed if a new connection fails to "initialise", a bit of a logic error. I can reproduce this by doing as @DanielRedOak stated: opening a telnet session and sending plain text.
However, it only closes the last successful connection, not all of them. Does that sound right? Or is it in fact the case that a single failure causes ALL connections to fail?
Regardless, I've fixed my particular observation in my fork and will get a PR set up once @jordansissel and co. get some eyes onto the project. Because I've made so many changes, I've had to start basing branches on my own changes (it's easier for me), so I can't PR at the moment until I know what will be merged and what won't, and where I need to rebase from.
Jason
added a commit to driskell/logstash-forwarder that referenced this issue (Mar 16, 2014)
driskell
Mar 19, 2014
Contributor
The changes are now in my repository:
https://github.com/driskell/logstash-forwarder
jmreicha
May 21, 2014
Has there been any update or fix for this yet? I just ran into the "Read error looking for ack: EOF" error yesterday.
mipearson
Aug 18, 2014
Contributor
Confirmed here. Multiple servers with 'stale' bad certificates prevented our server with a good certificate from connecting. The only resolution was to shut down LSF on all servers, restart logstash, then ramp up the servers with good certificates one by one.
That's two hours of openssl "what the hell, this certificate is fine" fuckery I won't get back :(
jordansissel
Aug 18, 2014
Contributor
That's two hours of openssl "what the hell, this certificate is fine" fuckery I won't get back :(
For what it's worth, I'm sorry you wasted so much time fighting some bullshit computer problems on what is probably your weekend.
Computers are barely-functional pieces of shit. Software, specifically, is a miracle. I'm surprised my computer boots and gets online. For myself, despite trying to write good software, I often make things that work poorly or generally fuck up someone else's day, it makes me sad. It's times like this I think about changing industries, because I'm not here to make things that ruin other people's days. :(
mipearson
Aug 18, 2014
Contributor
Australian, remember? It's how I spent the time after my Monday morning coffee :)
All good - was mentioning the lost time in the hopes of highlighting the severity of the issue. High impact combined with very difficult cause analysis.
Regards,
Michael Pearson
driskell
Aug 18, 2014
Contributor
#180 fixes this, as well as the "too many open files" issue, and adds logging of failing clients (it requires a logstash plugin change too, to pass the logger through).
I think this specific problem is:
https://github.com/elasticsearch/logstash-forwarder/blob/master/lib/lumberjack/server.rb#L54
If the SSL accept fails, client is not changed. It still holds the reference to the last successful accept, and that last connection gets killed.
The duplication is likely the partial ack problem, though. If logstash can't keep up, lumberjack network timeouts cause connections to reset and resend logs. #180 adds partial ack.
I've stopped maintaining my LSF repo, though (with #180 in it), and maintain Log Courier instead (my LSF rewrite). That just means I'm not actively using #180 myself anymore, so it needs some testing, but I'm around to help fix or explain bits.
Jason
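The logic error described here can be sketched with fake sockets rather than the real server code (all names are hypothetical): one `client` variable is reused across accept iterations, so when the SSL handshake raises, the rescue closes whatever `client` last pointed to, which is the previous, healthy connection:

```ruby
# Stand-in for a socket: records whether it has been closed.
class FakeSocket
  attr_reader :closed
  def initialize; @closed = false; end
  def close; @closed = true; end
end

# Sketch of the buggy accept loop: `client` is reused across iterations
# and only reassigned when the handshake succeeds.
def buggy_accept_loop(handshake_results)
  client = nil
  wrongly_closed = []
  handshake_results.each do |result|
    begin
      raise "SSL handshake failed" if result == :plaintext
      client = FakeSocket.new          # only reassigned on success
    rescue
      # Bug: `client` still references the last *successful* connection,
      # so the cleanup kills a healthy session.
      if client && !client.closed
        client.close
        wrongly_closed << client
      end
    end
  end
  wrongly_closed
end

# A good connection followed by a telnet-style plaintext one:
victims = buggy_accept_loop([:good, :plaintext])
puts victims.size   # => 1 : the healthy connection got killed
```

This also matches the observation above that only the last successful connection is closed per failure, rather than all of them at once.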
Targeting for logstash 1.5 fix.
suyograo assigned suyograo and jordansissel and unassigned suyograo (Oct 21, 2014)
suyograo assigned jsvd and unassigned jordansissel (Nov 11, 2014)
maxhyjal
Jan 7, 2015
Is this issue going to be fixed in logstash 1.5 release? Or is log-courier a better solution at this point at least?
jordansissel assigned jordansissel and unassigned jsvd (Jan 7, 2015)
jordansissel
Jan 7, 2015
Contributor
I can't reproduce this on logstash master, but I can reproduce it on logstash 1.5.0 beta1. I'll try to figure out what the problem is.
jordansissel
Jan 7, 2015
Contributor
Strangely, I cannot reproduce it from git on the v1.5.0.beta1 tag either, but I can reproduce it from the v1.5.0 beta1 release tarball.
jordansissel
Jan 7, 2015
Contributor
On v1.5.0.beta1 I cannot reproduce the "resets all tcp connections" bug. The behavior I observe is that it closes the previously-accepted socket, due to the line of code @driskell pointed out (client is the previous connection at exception time because the assignment did not occur).
I can fix this "close the previous connection on handshake failure" behavior.
I can't yet reproduce the "closes ALL connections" problem this ticket mentions in the subject.
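The fix direction can be sketched with the same kind of fake sockets (hypothetical names, not the actual patch): keep the accepted socket in a per-iteration variable so a handshake failure can only ever close the connection that failed:

```ruby
# Stand-in for a socket: records whether it has been closed.
class FakeSocket
  attr_reader :closed
  def initialize; @closed = false; end
  def close; @closed = true; end
end

# Fixed accept loop: each iteration gets its own socket variable, so the
# rescue can only close the connection whose handshake actually failed.
def fixed_accept_loop(handshake_results)
  healthy = []
  handshake_results.each do |result|
    sock = FakeSocket.new              # the raw TCP accept succeeds
    begin
      raise "SSL handshake failed" if result == :plaintext
      healthy << sock                  # handed off to a handler in real code
    rescue
      sock.close                       # close only the failed connection
    end
  end
  healthy
end

survivors = fixed_accept_loop([:good, :plaintext, :good])
puts survivors.count { |s| !s.closed }   # => 2 : healthy connections survive
```

With this shape, the telnet-garbage scenario costs only the garbage connection itself.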
jordansissel
Jan 7, 2015
Contributor
Ahh, lsf master has this already fixed, and it works, but the code is not right. I'll patch that.
added a commit that referenced this issue (Jan 7, 2015)
I believe this is fixed and will appear in logstash 1.5.0.
nathanlburns commented Feb 6, 2014
This bug was originally reported in logstash, but it belongs here.
https://logstash.jira.com/browse/LOGSTASH-1598
I've scanned the issues list here (not in depth) and believe some other problems reported here may be explained by this.
Our setup:
Our issue:
Now, I know the root problem here is a configuration issue, but existing sessions should not be closed when a client fails to connect.
I've been looking into the code a bit and am currently reading the OpenSSL implementation in Ruby... At this point I'm not sure how to fix it.