Failed TLS handshake #91

Closed
lmorfitt opened this issue Sep 19, 2013 · 34 comments

Comments

@lmorfitt

Hello,

I suspect this is something dumb, but I can't for the life of me figure out why lumberjack has started to throw these errors.

2013/09/19 12:36:37.093618 Failed to tls handshake with 192.168.51.61:5043 read tcp 192.168.51.61:5043: i/o timeout
2013/09/19 12:36:38.093839 Connecting to 192.168.51.61:5043
2013/09/19 12:36:53.094597 Failed to tls handshake with 192.168.51.61:5043 read tcp 192.168.51.61:5043: i/o timeout
2013/09/19 12:36:54.094838 Connecting to 192.168.51.61:5043

We had been using it for a while with no issues; then Elasticsearch ran out of disk space. We cleaned up the disk and now we are seeing these errors.

Restarting logstash, Elasticsearch, the indexer, and lumberjack made no difference.

thanks
Luke

@lmorfitt
Author

Never mind; I think this was a data issue, as we had some date mismatch problems.

thanks
Luke

@Moqume

Moqume commented Nov 8, 2013

FWIW, I'm getting this for random log shippers as well. If I restart Logstash (with a Lumberjack input), they'll be able to re-connect again.

@jesusaurus

I'm also seeing lumberjack throw these errors intermittently. When this happens, sometimes (but not always) I see these errors in the logstash log once every couple of seconds:

{:timestamp=>"2013-11-12T19:10:54.440000+0000", :message=>"Input thread exception", :plugin=><LogStash::Inputs::Lumberjack ssl_certificate=>"/etc/ssl/certs/logstash.crt", ssl_key=>"/etc/ssl/private/logstash.key", type=>"msgaas", charset=>"UTF-8", host=>"0.0.0.0">, :exception=>#<SocketError: problem when accepting>, :backtrace=>["org/jruby/ext/socket/RubyTCPServer.java:174:in `accept'", "jar:file:/mnt/logstash/logstash.jar!/META-INF/jruby.home/lib/ruby/1.9/openssl/ssl-internal.rb:160:in `accept'", "file:/mnt/logstash/logstash.jar!/lumberjack/server.rb:49:in `run'", "file:/mnt/logstash/logstash.jar!/logstash/inputs/lumberjack.rb:42:in `run'", "file:/mnt/logstash/logstash.jar!/logstash/agent.rb:761:in `run_input'", "file:/mnt/logstash/logstash.jar!/logstash/agent.rb:407:in `start_input'"], :level=>:warn}
{:timestamp=>"2013-11-12T19:10:54.460000+0000", :message=>"Restarting input due to exception", :plugin=><LogStash::Inputs::Lumberjack ssl_certificate=>"/etc/ssl/certs/logstash.crt", ssl_key=>"/etc/ssl/private/logstash.key", type=>"msgaas", charset=>"UTF-8", host=>"0.0.0.0">, :level=>:error}

But these errors don't always occur -- and when they do, the lumberjack errors continue long after these errors stop. After restarting logstash, the lumberjack agents will reconnect... for a while.

I'm running lumberjack 0.3.1 and logstash 1.1.12

@phildougherty

I'm also running into this issue. lumberjack 0.3.1 and logstash 1.2.2.

@lmorfitt
Author

lmorfitt commented Dec 9, 2013

Hi,
@phildougherty, @jesusaurus

This is normally a time/date-related issue; running ntpdate usually fixes it for me. The TLS handshake seems to be very sensitive to clock changes.

2013/12/09 17:47:54.260621 Read error looking for ack: read tcp 192.168.51.61:5043: i/o timeout
2013/12/09 17:47:54.260768 Loading client ssl certificate: /etc/ssl/lumberjack.crt and /etc/ssl/lumberjack.key
2013/12/09 17:47:54.435974 Setting trusted CA from file: /etc/ssl/lumberjack.pub
2013/12/09 17:47:54.436143 Connecting to 192.168.51.61:5043
2013/12/09 17:47:54.490049 Connected to 192.168.51.61:5043
2013/12/09 17:47:54.510958 Registrar received 1000 events
2013/12/09 17:47:54.511119 Saving registrar state.
2013/12/09 17:48:09.510983 Read error looking for ack: read tcp 192.168.51.61:5043: i/o timeout
2013/12/09 17:48:09.511099 Loading client ssl certificate: /etc/ssl/lumberjack.crt and /etc/ssl/lumberjack.key
2013/12/09 17:48:09.686265 Setting trusted CA from file: /etc/ssl/lumberjack.pub
2013/12/09 17:48:09.686432 Connecting to 192.168.51.61:5043
2013/12/09 17:48:09.740112 Connected to 192.168.51.61:5043

service ntp stop
ntpdate <NTP server IP>
service ntp start

2013/12/09 17:48:37.053861 Read error looking for ack: read tcp 192.168.51.61:5043: i/o timeout
2013/12/09 17:48:37.053980 Loading client ssl certificate: /etc/ssl/lumberjack.crt and /etc/ssl/lumberjack.key
2013/12/09 17:48:37.233354 Setting trusted CA from file: /etc/ssl/lumberjack.pub
2013/12/09 17:48:37.233532 Connecting to 192.168.51.61:5043
2013/12/09 17:48:37.287108 Connected to 192.168.51.61:5043
2013/12/09 17:48:37.441958 Registrar received 1000 events
2013/12/09 17:48:37.442119 Saving registrar state.
2013/12/09 17:48:49.850381 Registrar received 1000 events
2013/12/09 17:48:49.852707 Saving registrar state.
2013/12/09 17:48:57.735521 Registrar received 1000 events
2013/12/09 17:48:57.735709 Saving registrar state.
2013/12/09 17:49:05.115405 Registrar received 1000 events
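
To illustrate why clock skew can break the handshake: TLS certificate validation compares the verifier's local clock against the certificate's validity window, so a host whose clock has drifted outside [not_before, not_after] rejects a perfectly good certificate. A minimal Ruby sketch with a throwaway self-signed cert (illustrative only; lumberjack's actual client is Go):

```ruby
require "openssl"

# A cert is only acceptable while "now" falls inside its validity window;
# a drifted clock makes a valid cert look not-yet-valid or expired.
def cert_valid_at?(cert, now)
  now >= cert.not_before && now <= cert.not_after
end

# Build a throwaway self-signed certificate valid from now for one year.
key  = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version     = 2
cert.serial      = 1
cert.subject     = cert.issuer = OpenSSL::X509::Name.parse("/CN=example")
cert.public_key  = key.public_key
cert.not_before  = Time.now
cert.not_after   = Time.now + 365 * 24 * 3600
cert.sign(key, OpenSSL::Digest.new("SHA256"))

puts cert_valid_at?(cert, Time.now)         # true
puts cert_valid_at?(cert, Time.now - 3600)  # false: clock an hour behind
```

This is why syncing the clock with ntpdate, as above, can make the handshake start working again.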

@lmorfitt lmorfitt reopened this Dec 9, 2013
@phildougherty

Has anyone managed to find a workaround for this issue? It seems to strike randomly, and fiddling about with ntp is not resolving the problem for me when it happens.

2013/12/17 05:23:43.492445 Read error looking for ack: read tcp x.x.x.x:xxxx: i/o timeout
2013/12/17 05:23:43.492531 Loading client ssl certificate: /opt/logstash-forwarder/ssl/logstash-forwarder.crt and /opt/logstash-forwarder/ssl/logstash-forwarder.key
2013/12/17 05:23:43.704927 Setting trusted CA from file: /opt/logstash-forwarder/ssl/ca.crt
2013/12/17 05:23:43.705219 Connecting to x.x.x.x:xxxx (x.x.x.x)
2013/12/17 05:23:43.764138 Connected to x.x.x.x
2013/12/17 05:23:58.764314 Read error looking for ack: read tcp x.x.x.x:xxxx: i/o timeout
2013/12/17 05:23:58.764395 Loading client ssl certificate: /opt/logstash-forwarder/ssl/logstash-forwarder.crt and /opt/logstash-forwarder/ssl/logstash-forwarder.key
2013/12/17 05:23:58.965437 Setting trusted CA from file: /opt/logstash-forwarder/ssl/ca.crt
2013/12/17 05:23:58.965735 Connecting to x.x.x.x:xxxx (x.x.x.x)
2013/12/17 05:23:59.024270 Connected to x.x.x.x

@Moqume

Moqume commented Dec 17, 2013

Unfortunately, for me the "fix" is to restart Logstash with a cron job every so often. The logs get queued with Lumberjack anyway, so no logs get lost. I haven't had the time to do any further investigating on my end.

@choffee

choffee commented Apr 7, 2014

I am getting loads of these from lots of servers, and lots of logs are not getting through.
ntp is up and running, and it is hit and miss whether restarting logstash helps or not.

I don't think this is time-related, as the error is coming from the lumberjack code, not the SSL connection, as far as I can tell.

The best "fix" I have so far is changing the end of the server.rb file in the library that logstash uses so that it sends an ack for every line (I think that is what this does), but even then I am still getting loads of timeouts.

def data(sequence, map, &block)
  puts sequence  # debug: show the sequence number received
  puts map       # debug: show the decoded event
  block.call(map)
  # Ack every event instead of waiting for a full window; the original
  # window check is commented out below.
  #if (sequence - @last_ack) >= @window_size
    @fd.syswrite(["1A", sequence].pack("A*N"))  # protocol version "1", frame type "A" (ack), big-endian sequence
    @last_ack = sequence
  #end
end

@driskell
Contributor

driskell commented Apr 7, 2014

Hi @choffee

Is your pipeline in logstash very busy? That might be the cause.

I implemented partial ACKs and separated communication from the pipeline in #180 to fix this. You could try that. It requires copying server.rb into logstash and also patching the forwarder.

I have that and other major statefile/prospector issues fixed in my repo too at:
https://github.com/driskell/logstash-forwarder

Jason
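
The partial-ACK idea can be sketched as follows: acknowledge every few events rather than only at the window boundary, so a slow pipeline still shows visible progress before the client's read timeout fires. The names and the ack interval here are illustrative, not the actual protocol from #180:

```ruby
# Hypothetical receiver that sends partial ACKs every `ack_every` events
# instead of a single ACK at the end of the window.
class PartialAckReceiver
  def initialize(window_size:, ack_every:, &on_ack)
    @window_size = window_size
    @ack_every = ack_every
    @last_ack = 0
    @on_ack = on_ack
  end

  # Called once per received event; sequence numbers increase by one.
  def data(sequence)
    if (sequence - @last_ack) >= @ack_every || sequence == @window_size
      @on_ack.call(sequence)  # the real server writes ["1A", seq].pack("A*N") here
      @last_ack = sequence
    end
  end
end

acks = []
rx = PartialAckReceiver.new(window_size: 10, ack_every: 3) { |seq| acks << seq }
(1..10).each { |seq| rx.data(seq) }
puts acks.inspect  # prints [3, 6, 9, 10]
```

With only the final window ACK (ack_every = window_size), a pipeline slower than window_size/timeout never acknowledges in time; partial ACKs keep resetting the client's timer.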

@choffee

choffee commented Apr 8, 2014

Hi @driskell

Thanks. I am trying that out now and it seems to be working great.

I don't think I am sending a massive amount of logs through (/me notes: should graph that), but when the main logstash process has been down for a while I think there is a bit of a flood, which seemed to break things.

Is there any chance of this making it into master soon? It would be great to have this working while we wait for zeromq, and to keep an option for those who still want encryption in the future.

john

@driskell
Contributor

driskell commented Apr 8, 2014

Good to hear @choffee

Unfortunately I'm not the project maintainer, so I wouldn't have a say. I hope it gets in. They might not like the new data frame version I added, though, but that's the only way to fix what was a broken implementation.

@mrteera

mrteera commented Jan 14, 2015

I have this issue too, and I still can't figure out how to fix it.

Jan 14 06:14:41 localhost :41Z - logstash-forwarder[14906]: 2015/01/14 06:14:41.392072 Failed to tls handshake with - read tcp -:5000: i/o timeout
Jan 14 06:14:42 localhost :42Z - logstash-forwarder[14906]: 2015/01/14 06:14:42.392444 Connecting to -:5000 (-)

@jordansissel
Contributor

@driskell partial acks don't solve stalls.

@jordansissel
Contributor

I'm not sure what this ticket is about.

The original filing was about TLS handshake timeouts. It was later reopened by @YummyLogs reporting a different problem (read ack timeout).

@mrteera Your problem reports read tcp -:5000 and - is not a valid hostname, so I don't know what's going on with your system.

Can we close this and open new tickets focused on each kind of error?

@driskell
Contributor

Partial acks solve read ack timeouts and solve event duplication (the forwarder resends ALL logs on timeout). If logstash does not push all 1024 spooled events (a full spool) through its pipeline within the network timeout of 15 seconds, it will always time out and resend all 1024. I've seen logstash continually duplicate logs for hours just because a 100 MB log file was added and sent all at once. Brokers alleviate this, but the broker can get full too...
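
That arithmetic can be checked with a quick back-of-the-envelope calculation. The spool size and timeout are the defaults cited in this thread; the pipeline rate is a made-up number for illustration:

```ruby
# If the pipeline cannot drain a full spool within the network timeout,
# the forwarder times out, reconnects, and resends the whole spool, forever.
spool_size      = 1024    # events the forwarder sends as one batch
network_timeout = 15.0    # seconds before "Read error looking for ack"

required_rate = spool_size / network_timeout
puts format("pipeline must sustain >= %.1f events/sec to ack in time", required_rate)

pipeline_rate = 50.0      # hypothetical throughput of a congested pipeline
time_to_drain = spool_size / pipeline_rate
verdict = time_to_drain > network_timeout ? "timeout, full resend" : "acked in time"
puts format("at %.0f ev/s the spool takes %.2f s -> %s", pipeline_rate, time_to_drain, verdict)
```

At the default settings, anything under roughly 68 events/second guarantees the resend loop.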

I admit I've confused the read ack timeout and TLS handshake timeout in other places though. TLS handshake timeout is unrelated to ack.

I did just recall a stall issue in Log Courier, the same as @myatu's. I'll raise a ticket for you to look into. It causes a failed handshake after a time, but restarting logstash fixes it.

@karthik87

@jordansissel

I have a strange issue wherein my logstash-forwarder (in a Docker container) connects to the logstash server (also in a Docker container) only when I reboot the instance running the forwarder. Both are running on Rackspace Cloud.

When I run the container for the first time, I just get the following:
Failure connecting to x.x.x.x dial tcp x.x.x.x:5043: i/o timeout

When I reboot and run the same container again, it connects and can ship logs smoothly.
But I can't restart my production servers this easily, so I am stuck here.

Could this be related to the firewall or iptables?

@juise

juise commented Feb 13, 2015

Please, please, please fix it. This problem has been around for about two years. It's disgusting!

@driskell
Contributor

@juise Such comments are unhelpful and unwelcome

@juise

juise commented Feb 13, 2015

Just to note, the following problem:

... Read error looking for ack: read tcp x.x.x.x:xxxx: i/o timeout
... Loading client ssl certificate: /opt/logstash-forwarder/ssl/logstash-forwarder.crt and /opt/logstash-forwarder/ssl/logstash-forwarder.key
... Setting trusted CA from file: /opt/logstash-forwarder/ssl/ca.crt
... Connecting to x.x.x.x:xxxx (x.x.x.x)
... Connected to x.x.x.x

may appear when you are using a Ruby module, for example one without try/catch blocks, and an exception occurs. No message about it appears in the log, which is very, very bad.

@jordansissel
Contributor

... Read error looking for ack: read tcp x.x.x.x:xxxx: i/o timeout

This occurs when the downstream server doesn't acknowledge within the defined timeout period.
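
For the read-ack timeout specifically, the forwarder's network timeout is configurable. A sketch of a logstash-forwarder config raising it (hostname and paths are placeholders; verify the option names against your version's documentation):

```json
{
  "network": {
    "servers": [ "logstash.example.com:5043" ],
    "ssl ca": "/opt/logstash-forwarder/ssl/ca.crt",
    "timeout": 60
  },
  "files": [
    { "paths": [ "/var/log/syslog" ], "fields": { "type": "syslog" } }
  ]
}
```

A longer timeout only papers over a slow pipeline; if logstash cannot drain the spool at all, the resend loop described elsewhere in this thread still occurs.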

... Loading client ssl certificate: /opt/logstash-forwarder/ssl/logstash-forwarder.crt and /opt/logstash-forwarder/ssl/logstash-forwarder.key

This is not an error.

... Setting trusted CA from file: /opt/logstash-forwarder/ssl/ca.crt

This is not an error

... Connecting to x.x.x.x:xxxx (x.x.x.x)

This is not an error

... Connected to x.x.x.x

This is not an error.

may appear when you are using a Ruby module

I'm not sure what this means.

@juise

juise commented Feb 13, 2015

If it's not an error, why is the only way to make it work again after this "not an error" a restart? I can reproduce it without any problems.

The problem may not be on the logstash-forwarder side; it may be logstash that dies.

@jordansissel
Contributor

is restart

A restart of what? logstash or logstash-forwarder?

@juise

juise commented Feb 13, 2015

Both: restart logstash and logstash-forwarder. The order doesn't matter.

@jordansissel
Contributor

The problem usually occurs when logstash is overwhelmed by the load logstash-forwarder is giving it, causing the forwarder to give up waiting for a response (i/o timeout) and try connecting to another logstash instance, if one is available.

In almost all circumstances, the problem is due to flow problems in logstash or something downstream of logstash itself (for example, an output's dependency being down, offline, or unavailable).

This ticket is about "Failed TLS handshake", which doesn't seem related to the 'Read error looking for ack: read tcp x.x.x.x:xxxx: i/o timeout' message you report.

@sledorze

sledorze commented Apr 1, 2015

Got this issue without load... (1 req / 10 mins)

@cameronevans

@sledorze could it possibly be issues with logstash itself? I have had the issue without load myself and it is usually a result of logstash's state.

@sledorze

sledorze commented Apr 1, 2015

Maybe, yes. And what was the fix? (It could be useful for logstash-forwarder users to know about it.)


@cameronevans

It was a number of reasons, usually plugin errors (in my case tcp or elasticsearch) that would cause logstash to halt, with the service just spinning it up again. Also, if the forwarder was inhaling a file with a great number of lines, it would sometimes fill up the queue and throw this error even though there were not many events at the time. The --debug option is useful in these cases, though it produces a lot of information to pore through.

@e8-preet

Thanks @choffee, modifying server.rb fixed my issue :)

@alexanderilyin

I've faced this twice in the last month. Only restarting logstash helps.

    2015/06/03 11:02:04.669083 Failed to tls handshake with X.X.X.X read tcp X.X.X.X:5000: connection reset by peer
    2015/06/03 11:02:05.669712 Connecting to [X.X.X.X]:5000 (example.com) 
    2015/06/03 11:02:21.670019 Failed to tls handshake with X.X.X.X read tcp X.X.X.X:5000: i/o timeout
    2015/06/03 11:02:22.670621 Connecting to [X.X.X.X]:5000 (example.com) 

@juise

juise commented Jun 3, 2015

Try modifying server.rb as @choffee suggested. This works in all versions, including the latest 1.5. It looks like the Elastic folks aren't interested in this problem.

@alexanderilyin

@juise I'm still sometimes facing the same problem and have to restart logstash to fix it.

@jordansissel
Contributor

We are working on solutions that include better messaging on downstream failures, and other things.


@jordansissel
Contributor

There's so much discussion of unrelated problems in this ticket that I'm going to close it, because I cannot follow it easily. If you are still affected by something in this ticket, please open a new ticket and provide whatever details you can. Thanks!
