Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client doesn't reconnect with TLS #32

Closed
pires opened this issue Jul 14, 2016 · 7 comments
Closed

Client doesn't reconnect with TLS #32

pires opened this issue Jul 14, 2016 · 7 comments
Assignees

Comments

@pires
Copy link
Contributor

pires commented Jul 14, 2016

My client code:

final ConnectionFactory cf = new ConnectionFactory(natsAddresses);
cf.setMaxReconnect(-1); // reconnect forever

// set callbacks
cf.setClosedCallback(event -> log.warn("NATS connection was closed."));
cf.setDisconnectedCallback(event -> log.warn("NATS disconnected."));
cf.setReconnectedCallback(event -> log.error("NATS reconnected to {}.", event.getConnection().getConnectedUrl()));

// prepare encryption
try {
    final InputStream truststore = resourceLoader.getResource("file:".concat(config.getNats().getTlsTruststore())).getInputStream();
    final KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
    ks.load(truststore, config.getNats().getTlsTruststorePassword().toCharArray());
    final TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(ks);
    final SSLContext sslContext = SSLContext.getInstance(Constants.DEFAULT_SSL_PROTOCOL);
                sslContext.init(null, tmf.getTrustManagers(), new SecureRandom());
    cf.setSecure(true);
    cf.setSSLContext(sslContext);
} catch (IOException | CertificateException | NoSuchAlgorithmException | KeyStoreException | KeyManagementException e) {
    log.error("There was an error while setting up NATS TLS. Ignoring TLS..", e);
}

cf.createConnection();

Now, the connection works and I have thousands of subscriptions and publishers working for days.

But at a certain moment in time, the client disconnects and never reconnects.
I'm sorry about the log format, but I'm using a Logstash-like encoder for further log processing. Also, order is from newest to oldest:

NATS connection lost.
{"log":"\t... 5 common frames omitted\n"}
{"log":"\tat sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:847)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:876)\n"}
{"log":"\tat sun.security.ssl.OutputRecord.write(OutputRecord.java:417)\n"}
{"log":"\tat sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)\n"}
{"log":"\tat java.net.SocketOutputStream.write(SocketOutputStream.java:153)\n"}
{"log":"\tat java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)\n"}
{"log":"Caused by: java.net.SocketException: Connection reset\n"}
{"log":"\t... 1 common frames omitted\n"}
{"log":"\tat io.nats.client.ConnectionImpl$8.run(ConnectionImpl.java:1172)\n"}
{"log":"\tat io.nats.client.ConnectionImpl.flusher(ConnectionImpl.java:1720)\n"}
{"log":"\tat java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)\n"}
{"log":"\tat java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)\n"}
{"log":"\tat sun.security.ssl.AppOutputStream.write(AppOutputStream.java:128)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1815)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1870)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1906)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)\n"}
{"log":"\tat sun.security.ssl.Alerts.getSSLException(Alerts.java:208)\n"}
{"log":"Caused by: javax.net.ssl.SSLException: java.net.SocketException: Connection reset\n"}
{"log":"\tat io.nats.client.ConnectionImpl$6.run(ConnectionImpl.java:1141)\n"}
{"log":"\tat io.nats.client.ConnectionImpl$7.run(ConnectionImpl.java:1163)\n"}
{"log":"\tat io.nats.client.ConnectionImpl.readLoop(ConnectionImpl.java:1374)\n"}
{"log":"\tat io.nats.client.ConnectionImpl.processOpError(ConnectionImpl.java:672)\n"}
{"log":"\tat java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)\n"}
{"log":"\tat java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)\n"}
{"log":"\tat sun.security.ssl.AppOutputStream.write(AppOutputStream.java:71)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1553)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1541)\n"}
{"log":"javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Connection reset\n"}
{"log":"2016-07-14 02:18:37.906 ERROR 6 --- [ readloop] io.nats.client.ConnectionImpl : I/O error during flush\n"}
{"log":"\tat io.nats.client.ConnectionImpl$6.run(ConnectionImpl.java:1141)\n"}
{"log":"\tat io.nats.client.ConnectionImpl$8.run(ConnectionImpl.java:1172)\n"}
{"log":"\tat io.nats.client.ConnectionImpl.flusher(ConnectionImpl.java:1720)\n"}
{"log":"\tat java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)\n"}
{"log":"\tat java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)\n"}
{"log":"\tat sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:847)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:876)\n"}
{"log":"\tat sun.security.ssl.OutputRecord.write(OutputRecord.java:417)\n"}
{"log":"\tat sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)\n"}
{"log":"\tat java.net.SocketOutputStream.write(SocketOutputStream.java:153)\n"}
{"log":"\tat java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)\n"}
{"log":"java.net.SocketException: Connection reset\n"}
{"log":"2016-07-14 02:18:37.902 ERROR 6 --- [ flusher] io.nats.client.ConnectionImpl : I/O eception encountered during flush\n"}

On the server side, again from newest to oldest:

[10] 2016/07/14 02:18:37.933466 [DBG] 10.4.1.5:51340 - cid:8939 - Client connection closed
[10] 2016/07/14 02:18:37.933425 [DBG] 10.4.1.5:51340 - cid:8939 - TLS handshake error: remote error: unexpected message
[10] 2016/07/14 02:18:37.921470 [DBG] 10.4.1.5:51340 - cid:8939 - Starting TLS client connection handshake
[10] 2016/07/14 02:18:37.921413 [DBG] 10.4.1.5:51340 - cid:8939 - Client connection created
[10] 2016/07/14 02:18:34.536575 [DBG] 10.4.1.5:52502 - cid:8931 - Client connection closed
[10] 2016/07/14 02:18:34.536499 [DBG] 10.4.0.5:57733 - cid:8937 - Error flushing: write tcp 10.4.5.5:4222->10.4.1.5:52502: i/o timeout

I wonder why TLS handhake fails. It seems flushing is happening before TLS handshake is done.

@mcqueary
Copy link

We didn't have a test for this. Baaaad developer (hint: me). Thanks for catching this!

@mcqueary mcqueary changed the title Client doesnt' reconnect with TLS Client doesn't reconnect with TLS Jul 19, 2016
@pires
Copy link
Contributor Author

pires commented Jul 22, 2016

@mcqueary do you agree this is critical? I mean, the workaround is to not use TLS on NATS 👎 .

@mcqueary
Copy link

I do indeed. Looking at code today. If you're on slack I'd like to chat
realtime a bit about this and then post an update here.

On Friday, July 22, 2016, Paulo Pires notifications@github.com wrote:

@mcqueary https://github.com/mcqueary do you agree this is critical? I
mean, the workaround is to not use TLS on NATS 👎 .


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#32 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGq4LDp4e32WL4QzwEK-2UuQzpsg0AYpks5qYQbHgaJpZM4JMPkA
.

Larry McQueary | Director, Messaging Technology
larry@apcera.com | @mcqueary | github.com/mcqueary

@pires
Copy link
Contributor Author

pires commented Aug 4, 2016

hi @mcqueary any updates? Branch tls-reconnect-fixes-#32 has nothing I can look at yet.

@ColinSullivan1
Copy link
Member

@mcqueary - Would you like another set of eyes on this?

@pires
Copy link
Contributor Author

pires commented Aug 4, 2016

I'm willing to help as well.

@pires
Copy link
Contributor Author

pires commented Aug 9, 2016

FYI, tried with gnatsd 0.9.2 and same problems exists. However, there's a new debug message in the server that may help here.

Steps to reproduce:

  • Start 1 gnatsd instance
  • Start app instance (one subscription alone, nothing else)
[12] 2016/08/09 11:24:26.632958 [DBG] 10.0.4.3:49909 - cid:3 - Client connection created
[12] 2016/08/09 11:24:26.633011 [DBG] 10.0.4.3:49909 - cid:3 - Starting TLS client connection handshake
[12] 2016/08/09 11:24:26.705967 [DBG] 10.0.4.3:49909 - cid:3 - TLS handshake complete
[12] 2016/08/09 11:24:26.705985 [DBG] 10.0.4.3:49909 - cid:3 - TLS version 1.2, cipher suite TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
[12] 2016/08/09 11:24:26.708807 [TRC] 10.0.4.3:49909 - cid:3 - ->> [CONNECT {"verbose":false,"pedantic":false,"user”:"xyz","pass”:"XYZ","ssl_required":true,"name":"","lang":"java","version":"0.4.1”}]
[12] 2016/08/09 11:24:26.745878 [TRC] 10.0.4.3:49909 - cid:3 - ->> [PING]
[12] 2016/08/09 11:24:26.745891 [TRC] 10.0.4.3:49909 - cid:3 - <<- [PONG]
[12] 2016/08/09 11:24:46.314559 [TRC] 10.0.4.3:49909 - cid:3 - ->> [SUB device.*.connection group_name 1]
  • Stop gnatsd instance - disconnection cb kicks-in and app logs that connection is lost
  • Start 1 gnatsd instance
  • Logs in the new instance show one client is trying to connect but
[12] 2016/08/09 11:27:30.298416 [DBG] 10.0.4.3:57019 - cid:4 - Client connection created
[12] 2016/08/09 11:27:30.298475 [DBG] 10.0.4.3:57019 - cid:4 - Starting TLS client connection handshake
[12] 2016/08/09 11:27:30.303429 [DBG] 10.0.4.3:57019 - cid:4 - TLS handshake error: remote error: unexpected message
[12] 2016/08/09 11:27:30.303483 [TRC] 10.0.4.3:57019 - cid:4 - <<- [-ERR Secure Connection - TLS Required]
[12] 2016/08/09 11:27:30.303505 [DBG] 10.0.4.3:57019 - cid:4 - Client connection closed
[12] 2016/08/09 11:27:32.299104 [DBG] 10.0.4.3:57021 - cid:5 - Client connection created
[12] 2016/08/09 11:27:32.299165 [DBG] 10.0.4.3:57021 - cid:5 - Starting TLS client connection handshake
[12] 2016/08/09 11:27:32.305694 [DBG] 10.0.4.3:57021 - cid:5 - TLS handshake error: remote error: unexpected message
[12] 2016/08/09 11:27:32.305753 [TRC] 10.0.4.3:57021 - cid:5 - <<- [-ERR Secure Connection - TLS Required]
[12] 2016/08/09 11:27:32.305784 [DBG] 10.0.4.3:57021 - cid:5 - Client connection closed
….

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants