Cannot reconnect to Nats cluster #195

Closed
fajran opened this issue Nov 25, 2018 · 8 comments

@fajran

fajran commented Nov 25, 2018

I have a NATS cluster set up. When the NATS server that the Java app is connected to is terminated, the app tries to reconnect but throws the following exception:

java.io.IOException: java.net.URISyntaxException: Illegal character in scheme name at index 0: 172.18.0.101:4222
	at io.nats.client.impl.SocketDataPort.connect(SocketDataPort.java:60)
	at io.nats.client.impl.NatsConnection.tryToConnect(NatsConnection.java:299)
	at io.nats.client.impl.NatsConnection.reconnect(NatsConnection.java:225)
	at io.nats.client.impl.NatsConnection.closeSocket(NatsConnection.java:471)
	at io.nats.client.impl.NatsConnection.lambda$handleCommunicationIssue$2(NatsConnection.java:428)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 0: 172.18.0.101:4222
	at java.base/java.net.URI$Parser.fail(URI.java:2915)
	at java.base/java.net.URI$Parser.checkChars(URI.java:3086)
	at java.base/java.net.URI$Parser.checkChar(URI.java:3096)
	at java.base/java.net.URI$Parser.parse(URI.java:3111)
	at java.base/java.net.URI.<init>(URI.java:600)
	at io.nats.client.impl.SocketDataPort.connect(SocketDataPort.java:47)
	... 5 more

From what I can see, the NATS server publishes the addresses of all servers in the cluster in the format ip-address:port, without a scheme. See the connect_urls values below:

{"server_id":"iHAhz9B35WdjjgYytJ35ze","version":"1.3.0","proto":1,"git_commit":"eed4fbc","go":"go1.11","host":"0.0.0.0","port":4222,"max_payload":1048576,"client_id":8,"connect_urls":["172.18.0.100:4222","172.18.0.101:4222","172.18.0.102:4222"]}

When the NATS client tries to reconnect, it picks one of those addresses and creates a java.net.URI instance from it. It looks like the URI class does not accept the ip-address:port format: running the code below throws the same exception.

URI url = new URI("172.18.0.1:4222");

However, once the nats:// scheme is added, the URI parses fine:

URI url = new URI("nats://172.18.0.1:4222");
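To reproduce the difference outside the client, here is a minimal standalone sketch that feeds both strings to java.net.URI and reports the outcome. The class and method names are made up for illustration; only java.net.URI itself is real.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Compares how java.net.URI treats a bare "ip:port" string versus one with a scheme.
final class UriSchemeDemo {

    // Returns "host=<host> port=<port>" on success, or "error: <message>" on failure.
    static String describe(String address) {
        try {
            URI uri = new URI(address);
            return "host=" + uri.getHost() + " port=" + uri.getPort();
        } catch (URISyntaxException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Everything before the first ':' is parsed as a scheme, and a scheme
        // must start with a letter, so this fails at index 0.
        System.out.println(describe("172.18.0.1:4222"));
        // prints: error: Illegal character in scheme name at index 0: 172.18.0.1:4222

        // With an explicit scheme the same address parses into host and port.
        System.out.println(describe("nats://172.18.0.1:4222"));
        // prints: host=172.18.0.1 port=4222
    }
}
```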

Should the nats:// scheme always be added when building the server list in NatsConnection#getServers()?

Actually, it is not possible to connect to a NATS server without the scheme at all, because java.net.URI looks like it requires one. Neither of the following calls works:

Nats.connect("localhost:4222");
Nats.connect("127.0.0.1:4222");
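At the plain java.net.URI level (independent of the client), the two strings above actually fail in different ways, which a small sketch can show. The helper name parseOrNull is hypothetical. "localhost:4222" does parse, but "localhost" becomes the scheme and there is no host or port; "127.0.0.1:4222" is rejected outright because a scheme cannot start with a digit.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Shows why "host:port" without "scheme://" is not a usable server URI.
final class BareHostPortDemo {

    // Hypothetical helper: returns the parsed URI, or null if parsing fails.
    static URI parseOrNull(String s) {
        try {
            return new URI(s);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // "localhost" starts with a letter, so it is accepted as a scheme and
        // "4222" becomes an opaque scheme-specific part: no host, no port.
        URI named = parseOrNull("localhost:4222");
        System.out.println(named.getScheme() + " opaque=" + named.isOpaque()
                + " host=" + named.getHost() + " port=" + named.getPort());
        // prints: localhost opaque=true host=null port=-1

        // A dotted IPv4 address starts with a digit, which is illegal in a scheme.
        System.out.println(parseOrNull("127.0.0.1:4222"));
        // prints: null
    }
}
```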

@kozlovic
Member

@sasbury This is a correct analysis. The server sends the connect URLs of all servers in the cluster in the form IP:port, without a scheme (or username and password, for that matter). The client library is responsible for adding it.
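A minimal sketch of what "the client library adds it" could look like, assuming the client only prepends a scheme when an entry has none. ConnectUrls and toServerUri are hypothetical names, not part of jnats, and the default scheme is passed in rather than hard-coded.

```java
import java.net.URI;

// Hypothetical helper, not part of jnats: turn a connect_urls entry such as
// "172.18.0.101:4222" into a URI the client can parse.
final class ConnectUrls {

    static URI toServerUri(String entry, String defaultScheme) {
        // Entries from the server's INFO look like "ip:port"; only add a
        // scheme when the entry does not already carry one.
        String withScheme = entry.contains("://") ? entry : defaultScheme + "://" + entry;
        // URI.create throws IllegalArgumentException if the result is still invalid.
        return URI.create(withScheme);
    }

    public static void main(String[] args) {
        System.out.println(toServerUri("172.18.0.101:4222", "nats"));
        // prints: nats://172.18.0.101:4222
    }
}
```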

@fajran
Author

fajran commented Nov 25, 2018

I made a quick fix that adds the nats:// scheme here: fajran@ab24c47

@sasbury
Contributor

sasbury commented Nov 25, 2018 via email

@kozlovic
Member

kozlovic commented Nov 25, 2018

@fajran Adding "nats://" unconditionally may not be the right thing to do. I'm not sure about the Java library, but some libraries behave differently when the scheme is set to "tls" to request TLS.
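One way to avoid hard-coding "nats://" is to reuse the scheme (and, if desired, the user info) of the URL the application originally configured when building URIs for discovered servers. The sketch below assumes that approach; DiscoveredServers and inherit are hypothetical names, nats.example.com is a placeholder host, and whether credentials should be carried over to discovered servers is a design decision for the client, not something the thread settles.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: build a URI for a discovered "host:port" entry that
// reuses the scheme (e.g. "tls") and user info of the configured URL.
final class DiscoveredServers {

    // Assumes the configured URI has a scheme (e.g. "nats://..." or "tls://...").
    static URI inherit(URI configured, String hostPort) {
        try {
            // Parse the bare entry under the configured scheme so that getHost()
            // and getPort() are populated (this also accepts "[ipv6]:port").
            URI bare = new URI(configured.getScheme() + "://" + hostPort);
            // Rebuild with the configured user info; null user info is simply omitted.
            return new URI(configured.getScheme(), configured.getUserInfo(),
                    bare.getHost(), bare.getPort(), null, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("bad connect_urls entry: " + hostPort, e);
        }
    }

    public static void main(String[] args) {
        URI configured = URI.create("tls://user:pass@nats.example.com:4222");
        System.out.println(inherit(configured, "172.18.0.101:4222"));
        // prints: tls://user:pass@172.18.0.101:4222
    }
}
```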

@fajran
Author

fajran commented Nov 25, 2018

@kozlovic You are right. I did not consider that, or username/password either.

@kozlovic
Member

@fajran No worries. Your analysis of the issue was spot on. Thanks!

@fajran
Author

fajran commented Nov 25, 2018

I think my easiest option right now is to list every individual NATS server in the cluster and set the noRandomize option to true, so the client tries the configured server list first. Currently I use DNS round robin across all the instances.
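For the workaround, the addresses behind the round-robin DNS name can be resolved once and turned into explicit URLs that already carry a scheme. This is a sketch under that assumption; ExplicitServerList and expand are hypothetical names, and only the JDK's InetAddress is used.

```java
import java.io.UncheckedIOException;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: resolve a round-robin DNS name once and turn every
// address behind it into an explicit nats:// server URL.
final class ExplicitServerList {

    static String[] expand(String host, int port) {
        InetAddress[] addresses;
        try {
            addresses = InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            throw new UncheckedIOException(e);
        }
        List<String> urls = new ArrayList<>();
        for (InetAddress addr : addresses) {
            String ip = addr.getHostAddress();
            // IPv6 literals must be bracketed inside a URL authority.
            if (addr instanceof Inet6Address) {
                ip = "[" + ip + "]";
            }
            urls.add("nats://" + ip + ":" + port);
        }
        return urls.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        for (String url : expand(host, 4222)) {
            System.out.println(url);
        }
    }
}
```

Assuming the jnats 2.x `Options.Builder`, the resulting array could be passed to `servers(String[])` together with `noRandomize()`. The list is a snapshot, so instances added behind the DNS name later are not picked up until the app resolves it again.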

Looking forward to the proper fix! Thanks!

@sasbury
Contributor

sasbury commented Nov 27, 2018

OK, it turns out blindly adding nats:// is not necessarily bad, because the place that was failing doesn't care about the protocol. But I did a new fix that is a bit more careful and checks the URI in all the places we use it. I am re-running tests and will commit shortly into the v2.4.0 branch. I'm hoping to release that branch this week, but if you get a chance to try it before I release it, that would be awesome, just in case the tests miss something.
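The point that the failing call site "doesn't care about the protocol" comes down to a socket needing only a host and a port. A hedged sketch of an explicit check along those lines follows; it is not the actual jnats fix. ServerUriCheck and toSocketAddress are hypothetical names, and falling back to 4222 (the standard NATS client port) when the URI has no port is an assumption of the sketch.

```java
import java.net.InetSocketAddress;
import java.net.URI;

// Hypothetical check, not the actual jnats fix: the scheme is ignored here,
// only host and port are validated before a socket would be opened.
final class ServerUriCheck {

    static final int DEFAULT_PORT = 4222; // assumed fallback: standard NATS client port

    static InetSocketAddress toSocketAddress(URI uri) {
        // Opaque URIs such as "localhost:4222" have no host component.
        if (uri.isOpaque() || uri.getHost() == null) {
            throw new IllegalArgumentException("no host in server URI: " + uri);
        }
        int port = uri.getPort() == -1 ? DEFAULT_PORT : uri.getPort();
        // createUnresolved defers the DNS lookup to the actual connect call.
        return InetSocketAddress.createUnresolved(uri.getHost(), port);
    }

    public static void main(String[] args) {
        InetSocketAddress a = toSocketAddress(URI.create("tls://172.18.0.101:4222"));
        System.out.println(a.getHostString() + ":" + a.getPort());
        // prints: 172.18.0.101:4222
    }
}
```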

Oh, I also changed the unit test for servers in INFO to remove the nats:// prefix, so it covers this case.

@sasbury sasbury mentioned this issue Dec 15, 2018