Running dynamic/elastic NATS cluster behind a load balancer not possible #952

Closed · 1 of 2 tasks
JensRantil opened this issue Apr 16, 2019 · 13 comments

JensRantil commented Apr 16, 2019

  • Defect
  • Feature Request or Change Proposal

Defects

Versions of gnatsd and affected client libraries used:

$ gnatsd --version
nats-server version 1.4.1

Golang client: https://github.com/nats-io/go-nats/tree/v1.7.2

OS/Container environment:

Linux/Docker/Kubernetes.

Steps or code to reproduce the issue:

My scenario is that I'm setting up NATS in Kubernetes, and clients will access the cluster through a NodePort Service. I'm configuring it in a clustered fashion:

  • All brokers are started in a K8s Deployment (stateless, startup order isn't important).
  • I have a K8s Service called nats which routes to my three gnatsd instances.
  • Each gnatsd instance runs with ./gnatsd -c nats.conf, where:
$ cat nats.conf
listen: 0.0.0.0:4222
http: localhost:8222 # HTTP monitoring port

cluster {
  listen: 0.0.0.0:6222

  no_advertise: true

  # Routes are actively solicited and connected to from this server.
  # Other servers can connect to us if they supply the correct credentials
  # in their routes definitions from above.
  routes = [
    nats://nats:6222
  ]
}

Given I'm setting no_advertise: true, my clients outside of K8s are not adding the Kubernetes-internal IP addresses to their local list of servers. This is great and expected.

Expected result:

With the above setting I was expecting my NATS cluster to fully form and not have a network partition.

Actual result:

The problem is that my NATS cluster isn't forming properly and occasionally (depending on the order of connecting to each other) has a network partition. This has to do with the fact that no_advertise: true disables advertisement for both cluster and clients.

Potential solutions

I see two different solutions here:

Differentiate between cluster_no_advertise and client_no_advertise

In a way this would be natural since there is already the cluster_advertise and client_advertise differentiation. For config, cluster { no_advertise: true } could be kept for backwards compatibility, configuring both of the *_no_advertise settings, with the two new settings introduced alongside it.

Minor: This would possibly also clarify the confusion I had here.
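
For illustration only, a config under this proposal might look like the following (the two new setting names come from the proposal above; they are hypothetical, and their exact placement is a guess):

cluster {
  listen: 0.0.0.0:6222

  # no_advertise: true         # kept for backwards compatibility; would set both flags below
  cluster_no_advertise: false  # hypothetical: keep gossiping route URLs between servers
  client_no_advertise: true    # hypothetical: stop gossiping client URLs to clients

  routes = [
    nats://nats:6222
  ]
}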

Support disabling picking up advertisement in client libraries

This is our preferred solution as it would allow our NATS clients within K8s to discover new brokers (more resilient, allowing us to autoscale etc.) while at the same time supporting K8s-external clients. This proposal is fully backwards compatible but at the cost of supporting this logic in clients.

From an implementation standpoint I don't think the client patches would be very large:
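
As a rough illustration of the client side (not the actual patches referenced above), this is how a K8s-external client connects through the load balancer today with go-nats and where it currently learns the gossiped, cluster-internal URLs; the load-balancer hostname is made up:

package main

import (
	"log"

	nats "github.com/nats-io/go-nats"
)

func main() {
	// "nats-lb.example.com" stands in for the external load balancer / NodePort address.
	nc, err := nats.Connect("nats://nats-lb.example.com:4222",
		// Fired whenever the server gossips new connect_urls. Without
		// no_advertise these would be pod-internal addresses that an
		// external client cannot reach.
		nats.DiscoveredServersHandler(func(nc *nats.Conn) {
			log.Printf("discovered servers: %v", nc.DiscoveredServers())
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// The proposed option would let such a client simply not add those
	// discovered URLs to its reconnect pool, so the servers could keep
	// advertising for the benefit of K8s-internal clients.
}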

Workarounds

What I think I could do is migrate to running NATS in a StatefulSet instead of a Deployment. That way I could hardcode cluster.routes to every pod (since StatefulSet pods get unique DNS records).

@wallyqs (Member) commented Apr 16, 2019

That's right, the simplest way to work around this issue would be to point the configuration to a static list of the A records generated from a StatefulSet. That is, if you have a cluster of size 3, then you would add them explicitly to the configuration:

cluster {
  listen: 0.0.0.0:6222

  no_advertise: true

  routes = [
    nats://nats-1.nats:6222
    nats://nats-2.nats:6222
    nats://nats-3.nats:6222
  ]
}

Since it is a static list, the nodes will always attempt to reconnect to those peers, so there are no partitions. (Using the k8s service as the seed for discovery has a few limitations for clustering as well: nodes could discover a different set of peers, so a full mesh won't be formed, and there is also a limited number of connection attempts against discovered peers.)

Then, if you are using a ConfigMap or Secret for the config and for some reason you want to change the size of the cluster, you would update the configuration and exec a reload signal so that the servers pick up the change, though I think this would be very rare:

kubectl exec -it nats-1 gnatsd -sl=reload
kubectl exec -it nats-2 gnatsd -sl=reload
kubectl exec -it nats-3 gnatsd -sl=reload

Your clients within K8s, on the other hand, can connect against the K8s headless service to discover any node from the NATS cluster; on reconnect to the same K8s service name they would likely discover another node too:

nats.Connect("nats")
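
Extending the one-liner above, a sketch of how such an in-cluster client might also handle reconnects against the same service name (the handlers here are illustrative, using the existing go-nats options):

nc, err := nats.Connect("nats://nats:4222",
	nats.MaxReconnects(-1), // keep retrying; the service name resolves to a live pod
	nats.ReconnectHandler(func(nc *nats.Conn) {
		log.Printf("reconnected to %s", nc.ConnectedUrl())
	}),
)
if err != nil {
	log.Fatal(err)
}
defer nc.Close()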

These are some of the practices adopted in the nats-operator, if you want to give it a try as well. It uses a reloader sidecar to detect dynamic configuration changes, for example.

Also, one of the reasons why NATS can't be used behind a load balancer is that there might be issues when starting TLS, since the first INFO protocol message is sent in plain text and the upgrade happens after that: #291

@JensRantil (Author)

@wallyqs Thanks for the response. Yeah, we'll try a workaround like you describe. Unfortunately, we'd prefer to roll our own Kubernetes logic instead of using an operator. Less magic for us - we've learnt a lot this past day! ;)

Nomenclature below: I'll write "clients" when I'm talking about non-brokers. I'll write "brokers" when I'm referring to gnatsd processes.

I thought a bit more about this; what is the actual reason for --no_advertise disabling announcement between brokers? For clients I think the reason is clear - having a TCP load balancer in front where clients can't reach the actual brokers - but isn't the whole idea with a NATS cluster that all broker instances should be able to talk to every other broker in the cluster? I remember reading somewhere that messages only make at most one hop. If so, the if-conditional in route.go could actually be considered a bug. In that case, a third solution would be to not have --no_advertise impact cluster discovery at all, but only clients. What do you think about this?

FYI, the if-conditional was introduced in 1acf330#diff-58b635029911951f32d8110242d486adR663 but I don't think the commit message explains exactly why it was disabled there.

@wallyqs (Member) commented Apr 16, 2019

Thanks @JensRantil, yes, it looks like originally no_advertise would still advertise to cluster members at least, but after the v1.0.6 release last year that changed. /cc @kozlovic

@JensRantil (Author)

Not super important, but if going with the third solution proposed in #952 (comment), I also propose that --no_advertise <bool> in

$ gnatsd
...
Server Options:
...
Cluster Options:
...
        --no_advertise <bool>        Advertise known cluster IPs to clients
...

gets moved from Cluster Options: to Server Options:.

JensRantil added a commit to JensRantil/nats-server that referenced this issue Apr 16, 2019
This commit came out of [1] and [2]. Clarifies current behaviour much
better.

[1] nats-io#950 (comment)
[2] nats-io#952
@wallyqs (Member) commented Apr 17, 2019

@JensRantil by the way, have you considered using a HostPort instead of a NodePort for the external access to your clients? That might be better if your Kubelet nodes have an external IP address that clients can then connect against without having to use a load balancer. This is the approach we take in NGS, for example:

$ host connect.ngs.global
connect.ngs.global is an alias for asiaeast1.gcp.ngs.global.
asiaeast1.gcp.ngs.global has address 35.194.179.179
asiaeast1.gcp.ngs.global has address 35.221.223.59
asiaeast1.gcp.ngs.global has address 35.236.185.106

$ telnet connect.ngs.global 4222
Trying 35.236.185.106...
Connected to asiaeast1.gcp.ngs.global.
Escape character is '^]'.
INFO {"server_id":"NC3RIHRBBNSWXAFFHSC35ECHMXMG35TOIMKQKD4DVSB2TOCYXC2WDCP6","version":"2.0.0-RC5","proto":1,"git_commit":"8362bda","go":"go1.11.5","host":"35.236.185.106","port":4222,"auth_required":true,"tls_required":true,"max_payload":1048576,"client_id":294,"nonce":"JM6G5AySaJY71g4","cluster":"gcp-asiaeast1","connect_urls":["35.236.185.106:4222","35.194.179.179:4222","35.221.223.59:4222"]} 

@JensRantil (Author)

have you considered using a HostPort instead of a NodePort for the external access to your clients?

I'll discuss it with our local K8s wizards. 😄 I'm still learning, but I suspect we don't want to do that, since our K8s is very elastic, with nodes coming and going through auto-scaling. We are running both in the cloud and on-prem, and have found load balancers to be a solid common denominator between them.

@kozlovic (Member)

I am trying to understand the extent of the issue. I don't think it is a good idea to create routes with nats://nats:6222 if nats can resolve to any pod in the cluster, because you may end up with a node trying to connect only to itself (if it happens that nats resolves to its own pod). Usually, you would have routes use the internal IPs of the pods.

The reason no_advertise is in the cluster section is simply that it does not make sense without the presence of a cluster. Say you have a single NATS Server (no cluster{} definition): your clients have no choice but to know the URL of the server to connect to, and there is no reason to advertise client URLs since there is only one - the one the single server is listening on for client connections.

There are advertise settings now for each of the following listen specs in the server: client, cluster, leafnode and gateways. This advertise string host:port allows the user to specify which address should be used when communicating between servers/clients.

Now the question is whether that no_advertise flag applies to just clients or also to routes. The idea was that if one can't, or does not want, the servers in the cluster to advertise their client URLs, they would set this flag to true. Clients can then connect (or reconnect) only to the URLs that are provided by the user in the Connect() options.

Regardless of the no_advertise value, routes would still advertise the configured advertise address in cluster{} if one is set. That is, which host:port the server wants other servers to attempt to connect to. But there is no need for a "no advertise" for the cluster's URLs (and leafnode, and gateways); simply don't set an advertise for those.
(Note: in the new LeafNode options, I see a NoAdvertise that was added - copied from the cluster options. I think that should be removed.)
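
To make that concrete, a configuration along these lines (hostnames are placeholders) sets a route advertise address while suppressing only the client URL gossip:

# Advertised to clients in the INFO connect_urls, unless no_advertise is set.
client_advertise: "lb.example.com:4222"

cluster {
  listen: 0.0.0.0:6222

  # The host:port other servers are told to route to; used between servers
  # regardless of no_advertise.
  advertise: "nats-0.nats.default.svc:6222"

  # Only suppresses gossiping client URLs to clients.
  no_advertise: true
}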

I am not sure if that helps the discussion or makes things more confusing...

@wallyqs (Member) commented Apr 17, 2019

Regardless of the no_advertise value, routes would still advertise the configured advertise address in cluster{} if one is set. That is, which host:port the server wants other servers to attempt to connect to.

@kozlovic ↑ remember that is how it worked originally, but it seems this changed after the v1.0.6 release; now when no_advertise is set, cluster routes are not advertised either.

@kozlovic (Member)

Yes they are; I just ran a test. opts.Cluster.NoAdvertise is only used to decide whether we update/send the connectURLs back to clients. If you can point me to the code that shows the cluster's own advertise address is not used because NoAdvertise is set to true, I can have a look.

@wallyqs (Member) commented Apr 17, 2019

Thanks for confirming @kozlovic... Sorry, I got confused while peeking at some of the INFO messages, but yes, it looks like the NATS routes are being advertised as you mentioned when setting no_advertise:

$ seq 22 24 | parallel -j 3 -u 'nats-server-v2 -DV -p 42{} -m 82{} --cluster nats://0.0.0.0:62{} --routes nats://127.0.0.1:6222 --no_advertise'

$ sudo tcpdump -nn -A -i any port 6222 or port 6223 or port 6224 2>&1 | grep -o INFO.*
...
INFO {"server_id":"NBXHDHEEL3MLBFJKA5K3LOQAKFWJJZGXOWCEQJ7YTF5FQXGFJ6J5W2NO","version":"2.0.0-RC5",...,"ip":"nats-route://127.0.0.1:6224/","nonce":"igfsWGN0yIWgoWA"}
INFO {"server_id":"NBXHDHEEL3MLBFJKA5K3LOQAKFWJJZGXOWCEQJ7YTF5FQXGFJ6J5W2NO","version":"2.0.0-RC5",...,"nonce":"SZDtUZjIJ8oUO8s"} 
INFO {"server_id":"NBXHDHEEL3MLBFJKA5K3LOQAKFWJJZGXOWCEQJ7YTF5FQXGFJ6J5W2NO","version":"2.0.0-RC5",...,"nonce":"SZDtUZjIJ8oUO8s"} 
INFO {"server_id":"NC2TFNAHYEQKYK2LLEVGHRAFMIVKCBCWUFVOGT7K7U6EJDVZCQYUMHKH","version":"2.0.0-RC5",...,"nonce":"PcdgShGxkLFw5Cg"} 
INFO {"server_id":"NC2TFNAHYEQKYK2LLEVGHRAFMIVKCBCWUFVOGT7K7U6EJDVZCQYUMHKH","version":"2.0.0-RC5",...,"nonce":"PcdgShGxkLFw5Cg"} 
INFO {"server_id":"NC2TFNAHYEQKYK2LLEVGHRAFMIVKCBCWUFVOGT7K7U6EJDVZCQYUMHKH","version":"2.0.0-RC5",...,"ip":"nats-route://127.0.0.1:6223/","nonce":"PcdgShGxkLFw5Cg"}
INFO {"server_id":"NC2TFNAHYEQKYK2LLEVGHRAFMIVKCBCWUFVOGT7K7U6EJDVZCQYUMHKH","version":"2.0.0-RC5",...,"ip":"nats-route://127.0.0.1:6223/","nonce":"PcdgShGxkLFw5Cg"}
INFO {"server_id":"NBXHDHEEL3MLBFJKA5K3LOQAKFWJJZGXOWCEQJ7YTF5FQXGFJ6J5W2NO","version":"2.0.0-RC5",...,"ip":"nats-route://127.0.0.1:6224/","nonce":"SZDtUZjIJ8oUO8s"}
INFO {"server_id":"NBXHDHEEL3MLBFJKA5K3LOQAKFWJJZGXOWCEQJ7YTF5FQXGFJ6J5W2NO","version":"2.0.0-RC5",...,"ip":"nats-route://127.0.0.1:6224/","nonce":"SZDtUZjIJ8oUO8s"}

@wallyqs (Member) commented Apr 28, 2019

Closing this one. The suggestion for k8s is to use a headless service with A records set for each pod, and to set at least one stable name as the seed server to discover the rest (with a StatefulSet, for example):

cluster {
  listen: 0.0.0.0:6222

  no_advertise: true

  routes = [
    nats://nats-1.nats:6222
  ]
}

@wallyqs closed this as completed Apr 28, 2019
@JensRantil (Author)

Sorry for getting back a little late here: I'm not entirely sure I agree with closing this for the following reasons:

  1. Advertisement to clients is orthogonal to advertisement to servers. I don't think it's uncommon that servers are in the same subnet but reached through a load balancer.
  2. Setting up a fixed DNS name for a "stable pod" sort of goes against cloud nativeness:
  • It imposes managing pets instead of cattle.
  • It reduces resiliency by allowing a single server to break discovery of the entire cluster.

Just to make the decision to close this transparent, are we fine with not solving 1) and 2)?

@wallyqs (Member) commented May 7, 2019

Hi @JensRantil, enabling no_advertise only disables advertisements for the clients and not for the servers, so the two are orthogonal. Also, using a seed server that ought to be available could be a way to discover the rest without causing partitions in the cluster.

no_advertise is not the reason why the cluster gets partitioned when nats://nats:6222 is set as the seed route; the problem is that the nodes in the cluster may not discover the same set of members when they attempt to connect to nats://nats:6222. For Kubernetes, when trying to use Deployment objects to create NATS clusters backed by a Service without predictable A records, one workaround is to detect whether the server has been partitioned from the cluster because nats://nats:6222 did not resolve to a member that could act as a seed (for example, the first instance that was started in the deployment). A liveness check can query /routez and verify that the number of routes matches the number of instances; otherwise the server exits and the instance restarts until it connects to a seed node that has all the routes. You can find some of those ideas implemented in this repo at this commit: https://github.com/pires/kubernetes-nats-cluster/tree/9be4a440e90a82b9abc610d86966dc7e59787be6
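
A minimal sketch of that liveness-style check against the monitoring endpoint; the expected route count and the exit-on-failure wiring are assumptions about how the probe would be set up:

package main

import (
	"encoding/json"
	"log"
	"net/http"
	"os"
)

func main() {
	const expectedRoutes = 2 // e.g. a 3-node cluster: each server should have 2 routes

	// Monitoring port as configured in the issue (http: localhost:8222).
	resp, err := http.Get("http://localhost:8222/routez")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var rz struct {
		NumRoutes int `json:"num_routes"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&rz); err != nil {
		log.Fatal(err)
	}

	if rz.NumRoutes < expectedRoutes {
		log.Printf("only %d routes, expected %d: likely partitioned", rz.NumRoutes, expectedRoutes)
		os.Exit(1) // non-zero exit fails the probe so the pod restarts
	}
}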
