Running dynamic/elastic NATS cluster behind a load balancer not possible #952
That's right, the simplest way to work around this issue would be to point the configuration to a static list of the A records generated for each pod:

```
cluster {
  listen: 0.0.0.0:6222
  no_advertise: true
  routes = [
    nats://nats-1.nats:6222
    nats://nats-2.nats:6222
    nats://nats-3.nats:6222
  ]
}
```

Since it is a static list, the nodes will always attempt to reconnect to the peers, so there are no partitions (using the K8S service as the seed for discovery has a few limitations for clustering as well, since nodes could discover a different set of peers and a full mesh won't be formed, and there is also a limited number of connection attempts against discovered peers). Then, to apply configuration changes, you can signal each server to reload:

```
kubectl exec -it nats-1 gnatsd -sl=reload
kubectl exec -it nats-2 gnatsd -sl=reload
kubectl exec -it nats-3 gnatsd -sl=reload
```

Your clients within K8S, on the other hand, would do fine to connect against the K8S headless service to discover any node from the NATS cluster; on reconnect to the same K8S service name they would likely discover another node too: `nats.Connect("nats")`.

These are some of the practices adopted in the nats-operator, if you want to give it a try as well. It uses a reloader sidecar to detect dynamic configuration changes, for example. Also, one of the reasons why NATS can't be used behind a load balancer is that there might be issues when starting TLS, since the first INFO protocol message is sent in plain text and only after that is the connection upgraded to TLS.
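For reference, here is a minimal sketch of what the K8S side of this could look like. The service and pod names match the config above, but the manifest itself is an assumption, not something taken from the thread:

```yaml
# Hypothetical headless Service: clusterIP: None makes K8S publish
# one A record per pod (nats-1.nats, nats-2.nats, ...) instead of a
# single load-balanced virtual IP.
apiVersion: v1
kind: Service
metadata:
  name: nats
spec:
  clusterIP: None        # headless: per-pod A records, no load balancing
  selector:
    app: nats
  ports:
    - name: client
      port: 4222
    - name: cluster
      port: 6222
```

With a headless service the static `routes` list above resolves each peer directly, instead of going through a load-balanced VIP that could hand back an arbitrary member.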
@wallyqs Thanks for the response. Yeah, we'll try to do a workaround like you describe. Unfortunately, we'd prefer to roll our own Kubernetes logic instead of using an operator. Less magic for us - we've learnt a lot this past day! ;)

Nomenclature below: I'll write "clients" when I'm talking about non-brokers, and "brokers" when I'm referring to `gnatsd` instances.

I thought a bit more about this; what is the actual reason for `no_advertise` also disabling advertisement between brokers?
Thanks @JensRantil, yes, it looks like this behaved differently originally.
Not super important, but if going with the third solution proposed in #952 (comment), I also propose moving the corresponding setting accordingly.
This commit came out of [1] and [2]. Clarifies current behaviour much better. [1] nats-io#950 (comment) [2] nats-io#952
@JensRantil by the way, have you considered using a HostPort instead of a NodePort for external access for your clients? That might be better if your Kubelet nodes have an external IP address that clients can then connect against without having to use a load balancer. This is the approach we take in NGS, for example.
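As a sketch of the HostPort approach (the field names come from the standard pod spec; the image tag and ports are illustrative):

```yaml
# Hypothetical snippet of a pod template: hostPort binds the client
# port directly on the node's IP, so external clients can reach the
# pod via the node's external address without a load balancer.
containers:
  - name: gnatsd
    image: nats:1.1.0        # illustrative tag
    ports:
      - containerPort: 4222
        hostPort: 4222       # exposed on the Kubelet node's IP
```

The trade-off is that only one such pod can bind the port per node, and clients must be able to reach the node IPs directly.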
I'll discuss it with our local K8s wizards. 😄 I'm still learning, but I suspect we don't want to do that, since our K8s is very elastic, with nodes coming and going using auto scaling. We are running in cloud and on-prem and have found load balancers to be a solid common denominator between them.
I am trying to understand the extent of the issue. I don't think it is a good idea to create routes with […]. The reason […]. There are […]. Now the question is about that […]. Regardless of the […], I am not sure if that helps the discussion or makes things more confusing..
@kozlovic ↑ remember that is how it worked originally, but it seems that this has changed after the v1.0.6 release; now when `no_advertise` is set, the cluster's own advertise address does not seem to be used either.
Yes they are, just ran a test. The `opts.Cluster.NoAdvertise` is just used to know if we update/send the connect URLs back to clients. If you could point me to the code that shows that the cluster's own advertise address is not used because `NoAdvertise` is set to true, I can have a look.
Thanks for confirming @kozlovic ... Sorry, I got confused while peeking at some of the INFO messages, but yes, it looks like the NATS routes are being advertised as you mentioned, even when setting `no_advertise: true`.
Closing this one; the suggestion for K8S is to use a headless service with A records set for each pod, and to set at least one stable name as the seed server to discover the rest (with a StatefulSet, for example).
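A hedged sketch of that seed-server arrangement, assuming a StatefulSet whose first pod gets the stable name `nats-1.nats`:

```
# Hypothetical nats.conf fragment: every node points at one stable
# seed route; the rest of the cluster is discovered via gossip.
cluster {
  listen: 0.0.0.0:6222
  routes = [
    nats://nats-1.nats:6222   # stable seed; other members are discovered
  ]
}
```

As long as the seed pod keeps its stable DNS name, restarted or newly scaled nodes can always rejoin through it, avoiding the partitions described earlier.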
Sorry for getting back a little late here: I'm not entirely sure I agree with closing this, given the two potential solutions proposed in the issue description.

Just to make the decision to close this transparent: are we fine with not solving 1) and 2)?
Hi @JensRantil, when enabling no advertise it only disables advertisements for the clients and not for the servers, so it is orthogonal; also, using a seed server that ought to be available could be a way to discover the rest in a way that does not cause partitions in the cluster.
Defects

Versions of `gnatsd` and affected client libraries used:

Golang client: https://github.com/nats-io/go-nats/tree/v1.7.2

OS/Container environment:

Linux/Docker/Kubernetes.

Steps or code to reproduce the issue:
So my scenario is that I'm setting up NATS in Kubernetes and I will have clients accessing the cluster through a NodePort Service. I'm configuring it in a clustered fashion:

- A `Deployment` (stateless, startup order isn't important).
- A `Service` called `nats` which routes to my three `gnatsd` instances.
- Each `gnatsd` instance is running using `./gnatsd -c nats.conf`.

Given I'm setting `no_advertise: true`, my clients outside of K8s are not adding the Kubernetes-internal IP addresses to their local list of servers. This is great and expected.

Expected result:
With the above setting I was expecting my NATS cluster to fully form and not have a network partition.
Actual result:

The problem is that my NATS cluster isn't forming properly and occasionally (depending on the order in which the nodes connect to each other) has a network partition. This has to do with the fact that `no_advertise: true` disables advertisement for both cluster routes and clients.

Potential solutions
I see two different solutions here:

Differentiate between `cluster_no_advertise` and `client_no_advertise`

In a way this would be natural, since there is already the `cluster_advertise` and `client_advertise` differentiation. For config, `cluster { no_advertise: true }` could be kept for backwards compatibility, which would configure both `*_no_advertise`s, while introducing two new settings.

Minor: This would possibly also clarify the confusion I had here.
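To illustrate this first proposal, here is a hypothetical config with the split applied. Neither setting exists in gnatsd today; the names simply mirror the existing `cluster_advertise` / `client_advertise` pair:

```
# Hypothetical: the two proposed settings. Not valid gnatsd config today.
cluster {
  cluster_no_advertise: false  # keep gossiping routes between servers
  client_no_advertise: true    # stop advertising connect URLs to clients
}
```

This would let the cluster form a full mesh via route gossip while still hiding Kubernetes-internal addresses from external clients.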
Support disabling picking up advertisements in client libraries

This is our preferred solution, as it would allow our NATS clients within K8s to discover new brokers (more resilient, allowing us to autoscale, etc.) while at the same time supporting K8s-external clients. This proposal is fully backwards compatible, but at the cost of supporting this logic in clients.
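As a rough, self-contained sketch of the client-side logic being proposed — the `IgnoreServerAdvertisement` field and the `serverPool` helper are hypothetical illustrations, not part of the real go-nats API:

```go
package main

import "fmt"

// Options is a hypothetical subset of a client's options, sketching the
// proposed opt-out. Field names are assumptions, not the go-nats API.
type Options struct {
	Servers                   []string // explicitly configured servers
	IgnoreServerAdvertisement bool     // proposed: skip URLs gossiped via INFO
}

// serverPool merges the configured servers with the connect URLs the
// server advertised, unless the proposed option disables that.
func serverPool(opts Options, advertised []string) []string {
	pool := append([]string{}, opts.Servers...)
	if opts.IgnoreServerAdvertisement {
		return pool // behave as if the server advertised nothing
	}
	return append(pool, advertised...)
}

func main() {
	opts := Options{
		Servers:                   []string{"nats://nats.example:4222"},
		IgnoreServerAdvertisement: true,
	}
	// Advertised Kubernetes-internal URLs are dropped from the pool.
	fmt.Println(serverPool(opts, []string{"nats://10.0.0.1:4222"}))
	// prints: [nats://nats.example:4222]
}
```

K8s-internal clients would leave the option unset and keep discovering new brokers, while K8s-external clients would set it and stick to their configured addresses.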
From an implementation standpoint I don't think the client patches would be very large:

- In the Java client: changing the check to `if (!options.ignoreServerAdvertisement && info != null && info.getConnectURLs() != null) {` and implementing another option on the `Options.Builder`.
- In the Go client: changing the check to `if nc.Opts.ignoreServerAdvertisement || len(ncInfo.ConnectURLs) == 0 {` and adding an option to the `Options` struct.

Workarounds
What I think I could do is migrate to run NATS in a `StatefulSet` instead of a `Deployment`. That way I could hardcode `cluster.routes` to every pod (since StatefulSet pods get unique DNS records).
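A minimal sketch of that workaround. The StatefulSet below is an assumption about how it could look: `serviceName` ties the pods to a headless Service named `nats`, giving each pod a stable `nats-N.nats` DNS record that the static `cluster.routes` list can reference (note StatefulSet pods are numbered from zero: `nats-0`, `nats-1`, ...):

```yaml
# Hypothetical StatefulSet for the workaround described above.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
spec:
  serviceName: nats            # must match the headless Service
  replicas: 3
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats
    spec:
      containers:
        - name: gnatsd
          image: nats:1.1.0              # illustrative tag
          command: ["gnatsd", "-c", "/etc/nats/nats.conf"]
          ports:
            - containerPort: 4222        # clients
            - containerPort: 6222        # cluster routes
```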