services de-registering randomly #69

Closed
tsmgeek opened this issue Feb 13, 2019 · 6 comments
Labels
type/bug Something isn't working

Comments

@tsmgeek

tsmgeek commented Feb 13, 2019

It seems #59 is still happening on the latest release, just not as badly.

I've got a few services that just no longer show up, and I'm not sure why.

@adilyse
Contributor

adilyse commented Feb 13, 2019

Hi @tsmgeek,

Thanks for filing this, we definitely want to get this right! Could you provide some additional information about what you're seeing and when you're seeing it?

Specifically, it would be good to know how the deregistration is manifesting: are you noticing it through API calls to Consul, through logs on one of the pods (which one?), or through the UI?

Also, when does this happen? Does it usually happen right on startup and then stabilize? Or is it okay for a certain amount of time (how long?) before you start seeing this? Or does it happen all the time, at regular or irregular intervals?

Are all types of services affected, or do you only see it with ClusterIP/NodePort/LoadBalancer services?

Is there anything else about your setup that you think might be contributing?

This additional info should help us find a repro case, which will help speed up the process. Thanks!

@josephschadlick

josephschadlick commented Feb 22, 2019

I would like to add some datapoints from my own experience here:

I was experiencing a similar issue with the following environment:
Consul server/client version: 1.1.0
consul-k8s version: 0.5.0
kubernetes version: 1.11 (1.11.5-rancher1)
consul-helm version: 0.6.0

NodePort services would drop out of the catalog and then be re-registered at what seemed to be the Consul sync period. I noticed this via the UI and because the service was no longer resolvable via DNS lookups. There were no obvious logs in the consul-k8s pod or in the Consul agents/servers. The deployment and test were very similar to those described in #60, except with the most recent versions of consul-k8s and consul-helm.

However, after upgrading the Kubernetes cluster to 1.12 (1.12.5-rancher1) and redeploying everything else with the exact same configs, I have not experienced this issue with a single-cluster setup.

This may be a separate issue, but once I added a consul-k8s deployment in another cluster registering to the same Consul server, the two began to compete and deregister each other's services:

2019-02-22T18:18:34.416Z [INFO ] to-consul/sink: deregistering service: node-name=k8s-sync service-id=nginx-test-cs-a4419b4c2ac5
2019-02-22T18:18:34.435Z [INFO ] to-consul/sink: deregistering service: node-name=k8s-sync service-id=nginx-test-cs-b372d17aed63
2019-02-22T18:18:34.456Z [INFO ] to-consul/sink: invalid service found, scheduling for delete: service-name=nginx-test-cs
2019-02-22T18:18:49.448Z [INFO ] to-consul/sink: invalid service found, scheduling for delete: service-name=nginx-test-cs
2019-02-22T18:19:04.456Z [INFO ] to-consul/sink: registering services
2019-02-22T18:19:04.456Z [INFO ] to-consul/sink: deregistering service: node-name=k8s-sync service-id=nginx-test-cs-b372d17aed63
2019-02-22T18:19:04.465Z [INFO ] to-consul/sink: deregistering service: node-name=k8s-sync service-id=nginx-test-cs-a4419b4c2ac5
2019-02-22T18:19:04.484Z [INFO ] to-consul/sink: invalid service found, scheduling for delete: service-name=nginx-test-cs
2019-02-22T18:21:09.259Z [INFO ] to-consul/sink: deregistering service: node-name=k8s-sync service-id=nginx-test-cs-2-0fa2996bd4a8
2019-02-22T18:21:12.091Z [INFO ] to-consul/sink: invalid service found, scheduling for delete: service-name=nginx-test-cs-2
2019-02-22T18:21:39.292Z [INFO ] to-consul/sink: registering services
2019-02-22T18:21:49.629Z [INFO ] to-consul/sink: invalid service found, scheduling for delete: service-name=nginx-test-cs-2
2019-02-22T18:22:04.641Z [INFO ] to-consul/sink: invalid service found, scheduling for delete: service-name=nginx-test-cs-2
2019-02-22T18:22:09.317Z [INFO ] to-consul/sink: registering services
2019-02-22T18:22:09.318Z [INFO ] to-consul/sink: deregistering service: node-name=k8s-sync service-id=nginx-test-cs-2-0fa2996bd4a8
2019-02-22T18:22:12.141Z [INFO ] to-consul/sink: invalid service found, scheduling for delete: service-name=nginx-test-cs-2

This might be remedied by parametrizing the synthetic k8s-sync node per k8s cluster. Thoughts?

@adilyse
Contributor

adilyse commented Jun 27, 2019

@josephschadlick Thank you so much for the detailed info, this was super useful!

I've tracked down the cause of the multi-cluster deregistration: the sync process determines which services to pay attention to based on the k8sTag defined in the syncCatalog section of the Helm chart. If two clusters have the same value (e.g. the default, k8s), each one thinks it owns everything with that tag and deregisters anything that's not available locally.

It's possible to work around this by setting a different k8sTag value for each cluster.
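To make the ownership conflict concrete, here is a toy Go model of two sync processes sharing one catalog. It is a hypothetical sketch built only from the explanation above, not consul-k8s code: `catalog`, `syncer`, and `sync` are invented names, and the real sync process also deals with node names, service IDs, and a sync interval. Each syncer registers its local services under its `k8sTag`, then deletes any service carrying that tag that it does not see locally.

```go
package main

import (
	"fmt"
	"sort"
)

// catalog is a hypothetical, in-memory stand-in for the Consul catalog:
// it maps a service name to the tags that service was registered with.
type catalog map[string][]string

// syncer is a hypothetical model of one catalog-sync process (one per
// Kubernetes cluster). It is NOT the consul-k8s implementation.
type syncer struct {
	k8sTag string          // tag used to decide which catalog services "belong" to this syncer
	local  map[string]bool // services that actually exist in this cluster
}

// hasTag reports whether tag appears in tags.
func hasTag(tags []string, tag string) bool {
	for _, t := range tags {
		if t == tag {
			return true
		}
	}
	return false
}

// sync registers every local service under the syncer's tag, then deregisters
// every catalog service carrying that tag that does not exist locally.
// It returns the deregistered service names in sorted order.
func (s syncer) sync(c catalog) []string {
	for name := range s.local {
		c[name] = []string{s.k8sTag}
	}
	var deregistered []string
	for name, tags := range c {
		// Deleting map entries while ranging over the map is safe in Go.
		if hasTag(tags, s.k8sTag) && !s.local[name] {
			delete(c, name)
			deregistered = append(deregistered, name)
		}
	}
	sort.Strings(deregistered)
	return deregistered
}

func main() {
	// First a shared tag (the default), then a distinct tag per cluster.
	for _, tags := range [][2]string{{"k8s", "k8s"}, {"k8s-a", "k8s-b"}} {
		c := catalog{}
		a := syncer{tags[0], map[string]bool{"nginx-test-cs": true}}
		b := syncer{tags[1], map[string]bool{"nginx-test-cs-2": true}}
		a.sync(c)
		fmt.Printf("tags %q/%q: cluster B deregistered %v\n", tags[0], tags[1], b.sync(c))
		fmt.Printf("tags %q/%q: cluster A deregistered %v\n", tags[0], tags[1], a.sync(c))
	}
}
```

With the shared `k8s` tag, each pass by one cluster deregisters the other cluster's service and the other's next pass removes it in turn, the same register/deregister cycle visible in the logs above. With distinct tags, neither syncer touches services it did not tag, so both stay registered.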

@josephschadlick

Differentiating clusters via k8sTag works great. Thanks for your help, @adilyse. I would consider this issue closable if @tsmgeek does.

@lkysow
Member

lkysow commented Aug 14, 2019

If @tsmgeek wants we can re-open but I'll close for now.

@lkysow lkysow closed this as completed Aug 14, 2019
@lkysow lkysow added the type/question Question about product, ideally should be pointed to discuss.hashicorp.com label Aug 14, 2019
@saiso12

saiso12 commented Jul 22, 2020

@adilyse I am encountering the same issue, using the default k8sTag for multiple clusters. I see services getting deregistered and re-registered every 30 seconds. I'm trying to understand which process initiates the registration and which initiates the deregistration. Thanks

@lkysow lkysow added type/bug Something isn't working and removed type/question Question about product, ideally should be pointed to discuss.hashicorp.com labels Dec 14, 2020
5 participants