
Services deregistering - shouldn't the consul agent service api be used rather than the catalog's? #40

Closed
HristoMohamed opened this issue Nov 29, 2018 · 8 comments

Comments

@HristoMohamed

Hello,

I have consul running on my kubernetes nodes and I got the consul-k8s sync to work, no problems there. (Well, there are some: it uses the pod IP and not the node IP, which also makes no sense to me since only LoadBalancer/NodePort services are supported, but I see there is already an issue open for that.)
What I am seeing is that on each of my kubernetes nodes, after a certain amount of time, the consul agent auto-deregisters the services registered by the consul-sync daemon. I experience this both when running consul as a pod on each kubernetes node and when running it external to the kubernetes cluster.
I see that catalog registration is used in the from-k8s source code, and this triggers the consul agent's anti-entropy feature for services registered this way.
Even the API documentation says:

"It is usually preferable to instead use the agent endpoints for registration as they are simpler and perform anti-entropy."

Also, from the documentation:

If any services or checks exist in the catalog that the agent is not aware of, they will be automatically removed to make the catalog reflect the proper set of services and health information for that agent. Consul treats the state of the agent as authoritative; if there are any differences between the agent and catalog view, the agent-local view will always be used.

I see that this can be disabled using enable_tag_override, but I did not have the time today to compile the syncing daemon with this option and test it. Also, this would IMHO be a dirty workaround.

Am I doing or understanding something wrong here, and this is the desired behavior? The constant registration/deregistration of services triggers multiple consul-template events.

Shouldn't the consul-sync daemon be run as a DaemonSet on each node and contact and update the agent API of that kubernetes node? Or something similar...
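To make the distinction concrete, here is a minimal sketch of the two registration paths using the Consul Go API client (the service name, addresses, ports and node name are made up for illustration):

```go
package main

import "github.com/hashicorp/consul/api"

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		panic(err)
	}

	// Agent registration: the local agent owns the service and keeps the
	// catalog in sync with it, so anti-entropy works in your favor.
	if err := client.Agent().ServiceRegister(&api.AgentServiceRegistration{
		ID:      "web-1", // hypothetical service instance
		Name:    "web",
		Address: "10.0.0.5",
		Port:    30080,
	}); err != nil {
		panic(err)
	}

	// Catalog registration: what the sync daemon does today. The entry is
	// written straight into the catalog under a node name; if an agent with
	// that node name exists but does not know about the service, its next
	// anti-entropy sync removes the entry again.
	if _, err := client.Catalog().Register(&api.CatalogRegistration{
		Node:    "kube01", // hypothetical node name
		Address: "10.0.0.5",
		Service: &api.AgentService{
			ID:      "web-1",
			Service: "web",
			Address: "10.0.0.5",
			Port:    30080,
		},
	}, nil); err != nil {
		panic(err)
	}
}
```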

@HristoMohamed HristoMohamed changed the title Services deregistering - shouldn't the agent api be used rather than the catalog's? Services deregistering - shouldn't the consul agent service api be used rather than the catalog's? Nov 29, 2018
@descrepes

Hi,

This bug is already known and you can find the workaround here: #33
Hope it helps!

Regards.

@HristoMohamed
Author

Hey, thanks a lot @descrepes! I will give it a try.
I still believe that the whole philosophy of using the Catalog API vs the Agent API contradicts the rest of the Consul documentation!

@HristoMohamed
Author

This does not work for me.
I get the services to register, but at some point something kicks them out. I guess there is a timeout, and if the health check does not succeed the node is marked as left. Ah...
consul: member 'kube01' left, deregistering

@HristoMohamed
Author

Right, I assume this is because the NodePort services register with the pod IP and not the node IP, and since my consul server is external, my pod IPs are not routable and hence not accessible.

@dedene

dedene commented Jan 7, 2019

We're running into the same issue: services deregistering on every agent every minute (clearly the anti-entropy at work). I deployed a version of the syncing daemon that sets ServiceEnableTagOverride to true for all services coming from k8s, but this didn't help. The workaround from #33 is not really helping either, as it builds up nodes in the 'left' state over time.
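For reference, the change I deployed amounts to roughly this (a sketch using the Consul Go API client; the node name, service, addresses and ports are invented). As far as I can tell, EnableTagOverride only protects the service's tags during anti-entropy, not the registration itself, which would explain why it doesn't help:

```go
package sketch

import "github.com/hashicorp/consul/api"

// registerWithTagOverride registers a synced service with EnableTagOverride
// set. This only stops the agent from overwriting the service's *tags* during
// anti-entropy; the agent will still deregister the whole service if it does
// not know about it, so the synced entry keeps disappearing.
func registerWithTagOverride(client *api.Client) error {
	_, err := client.Catalog().Register(&api.CatalogRegistration{
		Node:    "kube01",   // hypothetical node name
		Address: "10.0.0.5", // hypothetical addresses and ports
		Service: &api.AgentService{
			ID:                "web-1",
			Service:           "web",
			Port:              30080,
			Tags:              []string{"k8s"},
			EnableTagOverride: true,
		},
	}, nil)
	return err
}
```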

Any other ideas on what might help?

@HristoMohamed
Author

What I did to temporarily work around this until it is addressed (I use Flannel, FYI, so no idea how this goes for Calico, Canal and others): since my consul masters had a lot of resource headroom, I just joined them to the Kubernetes cluster (a bit of overhead for having the three daemons running, but oh well) and assigned a NoSchedule taint.
This way the consul servers have access to the pod networking.
So far so good (a month and a half down the road).

@banks
Member

banks commented Jan 16, 2019

@HristoMohamed this doesn't quite contradict the "agents are the source of truth" point of view. I'll explain why.

In normal Consul operation that is true. But when you are registering external services, which are not directly on Consul nodes but come from some external system, the Catalog endpoint must be used since there is no agent local to that service.

In the case of Kube, the "services" we sync from Kube may well be on the same actual compute nodes as an agent, but since they didn't register directly with the agent we treat them as "external" to Consul and centrally register them.

It's not even a case of preference - it's not really possible to register them via the agent API, since we'd have to do that on the actual agent they share a host with, and in general we don't know which one that is unless we make a lot of assumptions about the mapping of pod data from Kube to Consul agent names/machine host names etc. Even if we could figure that out, it's an unfortunate state that client agents currently expose their API to the network at all, and one we are trying to fix - in the ideal case the catalog sync process wouldn't even have access to each node's local agent API to register anything.

I'd argue it's actually semantically correct this way, even if we could figure all that out, because Kube, not the agent, is the source of truth for these registrations.

All that said, the current anti-entropy behavior does make this problematic. I've not looked in detail, but I suspect there are a few ways we could fix it by making things more explicit: either we don't couple Kube services to actual Consul nodes but rather to virtual ones, just like other external services where there is no agent, or we introduce some way to mark the services as registered out-of-band.
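For what it's worth, a rough sketch of the "virtual node" idea with the Go API client (the node name k8s-sync is invented here): since no agent ever runs under that node name, neither agent anti-entropy nor the "member left, deregistering" reaping touches the entries, and they live and die with the sync process alone.

```go
package sketch

import "github.com/hashicorp/consul/api"

// registerExternal writes a synced Kubernetes service under a purely virtual
// node that no Consul agent will ever claim. Because there is no agent for
// this node, nothing reconciles the entry away; only the sync process itself
// adds or removes it.
func registerExternal(client *api.Client, svcID, svcName, addr string, port int) error {
	_, err := client.Catalog().Register(&api.CatalogRegistration{
		Node:    "k8s-sync", // hypothetical virtual node, not a real agent
		Address: addr,
		Service: &api.AgentService{
			ID:      svcID,
			Service: svcName,
			Address: addr,
			Port:    port,
		},
	}, nil)
	return err
}
```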

@BarthV

BarthV commented Jan 23, 2019

Hi @banks!

We desperately need to sync the IPs of pods running with hostNetwork: true while the kubelet host is running a consul agent with the same name & IP :'/ ...

By chance, is there any quick & dirty workaround in consul-k8s to make both components coexist? (We'd be glad to maintain a branch/fork for it.)

Maybe it is not too hard to override the name used for the registration (using a convention and/or annotation), and thus avoid conflicts with the host's hostname, which is shared with the pods when hostNetwork is used?
Or maybe we could think about disabling the anti-entropy routine in the consul agent for such cases?

Thanks a lot

EDIT: hashicorp/consul#4207 is a (not that) old closed issue asking for this in consul :-)

ndhanushkodi pushed a commit to ndhanushkodi/consul-k8s that referenced this issue Jul 9, 2021
Add command continue indications where needed