
some questions about consul #1916

Closed
hehailong5 opened this issue Apr 5, 2016 · 8 comments

@hehailong5

I know this place is only for real issues, but I don't have access to Google servers in my country (and thus can't use IRC or the mailing list), so I have to leave my questions here. Thanks in advance for the reply.

1. Regarding service registration, as the Consul client is the only contact point, I understand that more than one node should be configured to run as a Consul client to avoid a single point of failure. In this case (having multiple Consul clients), how do I configure my Consul registration tool? Usually it requires specifying only one endpoint. What if this endpoint is down? Will the registration request automatically be forwarded to an alive Consul client in the cluster?
2. If I register an HTTP health check in Consul, is it the Consul server or the Consul client that does the actual checking?
3. Is there a watchdog implemented in the Consul cluster? Say initially I have 3 Consul servers running in the cluster; if one of them goes down and then comes back online, will it automatically rejoin the cluster?
@slackpad
Contributor

slackpad commented Apr 5, 2016

Hi @hehailong5, please see below:

Regarding service registration, as the Consul client is the only contact point, I understand that more than one node should be configured to run as a Consul client to avoid a single point of failure. In this case (having multiple Consul clients), how do I configure my Consul registration tool? Usually it requires specifying only one endpoint. What if this endpoint is down? Will the registration request automatically be forwarded to an alive Consul client in the cluster?

Normally you'll run a separate set of Consul servers (usually 3 or 5) and then run the Consul client agent on every other machine in your infrastructure. Applications always talk to the local Consul client agent, which keeps track of the Consul servers and automatically forwards requests to them. The Consul servers provide stable storage for the catalog and key/value store, and provide coordination for things like locks and semaphores. The Consul client agent on each other machine manages registering the local services on that machine and provides the interfaces to Consul (HTTP or DNS) for applications on that machine. There's usually no need for redundancy at the Consul agent level on each machine, as the agent is part of the same failure domain as the machine itself.
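
For illustration, here's a minimal sketch using the official Go API client (github.com/hashicorp/consul/api), assuming a client agent running locally on the default 127.0.0.1:8500; the service name and port are placeholders:

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// DefaultConfig points at the local client agent on 127.0.0.1:8500,
	// which is the recommended contact point for applications.
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Register a service with the local agent; the agent forwards the
	// registration to the Consul servers and keeps the catalog up to date.
	err = client.Agent().ServiceRegister(&api.AgentServiceRegistration{
		Name: "web", // placeholder service name
		Port: 8080,  // placeholder port
	})
	if err != nil {
		log.Fatal(err)
	}
}
```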

If I register an HTTP health check in Consul, is it the Consul server or the Consul client that does the actual checking?

The Consul client agent will run the health check locally and update the Consul servers with the results.
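
Building on the sketch above, a registration that carries an HTTP check might look like this (the check URL and interval are placeholders); the check is executed by the local agent, not by the servers:

```go
// registerWithHTTPCheck is a sketch reusing the client from the example
// above. The local client agent polls the HTTP endpoint at the given
// interval and reports the results up to the Consul servers.
func registerWithHTTPCheck(client *api.Client) error {
	return client.Agent().ServiceRegister(&api.AgentServiceRegistration{
		Name: "web", // placeholder service name
		Port: 8080,
		Check: &api.AgentServiceCheck{
			HTTP:     "http://127.0.0.1:8080/health", // placeholder check endpoint
			Interval: "10s",
		},
	})
}
```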

Is there a watchdog implemented in the Consul cluster? Say initially I have 3 Consul servers running in the cluster; if one of them goes down and then comes back online, will it automatically rejoin the cluster?

Consul has a built-in node health check called serfHealth that acts as a watchdog to make sure a node is alive and responding to network probes. This will automatically mark the node as failed if it goes down, and will help the server rejoin the cluster if it comes back online.
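
As a sketch (same assumed Go client as above; the node name is a placeholder), you can see the serfHealth check alongside any other node-level checks via the health API:

```go
// List the node-level checks for a node; one of them is the built-in
// serfHealth check, whose status flips to "critical" while the node is
// unreachable and back to "passing" once it recovers.
checks, _, err := client.Health().Node("consul-server-1", nil)
if err != nil {
	log.Fatal(err)
}
for _, check := range checks {
	log.Printf("%s: %s", check.CheckID, check.Status)
}
```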

Hope that helps! I'll close this out but feel free to re-open if you need any more clarifications.

@slackpad closed this as completed Apr 5, 2016
@hehailong5
Author

Hi,

Regarding #1, do you mean that if the Consul client crashes or the network between them breaks down, the application cannot use Consul anymore until the Consul client comes back online? I thought there was a mechanism for Consul clients to prevent a single point of failure. In my case, the application connects to the Consul client remotely via HTTP on port 8500.

@slackpad
Contributor

slackpad commented Apr 5, 2016

There's no mechanism on the client side to prevent a single point of failure, because the machine it's running on is also a single point of failure. Consul is highly available on the server side with >= 3 servers, so if one of them dies or has its network disrupted then another server will take up leadership and the cluster can continue to operate. It's not recommended to connect to the Consul client remotely, as then you'll need to manage a load balancer and the list of healthy clients to connect to. If you run the Consul agent on each machine then it manages all of this for you.

@hehailong5
Author

Thanks for the clarification! That makes it clear. Thank you!

@hehailong5
Author

Hi,
One more question about the watchdog in Consul:
if a node fails in the cluster, will serfHealth help bring it back online?

@slackpad
Contributor

The serfHealth check is added to every node and is used to reflect the cluster's low-level health status for that node (basically whether it responded to network probes). This is logically AND-ed with the service checks on that node as well, so if serfHealth fails, instances of the services on that node won't be returned over DNS, for example. Once the node recovers and starts responding to probes, the serfHealth check will become passing on its own. Hope that helps!
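
To make the AND-ing concrete, here's a sketch (same assumed Go client as above; "web" is a placeholder service name) that asks only for instances whose node-level and service-level checks are all passing:

```go
// Passing true for passingOnly filters out any instance whose node-level
// checks (including serfHealth) or service-level checks are failing, so a
// node that stops answering probes drops out of the results automatically.
entries, _, err := client.Health().Service("web", "", true, nil)
if err != nil {
	log.Fatal(err)
}
for _, entry := range entries {
	log.Printf("healthy instance: %s:%d", entry.Node.Address, entry.Service.Port)
}
```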

@thiagobustamante

Thanks for the answers. Very clarifying!

@vtahlani

Hi @slackpad,
If serfHealth fails for a node (N1) for 1 minute and there are 1000 services registered on that node, that means I wouldn't get any of those services for 1 minute, right? If so, isn't that a bad situation, since many of my requests would be failing?
If I use the catalog endpoint to get the services and a few of the 1000 services are down, then I may end up sending requests to down services?
Can you please suggest how I can get services which are healthy even if their node is down/failing?
Can we not replicate health checks to all server nodes, so that if a node is down we would still be able to get healthy services?
Can we not have an HTTP API endpoint to get healthy services? The current API "/health/service/:service" returns healthy nodes which have that service.
