Intermittent issues with Consul API discovery #301

Closed
kumasath opened this Issue Oct 1, 2018 · 2 comments


kumasath commented Oct 1, 2018

We observed a few things about the way Consul discovery behaves.

We are using version 1.5.18 of the Consul client. Here are the details:

  1. In the class ConsulClientImpl, the method lookupHealthService() specifies a wait time of 10 minutes. A health check request is then held at the Consul server for 10 minutes. This happens intermittently, and in those cases we get the response back only after 10 minutes.

String path = "/v1/health/service/" + serviceName + "?passing&wait=600s&index=" + lastConsulIndex;

If we reduce the wait time to, say, 5 seconds (wait=5s), we get the response from Consul in 5 seconds (again, intermittently).

https://github.com/networknt/light-4j/blob/1.5.18/consul/src/main/java/com/networknt/consul/client/ConsulClientImpl.java

  2. Sometimes we get the response back immediately, irrespective of the wait time specified in the URL.

Could you shed some light on our findings? We also have a few questions about the implementation:

  1. What is the purpose of a wait interval of 10 minutes, and of subsequently holding a request for 10 minutes (in some cases)? We noticed that there is also a lookupInterval specified on the Consul client side.
  2. Which parameter controls the wait at the Consul server, that is, what determines whether the server responds immediately or only after the wait timeout?

Contributor

stevehu commented Oct 2, 2018

@kumasath you have asked a very interesting question, and it puts you in the light-4j super user group :)

I have added a section to https://doc.networknt.com/concern/consul/, but it is not published yet.

Consul Blocking Queries

For service discovery, we use Consul blocking queries (long polling). The client sends a request to the Consul server and asks it to hold that request for up to 10 minutes as long as the subscribed service instances have not changed. If the subscribed service instances change, Consul returns the result immediately with the changes. After 10 minutes, the request times out and a new request is issued. In this way, the client maintains a list of healthy service instances at all times. When a new connection has to be created, there is no need to go to Consul for discovery because the local cache is already up to date. This design is a fast way to notify the client when a service instance is gracefully shut down or crashes.
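
The long-polling loop described above can be sketched roughly as follows. This is a minimal illustration and not the actual ConsulClientImpl code: the class BlockingQuerySketch, the Result type, and the transport function are hypothetical names introduced here, and the transport stands in for whatever HTTP client sends the request and reads the X-Consul-Index response header. Only the path shape is taken from the line quoted earlier in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a Consul blocking-query loop; not light-4j code.
final class BlockingQuerySketch {

    /** One query result: the X-Consul-Index response header plus the passing instances. */
    static final class Result {
        final long index;
        final List<String> instances;

        Result(long index, List<String> instances) {
            this.index = index;
            this.instances = instances;
        }
    }

    /** Same path shape as the line quoted from ConsulClientImpl, with the wait as a parameter. */
    static String buildPath(String serviceName, long waitSeconds, long lastConsulIndex) {
        return "/v1/health/service/" + serviceName
                + "?passing&wait=" + waitSeconds + "s&index=" + lastConsulIndex;
    }

    /**
     * Runs a fixed number of long-poll rounds and returns the cached instance list.
     * The transport is an assumption: it takes a path and blocks until Consul answers.
     */
    static List<String> poll(Function<String, Result> transport, String serviceName,
                             long waitSeconds, int rounds) {
        long lastIndex = 0; // nothing seen yet, so the first request should not be held
        List<String> cache = Collections.emptyList();
        for (int i = 0; i < rounds; i++) {
            Result r = transport.apply(buildPath(serviceName, waitSeconds, lastIndex));
            if (r.index < lastIndex) {
                lastIndex = 0; // index went backwards: start over from scratch
            } else if (r.index > lastIndex) {
                cache = new ArrayList<>(r.instances); // something changed: refresh the cache
                lastIndex = r.index;
            }
            // r.index == lastIndex: the wait expired with no change; keep the cache and re-issue
        }
        return cache;
    }
}
```

In this sketch the index parameter is what decides whether a request is held. A request whose index is 0, or older than the server's current index for that service, should come back right away, which may explain the immediate responses reported above; a request whose index is current is held until something changes or the wait expires. Consul also adds a small random jitter (up to wait/16) on top of the requested wait, so any client-side read timeout would need to be somewhat longer than the wait value.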


Contributor

stevehu commented Oct 3, 2018

These two issues are for the same problem.
#303

@stevehu stevehu closed this Oct 3, 2018
