Detect and bypass the dead hosts. How to do it quickly? #49

Closed
hellovic opened this issue Mar 5, 2014 · 1 comment

hellovic commented Mar 5, 2014

Hi Zachary and other guys,

Say I have a number of hosts and some of them are dead. If I understand correctly, it takes 60 seconds to confirm that a host is dead. In some cases that's just too long, and I'm looking for a quicker way to test a host, skip it if it's dead, and go on to another one.

Should I change the configuration to lower the timeout? (Sorry, I can't find such an option in http://www.elasticsearch.org/guide/en/elasticsearch/client/php-api/current/_configuration.html) Or are there other approaches?

Thanks for reading. :)

@polyfractal
Contributor

It depends slightly on the connection pool being used. For the default StaticNoPingConnectionPool:

  • Attempt to execute the requested API against a random node
  • If the request fails, the node is marked dead and removed from the rotation for 60 seconds
  • A new node is selected in a round-robin fashion and the API is tried again

Let's assume the API succeeded on that second node, and 30s later another request is tried against the same dead node (because its turn has come up in the round-robin). The dead node is skipped entirely because fewer than 60s have passed since it was marked dead; the pool simply moves on to the next alive node in the round-robin.
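A minimal Python sketch of the behaviour described so far follows. The client itself is PHP, so this is not its actual code. `RoundRobinPool`, `Node`, `perform` and the `send` callable are hypothetical names. The sketch uses a fixed 60s dead window, and the caller supplies both the clock and the starting index (which would normally be random). A failed request is modelled as a `ConnectionError`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

DEAD_TIMEOUT = 60.0  # seconds a failed node stays out of rotation (fixed in this sketch)


@dataclass
class Node:
    host: str
    dead_since: Optional[float] = None  # time of the last failure; None means alive


class RoundRobinPool:
    """Hypothetical no-ping pool: round-robin selection, failed nodes sit out DEAD_TIMEOUT."""

    def __init__(self, hosts: List[str], start: int) -> None:
        self.nodes = [Node(h) for h in hosts]
        self.cursor = start % len(self.nodes)  # normally a random starting node

    def _available(self, node: Node, now: float) -> bool:
        # A dead node becomes eligible again once DEAD_TIMEOUT has elapsed.
        return node.dead_since is None or now - node.dead_since >= DEAD_TIMEOUT

    def perform(self, send: Callable[[str], str], now: float) -> str:
        # Visit each node at most once for this request.
        for _ in range(len(self.nodes)):
            node = self.nodes[self.cursor]
            self.cursor = (self.cursor + 1) % len(self.nodes)  # advance the round-robin
            if not self._available(node, now):
                continue  # still inside its dead window: skip it without trying
            try:
                result = send(node.host)
            except ConnectionError:
                node.dead_since = now  # mark dead, then retry on the next node
                continue
            node.dead_since = None  # success: the node is alive
            return result
        raise ConnectionError("no alive nodes left to try")


# Usage: host "a" is down for the whole timeline.
down = {"a"}


def send(host: str) -> str:
    if host in down:
        raise ConnectionError(host)
    return host


pool = RoundRobinPool(["a", "b", "c"], start=0)  # pretend the random pick was "a"
print(pool.perform(send, now=0))   # "a" fails and is marked dead; prints b
print(pool.perform(send, now=10))  # prints c
print(pool.perform(send, now=30))  # "a" is skipped (30s < 60s); prints b
```

Passing `now` explicitly keeps the 0s/30s timeline described above reproducible. A real pool would read the system clock instead.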

Now, let's say that 60s later another request goes to the "dead" node.

  • The ping timeout is over, so the node is tried again. If the request succeeds, the node is marked as alive and its timeout counter is cleared
  • If the node fails again, its timeout counter is increased and it is marked dead again. The timeout increases exponentially, up to a maximum of 3600s, so this time the node will be out of rotation for longer than 60s.
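The revive and back-off cycle in these two bullets can be illustrated with a small Python calculation. The thread only says that the timeout grows exponentially up to 3600s, so the exact formula here is an assumption: the sketch doubles a 60s base for every consecutive failure, caps the result at 3600s, and resets the counter when the node answers again. `dead_timeout`, `NodeState`, `BASE_TIMEOUT` and `MAX_TIMEOUT` are hypothetical names.

```python
from typing import Optional

BASE_TIMEOUT = 60    # seconds out of rotation after the first failure
MAX_TIMEOUT = 3600   # upper bound on the back-off


def dead_timeout(failures: int) -> int:
    """Seconds a node stays out of rotation after `failures` consecutive failures.

    Assumes the timeout doubles per extra failure: 60, 120, 240, ... capped at MAX_TIMEOUT.
    """
    if failures < 1:
        return 0  # a node that has never failed (or was just revived) is always eligible
    return min(BASE_TIMEOUT * 2 ** (failures - 1), MAX_TIMEOUT)


class NodeState:
    """Hypothetical per-node bookkeeping for the revive / mark-dead-again cycle."""

    def __init__(self) -> None:
        self.failures = 0
        self.dead_since: Optional[float] = None

    def ready_to_retry(self, now: float) -> bool:
        # Eligible if alive, or if the current (backed-off) timeout has elapsed.
        return self.dead_since is None or now - self.dead_since >= dead_timeout(self.failures)

    def mark_dead(self, now: float) -> None:
        self.failures += 1  # each consecutive failure lengthens the next timeout, up to the cap
        self.dead_since = now

    def mark_alive(self) -> None:
        self.failures = 0  # success clears the counter
        self.dead_since = None


print([dead_timeout(n) for n in range(1, 8)])
# [60, 120, 240, 480, 960, 1920, 3600]
```

Under this assumption a node that fails a second time sits out 120s instead of 60s, and after six consecutive failures the 3600s cap takes over.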

Make sense?
