Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed the node retry mechanism which could fail without trying all the connected nodes #6829

Closed

Conversation

Projects
None yet
6 participants
@javanna
Copy link
Member

commented Jul 11, 2014

The RetryListener was notified twice for each single failure, which caused some additional retries, but more importantly was making the client reach the maximum number of retries (number of connected nodes) too quickly, meanwhile ongoing retries which could succeed are not completed yet.

The TransportService already notifies the listener of any exception received from a separate thread through the request holder, no need to notify the retry listener again in any other place (either catch or onFailure method itself).

@javanna javanna added bug labels Jul 11, 2014

@bleskes

View changes

src/main/java/org/elasticsearch/client/transport/TransportClientNodesService.java Outdated
if (e.unwrapCause() instanceof ConnectTransportException) {
retryListener.onFailure(e);
} else {
if (!(e.unwrapCause() instanceof ConnectTransportException)) {

This comment has been minimized.

Copy link
@bleskes

bleskes Jul 11, 2014

Member

Add a comment about why we ignore this?

@bleskes

View changes

src/main/java/org/elasticsearch/client/transport/TransportClientNodesService.java Outdated
onFailure(e1);
//no need to retry here, the transport service will notify this same listener
//of the failure through the request holder, which will retry
//ConnectTransportException gets thrown as well by TransportService due to throwConnectException option

This comment has been minimized.

Copy link
@bleskes

bleskes Jul 11, 2014

Member

I think the comment can be clear, maybe:

"ConnectTransportException gets thrown as well by TransportService due to throwConnectException option, which is needed for the correct operation of execute(...). We can ignore it here because it will be passed through the listener interface"

@bleskes

View changes

src/test/java/org/elasticsearch/client/transport/TransportClientRetryTests.java Outdated
import static org.hamcrest.Matchers.greaterThanOrEqualTo;

@ClusterScope(scope = TEST, numClientNodes = 0)
public class TransportClientRetryTests extends ElasticsearchIntegrationTest {

This comment has been minimized.

Copy link
@bleskes

bleskes Jul 11, 2014

Member

I'd love to have a unit test of TransportClientNodesService, with a custom tranport service implementation that both throw error and checks that all nodes were tried...

@bleskes

This comment has been minimized.

Copy link
Member

commented Jul 11, 2014

The fix look good to me (left some comments regarding comments :) ). I'd love to see a unit test as opposed to an integration test. I think we'd get much more out of it.

@s1monw s1monw added v1.4.0 and removed v1.3.0 labels Jul 14, 2014

@s1monw

View changes

src/test/java/org/elasticsearch/test/InternalTestCluster.java Outdated
@@ -143,7 +143,7 @@

static final boolean DEFAULT_ENABLE_RANDOM_BENCH_NODES = true;

private static final String NODE_MODE = nodeMode();
public static final String NODE_MODE = nodeMode();

This comment has been minimized.

Copy link
@s1monw

s1monw Jul 17, 2014

Contributor

can we make nodeMode() public instead?

@s1monw s1monw removed the review label Jul 17, 2014

javanna added some commits Jul 11, 2014

Transport Client: fixed the node retry mechanism which could fail wit…
…hout trying all the connected nodes

The RetryListener was notified twice for each single failure, which caused some additional retries, but more importantly was making the client reach the maximum number of retries (number of connected nodes) too quickly, meanwhile ongoing retries which could succeed are not completed yet.

The TransportService notifies of any exception received from a separate thread through the request holder, no need to notify the retry listener in any other place (either catch or onFailure method itself).

Closes #6829
added notify onFailure to original listener in case of non connect ex…
…ception, which might get thrown before the transport service send request call

only ConnectTransportException gets thrown by transport service (throwConnectException option), if another exception gets thrown it comes from something that happened before, we need to notify the original listener and stop the retries.
beefed up TransportClientRetryTests, make sure that we use both execu…
…te variants: with or without listener.

The variant without listener throws exception on the calling thread (throwConnectException option).
added check for transport client connected nodes and make check more …
…accurate in terms of number of expected nodes
catch only ConnectTransportException, don't even unwrap since the Tra…
…nsportService won't do it either

do notify the listener in case of throwable, cause they can't come from the transport service and the listener needs to be notified of those, otherwise it bubbles up
@javanna

This comment has been minimized.

Copy link
Member Author

commented Jul 25, 2014

Pushed new commits to address comments and a unit test for it as suggested by @bleskes . I also changed a bit how we catch exceptions given how they get thrown by the TransportService. Ready for reviews!

@javanna javanna added the review label Jul 25, 2014

added unit tests that simulate transport failures, one for TransportC…
…lientNodesService and one a bit higher level that tests InternalTransportClient

@javanna javanna self-assigned this Jul 28, 2014

javanna added some commits Jul 28, 2014

removed throwConnectException option from TransportService, simplifie…
…d exception handling in transport client retry mechanism
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.concurrent.atomic.AtomicInteger;

public abstract class FailAndRetryMockTransport<Response extends TransportResponse> implements Transport {

This comment has been minimized.

Copy link
@kimchy

kimchy Jul 28, 2014

Member

can we move this as an internal class to the test that use it? also, throw illegal state exception on implemented methods that should not be called?

This comment has been minimized.

Copy link
@javanna

javanna Jul 28, 2014

Author Member

good point, it is shared between two different tests, one more low level (TransportClientNodesServiceTests) and one that tests the blocking version as well and involves the TransportActionNodeProxy too (InternalTransportClientTests). I'd make it package private if that's ok and move both tests to the same org.elasticsearch.client.transport package?

@kimchy

This comment has been minimized.

Copy link
Member

commented Jul 28, 2014

LGTM, very clean now indeed!

@javanna javanna closed this in fcf4d5a Jul 28, 2014

javanna added a commit that referenced this pull request Jul 28, 2014

Transport Client: fixed the node retry mechanism which could fail wit…
…hout trying all the connected nodes

The RetryListener was notified twice for each single failure, which caused some additional retries, but more importantly was making the client reach the maximum number of retries (number of connected nodes) too quickly, meanwhile ongoing retries which could succeed were not completed yet.

The TransportService used to throw ConnectTransportException due to throwConnectException set to true, and also notify the listener of any exception received from a separate thread through the request holder.

Simplified exception handling by just removing the throwConnectException option from the TransportService, used only in the transport client. The transport client now relies solely on the request holder to notify of failures and eventually retry.

Closes #6829
@javanna

This comment has been minimized.

Copy link
Member Author

commented Jul 28, 2014

Side note: as part of the work to fix this issue the throwConnectException option was removed from the TransportService.

@jpountz jpountz removed the review label Jul 29, 2014

javanna added a commit to javanna/elasticsearch that referenced this pull request Aug 27, 2014

Transport Client: fixed the node retry mechanism which could fail wit…
…hout trying all the connected nodes

The RetryListener was notified twice for each single failure, which caused some additional retries, but more importantly was making the client reach the maximum number of retries (number of connected nodes) too quickly, meanwhile ongoing retries which could succeed were not completed yet.

The TransportService used to throw ConnectTransportException due to throwConnectException set to true, and also notify the listener of any exception received from a separate thread through the request holder.

Simplified exception handling by just removing the throwConnectException option from the TransportService, used only in the transport client. The transport client now relies solely on the request holder to notify of failures and eventually retry.

Closes elastic#6829

@javanna javanna added the v1.3.3 label Aug 27, 2014

@clintongormley clintongormley changed the title Transport Client: fixed the node retry mechanism which could fail without trying all the connected nodes Fixed the node retry mechanism which could fail without trying all the connected nodes Jun 7, 2015

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

Transport Client: fixed the node retry mechanism which could fail wit…
…hout trying all the connected nodes

The RetryListener was notified twice for each single failure, which caused some additional retries, but more importantly was making the client reach the maximum number of retries (number of connected nodes) too quickly, meanwhile ongoing retries which could succeed were not completed yet.

The TransportService used to throw ConnectTransportException due to throwConnectException set to true, and also notify the listener of any exception received from a separate thread through the request holder.

Simplified exception handling by just removing the throwConnectException option from the TransportService, used only in the transport client. The transport client now relies solely on the request holder to notify of failures and eventually retry.

Closes elastic#6829
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.