Connection to newly added node is initially unavailable with validateClusterNodeMembership: false. #340

oklahomer · 2016-08-23T07:42:17Z

This issue was previously discovered in another question-marked issue #339 .
Below shows detailed context and error content.

lettuce version

4.2.1

Contextual information

To maximize the availability, I applied some changes to client options: ClusterClientOptions and ClusterTopologyRefreshOptions.
Current settings are shown below:

ClientResources clientResources = DefaultClientResources.builder().build();
List<RedisURI> redisURIs = ...;
RedisClusterClient redisClusterClient = RedisClusterClient.create(clientResources, redisURIs);

ClusterTopologyRefreshOptions topologyRefreshOptions =
        ClusterTopologyRefreshOptions.builder()
                                     .enablePeriodicRefresh(INTERVAL_IN_SECONDS, TimeUnit.SECONDS)
                                     .enableAllAdaptiveRefreshTriggers()
                                     .build();

ClusterClientOptions clusterClientOptions =
        ClusterClientOptions.builder()
                            .validateClusterNodeMembership(false) // To acknowledge brand new nodes
                            .topologyRefreshOptions(topologyRefreshOptions)
                            .build();

redisClusterClient.setOptions(clusterClientOptions);

In this modification, ClusterClientOptions#validateClusterNodeMembership is set to false to acknowledge brand new nodes with new host:port regarding #109.
However, ClusterClientOptions#closeStaleConnections is left with default value: true. (Just got to have a hunch that this is somehow related...)

Simplest possible steps to reproduce

Create lettuce instance with above settings, add nodes to redis cluster, reshard, and an exception is thrown. It is thrown only once and later operations seem to run normally and as expected.

Thanks in advance.

The text was updated successfully, but these errors were encountered:

mp911de · 2016-08-26T15:58:08Z

The reason for the failure: The newly created connection is opened in the same thread (EventLoop) that is assigned to the connection itself. Opening a new connection will assign a different EventLoop which is a different thread and so it eventually works. The solution should be retrying using the computation thread pool.

Previously, cluster command redirects were executed on the I/O thread. This is most of the time not an issue but in cases where lettuce did not open a connection to the target host, a connection has to be established. This is a blocking operation which blocks the event loop. The new connection is assigned in some cases (depending on the connection/event loop assignment) to the event loop which currently executes the retry. This leads to a deadlock. Command retries are scheduled now using the event executor thread group.

mp911de · 2016-08-28T18:43:13Z

That's fixed now.

mp911de added the type: bug A general bug label Aug 23, 2016

oklahomer mentioned this issue Aug 23, 2016

Best practice for Cluster Topology Refresh #339

Closed

mp911de added this to the Lettuce 4.2.3 milestone Aug 28, 2016

mp911de closed this as completed Aug 28, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Connection to newly added node is initially unavailable with validateClusterNodeMembership: false. #340

Connection to newly added node is initially unavailable with validateClusterNodeMembership: false. #340

oklahomer commented Aug 23, 2016 •

edited

Loading

mp911de commented Aug 26, 2016

mp911de commented Aug 28, 2016

Connection to newly added node is initially unavailable with validateClusterNodeMembership: false. #340

Connection to newly added node is initially unavailable with validateClusterNodeMembership: false. #340

Comments

oklahomer commented Aug 23, 2016 • edited Loading

lettuce version

Contextual information

Simplest possible steps to reproduce

mp911de commented Aug 26, 2016

mp911de commented Aug 28, 2016

oklahomer commented Aug 23, 2016 •

edited

Loading