Lettuce can't update cluster structure correctly in provided case #355

Closed
Spikhalskiy opened this Issue Sep 12, 2016 · 5 comments


Spikhalskiy commented Sep 12, 2016

I have a cluster with 3 nodes. I configured the client like this:

Iterable<RedisURI> redisURIs = ... //collection of RedisURI for all redis master/slave processes in the cluster
clusterClient = RedisClusterClient.create(res, redisURIs);

ClusterTopologyRefreshOptions topologyRefreshOptions = 
    ClusterTopologyRefreshOptions.builder()
        .enablePeriodicRefresh(10, TimeUnit.MINUTES)
        .enableAdaptiveRefreshTrigger(
            RefreshTrigger.MOVED_REDIRECT,
            RefreshTrigger.PERSISTENT_RECONNECTS)
        .adaptiveRefreshTriggersTimeout(30, TimeUnit.SECONDS)
        .build();
ClusterClientOptions options = 
    ClusterClientOptions.builder()
        .maxRedirects(maxAttempts)
        .topologyRefreshOptions(topologyRefreshOptions)
        .build();
clusterClient.setOptions(options);

The redisURIs collection contains connection information for all redis cluster processes on my 3 nodes. There are 30 processes in total - 15 masters and 15 slaves.

The client worked as expected until this sequence of cluster modifications:

  1. I added a new cluster node with 5 masters and 5 slaves using ./redis-trib.rb add-node. I did not move any slots onto them
  2. I removed all newly added processes via ./redis-trib.rb del-node
  3. I called CLUSTER RESET on all added nodes
  4. I called CLUSTER FORGET on all main cluster nodes to remove new processes from the structure

This seems like a reasonable cluster modification that lettuce should handle without any problems: the number of available masters in the main cluster part was always greater than half of the total master count.

Instead, lettuce fails with:

RedisException:Cannot determine a partition for slot 7008 (Partitions: Partitions [RedisClusterNodeSnapshot [uri=RedisURI [host='10.201.12.110', port=9000], nodeId='9c7256dad1776dacc24f71f04512b7f56c2fca72', connected=true, slaveOf='null', pingSentTimestamp=0, pongReceivedTimestamp=0, configEpoch=199, flags=[MYSELF, MASTER], slot count=0]])

Here 10.201.12.110 is the new node that I added and deleted during this experiment. After step 3 above, this is how the node and the process on port 9000 looked:

redis-cli -h 10.201.12.110 -p 9000 CLUSTER NODES
9c7256dad1776dacc24f71f04512b7f56c2fca72 10.201.12.110:9000 myself,master - 0 0 199 connected

redis-cluster $ redis-cli -h 10.201.12.110 -p 9000 INFO | grep "cluster"
redis_mode:cluster
config_file:/opt/redis-cluster/9000/redis-cluster-9000.conf
cluster_enabled:1

Summary:

Lettuce has all the necessary node information, provided via RedisClusterClient.create(), and should recover from the deletion of the new master by re-discovering the cluster nodes from that information - especially since the new master reports an incorrect cluster structure with uncovered slots and never held any slots before it was removed from the main cluster. Instead, lettuce started using the master process of the new, empty node and got stuck refreshing the cluster structure from it.

Potential fix:

If none of the nodes passed to RedisClusterClient.create() appear in the CLUSTER NODES output of the node currently used as the "main" node for cluster structure updates, fall back to the originally supplied node set and re-discover the whole cluster.
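For illustration, a hypothetical sketch of such a check (containsSeedNode is an invented helper, not lettuce internals; Partitions, RedisClusterNode and RedisURI are the lettuce 4.x types):

// Hypothetical helper: returns true if the freshly discovered topology still contains
// at least one of the seed nodes originally passed to RedisClusterClient.create().
// If it returns false, the client could restart discovery from the original seed URIs.
static boolean containsSeedNode(Partitions discovered, Iterable<RedisURI> seedUris) {
    for (RedisURI seed : seedUris) {
        for (RedisClusterNode node : discovered) {
            if (seed.getHost().equals(node.getUri().getHost())
                    && seed.getPort() == node.getUri().getPort()) {
                return true;
            }
        }
    }
    return false;
}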

Lettuce version: 4.2.2.Final

mp911de commented Sep 13, 2016

Thanks for the bug report. That's a great description.

The issue has two causes:

  1. Lettuce cluster topology updates can run in two modes: dynamic and static refresh sources. Dynamic mode is enabled by default. It discovers additional nodes and crawls through your cluster structure as it changes. Getting stuck on an orphaned node is one of the natural possibilities, because dynamic updating uses the last topology as the starting point for the next topology refresh. Static mode uses a fixed set of seed nodes - core nodes that are known not to be reconfigured soon. This way the static nodes always point to a healthy cluster topology and won't lead to orphaned nodes.
  2. Lettuce queries multiple nodes to obtain the cluster topology, but it always uses the first result from the discovery. In an unfortunate case the first node can be the one that got orphaned, so it returns only itself and not the whole cluster. I did some experiments using consensus to agree on topology views that are shared by the majority of nodes, but that code is not part of the driver.

Please retry your case by using static topology refresh sources, see https://github.com/mp911de/lettuce/wiki/Client-options#dynamic-topology-refresh-sources
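For reference, a minimal sketch of that workaround based on the configuration from the report; dynamicRefreshSources(false) switches to static refresh sources, so topology refreshes always start from the seed RedisURIs:

ClusterTopologyRefreshOptions topologyRefreshOptions =
    ClusterTopologyRefreshOptions.builder()
        .enablePeriodicRefresh(10, TimeUnit.MINUTES)
        .enableAdaptiveRefreshTrigger(
            RefreshTrigger.MOVED_REDIRECT,
            RefreshTrigger.PERSISTENT_RECONNECTS)
        .dynamicRefreshSources(false) // only the seed nodes are queried for topology refreshes
        .build();

clusterClient.setOptions(ClusterClientOptions.builder()
        .topologyRefreshOptions(topologyRefreshOptions)
        .build());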

@mp911de mp911de added the type: bug label Sep 13, 2016

Spikhalskiy commented Sep 13, 2016

Yes, I see this flag; setting it to false will be my hotfix for this problem. Thanks for the quick feedback.

Still, at least two things could be done about this problem:

  1. Simple: fall back to non-dynamic discovery if we fail to discover the topology from the current state
  2. More complex and correct: track which part of the cluster contains the majority of the alive masters that we saw before the separation, and switch to a master from that majority part for topology updates. This looks like the correct approach in terms of the redis cluster specification. So, if it's not part of the driver - could I add it to my lettuce setup in some way? Maybe you could publish it as a separate module / extension?
mp911de commented Sep 13, 2016

Thanks for your feedback. I'd argue this issue requires some more thought. It's not possible to determine the operator's intent behind cluster changes from topology details alone. I agree that in most cases you want to stick with the majority of nodes, but that's not always true.

Imagine a case where you want to split a cluster into two or more parts: then you don't have any clue what to do or which cluster part you want to go with.

There are also other approaches possible:

  1. You can disable the periodic refresh and refresh the cluster topology on demand (by DevOps?)
  2. You can obtain Partitions yourself from one or more master nodes known to belong to the cluster and set them on the RedisClusterClient (see the sketch after this list).
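A rough sketch of option 2, assuming lettuce 4.x; the seed address is a placeholder, and clusterNodes() on the standalone sync API, ClusterPartitionParser.parse() and RedisClusterClient.setPartitions() are the API pieces assumed here:

// Connect directly to a node that is known to belong to the cluster, read CLUSTER NODES,
// parse it into Partitions and hand the result to the cluster client.
RedisClient seedClient = RedisClient.create(RedisURI.create("host-known-to-be-in-cluster", 9000));
StatefulRedisConnection<String, String> nodeConnection = seedClient.connect();
try {
    Partitions partitions = ClusterPartitionParser.parse(nodeConnection.sync().clusterNodes());
    clusterClient.setPartitions(partitions);
} finally {
    nodeConnection.close();
    seedClient.shutdown();
}

// For option 1, periodic refresh can be disabled via ClusterTopologyRefreshOptions and a
// refresh triggered on demand, e.g. via clusterClient.reloadPartitions().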

I fear there is no one-fits-all solution, but maybe a one-fits-most. I'll give a Strategy API a spin. That API could be used to decide which topology is effectively used, and users could hook into it to customize the behavior. I'll also evaluate what a good approach for most use cases could be, because getting stuck on an orphaned node is not cool.

Spikhalskiy commented Sep 13, 2016

Imagine a case where you want to split a cluster into two or more parts: then you don't have any clue what to do or which cluster part you want to go with.

If a cluster splits into two separate parts, the half that has fewer than half of the masters is dead by spec. So we could safely select the part that has more than half of the alive masters from the last "snapshot".

And we could still always fall back to the start-node setup if we simply can't update from the current node set and get something like "no nodes for the slot <12345>". That's better than being effectively dead, as in my case where the newly "discovered" cluster covers 0 slots.

As an easy solution, it might be worth considering reworking the dynamicRefreshSources flag into a chain of strategies for flexibility, with a default setup of DYNAMIC_SOURCES, STATIC_SOURCES: dynamic sources would be used if possible, and static sources if the DYNAMIC_SOURCES strategy fails.

mp911de added a commit that referenced this issue Sep 16, 2016

Add support for topology consensus #355
Previously, cluster topology refreshing could get stuck on a node that was previously discovered but removed from the cluster. This was possible because multiple views were obtained and an arbitrary topology view was chosen.

Lettuce now implements two consensus algorithms: Healthy Majority and Known Majority. Healthy Majority is applied on the very first topology retrieval, Known Majority for all subsequent topology refreshes.

Healthy Majority votes for the topology view containing the most nodes with healthy flags (without FAIL/PFAIL/NOADDR flags), so that the healthiest view is used. Known Majority selects the topology view that contains the most previously known nodes. This consensus works for adding and removing nodes one-by-one or even multiple nodes at once. If a cluster is split into even partitions, the client can still get stuck on either side, but that issue can be solved by disabling dynamic refresh sources and specifying stable cluster seed nodes.
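For illustration only (this is not the actual lettuce code), the Known Majority selection boils down to counting, for each candidate view, how many of its nodes were already known from the previous topology and picking the view with the highest count:

// Simplified, hypothetical sketch of the Known Majority idea using lettuce 4.x types:
// prefer the topology view that contains the most node IDs we already knew.
static Partitions knownMajority(List<Partitions> candidateViews, Set<String> previouslyKnownNodeIds) {
    Partitions best = null;
    long bestCount = -1;
    for (Partitions view : candidateViews) {
        long count = 0;
        for (RedisClusterNode node : view) {
            if (previouslyKnownNodeIds.contains(node.getNodeId())) {
                count++;
            }
        }
        if (count > bestCount) {
            bestCount = count;
            best = view;
        }
    }
    return best;
}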

@mp911de mp911de added this to the Lettuce 4.3.0 milestone Sep 16, 2016

mp911de commented Sep 16, 2016

Hi @Spikhalskiy, I implemented a consensus API that follows your proposal to stick with the topology view containing the most previously known nodes. Maybe you want to give it a try.

mp911de added a commit that referenced this issue Sep 22, 2016

Add support for topology consensus #355

@mp911de mp911de closed this Sep 22, 2016
