Lettuce can't update cluster structure correctly in provided case #355
I have a cluster with 3 nodes. I configured the client like this:
```java
Iterable<RedisURI> redisURIs = ... // collection of RedisURI for all redis master/slave processes in the cluster
clusterClient = RedisClusterClient.create(res, redisURIs);

ClusterTopologyRefreshOptions topologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
        .enablePeriodicRefresh(10, TimeUnit.MINUTES)
        .enableAdaptiveRefreshTrigger(
                RefreshTrigger.MOVED_REDIRECT,
                RefreshTrigger.PERSISTENT_RECONNECTS)
        .adaptiveRefreshTriggersTimeout(30, TimeUnit.SECONDS)
        .build();

ClusterClientOptions options = ClusterClientOptions.builder()
        .maxRedirects(maxAttempts)
        .topologyRefreshOptions(topologyRefreshOptions)
        .build();

clusterClient.setOptions(options);
```
The client worked as expected until this sequence of cluster modifications:
This seems like a reasonable cluster modification that Lettuce should handle without any problems; the total count of available masters in the main cluster part was always greater than half of all masters.
Instead, Lettuce fails with:
Where 10.201.12.110 is the new node that I added and deleted during this experiment. After step 3 above, this is how the node and the process on port 9000 looked:
Lettuce seems to have all the necessary information about the nodes provided via RedisClusterClient.create(). If no nodes passed to RedisClusterClient.create() are contained in the refreshed topology, the client is left without any usable refresh sources.
Lettuce version: 4.2.2.Final
Thanks for the bug report. That's a great description.
The issue has two causes:
Please retry your case using static topology refresh sources; see https://github.com/mp911de/lettuce/wiki/Client-options#dynamic-topology-refresh-sources
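For example (a minimal sketch assuming the same client setup as above; `dynamicRefreshSources(false)` restricts topology refresh queries to the seed nodes passed at client creation instead of all discovered nodes):

```java
ClusterTopologyRefreshOptions topologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
        .enablePeriodicRefresh(10, TimeUnit.MINUTES)
        .enableAdaptiveRefreshTrigger(
                RefreshTrigger.MOVED_REDIRECT,
                RefreshTrigger.PERSISTENT_RECONNECTS)
        .adaptiveRefreshTriggersTimeout(30, TimeUnit.SECONDS)
        // Query only the initial seed nodes during topology refresh,
        // never nodes discovered from a (possibly stale) topology view.
        .dynamicRefreshSources(false)
        .build();
```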
Yes, I see this flag, and setting it to false would be my hotfix for this problem. Thanks for the quick feedback.
Still, at least two things could be done about this problem:
Thanks for your feedback. I'd argue this issue requires more thought. It's not possible to determine the operator's intent behind cluster changes from the topology details alone. I agree that in most cases you want to stick with the majority of nodes, but that's not always true.
Imagine a case where you want to split a cluster into two or more parts: then you have no clue what to do or which cluster part to go with.
Other approaches are also possible:
I fear there is no one-size-fits-all solution, but maybe a one-size-fits-most. I'll give a Strategy API a spin. This API could be used to decide which topology is effectively used, and users could hook into it to customize the behavior. I'll also evaluate which approach fits most use cases, because getting stuck with an orphaned node is not cool.
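A hypothetical sketch of what such a Strategy API could look like (the interface name `TopologySelector` and its signature are illustrative assumptions, not actual Lettuce API; `Partitions` is Lettuce's existing topology model):

```java
import com.lambdaworks.redis.cluster.models.partitions.Partitions;

/**
 * Hypothetical hook deciding which of the topology views obtained from the
 * refresh sources becomes the effective one. Users could supply their own
 * implementation to customize the behavior.
 */
public interface TopologySelector {

    /**
     * @param current    the topology currently in use (the last "snapshot")
     * @param candidates topology views obtained during the refresh
     * @return the topology that should be used from now on
     */
    Partitions select(Partitions current, Iterable<Partitions> candidates);
}
```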
If a cluster splits into two separate parts, the half that has fewer than half of the masters is dead by spec. So we could safely select the part that keeps more than half of the masters from the last "snapshot" alive.
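A rough sketch of that majority rule, building on the hypothetical `TopologySelector` above (illustrative only; `Partitions`, `RedisClusterNode`, and `NodeFlag.MASTER` do exist in Lettuce 4.x):

```java
import java.util.HashSet;
import java.util.Set;

import com.lambdaworks.redis.cluster.models.partitions.Partitions;
import com.lambdaworks.redis.cluster.models.partitions.RedisClusterNode;

/**
 * Illustrative majority rule: accept a candidate topology only if it still
 * contains more than half of the masters known from the last snapshot.
 */
public class MajorityTopologySelector implements TopologySelector {

    @Override
    public Partitions select(Partitions current, Iterable<Partitions> candidates) {
        Set<String> knownMasters = masterIds(current);
        for (Partitions candidate : candidates) {
            Set<String> retained = masterIds(candidate);
            retained.retainAll(knownMasters);
            // A split holding half of the masters or fewer is dead by spec,
            // so only a majority partition is trustworthy.
            if (retained.size() * 2 > knownMasters.size()) {
                return candidate;
            }
        }
        return current; // no trustworthy candidate; keep the last snapshot
    }

    private static Set<String> masterIds(Partitions partitions) {
        Set<String> ids = new HashSet<>();
        for (RedisClusterNode node : partitions) {
            if (node.getFlags().contains(RedisClusterNode.NodeFlag.MASTER)) {
                ids.add(node.getNodeId());
            }
        }
        return ids;
    }
}
```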
And we could still always fall back to the initial seed-node setup if we just can't refresh from the current node set and get something like "no nodes for the slot <12345>". That's better than being effectively dead, as in my case, where the newly "discovered" cluster covered 0 slots.
As an easy solution, maybe it's possible to consider reworking the dynamicRefreshSources flag into a chain of strategies for more flexibility, with a sensible default setup; a sketch of such a chain follows.
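One way such a chain could look, reusing the hypothetical types above (illustrative only; a real implementation would live inside the topology refresh machinery):

```java
import java.util.Arrays;
import java.util.List;

import com.lambdaworks.redis.cluster.models.partitions.Partitions;
import com.lambdaworks.redis.cluster.models.partitions.RedisClusterNode;

/**
 * Illustrative chain of strategies: ask each selector in turn and take the
 * first result that covers at least one slot; otherwise keep the current
 * view, which is still better than a "discovered" cluster with 0 slots.
 */
public class TopologySelectorChain implements TopologySelector {

    private final List<TopologySelector> selectors;

    public TopologySelectorChain(TopologySelector... selectors) {
        this.selectors = Arrays.asList(selectors);
    }

    @Override
    public Partitions select(Partitions current, Iterable<Partitions> candidates) {
        for (TopologySelector selector : selectors) {
            Partitions chosen = selector.select(current, candidates);
            if (coversAnySlot(chosen)) {
                return chosen;
            }
        }
        return current; // all strategies failed; stick with the last snapshot
    }

    private static boolean coversAnySlot(Partitions partitions) {
        for (RedisClusterNode node : partitions) {
            if (!node.getSlots().isEmpty()) {
                return true;
            }
        }
        return false;
    }
}
```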