[BUG] CLUSTER INFO and CLUSTER NODES output unreliable for determining cluster usability #11023

Open
ianling opened this issue Jul 21, 2022 · 1 comment

ianling commented Jul 21, 2022

Describe the bug

On a newly created cluster, CLUSTER INFO and CLUSTER NODES can report that the cluster is up, ok, connected, and ready to use before it actually is.

This can lead to CLUSTERDOWN errors when trying to actually get/set keys in a cluster node, even though CLUSTER INFO and CLUSTER NODES indicate that the cluster is usable.

These errors stop if you simply wait 3-5 seconds after creating the cluster, but there doesn't seem to be a reliable way to determine how long to wait, or whether the cluster is usable at all.

To reproduce

  1. Spin up a cluster using a script (I used several Redis nodes running in Docker)
  2. Immediately after running the redis-cli --cluster create ... command, poll CLUSTER INFO and/or CLUSTER NODES in a loop until they return the desired output (cluster_state:ok, and every node listed as connected, respectively)
  3. Immediately try to get/set keys in the cluster

Some keys work, while others return a CLUSTERDOWN error. This behavior persists for anywhere from 1 to 5 seconds, until the cluster is truly ready to be used. After that, it works perfectly fine.
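For reference, the CLUSTER NODES check from step 2 could look roughly like the sketch below. It assumes the raw text reply of CLUSTER NODES has already been fetched as a string. The function names `nodeLooksHealthy` and `allNodesHealthy` are hypothetical and are not part of go-redis. Passing this check on one node is exactly the condition that still produced CLUSTERDOWN errors in this report, so it is a necessary test, not a sufficient one.

```go
package main

import "strings"

// nodeLooksHealthy reports whether one line of CLUSTER NODES output describes
// a node with a connected link and no failure or handshake flags.
// Line layout: id addr flags master ping-sent pong-recv config-epoch link-state [slots...]
func nodeLooksHealthy(line string) bool {
	fields := strings.Fields(line)
	if len(fields) < 8 {
		return false // malformed or truncated line
	}
	for _, flag := range strings.Split(fields[2], ",") {
		switch flag {
		case "fail", "fail?", "handshake", "noaddr":
			return false
		}
	}
	return fields[7] == "connected"
}

// allNodesHealthy checks every non-empty line of the reply and also requires
// that the node being asked already knows about wantNodes nodes.
func allNodesHealthy(clusterNodes string, wantNodes int) bool {
	seen := 0
	for _, line := range strings.Split(clusterNodes, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		if !nodeLooksHealthy(line) {
			return false
		}
		seen++
	}
	return seen == wantNodes
}
```

Requiring an expected node count guards against a node that has only met part of the cluster so far and therefore lists fewer lines, all of them connected.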

Expected behavior

Using the cluster (e.g. getting/setting keys) works every time.

Additional information

I am using the go-redis client and Redis 6.2.5 via Docker.

Is there a reliable way to determine if the cluster is usable that I'm missing? The behavior isn't consistent and there isn't a set number of times that getting/setting a key will fail before the CLUSTERDOWN errors stop occurring.

ianling commented Jul 21, 2022

It appears that you must connect to each node in the cluster and run CLUSTER INFO on each one, because the output can differ between nodes. In other words, one node might report that the cluster is OK while another does not; their views aren't synchronized.
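A minimal sketch of that per-node check follows. It assumes the caller has already collected the raw CLUSTER INFO reply from every node into a map keyed by node address. The helper names and the map layout are hypothetical. Checking `cluster_slots_assigned`, `cluster_slots_ok`, and `cluster_known_nodes` in addition to `cluster_state` is a stricter test chosen for illustration, and it is not confirmed here to rule out every CLUSTERDOWN error.

```go
package main

import (
	"sort"
	"strconv"
	"strings"
)

// parseClusterInfo turns the "key:value" lines of a CLUSTER INFO reply into a map.
func parseClusterInfo(raw string) map[string]string {
	info := make(map[string]string)
	for _, line := range strings.Split(raw, "\n") {
		parts := strings.SplitN(strings.TrimSpace(line), ":", 2)
		if len(parts) == 2 {
			info[parts[0]] = parts[1]
		}
	}
	return info
}

// nodeReportsReady checks one node's own view: state ok, all 16384 hash slots
// assigned and ok, and the expected number of nodes known.
func nodeReportsReady(raw string, wantNodes int) bool {
	info := parseClusterInfo(raw)
	return info["cluster_state"] == "ok" &&
		info["cluster_slots_assigned"] == "16384" &&
		info["cluster_slots_ok"] == "16384" &&
		info["cluster_known_nodes"] == strconv.Itoa(wantNodes)
}

// notReady returns the sorted addresses of nodes whose CLUSTER INFO reply
// does not pass nodeReportsReady yet.
func notReady(infoByNode map[string]string, wantNodes int) []string {
	var pending []string
	for addr, raw := range infoByNode {
		if !nodeReportsReady(raw, wantNodes) {
			pending = append(pending, addr)
		}
	}
	sort.Strings(pending)
	return pending
}
```

Polling until `notReady` returns an empty slice treats the cluster as usable only once every node agrees, instead of trusting whichever node happened to answer first.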

If this behavior is expected, feel free to close.
