Parallel scan on all shards in the cluster #2799

Closed
vbabenkoru opened this issue Mar 22, 2024 · 3 comments
Labels
status: declined A suggestion or change that we dont feel we should currently apply

Comments


vbabenkoru commented Mar 22, 2024

Feature Request

We're using RedisAdvancedClusterAsyncCommands to execute SCAN on the whole cluster, but it executes on one node at a time. Is there a way to execute SCAN on multiple nodes in parallel?

Is your feature request related to a problem? Please describe

We want to perform maintenance on data (migrations, fixing data, etc.) on a cluster of 25+ shards. It takes a really long time to execute, since scans are performed on one shard at a time.

Describe the solution you'd like

Have an API to split scans across nodes. Maybe a way to get multiple initial state cursors, one for each shard, to be able to pass them to SCAN in parallel.

Describe alternatives you've considered

Not using RedisAdvancedClusterAsyncCommands or cluster connections at all, and instead connecting to each node in non-cluster mode. This is way too complicated at the app level and is better handled/abstracted by the library, imo.
ClusterScanCursor is internal and cannot be accessed by apps, so it's not possible to manually specify nodes in the initial cursor either.

Teachability, Documentation, Adoption, Migration Strategy

See the solution above.

Collaborator

mp911de commented Apr 2, 2024

The advanced Cluster API emulates behavior known from single-node Redis. SCAN is an iterative command that relies on the cursor returned by the previous call, and therefore we scan cluster nodes sequentially. Introducing parallelism would add a lot of complexity on our side, and that isn't something we want to maintain. It would also contradict the emulation principle, and it would introduce a timing dependency on the slowest node and a failure dependency if one out of many requests fails.

I suggest that you implement that kind of behavior in your application that is tailored to your performance and fail-safety needs.

mp911de closed this as not planned on Apr 2, 2024
mp911de added the status: declined label on Apr 2, 2024
Author

vbabenkoru commented Apr 12, 2024

@mp911de I understand that it may not make sense for this to be built-in behavior. I'm looking for any way to do this at all (on the app side). Is the only way to basically create connections to individual nodes as singletons and manage cluster discovery manually, or to use a different library? Is there any way to use the cluster logic of Lettuce, but get raw connections to individual master nodes?

Collaborator

mp911de commented Apr 12, 2024

There is: you can obtain node connections from StatefulRedisClusterConnection.getConnection(…). I suggest using ScanStream with the reactive APIs so you can apply scatter/gather across multiple cluster nodes and merge the individual streams rather easily (at least more simply than with a Future-based approach).
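
For readers who want to see the shape of the scatter/gather idea, here is a minimal, dependency-free sketch. It is not Lettuce code: `NodeScanner`, `ScanPage`, `scanNode`, `scanAll`, and `fakeNode` are hypothetical names invented for illustration, and the sketch uses plain `CompletableFuture` rather than the reactive ScanStream approach recommended above, purely so it runs with the JDK alone. In a real application each `NodeScanner` would presumably be backed by a per-node connection obtained from the cluster connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelNodeScan {

    /** One page of SCAN output: the keys found and the cursor for the next call. */
    static final class ScanPage {
        final String nextCursor;
        final List<String> keys;

        ScanPage(String nextCursor, List<String> keys) {
            this.nextCursor = nextCursor;
            this.keys = keys;
        }
    }

    /** Hypothetical stand-in for "run SCAN <cursor> on one specific node". */
    interface NodeScanner {
        ScanPage scan(String cursor);
    }

    /** Drains a single node. Each call needs the cursor from the previous call,
     *  so this loop is inherently sequential within one node. */
    static List<String> scanNode(NodeScanner node) {
        List<String> keys = new ArrayList<>();
        String cursor = "0";
        do {
            ScanPage page = node.scan(cursor);
            keys.addAll(page.keys);
            cursor = page.nextCursor;
        } while (!"0".equals(cursor)); // cursor "0" marks the end of the iteration
        return keys;
    }

    /** Scatter: one task per node. Gather: join the tasks and concatenate results. */
    static List<String> scanAll(List<NodeScanner> nodes, ExecutorService pool) {
        List<CompletableFuture<List<String>>> futures = new ArrayList<>();
        for (NodeScanner node : nodes) {
            futures.add(CompletableFuture.supplyAsync(() -> scanNode(node), pool));
        }
        List<String> all = new ArrayList<>();
        for (CompletableFuture<List<String>> future : futures) {
            // join() throws CompletionException if that node's scan failed,
            // so one failing node fails the whole gather in this sketch.
            all.addAll(future.join());
        }
        return all;
    }

    /** In-memory fake node that pages through a fixed key list, two keys per call. */
    static NodeScanner fakeNode(List<String> keys) {
        return cursor -> {
            int from = Integer.parseInt(cursor);
            int to = Math.min(from + 2, keys.size());
            String next = to >= keys.size() ? "0" : Integer.toString(to);
            return new ScanPage(next, keys.subList(from, to));
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<NodeScanner> nodes = List.of(
                    fakeNode(List.of("a", "b", "c")),
                    fakeNode(List.of("d", "e")),
                    fakeNode(List.of()));
            System.out.println(scanAll(nodes, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch also makes the trade-offs from the earlier comment visible: total time is bounded by the slowest node, and a single failed node fails the combined result unless the application handles it differently. Note that real SCAN may return the same key more than once, so callers that need uniqueness have to deduplicate.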
