Parallel scan on all shards in the cluster #2799

vbabenkoru · 2024-03-22T21:21:39Z

Feature Request

We're using RedisAdvancedClusterAsyncCommands to execute SCAN on the whole cluster, but it executes on one node at a time. Is there a way to execute SCAN on multiple nodes in parallel?

Is your feature request related to a problem? Please describe

We want to perform maintenance on data (migrations, fixing data, etc.) on a cluster of 25+ shards. It takes a really long time to execute, since scans are performed on one shard at a time.

Describe the solution you'd like

Have an API to split scans across nodes. Maybe a way to get multiple initial state cursors, one for each shard, to be able to pass them to SCAN in parallel.

Describe alternatives you've considered

Not using RedisAdvancedClusterAsyncCommands and not using cluster connections and connecting to each node in non-cluster mode. This is way too complicated from app level, and is better handled/abstracted by the library imo.
ClusterScanCursor is internal and cannot be accessed by apps, so it's not possible to manually specify nodes in the initial cursor either.

Teachability, Documentation, Adoption, Migration Strategy

See the solution above.

The text was updated successfully, but these errors were encountered:

mp911de · 2024-04-02T06:59:22Z

The advanced Cluster API emulates behavior known from single-node Redis. SCAN is an iterative command that relies on a previous cursor and therefore we scan cluster nodes sequentially. Introducing parallelism would add a lot of complexity on our side and that isn't something we want to maintain. Also, it would contradict with the emulation principle, it would introduce a timing dependency on the slowest node and a failure dependency if one out of many requests would fail.

I suggest that you implement that kind of behavior in your application that is tailored to your performance and fail-safety needs.

vbabenkoru · 2024-04-12T02:04:59Z

@mp911de I understand that it may not make sense for this to be a built-in behavior, I'm looking for any way to do this at all (on the app side). Is the only way to basically manually create connections to individual nodes as singletons and manage cluster discovery manually, or use a different library? Is there any way to use the cluster logic of Lettuce, but get raw connections to individual master nodes?

mp911de · 2024-04-12T08:43:51Z

There is, you can obtain node connections from StatefulRedisClusterConnection.getConnection(…). I suggest using ScanStream using reactive API's so you can apply scatter/gather on multiple cluster nodes and merge the individual streams rather easily (at least simpler than a Future-based approach).

mp911de closed this as not planned Won't fix, can't repro, duplicate, stale Apr 2, 2024

mp911de added the status: declined A suggestion or change that we dont feel we should currently apply label Apr 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parallel scan on all shards in the cluster #2799

Parallel scan on all shards in the cluster #2799

vbabenkoru commented Mar 22, 2024 •

edited

Loading

mp911de commented Apr 2, 2024

vbabenkoru commented Apr 12, 2024 •

edited

Loading

mp911de commented Apr 12, 2024

Parallel scan on all shards in the cluster #2799

Parallel scan on all shards in the cluster #2799

Comments

vbabenkoru commented Mar 22, 2024 • edited Loading

Feature Request

Is your feature request related to a problem? Please describe

Describe the solution you'd like

Describe alternatives you've considered

Teachability, Documentation, Adoption, Migration Strategy

mp911de commented Apr 2, 2024

vbabenkoru commented Apr 12, 2024 • edited Loading

mp911de commented Apr 12, 2024

vbabenkoru commented Mar 22, 2024 •

edited

Loading

vbabenkoru commented Apr 12, 2024 •

edited

Loading