Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: add Cluster method to split keys by nodes #44

Open
mna opened this issue Oct 29, 2022 · 0 comments
Open

Proposal: add Cluster method to split keys by nodes #44

mna opened this issue Oct 29, 2022 · 0 comments

Comments

@mna
Copy link
Owner

mna commented Oct 29, 2022

Description

Add a Cluster.SplitByNode(keys ...string) [][]string that takes a list of Redis keys and returns them grouped by keys that belong to the same node. It is similar to SplitBySlot, but while SplitBySlot is guaranteed to return groupings of keys that definitely belong together (as they live in the same hash slot), SplitByNode takes a more optimistic approach and returns groupings of keys that are likely to live on the same node, even if they don't necessarily share the same hash slot.

It does so by using the current known slot-to-node mapping of the Cluster (which is why the new function must be a method on the Cluster type, it needs access to its mapping). If the cluster has remained stable since the last time the mapping was updated, and remains stable until the commands for those keys are executed, then it can significantly improve performance as more keys can be grouped together using the same connection. If the cluster has not remained stable, then it will result in MOVED errors and it can be handled as per the usual mechanisms of the redisc package, either manually or automatically via a RetryConn.

It is a new API and as such could be released as part of a minor version.

Use Case

While in general care should be taken to design keys so that related data lives in the same key slot (e.g. {user:123}:foo and {user:123}:bar, so that both foo and bar data for user 123 live in the same slot), it is sometimes useful, to run a number of commands that cannot fit this restriction (e.g. an admin user may want to retrieve all users' foos). In this case, the execution of commands can be optimized with SplitByNode.

Implementation Concerns

What if at the moment this is called, not all slots are mapped to a node? It could return an error, but that would make it troublesome to use that API - the calling code would always have to prepare for the error case and have a fallback scenario (which would likely be to use SplitBySlot). Another option would be to return a random node for the slots that are not covered by a node, but that's suboptimal as it is almost sure that those keys will result in a MOVED error.

A fair option, I think, would be for SplitByNode to return the same groupings as SplitBySlot for the keys that are not covered by any node. So it always returns valid groupings of keys, the calling code is the same whether the internal mappings are complete or partial, and the groupings are as optimal as they can be - the keys with a valid mapping are properly grouped, and those without a mapping are grouped by their slot so that when executed, they will return at most a single MOVED error (for the first command) and after following that redirection all keys in that grouping will be on the right node.

If the Cluster is closed, instead of returning an error, simply call SplitBySlot instead. The "closed cluster" error will be returned when attempting to run any command on that cluster.


(note that this proposal is not a promise of implementation, nor is it a promise of merging a PR that would implement this feature - it is first and foremost for me to give it some thought and as a reminder when I have some time to work on the package, and published for visibility)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant