Add `ReplicationClient` #50

jgaskins · 2024-01-29T04:27:31Z

ReplicationClient sends read-only commands to replicas by default and all other commands to the primary/master. You can override that routing by calling on_primary or on_replica. This is useful when this shard doesn't know about the command you're using, for example because it's not yet implemented (issues/PRs welcome!) or because it comes from a custom Redis module.

```crystal
redis = Redis::ReplicationClient.new

redis.on_primary do |primary|
  primary.run({"your", "command", "goes", "here"})
end

redis.on_replica do |replica|
  replica.run({"your", "command", "goes", "here"})
end
```

If you want some commands to automatically be routed to replicas, you can add them to the set of Redis::READ_ONLY_COMMANDS. Add this to your app's Redis configuration file:

```crystal
Redis::READ_ONLY_COMMANDS << "mymodule.ro_cmd1" << "mymodule.ro_cmd2"
```

Then any time you use those commands (even when passing arbitrary commands to ReplicationClient#run), they'll automatically be routed to replicas. This constant was extracted from Cluster so both classes could benefit from it. It was also expanded to contain all of the read-only commands that redis-stack-server knows about, which should remove the need for lines like require "redis/cluster/json" in your app.
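To make the routing rule concrete, here is a minimal, self-contained sketch of the decision described above. It is a model, not the shard's code: READ_ONLY is a hypothetical stand-in for Redis::READ_ONLY_COMMANDS, route is a made-up helper, and the case-insensitive lookup is an assumption of this sketch.

```crystal
# Hypothetical stand-in for Redis::READ_ONLY_COMMANDS (the real set is much larger).
READ_ONLY = Set{"get", "mget", "exists"}

# Returns :replica for read-only commands and :primary for everything else.
# Assumption: command names are matched case-insensitively.
def route(command : Enumerable(String)) : Symbol
  name = command.first.downcase
  READ_ONLY.includes?(name) ? :replica : :primary
end

# Registering a custom module command changes where it is routed.
READ_ONLY << "mymodule.ro_cmd1"

p route({"GET", "foo"})              # read-only, goes to a replica
p route({"SET", "foo", "bar"})       # write, goes to the primary
p route({"mymodule.ro_cmd1", "foo"}) # newly registered, goes to a replica
```

Anything not in the set falls through to the primary, which is the safe default for a command the client doesn't recognize.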

How is `ReplicationClient` different from `Cluster`?

Redis replication is a distinct concept from Redis's "cluster mode" so Redis::ReplicationClient has to be a separate concept from Redis::Cluster.

Cluster mode shards your data across multiple primaries and has replicas follow those primaries. There are restrictions on which primaries can operate on which keys, since you can only operate on a key that exists on the node you sent the command to. That also means that atomic multi-key operations (like RPOPLPUSH or BLPOP against multiple lists) require that all specified keys reside on the same shard.

Replication, conversely, involves no sharding. It just replicates all of your data in the primary to the replicas. This means there are no limitations imposed by Redis on what operations you can perform on which keys on the primary.

The way cluster mode and replication are exposed to clients is also very different: INFO CLUSTER vs INFO REPLICATION. We could probably paper over some of that, but Redis::Cluster defines some methods specifically because the keys aren't all colocated on the same node. For example, Cluster#pipeline requires a key : String argument that Connection does not, because the cluster has to know which node you're going to run the pipeline on. That argument doesn't make sense for plain-old replication. If you want to run a pipeline with ReplicationClient, you call redis.on_primary &.pipeline { |pipe| ... }.

Why `ReplicationClient`?

Yeah, I don't love the name. I really like how succinct the name Cluster is, but Replica doesn't make sense here because it also talks to the primary.

Depending on how we move forward with this, though, Cluster could end up using ReplicationClient under the hood to handle the split between primaries and replicas. Currently it implements its own based on the data returned from INFO CLUSTER.

Chained replication is not (yet?) supported

If C is a REPLICAOF B which is a REPLICAOF A, ReplicationClient will send write commands to A and read-only commands to B, but will ignore C completely.

It might be a good idea not to support chained replication, depending on why it's set up that way. For example, if C is used for long-running commands (think analytical vs transactional queries), you may not want your app sending queries to it expecting them to be fast.

Replication topology discovery

You can point this class at any Redis node in your setup (I keep having to stop myself from saying "cluster") and it will discover which is the primary and which are the replicas.
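For a feel of what discovery has to work with, here is a hedged sketch that extracts replica addresses from the text a primary returns for INFO REPLICATION. This is not necessarily how the PR implements discovery; Node, replicas, and SAMPLE_INFO are names made up for illustration, and the sample text is hand-written in the standard role:/slaveN: layout.

```crystal
# Hand-written sample of what a primary with two replicas reports.
SAMPLE_INFO = <<-TEXT
  # Replication
  role:master
  connected_slaves:2
  slave0:ip=10.0.0.2,port=6379,state=online,offset=1024,lag=0
  slave1:ip=10.0.0.3,port=6379,state=online,offset=1024,lag=1
  TEXT

# Hypothetical value type for a discovered node.
record Node, host : String, port : Int32

# Collects the replicas listed in the "slaveN:" lines of INFO REPLICATION output.
def replicas(info : String) : Array(Node)
  nodes = [] of Node
  info.each_line do |line|
    match = line.match(/\Aslave\d+:(.+)\z/)
    next unless match
    # Turn "ip=...,port=...,state=..." into a Hash of field name to value.
    fields = match[1].split(',').to_h do |pair|
      key, value = pair.split('=', 2)
      {key, value}
    end
    nodes << Node.new(fields["ip"], fields["port"].to_i)
  end
  nodes
end

replicas(SAMPLE_INFO).each { |node| puts "#{node.host}:#{node.port}" }
```

A primary only lists its direct replicas here, so a discovery pass that reads just the primary's output would never see node C from the chained-replication example.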

Additionally, changes in the replication topology (replicas added, removed, or primary failover) will automatically be updated in ReplicationClient. Currently, this is implemented by throwing away the previous connection pools, but this could be improved to make smaller changes.

Unfortunately, in the current implementation, taking a replica offline isn't handled gracefully. You'll still get errors when the client tries to talk to that replica, but those should disappear after the next periodic topology scan (the scan interval defaults to 10 seconds).

No replication required

You can even use this if you aren't doing replication at all — it will just send all commands to the one node. If you add replicas later, it will automatically pick them up during the next topology scan and begin routing read-only commands to the replicas.
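The fallback can be sketched in a few lines. This is an illustration under stated assumptions rather than the PR's code: pick_node is a hypothetical helper, nodes are represented as plain address strings, and choosing a replica at random is an assumption of this sketch.

```crystal
# Hypothetical helper: picks the address a command should be sent to.
def pick_node(read_only : Bool, primary : String, replicas : Array(String)) : String
  # Writes always go to the primary; so do reads when no replicas are known.
  return primary if !read_only || replicas.empty?
  replicas.sample
end

# With no replicas, even read-only commands go to the single node.
puts pick_node(true, "10.0.0.1:6379", [] of String)
# Writes go to the primary regardless of how many replicas exist.
puts pick_node(false, "10.0.0.1:6379", ["10.0.0.2:6379"])
```

Both calls print the primary's address; only a read-only command with a non-empty replica list ends up on a replica.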

This is important if you replicate to only a single node and take that node offline to upgrade it before upgrading the primary.

Real-world testing

I've been testing this on a 3-node Dragonfly DB in one of my Kubernetes clusters (they offer a Kubernetes operator that's easy to work with and the Redis operator is enterprise-only) and it's working really, really well. As noted above, I get some errors when removing replicas (chaos engineering ftw), but they go away after a few seconds.

Closes #8

jgaskins · 2024-01-29T04:31:35Z

cc @jwoertink IIRC you were using Redis replication on AWS, right?

Add ReplicationClient

9e60ee3
