Augmenting Redis with multiplexing interface #12873
Comments
For responses, does this require a change in protocol? Can we use the attribute type for this? minor: I would avoid
This is important, especially if we are changing the protocol after the response. The other option I would like to posit is that we should consider listening for the multiplexing protocol on a separate port if it's very different from the RESP protocol. There seems to be space to further extend the connection interface so it can support an arbitrary number of ways to deliver commands to Redis, of which this is one.
rueidis also pipelines concurrent commands onto one connection and supports server-assisted client-side caching based on the shared connection. I believe the proposed multiplexing protocol will definitely benefit all client implementations in terms of both performance and simplicity. One thing missing from the proposal: how will this multiplexing protocol handle out-of-band messages, including pubsub, client-side caching, and others?
Thanks everyone for the great comments!
Regarding the order, every response to a client is preceded by a header defining to which client the response is related. If a response has an attribute, then this attribute will appear as part of the response after the header.
You are both right: MPXHELLO should be the first command; otherwise, there might be issues as you presented above. HELLO might be sent for each client on the multiplexed connection but not for the multiplexed connection itself (unless we want to define some inheritance mode). The issue @madolson raised should be handled; maybe we can forcefully close the connection when MPXHELLO is received after other commands? Using another port is also an option, as @madolson proposed.
An out-of-band message received for a specific client is preceded by a header denoting that the message is targeted to this client. Consider the case when many clients on the same multiplexed connection are subscribed to the same channel, for example: the message is sent many times, each copy preceded by a header denoting which client the message is targeted to. We can further extend this to the multiplexed-connection level, perhaps with several headers followed by a single message. The client side can also optimize by registering all its pub/sub subscriptions on a dedicated client and scattering the received messages itself.
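A minimal sketch of the fan-out described here, assuming a textual placeholder header (the `CLIENT <id>` header encoding and the `fan_out` helper are hypothetical, not part of the proposal):

```python
def fan_out(subscriber_ids, message: bytes) -> list:
    """Deliver one out-of-band message to every collocated subscriber on a
    multiplexed connection: the payload is repeated once per subscriber,
    each copy preceded by a header naming the target client."""
    frames = []
    for client_id in subscriber_ids:
        header = b"CLIENT %d\r\n" % client_id  # placeholder header encoding
        frames.append(header + message)
    return frames

# Two collocated clients subscribed to the same channel each get a copy.
frames = fan_out([1, 2], b"+hello\r\n")
```

Note the cost this implies: the payload is duplicated per subscriber, which is exactly what the suggested connection-level extension (several headers followed by one message) would avoid.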
I like this idea too, given it will be simple enough to implement in clients. How is a header represented? Is it sent like a separate response before the real response in the RESP flow, or are we talking about a completely new protocol here? I think RESP3 attributes would be a nice fit for tagging each response to a client ID.
It sounds like it's the head-of-line blocking problem we're trying to solve here. It's a somewhat different problem. Sending huge strings in commands is a problem too. Do we need chunked commands too? So far, the Redis approach has been to try to make each command sufficiently fast and small, like recommending SCAN instead of KEYS. For huge string keys, users can use GETRANGE and SETRANGE to get and set them in chunks, though it's not atomic. I think we should avoid reinventing the wheel to solve the head-of-line blocking problem. Maybe we could consider something like RESP-over-HTTP/2 (HTTP/2 does chunked requests and replies) or RESP-over-QUIC?
Thanks @zuiderkwast.
We can add a few bytes before each RESP response, but as long as we don't chop the response/command, we can use attributes. Not sure about inline commands, though.
You are totally right that the preferred way is to avoid using large commands or responses. However, currently, Redis reads from each client up to 16K at a time and moves on to the next client when writing more than 64K. Even if a client writes a large command or requests a large response, the reads and writes are done in chunks while providing fair serving of other clients. In this scenario, when we combine several clients on the same connection, writing a large response for one client, for example, hurts the serving of other clients on the same multiplexed connection. This produces different behavior for a regular client versus a collocated client on a multiplexed connection. Chunking also lets clients send commands in chunks and provides fairness between the clients a connection serves. What do you think?
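As a toy illustration of that fairness idea, a round-robin scheduler could emit bounded chunks per client (a sketch; `round_robin_chunks` is hypothetical, and only the 16K read-chunk figure comes from the comment above):

```python
READ_CHUNK = 16 * 1024  # Redis reads up to 16K per client per cycle (per the discussion)

def round_robin_chunks(pending, chunk=READ_CHUNK):
    """Yield (client_id, chunk) frames round-robin from per-client payloads,
    so one client's large payload cannot starve collocated clients sharing
    the multiplexed connection."""
    offsets = {cid: 0 for cid in pending}
    while offsets:
        for cid in list(offsets):
            off = offsets[cid]
            data = pending[cid][off:off + chunk]
            if not data:          # this client's payload is fully sent
                del offsets[cid]
                continue
            offsets[cid] = off + chunk
            yield cid, data
```

With a chunk size of 2, two pending payloads `{1: b"aaaaa", 2: b"bb"}` interleave as `(1, b"aa"), (2, b"bb"), (1, b"aa"), (1, b"a")`: client 2's short request is not stuck behind client 1's longer one.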
RESP-over-QUIC can be used as you suggested; however, most of the performance gain we were able to achieve in our implementation is due to the usage of a single querybuf and a single cob. The commands are processed directly from the shared querybuf, and the responses are written directly to the shared cob (we fall back to a private one when needed for fairness, blocking, etc.). Using a different stream per client will generate more system calls and also hurt locality due to the loss of the shared query and output buffers.
How would we handle authentication & CLIENT KILL commands? Can the internal connections be killed with a CLIENT KILL command? If a CLIENT KILL command kills the multiplexing connection, do all internal connections get killed, too?
This is intriguing; lots of challenges, but definitely some potential scope. If I think in terms of SE.Redis, I can see that to use this for things like blocking calls without changing the "normal" use of the library, we'd need to opt into this mode early, and have a dedicated default client, but potentially allow access to a dedicated client that is a unique sub-client to redis, i.e.

```csharp
var muxer = ... // raw connection, with some new "opt in" flag to this feature
var db = muxer.GetDatabase(); // uses default/shared context
// note that db+muxer can be shared and aggressively used concurrently

// use new mode
using (var dedicated = muxer.CreateClient(/* auth/client details? */))
{
    var foo = dedicated.SomeBlockingOperation();
    // ...
}
```

so we'd have 1 sub-client automatically just from enabling this mode, then we'd create a second sub-client in the `using` block. This could work, and it avoids issues of having lots of physical connections to manage, and having to negotiate TLS repeatedly. Things that we'd need to seek clarity on:
The head-of-line blocking issue from large payloads (in either direction): that's a tricky one; there are two ways of dealing with that that I can see:
option 2 is possible, but is a larger change for libraries; but if that is something we want to support, it should be sooner rather than later; for example, that would further support my "this should not be a normal redis command" statement from earlier; as a random protocol example (this is literally me thinking while I type, this is not well considered):
from now on, all payloads in either direction would be prefixed with the client id and payload length in bytes (in some encoding, most likely binary?), i.e. we might send
where those 1231 bytes would be processed by logical client 4123, just as though we'd read them from a socket; this allows any arbitrary fragmentation on the way back from the server, for example:
the point being that the server can now interleave payloads from other sub-clients in-between these, as it chooses - if we had created a client 5422, we might get a response for 5422 in between two parts of a single RESP fragment for 4123.

The downside to this: it is a lot more complex, and clients will need to do more routing of bytes. It isn't much more complex, though, because even for an "entire RESP responses only" scenario, clients would still need to track what is outstanding on a per-sub-client basis, routing complete RESP responses (rather than partial RESP data) to reactivate and complete those items. Of course, you might disagree entirely, and think that we can do everything just in RESP, i.e.
but ... that makes me very nervous; I've met users before, and they will find ways to screw this up. I'm thinking of somebody using an older version of a client and the omnipresent ad-hoc API to execute
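For what it's worth, the fragmentation scheme sketched above (a client-id + length prefix with arbitrary interleaving) could be demultiplexed roughly like this (an illustrative sketch; `demux` and the frame tuples are made up, not a proposed API):

```python
from collections import defaultdict

def demux(frames):
    """Reassemble per-client RESP byte streams from interleaved
    (client_id, chunk) frames. Frames for other clients may arrive
    between two chunks of one client's single RESP reply, so we route
    raw bytes rather than whole replies."""
    streams = defaultdict(bytearray)
    for client_id, chunk in frames:
        streams[client_id].extend(chunk)
    return {cid: bytes(buf) for cid, buf in streams.items()}

# Client 5422's reply arrives between two fragments of client 4123's reply.
frames = [
    (4123, b"$10\r\nhello"),   # first fragment of a bulk string for 4123
    (5422, b"+OK\r\n"),        # complete reply for 5422, interleaved
    (4123, b"world\r\n"),      # rest of 4123's bulk string
]
```

Each per-client stream then feeds an ordinary RESP parser, which is the extra routing layer the comment above is weighing against "entire RESP responses only" framing.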
Perhaps a bigger problem: I'm aware some folks use a smart proxy in front of their redis servers, and in particular in front of cluster; this is going to be hell for them. I honestly wonder whether that consideration makes it impossible. Can I quietly wave a hand in favor of the much more limited #12716; selfishly, this obtains the bits I care about without introducing huge complex "break the world" changes to the protocol itself, which will take much longer to develop, test (esp. auth concerns), deploy, and get support in proxies and libraries. The proposal here seems ambitious, but potentially too ambitious to be achievable in moderate timescales. If it is what people want: I can absolutely "do my part", but... I see lots of places that would make this trip. To echo a comment from that related post:
The smartest thing might just be for the smart proxies to use it internally to connect to the cluster, and ask the individual clients to just connect normally. The benefits outlined here don't really apply if you are putting a proxy in front of the cluster.
but the client might not know ;p
I think the scariest thing to me in the proposal is support for multiple authentication contexts on a single port. I'm a bit fuzzy on this from discussions thus far ("may" being the term), but ultimately it looks like the concept is aimed at supporting multiple users with different access multiplexed over the same port. And if anything goes wrong at any point in that multiplexing, client ID reuse, etc.: bad things happen. I'd say it's one level of bad to get commands wrong on the protocol, but much worse if those can be mixed up across authentication contexts. To Marc's point: load balancers could make this worse again with any delta/change/failover during runtime (often due to clouds patching).

IMO something like this should not support multiple authentication contexts on a port. If you try to auth with different credentials: that would just be rejected. In all of the throughput scenarios we've been approached with - it's the same authentication/user/token wanting multiple connections for throughput and blocking usages (though these are covered by #12716 much more simply). If something like this proceeds, I'd push back against mixed auth/ACL contexts on a single port - there are lots of ways that can go wrong, and unless there's a critical need for such a case, the risk likely isn't worth it.

Put another way: a lot of the issues raised for justifying multiplexing are totally on point: I agree, and that's why we do it in SE.Redis on existing RESP, but how many of those cases need multiple ACLs? I'd love to understand if there's a suite of use cases we just haven't seen yet. I know we're in the same boat, but I echo the sentiment that this is much more complex than RESP2->RESP3 was, and riskier. It's hard for me to currently imagine that risk is justified when we can get most of the gains and practical issue resolutions with less complexity.
More protocols means more complexity (in every client) and we can't easily drop old ones - we have to support older servers for quite some time because, for better or worse, Redis was awesome going back a long way and some users are on quite ancient versions. IMO, a new protocol is a very high bar and needs to offer sufficiently high gains - I'm not sure we actually have that here vs. having a path for blocking commands while allowing multiplexing on the existing protocol to eliminate head-of-line blocking. Admittedly, I'm biased because SE.Redis already does multiplexing that works pretty well on RESP2/3, so the only gains to be had are blocking paths - if we solve that, every client that chooses to could multiplex.
I did lots more thinking on this over the holidays; my increasing thought is that this is a hugely disruptive change. Disruptive isn't by itself bad or good; IMO we need to get a very clear view on what the intended impact is for this, both negative and positive, to see how the disruption stacks against the benefits. At the technical level, this seems directly comparable to the HTTP/1.1 to HTTP/2 migration.

Personally - and I emphasize that this is just a point-in-time subjective opinion and is open to revision - I think this feels like a bad direction, but I'd love to start mapping out the impact stuff more formally. I can't really list all the areas yet, let alone qualify them as pros/cons - a lot of them would need technical specs to see whether they're affected (positively or negatively); for example:
I would be very keen to get a better view of the intended usage scenarios, to assess whether this level of impact is warranted i.e. gets a suitable payoff; in particular, a lot of things can already be multiplexed if a client wants to, without any protocol changes; the things that need special attention without changing anything:
(source: that's exactly what SE.Redis already does)

So I guess what I'm looking for is the killer answer to:
Possible answers off the top of my head:
So of that, the only one that is a "big win with few drawbacks" from my perspective is the blocking operations support; and don't get me wrong: you know I want a solution to that, but: I wonder whether this proposal is massively over-complicating things, and maybe there are simpler ways of skinning that particular cat.
Thanks for your super-detailed response! The motivation for this change is to provide a general interface that delivers pipeline performance while allowing the use of regular semantics, as if a dedicated connection were used for each client. This holds for current Redis commands as well as for any future command added to Redis with new semantics and requirements. Our implementation just adds 6 bytes of metadata before each command (response or chunk of data). The modifications to Redis are not small (though not huge either); however, most of the complications arise due to the fallback required when using a single querybuf and a single cob to serve each multiplexed connection. Regarding different users having different ACLs on the same connection: do you really see this as a concern, even though all requests arrive from the same client machine? I agree that data can be mixed due to a bug in Redis, but the same could be true if a bug exists on the client side. What do you think?
Absolutely! For the simple fact that multi-tenant applications exist. Related: multiple processes on a single machine running as different security contexts/users (for minimal permissions). And from experience: both are super common. This includes very sensitive authentication providers we deal with.

There's no inherent security relationship with "1 per client machine", but there are plenty of them associated with particular sockets. Connections are allowed or denied at the firewall level on the initial connection, for example - same for HTTP and pretty much every connection that exists. There are so many assumptions built up over the years that a security context is tied to a connection (ip:port <-> ip:port). If Redis had started with this and built up over time, there'd be far less of a concern here, but this is a major change to make at this point, with much more risk.

As you said here, a bug on either side makes this a dangerous game to play, and we're talking about something that zero clients in the ecosystem handle today either. We should assume there will be issues here, and adding security escalation concerns on top of that is not awesome. You could argue that once this is much more mature, then it could allow different ACL contexts - that'd add some maturity safeguards at least. But that's just 1 aspect here - will let Marc chime in on a lot of the others above.
I'm going to leave ACL topics to Nick, since he has opinions, and stick to protocols :)

It may help to actually discuss some kind of proposed implementation at the byte level, so we don't talk past each other; if you're adding 6 bytes, I'm guessing that is just a session identifier (client-id?) before all/some payloads, without additional sub-message framing (i.e. the following message is exactly an entire RESP message) - which means that the head-of-line-blocking issue still applies; that does mean that the implementation is much simpler (no partial buffers to stash and manage), but: we probably need to be very clear on that so we can correctly discuss what the proposal does/does-not address.

My first concern is that this is a protocol break; a major "reset the world" change in terms of library support; that's not inherently bad, but the collective "we" are only just finishing RESP3, which has dragged - although I will allow that this topic has some interesting technical advantages for my own selfish needs.

My second concern is that this protocol break could have weird and unexpected outcomes (which makes me think "unexpected exploit") if it is possible to get any scenario where the client/server haven't properly agreed that they both understand this change in advance, and there is any even remote ambiguity. This could be mitigated as simply as a new RESP primitive; for example, if the session-id is conveyed as a redis integer, I would strongly suggest against that.

There's also some discussion like "does this token get sent on every message, or only when the message switches context?"
This is just one way to do it, but we used 6 bytes (no RESP) that have two bytes for the client ID and four for the size of the incoming message. For control commands, such as creating clients, we use client ID 0 to denote a control command, followed by 2 bytes for the opcode (e.g., the create-client opcode) and 2 bytes for the ID (as the requested client ID). The client IDs are unique and private per multiplexed (mpx) connection but not unique across different mpx connections. There is an internal mapping from mpx connection + client ID to the client object. Regarding head-of-line blocking, both Redis and the client can chop the command/response into chunks. We are still left with packet loss, since it is TCP in the end.
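A byte-level sketch of the 6-byte header just described might look like the following (field order, endianness, and the exact control-frame layout are my assumptions; only the 2-byte client ID + 4-byte length split and the client-ID-0 control convention come from the comment):

```python
import struct

# Assumed layout: network byte order, uint16 client_id, uint32 payload length.
HEADER = struct.Struct("!HI")
CONTROL_CLIENT_ID = 0  # client ID 0 marks a control command

def frame(client_id: int, payload: bytes) -> bytes:
    """Prefix a RESP payload (or a chunk of one) with the 6-byte header."""
    return HEADER.pack(client_id, len(payload)) + payload

def control_frame(opcode: int, requested_id: int) -> bytes:
    """Control frame: client ID 0, then a 2-byte opcode (e.g. create-client)
    and the 2-byte requested client ID, for 6 bytes total."""
    return struct.pack("!HHH", CONTROL_CLIENT_ID, opcode, requested_id)

def parse(buf: bytes):
    """Split one framed message off the front of buf.
    Returns (client_id, payload, remaining_bytes)."""
    client_id, length = HEADER.unpack_from(buf)
    end = HEADER.size + length
    return client_id, buf[HEADER.size:end], buf[end:]
```

Because the length field covers a chunk rather than necessarily a whole command, both sides can split large payloads across several frames, which is how the chunking mentioned above would mitigate head-of-line blocking.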
Just to summarize briefly and address the additional questions: all the answers relate to decisions that were made and can be modified during the design discussion. The motivation is to collocate clients on the same connection while supporting full functionality for each client and providing performance similar to a pipeline. Each client on a multiplexed connection has the same functionality as a regular client and is unaware it is part of a multiplexed connection. This approach offers easy connection management, full client functionality, performance similar to a pipeline, and the option to be fair on both the client and server sides. Fairness is achieved by sending a limited chunk each time, preventing the starvation of requests and responses of other collocated clients.
I agree we should not allow users to handle this. The motivation is mostly for clients to have full functionality and reduce the hassle of connection management.
It depends on the way we decide to implement it. In our implementation, we used 6 bytes as a header, and all following data can be either RESP2, RESP3, or inline commands. We support all.
Clients do appear in the client list and have their own IDs. These IDs are different from the ones used on the multiplexed connection, since IDs on multiplexed connections are unique per connection only; there is a mapping at the server between the client ID on each multiplexed connection and its corresponding client object. Collocated clients appear in CLIENT LIST with their unique IDs; however, they do share IP and port information.
AUTH is used per collocated client but not for the multiplexed connection. The multiplexed connection only manages clients and does not perform any commands. However, all internal connections need to authenticate.
A collocated client can be killed using the CLIENT KILL command, and the command should use the client ID since all collocated clients share the same IP:port. If a multiplexed connection is killed, then all its collocated clients are also terminated.
The server will respond with an error. Client IDs used by the multiplexed protocol are unique for each multiplexed client, but different multiplexed connections may use the same IDs. These assumptions, of course, can be changed.
Yes, since they share the IP and port, but it's a decision we need to make.
A collocated client can perform any action, just like a regular client. Each collocated client can use a different protocol version, authentication, reset, etc.
Do you mean 'hell' in the sense that the proxy will use a multiplexed connection to multiplex the client connections, and a client might itself send 'MPXHELLO'? In this case, we can add a control header with an error. This header in any case needs to be parsed by the proxy. The proxy, in this case, should open a dedicated multiplexed connection for this client's multiplexed connection.
Ultimately, I guess I'm still struggling with the complexity that this is going to introduce, at every client/proxy level. Managing parallel data streams is hard. And that's before we've dealt with the matrix of up-level vs down-level comms (for clients and servers separately) - i.e. what is the failure mode in each case that isn't all-tick. I would love to see a very clear problem statement of what we're trying to solve with this level of proposed complexity, ideally with some kind of expected metric, for example (and I'm making this up as I type):
Is that anything like your reasoning? Any ideas on the bottom bullet (some kind of metric)? Also: again, I emphasize that this is huge impact - almost a brand new protocol (or at least a protocol wrapper). At that point, I genuinely wonder whether it should even colocate on the same port. I also wonder if there's anything we can learn from H2 vs H3 here? H2 and H3 are also multiplexed stream protocols, with H3 deciding to move from TCP to UDP, for reasons. I wonder whether those reasons apply here.
The switch to QUIC can significantly simplify the implementation as well as solve the head-of-line blocking that still exists at the TCP level in our case (as in HTTP/2). However, the problem is that most of the performance gain we achieved in our implementation is due to the usage of a single querybuf and a single cob, which improves locality. Commands are processed directly from the shared querybuf, and responses are written directly to the shared cob (we fall back to a private one when needed for fairness, blocking, etc.). Using a different stream per client will generate more system calls and also hurt locality due to the loss of the shared query and output buffers.
@mgravell One way we can mitigate the complexity is with https://github.com/aws/glide-for-redis. I'll save you from going through all the details, but the high-level idea is that it's a Rust Redis driver that tries to abstract away the actual transport of commands from the client to the server. Each client builds a high-level wrapper that communicates with the Rust core, and the Rust core is responsible for shipping that command off to Redis. In this architecture, clients are agnostic to all of the complexity introduced by this multiplexing interface, as it's being handled by the core.
One of the nice things about Redis is that RESP is really simple (even RESP3 is) and can be implemented and hand-debugged easily when needed. Although GLIDE seems to be a really cool project, abstracting away the complexity is not the same thing as keeping the protocol simple. If we add this multiplexing feature, I hope it will not be RESP over a binary protocol (which is what I would call the 6 byte chunk header where the numbers are stored in binary, while numbers in RESP are stored as ASCII). I wish that the multiplexing can be handled within RESP, either using RESP3 attributes or some new RESP4 chunked reply feature.
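For comparison, tagging each reply with a RESP3 attribute, as wished for here, would keep the wire format in plain ASCII RESP (a sketch; the `client-id` attribute key is made up for illustration):

```python
def tag_reply(client_id: int, resp_reply: bytes) -> bytes:
    """Prepend a RESP3 attribute (|1\r\n ... one key-value pair) carrying a
    hypothetical 'client-id' key, so the tagged reply remains ordinary,
    hand-debuggable RESP rather than a binary framing layer."""
    attr = b"|1\r\n+client-id\r\n:" + str(client_id).encode() + b"\r\n"
    return attr + resp_reply

# A +OK tagged for client 4123 looks like:
#   |1\r\n+client-id\r\n:4123\r\n+OK\r\n
tagged = tag_reply(4123, b"+OK\r\n")
```

The trade-off versus a fixed binary header is exactly the one debated above: this stays within RESP and is trivially readable, but without sub-message framing it cannot by itself chunk a large reply.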
"just throw away your existing network core and switch to wrappers over the Go core" is not hugely compelling, at least to me :) |
It wouldn't be to me either :), if what you have works for you we're not going to deprecate it. If you don't have any of the logic though, it might be an easier alternative.
I am still advocating for RESP instead of a new binary protocol. At some point we might consider a binary variant of RESP, but I also like the ease of debugging RESP. We could probably just continue extending RESP3 instead of releasing a RESP4, as long as there are no breaking changes. This would be an opt-in feature anyway.
The volume of out-of-band messages, including client-tracking invalidations, is typically huge. In this proposal, a client must still implement pipelining to a sub-client to avoid duplicated server-assisted invalidations, which makes this proposal less appealing, IMO.
Attempting to summarize the issues raised in this thread:

Question 1

Question 2

Question 3

Question 4

This also lets clients send the commands in chunks and provides fairness between the clients it serves. The switch to RESP-over-QUIC can significantly simplify the implementation as well as solve the head-of-line blocking that still exists at the TCP level in our case (as in HTTP/2). However, most of the performance gain we were able to achieve in our implementation is due to the usage of a single querybuf and a single cob. The commands are processed directly from the shared querybuf, and the responses are written directly to the shared cob (we fall back to a private one when needed for fairness, blocking, etc.). Using a different stream per client will generate more system calls and also hurt locality due to the loss of the shared query and output buffers.

Question 5

Question 6

Question 7

Question 8

Question 9

Question 10

Question 11

Question 12

Question 13
@redis/core-team I would read the top comment, the proposal summary, and the previous comment, which summarizes open questions. My big pending question is who we think the user of this will be. This was originally envisioned by

If we decide this is mostly a proxy implementation, we should still consider implementing the blocking callbacks proposed by mgravell. If we think this is the better solution for clients and proxies alike, I think we should make sure this API is as friendly as possible to implement (ideally just extend RESP3). Initially I was more of the opinion that all clients should adopt this protocol, but the more I think about it, the more I believe this really should be more of a proxy feature. Clients should be kept as simple as possible, and heavyweight libraries could adopt this protocol when they are interested in performance.
Related to topic 4: i.e. in the sense of a single huge argument blocking other virtual clients. And I do think that in the case of a proxy we'd want to allow a cut-through approach and avoid either side collecting a buffer to be sent only after we have all the data. Trying to solve these problems, and others, with the proposed approach can quickly become complicated, so I'd like to suggest considering implementing both approaches (obviously separately, so one now and one at some future date).

Related to topics 5 and 7:
Multiplexing can already be implemented on the client side with RESP3 quite easily, therefore I don't see any need for a new protocol.
In most cases this is solved by decoupling requests from the connection. For example, in Boost.Redis creating and sending a request looks like

```cpp
request req;
req.set(...);
req.get(...);
conn.async_exec(req, ...)
```

where
This makes no sense. There are things that would make de-multiplexing easier, for example, if the
I'm not sure it does. In case I've missed something, please demonstrate how to issue a
@antirez Did you give any thought to multiplexing when designing RESP3?
any update?
@masx200 Redis is no longer open source. Non-Redis maintainers are moving here: https://github.com/valkey-io/valkey.
In this proposal we discuss exposing a new protocol that allows serving many Redis clients over a single TCP connection. Using such a protocol achieves similar performance to a pipeline mode while maintaining, for each client, the same functionality and semantics as if the client was served using a single dedicated TCP connection.
Introduction
A TCP connection to Redis introduces a client object that maintains the logical state of the connection. This state is used to provide access-control guarantees as well as the required semantics for Redis commands. For example, a client object holds, for its connection, the ACL user, the watched keys, the blocking state, the pub/sub channels it subscribed to, and more. This state is bound to the TCP connection and is freed upon disconnection.
Since each client uses a dedicated connection, commands for each client are sent to Redis separately, and each response is returned by Redis to a different network connection. This causes Redis to spend a large amount of time (62% when using 500 clients) in system calls, as well as consume at least one network packet per command and per response.
Pipelining can be applied to reduce the number of packets, amortize the system call overhead (11% when using 10 clients sending pipelines of 50 commands), and improve locality (a 44% reduction in L1 cache misses vs. using 500 clients). However, in many cases, a pipeline cannot be utilized, either due to command dependencies or because the client generates only a few commands per second.
For this reason, client implementations like StackExchange.Redis collocate many clients on a single TCP connection while using pipelining to enhance performance. However, from the Redis perspective, only a single logical client is bound to a TCP connection, so all collocated clients are handled by Redis as if all commands arrived from a single client.
Naturally, with such a configuration, blocking commands, MULTI/EXEC, ACLs, and other commands cannot preserve their required semantics. Therefore, StackExchange.Redis does not support blocking commands and utilizes Lua or constraints to abstract MULTI/EXEC. Buffer limits also cannot be managed at the client level, along with ACLs.
Furthermore, since Redis treats all collocated clients as a single client, no fairness guarantees are provided for the clients’ commands. Consequently, a large command or response from one client may impact the latency of other commands from collocated clients.
Our suggestion - multiplexing protocol
In this proposal, we suggest implementing an additional protocol for Redis that supports connection multiplexing. Multiplexing is achieved by using a single TCP connection and collocating many clients through the addition of extra metadata. The collocation of commands (and responses) for multiple clients simulates pipeline behavior across a large number of clients, resulting in performance similar to that of a pipeline.
The multiplexing protocol supports all Redis commands with their original semantics. This means that MULTI/EXEC and watches can be applied concurrently to different clients, each client may have different ACLs, and even a blocked client does not block the entire connection. Moreover, buffer limits are enforced per client, and a client can be closed without disconnecting the connection.
When a multiplexed connection is disconnected, all the clients allocated for this connection are closed and the user needs to request new clients to be allocated.
We suggest defining the multiplexing protocol in such a way that each command or response is preceded by a header indicating the client to which the command (or response) is targeted. Additionally, control commands, such as ‘create client’ and ‘client close’, are also encoded in the protocol header.
The following example shows the usage of a single multiplexing connection with two clients, where each client uses a different user with potentially different ACL rules. After the connection is established, an ‘MPXHELLO’ command is sent to define the connection as a multiplexed connection. This command is followed by two ‘create client’ commands that initialize two clients on the Redis side. After the clients are created, USER1 is set for Client I, and USER2 is set for Client II using Auth commands. Both clients, I and II, then send 'GET' commands for k1 and k2, respectively. At this point, 'Client I' sends a 'BLPOP' command that is blocked since list l1 does not exist. Even though 'Client I' is blocked, and both clients I and II are using the same connection, Client II continues sending 'SET' commands that are processed.
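The flow above can be summarized as an ordered trace of (client, command) frames (illustrative only; the control-command spellings and the use of client ID 0 for control traffic are assumptions, not part of the proposal text):

```python
# Illustrative trace of the multiplexed session described above.
# Each tuple is (client_id, command); client ID 0 is assumed to carry
# control messages such as MPXHELLO and 'create client'.
session = [
    (0, "MPXHELLO"),           # declare this connection as multiplexed
    (0, "CREATE-CLIENT 1"),    # hypothetical control command: create Client I
    (0, "CREATE-CLIENT 2"),    # create Client II
    (1, "AUTH USER1 pass1"),   # per-client authentication: USER1 for Client I
    (2, "AUTH USER2 pass2"),   # USER2 for Client II, possibly different ACLs
    (1, "GET k1"),
    (2, "GET k2"),
    (1, "BLPOP l1 0"),         # Client I blocks on the missing list l1...
    (2, "SET k3 v3"),          # ...while Client II keeps being served
]
```

The key property the example demonstrates is in the last two frames: a blocked client suspends only its own logical stream, not the shared connection.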