
Augmenting Redis with multiplexing interface #12873

Open
ohadshacham opened this issue Dec 19, 2023 · 33 comments

@ohadshacham

In this proposal we discuss exposing a new protocol that allows serving many Redis clients over a single TCP connection. Using such a protocol achieves similar performance to a pipeline mode while maintaining, for each client, the same functionality and semantics as if the client was served using a single dedicated TCP connection.

Introduction

Each TCP connection to Redis introduces a client object that maintains the logical state of the connection. This state is used to provide access control guarantees as well as the required semantics for Redis commands. For example, a client object holds the connection's ACL user, its watched keys, its blocking state, the pub/sub channels it is subscribed to, and more. This state is bound to the TCP connection and is freed upon disconnection.

Since each client uses a dedicated connection, commands for each client are sent to Redis separately and each response is returned by Redis on a different network connection. This causes Redis to spend a large amount of time in system calls (62% when using 500 clients) and to consume at least one network packet per command and per response.

Pipelining can be applied to reduce the number of packets, amortize the system-call overhead (11% when using 10 clients sending pipelines of 50 commands), and improve locality (a 44% reduction in L1 cache misses vs. using 500 clients). However, in many cases a pipeline cannot be utilized, either due to command dependencies or because the client generates only a few commands per second.

For this reason, client implementations like StackExchange.Redis collocate many clients on a single TCP connection and use pipelining to enhance performance. However, from Redis's perspective only a single logical client is bound to a TCP connection, so all collocated clients are handled by Redis as if their commands arrived from a single client.
Naturally, with such a configuration, blocking commands, MULTI/EXEC, ACLs, and other commands cannot preserve their required semantics. Therefore, StackExchange.Redis does not support blocking commands and uses Lua or constraints to abstract MULTI/EXEC. Buffer limits also cannot be managed at the client level, and neither can ACLs.
Furthermore, since Redis treats all collocated clients as a single client, no fairness guarantees are provided for the clients' commands. Consequently, a large command or response from one client may impact the latency of commands from other collocated clients.

Our suggestion - multiplexing protocol

In this proposal, we suggest implementing an additional protocol for Redis that supports connection multiplexing. Multiplexing is achieved by collocating many clients on a single TCP connection through the addition of extra metadata. Collocating commands (and responses) from multiple clients simulates pipeline behavior across a large number of clients, resulting in performance similar to that of a pipeline.

The multiplexing protocol supports all Redis commands with their original semantics. This means that MULTI/EXEC and watches can be applied concurrently by different clients, each client may have different ACLs, and even a blocked client does not block the entire connection. Moreover, buffer limits are enforced per client, and a client can be closed without disconnecting the connection.
When a multiplexed connection is disconnected, all the clients allocated for that connection are closed and the user needs to request new clients to be allocated.

We suggest defining the multiplexing protocol in such a way that each command or response is preceded by a header indicating the client to which the command (or response) is targeted. Additionally, control commands, such as ‘create client’ and ‘client close’, are also encoded in the protocol header.

The following example shows the usage of a single multiplexing connection with two clients, where each client uses a different user with potentially different ACL rules. After the connection is established, an ‘MPXHELLO’ command is sent to define the connection as a multiplexed connection. This command is followed by two ‘create client’ commands that initialize two clients on the Redis side. After the clients are created, USER1 is set for Client I, and USER2 is set for Client II using AUTH commands. Both clients, I and II, then send 'GET' commands for k1 and k2, respectively. At this point, 'Client I' sends a 'BLPOP' command that is blocked since list l1 does not exist. Even though 'Client I' is blocked, and both clients I and II are using the same connection, Client II continues sending 'SET' commands that are processed.

[Image: "External MPX (1)", a diagram of the example multiplexed connection flow described above]
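
In case the diagram above does not render, a rough textual version of the same flow follows; the framing and control-command names here are illustrative only, and the actual wire format is discussed later in the thread:

MPXHELLO                       # declare this TCP connection as multiplexed
CREATE CLIENT  -> Client I     # control command carried in the protocol header
CREATE CLIENT  -> Client II    # control command carried in the protocol header
[Client I]  AUTH USER1 ...     # per-client authentication, potentially different ACL rules
[Client II] AUTH USER2 ...
[Client I]  GET k1
[Client II] GET k2
[Client I]  BLPOP l1 ...       # blocks: list l1 does not exist
[Client II] SET ...            # still processed; only Client I is blocked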

@nihohit
Contributor

nihohit commented Dec 19, 2023

each command or response is preceded by a header indicating the client to which the command (or response) is targeted

For responses, does this require a change in protocol? Can we use the attribute type for this?
If not, the order of precedence must be clearly defined. If a response can be preceded by both attributes and headers, we should avoid complex recursive structures to describe all possible permutations.

minor: I would avoid MPXHELLO and instead integrate this with the existing HELLO command, since if these are separate commands, we need to carefully define the interactions between them, in case a user sends both HELLO and MPXHELLO.

@madolson
Contributor

minor: I would avoid MPXHELLO and instead integrate this with the existing HELLO command, since if these are separate commands, we need to carefully define the interactions between them, in case a user sends both HELLO and MPXHELLO.

This is important, especially if we are changing the protocol after the response. Unlike CLIENT SETINFO, which is optional and can error, MPXHELLO cannot error; otherwise the client may be left in a broken state reading the response.

The other option I would like to posit is that we should consider listening to the multiplexing protocol on a separate port if it's very different from RESP protocol. There seems to be space to try to further extend the connection interface so it can support an arbitrary number of ways to deliver commands to Redis, of which this is one.

@rueian

rueian commented Dec 20, 2023

rueidis also pipelines concurrent commands to one connection and supports server-assisted client side caching based on the shared connection.

I believe the proposed multiplexing protocol will definitely benefit all client implementations in terms of both performance and simplicity.

One thing missing from the proposal is how will this multiplexing protocol handle out-of-band messages, including pubsub, client side caching, and others?

@ohadshacham
Author

Thanks everyone for the great comments!

For responses, does this require a change in protocol? Can we use the attribute type for this? If not, the order of precedence must be clearly defined. If a response can be preceded by both attributes and headers, we should avoid complex recursive structures to describe all possible permutations.

Regarding the order: every response to a client is preceded by a header defining which client the response belongs to. If a response has an attribute, the attribute appears as part of the response, after the header.
Using attributes to capture the metadata could be tricky but is perhaps doable. To ensure fairness between different clients' responses on a multiplexed connection, Redis will sometimes send a response in chunks, preceding each chunk with a header. The header will not necessarily fall between valid protocol elements, so in that case attributes are not a good fit. For example, when returning a string response of hundreds of megabytes, it will be divided into smaller chunks, each preceded by a header; in this scenario the chunk boundaries fall within the string itself. The same fairness option is also available for clients that send a large command and would like to send it in chunks to ensure fairness.
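
To illustrate the chunking case (the header syntax here is purely hypothetical), a large bulk-string reply could be cut mid-payload, with another client's reply interleaved between the chunks:

id: 7, len: 65536      # hypothetical header: first chunk for client 7
$314572800\r\nAAAA...  # start of a 300 MB bulk string, cut mid-payload
id: 9, len: 5          # a different client's complete reply interleaved
+OK\r\n
id: 7, len: 65536      # next chunk of client 7's string; no RESP framing of its own
AAAA...

Because the later chunks are not themselves valid RESP values, there is nothing an attribute could legally be attached to.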

minor: I would avoid MPXHELLO and instead integrate this with the existing HELLO command, since if these are separate commands, we need to carefully define the interactions between them, in case a user sends both HELLO and MPXHELLO.

This is important, especially if we are changing the protocol after the response. Unlike the CLIENT SETINFO which are optional and can error, MPXHELLO can not error otherwise the client will may be in a broken state reading the response.

The other option I would like to posit is that we should consider listening to the multiplexing protocol on a separate port if it's very different from RESP protocol. There seems to be space to try to further extend the connection interface so it can support an arbitrary number of ways to deliver commands to Redis, of which this is one.

You are both right: MPXHELLO should be the first command; otherwise, there might be issues like the ones you presented above. HELLO might be sent for each client on the multiplexed connection but not for the multiplexed connection itself (unless we want to define some inheritance mode). The issue @madolson raised should be handled; maybe we can forcefully close the connection when MPXHELLO is received after other commands? Using another port is also an option, as @madolson proposed.

One thing missing from the proposal is how will this multiplexing protocol handle out-of-band messages, including pubsub, client side caching, and others?

An out-of-band message received for a specific client is preceded by a header denoting which client the message is targeted at. When many clients on the same multiplexed connection are subscribed to the same channel, for example, the message is sent multiple times, each copy preceded by a header indicating its target client. We could further extend this to the multiplexed-connection level, e.g. by sending several headers followed by a single message. The client side can also optimize by registering all its pub/sub subscriptions, for example, on a dedicated client and scattering the received messages itself.

@zuiderkwast
Contributor

I like this idea too, given it will be simple enough to implement in clients.

How is a header represented? Is it sent like a separate response before the real response in the RESP flow, or are we talking about a completely new protocol here? I think RESP3 attributes would be a nice fit for tagging each response to a client ID.
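
For illustration only (not something the proposal defines), tagging a reply with a client ID via a standard RESP3 attribute could look like the following, with the attribute map preceding the reply it annotates; "client-id" is a hypothetical attribute key and 4123 a hypothetical ID:

|1\r\n
    +client-id\r\n
    :4123\r\n
+OK\r\n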

To ensure fairness between different client responses in a multiplexed connection, Redis will sometimes send the response in chunks, preceding each chunk with a header.

It sounds like it's the head-of-line blocking problem we're trying to solve here. That's a somewhat different problem. Sending huge strings in commands is a problem too. Do we need chunked commands as well?

So far, the Redis approach has been to try to make each command sufficiently fast and small, like recommending SCAN instead of KEYS. For huge string keys, users can use GETRANGE and SETRANGE to get and set it in chunks, though it's not atomic.

I think we should avoid reinventing the wheel to solve the head-of-line blocking problem. Maybe we could consider something like RESP-over-HTTP/2 (HTTP/2 does chunked requests and replies) or RESP-over-QUIC?

@ohadshacham
Author

ohadshacham commented Dec 21, 2023

Thanks @zuiderkwast.

How is a header represented? Is it sent like a separate response before the real response in the RESP flow, or are we talking about a completely new protocol here? I think RESP3 attributes would be a nice fit for tagging each response to a client ID.

We can add a few bytes before each RESP response, but as long as we don't chop the response/command, we can use attributes. Not sure about inline commands, though.

It sounds like it's the head-of-line blocking problem we're trying to solve here. It a somewhat different problem. Sending huge strings in commands is a problem too. Do we need chunked commands too?
So far, the Redis approach has been to try to make each command sufficiently fast and small, like recommending SCAN instead of KEYS. For huge string keys, users can use GETRANGE and SETRANGE to get and set it in chunks, though it's not atomic.

You are totally right that the preferred way is to avoid using large commands or responses. However, currently, Redis reads from each client up to 16K each time and continues to the next client when writing more than 64K. Even if a client writes a large command or requests a large response, the reads and writes are basically in chunks while providing fair serving of other clients.

In this scenario, when we combine several clients at the same connection, writing a large response from one client, for example, hurts serving other clients at the same multiplexing connection. This provides different behavior when using a regular client or a collocated client at a multiplexed connection.

This also lets clients send the commands in chunks and provides fairness between the clients it serves.

What do you think?

I think we should avoid reinventing the wheel to solve the head-of-line blocking problem. Maybe we could consider something like RESP-over-HTTP/2 (HTTP/2 does chunked requests and replies) or RESP-over-QUIC?

RESP-over-QUIC could be used as you suggest; however, most of the performance gain we were able to achieve in our implementation comes from using a single querybuf and a single cob (client output buffer). The commands are processed directly from the shared querybuf, and the responses are written directly to the shared cob (we fall back to a private one when needed for fairness, blocking, etc.). Using a different stream per client would generate more system calls and also hurt locality due to the excess of per-client query and output buffers.

@nihohit
Contributor

nihohit commented Dec 21, 2023

How would we handle authentication & CLIENT KILL commands?
I assume that the connection that sends MPXHELLO has to send AUTH. Do all internal connections also need to authenticate? Can an internal connection use a user with more permissions than the multiplexing connection?

Can the internal connections be killed with a CLIENT KILL command? If a CLIENT KILL command kills the multiplexing connection, do all internal connections get killed, too?

@mgravell

mgravell commented Dec 21, 2023

This is intriguing; lots of challenges, but definitely some potential scope. If I think in terms of SE.Redis, I can see that to use this for things like blocking calls without changing the "normal" use of the library, we'd need to opt into this mode early, and have a dedicated default client, but potentially allow access to a dedicated client that is a unique sub-client to redis, i.e.

var muxer = ... raw connection, with some new "opt in" flag to this feature
var db = muxer.GetDatabase(); // uses default/shared context
// note that db+muxer can be shared and aggressively used concurrently

// use new mode
using (var dedicated = muxer.CreateClient(/* auth/client details? */))
{
    var foo = dedicated.SomeBlockingOperation();
    // ...
}

so we'd have 1 sub-client automatically just from enabling this mode, then we'd create a second sub-client in the CreateClient call (presumably this is also killable, hence the using, which is lifetime management in C#).

This: could work, and it avoids issues of having lots of physical connections to manage, and having to negotiate TLS repeatedly

Things that we'd need to seek clarity on:

  • what exactly are the protocol changes to announce the connection context of requests/responses?
    • in particular, I wonder if this should not be a normal redis command, because we do not (presumably, I can only speak for myself) want consumers executing ad-hoc redis commands that change this context - IMO libraries should handle this internally to prevent malicious code trying to sneakily switch identity by issuing clever commands; we should not allow this inside Lua, or in modules (except perhaps for a dedicated "create sub client" module API)
  • these protocol changes: apply to which protocol versions? RESP2? RESP3? the almost-obsolete cli text protocol?
  • do sub-clients appear as clients in the CLIENT LIST sense? and do they get their own client id?
    • is the client id the same identifier used on the protocol? or is there a separate identifier that isn't normally known except to the server and from the response to "create new client"?
    • what happens if you try to use a client id not associated with that socket?
    • can you see which clients share a socket via CLIENT LIST?
    • presumably all client state (above the socket) is retained at the sub-client level, i.e. database number, caching hints, no-reply hints, etc
  • what can a sub-client do / not do? can a sub-client reset? can it change protocol version just for that sub-client? it sounds like you do want it to be able to auth/hello to have a different identity

The head-of-line blocking issue from large payloads (in either direction): that's a tricky one; there's two ways of dealing with that, I can see:

  • option 1: ignore it, accept that unreasonably large payloads will cause blocking
  • option 2: bake full protocol-level independent multiplexing into the protocol

option 2 is possible, but is a larger change for libraries; but if that is something we want to support, it should be sooner rather than later; for example, that would further support my "this should not be a normal redis command" statement from earlier; as a random protocol example (this is literally me thinking while I type, this is not well considered):

> MPXHELLO ... # initial handshake of outer connection
,4123 # success, id of default connection is 4123

from now on, all payloads in either direction would be prefixed with (in some encoding, most likely binary?), the client id and payload length in bytes, i.e. we might send

id: 4123 length: 421231 # length not counted, gibberish
+SET Foo ...
+GET Foo

where 421231 bytes would be processed by logical client 4123, just as though we'd read it from a socket; this allows any arbitrary fragmentation on the way back from the server, for example:

id: 4123, length 5
+OK
id: 4123, length 4096
(first 4096 bytes of the output from the GET, in regular RESP, **incomplete RESP payload**)
id: 4123, length 4096
(next 4096 bytes of the output from the GET, in regular RESP, **incomplete RESP payload**)
id: 4123, length 4096
(etc etc)

the point being that the server can now interleave payloads from other sub-clients in-between these, as it chooses - if we had created a client 5422, we might get a response for 5422 in between two parts of a single RESP fragment for 4123

The downside to this: it is a lot more complex, and clients will need to do more routing of bytes. It isn't much more complex, though, because even for an "entire RESP responses only" scenario, clients would still need to track what is outstanding on a per-sub-client basis, routing complete RESP responses (rather than partial RESP data) to reactivate and complete those items.


Of course, you might disagree entirely, and think that we can do everything just in RESP, i.e.

> CLIENT SWITCH 4123
+OK # next command will operate in logical context of 4123, as long as it shares a socket

but ... that makes me very nervous; I've met users before, and they will find ways to screw this up. I'm thinking of somebody using an older version of a client and the omnipresent ad-hoc API to execute MPXHELLO or CLIENT SWITCH or whatever directly, completely breaking all the invariants of how the library expects the connection to behave.

@mgravell

mgravell commented Dec 21, 2023

Perhaps a bigger problem: I'm aware some folks use a smart proxy in front of their redis servers, and in particular in front of cluster; this is going to be hell for them. I honestly wonder whether that consideration makes it impossible.

Can I quietly wave a hand in favor of the much more limited #12716; selfishly, this obtains the bits I care about without introducing huge complex "break the world" changes to the protocol itself, which will take much longer to develop, test (esp. auth concerns), deploy, and get support in proxies and libraries.

The proposal here seems ambitious, but potentially too ambitious to be achievable in moderate timescales. If it is what people want: I can absolutely "do my part", but... I see lots of places that would make this trip.


To echo a comment from that related post:

I agree that there are some merits to a fully multiplexed protocol, but: this is a huge upheaval - it is a much, much bigger change than RESP3, and support for that is still patchy now. My concern is whether we can get most of the benefits without that huge level of complexity. Again: full multiplexing is an order of magnitude bigger change than RESP3.

@madolson
Contributor

Perhaps a bigger problem: I'm aware some folks use a smart proxy in front of their redis servers, and in particular in front of cluster; this is going to be hell for them. I honestly wonder whether that consideration makes it impossible.

The smartest thing might just be for the smart proxies to use it internally to connect to the cluster, and ask the individual clients to just connect normally. The benefits outlined here don't really apply if you are putting a proxy in front of the cluster.

@mgravell

but the client might not know ;p

@NickCraver

I think the scariest thing to me in the proposal is support for multiple authentication contexts on a single port. I'm a bit fuzzy on this from discussions thus far ("may" being the term), but ultimately it looks like the concept is aimed at supporting multiple users with different access multiplexed over the same port. And if anything goes wrong at any point in that multiplexing, client ID reuse, etc.: bad things happen. I'd say it's one level of bad to get commands wrong coming in on the protocol, but much worse if those can be mixed up across authentication contexts.

To Marc's point: load balancers could make this worse again with any delta/change/failover during runtime (often due to clouds patching). IMO something like this should not support multiple authentication contexts on a port. If you try to auth with different credentials: that would just be rejected. In all of the throughput scenarios we've been approached with, it's the same authentication/user/token wanting multiple connections for throughput and blocking usages (though these are covered by #12716 much more simply). If something like this proceeds, I'd push back against mixed auth/ACL contexts on a single port - there are lots of ways that can go wrong, and unless there's a critical need for such a case, the risk likely isn't worth it. Put another way: a lot of the issues raised for justifying multiplexing are totally on point: I agree, and that's why we do it in SE.Redis on existing RESP, but how many of those cases need multiple ACLs? I'd love to understand if there's a suite of use cases we just haven't seen yet.

I know we're in the same boat, but I echo the sentiment that this is much more complex than RESP2->RESP3 was, and riskier. It's hard for me to currently imagine that risk is justified when we can get most of the gains and practical issue resolutions with less complexity. More protocols means more complexity (in every client) and we can't easily drop old ones - we have to support older servers for quite some time because for better or worse, Redis was awesome going back a long way and some users are on quite ancient versions.

IMO, a new protocol is a very high bar and needs to have sufficiently high gains to offer - I'm not sure we actually have that here vs. having a path for blocking commands while allowing multiplexing on the existing protocol to eliminate head-of-line blocking. Admittedly, I'm biased because SE.Redis already does multiplexing that works pretty well on RESP2/3, so the only gains to be had are blocking paths - if we solve that, every client that chooses to could multiplex.

@mgravell

mgravell commented Jan 2, 2024

I did lots more thinking on this over the holidays; my increasing thought is that this is a hugely disruptive change. Disruptive isn't by itself bad or good; IMO we need to get a very clear view on what the intended impact is for this, both negative and positive, to see how the disruption stacks against the benefits. At the technical level, this seems directly comparable to the HTTP/1.1 to HTTP/2 migration

Personally - and I emphasize that this is just a point-in-time subjective opinion and is open to revision - I think this feels like a bad direction, but I'd love to start mapping out the impact more formally. I can't really list all the areas yet, let alone qualify them as pros/cons - a lot of them would need technical specs to see whether they're affected (positively or negatively); for example:

  • sockets: fewer to manage
  • bandwidth: potential risk of saturation of single pipe, if any layer ever pauses for example to parse/process payloads (vs offload, but offload demands more complicated memory management)
  • latency: same points as bandwidth; additionally, head-of-line blocking depends on the multiplexed framing mechanism - partial vs complete RESP messages, etc (but partial messages require independent buffering per inner-stream)
  • security: significant concerns if different user contexts allowed to cohabit - any failure could lead to user switching
  • clients and protocol-aware proxies: significant effort
  • transactions: are able to use unrelated MULTI/EXEC on multiplexer connection, but counterpoint: these days I see more Lua than MULTI/EXEC, because it is very hard to use MULTI/EXEC in non-trivial scenarios regardless of multiplexing
  • lots more here, I'm guessing

I would be very keen to get a better view of the intended usage scenarios, to assess whether this level of impact is warranted i.e. gets a suitable payoff; in particular, a lot of things can already be multiplexed if a client wants to, without any protocol changes; the things that need special attention without changing anything:

  • transactions: can't use WATCH as easily - needs a little coordination, but again: just use Lua
  • current database: needs tracking and auto-switching in the client
  • user-switching: just don't
  • blocking operations: just don't

(source: that's exactly what SE.Redis already does)

So I guess what I'm looking for is the killer answer to:

what can you do with this new multiplexing proposal that you can't already do today?

Possible answers off the top of my head:

  • full transaction support (see above; most folks just use Lua)
  • blocking operations
  • user-switching on a shared connection (which I think is a very bad idea from a security perspective)
  • fixing head-of-line blocking in shared connection (depends on technical details - the versions that "fix" this are the ones that require the most changes and complexity in the clients, for per-stream buffering)

So of that, the only one that is a "big win with few drawbacks" from my perspective is the blocking operations support; and don't get me wrong: you know I want a solution to that, but: I wonder whether this proposal is massively over-complicating things, and maybe there are simpler ways of skinning that particular cat

@ohadshacham
Author

Thanks for your super-detailed response!

The motivation for this change is to provide a general interface that provides pipeline performance while allowing the use of regular semantics, as if a connection is used for each client. This holds for current Redis commands as well as for any future command added to Redis with its new semantics and requirements. Our implementation just added 6 bytes as metadata before each command (response or chunk of data). The modifications to Redis are not small (also not huge); however, most of the complications arise due to the fallback required when using a single querybuf and a single cob to serve each multiplexed connection.

Regarding different users having different ACLs on the same connection, do you really see this as a concern, even though all requests arrive from the same client machine? I agree that data can be mixed due to a bug in Redis, but the same could be true if a bug exists on the client side.

What do you think?

@NickCraver

Regarding different users having different ACLs on the same connection, do you really see this as a concern, even though all requests arrive from the same client machine? I agree that data can be mixed due to a bug in Redis, but the same could be true if a bug exists on the client side.

Absolutely! For the simple fact that multi-tenant applications exist. Related: multiple processes on a single machine running as different security contexts/users (for minimal permissions). And from experience: both are super common. This includes very sensitive authentication providers we deal with.

There's no inherent security relationship with "1 per client machine", but there are plenty of them associated with particular sockets. Connections are allowed or denied at the firewall level based on the initial connection, for example; the same goes for HTTP and pretty much every connection that exists. There are so many assumptions built up over the years that a security context lives on a connection (ip:port <-> ip:port). If Redis had started with this and built up over time, there'd be far less of a concern here, but this is a major change to make at this point, with much more risk.

As you said here, a bug on either side makes this a dangerous game to play, and we're talking about something that zero clients in the ecosystem handle today either. We should assume there will be issues here, and adding security escalation concerns on top of that is not awesome. You could argue that once this is much more mature, then it could allow different ACL contexts - that'd add some maturity safeguards at least. But that's just 1 aspect here - will let Marc chime in on a lot of the others above.

@mgravell

mgravell commented Jan 8, 2024

I'm going to leave ACL topics to Nick, since he has opinions, and stick to protocols :)

It may help to actually discuss some kind of proposed implementation at the byte level, so we don't talk past each other; if you're adding 6 bytes, I'm guessing that is just a session identifier (client-id?) before all/some payloads, without additional sub-message framing (i.e. the following message is exactly an entire RESP message) - which means that the head-of-line-blocking issue still applies; that does mean that the implementation is much simpler (no partial buffers to stash and manage), but: we probably need to be very clear on that so we can correctly discuss what the proposal does/does-not address.

My first concern is that this is a protocol break; a major "reset the world" change in terms of library support; that's not inherently bad, but the collective "we" are only just finishing RESP3, which has dragged - although I will allow that this topic has some interesting technical advantages for my own selfish needs.

My second concern is that this protocol break could have weird and unexpected outcomes (which makes me think "unexpected exploit") if it is possible to get any scenario where the client/server haven't properly agreed that they both understand this change in advance, and there is any even remote ambiguity. This could be mitigated as simply as a new RESP primitive; for example, if the session-id is conveyed as a redis integer, I would strongly suggest against :3412\r\n, but instead something like @3412\r\n, where @ is a new previously never used RESP token, which means that any client or server seeing it unexpectedly: will burn the connection with fire. I'm presuming from the "6 bytes" comment that this is not proposed as a RESP3 "attribute", since a RESP3 "attribute" would (if my math is right) need at least 12 bytes for a single-byte key and value pair. But then the question becomes: "does this work on RESP3? RESP2? the old text protocol?"

There's also some discussion like "does this token get sent on every message, or only when the message switches context?"

@ohadshacham
Author

This is just one way to do it, but we used 6 bytes (no RESP) consisting of two bytes for the client ID and four bytes for the size of the incoming message. For control commands, such as creating clients, we use a client ID of 0 to denote a control command, followed by 2 bytes for the opcode (e.g., the create-client opcode) and 2 bytes for the ID (the requested client ID). The client IDs are unique and private per multiplexed (mpx) connection but not unique across different mpx connections. There is an internal mapping from mpx connection + client ID to the client object.
Following the six bytes comes the data, which can be either a command or a chunk of data belonging to a previous command that was chopped. Data arriving without a header belongs to the previous header and is counted in its size. Theoretically, though, a header may be added wherever one wants, as long as the sizes are calculated correctly.
Following the header there may be RESP2, RESP3, or even inline commands.

Regarding head-of-line blocking, both Redis and the client can chop a command/response into chunks. We are still left with head-of-line blocking caused by packet loss, since it is TCP underneath.
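
For concreteness, here is a minimal sketch of encoding/decoding such a 6-byte header in C; the field order, big-endian byte order, and all names are assumptions for illustration, not the actual implementation:

#include <stdint.h>

/* Hypothetical 6-byte multiplexing header, per the description above:
 * 2 bytes of client ID followed by 4 bytes of payload size.
 * A client ID of 0 marks a control frame, in which case the remaining
 * 4 bytes would instead carry a 2-byte opcode and a 2-byte requested ID. */
typedef struct {
    uint16_t client_id;  /* per-connection client ID, 0 for control frames */
    uint32_t size;       /* size in bytes of the payload that follows */
} mpx_header;

/* Encode a data-frame header into a 6-byte buffer (big-endian assumed). */
static void mpx_encode_header(uint8_t out[6], uint16_t client_id, uint32_t size) {
    out[0] = client_id >> 8;
    out[1] = client_id & 0xff;
    out[2] = size >> 24;
    out[3] = (size >> 16) & 0xff;
    out[4] = (size >> 8) & 0xff;
    out[5] = size & 0xff;
}

/* Decode a 6-byte buffer back into a header. */
static mpx_header mpx_decode_header(const uint8_t in[6]) {
    mpx_header h;
    h.client_id = ((uint16_t)in[0] << 8) | in[1];
    h.size = ((uint32_t)in[2] << 24) | ((uint32_t)in[3] << 16) |
             ((uint32_t)in[4] << 8)  | (uint32_t)in[5];
    return h;
}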

@ohadshacham
Author

Just to summarize briefly and address any additional questions, all the answers are related to decisions that were made and can be modified during the design discussion.

The motivation is to collocate clients on the same connection while supporting full functionality for each client and providing performance similar to a pipeline. Each client on the multiplexed connection has the same functionality as a regular client and is unaware it is part of a multiplexed connection. This approach offers easy connection management, full client functionality, performance similar to a pipeline, and the option to be fair on both the client and server sides. Fairness is achieved by sending a limited-size chunk at a time, preventing the starvation of requests and responses of other collocated clients.

we do not (presumably, I can only speak for myself) want consumers executing ad-hoc redis commands that change this context - IMO libraries should handle this internally to prevent malicious code trying to sneakily switch identity by issuing clever commands; we should not allow this inside Lua, or in modules

I agree we should not allow users to handle this. The motivation is mostly for clients to have full functionality and reduce the hassle of connection management.

these protocol changes: apply to which protocol versions? RESP2? RESP3? the almost-obsolete cli text protocol?

It depends on the way we decide to implement it. In our implementation, we used 6 bytes as a header, and all following data can be either RESP2, RESP3, or inline commands. We support all.

Do sub-clients appear as clients in the CLIENT LIST sense? and do they get their own client id? is the client id the same identifier used on the protocol? or is there a separate identifier that isn't normally known except to the server and from the response to "create new client"?

Clients do appear in the client list and have their own IDs. These IDs are different from the ones used for the multiplexed connection since IDs for multiplexed connections are unique per connection only. There is a mapping at the server between the client ID at each multiplexed connection and its corresponding client object. These collocated clients also appear in the CLIENT LIST alongside their unique IDs; however, collocated clients do share IP and port information.

How would we handle authentication & CLIENT KILL commands? I assume that the connection that sends MPXHELLO has to send AUTH. Do all internal connections also need to authenticate? Can an internal connection use a user with more permissions than the multiplexing connection?

AUTH is used per collocated client but not for the multiplexed connection. The multiplexed connection only manages clients and does not perform any commands. However, all internal connections need to authenticate.

Can the internal connections be killed with a CLIENT KILL command? If a CLIENT KILL command kills the multiplexing connection, do all internal connections get killed, too?

A collocated client can be killed using the CLIENT KILL command, and the command should use the client ID since all collocated clients share the same IP:port. If a multiplexed connection is killed, then all its collocated clients are also terminated.
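
For reference, CLIENT KILL already accepts an ID filter, so killing a single collocated client would presumably look like the following (4123 being a hypothetical server-side client ID):

CLIENT KILL ID 4123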

what happens if you try to use a client id not associated with that socket?

The server will respond with an error. Client IDs used by the multiplexed protocol are unique for each multiplexed client, but different multiplexed connections may use the same IDs. These assumptions, of course, can be changed.

can you see which clients share a socket via CLIENT LIST?

Yes, since they share the IP and port, but it's a decision we need to make.

presumably all client state (above the socket) is retained at the sub-client level, i.e. database number, caching hints, no-reply hints, etc
what can a sub-client do / not do? can a sub-client reset? can it change protocol version just for that sub-client? it sounds like you do want it to be able to auth/hello to have a different identity

A collocated client can perform any action, just like a regular client. Each collocated client can use a different protocol version, authentication, reset, etc.

Perhaps a bigger problem: I'm aware some folks use a smart proxy in front of their redis servers, and in particular in front of cluster; this is going to be hell for them. I honestly wonder whether that consideration makes it impossible.

Do you mean 'hell' because the proxy will use a multiplexed connection to multiplex the client connections, and one of those clients might itself send MPXHELLO? In this case, we can add a control header with an error. This header in any case needs to be parsed by the proxy, and the proxy should then open a dedicated multiplexed connection for that client's multiplexed connection.

@mgravell

Ultimately, I guess I'm still struggling with the complexity that this is going to introduce, at every client/proxy level. Managing parallel data streams is hard. And that's before we've dealt with the matrix of up-level vs down-level comms (for clients and servers separately) - i.e. what is the failure mode in each case that isn't all-tick. I would love to see a very clear problem statement of what we're trying to solve with this level of proposed complexity, ideally with some kind of expected metric, for example (and I'm making this up as I type):

  • The problems we're targeting are parallel socket connect overheads (multiple TLS handshakes, etc), socket exhaustion, and the overhead of querying a large number of server sockets
  • The use of protocol-level multiplexing allows multiple clients to share a single socket, and by using a new framing layer above RESP, we can achieve this without head-of-line blocking issues for large payloads for either client or server messages
  • This also allows existing multiplexed clients to implement blocking operations without stalling the connection
  • it is projected that at scale, we can achieve a SOME_NUMBER % throughput increase using a multiplexed connection vs the same number of clients on separate connections

Is that anything like your reasoning? Any ideas on the bottom bullet (some kind of metric)?

Also: again, I emphasize that this is huge impact - almost a brand new protocol (or at least protocol wrapper). At that point, I genuinely wonder whether it should even colocate on the same port. I also wonder if there's anything we can learn from H2 vs H3 here? H2 and H3 are also multiplexed stream protocols, with H3 deciding to move from TCP to UDP, for reasons. I wonder whether those reasons apply here.

@ohadshacham
Author

  • Clients can easily manage sync interfaces without using a different connection for each client.
  • As you mentioned, a new client does not require TLS handshakes, significantly reducing the overhead of connection storms since Redis is not dealing with TLS negotiation all the time. Connection storms can be a significant issue, especially on small servers when the negotiation is not offloaded.
  • Blocking commands do not block the whole connection but only the corresponding client.
  • Performance improved by 350% when using a few multiplexed connections with a few tens of clients on each versus using a single connection for each client. This shows performance similar to using a pipeline, albeit slightly lower due to the additional work needed for handling the client and metadata. The gain comes from better locality and fewer system calls. Network packets are also better utilized, which lets servers provide better throughput.
  • Future additions to Redis could be easily integrated and used since each client is semantically divided from other clients on the multiplexed connection.
  • Fairness can be achieved on both the client and server side; head-of-line blocking still exists at the TCP level, though.

I also wonder if there's anything we can learn from H2 vs H3 here? H2 and H3 are also multiplexed stream protocols, with H3 deciding to move from TCP to UDP, for reasons. I wonder whether those reasons apply here.

The switch to QUIC could significantly simplify the implementation as well as solve the head-of-line blocking that still exists at the TCP level in our case (as in HTTP/2). However, the problem is that most of the performance gain we achieved in our implementation comes from using a single querybuf and a single cob, which improves locality. Commands are processed directly from the shared querybuf, and responses are written directly to the shared cob (we fall back to a private one when needed for fairness, blocking, etc.). Using a different stream per client would generate more system calls and also hurt locality due to the excess of per-client query and output buffers.

#12873 (comment)

@madolson
Contributor

@mgravell One way we can mitigate the complexity is with https://github.com/aws/glide-for-redis. I'll save you from going through all the details, but the high-level idea is that it's a Rust Redis driver that tries to abstract away the actual transport of commands from the client to the server. Each client builds a high-level wrapper that communicates with the Rust core, and the Rust core is responsible for shipping that command off to Redis. In this architecture, clients are agnostic to all of the complexity introduced by this multiplexing interface, as it's being handled by the core.

@zuiderkwast
Contributor

One of the nice things about Redis is that RESP is really simple (even RESP3 is) and can be implemented and hand-debugged easily when needed. Although GLIDE seems to be a really cool project, abstracting away the complexity is not the same thing as keeping the protocol simple.

If we add this multiplexing feature, I hope it will not be RESP over a binary protocol (which is what I would call the 6 byte chunk header where the numbers are stored in binary, while numbers in RESP are stored as ASCII). I wish that the multiplexing can be handled within RESP, either using RESP3 attributes or some new RESP4 chunked reply feature.

@mgravell

"just throw away your existing network core and switch to wrappers over the Go core" is not hugely compelling, at least to me :)

@madolson
Contributor

"just throw away your existing network core and switch to wrappers over the Go core" is not hugely compelling, at least to me :)

It wouldn't be to me either :) - if what you have works for you, we're not going to deprecate it. If you don't have any of the logic though, it might be an easier alternative.

If we add this multiplexing feature, I hope it will not be RESP over a binary protocol (which is what I would call the 6 byte chunk header where the numbers are stored in binary, while numbers in RESP are stored as ASCII). I wish that the multiplexing can be handled within RESP, either using RESP3 attributes or some new RESP4 chunked reply feature.

I am still advocating for RESP instead of a new binary protocol. At some point we might consider a binary variant of RESP, but I also like the ease of debugging RESP. We could probably just continue extending RESP3 instead of releasing a RESP4, as long as there are no breaking changes. This would be an opt-in feature anyways.

@rueian

rueian commented Jan 23, 2024

An out-of-band message received for a specific client is preceded by a header denoting which client the message is targeted at. When many clients on the same multiplexed connection are subscribed to the same channel, for example, the message is sent multiple times, each copy preceded by a header indicating its target client. We could further extend this to the multiplexed-connection level, e.g. by sending several headers followed by a single message. The client side can also optimize by registering all its pub/sub subscriptions, for example, on a dedicated client and scattering the received messages itself.

The volume of out-of-band messages, including client tracking invalidations, is typically huge. In this proposal, a client must still implement pipelining to a sub-client to avoid duplicated server-assisted invalidations, which makes this proposal less appealing, IMO.

@ohadshacham
Author

ohadshacham commented Feb 1, 2024

Attempting to summarize the issues raised in this thread:

Question 1
How will this multiplexing protocol handle out-of-band messages, including pubsub, client side caching, and others?
Answer 1
An out-of-band message received for a specific client is preceded by a header denoting that this message is targeted at that client. When many clients on the same multiplexed connection are subscribed to the same channel, for example, the message is sent multiple times, each copy preceded by a header indicating which client it is targeted at. We can further extend this to the multiplexed connection level, possibly adding a few headers followed by a single message. The client side can also optimize and register all its pub/sub subscriptions, for example, on a dedicated client while scattering the received messages.
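
As a rough illustration (the header syntax is hypothetical, and this assumes both collocated clients speak RESP3), a single published message delivered to two subscribed clients on one multiplexed connection would be sent twice:

id: 3, len: 35    # hypothetical header: copy for client 3
>3\r\n$7\r\nmessage\r\n$4\r\nnews\r\n$2\r\nhi\r\n
id: 8, len: 35    # the same push message, repeated for client 8
>3\r\n$7\r\nmessage\r\n$4\r\nnews\r\n$2\r\nhi\r\n

The extension mentioned above would collapse this into a single payload preceded by several headers.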

Question 2
Does this require a change in protocol? Can we use the attribute type for this?
Answer 2
Using attributes to capture the metadata can be challenging but feasible. To ensure fairness among different client responses in a multiplexed connection, Redis will, at times, send a response in chunks, with each chunk preceded by a header. The header may not necessarily fall between valid parts of the protocol; thus, in this case, attributes cannot be used. For instance, when returning a string response of hundreds of megabytes, it will be divided into smaller chunks, each preceded by a header. In this scenario, the chunk boundaries fall within the string itself. The same fairness option is also available for clients that send a large command and wish to send it in chunks to ensure fairness.

Question 3
Should MPXHELLO be the first command? I would avoid MPXHELLO and instead integrate this with the existing HELLO command, since if these are separate commands, we need to carefully define the interactions between them, in case a user sends both HELLO and MPXHELLO. Also, what if MPXHELLO is sent after other commands, does it return an error?
Answer 3
The MPXHELLO command should be the initial command, signaling to Redis that this connection is a multiplexed one. While HELLO might be sent for each client within the multiplexed connection, it is not intended for the connection itself unless we consider defining some inheritance mode. In the event that MPXHELLO is received after other commands, we may consider forcefully closing the connection. Alternatively, using another port for multiplexed connections is also an option, reducing the necessity for MPXHELLO.

Question 4
Can we use something like RESP-over-HTTP/2 or RESP-over-QUIC to handle head-of-line blocking?
Answer 4
When we combine several clients on the same connection, a large response written by one client, for example, hurts the serving of other clients on the same multiplexed connection. This yields different behavior for a regular client versus a collocated client on a multiplexed connection.

Chunking also lets clients send commands in chunks and provides fairness between the clients a connection serves.

Switching to RESP-over-QUIC could significantly simplify the implementation as well as solve the head-of-line blocking that still exists at the TCP level in our case (as in HTTP/2). However, most of the performance gain we were able to achieve in our implementation comes from using a single querybuf and a single cob. The commands are processed directly from the shared querybuf, and the responses are written directly to the shared cob (we fall back to a private one when needed for fairness, blocking, etc.). Using a different stream per client would generate more system calls and also hurt locality due to the excess of per-client query and output buffers.

Question 5
How would we handle authentication & CLIENT KILL commands? If a CLIENT KILL command kills the multiplexing connection, do all internal connections get killed, too?
Answer 5
A collocated client can be killed using the CLIENT KILL command, and the command should use the client ID since all collocated clients share the same IP:port. If a multiplexed connection is killed, then all its collocated clients are also terminated.

Question 6
What happens if you try to use a client id not associated with that socket?
Answer 6
The server will respond with an error. Client IDs used by the multiplexed protocol are unique for each multiplexed client, but different multiplexed connections may use the same IDs. These assumptions, of course, can be changed.

Question 7
Can you see which clients share a socket via CLIENT LIST?
Answer 7
Yes, since they share the IP and port, but it's a decision we need to make.

Question 8
what can a sub-client do / not do? can a sub-client reset? can it change protocol version just for that sub-client? it sounds like you do want it to be able to auth/hello to have a different identity.
Answer 8
A collocated client can perform any action, just like a regular client. Each collocated client can use a different protocol version, authentication, reset, etc.

Question 9
What about users that use a smart proxy in front of their redis servers, and in particular in front of cluster?
Answer 9
It can be an issue if the proxy uses a multiplexed connection and receives an MPXHELLO for one of its collocated clients. We can address this by adding a control header with an error in such cases. This header, in any case, needs to be parsed by the proxy, and the proxy should open a dedicated multiplexed connection for this client's multiplexed connection.

Question 10
I think the scariest thing to me in the proposal is support for multiple authentication contexts on a single port.
Answer 10
This is something we should discuss and decide during the design. It should be easy not to support it.

Question 11
I would love to see a very clear problem statement of what we're trying to solve with this level of proposed complexity.
Answer 11

  • Clients can easily manage sync interfaces without using a different connection for each client.
  • A new client does not require TLS handshakes, significantly reducing the overhead of connection storms since Redis is not dealing with TLS negotiation all the time. Connection storms can be a significant issue, especially on small servers when the negotiation is not offloaded.
  • Blocking commands do not block the whole connection but only the corresponding client.
  • Performance improved by 350% when using a few multiplexed connections with a few tens of clients on each versus using a single connection for each client. This shows performance similar to using a pipeline, albeit slightly lower due to the additional work needed for handling the client and metadata. The gain comes from better locality and fewer system calls. Network packets are also better utilized, which lets servers provide better throughput.
  • Future additions to Redis could be easily integrated and used since each client is semantically divided from other clients on the multiplexed connection.
  • Fairness can be achieved on both the client and server side; head-of-line blocking still exists at the TCP level, though.

Question 12
If we add this multiplexing feature, I hope it will not be RESP over a binary protocol. I wish that the multiplexing can be handled within RESP, either using RESP3 attributes or some new RESP4 chunked reply feature.
Answer 12
We can discuss this during the design; however, we will target RESP rather than a binary protocol.

Question 13
The volume of out-of-band messages, including client tracking invalidations, is typically huge. In this proposal, a client must still implement pipelining to a sub-client to avoid duplicated server-assisted invalidations, which makes this proposal less appealing
Answer 13
We can extend the work to return a single out-of-band message per multiplexed connection by extending the header/attribute for that purpose.

@madolson
Contributor

@redis/core-team I would read the top comment, the proposal summary, and the previous comment, which summarizes open questions.

My big pending question is who we think the user of this will be. This was originally envisioned by AWS primarily to better support proxies. We could implement it with that in mind, focusing primarily on solving proxy problems (such as performance and fairness, while owning a lot of complexity). Alternatively, we could envision it primarily as a performance enhancement for typical clients. Instead of the pipelining approach many clients take, they could implement this API instead.

If we decide this is mostly a proxy implementation, we should still consider implementing the blocking callbacks proposed by mgravell. If we think this is the better solution for clients and proxies alike, I think we should make sure this API is as friendly as possible to implement (ideally just extend RESP3).

Initially I was more of the opinion that all clients should adopt this protocol, but the more I think about it, the more I believe this really should be more of a proxy feature. Clients should be kept as simple as possible, and heavyweight libraries could adopt this protocol when they are interested in performance.

@oranagra
Member

related to topic 4:
i think the approach of using a protocol extension similar to the normal redis commands to multiplex different clients on the same TCP connection is limited, and if we try to apply it to solve some of the more advanced problems we'll have to make it complex.

i.e. in the sense of a single huge argument blocking other virtual clients. And i do think that in the case of a proxy we'd want to use a cut-through approach and avoid either side collecting a buffer to be sent only after we have all the data.

Trying to solve these problems, and others, with the proposed approach can quickly become complicated, so i'd like to suggest considering implementing both approaches (obviously separately, one now and one at some future date).
So if we think about it in this context, we can design something more limited for the shared connection (RESP based) thing and leave some of the more complicated problems to be solved by the other (e.g. QUIC or alike).

related to topics 5 and 7:
i suppose we can add CLIENT CONNLIST and CONNKILL commands or similar to manage connections, separately from clients.

@criatura2

Multiplexing can already be implemented on the client side with RESP3 quite easily, therefore I don't see any need for a new protocol.

However, in many cases a pipeline cannot be utilized, either due to command dependencies

In most cases this is solved by decoupling requests from the connection. For example, in Boost.Redis creating and sending a request looks like

request req;
req.set(...);
req.get(...);

conn.async_exec(req, ...)

where conn above is a connection object that maps onto a single physical TCP connection and is shared among multiple independent (e.g. HTTP) sessions. The connection will coalesce the requests of all conn.async_exec(req, ...) calls into a single payload and send them with a single write syscall, which can carry thousands of commands. I know most clients don't do this, but that's not the protocol's fault.

or because the client generates only a few commands per second.

This makes no sense.

There are things that would make de-multiplexing easier, for example if the SUBSCRIBE command had a non-push response, but that is a minor thing and not the protocol's fault; rather, it's the way the SUBSCRIBE command was defined.

@mgravell

mgravell commented Feb 23, 2024

Multiplexing can already be implemented on the client side with RESP3 quite easily, therefore I don't see any need for a new protocol.

I'm not sure it does. In case I've missed something, please demonstrate how to issue a BRPOP in a multiplexed way, that doesn't suffer head-of-line blocking. I'm not suggesting the fix for this needs to be a new protocol, but: I also don't think RESP3 moves the needle very much on this.

@criatura2

I'm not sure it does. In case I've missed something, please demonstrate how to issue a BRPOP in a multiplexed way, that doesn't suffer head-of-line blocking.

BRPOP and similar can't be multiplexed in RESP3 and require a new connection, I concede this. On the other hand with client-side-caching and RESP3 push types there is less need for blocking commands. Therefore IMO a new protocol would cover a small number of corner cases which I am not sure are important and therefore doesn't justify the complexity, but that is just my opinion.

@antirez Did you make thoughts about multiplexing when designing RESP3?

@masx200

masx200 commented Mar 30, 2024

any update?

@madolson
Contributor

@masx200 Redis is no longer open source. Non-Redis maintainers are moving here: https://github.com/valkey-io/valkey.
