
[NEW] Feature proposal: async "block" callbacks #12716

Open
mgravell opened this issue Nov 2, 2023 · 12 comments
@mgravell

mgravell commented Nov 2, 2023

At the moment, there are 3 categories of redis interactions:

  • request/response
  • blocking request/response (streams, pops, etc)
  • out-of-band (pub/sub, etc)

As a library author, we receive conflicting requests from users to:

  1. Support the blocking operations, and
  2. Minimize the number of connections (we hear this a lot)

These two goals are mutually exclusive today, because each pending blocking operation ties up an entire connection.

I would like to propose a new category of API usage, to bridge this gap: activated callbacks.

Consider the following scenario:

  • the client issues a request that would historically have been blocking - stream consumption, for example - with a new flag to toggle this mode: XREAD BLOCK 5000 ASYNC ... or similar
  • if no data is available immediately, the server responds with a message like +ASYNC 162748, where the second part is a connection-specific token issued by the server
  • the state is logged, and the connection continues serving requests
  • at some later point, either data becomes available or the timeout occurs, and the server issues an out-of-band message, presumably of category ASYNC, including the specific token and the result payload

The result of this is that "blocking" operations can now be issued efficiently without tying up a connection entirely. Multiple "blocking" async operations could be pending on a single connection.
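As a concrete illustration, the exchange might look something like the following hypothetical wire trace (the ASYNC flag, the token format, and the push framing are all illustrative sketches here, not a proposed spec):

```
client> XREAD BLOCK 5000 ASYNC STREAMS mystream $
server> +ASYNC 162748            <- token issued; the connection is free again
client> GET unrelated-key        <- normal request/response traffic continues
server> $3
server> foo
        ...later, data arrives on mystream (or the 5000ms timeout fires)...
server> >3                       <- RESP3 out-of-band push
server> +ASYNC
server> :162748                  <- the token from above
server> *1 ...                   <- the XREAD payload (or a nil/timeout marker)
```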

Obviously the out-of-band nature here demands either RESP3 or a mechanism to specify an auxiliary connection id to use for the callbacks. In reality, I'm tempted to say "make this RESP3 only", to avoid inter-connection complications.

This feature would ideally apply similarly to all "blocking" operations. I think it can be applied to the existing commands as an argument, although if there is confusion maybe it also makes sense as a prefix to commands, like the client caching prefix.

Clients would be expected to store the server-issued token and use it to correlate the eventual response. All async tokens would be single-shot only, meaning: they expect at most one reply (zero if the connection dies before the token is activated, which the client should handle in some way).
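Client-side, the single-shot token bookkeeping described above could be as simple as the following sketch (Python stand-in; the class and method names are hypothetical and not part of any real client):

```python
import threading

class AsyncTokenRegistry:
    """Maps server-issued ASYNC tokens to single-shot pending results."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # token -> {"event": Event, "result": payload}

    def register(self, token):
        # Called when the server replies +ASYNC <token>.
        with self._lock:
            slot = {"event": threading.Event(), "result": None}
            self._pending[token] = slot
            return slot

    def complete(self, token, payload):
        # Single-shot: the token is removed on first delivery;
        # a duplicate push for the same token is ignored.
        with self._lock:
            slot = self._pending.pop(token, None)
        if slot is None:
            return False
        slot["result"] = payload
        slot["event"].set()
        return True

    def fail_all(self, error):
        # Connection died: every still-pending token resolves with the error,
        # matching the "zero replies if the connection dies" case above.
        with self._lock:
            pending, self._pending = self._pending, {}
        for slot in pending.values():
            slot["result"] = error
            slot["event"].set()
```

A waiting caller would block on `slot["event"].wait(timeout)` and then read `slot["result"]`, while the connection itself keeps serving other requests.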

@soloestoy
Collaborator

Very interesting. Intuitively, it seems like it could have broader applications beyond just block operations. For example, introducing the concept of sessions to decouple the server and connection. It's just an idea, and I haven't thought about it in depth yet : )

@zuiderkwast
Contributor

It's a very good idea. In Erlang, it is common that multiple lightweight processes share one connection, where all commands are streamed to redis and the client lib keeps track of the replies and delivers them back to the right caller. Blocking commands are very problematic with this usage. I believe it's the same problem with async-await style programming in other languages.

I don't think it should be a new argument to every blocking command. It's better to have a single command that enables this mode, like ASYNC BLOCKING ON or an option to HELLO, because it is the client lib that should enable this rather than the app which calls the individual commands. To the app code, the call can still look like a blocking call.
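In that model, the opt-in might look something like this (entirely hypothetical syntax, just to illustrate that the per-command surface stays unchanged):

```
client> HELLO 3 ASYNC-BLOCKING on   <- client lib opts in once, at handshake
client> BLPOP mylist 0              <- app-facing command is unchanged
server> +ASYNC 7                    <- lib parks the caller; to the app it
                                       still looks like a blocking call
```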

@yossigo
Member

yossigo commented Nov 20, 2023

@zuiderkwast Doesn't this Erlang behavior result in undesired head-of-line blocking and excess latency when command latency is not uniform? @soloestoy's point about decoupling sessions from connections would also mean the ordering of replies can be arbitrary and head-of-line blocking can be avoided.

@zuiderkwast
Contributor

Doesn't this Erlang behavior result in undesired head-of-line blocking and excess latency when command latency is not uniform?

@yossigo Yes, potentially it can, but it's good for throughput. We keep track of how many commands are pending on each connection and start throttling (dropping commands) if Redis can't keep up the pace. (Using more connections doesn't help if Redis is the bottleneck, though scaling to more cluster shards does help.)

If some code needs to use a blocking command or a slow command, it would need to use a separate instance of the client (or another client).

@soloestoy's point about decoupling sessions from connections would also mean the ordering of replies can be arbitrary and head-of-line blocking can be avoided.

I don't understand how decoupling sessions from connections would work exactly. @soloestoy's idea seemed vague. Maybe you have a more specific idea? Should every command and response be accompanied by a session ID? Or something like making all commands return in the ASYNC way? I think that would be excessive. If Redis has a response straight away, better to deliver it immediately. But if some slow command can be computed incrementally (let's say KEYS) and the client can handle async results, then sure, we can send an ASYNC response for this kind of command too and expect the client to handle it just like a blocking command.

@zuiderkwast
Contributor

The blocking commands have a nice property when it comes to transactions: they can't block inside transactions. Thus, we don't need to worry about async responses for blocking commands inside transactions.

@mgravell In which ways do clients and users minimize the number of connections?

Apart from what we do (multiple lightweight threads sharing the same connection, could also be coroutines), I believe an obvious way is with an async API, like redis.asyncCommand("GET x", function callback(reply) { ... }); which allows issuing multiple commands before the previous commands have returned. Are there other ways?

@mgravell
Author

@zuiderkwast sorry, didn't see that update. In SE.Redis this is all hidden inside the library, so from the customer's perspective they just spin up the library API at the start and make requests; the library deals with routing (cluster, replicas, etc) and the ordering of commands on individual connections. Since it is .NET, async continuations work with the await pattern rather than passing in a callback, but fundamentally the caller wouldn't care - they'd just do something like

var val = await db.StringGetAsync(key);

The library deals with all the concurrency concerns, i.e. we fully expect that multiple code-paths (independent requests, whatever) could be issuing requests to the same db instance in parallel - it is entirely thread safe.

My intention would be that we could implement blocking operations in entirely the same way, i.e.

var val = await db.SomeBlockingMethod(...); // for example XREAD BLOCK

so: from their perspective, it works 100% identically; behind the scenes the library would deal with hooking things up such that we can signal the pending operation as completed (whether via success, timeout, or something worse)
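The "hooking things up" part can be sketched in a few lines. The following is a toy Python stand-in for that client-side plumbing (the class, the method names, and the token plumbing are all hypothetical; a real client would parse the +ASYNC reply and the push frames off the wire):

```python
import asyncio

class MiniClient:
    """Toy model of a multiplexing client: many awaiting callers share one
    connection, each parked on a future keyed by its server-issued token."""

    def __init__(self):
        self._pending = {}  # token -> asyncio.Future

    async def blocking_read(self, token_from_server):
        # In a real client the token would come from the server's +ASYNC
        # reply; here the caller supplies it so the sketch stays self-contained.
        fut = asyncio.get_running_loop().create_future()
        self._pending[token_from_server] = fut
        return await fut  # caller parks here; the connection stays usable

    def on_push(self, token, payload):
        # Invoked by the connection's reader loop when the out-of-band
        # ASYNC push for this token arrives (success, timeout, or error).
        fut = self._pending.pop(token, None)
        if fut is not None and not fut.done():
            fut.set_result(payload)

async def demo():
    c = MiniClient()
    caller = asyncio.create_task(c.blocking_read("162748"))
    await asyncio.sleep(0)             # let the caller park on its future
    c.on_push("162748", b"stream-entry")  # simulate the server's push
    return await caller
```

Running `asyncio.run(demo())` returns `b"stream-entry"`: the awaiting caller sees an ordinary awaited result, exactly as with `await db.SomeBlockingMethod(...)` above.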

@zuiderkwast
Contributor

@mgravell Right, async programming can be done in many ways. Any client reusing the same connection for multiple "users" internally needs to take care of a lot of special cases, like commands that affect the state of a connection (WATCH, for example); automatic reconnects are another tricky topic if any keys are watched when they happen. Just a few examples.

That said, I don't see any new problem with this async-blocking feature. I think it's straightforward to implement in a client.

@yossigo Speaking of head-of-line blocking, the worst example of head-of-line blocking is caused by blocking commands. :) So this feature would eliminate that, assuming all other commands are reasonably fast.

@soloestoy
Collaborator

My idea is that the "async callback" can be used not only for blocking commands but for all Redis commands. Additionally, we can achieve decoupling of command execution and connection by adding a session layer, enabling connection reuse in unordered scenarios. For example, when multiple clients share a TCP connection, even though some clients already have similar implementations, Redis cannot differentiate between different clients behind a single TCP connection. Therefore, connection reuse relies on strict order consistency for command sending and receiving replies. Furthermore, operations with states such as block, watch, and pubsub cannot reuse connections.

If we add a session layer, we can achieve complete connection reuse for clients, regardless of command order and state. For example, when multiple clients share a connection, each client can initialize by requesting and obtaining session information from Redis, including an ID and seq (a sequence number). As a result, each client will have a unique session ID. Within a single session, the sequence number can guarantee the internal order of commands within the session.

For example, let's consider two clients, A and B, using a shared TCP connection to access Redis. During initialization, they each obtain their respective sessions: A-0 and B-0. In subsequent command sending, they both need to include their session information (A-0 and B-0). In synchronous mode, Redis immediately returns the result after executing a command (the result also includes the session information of the client that sent the command), and the client waits for the execution result. In asynchronous mode, after executing a command, Redis does not proactively return the result to the client. Instead, it temporarily caches the result (the cached result should also have a timeout, such as 10 seconds), and the client can periodically use the session to query the result from Redis.

Furthermore, with the presence of sessions, clients no longer need to use the same TCP connection to asynchronously retrieve command results. As long as they provide the correct session information, they can use any available connection to fetch the results, effectively decoupling the connection. In this scenario, even if a network anomaly causes a disconnection, it does not affect the interaction between the client and Redis. Both the client and Redis have recorded the session information and no longer rely on maintaining the connection state. Client-side command retries are also safer in this case. The client can determine whether its previously sent commands were received based on the id and sequence number recorded in Redis' current sessions. On the Redis side, even if the client sends duplicate commands, they can be rejected based on the session's ID-seq, effectively identifying and rejecting the duplicates.
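The (ID, seq) bookkeeping on the server side can be sketched like this (a toy Python model of the proposed session layer; every name and the exact dedup/caching semantics are illustrative, not an agreed design):

```python
class SessionTable:
    """Toy model of the proposed session layer: each command arrives tagged
    with (session_id, seq); results are cached so any connection can fetch
    them, and stale retries are rejected by sequence number."""

    def __init__(self):
        self._last_seq = {}  # session_id -> highest seq already executed
        self._results = {}   # (session_id, seq) -> cached result

    def execute(self, session_id, seq, command_fn):
        last = self._last_seq.get(session_id, -1)
        if seq <= last:
            # Duplicate or stale retry: reject instead of re-executing,
            # returning the cached result if we still have it.
            return ("DUPLICATE", self._results.get((session_id, seq)))
        result = command_fn()
        self._last_seq[session_id] = seq
        self._results[(session_id, seq)] = result  # cached for async fetch
        return ("OK", result)

    def fetch(self, session_id, seq):
        # Any connection presenting the right session info can fetch the
        # cached result - the result is decoupled from the TCP connection.
        return self._results.get((session_id, seq))
```

A real implementation would also expire cached results (the 10-second timeout mentioned above) and garbage-collect dead sessions.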

@zuiderkwast
Contributor

@soloestoy Interesting. This idea is very similar to #12873.

@rueian

rueian commented Dec 23, 2023

I like the idea of the async block. This looks simple and allows reusing a connection among blocking commands.

I wish this could be done with a few modifications: no need to specify "ASYNC" as part of the command, since I don't want library users to be able to turn async off.

@mgravell
Author

mgravell commented Dec 23, 2023

@soloestoy I agree that there are some merits to a fully multiplexed protocol, but this is a huge upheaval - a change an order of magnitude bigger than RESP3, and support for RESP3 is still patchy now. My concern is whether we can get most of the benefits without that level of complexity.

@nihohit
Contributor

nihohit commented Feb 4, 2024

This is beyond the scope of this proposal, but I wonder if this will allow us to hand over slow commands (e.g. KEYS) to another thread, and continue operating without those commands blocking the other operations.
