
Add consumer flow control API #333

Closed
henry701 opened this issue May 6, 2023 · 6 comments

Labels: enhancement (New feature or request)
Milestone: 0.12.0

Comments
henry701 commented May 6, 2023

Is your feature request related to a problem? Please describe.

In high-volume scenarios, when consuming from the start of a very large stream (many chunks), a consumer that processes messages asynchronously and slowly will keep receiving chunks until the application runs out of memory.

Describe the solution you'd like

Expose a chunk-level flow control mechanism at the Consumer API level.

This could be implemented by exposing a ChunkListener on the Consumer builder. This listener callback would receive a Context object with information about the received chunk, much like the existing API design today.
The crucial part is that the Context would also have a callback so that the listener could control when to request additional chunks (by internally sending Credit messages using the Client).

Example (pseudocode off the top of my head; I'll elaborate and fix minor errors later, if applicable):

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Proposed API sketch: ChunkListener and ChunkContext do not exist yet
ConcurrentLinkedQueue<Consumer.ChunkListener.ChunkContext> unhandledChunks = new ConcurrentLinkedQueue<>();
AtomicLong pendingMessages = new AtomicLong(0);
Consumer consumer = environment.consumerBuilder()
  .stream("sampleStream")
  .name("application-1")
  .manualTrackingStrategy()
  .builder()
  .messageHandler((context, message) -> {
    /* handle message asynchronously here... */
    // Assume all remaining code inside this MessageHandler runs ASYNCHRONOUSLY,
    // after the message is processed, on an external Executor or Thread
    long localPendingMessages = pendingMessages.decrementAndGet();
    if (conditionToStore()) { // application-specific predicate
        context.storeOffset();
    }
    // We're out of messages: mark the chunk as handled so that the Client
    // sends a Credit request and more chunks come in later
    if (localPendingMessages == 0) {
        Consumer.ChunkListener.ChunkContext chunk = unhandledChunks.poll();
        if (chunk != null) { // guard against a race with the chunk listener
            chunk.markHandled();
        }
    }
  })
  .chunkListener(chunkContext -> {
    unhandledChunks.offer(chunkContext);
    // addAndGet (not incrementAndGet): account for the whole chunk at once
    pendingMessages.addAndGet(chunkContext.chunkSize());
    // chunkContext.markHandled();
    // Alternatively, expose the quantity of messages to ask for, via
    // .credits(int credits)? Or have both APIs?
  })
  .build();

Of course, this is just a naive implementation draft. Better flow control would have a buffer to ensure high throughput, instead of asking for credits only when the consumer is already idle.

The default chunkListener would just ask for additional credits immediately; this way, the current behavior is preserved in a backward-compatible way.
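
A minimal sketch of that default, assuming the proposed ChunkListener/ChunkContext API from the draft above (nothing here exists in the library yet):

// Hypothetical backward-compatible default: mark every chunk handled
// immediately, so a credit is sent right away, as the client does today.
Consumer.ChunkListener defaultChunkListener = chunkContext -> chunkContext.markHandled();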

As a bonus, this explicit flow control would make integration with RxJava possible as well, which could be very useful for integrating this library with Spring, for instance.

Describe alternatives you've considered

An alternative would be to consume messages synchronously from the connection thread, but this blocks some response handling and causes some RPC calls to time out, so it's not ideal.

Additional context

There are similar discussions in issue #262 and at https://stackoverflow.com/questions/71932072/add-delay-in-consuming-messages-in-rabbitmq-stream

henry701 added the enhancement label May 6, 2023
henry701 commented May 6, 2023

Also, I'm available as a contributor to implement this feature, if that's an option. Let me know.

acogoluegnes (Contributor) commented
Thanks for the analysis.

We indeed need a way to give some sort of control over the credit mechanism to the client API. I'd like to avoid references to chunks, as it's an implementation detail.

I had thought we could add a MessageHandler.Context#processed() method that the application would call as soon as it's done with a message. What's done behind this call is up to the client library. There could be different strategies (a sketch of the second one follows the list):

  • provides 1 credit for the first message of the chunk and does nothing for the next messages in the chunk. That's pretty much what we do currently, and it is a pretty aggressive strategy.
  • does nothing for the messages of the first half of the chunk, provides 1 credit for the message in the middle of the chunk, and nothing for the next messages in the chunk. New chunks should arrive soon enough and the consumer should not starve; that's a good tradeoff.
  • does nothing for the first messages and provides 1 credit for the last message of the chunk. That would work for slow consumers: a new chunk would have time to arrive during the processing of the last message.
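
For illustration, a minimal sketch of the second strategy, assuming a hypothetical CreditStrategy callback that the library would invoke on behalf of processed() (none of these names are real library API):

// Hypothetical sketch of the middle-of-chunk strategy; none of these
// types exist in the library.
interface CreditStrategy {
  // chunkSize: number of messages in the chunk;
  // messageIndex: 0-based position of the processed message in its chunk;
  // sendOneCredit: requests one more chunk from the broker
  void processed(long chunkSize, long messageIndex, Runnable sendOneCredit);
}

class MidChunkCreditStrategy implements CreditStrategy {
  @Override
  public void processed(long chunkSize, long messageIndex, Runnable sendOneCredit) {
    // nothing for the first half, 1 credit at the middle message,
    // nothing for the rest of the chunk
    if (messageIndex == chunkSize / 2) {
      sendOneCredit.run();
    }
  }
}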

WDYT?

henry701 commented May 15, 2023

Thanks for the response!

I can't entirely agree that chunks are an implementation detail, since they are part of the stream protocol itself, but I see where you're coming from: making this credit control available only at chunk granularity is not a perfect fit, abstraction-wise, for consumers that only care about messages.

I think that with careful API design, we can achieve both goals, those being:

  1. Expose a simple API for simple consumers that only care about messages
  2. Retain the capability to control flow based on chunks of messages, for consumers that care about it

Your idea of those three different strategies can be expressed as a group of strategies that mark each message as processed. Other strategies, such as chunk-based or manual-callback ones, wouldn't be concerned with the individual processing status of each message.

Instead of using MessageHandler.Context#processed() to mark each message as processed, we could have a getContext() method on those three strategies (we may make them implement an interface named MessageProcessingAwareFlowControlStrategy) that returns a new type, MessageProcessingAwareFlowControlStrategy.Context, which has a method void markProcessed(Message message).
This way we would avoid exposing a specific API to those who don't need it, while retaining the capability to use or create other flow control strategies for consumers that need the customization.
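
A rough sketch of that shape (hypothetical; only com.rabbitmq.stream.Message is an existing library type):

// Hypothetical sketch of the proposed interface; only Message is real.
import com.rabbitmq.stream.Message;

public interface MessageProcessingAwareFlowControlStrategy {

  // simple consumers never see this; strategies that track per-message
  // processing expose it so completions can be reported
  Context getContext();

  interface Context {
    // called when the application is done with a message; the strategy
    // decides whether this should translate into a Credit request
    void markProcessed(Message message);
  }
}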

Other very flexible API ideas that could stem from this approach are:

  • Ask for X credits while there are Y or fewer chunks left to process
  • Ask for X credits while there are Y or fewer messages left to process

Those two alone would offer a lot of "automagic" flow control to consumers, because you can specify the minimum number of messages or chunks it would be ideal to have in memory (queued or being processed) at any given point in time, oriented towards high throughput. A sketch of the chunk-based variant follows.
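
A minimal sketch of "ask for X credits while there are Y or fewer chunks left to process"; all names here are hypothetical, not library API:

import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical threshold-based flow control: request credits whenever the
// number of outstanding (unprocessed) chunks falls to a configured minimum.
class ChunkThresholdFlowControl {
  private final int minChunksInFlight;  // "Y": refill threshold
  private final int creditsPerRequest;  // "X": credits to ask for
  private final AtomicInteger chunksInFlight = new AtomicInteger();

  ChunkThresholdFlowControl(int minChunksInFlight, int creditsPerRequest) {
    this.minChunksInFlight = minChunksInFlight;
    this.creditsPerRequest = creditsPerRequest;
  }

  // call when a chunk is delivered
  void onChunkReceived() {
    chunksInFlight.incrementAndGet();
  }

  // call when a chunk is fully processed; returns credits to request
  int onChunkProcessed() {
    return chunksInFlight.decrementAndGet() <= minChunksInFlight
        ? creditsPerRequest
        : 0;
  }
}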

Your thoughts?

acogoluegnes (Contributor) commented

> I can't entirely agree that chunks are an implementation detail, since they are part of the stream protocol itself, but I see where you're coming from: making this credit control available only at chunk granularity is not a perfect fit, abstraction-wise, for consumers that only care about messages.

Right, I meant "implementation detail" for the "high-level" API of this client library (Consumer, etc).

The advantage of using something like MessageHandler.Context#processed() is that the context knows exactly where the message lies in the chunk (the Message interface does not know anything about chunks, and it should stay like this). I don't see how we could easily add another abstraction that knows both when the processing of a message ends and where that message sits in its chunk (MessageHandler.Context knows all this).

Feel free to submit a PR, I'll be happy to have a look and to discuss the design.

henry701 commented

> Right, I meant "implementation detail" for the "high-level" API of this client library (Consumer, etc).

Ah, sorry for the mixup there.

> The advantage of using something like MessageHandler.Context#processed() is that the context knows exactly where the message lies in the chunk (the Message interface does not know anything about chunks, and it should stay like this). I don't see how we could easily add another abstraction that knows both when the processing of a message ends and where that message sits in its chunk (MessageHandler.Context knows all this).

Good point; it would be easier to implement by receiving MessageHandler.Context instead of Message in markProcessed.
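
The adjusted hypothetical signature would then be:

// markProcessed now receives the context, which knows the message's
// position in its chunk (hypothetical signature, per this discussion)
void markProcessed(MessageHandler.Context messageContext);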

> Feel free to submit a PR, I'll be happy to have a look and to discuss the design.

I will submit a PR later.

Thanks!

acogoluegnes (Contributor) commented

Fixed in #374.

acogoluegnes added this to the 0.12.0 milestone Jul 17, 2023
acogoluegnes changed the title from "Consumer Flow Control" to "Add consumer flow control API" Jul 17, 2023
acogoluegnes added a commit to rabbitmq/rabbitmq-server that referenced this issue Jul 17, 2023
mergify bot pushed backport commits to rabbitmq/rabbitmq-server that referenced this issue Jul 17, 2023:

After changes for
rabbitmq/rabbitmq-stream-java-client#333.

(cherry picked from commits c594c77, f57c6db, 422be98, and 59cf432)