
Add consumer flow control API #333

Closed
henry701 opened this issue May 6, 2023 · 6 comments

Labels: enhancement (New feature or request)
Milestone: 0.12.0

Comments
henry701 commented May 6, 2023

Is your feature request related to a problem? Please describe.

In high-volume scenarios, when consuming from the start of a very large stream (many chunks), a consumer that processes messages asynchronously and slowly will keep receiving chunks until the application runs out of memory.

Describe the solution you'd like

Expose a chunk-level flow control mechanism at the Consumer API level.

This could be implemented by exposing a ChunkListener on the Consumer builder. This listener callback would receive a Context object with information about the received chunk, much like the existing API design today.
The crucial part is that the Context would also have a callback so that the listener could control when to request additional chunks (by internally sending Credit messages using the Client).

Example (pseudocode off the top of my head; I'll elaborate and fix minor errors later, if applicable):

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Proposed API sketch: ChunkListener and ChunkContext do not exist yet
ConcurrentLinkedQueue<Consumer.ChunkListener.ChunkContext> unhandledChunks = new ConcurrentLinkedQueue<>();
AtomicLong pendingMessages = new AtomicLong(0);
Consumer consumer = environment.consumerBuilder()
  .stream("sampleStream")
  .name("application-1")
  .manualTrackingStrategy()
  .builder()
  .messageHandler((context, message) -> {
    /* handle message asynchronously here... */
    // Assume all remaining code inside this MessageHandler runs ASYNCHRONOUSLY,
    // after the message is processed, on an external Executor or Thread
    long localPendingMessages = pendingMessages.decrementAndGet();
    if (conditionToStore()) { // application-specific predicate
        context.storeOffset();
    }
    // We're out of messages: mark the chunk as handled so that the Client
    // sends a Credit request and more chunks come in later
    if (localPendingMessages == 0) {
        Consumer.ChunkListener.ChunkContext chunk = unhandledChunks.poll();
        if (chunk != null) { // guard against a race with the chunk listener
            chunk.markHandled();
        }
    }
  })
  .chunkListener(chunkContext -> {
    unhandledChunks.offer(chunkContext);
    // addAndGet (not incrementAndGet): account for the whole chunk at once
    pendingMessages.addAndGet(chunkContext.chunkSize());
    // chunkContext.markHandled();
    // Alternatively, expose the quantity of messages to ask for, via
    // .credits(int credits)? Or have both APIs?
  })
  .build();

Of course, this is just a naive implementation draft. Better flow control would have a buffer to ensure high throughput, instead of asking for credits only when the consumer is already idle.

The default chunkListener would just ask for additional credits immediately; this way, the current behavior is preserved in a backward-compatible way.
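
A minimal sketch of that default, assuming the proposed ChunkListener/ChunkContext API from the draft above (nothing here exists in the library yet):

// Hypothetical backward-compatible default: mark every chunk handled
// immediately, so a credit is sent right away, as the client does today.
Consumer.ChunkListener defaultChunkListener = chunkContext -> chunkContext.markHandled();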

As a bonus, this explicit flow control would make integration with RxJava possible as well, which could be very useful for integrating this library with Spring, for instance.

Describe alternatives you've considered

An alternative would be to consume messages synchronously from the connection thread, but this blocks some response handling and causes some RPC calls to time out, so it's not ideal.

Additional context

There are similar discussions in issue #262 and at https://stackoverflow.com/questions/71932072/add-delay-in-consuming-messages-in-rabbitmq-stream

henry701 added the enhancement label May 6, 2023
henry701 commented May 6, 2023

Also, I'm available as a contributor to implement this feature, if that's an option. Let me know.

acogoluegnes (Contributor) commented
Thanks for the analysis.

We indeed need a way to give some sort of control over the credit mechanism to the client API. I'd like to avoid references to chunks, as it's an implementation detail.

I had thought we could add a MessageHandler.Context#processed() method that the application would call as soon as it's done with a message. What's done behind this call is up to the client library. There could be different strategies (a sketch of the second one follows the list):

  • provides 1 credit for the first message of the chunk and does nothing for the next messages in the chunk. That's pretty much what we do currently, and it is a pretty aggressive strategy.
  • does nothing for the messages of the first half of the chunk, provides 1 credit for the message in the middle of the chunk, and nothing for the next messages in the chunk. New chunks should arrive soon enough and the consumer should not starve; that's a good tradeoff.
  • does nothing for the first messages and provides 1 credit for the last message of the chunk. That would work for slow consumers: a new chunk would have time to arrive during the processing of the last message.
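
For illustration, a minimal sketch of the second strategy, assuming a hypothetical CreditStrategy callback that the library would invoke on behalf of processed() (none of these names are real library API):

// Hypothetical sketch of the middle-of-chunk strategy; none of these
// types exist in the library.
interface CreditStrategy {
  // chunkSize: number of messages in the chunk;
  // messageIndex: 0-based position of the processed message in its chunk;
  // sendOneCredit: requests one more chunk from the broker
  void processed(long chunkSize, long messageIndex, Runnable sendOneCredit);
}

class MidChunkCreditStrategy implements CreditStrategy {
  @Override
  public void processed(long chunkSize, long messageIndex, Runnable sendOneCredit) {
    // nothing for the first half, 1 credit at the middle message,
    // nothing for the rest of the chunk
    if (messageIndex == chunkSize / 2) {
      sendOneCredit.run();
    }
  }
}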

WDYT?

henry701 commented May 15, 2023

Thanks for the response!

I can't entirely agree that chunks are an implementation detail, since they are part of the stream protocol itself, but I see where you're coming from: making this credit control available only at chunk granularity is not a perfect fit, abstraction-wise, for consumers that only care about messages.

I think that with careful API design, we can achieve both goals, those being:

  1. Expose a simple API for simple consumers that only care about messages
  2. Retain the capability to control flow based on chunks of messages, for consumers that care about it

Your idea of those three different strategies can be expressed as a group of strategies that mark each message as processed. Other strategies, such as chunk-based or manual-callback ones, wouldn't be concerned with the individual processing status of each message.

Instead of using MessageHandler.Context#processed() to mark each message as processed, we could have a getContext() method on those three strategies (we may make them implement an interface named MessageProcessingAwareFlowControlStrategy) that returns a new type, MessageProcessingAwareFlowControlStrategy.Context, which has a method void markProcessed(Message message).
This way we would avoid exposing a specific API to those who don't need it, while retaining the capability to use or create other flow control strategies for consumers that need the customization.
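
A rough sketch of that shape (hypothetical; only com.rabbitmq.stream.Message is an existing library type):

// Hypothetical sketch of the proposed interface; only Message is real.
import com.rabbitmq.stream.Message;

public interface MessageProcessingAwareFlowControlStrategy {

  // simple consumers never see this; strategies that track per-message
  // processing expose it so completions can be reported
  Context getContext();

  interface Context {
    // called when the application is done with a message; the strategy
    // decides whether this should translate into a Credit request
    void markProcessed(Message message);
  }
}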

Other very flexible API ideas that could stem from this approach are:

  • Ask for X credits while there are Y or fewer chunks left to process
  • Ask for X credits while there are Y or fewer messages left to process

Those two alone would offer a lot of "automagic" flow control to consumers, because you can specify the minimum number of messages or chunks it would be ideal to have in memory (queued or being processed) at any given point in time, oriented towards high throughput. A sketch of the chunk-based variant follows.
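
A minimal sketch of "ask for X credits while there are Y or fewer chunks left to process"; all names here are hypothetical, not library API:

import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical threshold-based flow control: request credits whenever the
// number of outstanding (unprocessed) chunks falls to a configured minimum.
class ChunkThresholdFlowControl {
  private final int minChunksInFlight;  // "Y": refill threshold
  private final int creditsPerRequest;  // "X": credits to ask for
  private final AtomicInteger chunksInFlight = new AtomicInteger();

  ChunkThresholdFlowControl(int minChunksInFlight, int creditsPerRequest) {
    this.minChunksInFlight = minChunksInFlight;
    this.creditsPerRequest = creditsPerRequest;
  }

  // call when a chunk is delivered
  void onChunkReceived() {
    chunksInFlight.incrementAndGet();
  }

  // call when a chunk is fully processed; returns credits to request
  int onChunkProcessed() {
    return chunksInFlight.decrementAndGet() <= minChunksInFlight
        ? creditsPerRequest
        : 0;
  }
}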

Your thoughts?

acogoluegnes (Contributor) commented

> I can't entirely agree that chunks are an implementation detail, since they are part of the stream protocol itself, but I see where you're coming from: making this credit control available only at chunk granularity is not a perfect fit, abstraction-wise, for consumers that only care about messages.

Right, I meant "implementation detail" for the "high-level" API of this client library (Consumer, etc).

The advantage of using something like MessageHandler.Context#processed() is that the context knows exactly where the message lies in the chunk (the Message interface does not know anything about chunks, and it should stay like this). I don't see how we could easily add another abstraction that knows both when the processing of a message ends and where that message sits in its chunk (MessageHandler.Context knows all this).

Feel free to submit a PR, I'll be happy to have a look and to discuss the design.

henry701 commented

> Right, I meant "implementation detail" for the "high-level" API of this client library (Consumer, etc).

Ah, sorry for the mixup there.

> The advantage of using something like MessageHandler.Context#processed() is that the context knows exactly where the message lies in the chunk (the Message interface does not know anything about chunks, and it should stay like this). I don't see how we could easily add another abstraction that knows both when the processing of a message ends and where that message sits in its chunk (MessageHandler.Context knows all this).

Good point; it would be easier to implement by receiving MessageHandler.Context instead of Message in markProcessed.
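
The adjusted hypothetical signature would then be:

// markProcessed now receives the context, which knows the message's
// position in its chunk (hypothetical signature, per this discussion)
void markProcessed(MessageHandler.Context messageContext);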

> Feel free to submit a PR, I'll be happy to have a look and to discuss the design.

I will submit a PR later.

Thanks!

acogoluegnes (Contributor) commented

Fixed in #374.

acogoluegnes added this to the 0.12.0 milestone Jul 17, 2023
acogoluegnes changed the title from "Consumer Flow Control" to "Add consumer flow control API" Jul 17, 2023
acogoluegnes added a commit to rabbitmq/rabbitmq-server that referenced this issue Jul 17, 2023
mergify bot pushed backport commits to rabbitmq/rabbitmq-server that referenced this issue Jul 17, 2023:

After changes for
rabbitmq/rabbitmq-stream-java-client#333.

(cherry picked from commits c594c77, f57c6db, 422be98, and 59cf432)