Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Messenger] Batch handlers v2 #47090

Closed
wants to merge 1 commit into from

Conversation

upyx
Copy link
Contributor

@upyx upyx commented Jul 28, 2022

Q A
Branch? 6.2
Bug fix? no
New feature? yes
Deprecations? possible...
Tickets nope
License MIT
Doc PR TBD

The Messenger component is a good core of asynchronous applications. Since my team utilizes event-driven architecture, we heavily rely on the library. We've been collecting many ideas on improvement over the past years. Most of them are about extending and flexibility, so they require internal changes to the library. Unfortunately, some changes block each other, so it's difficult to do them piece by piece. Nevertheless, I will try. The first small piece is below.

Introdution

The common example of batch handler (merged in #43354) is:

class MyBatchHandler implements BatchHandlerInterface
{
    use BatchHandlerTrait;

    public function __invoke(MyMessage $message, Acknowledger $ack = null)
    {
        return $this->handle($message, $ack);
    }

    private function process(array $jobs): void
    {
        foreach ($jobs as [$message, $ack]) {
            try {
                // [...] compute $result from $message
                $ack->ack($result);
            } catch (\Throwable $e) {
                $ack->nack($e);
            }
        }
    }
}

Where the compute $result from $message is a place for real handler job. The other things are part of batch processing itself, which the handler shouldn't care about.

Proposal

In my vision, the ideal batch handler would look like this:

#[AsMessageHandler(batchStrategy: new BatchCount(10))]
class MyBatchHandler
{
    public function __invoke(MyMessage ...$messages)
    {
        // handle $messages
    }
}

Or an interface alternative to an attribute:

class MyBatchHandler implements BatchStrategyProviderInterface
{
    public function __invoke(MyMessage ...$messages)
    {
        // handle $messages
    }

    public function getBatchStrategy(): BatchStrategyInterface
    {
        return new CountBatchStrategy(10);
    }
}

A bit easier, isn't it? However, this way disables the features which we are used to: separate acknowledging and returning results. There is an option to return them, I'll descire it below.

How batches work

A batch processing consists of parts:

  1. Buffering
  2. Handling the buffer
  3. Acknowledge received messages

The first part (buffering) can be easily done by the library internals. The only thing it needs is trigger where to flush the buffer (run a handler). The second part is the handler's job. And the acknowledging... hm, it's some opinionable part.

Triggering a handler

Generally, rules for triggering a handler are more complex than just message counting. Most likely they would look like: "do not process a batch faster than once per 10 seconds if the buffer is less than 100 messages". Rules can be different, and most likely they would be shared between handlers in a group.

class MyBatchStrategy implements BatchStrategyInterface
{
    private int $bufferSize = 0;
    private int $lastHandledNanoseconds = 0;

    public function shouldHandle(object $lastMessage): bool
    {
        return ($lastMessage->forceHandling)
            || (++$this->bufferSize >= 100)
            || (\hrtime(true) - $this->lastHandledNanoseconds > 1_000_000_000);
    }

    public function beforeHandle(): void
    {
        $this->bufferSize = 0;
    }

    public function afterHandle(): void
    {
        $this->lastHandledNanoseconds = \hrtime(true);
    }
}

Acknowledging

I believe that a batch must be acknowledged or rejected entirely at once. Because it's a batch. If someone needs to acknowledge messages separately, it means that there is no batch but some infrastructure specific optimization or misuse. So, if a handler finishes without an error, the entire batch should be acknowledged.

Returning results

Handlers shouldn't care how they are run synchronously or asynchronously, they should do their job interchangeably and never return a result, but save it or dispatch further. Here is a perfect explanation of the difference between Messenger and Event Dispatcher. It refers to the main architecture idea of the Messenger which makes it so powerful: it never communicates back.

The possibility of returning results through synchronous transport is a historical accident and a misuse of the tool.

Use legacy features

I absolutely sure they are useless. However, it's might be overkill to change all legacy code. As I promised, there is a way to use them:

#[AsMessageHandler(batchStrategy: new BatchCount(10))]
class MyBatchHandler
{
    public function __invoke(Result $r, MyMessage ...$messages)
    {
        $results = $errors = [];
        try {
                // compute $results from $messages
        } catch (\Throwable $e) {
                // save $errors
        }
        foreach ($results as [$message, $result]) $r->ack($message, $result);
        foreach ($errors as [$message, $error]) $r->reject($message, $error);
    }
}

The Result object has a SplObjectStorage that maps messeges with Acknowledger, so it can be shared.

Deprecations

I'd like to deprecate BatchHandlerInterface, Acknowledger and HandledStamp::getResult().

Implementation

There is no proper implementation yet. The PR has a dirty draft to show an idea and play with it. Some tests failed because of a new optional argument in HandlerDescriptor, but it works.

@carsonbot carsonbot added Status: Needs Review Feature Messenger RFC RFC = Request For Comments (proposals about features that you want to be discussed) labels Jul 28, 2022
@carsonbot carsonbot added this to the 6.2 milestone Jul 28, 2022
@carsonbot carsonbot changed the title [RFC][Messenger] Batch handlers v2 [Messenger] Batch handlers v2 Jul 28, 2022
@nicolas-grekas
Copy link
Member

Hi, thanks for the proposal and for opening the discussion.

I believe that a batch must be acknowledged or rejected entirely at once. Because it's a batch. If someone needs to acknowledge messages separately, it means that there is no batch but some infrastructure specific optimization or misuse. So, if a handler finishes without an error, the entire batch should be acknowledged.

I have a different opinion here: acknowledging each message separately provides flexibility in handling batches. Eg one can process N messages in parallel and ack/nack as needed. This is a desired feature to me. On the contrary, acknowledging the batch itself would give a false impression that batches are transactions. They're not, since we can't guarantee anything from a transactional pov.

I fear that this difference invalidates most of the proposal :(

@upyx
Copy link
Contributor Author

upyx commented Jul 29, 2022

I have a different opinion here

So let's discuss it :)

Eg one can process N messages in parallel and ack/nack as needed. This is a desired feature to me.

Interesting... How do batches help here? When I want parallel processing, I run N workers with "single" handler. Or I can create a custom middleware to process few messages in parallel by one worker with fiber (in PHP 8.2).

On the contrary, acknowledging the batch itself would give a false impression that batches are transactions. They're not, since we can't guarantee anything from a transactional pov.

We can't, it's true. However, a user can! And it's a typical thing that users I do with batch handlers.

Anyway, I've seen batches a lot, and have never (can't remember at least) seen that partially acknowledging helps. Could you share your experience and give some example?

I fear that this difference invalidates most of the proposal :(

Absolutely not :) The Result object in the example is used to acknowledge and set results to messages separately.

@nicolas-grekas
Copy link
Member

nicolas-grekas commented Aug 1, 2022

The Result object in the example is used to acknowledge and set results to messages separately

Note that the current design keeps LSP, aka batch handlers remain valid regular handlers. Putting Result as first argument breaks this since batch handlers can't be called as regular handlers anymore. We'd need to preserve this capability.

@lyrixx
Copy link
Member

lyrixx commented Aug 1, 2022

I agree with @nicolas-grekas about

acknowledging each message separately provides flexibility in handling batches


Anyway, I've seen batches a lot, and have never (can't remember at least) seen that partially acknowledging helps. Could you share your experience and give some example?

Some exemples

  • Instead of calling Elasticsearch each time something changes, we buffer everything and we call the bulk API. It much more efficient for ES. Then is a document cannot be inserted, we can reject the message and send it to a dead letter queue for later inspection.
  • Instead of calling a rate limited API (of translation, but we don't care), we buffer all message, and call it every X secondes, or when the buffer is full. (see https://jolicode.com/blog/batching-symfony-messenger-messages for more information - and an old implementation)

In both situation, we are not in a "transactional mode". So beeing able to ack all messages one by one is a must have feature.

About the API, I prefer a new interface too instead of a trait. But we already discussed that in the PR (or mine, I don't remember). And the trait was finally chosen.

@upyx
Copy link
Contributor Author

upyx commented Aug 2, 2022

The more features, the more code, the harder to keep things clear and flexible. I'm trying to simplify handlers. Especially batch handlers.

@nicolas-grekas

Note that the current design keeps LSP, aka batch handlers remain valid regular handlers. Putting Result as first argument breaks this since batch handlers can't be called as regular handlers anymore. We'd need to preserve this capability.

The capability preserved by HandlerDescriptor and HandlerLocator semantically. There is no superclass to keep LSP by language features.

I'd like to make a single (m.b. internal) interface for all kinds of handlers and few sequential adapters which simplify work for users and keep BC. So the user would make a choice to implement the fullfeatured interface or "just a handler" part. It moves the logic about batches, buffers, acknowledging, and so on from the middleware internals (see how simple it was originally) and incapluletes it in places that are intended to.

Than you for examples @lyrixx !

Well, I see a lot of developers probably want to see that feature...

I won't argue for this.

see https://jolicode.com/blog/batching-symfony-messenger-messages for more information

We are using something close to that right now. As you can see, a batch handler processes a message with the whole batch.

@upyx
Copy link
Contributor Author

upyx commented Aug 2, 2022

Anyway, is this interesting propose or just close it?

@upyx
Copy link
Contributor Author

upyx commented Aug 23, 2022

The answer is clear. Maybe next time...

@upyx upyx closed this Aug 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Messenger RFC RFC = Request For Comments (proposals about features that you want to be discussed) Status: Needs Review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants