[kafka] RdKafkaProducer has no way of error handling #749

@Steveb-p

Description

When an error occurs in RdKafkaProducer, it is silently ignored. More accurately, since it occurs outside the main thread of php/php-fpm, it is never reported back.

A simple test demonstrates the issue: sending a message to a non-existent Kafka server produces no error at all (but see point 2 below).

This results in two things:

  1. The process has no way of knowing that a message that was supposed to be delivered will not be. The producer returns immediately without waiting for the message to be acknowledged.
  2. Due to how this is handled in arnaud-lb/php-rdkafka, the process that was supposed to send the message blocks and retries the operation for a long time (around 5 minutes in my testing; I wasn't able to find a configuration option to change this). In my case, with the default configuration of the php-fpm Docker image, the fpm pool became locked after 5 requests, since that's the default maximum number of spawned php-fpm children.
    While this particular part can't really be fixed inside enqueue, it matters because the message might actually be delivered later on.
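
A possible lead on the roughly five-minute window in point 2: librdkafka has a topic-level `message.timeout.ms` property whose default is 300000 ms, which would match what I observed. I haven't confirmed that this is the setting responsible, so treat the following as a sketch under that assumption. It requires the rdkafka extension; the broker address, topic name and 10-second value are placeholders.

```php
<?php
// Assumption: the long retry window is governed by librdkafka's
// `message.timeout.ms` (default 300000 ms). Not verified for this case.

$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', 'localhost:9092'); // placeholder broker

$topicConf = new RdKafka\TopicConf();
// Give up on an undelivered message after 10 s instead of 5 min.
$topicConf->set('message.timeout.ms', '10000');

$producer = new RdKafka\Producer($conf);
$topic = $producer->newTopic('example-topic', $topicConf); // placeholder topic
```

In enqueue, the topic configuration reaches `newTopic()` through `$destination->getConf()`, so that is presumably where such a value would have to be set.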

I'd like to ask for opinions on how RdKafkaProducer should handle this situation. IMO it is worthwhile to add a configuration option that makes sending messages synchronous for this particular Producer, or at least waits a specified amount of time for the message to be acknowledged.
This can be done by introducing this code at the end of the send method:

$topic = $this->producer->newTopic($destination->getTopicName(), $destination->getConf());
$topic->produce($partition, 0 /* must be 0 */, $payload, $key);

// Wait up to 10 seconds for the outgoing queue to drain.
$start = microtime(true);
while ($this->producer->getOutQLen() > 0) {
    // poll() takes a timeout in milliseconds and serves queued callbacks.
    $this->producer->poll(1);

    if (microtime(true) - $start > 10) {
        throw new \RuntimeException('Message sending failed');
    }
}

This has the side effect of actually invoking the dr_msg_cb and error_cb callbacks, which are otherwise never called (or at least that's what my testing indicated).
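
To make the idea easier to discuss, here is a rough sketch of the same wait loop pulled out into a standalone helper. The function name `waitForDelivery` and its parameters are hypothetical, not part of enqueue or php-rdkafka. The producer argument is duck-typed: it only needs `getOutQLen()` and `poll(int)`, which `RdKafka\Producer` provides, so the helper can be tried with a fake object and no broker.

```php
<?php
/**
 * Poll until the producer's outgoing queue is empty or the timeout passes.
 *
 * Hypothetical helper. $producer may be any object exposing
 * getOutQLen() and poll(int $timeoutMs), e.g. RdKafka\Producer.
 */
function waitForDelivery(object $producer, float $timeoutSeconds, int $pollMs = 100): void
{
    $deadline = microtime(true) + $timeoutSeconds;

    while ($producer->getOutQLen() > 0) {
        if (microtime(true) >= $deadline) {
            throw new \RuntimeException(sprintf(
                '%d message(s) still undelivered after %.1f s',
                $producer->getOutQLen(),
                $timeoutSeconds
            ));
        }

        // With a real RdKafka\Producer, poll() is what gives the
        // dr_msg_cb / error_cb callbacks a chance to run.
        $producer->poll($pollMs);
    }
}
```

Inside `send`, this would be called right after `$topic->produce(...)`, for example `waitForDelivery($this->producer, 10.0);`. For the callbacks to report anything, they would have to be registered beforehand on the `RdKafka\Conf` object via `setDrMsgCb()` and `setErrorCb()`.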

Thoughts?
