Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking changes with pcntl signals on channel->wait function, when using non_blocking: false #1135

Open
cosminardeleanu opened this issue Oct 23, 2023 · 9 comments

Comments

@cosminardeleanu
Copy link

cosminardeleanu commented Oct 23, 2023

Version(s) affected: 3.6.0

Description
pcntl_alarm no longer work as expected, compared with 3.5.4, it is no longer triggered immediately.
Possible other signals may be affected.

How to reproduce
I don't have exact steps to reproduce right now, but i assume:

  • install alarm, let's say 5 seconds, add some exit(0); in the handler code.
  • create a basic consumer
$channel->wait(allowed_methods: null, non_blocking: false, timeout: 10);
  • the signals are processed only after timeout, so instead of stopping the consumer after 5 seconds, is stopped after 10, when $timeout occurs.
  • the same is valid for the other pcntl signals, for example ctrl+c (break), SIGTERM, etc.

I would expect this to be a major version, because of these breaking changes.

@ramunasd
Copy link
Member

ramunasd commented Oct 23, 2023

I guess You can solve this by enabling async signals with pcntl_async_signals(true)

@cosminardeleanu cosminardeleanu changed the title Breaking changes with pcntl signals on chanel->wait function Breaking changes with pcntl signals on channel->wait function Oct 24, 2023
@cosminardeleanu cosminardeleanu changed the title Breaking changes with pcntl signals on channel->wait function Breaking changes with pcntl signals on channel->wait function, when using non_blocking: true Oct 24, 2023
@cosminardeleanu cosminardeleanu changed the title Breaking changes with pcntl signals on channel->wait function, when using non_blocking: true Breaking changes with pcntl signals on channel->wait function, when using non_blocking: false Oct 24, 2023
@cosminardeleanu
Copy link
Author

Thank you for suggestion, it is already set to true.

Sorry for not being able to provide steps to reproduce right now, i will provide them when the time allows me.

Note: Updated the title, added clarifications that this happens only when the value of non_blocking is false.

@ramunasd
Copy link
Member

Can You clarify the problem, please? Signal handler is not invoked in time? Or You expect that wait() will be terminated because of signal?

@ramunasd
Copy link
Member

BTW, if Your intention is to stop consuming as soon as possible, then probably it would be better to use method $channel->consume() with low poll time and call $channel->stopConsume(); in signal handler.

@cosminardeleanu
Copy link
Author

Script to reproduce:

<?php

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;
use PhpAmqpLib\Exception\AMQPTimeoutException;

require_once __DIR__ . '/vendor/autoload.php';

$connection = new AMQPStreamConnection(
    'host',
    5672,
    'user',
    'pass',
    '/',
    false,
    'AMQPLAIN',
    null,
    'en_US',
    3.0,
    3.0,
    null,
    false,
    0,
);
$channel = $connection->channel();
$channel->queue_declare('test');
$basicConsumer = $channel->basic_consume(
    'test',
    'test',
    false,
    false,
    false,
    false,
    function (AMQPMessage $amqpMessage) {
        // Process message
    }
);

pcntl_signal(SIGALRM, static function ($signal) use ($channel, $basicConsumer) {
    if (SIGALRM === $signal) {
        var_dump('signal SIGALRM received');
        $channel->basic_cancel($basicConsumer);
        $channel->close();
    }
});
pcntl_async_signals(true);
pcntl_alarm(2);

$idleTimeout = 5;
while ($channel->is_consuming()) {
    var_dump('while is consuming...');
    try {
        var_dump('started waiting...');
        $channel->wait(null, false, $idleTimeout);
        var_dump('finished waiting...');
    } catch (AMQPTimeoutException) {
        break;
        // Expected, this is idle timeout.
    }
}
var_dump('finished consuming');

if ($channel->is_open()) {
    $channel->basic_cancel($basicConsumer);
    $channel->close();
}
var_dump('the end');

With 3.5.4 the output is:

time php LibAmqpTest.php
string(21) "while is consuming..."
string(18) "started waiting..."
string(23) "signal SIGALRM received"
string(18) "finished consuming"
string(7) "the end"

real	0m2.144s
user	0m0.102s
sys	0m0.030s

With 3.6.0 the output is:

time php LibAmqpTest.php
string(21) "while is consuming..."
string(18) "started waiting..."
string(23) "signal SIGALRM received"
string(18) "finished consuming"
string(7) "the end"

real	0m5.142s
user	0m0.103s
sys	0m0.031s

Bit of context:

  • I use $idleTimeout to stop consumer if was idle for given $idleTimeout (for example, the consumer is stopped after a minute of idle)
  • On top of that, the consumer process cannot run forever, so i also have the sig alarm, to close after a while, to avoid stuck consumers (for example a process cannot run for more than 10 minutes).

@ramunasd
Copy link
Member

Well, we had a bigger problem with previous versions, more context in #1094 and #1118
Method wait() does not give any garantues regarding signals. If signal was sent when socket function was blocked, then yes, it returned instantly. But if signal was sent during any other activity, then method wait() will spend more time. So previous behaviour was more random. Now it is consistent.

@lukebakken
Copy link
Collaborator

I don't have nearly as much experience with this as @ramunasd, but, based on the code provided by @cosminardeleanu, version 3.6.0 is behaving correctly. You specify a 5 second timeout, which is what you get with the new release.

@cosminardeleanu
Copy link
Author

cosminardeleanu commented Oct 25, 2023

I feel that your approach is wrong. Consider some silly real life correlations:

v.3.5.4

  • you throw a party (consumer)
  • police comes at your door and ask you to stop the party (pcntl signal)
  • you ack and cease the party, and throw an apology (exception)

v.3.6.0

  • you throw a party (consumer)
  • police comes at your door and ask you to stop the party (pcntl signal)
  • you ack but decide to ignore them
  • after a while, you go outside to resupply (wait expires)
  • police bust you (pcntl signal finally handled)
  • now you're in even bigger troubles, because you decided to ignore them
    or worse
  • police decides to break your doors and bust you, because you did not respond (grace time period expired)

Is similar when you're working in kube / cloud:

  • you have a grace period to kill the process, otherwise it will be done forcefully
  • now that the process no longer listens to you, it is killed during the mid processing and cause more troubles.

As i said, this is not only related to signal alrm, but sigterm, sigint, etc, whatever signal you decide to handle.
The alarm one was just an easy explanation.

TLDR; Your reasoning/fixes does not change the fact these are breaking changes, in a minor version.
Based on my typical usecases, I feel that the right approach was to throw an exception that a pcntl signal was received and party was forcefully stopped.

@ramunasd
Copy link
Member

You told A but skipped B. Consumer never knows what signal means. It might be termination signal, but also might be heartbeat sender or totally unrelated thing. In heartbeat case we must send special frame and continue read operation. So everything is not so simple as party/police example :)

If You want have more responsive consumer, then use lower read timeout. It's simple like that:

pcntl_signal(SIGALRM, static function () use ($channel) {
    $channel->stopConsume();
});
pcntl_async_signals(true);
pcntl_alarm(2);
$channel->consume(1.0);

TLDR; Your reasoning/fixes does not change the fact these are breaking changes, in a minor version.
Based on my typical usecases, I feel that the right approach was to throw an exception that a pcntl signal was received and party was forcefully stopped.

Additional exception would be breaking change, that's for sure.

Anyway, a term "breaking change" have many interpretations. I've seen many discussions regarding exact meaning of that term. I personally use idea that "breaking change" breaks compatibility and requires adoption.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants