Dropped connections and heartbeats causing bad_header error and infinite loop #413

Closed
DemonTPx opened this issue Jun 15, 2016 · 3 comments

Comments

DemonTPx commented Jun 15, 2016

We have a connection between a datacenter and Amazon AWS that goes through an Elastic Load Balancer. While trying to secure this connection with SSL, I ran into a problem.

When using SSL for the stream, keepalive can no longer be enabled (see issue #371). So I decided to try enabling heartbeats.

After a while I found out that a lot of my PHP processes just hang. After a bit more investigation, they seem to be stuck in the reconnect phase, which is triggered by the heartbeat functionality.

It seems to me that the following happens:

  • My php script connects to the rabbitmq server
  • My php script then does some stuff which does not involve rabbitmq for a while
  • The server reports that the client missed a heartbeat
  • My php script then wants to insert something into rabbitmq, which triggers the heartbeat check, which tries to reconnect, since it thinks the server has gone away
  • The rabbitmq server receives an unexpected packet and closes the connection immediately
  • The heartbeat check is triggered again, which tries to reconnect again

The last two steps are executed indefinitely.

I tried disabling SSL, and it gives the same result.

I dug in some more and ran wireshark to figure out what was happening under the hood. When connecting for the first time, it seems that the communication goes like this:

  • Server sends protocol frame
  • Server sends Connection.Start frame
  • Client sends Connection.Start-Ok frame
  • Server sends Connection.Tune frame
  • Client sends Connection.Tune-Ok frame

But when the heartbeat is triggered this happens:

  • Client sends heartbeat frame
  • Server sends protocol frame
  • Server disconnects

The rabbitmq log looks like this:

=INFO REPORT==== 15-Jun-2016::09:41:56 ===
accepting AMQP connection <0.32068.24> (10.0.0.250:41049 -> 10.0.6.10:5672)

=ERROR REPORT==== 15-Jun-2016::09:41:59 ===
closing AMQP connection <0.32068.24> (10.0.0.250:41049 -> 10.0.6.10:5672):
missed heartbeats from client, timeout: 1s

=INFO REPORT==== 15-Jun-2016::09:42:12 ===
accepting AMQP connection <0.32228.24> (10.0.0.250:41063 -> 10.0.6.10:5672)

=ERROR REPORT==== 15-Jun-2016::09:42:12 ===
closing AMQP connection <0.32228.24> (10.0.0.250:41063 -> 10.0.6.10:5672):
{bad_header,<<8,0,0,0,0,0,0,0>>}
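
The eight bytes in the `bad_header` report can be compared with what a broker expects at the start of a fresh connection. The sketch below assumes the AMQP 0-9-1 wire format: a protocol header of `AMQP` followed by 0, 0, 9, 1, and general frames that begin with a type octet, a two-byte channel and a four-byte payload size, with frame type 8 meaning heartbeat. The variable names are illustrative and are not php-amqplib identifiers.

```php
<?php
// Bytes reported by RabbitMQ in {bad_header,<<8,0,0,0,0,0,0,0>>}.
$logged = pack('C*', 8, 0, 0, 0, 0, 0, 0, 0);

// What a broker expects first on a new connection
// (assumption: AMQP 0-9-1 protocol header).
$protocolHeader = "AMQP\x00\x00\x09\x01";

// Start of a heartbeat frame: type 8, channel 0, payload size 0
// (assumption: AMQP 0-9-1 general frame layout, 1 + 2 + 4 bytes).
$heartbeatPrefix = pack('CnN', 8, 0, 0);

$looksLikeHeader = ($logged === $protocolHeader);
$looksLikeHeartbeat = (substr($logged, 0, 7) === $heartbeatPrefix);

printf("first byte: %d\n", ord($logged[0]));
printf("matches protocol header: %s\n", $looksLikeHeader ? 'yes' : 'no');
printf("starts like a heartbeat frame: %s\n", $looksLikeHeartbeat ? 'yes' : 'no');
```

Under these assumptions the logged bytes start like a heartbeat frame rather than a protocol header, which is consistent with a heartbeat being the first thing written to the newly opened socket.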

My guess is that the heartbeat is sent at the wrong time and that the Connection.Start and Connection.Start-Ok frames need to be exchanged before sending the heartbeat.

This is what I configured to reproduce this:

  • rabbitmq server version 3.6.1
  • heartbeat set to 1
  • read_write_timeout set to 6
  • ELB connection timeout set to 30 seconds (does not seem to be the cause)

My test script:

$c->channel(); // Connects, $c is an AMQPLazyConnection
sleep(5);
$c->channel(); // Hangs on the heartbeat
@splio-kjoyeux

Hello,

I'm also affected by this. Could someone have a look?

Thanks

@sp3c73r2038

Probably the same issue as this issue.

ruicampos added a commit to smarkio/php-amqplib that referenced this issue Apr 21, 2017
This affects consumers when processing a message takes more than 2*heartbeat.
When trying to close the connection, the library will:
* check the heartbeat
* detect that more than 2*heartbeat has passed without receiving anything
* consider the server gone and try to reconnect
* after reconnecting, because it does not clear the internal variables holding the time of the last read, check the heartbeat again and try to reconnect again, in a loop.

There are already issues on the library's github: php-amqplib#309 and php-amqplib#413
ruicampos added a commit to ruicampos/php-amqplib that referenced this issue Apr 21, 2017
The problem this PR fixes is:
* StreamConnection with heartbeat is created (e.g. heartbeat=10 seconds)
* Start consuming messages from the queue
* If one of the messages takes more than 2*heartbeat to process (e.g. 30 seconds), the next time it tries to read something from Rabbit it will call check_heartbeat()
* As it finds that more than 2*heartbeat has passed, it will reconnect()
* But as it does not reset the values of last_read and last_write, after reconnecting it will run check_heartbeat() again and, since that check is based on last_read, it will try to reconnect again
* It keeps trying to reconnect in an infinite loop

This PR fixes issues: php-amqplib#413 and php-amqplib#309
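
The loop described in this commit message can be reproduced in isolation. The sketch below is a hypothetical stand-in, not the php-amqplib implementation: the class `FakeConnection`, its methods, and the `$maxAttempts` guard are invented for illustration, only `last_read` is modelled (not `last_write`), and the clock is passed in as plain integers so the result is deterministic.

```php
<?php
// Hypothetical stand-in for a connection; NOT the php-amqplib code.
final class FakeConnection
{
    /** @var int number of reconnect attempts made so far */
    public $reconnects = 0;
    private $heartbeat;
    private $resetOnReconnect;
    private $lastRead;

    public function __construct(int $heartbeat, bool $resetOnReconnect, int $now)
    {
        $this->heartbeat = $heartbeat;
        $this->resetOnReconnect = $resetOnReconnect;
        $this->lastRead = $now;
    }

    // Rule taken from the description: reconnect when nothing has been
    // read for more than 2 * heartbeat seconds.
    public function checkHeartbeat(int $now, int $maxAttempts = 5): void
    {
        while ($now - $this->lastRead > 2 * $this->heartbeat) {
            if ($this->reconnects >= $maxAttempts) {
                return; // guard so the sketch terminates; the reported bug never stops
            }
            $this->reconnect($now);
        }
    }

    private function reconnect(int $now): void
    {
        $this->reconnects++;
        if ($this->resetOnReconnect) {
            $this->lastRead = $now; // the reset the description says was missing
        }
    }
}

$start = 0;
$afterSlowMessage = 30; // seconds spent processing one message, heartbeat = 10

$withoutReset = new FakeConnection(10, false, $start);
$withoutReset->checkHeartbeat($afterSlowMessage);

$withReset = new FakeConnection(10, true, $start);
$withReset->checkHeartbeat($afterSlowMessage);

printf("without reset: %d reconnect attempts (capped)\n", $withoutReset->reconnects);
printf("with reset: %d reconnect attempt\n", $withReset->reconnects);
```

Without the reset, the staleness condition stays true after every reconnect, so only the artificial cap ends the loop. With the reset, a single reconnect clears the condition.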
@michaelklishin
Collaborator

Hopefully should be addressed by #479.

escudeiro pushed a commit to smarkio/php-amqplib that referenced this issue Apr 24, 2017
kratkyzobak pushed a commit to kratkyzobak/php-amqplib that referenced this issue Feb 9, 2024