
Rabbit delays writing data to socket till heartbeat is sent. #1228

Closed
pydevd opened this issue May 22, 2017 · 5 comments

@pydevd

pydevd commented May 22, 2017

Environment:
 - Vagrant (1.9.1) Ubuntu 14.04:
     - Docker (version 1.12.5):
         - RabbitMQ v3.6.5
         - Client (Celery app v4.0.2, Python 2.7)
         - Server (Celery app v4.0.2, Python 3.5)

Use case: functional tests.

Workflow #1:
 1. py.test (TestApp) starts RabbitMQ and the Server application in Docker.
 2. TestApp registers a "new" Client by sending a task to the Server.
 3. TestApp starts the Client application in Docker.
 4. Client and Server do a "handshake" using their own protocol.
 5. TestApp sends a task to the Client for test purposes (TestTask).
 6. Client receives the task immediately and executes it.


Workflow #2:
 1. py.test (TestApp) starts RabbitMQ and the Server application in Docker.
 2. TestApp registers an "active" Client by sending a task to the Server.
 3. TestApp starts the Client application in Docker.
 4. TestApp sends a task to the Client for test purposes (TestTask).
 5. Client receives the task after 60 seconds (THIS IS THE PROBLEM).


Workflow #3:
 1. py.test (TestApp) starts RabbitMQ and the Server application in Docker.
 2. TestApp registers an "active" Client by sending a task to the Server.
 3. TestApp starts the Client application in Docker.
 4. TestApp sleeps for 20 seconds.
 5. TestApp sends a task to the Client for test purposes (TestTask).
 6. Client receives the task immediately and executes it.

What I've researched:

When the Celery app establishes a connection with RabbitMQ, the peers negotiate a heartbeat timeout interval.
In my case it is 60 seconds, and this value is set in the Celery configuration for the Client.
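
For reference, a minimal sketch of where that value typically lives on the Celery side (the setting names are the Celery 4.x ones; the broker URL here is made up, not the reporter's actual config):

```python
# celeryconfig.py -- illustrative sketch only, not the reporter's actual config.
broker_url = 'pyamqp://guest:guest@rabbitmq:5672//'  # hypothetical broker address
broker_heartbeat = 60             # negotiated heartbeat timeout described above, in seconds
broker_heartbeat_checkrate = 2.0  # default: the client checks heartbeats twice per interval
```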

Debugging the internals of Celery in Workflow #2, I found that epoll reports the RabbitMQ connection socket as "ready for read"
only after RabbitMQ sends its heartbeat on that connection (60 seconds in my configuration).
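
For what it's worth, the same delay could be measured outside Celery with a small kombu script along these lines (queue name, URL and timings are made up; this is a rough sketch, not the actual test code):

```python
# Rough reproduction sketch using kombu directly; publish a message from another
# process right after this consumer connects and measure how long delivery takes.
import time
from kombu import Connection

with Connection('amqp://guest:guest@localhost:5672//', heartbeat=60) as conn:
    queue = conn.SimpleQueue('test_task_queue')   # hypothetical queue name
    start = time.time()
    message = queue.get(block=True, timeout=120)  # waits for the first delivery
    print('received after %.1f s' % (time.time() - start))
    message.ack()
    queue.close()
```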

Workflows #1 and #3 show that if there is a small delay between the Client starting and the TestTask being sent
(the handshake or the synthetic sleep), everything works perfectly.

I cannot explain this behaviour. I need tasks to be executed as soon as they are sent/received, not after such a long delay.

I can fix the tests by decreasing the heartbeat timeout interval, but this is not an option for production.

What can you suggest?

@michaelklishin
Member

RabbitMQ does not synchronise writes of, say, basic.deliver (or any other frames) with heartbeats. In fact, any on-the-wire activity must be considered a heartbeat according to the spec, by both RabbitMQ and client libraries. Take a look at rabbit_writer if you'd like to see it for yourself.

You haven't provided any details as to what your RabbitMQ TCP listener or kernel settings are but
unless Nagle's algorithm is enabled — which RabbitMQ and all clients maintained by our team disable by default — new data should generally be sent out as fast as possible (ignoring TCP congestion control).
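
(For completeness, disabling Nagle's algorithm is just the TCP_NODELAY socket option; a generic Python illustration, not specific to RabbitMQ or any AMQP client:)

```python
# Generic illustration of TCP_NODELAY (Nagle's algorithm disabled); the AMQP
# clients mentioned above already do the equivalent on their own sockets.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
sock.connect(('localhost', 5672))  # hypothetical broker address
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))  # 1 means Nagle is off
sock.close()
```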

Decreasing heartbeat interval to a value lower than 60 is perfectly suitable for production, I'm not sure what makes you think it's not. In most environments values between 6 and 15 seconds work best and appease TCP proxy and load balancer idle connection timeouts as a nice side effect.
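
(As an illustration of requesting a lower value, the heartbeat is just a connection parameter in the underlying client; a sketch with py-amqp, assuming default guest credentials on localhost:)

```python
# Sketch of requesting a lower heartbeat with py-amqp (the library behind
# Celery's pyamqp transport); host, credentials and the value 10 are assumptions.
import amqp

conn = amqp.Connection(host='localhost:5672', userid='guest', password='guest',
                       heartbeat=10)
conn.connect()
print(conn.heartbeat)  # effective heartbeat after negotiation with the broker
conn.close()
```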

What is dangerous about timeouts is values that are too low, as they result in false positives.

@michaelklishin
Member

And sorry to sound like a smug smartypants but empirical observations such as those provided above are not really evidence of write synchronisation with heartbeat frames. A traffic dump, rabbitmqctl environment output, kernel TCP settings and some actual code are all necessary in my opinion to be able to come to an informed conclusion.

@michaelklishin
Member

If you'd like to continue digging, please do this on rabbitmq-users, as there is little information for us to work with and our team does not use GitHub issues for questions, root cause analysis and discussions.

@michaelklishin
Member

Also worth mentioning that the heartbeat timeout is not the same as the heartbeat interval. In other words, when the heartbeat timeout is set to 60 seconds, both peers will send two heartbeat frames within that period (assuming there was no other traffic), because according to the spec it takes two missed heartbeat deliveries for a peer to be considered unavailable.

If the behaviour claimed in the title were correct, it should take ~30 seconds till the first delivery, not 60.
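
(The arithmetic, spelled out with the 60-second value from this report:)

```python
# Heartbeat timeout vs. send interval, using the 60-second value from this report.
heartbeat_timeout = 60                    # negotiated heartbeat timeout, in seconds
send_interval = heartbeat_timeout / 2.0   # each peer sends a frame roughly every 30 s
missed_frames_before_dead = 2             # two missed heartbeats => peer considered unavailable
print(send_interval)                              # 30.0
print(missed_frames_before_dead * send_interval)  # 60.0
```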

@pydevd
Author

pydevd commented May 22, 2017

@michaelklishin, thanks for the reply. I'll use rabbitmq-users for future questions.

empirical observations such as those provided above are not really evidence of write synchronisation with heartbeat frames.

I understand, and I also understand that they should not be connected anyway. My experiments only show that in the Workflow #2 example, the delay before the task is read from the RabbitMQ connection socket equals the heartbeat interval. I've checked the same scenario many times with different heartbeat interval values and the result is always the same: delay time == heartbeat timeout.

I will provide more details later, following your list of what's needed.

thanks.
