Socket Error: 104 consuming messages with task that take a long time #753
@pat1, if the processing takes so long, then it's not a good match for the asynchronous programming model. The asynchronous programming model works well for I/O-bound processing that uses a common ioloop for all blocking operations within a thread. I am also surprised that with
Your app is not accessing the same pika connection (and/or its channels) from more than one thread, is it? pika connections are not thread-safe, so you would expect to have trouble in that case. The following would be really helpful for debugging:
Thanks for your reply. My app is not accessing the same pika connection (and/or its channels) from more than one thread. Test program:
Adding socket_timeout=1200 in ConnectionParameters does not solve the problem.
What does the network look like? Is there a proxy or similar between the client and the RabbitMQ broker?
Also, what does the RabbitMQ log have to say about this? Please post the RabbitMQ log that shows the disconnect.
I have no proxy; I access RabbitMQ via localhost (using the public IP does not change the result).
Setting {heartbeat, 0} in /etc/rabbitmq/rabbitmq.config does not change the behavior.
@pat1, the rabbitmq log doesn't look like the failure is related to heartbeat. Note that the error is reported as "send failed ... timeout" just 30 seconds after "accepted AMQP connection". A send timeout would usually mean that transmit retries were exhausted and the peer is unreachable. This is what a heartbeat timeout log entry looks like:
Looking at: adding
before
solves the problem. I think this configuration/example should be published somewhere as a template for a very common use case. Also, an ioloop heartbeat poll method would be useful, so that heartbeat does not have to be disabled and only the consumer task runs in a thread. You can evaluate whether to close this issue.
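The snippet this comment refers to is not included above, so the following is only a minimal sketch of the commonly recommended pattern for this situation, assuming pika 1.x and a BlockingConnection: run the long task in a worker thread and hand the acknowledgement back to the connection's thread with add_callback_threadsafe, so heartbeats keep flowing. The queue name "task_queue" and the 600-second sleep are placeholders.

```python
# A minimal sketch only, assuming pika 1.x and a BlockingConnection; the
# queue name and the sleep duration are placeholders, since the snippet
# referenced in the comment above is not shown here.
import functools
import threading
import time

import pika


def do_work(connection, channel, delivery_tag, body):
    # The long-running work happens off the connection's thread, so the
    # connection can keep servicing heartbeats in the meantime.
    time.sleep(600)
    # basic_ack must run on the connection's thread; hand it back safely.
    ack = functools.partial(channel.basic_ack, delivery_tag=delivery_tag)
    connection.add_callback_threadsafe(ack)


def on_message(channel, method, properties, body, connection):
    worker = threading.Thread(
        target=do_work,
        args=(connection, channel, method.delivery_tag, body))
    worker.daemon = True
    worker.start()


connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='task_queue')
channel.basic_qos(prefetch_count=1)
channel.basic_consume(
    queue='task_queue',
    on_message_callback=functools.partial(on_message, connection=connection))
channel.start_consuming()
```

With the SelectConnection-based asynchronous consumer from the Pika examples, the same idea applies: keep the ioloop unblocked and schedule the ack back onto it from the worker thread.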
@pat1, thank you for following up. My expectation would be that RMQ should be doing non-blocking I/O, and should not fail this way. This sounds to me like a bug in RMQ. I would like to investigate and follow up on this with the RMQ team. Your code snippet https://gist.github.com/pat1/4017d6565501b657731560af3d2e0b9e contains only the consumer. Would you mind also adding a small script that populates the queue with the size and number of messages that will surely reproduce the failure? Using |
I am seeing this in other AMQP libraries as well.
To be clear, this does not work: @vitaly-krugl, I do not have a script to populate the queue; I think you can fill it with 1000 messages of 100k size.
Thank you, @pat1.
Hello, is there any update regarding this issue? We started getting this error this week, but we didn't change anything except for the environment we run our code in. Thanks
I'm also having this issue, even with the |
Moving from time.sleep() to a busy wait:
worked for me :)
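The busy-wait code itself is not shown in the comment above; the following is a minimal sketch of the idea, assuming a pika BlockingConnection is available under the name `connection` (the helper name and the one-second slice are illustrative):

```python
# A sketch of replacing one long time.sleep() with short slices that let pika
# service the socket (including heartbeats) in between; assumes a pika
# BlockingConnection bound to `connection`.
import time


def wait_while_servicing_connection(connection, total_seconds, slice_seconds=1):
    """Wait roughly total_seconds without starving the AMQP connection."""
    deadline = time.time() + total_seconds
    while time.time() < deadline:
        # process_data_events() briefly pumps the connection's I/O loop so
        # heartbeat frames are sent and received on time.
        connection.process_data_events(time_limit=slice_seconds)
```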
I'm not using any sleeps; all the time is being spent in actual message processing, which might include heavy SQL calls and a lot of requests calls, among others.
In process_data_events() the common_terminator that is passed to _flush_output() was also set to true for events that could not be processed because of the call context. Considering those events anyway will lead to _flush_output() returning immediately. This can lead to a dropped connection e.g. when there is a timeout event ready but process_data_events() is called from a callback (e.g. on_message). The timeout event is considered as a terminator in this case but can not be processed. Fixes: pika#753
@pat1 I've been struggling with this for days. This solved my problem, you're the man!
I did all of the following investigation using Pika at tag
The provided code to reproduce this issue does indeed show it on my machine. Note: you don't need a 10-minute sleep to reproduce this; anything over 30 seconds works. During the
So, to address this issue, you have the following options:
PR #843 does not appear to resolve this issue.
I am going to close this as everything is working as expected.
The reproducer code opens a connection and channel to RabbitMQ, but does not specify
In this scenario, the TCP connection can be thought of as a big pipe with no limits, other than its size (i.e. buffer sizes), on how much data can be pushed through it. So, RabbitMQ keeps sending data to the client while it sleeps after receiving the first message. Depending on message size and total count, the TCP "pipe" may be big enough to hold all of these interim messages, so RabbitMQ does not time out during the send operations. When message size and count are big enough to fill all the buffers, RabbitMQ blocks on the send and eventually times out after 30 seconds, which is what you see in the logs (
This is one reason why it is critical to use
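The truncated sentence above most likely refers to setting a prefetch limit (the commit messages quoted later in this thread point the same way). A minimal sketch, with a placeholder prefetch value and connection details:

```python
# Cap the number of unacknowledged deliveries so the broker stops sending
# once the consumer is busy, instead of filling the TCP buffers. The value
# 100 and the localhost connection are illustrative only.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.basic_qos(prefetch_count=100)
```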
As written some time ago, the use of
Also, an ioloop heartbeat poll method would be useful, so that heartbeat does not have to be disabled and only the consumer task runs in a thread.
@lukebakken I'm also having this issue, even with the
The RabbitMQ log looks like this:
=INFO REPORT==== 26-Sep-2017::13:14:32 ===
=INFO REPORT==== 26-Sep-2017::13:14:32 ===
=ERROR REPORT==== 26-Sep-2017::13:19:02 ===
Can you please suggest what I'm doing wrong? The code for the consumer is the same as @pat1's.
I am also having this issue, and I don't think it has anything to do with having a wait before message acknowledgement (receipt to ack is less than 1 sec; however, the task after the ack does take 3-4 minutes). My relevant functions are here and here. Using prefetch_count=1 makes no difference. Calling stop_consuming prior to running the task and start_consuming after makes no difference.
Although @lukebakken's response is accurate, we have not addressed with anyone why the timeout settings do not seem to be working.
It has been long enough since I've responded that I don't remember the particulars of this issue. What timeout settings are you talking about? Heartbeats? Socket read/write? Providing details and (especially) configuration and code to demonstrate your issue will get the fastest response. Please remember that Pika's maintainers work on a volunteer basis, so anything you can do to help remove "guesswork" is greatly appreciated.
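For reference, a minimal sketch (assuming pika 1.x) of the timeout-related settings that tend to come up in this thread; the specific values are illustrative only:

```python
# Connection parameters touching the timeouts discussed here; tune to taste.
import pika

params = pika.ConnectionParameters(
    host='localhost',
    heartbeat=600,                   # heartbeat interval in seconds; 0 disables heartbeats
    blocked_connection_timeout=300,  # fail if the broker blocks the connection this long
    socket_timeout=10,               # socket connect timeout in seconds
)
connection = pika.BlockingConnection(params)
```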
RabbitMQ clients have a setting called prefetch[1], which controls how many un-acknowledged events the server forwards to the local queue in the client. The default is 0; this means that when clients first connect, the server must send them every message in the queue. This itself may cause unbounded memory usage in the client, but also has other detrimental effects. While the client is attempting to process the head of the queue, it may be unable to read from the TCP socket at the rate that the server is sending to it -- filling the TCP buffers, and causing the server's writes to block. If the server blocks for more than 30 seconds, it times out the send, and closes the connection with:

```
closing AMQP connection <0.30902.126> (127.0.0.1:53870 -> 127.0.0.1:5672): {writer,send_failed,{error,timeout}}
```

This is pika/pika#753 (comment).

Set a prefetch limit of 100 messages, or the batch size, to better handle queues which start with large numbers of outstanding events. Setting prefetch=1 causes significant performance degradation in the no-op queue worker, to 30% of the prefetch=0 performance. Setting prefetch=100 achieves 90% of the prefetch=0 performance, and higher values offer only minor gains above that. For batch workers, their performance is not notably degraded by a prefetch equal to their batch size, and they cannot function on smaller prefetches than their batch size.

We also set a 100-count prefetch on Tornado workers, as they are potentially susceptible to the same effect.

[1] https://www.rabbitmq.com/confirms.html#channel-qos-prefetch
As in #418, I have a problem consuming messages with tasks that take a long time.
I am using git master, setting heartbeat_interval=0.
My code is very similar to
http://pika.readthedocs.io/en/0.10.0/examples/asynchronous_consumer_example.html
but my on_message method takes some minutes to "consume" the message.
After consuming messages for some time I get:
In the log you can see the function and line before the message.
I have tried to execute my task in a thread, maintaining the AMQP communication in the main process with:
write_only=True was available in a previous version
self._connection.ioloop.poll() creates recursive calls with a lot of threads and problems
self._connection.ioloop.start()
So I don't have a solution: I cannot consume messages and everything is stalled.
Is it possible that we need a (background) poll to maintain the socket without getting new messages from the queue?