This socket has been ended by the other party #4
Comments
Do you know how long the process was running when this happened?
~500k jobs processed across about 5 different workers, timed out after 12 hours or so I believe (I didn't discover until after the fact). Based on the logs, it looked like it may have been idle for a little while before it timed out.
Seeing the same issue here also.
This is helpful, thanks for the info. Do you have logging or debug enabled for the workers? You don't have any heartbeat log statements, do you? Those would also be helpful if they were available. I'll take a look into this.
Did the node process die when this happened or was it just hanging around?
I added some fixes to gracefully reconnect and continue working on connection interruptions. Could you try Please reopen if this persists.
I am getting this error in
It is happening during a job that takes a decent amount of time to finish. At the same time, Faktory is complaining about no heartbeat. I assume this means my task is blocking the event loop and preventing the heartbeat from working. Regardless, this shouldn't throw an error and crash the process. Here are my Faktory logs around that time:
Line in question: https://github.com/jbielick/faktory_worker_node/blob/master/lib/connection.js#L165 Trying to
@emhagman so the server is closing the connection in the midst of this library trying to use that socket?
Correct, Faktory is closing the connection and then the lib is trying to call It is definitely being removed from Faktory because the running task is blocking the event loop, so the worker cannot send the heartbeat to Faktory in time (every 15s or it is removed). I understand the ideal scenario is to not block the event loop, and I have fixed that issue. But if there is non-performant code, it would be ideal if the worker process didn't crash when the connection is reaped by Faktory, as there could be other jobs running on the worker.
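For anyone hitting the same heartbeat timeout, here is a minimal sketch of one way to keep a long CPU-bound job from starving timers. `processInChunks` is a hypothetical helper, not something provided by faktory_worker_node, and the chunk size is an arbitrary assumption. It handles a slice of the work synchronously and then yields with `setImmediate`, so pending timers (such as a heartbeat interval) get a chance to run between slices.

```javascript
// Hypothetical helper (not part of faktory_worker_node): run `handle` over
// `items` in slices, yielding to the event loop between slices so timers
// such as a heartbeat interval are not starved.
function processInChunks(items, handle, chunkSize = 1000) {
  return new Promise((resolve, reject) => {
    let index = 0;

    function runChunk() {
      try {
        const end = Math.min(index + chunkSize, items.length);
        for (; index < end; index++) {
          handle(items[index]);
        }
      } catch (err) {
        reject(err);
        return;
      }
      if (index < items.length) {
        // Yield: let timers and I/O callbacks run before the next slice.
        setImmediate(runChunk);
      } else {
        resolve();
      }
    }

    runChunk();
  });
}
```

Inside a job function this could be awaited, for example `await processInChunks(rows, transformRow, 500)`, where `rows` and `transformRow` are placeholders for your own data and per-item work. It only helps if a single slice finishes well within the heartbeat interval.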
How often does this occur? We're using a connection pool in this library. Before a client sends any command, a connection is checked out from the pool (faktory_worker_node/lib/client.js, lines 162 to 164 in 69f6f7f).
Before a connection can be checked out from the pool, it is validated that it is connected (faktory_worker_node/lib/client.js, line 58 in 69f6f7f)
and (faktory_worker_node/lib/connection-factory.js, lines 60 to 62 in 69f6f7f).
Before digging further (and I'll probably need some DEBUG=faktory* logs from you), I'm trying to reason about how this error would occur. Is it possible that this is happening so fast (or the event loop is blocked) that the connection is being used before the "onClose" callback can be called for it? That's the only guess I have. The event loop being blocked is sort of a fatal issue IMO. I'm not sure many reasonable guarantees can be made under those conditions. Writing to the socket would be a synchronous execution, while waiting for a response or handling the "socket closed" event would be async.
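To make the suspected race concrete, here is a rough sketch under stated assumptions. `guardSocket` and `safeWrite` are hypothetical names, not functions in this library, and an `EventEmitter` stands in for a real `net.Socket`. The point it illustrates is a general Node.js one: an `'error'` event emitted with no listener attached is thrown and takes the process down, while one with a listener can be handled (for example by discarding the connection and reconnecting). The `destroyed`/`writable` check is only best-effort, since the remote end can still close the socket between the check and the write.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: route socket errors to a callback instead of letting
// an unhandled 'error' event crash the process.
function guardSocket(socket, onFailure) {
  socket.on('error', (err) => onFailure(err));
  return socket;
}

// Hypothetical best-effort write: skip sockets already known to be closed.
// This narrows the window but cannot remove it; the 'error' listener above
// is what keeps a late failure from being fatal.
function safeWrite(socket, payload) {
  if (socket.destroyed || !socket.writable) {
    return false;
  }
  socket.write(payload);
  return true;
}

// Stand-in for a net.Socket; a real socket would emit 'error' itself.
const fake = guardSocket(new EventEmitter(), (err) => {
  console.error('connection lost, would reconnect here:', err.message);
});
fake.emit('error', new Error('This socket has been ended by the other party'));
```

With a guard like this in place, a reaped connection would surface as a logged failure for that one command rather than an exception that kills every job running on the worker.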
Getting this during the heartbeat for long(ish) running workers: