When the pipeline is stuck, the lumberjack input can make logstash run out of memory #10
Comments
Another solution we could implement at the plugin level is a small buffered queue of events inside the lumberjack plugin that supports a timeout on the lock, so this queue could act as a small broker between the input and the |
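To make the idea of a bounded queue with a timeout concrete, here is a minimal sketch. It is written in Python purely for illustration and is not the plugin's code; the helper `offer`, the queue size, and the timeout value are all hypothetical.

```python
import queue


def offer(events, event, timeout_s):
    """Try to enqueue an event, giving up after timeout_s seconds.

    Returns True if the event was queued and False if the queue stayed
    full, so the caller is never blocked indefinitely.
    """
    try:
        events.put(event, timeout=timeout_s)
        return True
    except queue.Full:
        return False


# Hypothetical sizes: a queue of 2 slots and no consumer draining it,
# which stands in for a stuck pipeline.
events = queue.Queue(maxsize=2)
results = [offer(events, n, 0.05) for n in range(3)]
```

With nothing consuming from the queue, the first two calls succeed and the third returns `False` once the timeout expires, which gives the connection thread a chance to close or reject the connection rather than block.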
@colinsurprenant's new persisted queue could also reduce that pressure on the input side. Do you see any problems with adding some sort of timeout mechanism to your persisted queue? |
In Log Courier I implemented SizedQueue with timeouts, and partial ACK to prevent nearly all timeouts. Works extremely well. Though I think the partial ack is not going to be backwards compatible without versioning of some sort because forwarder does not validate acks. Using the persisted queue with timeouts would be great I think. Thought I'd let you know as it might help with discovery :) |
I still believe threadpool and timeout would be great to have, ty @driskell for some reference |
Sure I was not attempting to change your plans if it came across that way! They are great. Was pointing out that timeout has been worked on before in case it helps as reference. Timeout is biggest win I think (it was for me) and its natural progression (with even bigger win) is partial ack - means nobody has to mess around with timeout settings. Love the work you guys do. I'll leave you to get on with it however you decide 👍 |
@driskell I might have been a bit direct in my last reply. I am sorry. Let me do a bit of explanation. The threadpool/timeout will help a bit for the current OOM problem, I agree this is not the golden solution.
All of those things are only meant to improve the user experience and resiliency of the whole stack. |
SizedQueue and circuit breaker implemented in the lumberjack input. |
As discussed in elastic/logstash#3003.
When back pressure is applied to the pipeline all the way up to the lumberjack input, the connection threads will block. On the producer side, Logstash-forwarder (LSF) will never receive an ack message for the blocked payload and will assume the connection timed out. The behavior of LSF is to reconnect on timeout and try to resend the unacknowledged frames to Logstash. The input will accept this new connection, which will then block on the queue as well. LSF will keep retrying forever, and Logstash will eventually go OOM, collapsing under the number of connection attempts.
The first goal here is to implement a threadpool to limit the number of connections an input can create and to refuse any new connection when we don't have any resources left.
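As a rough illustration of capping connections and refusing new ones when no resources are left, the sketch below uses a non-blocking semaphore. It is written in Python for illustration only and is not the plugin's implementation; the class `ConnectionLimiter`, its methods, and the limit of 2 are hypothetical.

```python
import threading


class ConnectionLimiter:
    """Caps the number of connections handled at once (hypothetical helper)."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_accept(self):
        # Non-blocking: returns False right away when every slot is taken,
        # so the caller can refuse the connection instead of queueing it.
        return self._slots.acquire(blocking=False)

    def release(self):
        # Called when a connection closes, freeing its slot.
        self._slots.release()


limiter = ConnectionLimiter(max_connections=2)
accepted = [limiter.try_accept() for _ in range(3)]

# Once one connection finishes, a new one can be accepted again.
limiter.release()
accepted_after_release = limiter.try_accept()
```

Here the third connection attempt is refused immediately, so reconnecting clients cannot pile up blocked threads while the pipeline is stuck.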