Corrupted gelf messages with TCP sender #96
Conversation
Commits: GelfTcpSender flushing gelf message buffer · maven source plugin
@@ -208,6 +208,19 @@
         </executions>
     </plugin>

+    <plugin>
Any particular reason for this change? maven-source-plugin is pulled in through the parent pom and executed on release builds.
No, there is no reason for this change. It may be ignored.
Thanks for your PR. Could you help me understand the issue? Reviewing my code, I found I'm also not sure whether …
Ok, I'll try to explain the problem (sorry for my English). I did some research: I sniffed the traffic from an application that uses logstash-gelf and found that some GELF messages are broken. This happens when the application sends many messages in a short time. Usually it looks like one incomplete GELF message followed by the next complete GELF message.

Yes, SocketChannelImpl synchronizes writes internally. But GelfTCPSender uses the socket channel in non-blocking mode (…). Writing to the socket channel in a loop does not solve the problem of multiple logging threads. Example: one thread writes a piece of its GELF message, then another thread writes a piece of its own GELF message, and then the first thread finishes writing its message. The two messages get blended, and the stream of GELF messages breaks. To separate the multiple GELF writers I used synchronization: we cannot start writing the next GELF message to the socket before the previous GELF message has been written completely.
In non-blocking mode nothing blocks: we write as many bytes as possible, and once the send buffer fills up, GelfTCPSender does not try to write the remaining data in the buffer. This PR contains one possible fix for the issue; most importantly, it carries the idea, and maybe there is a better solution.
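The two fixes discussed above can be sketched in isolation. This is a minimal illustration, not the actual GelfTCPSender code: the class, the `writeFully` method, the lock object, and the in-memory channel are all invented for the example. The idea is to hold a lock for the entire message so writers cannot interleave, and to loop until the buffer has no remaining bytes, because a non-blocking `write()` may return before everything is sent.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class FlushingWriteSketch {

    private final Object writeLock = new Object();

    /**
     * Write the whole message before any other thread may start its own.
     * A non-blocking channel may accept only part of the buffer per call,
     * so keep writing while bytes remain.
     */
    void writeFully(WritableByteChannel channel, ByteBuffer buffer) throws Exception {
        synchronized (writeLock) {
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in for the TCP socket channel.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        WritableByteChannel channel = Channels.newChannel(sink);

        ByteBuffer message = ByteBuffer.wrap(
                "{\"short_message\":\"hi\"}\0".getBytes(StandardCharsets.UTF_8));
        new FlushingWriteSketch().writeFully(channel, message);

        System.out.println(sink.size() + " bytes written, remaining=" + message.remaining());
    }
}
```

With real sockets the synchronized block also serializes the retry loop, so a slow writer delays the others; that is the price of an uncorrupted stream.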
Awesome, thanks for the explanation. Don't worry about your English, it's really good. So the solution is to make sure that all bytes from the buffer are written and the buffer does not contain any remaining bytes? Message loss could then also happen if the buffer is smaller than the message (say a 100-byte output buffer and a 110-byte message). I will investigate this issue further to find out how to fix all affected senders.
Yes. And a second thing: synchronization of multiple threads, so that only one thread writes a GELF message at a time. Other threads must not start writing to the socket while the first thread has not finished writing its GELF message. Loss of messages due to buffer size is at least predictable behavior; we just need to keep that case in mind.
Hello! Here is a new branch (socketSendBufferOverflowTest remained the same): https://github.com/koeff/logstash-gelf/tree/send-buffer-overflow-2 Are there any advantages to using a non-blocking socket channel here? We don't use selectors here.
There's a reason for non-blocking mode:
I took a look at your work and adopted it in ecddbfd. Could you give the changes a try? If so, I'd merge the changes to
Reviewed. This commit fixes message corruption when there is only one log writer.
A high write volume can cause packet loss when using TCP or UDP senders because the send buffers become saturated. NIO channels don't block or keep writing when the send buffers are full; they simply return from the call. Inspecting the remaining buffer size is a good way to determine whether the buffer was written entirely or whether bytes are left to write. Add synchronization to prevent GELF message interleaving. TCP and UDP senders now retry writes until the message buffer is written entirely to the Channel (send buffer). See also PR #96.
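The partial-write behavior described in the commit message can be observed with a plain NIO `Pipe`, used here as a stand-in for the TCP socket (the buffer sizes are assumptions about a typical OS pipe capacity, not values from the library): a single non-blocking `write()` returns once the send buffer is full, leaving bytes in the message buffer, and a retry loop driven by `Buffer.remaining()` finishes the job.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class PartialWriteDemo {
    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        pipe.sink().configureBlocking(false);

        // A message much larger than a typical OS pipe/send buffer (assumption).
        ByteBuffer message = ByteBuffer.allocate(4 * 1024 * 1024);

        // One non-blocking write() accepts only what fits into the send buffer.
        pipe.sink().write(message);
        System.out.println("remaining after one write: " + message.remaining());

        // Retry until everything is written; here we drain the reader side
        // ourselves so the "send buffer" does not stay saturated forever.
        ByteBuffer drain = ByteBuffer.allocate(64 * 1024);
        while (message.hasRemaining()) {
            drain.clear();
            pipe.source().read(drain);
            pipe.sink().write(message);
        }
        System.out.println("remaining after retry loop: " + message.remaining());
    }
}
```

With a real TCP peer the drain step is the receiver reading; the sender only needs the retry loop.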
Thanks a lot for your kind help to discover and fix multiple issues. I'm closing this PR in favor of a slightly different fix.
On high load some log messages are corrupted or lost. I wrote a reproducible test.
Environment: java version "1.8.0_102"; logstash-gelf-1.10.0; logback 1.1.7; graylog.
I'm using logback with the logstash-gelf-1.10.0 appender, configured to use the TCP protocol.
The problem is still reproducible on logstash-gelf-1.11.0-SNAPSHOT.
Logback.xml appender config:
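The original configuration is not included here. As a sketch, a logstash-gelf TCP appender in a logback.xml typically looks like the following; the appender name, host, and port are placeholders, and `GelfLogbackAppender` is the appender class shipped with logstash-gelf:

```xml
<appender name="GELF" class="biz.paluch.logging.gelf.logback.GelfLogbackAppender">
    <!-- the "tcp:" prefix selects the TCP sender; host and port are placeholders -->
    <host>tcp:localhost</host>
    <port>12201</port>
</appender>

<root level="INFO">
    <appender-ref ref="GELF" />
</root>
```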