
In UDP mode, messages are truncated #8

Merged
merged 1 commit on Aug 29, 2017

Conversation

vcostet
Contributor

@vcostet vcostet commented Jul 19, 2017

In startUDP, the buffer has to be created inside the for loop, otherwise processMessage will always get the same buffer (and the content could change before the processMessage goroutine is executed).

@hartfordfive
Owner

@vcostet Your messages are likely truncated because the default buffer size of 4096 bytes is too small. You can try increasing it to a larger value, maybe something like 8192 or 16384.

In general with IPv4, I believe the limit on a UDP payload is 65507 bytes, but keep in mind you should only be sending messages to a protologbeat instance running locally on the same server. Sending to another host, even on the same private network, would likely cause datagrams larger than the MTU to be fragmented.

As for the buffer being initialized outside of the for loop, that is done on purpose so that the same buffer can be reused over and over, without allocating a new byte slice for each message, which would consequently increase GC overhead. The ReadFrom method simply reuses this buffer by reading bytes from the socket connection.

@vcostet
Contributor Author

vcostet commented Jul 19, 2017

The issue is not related to the buffer size: the messages are very short (under 20 characters), the UDP stream comes from a local service, and the datagrams are not split.

processMessage is called asynchronously, but will always get a reference to the same buffer, and the content of the buffer could be altered before (or during) the execution of processMessage.

For example, if I receive two messages "abc" and "wxyz" almost at the same time, a likely execution order is:

  • "abc" is received, buffer = "abc" and the goroutine processMessage is scheduled (but not yet executed)
  • "wxyz" is received, buffer = "wxyz" and the goroutine processMessage is scheduled (but not yet executed)
  • processMessage is executed, but it produces the message "wxy" instead of "abc": the current contents of the buffer, truncated to the recorded length of 3
  • processMessage is executed for the second time, and will produce the message "wxyz"

I can see two solutions:

  • do not call processMessage asynchronously (but that would increase the risk of losing packets if processMessage is too slow)
  • create a new buffer for each message; I don't think the GC overhead will have a huge impact (startTCP already creates a new buffer for each message, and processMessage allocates a new string for each message in any case)

@hartfordfive
Owner

hartfordfive commented Jul 21, 2017

@vcostet Thanks for the good feedback. Taking a second look at it, yes, I now see that I overlooked this issue. What I failed to take into account is that the byte slice is passed along essentially as a descriptor of the underlying data structure, not as a copy.

Although unlikely when logging a single message per second or less, I now see this could happen in environments with higher message volumes. Considering that in the processMessage function I end up converting the byte slice to a string, another option could be to immediately convert the byte slice to a string and then pass it along as a function argument, like this:

	for {
		length, _, err := l.ReadFrom(buffer)
		if err != nil {
			logp.Err("Error reading from buffer: %v", err.Error())
			continue
		}
		if length == 0 {
			continue
		}
		go ll.processMessage(strings.TrimSpace(string(buffer[:length])))
	}

Considering string(buffer[:length]) copies the bytes into a new, immutable string, this would eliminate the risk of the data being modified prior to the actual execution of the goroutine.

What do you think about that solution? Although yes the one you proposed would also work.

@hartfordfive
Owner

Haven't heard anything back from you on this last post @vcostet, so I'll simply merge your PR for the sake of simplicity, considering it should do the job just fine. Thanks again.

@hartfordfive hartfordfive merged commit 123c3af into hartfordfive:master Aug 29, 2017
@hartfordfive hartfordfive mentioned this pull request Sep 5, 2017