
handle receiving duplicate packets #69

Merged: 2 commits into gosnmp:master from handle-dups on Aug 5, 2016

Conversation

@phemmer (Contributor) commented Jul 10, 2016

fixes #68


As an unrelated note, there are numerous places in the code where a method is called on a struct and is passed members of that same struct as arguments.
For example, in the previous code that this PR replaced:

x.dispatch(x.Conn, outBuf, expected)

Why not have the dispatch method just access x.Conn itself instead of passing it?

Another example:

packetOut.marshalMsg(pdus, packetOut.PDUType, msgID, reqID)

packetOut is a SnmpPacket, whose .Variables field can replace the pdus argument, whose .PDUType can replace the packetOut.PDUType argument, whose .MsgID can replace the msgID argument, and whose .RequestID can replace the reqID argument.

And numerous similar examples.

I think it would make the code easier for contributors to work with if these were cleaned up. I personally found it highly confusing trying to figure out why this was being done.
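
(For illustration, a minimal self-contained sketch of the kind of refactor meant here; the type and method names below are made up for the example and are not the actual gosnmp code.)

package main

import "fmt"

// Hypothetical example type, standing in for a struct like GoSNMP.
type client struct {
    addr    string
    timeout int
}

// Before: the method is passed fields of its own receiver.
func (c *client) sendBefore(addr string, timeout int, msg string) {
    fmt.Printf("sending %q to %s (timeout %ds)\n", msg, addr, timeout)
}

// After: the method reads its own fields, so call sites cannot pass mismatched values.
func (c *client) sendAfter(msg string) {
    fmt.Printf("sending %q to %s (timeout %ds)\n", msg, c.addr, c.timeout)
}

func main() {
    c := &client{addr: "127.0.0.1:161", timeout: 5}
    c.sendBefore(c.addr, c.timeout, "get sysDescr") // redundant arguments
    c.sendAfter("get sysDescr")                     // same behavior, simpler call site
}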

@phemmer phemmer force-pushed the handle-dups branch 2 times, most recently from c93be00 to cf7eda1, on July 10, 2016 03:36
@wdreeveii (Collaborator) commented:

Patrick, thank you for taking the time to prepare a patch.

I do want to note for further discussion that this commit fundamentally changes the way retransmits are performed. Placing the write outside of the for loop in sendOneRequest seems to mean that retransmits will not occur.

Here are a few comments on the existing design of this function: #31 (comment)

I need to understand the implications a little better here though.
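
(For context, a minimal sketch of the retransmit structure being discussed; the function below is a simplified stand-in, not the actual gosnmp sendOneRequest.)

package sketch

import (
    "net"
    "time"
)

// With the write inside the retry loop, every read timeout triggers a
// retransmit; hoisting the write above the loop would mean only the read is retried.
func sendOneRequest(conn net.Conn, out []byte, timeout time.Duration, retries int) ([]byte, error) {
    buf := make([]byte, 65536)
    var lastErr error
    for attempt := 0; attempt <= retries; attempt++ {
        if _, err := conn.Write(out); err != nil { // write inside the loop
            lastErr = err
            continue
        }
        conn.SetReadDeadline(time.Now().Add(timeout))
        n, err := conn.Read(buf)
        if err != nil {
            lastErr = err // typically a timeout; loop around and retransmit
            continue
        }
        return buf[:n], nil
    }
    return nil, lastErr
}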


@wdreeveii (Collaborator) commented:

Alright, I have verified:

  1. Timeout-based retransmits are performed in the sendOneRequest layer, so they should remain unaffected.
  2. We are now talking about retransmits that occur when the expected buffer is too small. I believe this is used to avoid the complexity behind fragmented reads on a UDP socket.

It would be a good idea to write a test for this scenario: request something we expect to be small, but that is actually quite large.
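
(A minimal sketch of such a test; it exercises a plain UDP socket rather than gosnmp itself, assumes Unix-like read semantics, and all names and sizes are illustrative only.)

package sketch

import (
    "net"
    "testing"
)

// A UDP "response" larger than the receive buffer is silently truncated on
// Unix-like systems (on Windows the read returns an error instead).
func TestSmallBufferTruncatesLargeResponse(t *testing.T) {
    server, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
    if err != nil {
        t.Fatal(err)
    }
    defer server.Close()

    client, err := net.DialUDP("udp", nil, server.LocalAddr().(*net.UDPAddr))
    if err != nil {
        t.Fatal(err)
    }
    defer client.Close()

    large := make([]byte, 8192) // the unexpectedly large response
    if _, err := client.Write(large); err != nil {
        t.Fatal(err)
    }

    small := make([]byte, 512) // the undersized "expected" buffer
    n, _, err := server.ReadFromUDP(small)
    if err != nil {
        t.Fatal(err)
    }
    if n == len(large) {
        t.Fatalf("expected truncation, but read the full %d bytes", n)
    }
    t.Logf("read %d of %d bytes; the remainder was discarded", n, len(large))
}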


@phemmer (Contributor, Author) commented Jul 10, 2016

  1. We are now talking about retransmits that occur when the expected buffer is too small. I believe this is used to avoid the complexity behind fragmented reads on a UDP socket.

I don't see how this PR has any impact on that.
Even before this PR was created, if a response bigger than the read buffer was received, dispatch() would return an error. sendOneRequest would then resend the request without changing the receive buffer size, and the error would recur.
With the new code, the exact same thing happens.

I'm not saying it's not a problem, but it's not a result of this PR.
Edit: nix that, I forgot the retry in the above scenario was done within dispatch(), not sendOneRequest().
Also, the fix is really simple: allocate a 64kb buffer instead of trying to guess the right size. No more partial reads, smaller code, and slightly (infinitesimally) faster as well.

The only behavior change this PR introduces is that when a response is received but either cannot be unmarshalled or contains an unexpected request ID, we continue waiting for the correct response instead of sending a new request.

Oh, and I just realized that the timeout retry behavior in the issue you linked is another cause of this issue.
By dividing the timeout by the number of retries and resending after the partial timeout, if the response to the first request is received while waiting for the second, the code leaves the second response sitting in the socket buffer.
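
(For illustration, a minimal sketch of the behavior change described above: on an unmarshal failure or an unexpected request ID, keep waiting and discard the packet rather than retransmit. The wire format, types, and function names below are hypothetical stand-ins, not the actual gosnmp code.)

package sketch

import (
    "encoding/binary"
    "errors"
    "net"
    "time"
)

// Hypothetical wire format: a 4-byte big-endian request ID followed by the
// payload. This only stands in for "unmarshal and check the request ID".
type response struct {
    RequestID uint32
    Payload   []byte
}

func unmarshalResponse(b []byte) (*response, error) {
    if len(b) < 4 {
        return nil, errors.New("short packet")
    }
    return &response{RequestID: binary.BigEndian.Uint32(b[:4]), Payload: b[4:]}, nil
}

// waitForResponse keeps reading until it sees the expected request ID or the
// deadline passes; unparseable packets and duplicates are discarded rather
// than triggering a retransmit.
func waitForResponse(conn net.Conn, wantReqID uint32, deadline time.Time) (*response, error) {
    buf := make([]byte, 65536)
    for {
        conn.SetReadDeadline(deadline)
        n, err := conn.Read(buf)
        if err != nil {
            return nil, err // timeout or socket error; the caller may retransmit
        }
        resp, err := unmarshalResponse(buf[:n])
        if err != nil {
            continue // cannot be unmarshalled; keep waiting
        }
        if resp.RequestID != wantReqID {
            continue // duplicate or stale response; keep waiting
        }
        return resp, nil
    }
}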

@wdreeveii (Collaborator) commented:

I suppose on Windows that is the expected behavior, though not on Unix-like systems, where a read with a small buffer is successful but the packet is truncated. To receive the whole packet, the request is sent once more and a read is attempted with a new, larger buffer.

Also, changing the buffer size to 64k does not change the problem. It simply makes it less likely, while causing this routine to use a whole lot more memory. That is an issue.


@phemmer (Contributor, Author) commented Jul 10, 2016

Also, changing the buffer size to 64k does not change the problem. It simply makes it less likely while causing this routine to use a whole lot more memory. That is an issue.

Not true.

  1. It makes the issue impossible. IP doesn't allow packets bigger than 64k, not even fragmented. Thus, unless using something like a Unix domain socket (which is not supported here), a response >64k is impossible.
  2. Yes, it results in up to an additional 64kb of memory being used. I don't think 64kb of memory is a huge price to pay. And on Linux systems with overcommit enabled (which is the default), it won't even use 64kb; it'll only use as much as the read actually fills.

@phemmer (Contributor, Author) commented Jul 10, 2016

I just wrote a quick benchmark to test the idea:

Master branch:

# go test -run=X -bench=OneRequest -benchmem ./
PASS
BenchmarkSendOneRequest-8      30000         56819 ns/op        9173 B/op        160 allocs/op
ok      github.com/soniah/gosnmp    2.308s

64kb buffer:

# go test -run=X -bench=OneRequest -benchmem ./
PASS
BenchmarkSendOneRequest-8      30000         56872 ns/op        8149 B/op        159 allocs/op
ok      github.com/soniah/gosnmp    2.269s
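
(For reference, a minimal sketch of how a benchmark comparison like the one above can be set up with go test -bench=. -benchmem; these benchmarks only contrast allocating a fresh 64kb buffer per request against reusing one, and are not the actual gosnmp BenchmarkSendOneRequest.)

package sketch

import "testing"

// sink keeps the allocations from being optimized away.
var sink []byte

func BenchmarkFreshBuffer(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        sink = make([]byte, 65536) // a new 64kb buffer per "request"
    }
}

func BenchmarkReusedBuffer(b *testing.B) {
    buf := make([]byte, 65536) // allocated once, reused for every "request"
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        sink = buf
    }
}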

@phemmer (Contributor, Author) commented Jul 11, 2016

I've added a commit to use a 64kb read buffer.

@phemmer phemmer force-pushed the handle-dups branch 4 times, most recently from 26be96e to 07893db, on July 11, 2016 04:22
@wdreeveii (Collaborator) commented:

I rather like this commit.

Due diligence requires me to understand why this code was implemented in the first place:

b84d786

and then:

3e6fc99

I'm guessing that the second commit supersedes any benefits provided by the first commit. So that leaves undesirable memory usage on non-Linux systems. Do you have any comments or benchmarks there?


@phemmer (Contributor, Author) commented Jul 12, 2016

It looks like both those commits were trying to address what is described at the bottom of https://blog.golang.org/go-slices-usage-and-internals, in the "A possible 'gotcha'" section.

Looking at the old code (that was removed in b84d786), it was allocating 64kb each time something was received. If a reference to that allocated slice was maintained, then it could end up sitting in memory until all references were released. So b84d786 tried to solve the issue by allocating less memory. But if thousands of references were held, it would still add up to a lot of memory. So it looks like 3e6fc99 then tried to fix the issue by performing a copy of exactly the size needed, which would allow the original over-sized allocation to be released.

This PR changes the approach entirely so that instead of performing an allocation every time we read something, we re-use the same buffer over and over. We still do an allocation of the exact size needed to hold the response, and if thousands of those are held, that could add up. But that would be the caller's fault for holding them, not gosnmp's (at least it shouldn't be; I don't think there's anywhere that maintains a permanent reference to the received response).
We could allocate the 64k buffer within receive() (what was previously dispatch()), but the compiler's escape analysis isn't good enough yet to figure out that the variable doesn't escape the function, and so it ends up on the heap. Doing the 64k allocation within receive() would therefore result in the GC doing a lot of work (though it wouldn't use up much memory). In the future, if the escape analysis gets good enough, we could change this. But for now we just have to persist the buffer between calls.
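
(A minimal sketch of the pattern described above; the struct and method below are simplified stand-ins for the GoSNMP struct and receive(), not the actual code.)

package sketch

import "net"

// Simplified stand-in for the connection owner; the real struct has many more fields.
type conn struct {
    c     net.Conn
    rxBuf []byte // persistent 64kb read buffer, reused across receive() calls
}

// receive reads one message into the persistent buffer and returns a copy of
// exactly the right size, so callers never hold a reference to the 64kb buffer.
func (x *conn) receive() ([]byte, error) {
    if x.rxBuf == nil {
        x.rxBuf = make([]byte, 65536) // max IP packet size, allocated once
    }
    n, err := x.c.Read(x.rxBuf)
    if err != nil {
        return nil, err
    }
    msg := make([]byte, n)
    copy(msg, x.rxBuf[:n]) // exact-size copy, per the slices "gotcha" article
    return msg, nil
}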

Edit: I amended the commit to add a description, given the controversial history:

This commit changes the `receive()` method to use a static 64k buffer.
This is done to alleviate the issue that arises where, if we try to read a UDP message that is larger than the buffer, the message is truncated. 64k is the maximum size of an IP packet (even when fragmented), so that is the size of our buffer.
We then persist the buffer between `receive()` calls by placing it on the `GoSNMP` struct. This is so that the garbage collector is not constantly having to clean it up. In the future, if the Go compiler's escape analysis is smart enough to figure out that the buffer doesn't escape the function, we could declare the buffer within `receive()` and let it sit on the stack. But for the moment the escape analysis is not smart enough to do this.
@benjamin-thomas (Contributor) commented:

Hello @phemmer, you got the history of it right.

As a user of the library, I encountered this nasty bug where memory would indeed not be released, in a non-obvious way.

The real solution was to copy the slice to force the runtime to deallocate this unreferenced memory (I think a comment explaining this needs to remain next to the code, as that was definitely a gotcha).

The least I could do would be to test your patch against my code and see if memory usage stays reasonable (at least in my use case).

@phemmer (Contributor, Author) commented Jul 12, 2016

If we're this concerned about memory usage, it would be easy to write a test that performs a few thousand sendOneRequest() calls, holds on to the responses, and sees how much memory is used per response. Then release the responses and ensure the GC reclaims all the memory. We can set a threshold on each so that if it ever goes above a certain amount, the test fails.
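
(A minimal sketch of how such a check could be structured; the workload below is a placeholder allocation rather than sendOneRequest(), and the thresholds are arbitrary.)

package sketch

import (
    "runtime"
    "testing"
)

// Run a workload, hold the results, measure heap growth per item, then drop
// the results and verify the GC reclaims the memory.
func TestMemoryPerResponse(t *testing.T) {
    const n = 10000

    var before, held, after runtime.MemStats
    runtime.GC()
    runtime.ReadMemStats(&before)

    responses := make([][]byte, 0, n)
    for i := 0; i < n; i++ {
        responses = append(responses, make([]byte, 400)) // stand-in "response"
    }

    runtime.GC()
    runtime.ReadMemStats(&held)
    if perResp := (held.HeapAlloc - before.HeapAlloc) / n; perResp > 1000 {
        t.Errorf("memory used per response too high: %d bytes", perResp)
    }

    responses = nil // release all responses
    runtime.GC()
    runtime.ReadMemStats(&after)
    if leaked := int64(after.HeapAlloc) - int64(before.HeapAlloc); leaked > 64*1024 {
        t.Errorf("memory not reclaimed after releasing responses: %d bytes", leaked)
    }
}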

@benjamin-thomas (Contributor) commented:

@phemmer if you can reproduce the problem (by avoiding the copying and triggering the bug), why not.

As I recall, I wasn't holding on to any reference, but I must admit I don't recall the details as clearly as I should.

Sorry, I haven't had the time to look into it today; I'll try to look into it tomorrow.

@phemmer (Contributor, Author) commented Jul 13, 2016

Test available at e3df90e

Results:

6a71adf (b84d786^)

# go test -v -run TestSendOneRequest_memory ./
github.com/soniah/gosnmp
=== RUN   TestSendOneRequest_memory
Mem used per request: 65888
Mem leaked per request: 0
--- FAIL: TestSendOneRequest_memory (0.19s)
    mem_test.go:119: Memory used per request > 500 bytes. Actual = 65888

b84d786

# go test -v -run TestSendOneRequest_memory ./
github.com/soniah/gosnmp
=== RUN   TestSendOneRequest_memory
Mem used per request: 608
Mem leaked per request: 0
--- FAIL: TestSendOneRequest_memory (0.14s)
    mem_test.go:119: Memory used per request > 500 bytes. Actual = 608

3e6fc99

# go test -v -run TestSendOneRequest_memory ./
github.com/soniah/gosnmp
=== RUN   TestSendOneRequest_memory
Mem used per request: 432
Mem leaked per request: 0
--- PASS: TestSendOneRequest_memory (0.14s)

f6904fd (master)

# go test -v -run TestSendOneRequest_memory ./
github.com/soniah/gosnmp
=== RUN   TestSendOneRequest_memory
Mem used per request: 480
Mem leaked per request: 0
--- PASS: TestSendOneRequest_memory (0.14s)

8267234 (this PR)

# go test -v -run TestSendOneRequest_memory ./
github.com/soniah/gosnmp
=== RUN   TestSendOneRequest_memory
Mem used per request: 480
Mem leaked per request: 0
--- PASS: TestSendOneRequest_memory (0.14s)

Note that the high memory usage (in 6a71adf) only happens with octet string responses, meaning gosnmp is likely keeping a byte slice of the raw response when unmarshalling. Thus the memory usage could get even lower by doing a copy within the unmarshal.

Edit: indeed; here is the result with copying the octet string during unmarshal:
404ef9e

# go test -v -run TestSendOneRequest_memory ./
github.com/soniah/gosnmp
=== RUN   TestSendOneRequest_memory
Mem used per request: 432
Mem leaked per request: 0
--- PASS: TestSendOneRequest_memory (0.13s)
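
(A minimal sketch of the copy being referred to; parseOctetString below is a hypothetical stand-in for the relevant unmarshal step, not gosnmp's actual code.)

package sketch

// Returning a sub-slice of the raw message keeps the entire receive buffer
// alive for as long as the value is referenced; copying the bytes lets the
// large buffer be collected as soon as the unmarshal is finished with it.
func parseOctetString(raw []byte, offset, length int) []byte {
    // Sub-slice (shares the backing array with raw):
    //     return raw[offset : offset+length]
    // Copy (an independent allocation of exactly `length` bytes):
    val := make([]byte, length)
    copy(val, raw[offset:offset+length])
    return val
}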

@soniah (Collaborator) commented Aug 4, 2016

I've been backpacking around Brazil for a few months :-) I'd like some time to review this while my technical brain reboots :-D

@soniah soniah merged commit 5644f07 into gosnmp:master Aug 5, 2016
@phemmer phemmer deleted the handle-dups branch August 5, 2016 23:07
sparrc pushed a commit to influxdata/telegraf that referenced this pull request Aug 22, 2016
* Add a new and improved snmp plugin

* update gosnmp for duplicate packet fix

gosnmp/gosnmp#68
gosnmp/gosnmp#69

Successfully merging this pull request may close these issues.

Duplicate packets put connection in bad state
4 participants