
handle receiving duplicate packets #69

Merged: 2 commits into gosnmp:master from handle-dups on Aug 5, 2016

Conversation

@phemmer (Contributor) commented Jul 10, 2016

fixes #68


As an unrelated note, there are numerous places in the code where a method is called on a struct and is passed members of that same struct as arguments.
For example, in the previous code that this PR replaced:

x.dispatch(x.Conn, outBuf, expected)

Why not have the dispatch method just access x.Conn itself instead of passing it?

Another example:

packetOut.marshalMsg(pdus, packetOut.PDUType, msgID, reqID)

packetOut is a SnmpPacket, whose .Variables field can replace the pdus argument, whose .PDUType can replace the packetOut.PDUType argument, whose .MsgID can replace the msgID argument, and whose .RequestID can replace the reqID argument.

And numerous similar examples.

I think it would make the code easier for contributors to work with if these were cleaned up. I personally found it highly confusing trying to figure out why this was being done.
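
(For illustration, a minimal self-contained sketch of the kind of refactor meant here; the type and method names below are made up for the example and are not the actual gosnmp code.)

package main

import "fmt"

// Hypothetical example type, standing in for a struct like GoSNMP.
type client struct {
    addr    string
    timeout int
}

// Before: the method is passed fields of its own receiver.
func (c *client) sendBefore(addr string, timeout int, msg string) {
    fmt.Printf("sending %q to %s (timeout %ds)\n", msg, addr, timeout)
}

// After: the method reads its own fields, so call sites cannot pass mismatched values.
func (c *client) sendAfter(msg string) {
    fmt.Printf("sending %q to %s (timeout %ds)\n", msg, c.addr, c.timeout)
}

func main() {
    c := &client{addr: "127.0.0.1:161", timeout: 5}
    c.sendBefore(c.addr, c.timeout, "get sysDescr") // redundant arguments
    c.sendAfter("get sysDescr")                     // same behavior, simpler call site
}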

@phemmer phemmer force-pushed the handle-dups branch 2 times, most recently from c93be00 to cf7eda1, on July 10, 2016 03:36
@wdreeveii (Collaborator) commented:

Patrick, thank you for taking the time to prepare a patch.

I do want to note for further discussion that this commit fundamentally changes the way retransmits are performed. Placing the write outside of the for loop in sendOneRequest seems to mean that retransmits will not occur.

Here are a few comments on the existing design of this function: #31 (comment)

I need to understand the implications a little better here though.
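
(For context, a minimal sketch of the retransmit structure being discussed; the function below is a simplified stand-in, not the actual gosnmp sendOneRequest.)

package sketch

import (
    "net"
    "time"
)

// With the write inside the retry loop, every read timeout triggers a
// retransmit; hoisting the write above the loop would mean only the read is retried.
func sendOneRequest(conn net.Conn, out []byte, timeout time.Duration, retries int) ([]byte, error) {
    buf := make([]byte, 65536)
    var lastErr error
    for attempt := 0; attempt <= retries; attempt++ {
        if _, err := conn.Write(out); err != nil { // write inside the loop
            lastErr = err
            continue
        }
        conn.SetReadDeadline(time.Now().Add(timeout))
        n, err := conn.Read(buf)
        if err != nil {
            lastErr = err // typically a timeout; loop around and retransmit
            continue
        }
        return buf[:n], nil
    }
    return nil, lastErr
}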


@wdreeveii (Collaborator) commented:

Alright, I have verified:

  1. Timeout-based retransmits are performed in the sendOneRequest layer, so they should remain unaffected.
  2. We are now talking about retransmits that occur when the expected buffer is too small. I believe this is used to avoid the complexity behind fragmented reads on a UDP socket.

It would be a good idea to write a test for this scenario: request something we expect to be small, but that is actually quite large.
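
(A minimal sketch of such a test; it exercises a plain UDP socket rather than gosnmp itself, assumes Unix-like read semantics, and all names and sizes are illustrative only.)

package sketch

import (
    "net"
    "testing"
)

// A UDP "response" larger than the receive buffer is silently truncated on
// Unix-like systems (on Windows the read returns an error instead).
func TestSmallBufferTruncatesLargeResponse(t *testing.T) {
    server, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
    if err != nil {
        t.Fatal(err)
    }
    defer server.Close()

    client, err := net.DialUDP("udp", nil, server.LocalAddr().(*net.UDPAddr))
    if err != nil {
        t.Fatal(err)
    }
    defer client.Close()

    large := make([]byte, 8192) // the unexpectedly large response
    if _, err := client.Write(large); err != nil {
        t.Fatal(err)
    }

    small := make([]byte, 512) // the undersized "expected" buffer
    n, _, err := server.ReadFromUDP(small)
    if err != nil {
        t.Fatal(err)
    }
    if n == len(large) {
        t.Fatalf("expected truncation, but read the full %d bytes", n)
    }
    t.Logf("read %d of %d bytes; the remainder was discarded", n, len(large))
}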


@phemmer (Contributor, Author) commented Jul 10, 2016

  1. We are now talking about retransmits that occur when the expected buffer is too small. I believe this is used to avoid the complexity behind fragmented reads on a UDP socket.

I don't see how this PR has any impact on that.
Even before this PR was created, if a response bigger than the read buffer was received, dispatch() would return an error. sendOneRequest would then resend the request without changing the receive buffer size, and the error would recur.
With the new code, the exact same thing happens.

I'm not saying it's not a problem, but it's not a result of this PR.
Edit: nix that, I forgot the retry in the above scenario was done within dispatch(), not sendOneRequest().
Also, the fix is really simple: allocate a 64kb buffer instead of trying to guess the right size. No more partial reads, smaller code, and slightly (infinitesimally) faster as well.

The only behavior change this PR introduces is that when a response is received but either cannot be unmarshalled or contains an unexpected request ID, we continue waiting for the correct response instead of sending a new request.

Oh, and I just realized that the timeout retry behavior in the issue you linked is another cause of this issue.
By dividing the timeout by the number of retries and resending after the partial timeout, if the response to the first request is received while waiting for the second, the code leaves the second response sitting in the socket buffer.
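
(For illustration, a minimal sketch of the behavior change described above: on an unmarshal failure or an unexpected request ID, keep waiting and discard the packet rather than retransmit. The wire format, types, and function names below are hypothetical stand-ins, not the actual gosnmp code.)

package sketch

import (
    "encoding/binary"
    "errors"
    "net"
    "time"
)

// Hypothetical wire format: a 4-byte big-endian request ID followed by the
// payload. This only stands in for "unmarshal and check the request ID".
type response struct {
    RequestID uint32
    Payload   []byte
}

func unmarshalResponse(b []byte) (*response, error) {
    if len(b) < 4 {
        return nil, errors.New("short packet")
    }
    return &response{RequestID: binary.BigEndian.Uint32(b[:4]), Payload: b[4:]}, nil
}

// waitForResponse keeps reading until it sees the expected request ID or the
// deadline passes; unparseable packets and duplicates are discarded rather
// than triggering a retransmit.
func waitForResponse(conn net.Conn, wantReqID uint32, deadline time.Time) (*response, error) {
    buf := make([]byte, 65536)
    for {
        conn.SetReadDeadline(deadline)
        n, err := conn.Read(buf)
        if err != nil {
            return nil, err // timeout or socket error; the caller may retransmit
        }
        resp, err := unmarshalResponse(buf[:n])
        if err != nil {
            continue // cannot be unmarshalled; keep waiting
        }
        if resp.RequestID != wantReqID {
            continue // duplicate or stale response; keep waiting
        }
        return resp, nil
    }
}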

@wdreeveii (Collaborator) commented:

I suppose on Windows that is the expected behavior, though not on Unix-like systems, where a read with a small buffer is successful but the packet is truncated. To receive the whole packet, the request is sent once more and a read is attempted with a new, larger buffer.

Also, changing the buffer size to 64k does not change the problem. It simply makes it less likely, while causing this routine to use a whole lot more memory. That is an issue.


@phemmer (Contributor, Author) commented Jul 10, 2016

Also, changing the buffer size to 64k does not change the problem. It simply makes it less likely while causing this routine to use a whole lot more memory. That is an issue.

Not true.

  1. It makes the issue impossible. IP doesn't allow packets bigger than 64k, not even fragmented. Thus, unless using something like a Unix domain socket (which is not supported here), a response >64k is impossible.
  2. Yes, it results in up to an additional 64kb of memory being used. I don't think 64kb of memory is a huge price to pay. And on Linux systems with overcommit enabled (which is the default), it won't even use 64kb; it'll only use as much as the read actually fills.

@phemmer (Contributor, Author) commented Jul 10, 2016

I just wrote a quick benchmark to test the idea:

Master branch:

# go test -run=X -bench=OneRequest -benchmem ./
PASS
BenchmarkSendOneRequest-8      30000         56819 ns/op        9173 B/op        160 allocs/op
ok      github.com/soniah/gosnmp    2.308s

64kb buffer:

# go test -run=X -bench=OneRequest -benchmem ./
PASS
BenchmarkSendOneRequest-8      30000         56872 ns/op        8149 B/op        159 allocs/op
ok      github.com/soniah/gosnmp    2.269s
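
(For reference, a minimal sketch of how a benchmark comparison like the one above can be set up with go test -bench=. -benchmem; these benchmarks only contrast allocating a fresh 64kb buffer per request against reusing one, and are not the actual gosnmp BenchmarkSendOneRequest.)

package sketch

import "testing"

// sink keeps the allocations from being optimized away.
var sink []byte

func BenchmarkFreshBuffer(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        sink = make([]byte, 65536) // a new 64kb buffer per "request"
    }
}

func BenchmarkReusedBuffer(b *testing.B) {
    buf := make([]byte, 65536) // allocated once, reused for every "request"
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        sink = buf
    }
}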

@phemmer (Contributor, Author) commented Jul 11, 2016

I've added a commit to use a 64kb read buffer.

@phemmer phemmer force-pushed the handle-dups branch 4 times, most recently from 26be96e to 07893db, on July 11, 2016 04:22
@wdreeveii (Collaborator) commented:

I rather like this commit.

Due diligence requires me to understand why this code was implemented in the first place:

b84d786

and then:

3e6fc99

I'm guessing that the second commit supersedes any benefits provided by the first commit. So that leaves undesirable memory usage on non-Linux systems. Do you have any comments or benchmarks there?


@phemmer (Contributor, Author) commented Jul 12, 2016

It looks like both those commits were trying to address what is described at the bottom of https://blog.golang.org/go-slices-usage-and-internals, in the "A possible 'gotcha'" section.

Looking at the old code (that was removed in b84d786), it was allocating 64kb each time something was received. If a reference to that allocated slice was maintained, then it could end up sitting in memory until all references were released. So b84d786 tried to solve the issue by allocating less memory. But if thousands of references were held, it would still add up to a lot of memory. So it looks like 3e6fc99 then tried to fix the issue by performing a copy of exactly the size needed, which would allow the original over-sized allocation to be released.

This PR changes the approach entirely so that instead of performing an allocation every time we read something, we re-use the same buffer over and over. We still do an allocation of the exact size needed to hold the response, and if thousands of those are held, that could add up. But that would be the caller's fault for holding them, not gosnmp's (at least it shouldn't be; I don't think there's anywhere that maintains a permanent reference to the received response).
We could allocate the 64k buffer within receive() (what was previously dispatch()), but the compiler's escape analysis isn't good enough yet to figure out that the variable doesn't escape the function, and so it ends up on the heap. Doing the 64k allocation within receive() would therefore result in the GC doing a lot of work (though it wouldn't use up much memory). In the future, if the escape analysis gets good enough, we could change this. But for now we just have to persist the buffer between calls.
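
(A minimal sketch of the pattern described above; the struct and method below are simplified stand-ins for the GoSNMP struct and receive(), not the actual code.)

package sketch

import "net"

// Simplified stand-in for the connection owner; the real struct has many more fields.
type conn struct {
    c     net.Conn
    rxBuf []byte // persistent 64kb read buffer, reused across receive() calls
}

// receive reads one message into the persistent buffer and returns a copy of
// exactly the right size, so callers never hold a reference to the 64kb buffer.
func (x *conn) receive() ([]byte, error) {
    if x.rxBuf == nil {
        x.rxBuf = make([]byte, 65536) // max IP packet size, allocated once
    }
    n, err := x.c.Read(x.rxBuf)
    if err != nil {
        return nil, err
    }
    msg := make([]byte, n)
    copy(msg, x.rxBuf[:n]) // exact-size copy, per the slices "gotcha" article
    return msg, nil
}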

Edit: I amended the commit to add a description, given the controversial history:

This commit changes the `receive()` method to use a static 64k buffer.
This is done to alleviate the issue that arises where, if we try to read a UDP message that is larger than the buffer, the message is truncated. 64k is the maximum size of an IP packet (even when fragmented), so that is the size of our buffer.
We then persist the buffer between `receive()` calls by placing it on the `GoSNMP` struct. This is so that the garbage collector is not constantly having to clean it up. In the future, if the Go compiler's escape analysis is smart enough to figure out that the buffer doesn't escape the function, we could declare the buffer within `receive()` and let it sit on the stack. But for the moment the escape analysis is not smart enough to do this.
@benjamin-thomas (Contributor) commented:

Hello @phemmer, you got the history of it right.

As a user of the library, I encountered this nasty bug where memory would indeed not be released, in a non-obvious way.

The real solution was to copy the slice to force the runtime to deallocate this unreferenced memory (I think a comment explaining this needs to remain next to the code, as that was definitely a gotcha).

The least I could do would be to test your patch against my code and see if memory usage stays reasonable (at least in my use case).

@phemmer (Contributor, Author) commented Jul 12, 2016

If we're this concerned about memory usage, it would be easy to write a test that performs a few thousand sendOneRequest() calls, holds on to the responses, and sees how much memory is used per response. Then release the responses and ensure the GC reclaims all the memory. We can set a threshold on each so that if it ever goes above a certain amount, the test fails.
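
(A minimal sketch of how such a check could be structured; the workload below is a placeholder allocation rather than sendOneRequest(), and the thresholds are arbitrary.)

package sketch

import (
    "runtime"
    "testing"
)

// Run a workload, hold the results, measure heap growth per item, then drop
// the results and verify the GC reclaims the memory.
func TestMemoryPerResponse(t *testing.T) {
    const n = 10000

    var before, held, after runtime.MemStats
    runtime.GC()
    runtime.ReadMemStats(&before)

    responses := make([][]byte, 0, n)
    for i := 0; i < n; i++ {
        responses = append(responses, make([]byte, 400)) // stand-in "response"
    }

    runtime.GC()
    runtime.ReadMemStats(&held)
    if perResp := (held.HeapAlloc - before.HeapAlloc) / n; perResp > 1000 {
        t.Errorf("memory used per response too high: %d bytes", perResp)
    }

    responses = nil // release all responses
    runtime.GC()
    runtime.ReadMemStats(&after)
    if leaked := int64(after.HeapAlloc) - int64(before.HeapAlloc); leaked > 64*1024 {
        t.Errorf("memory not reclaimed after releasing responses: %d bytes", leaked)
    }
}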

@benjamin-thomas (Contributor) commented:

@phemmer if you can reproduce the problem (by avoiding the copying and triggering the bug), why not.

As I recall, I wasn't holding on to any reference, but I must admit I don't recall the details as clearly as I should.

Sorry, I haven't had the time to look into it today; I'll try to look into it tomorrow.

@phemmer (Contributor, Author) commented Jul 13, 2016

Test available at e3df90e

Results:

6a71adf (b84d786^)

# go test -v -run TestSendOneRequest_memory ./
github.com/soniah/gosnmp
=== RUN   TestSendOneRequest_memory
Mem used per request: 65888
Mem leaked per request: 0
--- FAIL: TestSendOneRequest_memory (0.19s)
    mem_test.go:119: Memory used per request > 500 bytes. Actual = 65888

b84d786

# go test -v -run TestSendOneRequest_memory ./
github.com/soniah/gosnmp
=== RUN   TestSendOneRequest_memory
Mem used per request: 608
Mem leaked per request: 0
--- FAIL: TestSendOneRequest_memory (0.14s)
    mem_test.go:119: Memory used per request > 500 bytes. Actual = 608

3e6fc99

# go test -v -run TestSendOneRequest_memory ./
github.com/soniah/gosnmp
=== RUN   TestSendOneRequest_memory
Mem used per request: 432
Mem leaked per request: 0
--- PASS: TestSendOneRequest_memory (0.14s)

f6904fd (master)

# go test -v -run TestSendOneRequest_memory ./
github.com/soniah/gosnmp
=== RUN   TestSendOneRequest_memory
Mem used per request: 480
Mem leaked per request: 0
--- PASS: TestSendOneRequest_memory (0.14s)

8267234 (this PR)

# go test -v -run TestSendOneRequest_memory ./
github.com/soniah/gosnmp
=== RUN   TestSendOneRequest_memory
Mem used per request: 480
Mem leaked per request: 0
--- PASS: TestSendOneRequest_memory (0.14s)

Note that the high memory usage (in 6a71adf) only happens with octet string responses, meaning gosnmp is likely keeping a byte slice of the raw response when unmarshalling. Thus the memory usage could get even lower by doing a copy within the unmarshal.

Edit: indeed; here is the result with copying the octet string during unmarshal:
404ef9e

# go test -v -run TestSendOneRequest_memory ./
github.com/soniah/gosnmp
=== RUN   TestSendOneRequest_memory
Mem used per request: 432
Mem leaked per request: 0
--- PASS: TestSendOneRequest_memory (0.13s)
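
(A minimal sketch of the copy being referred to; parseOctetString below is a hypothetical stand-in for the relevant unmarshal step, not gosnmp's actual code.)

package sketch

// Returning a sub-slice of the raw message keeps the entire receive buffer
// alive for as long as the value is referenced; copying the bytes lets the
// large buffer be collected as soon as the unmarshal is finished with it.
func parseOctetString(raw []byte, offset, length int) []byte {
    // Sub-slice (shares the backing array with raw):
    //     return raw[offset : offset+length]
    // Copy (an independent allocation of exactly `length` bytes):
    val := make([]byte, length)
    copy(val, raw[offset:offset+length])
    return val
}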

@soniah (Collaborator) commented Aug 4, 2016

I've been backpacking around Brazil for a few months :-) I'd like some time to review this while my technical brain reboots :-D

@soniah soniah merged commit 5644f07 into gosnmp:master Aug 5, 2016
@phemmer phemmer deleted the handle-dups branch August 5, 2016 23:07
sparrc pushed a commit to influxdata/telegraf that referenced this pull request Aug 22, 2016
* Add a new and improved snmp plugin

* update gosnmp for duplicate packet fix

gosnmp/gosnmp#68
gosnmp/gosnmp#69

Successfully merging this pull request may close these issues.

Duplicate packets put connection in bad state
4 participants