-
Notifications
You must be signed in to change notification settings - Fork 1
Adds the saver struct #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pull Request Test Coverage Report for Build 26
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @pboothe and @stephen-soltesz)
saver/saver.go, line 39 at r2 (raw file):
} switch nl.LayerType() { case ((*layers.IPv4)(nil)).LayerType():
Can we explicitly use layers.LayerTypeIPv4
and layers.LayerTypeIPv6
in the case, rather than the extra casting, etc?
saver/saver.go, line 92 at r2 (raw file):
Pchan write channel is connected to the Pchan channel.
This sentence looks like a typo.
saver/saver.go, line 115 at r2 (raw file):
metrics.SaversStarted.Inc() defer metrics.SaversStopped.Inc() defer s.state.Set("stopped")
SaverCount is a gauge, so all states should return to zero once all connections are closed. If I've traced things correctly, "stopped" is never decremented again, making it a counter. A final .Dec
in Stop
might resolve that.
saver/saver.go, line 165 at r2 (raw file):
// discovering the size of the application layer and then subtracting it // from the overall size of the packet data. IPv6 supports variable-length // headers, so this is actually required.
Just clarifying: what behaves differently when the packet is IPv4?
saver/saver.go, line 165 at r2 (raw file):
// discovering the size of the application layer and then subtracting it // from the overall size of the packet data. IPv6 supports variable-length // headers, so this is actually required.
Just clarifying: IPv6 headers are variable length -- per-packet? Or is it stable for a given 4-tuple? I'm trying to understand why/how WriteFileHeader works.
saver/saver.go, line 181 at r2 (raw file):
func (s *Saver) savePacket(w *pcapgo.Writer, p gopacket.Packet, headerLen int) { info := p.Metadata().CaptureInfo info.CaptureLength = minInt(info.CaptureLength, headerLen)
Please describe why this modification is needed and a hint about how WritePacket depends on it considering we unconditionally use :headerLen
in the p.Data()
below.
What happens when info.CaptureLength is less than headerLen? ... how or when is that even possible? Does pcap capture partial packets?
saver/saver.go, line 203 at r2 (raw file):
// Lose all race conditions, because saver requires that nobody hold // onto the Pchan channel for more than a millisecond. time.Sleep(time.Millisecond)
I can't convince myself that this is guaranteed to be correct.
Without synchronization on the writer and without guarantees about how the go scheduler behaves (maybe there are), I want to say that there is still a non-zero chance of another go routine that was about to write to s.Pchan holding a reference longer than this delay before being scheduled again after the close.
The longer this delay is the less probable that becomes. 1ms is the scheduler quantum.
If there is a defensible argument for performance to avoid the synchronization between this and the writer, I'm open to that. If Go makes guarantees about goroutine scheduling that prevents the potential race here, or makes it even less probable, I'm open to that too. But, in those cases, we might want bigger hazard lights.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @stephen-soltesz)
saver/saver.go, line 39 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Can we explicitly use
layers.LayerTypeIPv4
andlayers.LayerTypeIPv6
in the case, rather than the extra casting, etc?
Done.
saver/saver.go, line 92 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Pchan write channel is connected to the Pchan channel.
This sentence looks like a typo.
Done.
saver/saver.go, line 115 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
SaverCount is a gauge, so all states should return to zero once all connections are closed. If I've traced things correctly, "stopped" is never decremented again, making it a counter. A final
.Dec
inStop
might resolve that.
Done.
saver/saver.go, line 165 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Just clarifying: what behaves differently when the packet is IPv4?
Done.
saver/saver.go, line 165 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Just clarifying: IPv6 headers are variable length -- per-packet? Or is it stable for a given 4-tuple? I'm trying to understand why/how WriteFileHeader works.
I believe that they are stable for a given 4-tuple. I have added that assumption as a comment.
saver/saver.go, line 181 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Please describe why this modification is needed and a hint about how WritePacket depends on it considering we unconditionally use
:headerLen
in thep.Data()
below.What happens when info.CaptureLength is less than headerLen? ... how or when is that even possible? Does pcap capture partial packets?
pcap captures partial packets. That's how it works. You have to specify N, the number of bytes of each packet you would like to capture, when you set up the capture.
saver/saver.go, line 203 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
I can't convince myself that this is guaranteed to be correct.
Without synchronization on the writer and without guarantees about how the go scheduler behaves (maybe there are), I want to say that there is still a non-zero chance of another go routine that was about to write to s.Pchan holding a reference longer than this delay before being scheduled again after the close.
The longer this delay is the less probable that becomes. 1ms is the scheduler quantum.
If there is a defensible argument for performance to avoid the synchronization between this and the writer, I'm open to that. If Go makes guarantees about goroutine scheduling that prevents the potential race here, or makes it even less probable, I'm open to that too. But, in those cases, we might want bigger hazard lights.
Figured out a better way. Stop
no longer exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "caller closes channel" convention change is really good. The shape of everything looks good to me. There are a number of small things so I expect this to be the last round.
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @pboothe)
saver/saver.go, line 92 at r3 (raw file):
// Saver provides two channels to allow packets to be saved. A well-buffered // channel for packets and a channel to receive the UUID. type Saver struct {
This is an aside; not critical for this PR. The naming convention for packages and types like saver.Saver
is common in Python but discouraged in Go. For Go, the preferable convention is to name the package.type so it's easy to read, for example packet.Saver
or flow.Saver
and expresses what it does.
My comment the other day about gopacket.Packet
was b/c they had done a poor job of that. I was surprised to see that pattern from a google owned repo.
saver/saver.go, line 183 at r3 (raw file):
Quoted 5 lines of code…
if ok { s.savePacket(w, p, headerLen) } else { break }
I suggest the formatting below to eliminate the else block:
if !ok {
break
}
s.savePacket(w, p, headerLen)
My understanding is that this pattern is encouraged -- https://golang.org/doc/effective_go.html#if
saver/saver.go, line 212 at r3 (raw file):
// packet. Because we can't be sure how big a header will be before we have // observed the flow, we have set N to be large and we trim packets down // here to not waste space when saving them to disk.
And, to prevent accidental saving PII embedded in the data portion of the packet?
saver/saver.go, line 216 at r3 (raw file):
saved in the data
s/saved in the data//g i.e. ".. bytes are actually returned from the pcap system"
The phrases "saved length" and "saved in the data" are confusing here b/c iiuc, they're describing bytes read into memory that have not been yet and may not all be written to disk.
saver/saver.go, line 221 at r3 (raw file):
info := p.Metadata().CaptureInfo info.CaptureLength = minInt(info.CaptureLength, headerLen) w.WritePacket(info, p.Data()[:headerLen])
If CaptureLength can be less than headerLen, then why is p.Data()[:headerLen]
safe?
I would expect this to panic with an out of range error.
Is the p.Data() buffer always allocated with a larger capacity?
saver/saver.go, line 238 at r3 (raw file):
// If synchronization between UUID creation and packet collection is off by // more than a second, things are messed up. pchan := make(chan gopacket.Packet, 833333)
I understand the Packet is a pointer but iiuc, there is an object that contains 1500 bytes right? If so, please explicitly mention 1.25GB of memory max.
saver/saver.go, line 258 at r3 (raw file):
// StartNew creates a new Saver to save a single TCP flow and starts its // goroutine. A saver's goroutine can be stopped either by cancelling the
Please state explicitly explicit that it is the callers responsibility to either cancel the context or close Pchan.
saver/saver_test.go, line 218 at r3 (raw file):
go func() { // Get packets from a wireshark-produced pcap file. handle, err := pcap.OpenOffline("test.pcap")
How was the test.pcap generated? Above you took care to comment that "IPs have been changed to prevent exposure of internal IPs." I spy ...
Also, some of the packets report as "incorrect checksum" -- is that expected? If so, why?
(There may be a better expression)
tcpdump -nnvvXSs 1500 -vvvv -r ~/Downloads/test.pcap
Could the sample include some packets with a larger payload?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @stephen-soltesz)
saver/saver.go, line 92 at r3 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
This is an aside; not critical for this PR. The naming convention for packages and types like
saver.Saver
is common in Python but discouraged in Go. For Go, the preferable convention is to name the package.type so it's easy to read, for examplepacket.Saver
orflow.Saver
and expresses what it does.My comment the other day about
gopacket.Packet
was b/c they had done a poor job of that. I was surprised to see that pattern from a google owned repo.
Now TCP
saver/saver.go, line 183 at r3 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
if ok { s.savePacket(w, p, headerLen) } else { break }
I suggest the formatting below to eliminate the else block:
if !ok { break } s.savePacket(w, p, headerLen)
My understanding is that this pattern is encouraged -- https://golang.org/doc/effective_go.html#if
Done.
saver/saver.go, line 212 at r3 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
And, to prevent accidental saving PII embedded in the data portion of the packet?
Added to comment
saver/saver.go, line 216 at r3 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
saved in the data
s/saved in the data//g i.e. ".. bytes are actually returned from the pcap system"
The phrases "saved length" and "saved in the data" are confusing here b/c iiuc, they're describing bytes read into memory that have not been yet and may not all be written to disk.
Done.
saver/saver.go, line 221 at r3 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
If CaptureLength can be less than headerLen, then why is
p.Data()[:headerLen]
safe?I would expect this to panic with an out of range error.
Is the p.Data() buffer always allocated with a larger capacity?
I had thought that a slice with bounds past the end just returns the slice up to the end. In actual fact I had no test data that exercised that case and you were right about the crash. Fixed.
saver/saver.go, line 238 at r3 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
I understand the Packet is a pointer but iiuc, there is an object that contains 1500 bytes right? If so, please explicitly mention 1.25GB of memory max.
Done.
saver/saver.go, line 258 at r3 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Please state explicitly explicit that it is the callers responsibility to either cancel the context or close Pchan.
Done.
saver/saver_test.go, line 218 at r3 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
How was the test.pcap generated? Above you took care to comment that "IPs have been changed to prevent exposure of internal IPs." I spy ...
Also, some of the packets report as "incorrect checksum" -- is that expected? If so, why?
(There may be a better expression)
tcpdump -nnvvXSs 1500 -vvvv -r ~/Downloads/test.pcap
Could the sample include some packets with a larger payload?
I hadn't worried about updating the checksum, so when I edited the IPs by hand the checksum (rightly) started failing. I'll collect new PCAPs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 5 files at r4.
Reviewable status:complete! 1 of 1 approvals obtained
Part of https://github.com/m-lab/dev-tracker/issues/442
https://github.com/m-lab/dev-tracker/issues/443
This change is