Skip to content

Conversation

pboothe
Copy link
Contributor

@pboothe pboothe commented Oct 23, 2019

@coveralls
Copy link

coveralls commented Oct 23, 2019

Pull Request Test Coverage Report for Build 26

  • 177 of 177 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 6: 0.0%
Covered Lines: 177
Relevant Lines: 177

💛 - Coveralls

Copy link
Contributor

@stephen-soltesz stephen-soltesz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @pboothe and @stephen-soltesz)


saver/saver.go, line 39 at r2 (raw file):

	}
	switch nl.LayerType() {
	case ((*layers.IPv4)(nil)).LayerType():

Can we explicitly use layers.LayerTypeIPv4 and layers.LayerTypeIPv6 in the case, rather than the extra casting, etc?


saver/saver.go, line 92 at r2 (raw file):

Pchan write channel is connected to the Pchan channel.

This sentence looks like a typo.


saver/saver.go, line 115 at r2 (raw file):

	metrics.SaversStarted.Inc()
	defer metrics.SaversStopped.Inc()
	defer s.state.Set("stopped")

SaverCount is a gauge, so all states should return to zero once all connections are closed. If I've traced things correctly, "stopped" is never decremented again, making it a counter. A final .Dec in Stop might resolve that.


saver/saver.go, line 165 at r2 (raw file):

	// discovering the size of the application layer and then subtracting it
	// from the overall size of the packet data. IPv6 supports variable-length
	// headers, so this is actually required.

Just clarifying: what behaves differently when the packet is IPv4?


saver/saver.go, line 165 at r2 (raw file):

	// discovering the size of the application layer and then subtracting it
	// from the overall size of the packet data. IPv6 supports variable-length
	// headers, so this is actually required.

Just clarifying: IPv6 headers are variable length -- per-packet? Or is it stable for a given 4-tuple? I'm trying to understand why/how WriteFileHeader works.


saver/saver.go, line 181 at r2 (raw file):

func (s *Saver) savePacket(w *pcapgo.Writer, p gopacket.Packet, headerLen int) {
	info := p.Metadata().CaptureInfo
	info.CaptureLength = minInt(info.CaptureLength, headerLen)

Please describe why this modification is needed and a hint about how WritePacket depends on it considering we unconditionally use :headerLen in the p.Data() below.

What happens when info.CaptureLength is less than headerLen? ... how or when is that even possible? Does pcap capture partial packets?


saver/saver.go, line 203 at r2 (raw file):

			// Lose all race conditions, because saver requires that nobody hold
			// onto the Pchan channel for more than a millisecond.
			time.Sleep(time.Millisecond)

I can't convince myself that this is guaranteed to be correct.

Without synchronization on the writer and without guarantees about how the go scheduler behaves (maybe there are), I want to say that there is still a non-zero chance of another go routine that was about to write to s.Pchan holding a reference longer than this delay before being scheduled again after the close.

The longer this delay is the less probable that becomes. 1ms is the scheduler quantum.

If there is a defensible argument for performance to avoid the synchronization between this and the writer, I'm open to that. If Go makes guarantees about goroutine scheduling that prevents the potential race here, or makes it even less probable, I'm open to that too. But, in those cases, we might want bigger hazard lights.

Copy link
Contributor Author

@pboothe pboothe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @stephen-soltesz)


saver/saver.go, line 39 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

Can we explicitly use layers.LayerTypeIPv4 and layers.LayerTypeIPv6 in the case, rather than the extra casting, etc?

Done.


saver/saver.go, line 92 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…
Pchan write channel is connected to the Pchan channel.

This sentence looks like a typo.

Done.


saver/saver.go, line 115 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

SaverCount is a gauge, so all states should return to zero once all connections are closed. If I've traced things correctly, "stopped" is never decremented again, making it a counter. A final .Dec in Stop might resolve that.

Done.


saver/saver.go, line 165 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

Just clarifying: what behaves differently when the packet is IPv4?

Done.


saver/saver.go, line 165 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

Just clarifying: IPv6 headers are variable length -- per-packet? Or is it stable for a given 4-tuple? I'm trying to understand why/how WriteFileHeader works.

I believe that they are stable for a given 4-tuple. I have added that assumption as a comment.


saver/saver.go, line 181 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

Please describe why this modification is needed and a hint about how WritePacket depends on it considering we unconditionally use :headerLen in the p.Data() below.

What happens when info.CaptureLength is less than headerLen? ... how or when is that even possible? Does pcap capture partial packets?

pcap captures partial packets. That's how it works. You have to specify N, the number of bytes of each packet you would like to capture, when you set up the capture.


saver/saver.go, line 203 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

I can't convince myself that this is guaranteed to be correct.

Without synchronization on the writer and without guarantees about how the go scheduler behaves (maybe there are), I want to say that there is still a non-zero chance of another go routine that was about to write to s.Pchan holding a reference longer than this delay before being scheduled again after the close.

The longer this delay is the less probable that becomes. 1ms is the scheduler quantum.

If there is a defensible argument for performance to avoid the synchronization between this and the writer, I'm open to that. If Go makes guarantees about goroutine scheduling that prevents the potential race here, or makes it even less probable, I'm open to that too. But, in those cases, we might want bigger hazard lights.

Figured out a better way. Stop no longer exists.

Copy link
Contributor

@stephen-soltesz stephen-soltesz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "caller closes channel" convention change is really good. The shape of everything looks good to me. There are a number of small things so I expect this to be the last round.

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @pboothe)


saver/saver.go, line 92 at r3 (raw file):

// Saver provides two channels to allow packets to be saved. A well-buffered
// channel for packets and a channel to receive the UUID.
type Saver struct {

This is an aside; not critical for this PR. The naming convention for packages and types like saver.Saver is common in Python but discouraged in Go. For Go, the preferable convention is to name the package.type so it's easy to read, for example packet.Saver or flow.Saver and expresses what it does.

My comment the other day about gopacket.Packet was b/c they had done a poor job of that. I was surprised to see that pattern from a google owned repo.


saver/saver.go, line 183 at r3 (raw file):

Quoted 5 lines of code…
		if ok {
			s.savePacket(w, p, headerLen)
		} else {
			break
		}

I suggest the formatting below to eliminate the else block:

if !ok {
    break
}
s.savePacket(w, p, headerLen)

My understanding is that this pattern is encouraged -- https://golang.org/doc/effective_go.html#if


saver/saver.go, line 212 at r3 (raw file):

	// packet. Because we can't be sure how big a header will be before we have
	// observed the flow, we have set N to be large and we trim packets down
	// here to not waste space when saving them to disk.

And, to prevent accidental saving PII embedded in the data portion of the packet?


saver/saver.go, line 216 at r3 (raw file):

saved in the data 

s/saved in the data//g i.e. ".. bytes are actually returned from the pcap system"

The phrases "saved length" and "saved in the data" are confusing here b/c iiuc, they're describing bytes read into memory that have not been yet and may not all be written to disk.


saver/saver.go, line 221 at r3 (raw file):

	info := p.Metadata().CaptureInfo
	info.CaptureLength = minInt(info.CaptureLength, headerLen)
	w.WritePacket(info, p.Data()[:headerLen])

If CaptureLength can be less than headerLen, then why is p.Data()[:headerLen] safe?

I would expect this to panic with an out of range error.

Is the p.Data() buffer always allocated with a larger capacity?


saver/saver.go, line 238 at r3 (raw file):

	// If synchronization between UUID creation and packet collection is off by
	// more than a second, things are messed up.
	pchan := make(chan gopacket.Packet, 833333)

I understand the Packet is a pointer but iiuc, there is an object that contains 1500 bytes right? If so, please explicitly mention 1.25GB of memory max.


saver/saver.go, line 258 at r3 (raw file):

// StartNew creates a new Saver to save a single TCP flow and starts its
// goroutine. A saver's goroutine can be stopped either by cancelling the

Please state explicitly explicit that it is the callers responsibility to either cancel the context or close Pchan.


saver/saver_test.go, line 218 at r3 (raw file):

	go func() {
		// Get packets from a wireshark-produced pcap file.
		handle, err := pcap.OpenOffline("test.pcap")

How was the test.pcap generated? Above you took care to comment that "IPs have been changed to prevent exposure of internal IPs." I spy ...

Also, some of the packets report as "incorrect checksum" -- is that expected? If so, why?

(There may be a better expression)

tcpdump -nnvvXSs 1500 -vvvv -r ~/Downloads/test.pcap 

Could the sample include some packets with a larger payload?

Copy link
Contributor Author

@pboothe pboothe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @stephen-soltesz)


saver/saver.go, line 92 at r3 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

This is an aside; not critical for this PR. The naming convention for packages and types like saver.Saver is common in Python but discouraged in Go. For Go, the preferable convention is to name the package.type so it's easy to read, for example packet.Saver or flow.Saver and expresses what it does.

My comment the other day about gopacket.Packet was b/c they had done a poor job of that. I was surprised to see that pattern from a google owned repo.

Now TCP


saver/saver.go, line 183 at r3 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…
		if ok {
			s.savePacket(w, p, headerLen)
		} else {
			break
		}

I suggest the formatting below to eliminate the else block:

if !ok {
    break
}
s.savePacket(w, p, headerLen)

My understanding is that this pattern is encouraged -- https://golang.org/doc/effective_go.html#if

Done.


saver/saver.go, line 212 at r3 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

And, to prevent accidental saving PII embedded in the data portion of the packet?

Added to comment


saver/saver.go, line 216 at r3 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…
saved in the data 

s/saved in the data//g i.e. ".. bytes are actually returned from the pcap system"

The phrases "saved length" and "saved in the data" are confusing here b/c iiuc, they're describing bytes read into memory that have not been yet and may not all be written to disk.

Done.


saver/saver.go, line 221 at r3 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

If CaptureLength can be less than headerLen, then why is p.Data()[:headerLen] safe?

I would expect this to panic with an out of range error.

Is the p.Data() buffer always allocated with a larger capacity?

I had thought that a slice with bounds past the end just returns the slice up to the end. In actual fact I had no test data that exercised that case and you were right about the crash. Fixed.


saver/saver.go, line 238 at r3 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

I understand the Packet is a pointer but iiuc, there is an object that contains 1500 bytes right? If so, please explicitly mention 1.25GB of memory max.

Done.


saver/saver.go, line 258 at r3 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

Please state explicitly explicit that it is the callers responsibility to either cancel the context or close Pchan.

Done.


saver/saver_test.go, line 218 at r3 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

How was the test.pcap generated? Above you took care to comment that "IPs have been changed to prevent exposure of internal IPs." I spy ...

Also, some of the packets report as "incorrect checksum" -- is that expected? If so, why?

(There may be a better expression)

tcpdump -nnvvXSs 1500 -vvvv -r ~/Downloads/test.pcap 

Could the sample include some packets with a larger payload?

I hadn't worried about updating the checksum, so when I edited the IPs by hand the checksum (rightly) started failing. I'll collect new PCAPs.

Copy link
Contributor

@stephen-soltesz stephen-soltesz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 5 files at r4.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained

@pboothe pboothe merged commit 22073eb into master Oct 25, 2019
@pboothe pboothe deleted the add-saver branch October 25, 2019 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants