Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inaccurate bytecounts #30

Closed
dougburks opened this issue Nov 19, 2012 · 18 comments
Closed

Inaccurate bytecounts #30

dougburks opened this issue Nov 19, 2012 · 18 comments

Comments

@dougburks
Copy link

Hello,

We're running prads as follows:

prads -i eth0 -c $conf -u sguil -g sguil -L
/nsm/sensor_data/$SENSOR/sancp/ -f /nsm/sensor_data/$SENSOR/pads.fifo
-b 'ip or (vlan and ip)'

In Sguil, if I query the sancp table and look at the connections, the source and destination bytes are way too big.

Example:

I do the following:
curl testmyids.com

For that connection, Sguil shows the following:
S Bytes: 109704
D Bytes: 43697

If I look at the pcap in Wireshark, it's only 1032 bytes total.

Where did 109704 and 43697 come from?

Thanks!

@comotion
Copy link
Collaborator

looking at the code, the s/d bytes are computed on the basis of all bytes sent and received including tcp headers. AFAICT the wireshark counts could be just data bytes. Maybe that's what's causing the discrepancy?

@dougburks
Copy link
Author

Even if we just look at the lower number, it's 40 times as big as what
wireshark is reporting. I don't think tcp headers would be responsible for
a 40x increase.

Can you try duplicating my test with "curl testmyids.com"? You'll see that
the traffic is nowhere near what prads is reporting.

Thanks!

On Wednesday, November 21, 2012, Kacper Why wrote:

looking at the code, the s/d bytes are computed on the basis of all bytes
sent and received including tcp headers. AFAICT the wireshark counts
could be just data bytes. Maybe that's what's causing the discrepancy?


Reply to this email directly or view it on GitHubhttps://github.com//issues/30#issuecomment-10609258.

Doug Burks
http://securityonion.blogspot.com

@dougburks
Copy link
Author

Good morning PRADS developers!

Have you had a chance to try and duplicate my issue?

Thanks!

@dougburks
Copy link
Author

pi->packet_bytes is using pi->ip4->ip_len:

src/prads.c:    pi->packet_bytes = (pi->ip4->ip_len - (IP_HL(pi->ip4) * 4));

Elsewhere in the code, pi->ip4->ip_len is referenced using its ntohs value. For example:

src/sig_tcp.c:            opt_ptr = (uint8_t *) pi->ip4 + ntohs(pi->ip4->ip_len); // fixed from htons
src/sig_tcp.c:            e.size = (ftype == TF_ACK) ? 0 : ntohs(pi->ip4->ip_len);

Should prads.c be using ntohs?

@dougburks
Copy link
Author

I added ntohs to prads.c and the numbers seem to make more sense.

Default version:

1354539036000000001|2012-12-03 12:50:36|2012-12-03 12:50:37|1|6|2886759668|37361|3651154719|80|6|108680|5|52893|27|27

Patched with ntohs:

1354538979000000001|2012-12-03 12:49:39|2012-12-03 12:49:39|0|6|2886759668|37360|3651154719|80|6|305|5|363|27|27

Thoughts?

gamelinux pushed a commit that referenced this issue Dec 3, 2012
…ug Burks for finding, reporting, debugging, and possible fix!
@comotion
Copy link
Collaborator

comotion commented Dec 4, 2012

This along with #31 and #33 should maybe close this issue, but needs testing. I've run it through my usual suspects but I'd feel better with moar testing .@dougburks can you test @comotion/master branch, ie pull request #33 and see if things are honky dory?

@dougburks
Copy link
Author

Was hoping to test this, but can't until Issue #34 is fixed.

@dougburks
Copy link
Author

curl testmyids.com
uid=0(root) gid=0(root) groups=0(root)

Old prads (with ntohs patch):
src pkts: 6
src bytes: 309
dst pkts: 4
dst bytes: 347

prads 0.3.2-rc3
src pkts: 6
src bytes: 309
dst pkts: 4
dst bytes: 347

(no difference in this case between 0.3.2-rc3 and previous version with ntohs patch)

wireshark:
src pkts: 6
src bytes: 537
dst pkts: 4
dst bytes: 495

The prads numbers are much closer to the Wireshark numbers since adding ntohs, but they still don't seem 100% correct. I believe Wireshark is reporting all bytes including IP and TCP header, but PRADS doesn't include IP header? That would be 20 bytes each for 10 packets or 200 bytes total difference. So if Wireshark is showing 1032 bytes total, I would expect PRADS to show 1032-200=832 bytes. But PRADS is reporting 656 bytes total.

What am I missing?

Thanks!

@gamelinux
Copy link
Owner

Does any of the packets have tcp options etc? Are all the IP headers 20 bytes? When making cxtracker and prads, the aim was to replace sancp when it comes to how it outputs session (in regards to be used in sguil), which would be better to compare prads to, than wireshark.

@dougburks
Copy link
Author

All IP headers in this stream are 20 bytes.

The first two packets have TCP options:
Packet 1: 24 bytes of TCP options
Packet 2: 8 bytes of TCP options

The remaining eight packets have no TCP options so they are standard 20-byte TCP headers.

Alec Waters and myself have tried many methods but have been unable to understand the numbers. Can you definitively tell us what the prads bytecount is supposed to be measuring? Is it supposed to include IP headers? Is it supposed to include TCP headers?

Thanks!

@gamelinux
Copy link
Owner

First off, are there any retransmissions also?

The main line for what it counts (for IPv4):

pi->packet_bytes = (ntohs(pi->ip4->ip_len) - (IP_HL(pi->ip4) * 4));

and then:

cxt->x_total_bytes += pi->packet_bytes;

Translated, that is:
For each packet in a session, the byte counter is incremented (respectively to if it is the source or destination sending the packet) with the "ip_len - IP_HL", meaning the total length of the IP packet, minus the IP Header length.

So that will include for example the TCP header with its TCP options, and the TCP payload.

In cxtracker, It used to be the length of the payload of UDP or TCP. But when we moved to Indexing when doing full pcap, we use the length of the entire packet as received from libpcap (pcap_pkthdr->len) so it will match directly to where in the raw pcap file you will find the start of a session etc.

Hope this clarifies the issue.

If this is not how you would like it to be, please tell us what standard you would like it to obey.
I would guess TCP/UDP payload byte count would be the most accurate, but as prads dont
take in account TCP fragmentation and stream-reassembly, this will never be correct :)

Hacky Xmas btw!

@comotion
Copy link
Collaborator

I guess we could do raw bytes if this is "more correct"

@dougburks
Copy link
Author

I'd be OK with the current standard if I understood exactly what it was and how to reconcile the numerical differences between prads and other tools like Wireshark and Argus. If I can't reconcile the differences, then I'd be concerned that there might be a bug lurking somewhere. I sent a pcap of "curl testmyids.com" (the example used in this thread) to Edward to see if he could help me understand how to reconcile.

When you say "raw bytes", you mean include all of the IP header, TCP/UDP header, and payload, right? I think that might be more user-friendly and intuitive for most analysts, especially when they start comparing to other tools like Wireshark and Argus.

So would it just be changing this:

pi->packet_bytes = (ntohs(pi->ip4->ip_len) - (IP_HL(pi->ip4) * 4));

to this?

pi->packet_bytes = (ntohs(pi->ip4->ip_len);

Thanks!

@taosecurity
Copy link

I love seeing PRADS in Sguil and Security Onion, but I share Doug's concerns about inconsistencies compared with tools like Argus and Wireshark. I would really like to see PRADS adopt the same byte counting options that a tool like Argus provides. When there is yet a third version, that doesn't seem very easy to understand, it makes life difficult for analysts. Thank you.

@comotion
Copy link
Collaborator

@taosecurity and @dougburks please test it out and let us know

@dougburks
Copy link
Author

With the latest patch, were we expecting the numbers to match Argus and Wireshark? I just tested with "pi->packet_bytes = ntohs(pi->ip4->ip_len);" and the numbers are closer, but still don't match. Using the same testmyids.com pcap that I used previously in this issue (and sent to Edward via email), I get the following results:
tcpreplay reports it's sending 1032 bytes
argus reports 1032 bytes
wireshark reports 1032 bytes
prads reports 856 bytes

So now it looks like the difference is the ethernet frame header. Would it make sense to include the ethernet frame header in prads so that it matches Argus/Wireshark/tcpreplay?

Thanks!

@dougburks
Copy link
Author

As another data point, Bro and NetworkMiner are reporting 856 bytes just like prads. So I'm OK with keeping "pi->packet_bytes = ntohs(pi->ip4->ip_len);" if everybody else is. Thanks!

@comotion
Copy link
Collaborator

That's the thing... I've gone with your initial suggestion because our original behavior is arbitrary and so is the byte counting behavior of other tools. Including the ethernet header doesn't make much sense to me: if you are sniffing two sides of a conversation between two different lan segments you will have different byte counts due to encapsulation, vlan differences etc.

however, I don't have particularily strong feelings about this and am happy to put it in line with whatever seems reasonable. We'll keep it like this - bro/networkminer-style and perhaps introduce a new option if enough people want the other kind of counting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants