Connection cutoff, like Time Machine #123
Comments
Came here to request this same feature. My environment often has very large elephant flows, so I'd like to set a BPF-filter-based connection cutoff, like I currently do with Time Machine, to get longer pcap retention. +1 Relevant talk from Aashish at LBNL. |
So I looked into this and it really doesn't seem feasible without a massive addition to Stenographer. I think the connection-cutoff portion is best left out of Stenographer. My reasoning is that the reason Stenotype is so fast is that it doesn't analyze what it is writing at all: it simply writes blocks of memory, shared with the kernel, straight to disk. Have you considered running something like Bro and then shunting the traffic with a tap aggregation switch? Generally, by the point you're concerned with keeping only 'portions' of traffic, you're going to have the hardware to do something like that... Keep in mind, I don't speak for the development team... I'm just pointing out that the feature you want isn't trivial to add while keeping the performance that Steno currently has :) |
Yes, we have all of that already in place, but we wouldn't want to shunt things like encrypted traffic (SSH, HTTPS, SMTPS, etc.); we would, however, only be interested in storing a small portion of such streams. Perhaps we can truncate the flow in our tapagg switch, but I haven't looked into it. |
You can use TrimPCAP to cut off/trim flows after they have been written to PCAP files. You can run TrimPCAP with different cutoff limits for different time periods, trimming new data gently while keeping only a few kB from old flows, as explained here: |
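The tiered approach described above (gentle cutoffs for recent captures, aggressive ones for old captures) can be sketched as a simple age-to-cutoff lookup. This is an illustrative sketch only, not TrimPCAP's actual interface; the tier table and cutoff values are made-up examples.

```python
# Illustrative sketch of tiered flow trimming (NOT TrimPCAP's real API):
# choose a per-flow byte cutoff based on the age of the capture file,
# so new data is trimmed gently while old flows keep only a few kB.
from datetime import timedelta

# Hypothetical tier table: (max capture age, per-flow cutoff in bytes)
TIERS = [
    (timedelta(days=1),  1_000_000),  # < 1 day old:  keep up to 1 MB per flow
    (timedelta(days=7),  100_000),    # < 1 week:     keep up to 100 kB per flow
    (timedelta(days=30), 10_000),     # < 1 month:    keep up to 10 kB per flow
]
DEFAULT_CUTOFF = 1_000               # anything older: just a few kB per flow

def cutoff_for_age(age: timedelta) -> int:
    """Return the per-flow byte cutoff to apply to a capture of this age."""
    for max_age, cutoff in TIERS:
        if age < max_age:
            return cutoff
    return DEFAULT_CUTOFF
```

A periodic job could then walk the PCAP directory and trim each file to `cutoff_for_age(now - file_mtime)` bytes per flow.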
We assume that most of the "interesting" data is found in the first few packets of a connection. Scott Campbell et al. describe a "heavy tail flow effect," wherein a small number of flows dominate the overall volume of data. To address this, Time Machine supports a "connection cutoff," which I understand is very efficient. For example, in LBNL's infrastructure, limiting HTTP flows to 5MB reduces the capture size from 6100GB of actual traffic to 480GB.
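The cutoff idea above boils down to tracking cumulative bytes per flow and discarding packets once a flow exceeds its limit. Here is a minimal sketch of that decision logic under assumed names (`make_cutoff_filter`, the flow-key tuple, and the default limit are ours; the 5 MB figure mirrors the LBNL HTTP example):

```python
# Minimal sketch of a per-connection cutoff: keep the first N bytes of
# each flow and drop everything after. All names here are illustrative.
CUTOFF_BYTES = 5 * 1024 * 1024  # e.g. 5 MB per flow, as in the LBNL example

def make_cutoff_filter(cutoff: int = CUTOFF_BYTES):
    seen: dict = {}  # flow key (e.g. 5-tuple) -> cumulative bytes captured

    def keep(flow_key, pkt_len: int) -> bool:
        """Return True if this packet should still be written to disk."""
        total = seen.get(flow_key, 0)
        if total >= cutoff:
            return False              # flow already hit its cutoff
        seen[flow_key] = total + pkt_len
        return True

    return keep
```

The catch, as noted in the comments, is that Stenotype never inspects packets at all, so even this cheap per-flow bookkeeping would change its write-path design.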
Why not support it in Stenographer? And... thanks for the open source!
Edit: fixed link to Scott Campbell