
Non-Linear response for sequential iteration of large capture #48

Closed
chrconlo opened this issue Dec 11, 2014 · 6 comments

Comments

@chrconlo

Thanks for the great tool!

I'm attempting to use pyshark to iterate over large capture files (180K packets per file) and am seeing a non-linear response as the iteration progresses. Is there a way to tune pyshark so it behaves linearly in these circumstances? At around 3K packets it starts getting really slow.

        cap = pyshark.FileCapture(cap_file, keep_packets=False)
        cap.display_filter = ('stp')
        cap.apply_on_packets(getStpData)

bpduTracker chrconlo$ ./bpduTracker.py
Starting bpduTracker...
Capture File Found: /users/xyz/Desktop/20141202/prog/test.cap

2014-12-11 16:05:00.030286 -> Packet Count: 50
BPDU Accounting Dictionary Size: 0

2014-12-11 16:05:00.402444 -> Packet Count: 100
BPDU Accounting Dictionary Size: 0

2014-12-11 16:05:00.996808 -> Packet Count: 150
BPDU Accounting Dictionary Size: 0

2014-12-11 16:05:01.793828 -> Packet Count: 200
BPDU Accounting Dictionary Size: 0

2014-12-11 16:05:02.814992 -> Packet Count: 250
BPDU Accounting Dictionary Size: 0

2014-12-11 16:05:04.133944 -> Packet Count: 300
BPDU Accounting Dictionary Size: 0

2014-12-11 16:05:05.571902 -> Packet Count: 350
BPDU Accounting Dictionary Size: 0

2014-12-11 16:05:07.212769 -> Packet Count: 400
BPDU Accounting Dictionary Size: 0

2014-12-11 16:05:09.178611 -> Packet Count: 450
BPDU Accounting Dictionary Size: 0

2014-12-11 16:05:11.315323 -> Packet Count: 500
BPDU Accounting Dictionary Size: 0

2014-12-11 16:05:13.638662 -> Packet Count: 550
BPDU Accounting Dictionary Size: 0
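
The getStpData callback itself isn't shown above; purely to illustrate the apply_on_packets pattern being used, a callback of roughly that shape might look like the sketch below (the MAC-keyed accounting and the 50-packet progress interval are assumptions, not taken from the original script).

    import datetime

    # Hypothetical reconstruction; the real getStpData is not shown in the issue.
    bpdu_accounting = {}
    packet_count = 0

    def getStpData(packet):
        global packet_count
        packet_count += 1
        try:
            # Key the accounting dict on the sending MAC address
            # (an illustrative choice; the original keying scheme is unknown).
            src = packet.eth.src
            bpdu_accounting[src] = bpdu_accounting.get(src, 0) + 1
        except AttributeError:
            pass  # packet without an Ethernet layer
        if packet_count % 50 == 0:
            print("%s -> Packet Count: %d" % (datetime.datetime.now(), packet_count))
            print("BPDU Accounting Dictionary Size: %d" % len(bpdu_accounting))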

@chrconlo
Author

Not sure if you've had a chance to look at this, but from a look through the code and some debugging it looks like the slowdown is in asyncio; more specifically, the stream reader below. Are you seeing the same? Thanks

new_data = yield From(stream.read(self.DEFAULT_BATCH_SIZE))

@chrconlo
Author

I've backed down to using an older rev of pyshark (0.2.6), prior to asyncio, and see vastly better performance, although some other things seem broken, like frame_info, the display filter on capture, and getting certain field matches to work. I was able to process 420K packets in under 8 minutes. Thanks

-- Packet Processing Progress Report --
Runtime: 0:07:21.040105 -> Packet Count: [420000]
Processing Loop Time (Single Packet): 0:00:00.001523
BPDU Accounting Dictionary Size: [6463] entries

@KimiNewt
Owner

Yes, it seems asyncio severely reduced performance (on Unix anyway; oddly, I'm seeing good performance on Windows). I'll try optimizing it over the weekend, hopefully, and if push comes to shove I'll remove asyncio. Conceptually it should not have lower performance, but we'll see.

@KimiNewt
Owner

I've (probably) isolated the problem to: https://github.com/KimiNewt/pyshark/blob/master/src/pyshark/capture/capture.py#L147
What seems to be happening is that a large amount of data (tshark XML) sits in the subprocess stdout pipe. We read it one packet at a time, and the buffered XML grows larger and larger as time goes on.
That line copies what might be a very large string; it took ~40ms on a large cap file I tried.

The solution is probably to extract ALL the packets at once from the data received, instead of one at a time (we can't use lxml for this, as it does not support parsing partial XMLs). I'll try to find a solution for that.
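
A minimal sketch of that idea (not pyshark's actual code, and the helper name is made up): scan the buffered PDML once, slice out every complete <packet>...</packet> element, and keep only the unparsed tail, so the large buffer is copied once per read instead of once per packet.

    def extract_complete_packets(buffer, end_tag=b"</packet>"):
        """Return (list_of_complete_packet_xml, remaining_buffer) in one pass."""
        packets = []
        pos = 0
        while True:
            end = buffer.find(end_tag, pos)
            if end == -1:
                break
            end += len(end_tag)
            packets.append(buffer[pos:end])
            pos = end
        return packets, buffer[pos:]

    # Two complete packets come out; the partial third stays buffered for the next read.
    done, rest = extract_complete_packets(b"<packet>a</packet><packet>b</packet><pack")
    print(len(done), rest)  # 2 b'<pack'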

@chrconlo
Author

chrconlo commented Jan 8, 2015

Interesting. Any luck finding a solution for this? Thanks

KimiNewt closed this as completed May 9, 2015
@KimiNewt
Owner

KimiNewt commented May 9, 2015

Fixed by PR #66
