[OPEN] Test dumping performances #3126
First GSoC PR here! Good day :)
As documented in the README, some things still have to be implemented, such as a way to loop dumps and average the results, or a way to write blobs to a SQLite DB (and time the process).
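The loop-and-average idea could look roughly like the sketch below. The `benchmark_dump` helper and its arguments are hypothetical placeholders, not the addon's actual code:

```python
import pickle
import statistics
import time

def benchmark_dump(dump, blob, rounds=10):
    """Time `dump(blob)` over several rounds and return the average.

    `dump` and `blob` stand in for whatever serialisation step is
    being measured; this is only a timing-loop sketch.
    """
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        dump(blob)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)

# e.g. average the cost of pickling a ~100 KB payload over 5 runs
avg = benchmark_dump(pickle.dumps, b"x" * 100_000, rounds=5)
```

Averaging over several rounds smooths out one-off hiccups (disk caches warming up, GC pauses) that would skew a single measurement.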
Mind that this (in particular
Request for comments here!
Congratulations! It is a great PR! I wish my PRs were like this one :D
I looked through all the addons, and they follow the second paragraph of PEP 8's import recommendations:
So, it would be nice to have:
Personally, I like it when imports are sorted by increasing line length:
But it is absolutely up to you I think.
This looks like an excellent start. Eventually, I would like to mature the serialisation benchmarks in
Ok so, new chapter in the saga
Basically, I have extended the dumping flow all the way to disk, so as to get a more meaningful benchmark of the process. What I have implemented in the past two days is mostly focused on the new
This is quite rudimentary, actually:
As @cortesi guessed, writing to the DB has quite an impact on performance:
Anyway, I am quite convinced that performance here can be improved a bit. I suppose that the major hit on perf time is due to me using
So, for me, it's time to study how to cut those numbers in the screenshots
Great work. This is all very interesting. A few stream-of-consciousness thoughts here:
In practice, flows will be put on a queue for serialisation. At any one point in time, we will be able to inspect that queue, batch together all the flows that are currently there, and commit them as a single transaction. With asyncio queues, we can efficiently wait on new items in the queue without polling. We should explore this as an optimisation, and adjust the performance tests to be throughput tests - i.e. we add N flows per second to the queue in a constant stream, and see how long they take to write to disk. Ideally, we'd comfortably handle about N=150, which is our performance limit on a current reasonable system. This strategy will also let us handle spikes reasonably.
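The batching idea above could be sketched like this. The table name and sentinel-based shutdown are illustrative assumptions, not mitmproxy's actual schema:

```python
import asyncio
import sqlite3

async def batch_writer(queue: asyncio.Queue, conn: sqlite3.Connection) -> None:
    """Wait on the queue without polling; each time we wake up, grab
    everything currently queued and commit it as one transaction."""
    while True:
        batch = [await queue.get()]           # block until at least one item
        while not queue.empty():
            batch.append(queue.get_nowait())  # drain whatever else piled up
        stop = None in batch                  # None is our shutdown sentinel
        rows = [(b,) for b in batch if b is not None]
        if rows:
            with conn:                        # a single transaction per batch
                conn.executemany("INSERT INTO flows (blob) VALUES (?)", rows)
        if stop:
            return

async def main() -> int:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flows (blob BLOB)")
    queue: asyncio.Queue = asyncio.Queue()
    writer = asyncio.ensure_future(batch_writer(queue, conn))
    for _ in range(150):                      # simulate a stream of serialised flows
        await queue.put(b"flow-blob")
    await queue.put(None)
    await writer
    return conn.execute("SELECT COUNT(*) FROM flows").fetchone()[0]

count = asyncio.run(main())
print(count)  # 150
```

Because each commit covers a whole batch, a spike of incoming flows produces one large transaction instead of many small ones, which is exactly where SQLite's per-commit overhead hurts.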
Hot and cold flows
It occurs to me that we might end up with a system where we have hot flows, that live in-memory, and haven't yet been shed to disk, and cold flows that have already been saved. When the user iterates through flows in one of the tools, the hot store is checked first, and then the cold store. We should keep an eye on whether this is necessary as we develop the state addon.
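The hot/cold lookup order described above might be sketched as follows; the class and attribute names are hypothetical, not mitmproxy API:

```python
class TwoTierStore:
    """Illustrative two-tier lookup: 'hot' flows still in memory are
    checked first, then the 'cold' on-disk store."""

    def __init__(self, cold_store):
        self.hot = {}            # flow_id -> flow, not yet shed to disk
        self.cold = cold_store   # anything exposing .get(flow_id)

    def get(self, flow_id):
        # The hot store wins: it may hold a newer version than disk.
        if flow_id in self.hot:
            return self.hot[flow_id]
        return self.cold.get(flow_id)

store = TwoTierStore(cold_store={"f1": "cold flow"})
store.hot["f2"] = "hot flow"
print(store.get("f2"), store.get("f1"))  # hot flow cold flow
```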
Separate flow bodies
I think we'll end up storing flow bodies separately from the flows themselves. There are many situations in which we only need to retrieve the flow metadata (i.e. the flow index in mitmproxy console, search that doesn't include content bodies, etc.). We should think about including this in the tests - flow bodies might live in a separate large blob storage table to optimize access patterns, or just in a separate table column.
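A separate-bodies layout could look like the hypothetical schema below (table and column names are made up for illustration); the point is that metadata queries never have to read the large blobs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: metadata and bodies live in separate tables so
# that index/search queries never touch the big blobs.
conn.executescript("""
    CREATE TABLE flow_meta (
        id     INTEGER PRIMARY KEY,
        url    TEXT,
        status INTEGER
    );
    CREATE TABLE flow_body (
        flow_id INTEGER REFERENCES flow_meta(id),
        body    BLOB
    );
""")
with conn:
    conn.execute(
        "INSERT INTO flow_meta (id, url, status) VALUES (1, 'http://example.com', 200)"
    )
    conn.execute(
        "INSERT INTO flow_body (flow_id, body) VALUES (1, ?)", (b"x" * 10_000,)
    )

# A metadata query stays cheap: no body blob is read here.
row = conn.execute("SELECT url, status FROM flow_meta WHERE id = 1").fetchone()
print(row)  # ('http://example.com', 200)
```

The same separation could also be done with a second column in one table; a separate table just makes it harder for a careless query to drag the blobs along.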
@cortesi thanks for the notes. They sparked some really useful thoughts on the task. For throughput-testing purposes, do you think I should let the addon generate a constant stream of flows? Or should I instead just request a large number of pages and intercept those flows?
So, some thoughts on the current scenario.
First, I realized that this whole testing process is not a self-contained one. I am implicitly using these addons as a testing ground to actually build the basic blocks of the upcoming serializer.
So, I've extended the test to the SQLite module, sprinkling in some asyncio:
Note that SQLite is NOT asynchronous yet, so
I feel I'm only scratching the surface. I'm reading some sweet SQLite docs on performance optimizations.
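Since SQLite itself stays blocking, one common way to keep it from stalling the event loop is to push each write onto a thread-pool executor. A minimal sketch, assuming a `flows` table and one write in flight at a time:

```python
import asyncio
import sqlite3

def blocking_write(conn, blob):
    # The plain, synchronous sqlite3 call we want off the event loop.
    with conn:
        conn.execute("INSERT INTO flows (blob) VALUES (?)", (blob,))

async def write(conn, blob):
    # Hand the blocking call to the default thread-pool executor so the
    # event loop keeps running while SQLite touches the disk.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, blocking_write, conn, blob)

async def main():
    # check_same_thread=False lets executor threads use this connection;
    # we still only issue one write at a time by awaiting each one.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE flows (blob BLOB)")
    for _ in range(10):
        await write(conn, b"payload")
    return conn.execute("SELECT COUNT(*) FROM flows").fetchone()[0]

print(asyncio.run(main()))  # 10
```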
@mitmproxy/devs any thoughts on the code?
Hi there! I think we should get the streaming performance measurement ready to merge so we can move to the next step.
Measurements on my system roughly match yours. Expressed in flows saved per second (f/s), I see about 2k f/s for a 160 KB file, and about 50 f/s for a 5 MB file. This is within acceptable range for ordinary use, and we should be able to improve it even more as we focus on performance down the track.
A few comments:
I'll take care of this.
It only served the purpose of testing
By the way, I am already working on a more mature system, and this quite experimental testing addon will reflect the evolving changes.