[OPEN] Test dumping performances #3126
Conversation
madt1m added some commits on May 3, 2018
Congratulations! It is a great PR! I wish my PRs were like this one :D
I looked through all the addons and they follow the second paragraph of the PEP 8 import recommendations:
So, it would be nice to have:
Personally, I like it when imports are ordered by increasing length:
But it is absolutely up to you, I think.
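For reference, a minimal illustration of the convention being discussed: PEP 8's grouping of imports (standard library, third-party, local, separated by blank lines), with each group ordered by increasing line length as suggested above. The module names here are just placeholders, not the PR's actual imports:

```python
# Standard library imports first.
import time
import sqlite3

# Third-party imports next.
from google.protobuf import message

# Local (mitmproxy) imports last; within each group, lines sorted by length.
from mitmproxy import ctx
from mitmproxy import http
```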
Oh yeah, those
madt1m added some commits on May 15, 2018
This looks like an excellent start. Eventually, I would like to mature the serialisation benchmarks in
Working on it. Will push in the next day(s).
madt1m added some commits on May 17, 2018
Ok so, new chapter in the saga. Basically, I have stretched the dumping flow all the way to disk, so as to have a more meaningful benchmark of the process. What I have implemented in the past two days is mostly focused on the new DummySession. This is quite dummy, actually:
As @cortesi was guessing, writing to the DB has quite an impact on performance: ANYWAY, I am quite convinced that performance here can be improved a bit. I suppose that the major hit on perf time is due to me using
So, for me, it's time to study how to cut down those numbers in the screenshots. @mhils @Kriechi join the pit! I think I need some uber-pythonic advice here!
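To make the discussion concrete, here is a rough, hypothetical sketch of what a "dummy session" benchmark like the one described above can look like: it just commits already-serialized flow blobs to an on-disk SQLite database and times the transaction. The class and table names are made up for illustration and are not the code from this PR:

```python
import time
import sqlite3


class DummySession:
    """Hypothetical sketch: persist serialized flow blobs and time the commit."""

    def __init__(self, path="benchmark.sqlite"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS flows (id INTEGER PRIMARY KEY, blob BLOB)"
        )

    def store(self, blobs):
        # Time a single transaction containing all the blobs.
        start = time.perf_counter()
        with self.conn:  # commits on exit
            self.conn.executemany(
                "INSERT INTO flows (blob) VALUES (?)", ((b,) for b in blobs)
            )
        return time.perf_counter() - start
```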
madt1m added some commits on May 19, 2018
kajojify reviewed on May 19, 2018
""" | ||
if self.f: | ||
for s in self.serializers: | ||
ctx.log('{} module: '.format(s)) |
kajojify (Contributor) commented on May 19, 2018
You used f-strings in all other places. Do you want to use an f-string here as well? :)
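For reference, the f-string form of the line under review would be:

```python
ctx.log(f'{s} module: ')
```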
Great work. This is all very interesting. A few stream-of-consciousness thoughts here:

**Commit unit**
In practice, flows will be put on a queue for serialisation. At any one point in time, we will be able to inspect that queue, batch together all the flows that are currently there, and commit them as a single transaction. With asyncio queues, we can efficiently wait on new items in the queue without polling. We should explore this as an optimisation, and adjust the performance tests to be throughput tests - i.e. we add N flows per second to the queue in a constant stream, and see how long they take to write to disk. Ideally, we'd comfortably handle about N=150, which is our performance limit on a current reasonable system. This strategy will also let us handle spikes reasonably.

**Hot and cold flows**
It occurs to me that we might end up with a system where we have hot flows, which live in memory and haven't yet been shed to disk, and cold flows that have already been saved. When the user iterates through flows in one of the tools, the hot store is checked first, and then the cold store. We should keep an eye on whether this is necessary as we develop the state addon.

**Separate flow bodies**
I think we'll end up storing flow bodies separately from the flows themselves. There are many situations in which we only need to retrieve the flow metadata (i.e. the flow index in mitmproxy console, searches that don't include content bodies, etc.). We should think about including this in the tests - flow bodies might live in a separate large blob storage table to optimize access patterns, or just in a separate table column.
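A minimal sketch of the queue-batching consumer described above, assuming an `asyncio.Queue` of flows and a `commit(batch)` callable that writes one transaction. The names are illustrative, not mitmproxy's actual API:

```python
import asyncio


async def writer(queue: asyncio.Queue, commit):
    """Drain whatever is currently queued and commit it as a single batch."""
    while True:
        # Wait (without polling) until at least one flow arrives...
        batch = [await queue.get()]
        # ...then grab everything else that is already sitting in the queue.
        while not queue.empty():
            batch.append(queue.get_nowait())
        commit(batch)  # one transaction per batch
```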
@cortesi thanks for the notes. They inspired some really useful thoughts on the task. For throughput testing purposes, do you think I should let the addon generate a constant stream of flows? Or instead, just request a great number of pages and intercept those flows?
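If a synthetic constant-rate stream is acceptable, a producer for the N flows/second test described above is easy to sketch. This assumes mitmproxy's test helper `mitmproxy.test.tflow.tflow()` is fine to use for benchmarking; it is a sketch, not the addon's actual code:

```python
import asyncio

from mitmproxy.test import tflow  # test helper that builds a dummy HTTP flow


async def produce(queue: asyncio.Queue, rate: int = 150, seconds: int = 10):
    """Push `rate` synthetic flows per second onto the queue for `seconds` seconds."""
    for _ in range(seconds):
        for _ in range(rate):
            await queue.put(tflow.tflow())
        await asyncio.sleep(1)
```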
madt1m added some commits on May 28, 2018
So, some thoughts on the current scenario. First, I realized that this whole testing process is not self-contained. I am implicitly using these addons as a testing ground to actually build the basic blocks of the next-up serializer. So, I've extended the test to the SQLite module, adding some asyncio spoons:
Note that SQLite is NOT asynchronous yet, so I feel I'm only scratching the surface. I'm reading some sweet SQLite docs on performance optimizations. Next steps:
@mitmproxy/devs any thoughts on the code?
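Since the standard `sqlite3` module is blocking, one interim option (a sketch under assumed table and file names, not necessarily what the final serializer will do) is to push the commit off the event loop with `run_in_executor`, and enable WAL mode to cheapen commits:

```python
import asyncio
import sqlite3

# check_same_thread=False lets the executor's worker thread use this connection.
conn = sqlite3.connect("benchmark.sqlite", check_same_thread=False)
conn.execute("CREATE TABLE IF NOT EXISTS flows (id INTEGER PRIMARY KEY, blob BLOB)")
conn.execute("PRAGMA journal_mode=WAL")  # cheaper commits than the default rollback journal


def commit(blobs):
    # Runs in a worker thread, so the event loop is never blocked by disk I/O.
    with conn:
        conn.executemany("INSERT INTO flows (blob) VALUES (?)", ((b,) for b in blobs))


async def store(blobs):
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, commit, blobs)
```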
Hi there! I think we should get the streaming performance measurement ready to merge so we can move to the next step. Measurements on my system roughly match yours. Expressed in flows saved per second (f/s), I see about 2k f/s for a 160 KB file, and about 50 f/s for a 5 MB file. This is within the acceptable range for ordinary use, and we should be able to improve that even more as we focus on performance down the track. A few comments:
I'll take care of this.
It only served the purpose of testing. By the way, I am already working on a more mature system, and this quite experimental testing addon will reflect the evolving changes.
kajojify added the gsoc label on Jun 8, 2018
Closing this - replicated, clean work in #3256.
madt1m commented on May 14, 2018
First GSoC PR here! Good day :)
So, this should set the foundations for meaningfully testing Python protobuf implementation performance. The results are encouraging, as a GET of a 4 MB PDF yields the following:

As documented in the README, some things have yet to be implemented, such as a way to loop dumps and average the results, or to write blobs (and time the process) to a SQLite DB.
Mind that this (in particular `dummyhttp_pb2`) requires the `protobuf` package to work. To test the addon, I have just put the `google` package produced by `python setup.py build` from https://github.com/google/protobuf/tree/master/python into my serialization directory.
Request for comments here! 😄
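For the "loop dumps and average the results" item mentioned above, a straightforward hypothetical shape could be the following. It assumes some `serialize(flow)` callable returning bytes (e.g. a protobuf `SerializeToString` wrapper); the function name and signature are illustrative only:

```python
import time


def average_dump_time(flow, serialize, rounds: int = 100) -> float:
    """Serialize the same flow `rounds` times and return the mean seconds per dump."""
    start = time.perf_counter()
    for _ in range(rounds):
        serialize(flow)
    return (time.perf_counter() - start) / rounds
```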