general remarks about Loads architecture / next steps #262
Comments
This sounds like a good architecture to me - as long as the db can keep up! I guess you already have a UUID for each test run, so conceptually it's just having the agents insert individual results under this key. Would you still use zmq to push results into the db process?
Yes, each result is unique, so we won't have any conflicts. I guess DynamoDB could work there.
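Purely as an illustration (none of these names exist in Loads, and the storage call is a placeholder), a result record keyed by the run UUID could look like this - a fresh UUID per result means concurrent inserts from many agents never conflict:

```python
import time
import uuid

RUN_ID = str(uuid.uuid4())  # one UUID per test run, shared by all agents

def make_result(agent_id, status, extra=None):
    """Build one result record; a fresh result UUID guarantees no key conflicts."""
    return {
        "run_id": RUN_ID,                # key for the whole test run
        "result_id": str(uuid.uuid4()),  # unique per result, so concurrent inserts never clash
        "agent_id": agent_id,
        "timestamp": time.time(),
        "status": status,                # e.g. "success", "timeout", "error"
        "extra": extra or {},
    }

# db.put(make_result("agent-3", "success"))  # `db.put` is a placeholder for whatever store we pick
```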
I would keep zmq for all the client/broker/agent communication, but would use a pure TCP client to send the data to the DB - see #263 for the new results publication flow.
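Not the actual #263 flow, just a rough sketch of what a "pure TCP client" on the agent side could mean (host, port and class name are made up): newline-delimited JSON pushed straight to the DB process.

```python
import json
import socket

class ResultSender(object):
    """Plain TCP client: the agent pushes newline-delimited JSON results to the DB process."""

    def __init__(self, host="127.0.0.1", port=7788):
        self.sock = socket.create_connection((host, port))

    def send(self, result):
        # one JSON document per line; the DB process parses and stores each line
        self.sock.sendall(json.dumps(result).encode("utf-8") + b"\n")

    def close(self):
        self.sock.close()

# sender = ResultSender()
# sender.send({"run_id": "...", "status": "success"})
```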
The reasoning sounds good to me. In particular, I think one key point here is that we don't really need to store a lot of duplicated data. For instance, we could just store an incremental count of successes, plus the different errors, maybe storing when the first error occurred and when the last one did. In a discussion we had with Tarek, I think I understood the goal was to have the data aggregated by the "test" program itself before sending it to stdout. Now that I think of it, I would do this aggregation in the agent code rather than in the test program, in order to keep the test program protocol really simple. Otherwise, looks good to me!
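A rough sketch of the kind of agent-side aggregate I have in mind (illustrative names only, nothing here is in Loads today): an incremental success counter plus, per error kind, a count and first/last timestamps.

```python
import time
from collections import defaultdict

class RunAggregate(object):
    """Agent-side aggregation: counters instead of one stored row per hit."""

    def __init__(self):
        self.successes = 0
        self.errors = defaultdict(lambda: {"count": 0, "first": None, "last": None})

    def add_success(self):
        self.successes += 1

    def add_error(self, kind):
        now = time.time()
        entry = self.errors[kind]
        if entry["first"] is None:
            entry["first"] = now   # when this error was first seen
        entry["last"] = now        # ...and most recently seen
        entry["count"] += 1
```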
The problem here is that for very intensive load testing you will probably bust the stdin pipe: the buffer size is limited, and it only gets emptied as fast as the agent can dequeue data to send to the database. Once you've reached the max size everything gets blocked and we're in trouble. If it's well documented I don't think it's that hard, imho.
Oh, that's right, it's not ultra complicated; it's just some complexity that I think should be avoided if possible: less to do for the implementers means more implementations! Isn't there any way to tweak this max size of the pipe (can't we use all the free RAM)?
tl;dr: if the agent can't keep up the pace we're asking for trouble, because we may run day-long tests. Let's imagine a Go program that sends several thousand results per second for 24 hours. Even if we use the RAM, if the Python agent can't keep up the pace, the queue will grow and eventually eat all the RAM. We will also be unable to provide near-live feedback on the test, and every report will start to lag like hell. The other problem is that we will end up spending CPU on the agent's queue work instead of leaving as much CPU as possible for the load program. Asking the program to aggregate per second is "free".
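To illustrate why per-second aggregation is basically free for the load program (shown in Python for brevity even though the example above talks about a Go program; `do_one_request` is a placeholder): the program only keeps one small bucket in memory and writes a single JSON line to stdout per second.

```python
import json
import sys
import time

def run_load(do_one_request):
    """Toy load loop: aggregate per second, emit one JSON line per second on stdout."""
    bucket = {"success": 0, "errors": {}}
    next_flush = time.time() + 1.0
    while True:
        ok, error_kind = do_one_request()
        if ok:
            bucket["success"] += 1
        else:
            bucket["errors"][error_kind] = bucket["errors"].get(error_kind, 0) + 1
        now = time.time()
        if now >= next_flush:
            sys.stdout.write(json.dumps(bucket) + "\n")
            sys.stdout.flush()
            bucket = {"success": 0, "errors": {}}
            next_flush = now + 1.0
```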
Gotcha. That works for me.
brain dump -- would love some feedback @ametaireau @Natim @jbonacci @rfk @bbangert
So, after a few months on this, I realize it's a lot of work to maintain a consistent cluster where agents are spread across several boxes and a broker is linked to them.
The main issue is that once some load tests are running, all the results are sent in real time to the broker via a chain of zeromq publisher sockets.
That leads to 2 problems:
I think a much more robust system would be to drop the PUB/SUB system for results and use a shared database. We'd let the database system deal with all the network partitioning issues, and the broker would simply drive the agents to run the tests.
In case the broker can't reach an agent, well, the agent is on its own - working on the test and reporting back to the DB. Our web dashboard can then just do DB queries to display results - like it does now, but without asking the broker anymore (right now the broker provides APIs to query/fill the DB and to run tests).
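For example, the dashboard side could boil down to a plain query against the shared store - a toy sketch assuming a hypothetical `results(run_id, payload)` table, the same shape as in the sink sketch further down:

```python
import json
import sqlite3

def run_results(run_id, db_path="results.db"):
    """Dashboard-side sketch: read results straight from the shared DB, no broker involved."""
    db = sqlite3.connect(db_path)
    rows = db.execute("SELECT payload FROM results WHERE run_id = ?", (run_id,))
    return [json.loads(payload) for (payload,) in rows]
```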
That would separate the concerns:
I am not sure what database system we want yet. Step 1 could be to extract everything related to the DB from the broker and run it as its own process, then change the agents so they talk to that process when results need to be published.
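A very rough sketch of what that standalone DB process could look like (sqlite and port 7788 are placeholders, not a decision - the point is just that agents talk to this process directly instead of publishing results through the broker):

```python
import json
import socketserver  # `SocketServer` on Python 2
import sqlite3

class ResultSink(socketserver.StreamRequestHandler):
    """Standalone DB process sketch: receives JSON lines from agents and stores them."""

    def handle(self):
        db = sqlite3.connect("results.db")
        db.execute("CREATE TABLE IF NOT EXISTS results (run_id TEXT, payload TEXT)")
        for line in self.rfile:
            result = json.loads(line.decode("utf-8"))
            db.execute("INSERT INTO results VALUES (?, ?)",
                       (result.get("run_id"), json.dumps(result)))
            db.commit()
        db.close()

if __name__ == "__main__":
    # agents connect here instead of going through the broker
    socketserver.TCPServer(("0.0.0.0", 7788), ResultSink).serve_forever()
```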