Loadgen not working as expected #2199

MonsieurNicolas · 2019-07-16T17:42:55Z

The built-in load generator used for testing has a few problems.

There is a design flaw: after transactions are submitted, loadgen moves to a completely different phase, waitTillComplete, which assumes that the transactions were submitted successfully and just need to be processed by the network:

A big problem is that if nothing was actually submitted (which can happen, see below), that phase still succeeds.
If some transactions are dropped, there is no way for loadgen to recover, and an observer of loadgen has no way to quantify how many transactions were dropped.
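To make both problems concrete, here is a hypothetical sketch (not stellar-core code; every name is illustrative): a wait phase that only checks for pending transactions passes vacuously when nothing was submitted, whereas counters kept across the whole run can both gate completion and report how many transactions were dropped. How `applied` and `pending` would be measured on a real node is left open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: a wait phase that only asks "is anything still pending?"
// succeeds immediately when nothing was submitted, and cannot tell
// "applied" apart from "dropped".
bool naiveWaitDone(std::uint64_t pendingInQueue) { return pendingInQueue == 0; }

// Hypothetical progress counters kept across the whole run.
struct LoadProgress {
    std::uint64_t target;     // transactions the run is supposed to produce
    std::uint64_t submitted;  // transactions handed to the node's queue
    std::uint64_t applied;    // transactions observed in closed ledgers
    std::uint64_t pending;    // transactions still waiting in the queue

    // Complete only when the work is actually done.
    bool done() const { return applied >= target; }

    // Whatever was submitted but is neither applied nor pending was dropped.
    std::uint64_t dropped() const {
        const std::uint64_t accountedFor = applied + pending;
        return submitted > accountedFor ? submitted - accountedFor : 0;
    }
};
```

With `LoadProgress{100, 0, 0, 0}` the naive check would already pass, while `done()` correctly stays false.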

A few more details:

In general, it doesn't handle transactions that get dropped by the validator's queue for whatever reason. The existing logic that "just retries" is too simplistic to recover.
It doesn't handle ADD_STATUS_TRY_AGAIN_LATER (banned transactions). I suspect this should be handled the same way as the first point (dropped transactions) anyway.
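The per-submission handling these points ask for can be sketched as a mapping from the queue's add result to the next action for the source account. Only ADD_STATUS_TRY_AGAIN_LATER is named in this issue; the enum itself and its other values are assumptions standing in for the real queue result type, and treating a ban like any other drop follows the suggestion above rather than settled behavior.

```cpp
#include <cassert>

// Hypothetical mirror of the queue results discussed in this issue; the
// actual enum in stellar-core may have different names or members.
enum class AddResult {
    ADD_STATUS_PENDING,
    ADD_STATUS_DUPLICATE,
    ADD_STATUS_ERROR,
    ADD_STATUS_TRY_AGAIN_LATER
};

// What loadgen does next with the source account of the submitted tx.
enum class NextAction {
    Advance,       // accepted: bump the local sequence number
    Skip,          // already in the queue: move on to another account
    ResyncAccount  // rejected or banned: reload from the ledger, retry later
};

// Assumption: a banned tx (TRY_AGAIN_LATER) is treated like any other drop,
// i.e. the account is resynchronized and revisited later.
NextAction classify(AddResult r) {
    switch (r) {
    case AddResult::ADD_STATUS_PENDING:
        return NextAction::Advance;
    case AddResult::ADD_STATUS_DUPLICATE:
        return NextAction::Skip;
    case AddResult::ADD_STATUS_ERROR:
    case AddResult::ADD_STATUS_TRY_AGAIN_LATER:
        return NextAction::ResyncAccount;
    }
    assert(false && "unknown AddResult");
    return NextAction::ResyncAccount;
}
```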

Here are my recommendations:

The load generator should only succeed when the work is complete. An observer (such as test automation) then only needs to wait for loadgen.run.complete to be set, without having to know how to check for completion. We leave the option of "timing out" to the observer (so "stopping" loadgen should work in this situation).
We already have logic that uses the list of all accounts in the simulation to sign transactions and generate proper sequence numbers; we can expand on this into a process that guarantees progress instead.
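From the observer's side, this recommendation reduces to polling a single signal against the observer's own deadline. A minimal sketch, assuming `loadgen.run.complete` behaves as a counter that increases once a run finishes; `readComplete` (however the metric is fetched) and `stopLoadgen` (however the observer stops a run) are hypothetical callbacks, not stellar-core APIs.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

enum class WaitOutcome { Complete, TimedOut };

// Observer-side sketch. readComplete: hypothetical callback returning the
// current value of the loadgen.run.complete metric. stopLoadgen: hypothetical
// hook invoked when the observer gives up.
WaitOutcome waitForLoadgen(const std::function<std::uint64_t()>& readComplete,
                           std::uint64_t valueBeforeRun,
                           std::chrono::milliseconds timeout,
                           std::chrono::milliseconds pollInterval,
                           const std::function<void()>& stopLoadgen) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        // Loadgen only bumps the metric once the work is really done, so the
        // observer does not need to know how completion is checked.
        if (readComplete() > valueBeforeRun) {
            return WaitOutcome::Complete;
        }
        std::this_thread::sleep_for(pollInterval);
    }
    // Timing out is the observer's decision; it can then stop loadgen.
    stopLoadgen();
    return WaitOutcome::TimedOut;
}
```

Keeping the deadline on the observer's side means loadgen itself never has to guess how long "too long" is.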

Potential updated way of generating load:

1. Start simulation: set up the world
a. Synchronize all source accounts used during the entire simulation run (mAccounts).
b. Compute expected, the list of <simulation_account_ID=uint64, sequence_number> pairs giving the sequence number each source account should have at the end of the simulation. For "create", there is a single source account, so something like (rounding) current_seq_num + nbAccount/batchSize; for payments, we can assume round robin over the set of mAccounts, so something like current_seq_num + nTx/nAccounts.
c. Initialize the lists of accounts done and backlog to empty.
2. Load generation step (inject txPerStep transactions); the loop is something like:
a. Generate the backlog if needed (i.e., the backlog is empty):
i. Iterate over expected: remove accounts that already have the right sequence number; otherwise add them to backlog.
ii. If expected is empty, the simulation is "done".
iii. Shuffle backlog.
b. Pick an account from backlog, then generate and submit one transaction for that account.
i. The logic we need here is similar to what we have right now: duplicate -> skip (i.e., pick the next account from backlog); on error -> synchronize the account.
ii. NB: generation has to be fully deterministic based on the source account and sequence number (this is already the case).
iii. We have to break if no progress was made even after rebuilding the backlog (this can happen with create if the account gets banned).
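The loop in the steps above can be sketched in C++ (the language of stellar-core). This is a minimal, hypothetical sketch, not the loadgen implementation: `NodeHooks`, `LoadGenSketch`, `loadSeq` and `submit` are illustrative names, the node is abstracted behind two callbacks, and "progress" is assumed to mean that at least one transaction in the previous pass over the backlog was accepted or was already queued (a duplicate).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <random>
#include <utility>
#include <vector>

using AccountID = std::uint64_t;  // simulation account index
using SeqNum = std::int64_t;

enum class AddResult { ADD_STATUS_PENDING, ADD_STATUS_DUPLICATE, ADD_STATUS_ERROR, ADD_STATUS_TRY_AGAIN_LATER };

// Hypothetical hooks standing in for the node. submit(account, seq) must build the
// tx deterministically from (account, seq), so resubmitting it yields a duplicate.
struct NodeHooks {
    std::function<SeqNum(AccountID)> loadSeq;            // synchronize one account
    std::function<AddResult(AccountID, SeqNum)> submit;  // submit the tx with this seq
};

enum class StepResult { Running, Done, NoProgress };

class LoadGenSketch {
  public:
    LoadGenSketch(NodeHooks node, std::map<AccountID, SeqNum> expected, std::uint32_t seed)
        : mNode(std::move(node)), mExpected(std::move(expected)), mRng(seed) {}

    // One load generation step: inject up to txPerStep transactions.
    StepResult step(std::size_t txPerStep) {
        for (std::size_t i = 0; i < txPerStep; ++i) {
            if (mBacklog.empty()) {
                if (!rebuildBacklog()) {
                    return StepResult::Done;  // every account reached its target
                }
                // The whole previous pass moved nothing forward (e.g. the
                // create source account is banned): stop instead of spinning.
                if (mHadPass && mProgressThisPass == 0) {
                    return StepResult::NoProgress;
                }
                mHadPass = true;
                mProgressThisPass = 0;
            }
            const AccountID acc = mBacklog.back();
            mBacklog.pop_back();
            const SeqNum next = mSeq[acc] + 1;
            switch (mNode.submit(acc, next)) {
            case AddResult::ADD_STATUS_PENDING:
            case AddResult::ADD_STATUS_DUPLICATE:  // already queued: count it, move on
                mSeq[acc] = next;
                ++mProgressThisPass;
                break;
            default:  // error or banned: synchronize the account
                mSeq[acc] = mNode.loadSeq(acc);
                break;
            }
        }
        return StepResult::Running;
    }

    std::size_t doneCount() const { return mDone.size(); }

  private:
    // Steps a.i to a.iii: move finished accounts to mDone, queue the rest, shuffle.
    // Returns false once expected is empty, i.e. the simulation is done.
    bool rebuildBacklog() {
        for (auto it = mExpected.begin(); it != mExpected.end();) {
            const SeqNum cur = mNode.loadSeq(it->first);
            mSeq[it->first] = cur;
            if (cur >= it->second) {
                mDone.push_back(it->first);
                it = mExpected.erase(it);
            } else {
                mBacklog.push_back(it->first);
                ++it;
            }
        }
        std::shuffle(mBacklog.begin(), mBacklog.end(), mRng);
        return !mExpected.empty();
    }

    NodeHooks mNode;
    std::map<AccountID, SeqNum> mExpected;  // account -> sequence number at the end
    std::map<AccountID, SeqNum> mSeq;       // local view of current sequence numbers
    std::vector<AccountID> mBacklog;
    std::vector<AccountID> mDone;
    std::mt19937 mRng;
    bool mHadPass = false;
    std::size_t mProgressThisPass = 0;
};
```

In a test, `loadSeq` and `submit` can be backed by an in-memory map of sequence numbers; the caller keeps calling `step(txPerStep)` until it returns `Done` or `NoProgress`. Because every rebuild reloads sequence numbers from the node, transactions that were dropped between steps are simply regenerated on the next pass.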

This may generate a few extra transactions at the end, but that should not really matter (the validator would reject them as duplicates). We also get "retries" for free between steps.
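Step b of the setup phase leaves the rounding open. One way to compute expected is sketched below under two assumptions: create transactions come from a single source account and each creates up to batchSize accounts (so the count is rounded up), and payments are assigned round robin over the source accounts in ID order, with the first nTx % nAccounts accounts taking one extra transaction so the shares add up to exactly nTx. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using AccountID = std::uint64_t;  // simulation account index
using SeqNum = std::int64_t;

// "create": a single source account submits ceil(nbAccounts / batchSize)
// transactions, each assumed to create up to batchSize accounts.
SeqNum expectedAfterCreate(SeqNum currentSeq, std::uint64_t nbAccounts,
                           std::uint64_t batchSize) {
    assert(batchSize > 0);
    return currentSeq +
           static_cast<SeqNum>((nbAccounts + batchSize - 1) / batchSize);
}

// payments: nTx transactions assigned round robin over the source accounts
// in `current` (visited in ID order); the first nTx % n accounts get one extra.
std::map<AccountID, SeqNum>
expectedAfterPayments(const std::map<AccountID, SeqNum>& current,
                      std::uint64_t nTx) {
    const std::uint64_t n = current.size();
    assert(n > 0);
    std::map<AccountID, SeqNum> expected;
    std::uint64_t index = 0;
    for (const auto& entry : current) {
        const std::uint64_t share = nTx / n + (index < nTx % n ? 1 : 0);
        expected[entry.first] = entry.second + static_cast<SeqNum>(share);
        ++index;
    }
    return expected;
}
```

For example, 7 payments over three accounts give shares of 3, 2 and 2.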


marta-lokhova · 2019-08-07T21:09:40Z

@MonsieurNicolas is it valid to expect that transactions are banned only when the system is overloaded (i.e., too many txs are submitted and stellar-core can't keep up)? Note that I don't mean banned/dropped due to being invalid, since that just points to a bug in loadgen. If so, would it be simpler to just mark loadgen as "failed" and let the user decide what to do, instead of trying to recover?
In the case of acceptance tests, we should not expect it to fail (though it should retry anyway), since we're generating a tiny amount of load at a very slow tx rate. In the case of benchmarking, I wouldn't want loadgen to recover but rather to fail, so we can reason about the point in time when the system became overloaded under the stress test without loadgen intervening.

MonsieurNicolas · 2019-08-07T23:13:41Z

So, fail loadgen fast if the system ends up banning transactions? I think this could work in this context.
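The direction discussed here can be sketched as a per-run policy: when failing fast (benchmarking), the first banned transaction fails the run and freezes the counters at the point the node became overloaded; otherwise (acceptance tests), bans are only counted so the caller can still retry. This is a hypothetical illustration with assumed names (`LoadgenRun`, `onResult`, the enum values), not the change made in #2220.

```cpp
#include <cassert>
#include <cstddef>

enum class AddResult { ADD_STATUS_PENDING, ADD_STATUS_DUPLICATE, ADD_STATUS_ERROR, ADD_STATUS_TRY_AGAIN_LATER };
enum class RunState { Running, Failed };

// Hypothetical per-run policy. failFast = true for benchmarking (the first ban
// fails the run), false for acceptance tests (bans are counted, caller retries).
class LoadgenRun {
  public:
    explicit LoadgenRun(bool failFast) : mFailFast(failFast) {}

    void onResult(AddResult r) {
        if (mState == RunState::Failed) {
            return;  // counters stay frozen at the moment of failure
        }
        if (r == AddResult::ADD_STATUS_TRY_AGAIN_LATER) {
            ++mBanned;
            if (mFailFast) {
                mState = RunState::Failed;
            }
        } else if (r == AddResult::ADD_STATUS_PENDING) {
            ++mAccepted;
        }
    }

    RunState state() const { return mState; }
    std::size_t accepted() const { return mAccepted; }
    std::size_t banned() const { return mBanned; }

  private:
    bool mFailFast;
    RunState mState = RunState::Running;
    std::size_t mAccepted = 0;
    std::size_t mBanned = 0;
};
```

Freezing the counters on failure is what lets a benchmark report how much load the node accepted before it started banning transactions.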

MonsieurNicolas added the bug label Jul 16, 2019

MonsieurNicolas added this to To do in v11.4.0 via automation Jul 16, 2019

marta-lokhova self-assigned this Aug 7, 2019

marta-lokhova mentioned this issue Aug 11, 2019

Fail loadgen when load is too high, remove Simulation dependency #2220

Merged

MonsieurNicolas moved this from To do to In progress in v11.4.0 Aug 13, 2019

latobarita closed this as completed in #2220 Aug 14, 2019

v11.4.0 automation moved this from In progress to Done Aug 14, 2019
