The built-in load generator used for testing has a few problems.
There is a design flaw: after txs get submitted, it moves to a completely separate phase, waitTillComplete, which assumes that the transactions were submitted successfully and just need to be processed by the network:
- A big problem is that if nothing was actually submitted (which can happen, see below), that phase still succeeds.
- If some txs are dropped, there is no way for it to recover, and an observer of loadgen has no way of quantifying how many transactions were dropped.
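To make the gap concrete, here is a minimal sketch, assuming the observer can read each source account's current on-ledger sequence number and knows the sequence number each account should reach (the "expected" list proposed below). The helper names are hypothetical, not stellar-core APIs. Comparing the two yields a completion criterion that cannot pass vacuously and a count of the transactions still missing.

```python
from typing import Dict


def outstanding_txs(expected: Dict[int, int], current: Dict[int, int]) -> int:
    """Number of transactions that still have to be applied.

    expected: simulation account id -> sequence number the account must reach
    current:  simulation account id -> sequence number currently on the ledger
    """
    # An account at (or past) its target contributes nothing.
    return sum(max(0, target - current.get(acc, 0))
               for acc, target in expected.items())


def is_complete(expected: Dict[int, int], current: Dict[int, int]) -> bool:
    # Completion is defined by ledger state rather than by what was submitted,
    # so "nothing was submitted" can no longer look like success.
    return outstanding_txs(expected, current) == 0
```

For example, with targets `{0: 10, 1: 12}` and ledger state `{0: 10, 1: 9}`, three transactions are outstanding and the run is not complete.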
A few more details:
- In general, it doesn't handle transactions being dropped by the validator's queue, for whatever reason. The logic there that "just retries" is too simplistic to recover.
- It doesn't handle ADD_STATUS_TRY_AGAIN_LATER (banned transactions). I suspect this should be handled the same way we deal with dropped transactions (the first point) anyway.
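A sketch of the status handling these two points suggest. Only ADD_STATUS_TRY_AGAIN_LATER is named in this issue; the other enum members and the `Action`/`next_action` names are assumptions for illustration. The point is that a ban lands in the same bucket as any other transaction that did not make it.

```python
from enum import Enum, auto


class AddStatus(Enum):
    # Illustrative submission results (ADD_STATUS_ naming assumed).
    ADD_STATUS_PENDING = auto()
    ADD_STATUS_DUPLICATE = auto()
    ADD_STATUS_ERROR = auto()
    ADD_STATUS_TRY_AGAIN_LATER = auto()


class Action(Enum):
    WAIT = auto()            # accepted; wait for it to be applied
    NEXT_ACCOUNT = auto()    # already in flight; move on to another account
    RESYNC_ACCOUNT = auto()  # reload the sequence number, retry later


def next_action(status: AddStatus) -> Action:
    if status is AddStatus.ADD_STATUS_PENDING:
        return Action.WAIT
    if status is AddStatus.ADD_STATUS_DUPLICATE:
        return Action.NEXT_ACCOUNT
    # Errors and bans (TRY_AGAIN_LATER) both mean "this transaction did not
    # make it": resynchronize the account and let a later pass retry it.
    return Action.RESYNC_ACCOUNT
```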
Here are my recommendations:
- The load generator should only succeed when the work is complete. An observer (such as test automation) only needs to wait for loadgen.run.complete to be set, without having to know how to check for completion. We leave the option of "timing out" to the observer (so "stopping" loadgen should work in this situation).
- We already have logic that uses the list of all accounts used in the simulation to sign and generate proper sequence numbers; we can build on it to get a process that guarantees progress instead.
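On the observer's side, the first recommendation reduces to polling one counter against its own deadline. A minimal sketch, assuming a hypothetical `read_complete_count()` callable that returns the current value of the loadgen.run.complete metric (how it is fetched from the node is left out):

```python
import time
from typing import Callable


def wait_for_loadgen(read_complete_count: Callable[[], int],
                     baseline: int,
                     timeout_s: float,
                     poll_s: float = 1.0) -> bool:
    """Return True once the completion counter moves past `baseline`,
    or False if `timeout_s` elapses first (the observer then decides
    what to do, e.g. stop loadgen)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if read_complete_count() > baseline:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

The timeout lives entirely in the observer, so loadgen itself never has to guess how long the network needs.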
Potential updated way of generating load:
1. Start simulation: set up the world.
   a. Synchronize all source accounts used during the entire simulation run (mAccounts).
   b. Compute the list expected of <simulation_account_ID=uint64, sequence_number> pairs, holding the sequence number each source account should have at the end of the simulation. For "create", it's the single source account, at something like current_seq_num + nbAccount/batchSize (rounded up); for payments we can assume round robin over the set of mAccounts, so something like current_seq_num + nTx/nAccounts.
   c. Initialize the lists of accounts done and backlog to empty.
2. Load generation step (inject txPerStep transactions); the loop is something like:
   a. Generate backlog if needed (i.e., if backlog is empty):
      i. Iterate over expected: remove accounts that already have the right sequence number, otherwise add them to backlog.
      ii. If expected is empty, the simulation is "done".
      iii. Shuffle backlog.
   b. Pick an account from backlog, then generate and submit one transaction for that account:
      i. The logic we need here is something like what we have right now: duplicate -> skip (i.e., pick the next account from backlog); on error -> synchronize the account.
      ii. NB: generate has to be fully deterministic based on source account and sequence number (this is already the case).
      iii. We have to break if no progress was made even after rebuilding the backlog (this can happen with create if the account gets banned).
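The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the loadgen implementation: `load_seq` and `submit` are hypothetical hooks into the node, statuses are plain strings, and for simplicity every backlog rebuild re-reads each account's sequence number from the ledger, so each account gets at most one new transaction per pass.

```python
import math
import random
from typing import Callable, Dict, List

# Hypothetical hooks (not stellar-core APIs):
#   load_seq(account) -> sequence number currently on the ledger
#   submit(account, seq) -> "PENDING" | "DUPLICATE" | "ERROR" | "TRY_AGAIN_LATER"
# submit must build the transaction deterministically from (account, seq).
LoadSeq = Callable[[int], int]
Submit = Callable[[int, int], str]


def expected_for_create(root: int, root_seq: int,
                        n_accounts: int, batch_size: int) -> Dict[int, int]:
    # Step 1b, "create": one source account, one transaction per batch.
    return {root: root_seq + math.ceil(n_accounts / batch_size)}


def expected_for_payments(start: Dict[int, int], n_tx: int) -> Dict[int, int]:
    # Step 1b, payments: round robin, so the first n_tx % n accounts
    # (in id order) get one extra transaction.
    accounts = sorted(start)
    n = len(accounts)
    return {acc: start[acc] + n_tx // n + (1 if i < n_tx % n else 0)
            for i, acc in enumerate(accounts)}


class LoadGenerator:
    def __init__(self, expected: Dict[int, int], load_seq: LoadSeq,
                 submit: Submit, seed: int = 0) -> None:
        self.expected = dict(expected)  # accounts not yet at their target
        self.load_seq = load_seq
        self.submit = submit
        self.backlog: List[int] = []    # step 1c
        self.seq: Dict[int, int] = {}   # ledger seq read at the last rebuild
        self.rng = random.Random(seed)
        self.done = not self.expected

    def _rebuild_backlog(self) -> None:
        # Step 2a-i: drop accounts that reached their target, queue the rest.
        self.backlog = []
        for acc, target in list(self.expected.items()):
            self.seq[acc] = self.load_seq(acc)  # (re)synchronize
            if self.seq[acc] >= target:
                del self.expected[acc]
            else:
                self.backlog.append(acc)
        self.done = not self.expected           # step 2a-ii
        self.rng.shuffle(self.backlog)          # step 2a-iii

    def step(self, tx_per_step: int) -> int:
        """Inject up to tx_per_step transactions; return how many were accepted."""
        accepted = 0
        rebuilt = False
        accepted_since_rebuild = 0
        while accepted < tx_per_step and not self.done:
            if not self.backlog:
                if rebuilt and accepted_since_rebuild == 0:
                    break  # step 2b-iii: no progress even after a rebuild
                self._rebuild_backlog()
                rebuilt, accepted_since_rebuild = True, 0
                continue
            acc = self.backlog.pop()  # step 2b
            status = self.submit(acc, self.seq[acc] + 1)
            if status == "PENDING":
                accepted += 1
                accepted_since_rebuild += 1
            # DUPLICATE: already in flight, move on to the next account.
            # ERROR / TRY_AGAIN_LATER: the account is resynchronized at the
            # next rebuild and retried then.
        return accepted
```

Because completion is decided by `_rebuild_backlog` reading the ledger, `done` only becomes true once every account has reached its target, and a transaction dropped between steps is simply regenerated on the next pass.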
This may generate a few extra transactions at the end, but this should not really matter (they would be rejected by the validator as duplicates). We get "retries" for free between steps.
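Why extra transactions are harmless and retries come for free: since generation depends only on (source account, sequence number), a regenerated transaction is byte-identical to the original, so a queue that still holds it reports a duplicate, while a queue that dropped it accepts it again. A toy sketch, assuming a hypothetical `make_tx` encoding and a queue that deduplicates by transaction hash:

```python
import hashlib
from typing import Set


def make_tx(account: int, seq: int) -> bytes:
    # Hypothetical encoding: everything is derived from (account, seq),
    # so regenerating the transaction yields identical bytes.
    return f"payment:{account}:{seq}".encode()


class ToyQueue:
    """Stand-in for the validator's queue: deduplicates by transaction hash."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()

    def add(self, tx: bytes) -> str:
        h = hashlib.sha256(tx).hexdigest()
        if h in self.seen:
            return "DUPLICATE"
        self.seen.add(h)
        return "PENDING"
```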
@MonsieurNicolas is it valid to expect that transactions are banned only when the system is overloaded? (i.e., too many txs are submitted and stellar-core can't keep up; note that I don't mean banned/dropped due to being invalid, since that just points to a bug in loadgen) If so, would it be simpler to just mark loadgen as "failed" and let the user decide what they want to do, instead of trying to recover?
In case of acceptance tests, we should not expect it to fail (though it should retry anyway), since we're generating a tiny amount of load at a very slow txrate. In case of benchmarking, I wouldn't want loadgen to recover, but rather fail, so we can reason about a point in time when the system became overloaded under the stress test without loadgen intervening.
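The distinction drawn in this comment could be expressed as a per-mode policy. A minimal sketch, where the `Mode` values, `OverloadPolicy`, and the ledger-number bookkeeping are all hypothetical: acceptance runs keep retrying, while benchmark runs stop at the first ban and remember when the node first pushed back, so the overload point stays observable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    ACCEPTANCE = "acceptance"  # tiny, slow load: retry dropped/banned txs
    BENCHMARK = "benchmark"    # stress test: report overload, don't recover


@dataclass
class OverloadPolicy:
    mode: Mode
    first_overload_ledger: Optional[int] = None

    def on_banned(self, ledger: int) -> str:
        # Remember the first ledger at which the node started rejecting load.
        if self.first_overload_ledger is None:
            self.first_overload_ledger = ledger
        # Acceptance tests retry; benchmarks fail so nothing masks the overload.
        return "retry" if self.mode is Mode.ACCEPTANCE else "fail"
```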