
Client traffic creates performance bottleneck in aggregator #1207

Closed
3 tasks done
jpraynaud opened this issue Sep 6, 2023 · 1 comment · Fixed by #1251
Labels
performances 🥇 Performances


@jpraynaud
Member

jpraynaud commented Sep 6, 2023

Issue

During our stress-test benchmarks of the aggregator, we noticed that the client traffic (sent during phase 2) probably creates a bottleneck, which leads to unexplained 404 errors when signers send their signatures.

Here are the results from tests run with 0, 10 and 50 clients:

$ cargo run --bin load-aggregator -- -vvv --cardano-cli-path mithril-test-lab/mithril-end-to-end/script/mock-cardano-cli --num-signers=100 --num-clients=0
number_of_signers       100
number_of_clients       0
phase   duration/ms
stress bootstrap        22895
signers registration    5786
signatures registration 27150
signatures registration 28642
signers registration    6286
signatures registration 28635
signatures registration 29608
$ cargo run --bin load-aggregator -- -vvv --cardano-cli-path mithril-test-lab/mithril-end-to-end/script/mock-cardano-cli --num-signers=100 --num-clients=10
number_of_signers       100
number_of_clients       10
phase   duration/ms
stress bootstrap        24004
signers registration    5922
signatures registration 29640
signatures registration 28756
signers registration    7195
signatures registration 29504
signatures registration 30565
$ cargo run --bin load-aggregator -- -vvv --cardano-cli-path mithril-test-lab/mithril-end-to-end/script/mock-cardano-cli --num-signers=100 --num-clients=50
...
Sep 06 09:23:42.368 WARN Signer Signature Registration error caught: Err(Registering signatures for party_id=pool10puufv84j8akal2u9g3cydklua3zfxxcplh5mfzqan70j3wy6zs, expected HTTP code 201 got 404 with the message: .)
Sep 06 09:23:42.368 WARN Signer Signature Registration error caught: Err(Registering signatures for party_id=pool16akcun2st7hl5fk62m0scas22mvv2t6swwtmrq28ntpezzw7s4z, expected HTTP code 201 got 404 with the message: .)
Sep 06 09:23:42.369 WARN Signer Signature Registration error caught: Err(Registering signatures for party_id=pool1g30svdy0nqlcg74m6d4prs03rnlgq4umcmj7pklhfl96x5t5upw, expected HTTP code 201 got 404 with the message: .)
Sep 06 09:23:42.369 WARN Signer Signature Registration error caught: Err(Registering signatures for party_id=pool1qdxnzhg6tjklcda4kex0h4mjxresxym45867h7vznyekxtanpmz, expected HTTP code 201 got 404 with the message: .)
Sep 06 09:23:42.369 WARN Signer Signature Registration error caught: Err(Registering signatures for party_id=pool1ju40rw9jw0mdyf2cf3f8gfaqy62maxnzxacm62z65tzys23v8dj, expected HTTP code 201 got 404 with the message: .)
Sep 06 09:23:42.369 WARN Signer Signature Registration error caught: Err(Registering signatures for party_id=pool1pngs6k4vunlxlhfhg0gryr645448xrtu390efl6uqjheq5wfswu, expected HTTP code 201 got 404 with the message: .)
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `99`', mithril-test-lab/mithril-end-to-end/src/bin/load-aggregator/main.rs:236:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

To do

  • Explain the bottleneck origin
  • Fix the aggregator if needed
  • Fix the stress tester if needed
@jpraynaud jpraynaud added dev 💪 optimization 🛠️ Optimization and/or small enhancements to-groom 🤔 Needs grooming performances 🥇 Performances labels Sep 6, 2023
@jpraynaud jpraynaud self-assigned this Sep 8, 2023
@jpraynaud jpraynaud removed dev 💪 optimization 🛠️ Optimization and/or small enhancements to-groom 🤔 Needs grooming labels Sep 21, 2023
@jpraynaud
Member Author

Here is a summary of the investigation into this bottleneck and of the fixes applied:

  • The aggregator binary under test must be the release build, not the debug build. Testing the debug build was responsible for the failures observed even with low client traffic.

    The default binary directory has been changed to target/release to avoid unknowingly testing the wrong binary.

  • The 404 errors were caused by a timing issue: the aggregator was still serving the previous pending certificate, but the stress test used it to trigger the next steps of the test. As a result, the signatures sent targeted the wrong open message.

    This has been fixed by checking the signed entity type when waiting for a pending certificate.

  • With many clients sending traffic, the time needed to generate new certificates and artifacts increased, which caused timeouts.

    The timeouts of the main scenario have been increased to avoid this.

  • The HTTP client used to generate traffic on the client side is now cached, which reduces the resources used by the load tester tool and lets it send traffic at a higher throughput.
  • The fake client now sends statistics traffic when a download succeeds.
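A minimal sketch of the release-by-default binary lookup from the first point, written in Rust since the tooling lives in a cargo workspace. `resolve_bin_dir`, its parameters and the example path are hypothetical; the issue does not show how the load tester's option is actually named or implemented.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolves where the aggregator binary is looked up.
/// An explicit directory wins; otherwise the release build output is used,
/// so a debug binary is never picked up silently.
fn resolve_bin_dir(workspace_root: &Path, explicit_dir: Option<&Path>) -> PathBuf {
    match explicit_dir {
        Some(dir) => dir.to_path_buf(),
        None => workspace_root.join("target").join("release"),
    }
}

fn main() {
    // Hypothetical workspace location, for illustration only.
    let root = Path::new("/home/user/mithril");
    // Prints "/home/user/mithril/target/release" on Unix-like systems.
    println!("{}", resolve_bin_dir(root, None).display());
}
```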
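The signed entity type check and the longer timeouts can be sketched together as a polling loop that skips stale pending certificates and gives up after a deadline. This assumes simplified, hypothetical stand-ins (`SignedEntityType`, `PendingCertificate`, `wait_for_pending_certificate`) rather than the actual mithril-common types or the load tester's code; the `fetch` closure takes the place of an HTTP call to the aggregator.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Simplified, hypothetical stand-in for the aggregator's signed entity types.
#[derive(Debug, Clone, PartialEq, Eq)]
enum SignedEntityType {
    MithrilStakeDistribution(u64),
    CardanoImmutableFilesFull(u64),
}

/// Simplified, hypothetical view of a pending certificate.
#[derive(Debug, Clone)]
struct PendingCertificate {
    signed_entity_type: SignedEntityType,
}

/// Polls `fetch` until it returns a pending certificate for `expected`,
/// or returns an error once `timeout` has elapsed.
fn wait_for_pending_certificate<F>(
    mut fetch: F,
    expected: &SignedEntityType,
    timeout: Duration,
    poll_interval: Duration,
) -> Result<PendingCertificate, String>
where
    F: FnMut() -> Option<PendingCertificate>,
{
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(certificate) = fetch() {
            // A stale pending certificate (previous open message) is ignored
            // instead of being used to trigger the next step of the test.
            if &certificate.signed_entity_type == expected {
                return Ok(certificate);
            }
        }
        if Instant::now() >= deadline {
            return Err(format!("timed out waiting for pending certificate {expected:?}"));
        }
        thread::sleep(poll_interval);
    }
}

fn main() {
    // Simulated aggregator: serves the previous pending certificate for
    // two polls before switching to the expected one.
    let mut polls = 0;
    let fetch = || {
        polls += 1;
        let signed_entity_type = if polls <= 2 {
            SignedEntityType::MithrilStakeDistribution(5)
        } else {
            SignedEntityType::CardanoImmutableFilesFull(5)
        };
        Some(PendingCertificate { signed_entity_type })
    };
    let expected = SignedEntityType::CardanoImmutableFilesFull(5);
    let certificate = wait_for_pending_certificate(
        fetch,
        &expected,
        Duration::from_secs(1),
        Duration::from_millis(10),
    )
    .expect("pending certificate");
    println!("{:?}", certificate.signed_entity_type);
}
```

Raising `timeout` corresponds to the increased scenario timeouts, while the comparison on `signed_entity_type` is what keeps signatures from being sent for the previous open message.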
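Caching the client can be sketched with `std::sync::OnceLock`, so that it is built once and then shared by every request. `HttpClient` is a hypothetical stand-in (the issue does not name the HTTP library used), and the URL and route are illustrative only.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many clients were built (for illustration only).
static CLIENTS_BUILT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a real HTTP client.
struct HttpClient {
    base_url: String,
}

impl HttpClient {
    fn new(base_url: &str) -> Self {
        CLIENTS_BUILT.fetch_add(1, Ordering::SeqCst);
        HttpClient { base_url: base_url.to_string() }
    }

    /// A real client would perform the request here; this only formats it.
    fn get(&self, path: &str) -> String {
        format!("GET {}{}", self.base_url, path)
    }
}

/// Returns the shared client, building it on first use only.
fn http_client() -> &'static HttpClient {
    static CLIENT: OnceLock<HttpClient> = OnceLock::new();
    CLIENT.get_or_init(|| HttpClient::new("http://localhost:8080/aggregator"))
}

fn main() {
    for _ in 0..1000 {
        http_client().get("/artifact/snapshots");
    }
    // Prints "clients built: 1".
    println!("clients built: {}", CLIENTS_BUILT.load(Ordering::SeqCst));
}
```

Real HTTP clients such as reqwest keep an internal connection pool, so reusing one instance avoids reopening connections for every request sent by the load tester.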
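The statistics behaviour of the fake client can be sketched as follows, assuming a hypothetical `StatisticsSender` trait in place of the real HTTP call; only a successful download triggers a statistics request.

```rust
/// Hypothetical result of one fake client download.
enum DownloadOutcome {
    Success,
    Failure(String),
}

/// Hypothetical sink for statistics; a real implementation would send an HTTP request.
trait StatisticsSender {
    fn send_statistics(&mut self, artifact_id: &str);
}

/// Records statistics in memory instead of sending them (for illustration).
struct RecordingSender {
    sent: Vec<String>,
}

impl StatisticsSender for RecordingSender {
    fn send_statistics(&mut self, artifact_id: &str) {
        self.sent.push(artifact_id.to_string());
    }
}

/// Runs one fake client round and sends statistics only after a successful
/// download. Returns whether the download succeeded.
fn fake_client_round<S: StatisticsSender>(
    outcome: DownloadOutcome,
    artifact_id: &str,
    stats: &mut S,
) -> bool {
    match outcome {
        DownloadOutcome::Success => {
            stats.send_statistics(artifact_id);
            true
        }
        DownloadOutcome::Failure(reason) => {
            eprintln!("download of {artifact_id} failed: {reason}");
            false
        }
    }
}

fn main() {
    let mut sender = RecordingSender { sent: Vec::new() };
    fake_client_round(DownloadOutcome::Success, "snapshot-1", &mut sender);
    fake_client_round(DownloadOutcome::Failure("timeout".to_string()), "snapshot-2", &mut sender);
    println!("{:?}", sender.sent); // prints ["snapshot-1"]
}
```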

Overall, the stress test now succeeds on the same machine used to identify the bottleneck:

  • with 200 signers and 200 clients
  • with 2 signers and 5000 clients
