Design & implement stress test tool for aggregator - phase 2 #1155

Closed
7 of 10 tasks
Tracked by #904
jpraynaud opened this issue Aug 10, 2023 · 0 comments · Fixed by #1194
Labels
D-medium Difficulty: medium prototype 🛠️ Prototype/PoC of a feature task

Issue

We want to design a simple stress test tool for the aggregator and implement its first version.
The goal is to verify that, under a load of ~100 simulated signers, the aggregator:

  • Handles all Signer Key Registrations without error in a reasonable time (<30s per request)
  • Handles all Signer Signatures Registrations without error in a reasonable time (<30s per request)
  • Creates the associated certificates and artifacts for each signed entity type (<300s to retrieve the information)
  • Meets all of the expectations above while ~10-100 simulated clients request the certificates and artifacts routes, with each client receiving its response in a reasonable time (<10s per request)
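
The time budgets above could be encoded once and checked against every measurement. This is only a minimal sketch: the dictionary keys, the `Measurement` type and the function names are hypothetical labels chosen for illustration, and only the numeric limits come from the list above.

```python
from dataclasses import dataclass

# Time budgets in seconds, taken from the expectations listed above.
# The keys are hypothetical labels, not names used by the project.
BUDGETS_SECONDS = {
    "signer_key_registration": 30.0,
    "signer_signature_registration": 30.0,
    "certificate_and_artifact_retrieval": 300.0,
    "client_request": 10.0,
}


@dataclass(frozen=True)
class Measurement:
    kind: str               # one of the keys of BUDGETS_SECONDS
    elapsed_seconds: float  # observed duration of the request or wait


def exceeds_budget(measurement: Measurement) -> bool:
    """Return True when the measurement is not strictly below its budget."""
    return measurement.elapsed_seconds >= BUDGETS_SECONDS[measurement.kind]


def over_budget(measurements):
    """Return the measurements that missed their budget, in input order."""
    return [m for m in measurements if exceeds_budget(m)]
```

Keeping the limits in one table makes it easy to tighten or relax them as the limitations of the single-computer setup become clearer.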

Note: The simulated signers and clients should be represented by the HTTP calls they make to the aggregator REST API
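
As a rough illustration of that note, a simulated signer or client can be reduced to a prepared HTTP request, and the load to a pool of workers sending those requests. The sketch below uses only the Python standard library. The base URL, the route passed to `build_post` and the payload fields are placeholders, not the aggregator's actual API, and the `concurrency` parameter is an assumption.

```python
import json
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def build_post(base_url: str, route: str, payload: dict) -> urllib.request.Request:
    """Prepare a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + route,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout_seconds=30.0):
    """Send one request and return (status_code, elapsed_seconds).

    Network failures and timeouts are not caught here and propagate to the caller.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx responses; keep the status for the statistics.
        status = error.code
    return status, time.monotonic() - start


def send_all(requests, sender=send, concurrency=8):
    """Send every request using at most `concurrency` worker threads.

    Results are returned in the same order as the input requests.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(sender, requests))
```

Passing the `sender` as a parameter lets the same code be exercised with a stub instead of a running aggregator.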

We also want to understand and keep track of the limitations of this design, which is expected to run on a single computer and will be enhanced in future issues, probably by adding retry strategies, among other things.

Nominal scenario

  • Phase 1: Run Without Clients
  1. Create the Signer Fixtures
  2. Launch aggregator
  3. Wait for the aggregator REST API to be up (/epoch-settings route is answering 200)
  4. Create and send the Signer Key Registrations payloads
  5. Move the aggregator 1 epoch forward and wait for the aggregator to be at new epoch (/epoch-settings route)
  6. Create and send the Signer Key Registrations payloads
  7. Compute genesis certificate (no need to stop the aggregator)
  8. Move the aggregator 1 epoch forward and wait for the aggregator to be at new epoch (/epoch-settings route)
  9. Create and send the Signer Key Registrations payloads
  10. Wait for pending certificate to be available for Mithril Stake Distribution (/certificate-pending route is answering 200)
  11. Create the Signer Signatures Registrations payloads
  12. Send the Signer Signatures Registrations payloads
  13. Wait for the certificate to be available for Mithril Stake Distribution (/certificates route is answering 200)
  14. Wait for the artifact to be available for Mithril Stake Distribution (/artifact/mithril-stake-distributions route is answering 200)
  15. Wait for pending certificate to be available for Cardano Immutable Files (/certificate-pending route is answering 200)
  16. Create the Signer Signatures Registrations payloads
  17. Send the Signer Signatures Registrations payloads
  18. Wait for the certificate to be available for Cardano Immutable Files (/certificates route is answering 200)
  19. Wait for the artifact to be available for Cardano Immutable Files (/artifact/snapshots route is answering 200)
  • Phase 2: Run With Clients
  1. Create the Clients payloads
  2. Send the Clients payloads in a separate thread
  3. Move the aggregator 1 epoch forward
  4. Create and send the Signer Key Registrations payloads
  5. Proceed as in Phase 1, starting at step 4
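
Most steps of the scenario above are of the form "wait until a route answers 200". A minimal sketch of such a wait, using only the Python standard library, could look like the following. The function names, the polling interval and the default timeouts are assumptions. The routes (for example `/epoch-settings` or `/certificate-pending`) would be supplied by the caller.

```python
import time
import urllib.request


def route_answers_200(base_url: str, route: str, timeout_seconds: float = 10.0) -> bool:
    """Return True if a GET on the route answers 200, False on any other outcome."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + route,
                                    timeout=timeout_seconds) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, timeouts and HTTP error statuses
        # (urllib.error.URLError and HTTPError are subclasses of OSError).
        return False


def wait_until(probe, timeout_seconds, interval_seconds=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call probe() repeatedly until it returns True.

    Returns the elapsed time in seconds, or raises TimeoutError once
    timeout_seconds have passed without a successful probe.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_seconds:
            raise TimeoutError(f"condition not met within {timeout_seconds}s")
        sleep(interval_seconds)
```

For instance, waiting for the REST API to be up could be written as `wait_until(lambda: route_answers_200(base_url, "/epoch-settings"), timeout_seconds=30)`, where `base_url` is whatever address the aggregator listens on.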

Other tasks

  • Add concurrency level for requests?
  • Compute statistics for HTTP client requests (min/max, avg, std, percentiles, ...) (e.g. with the rewrk crate)
  • Compute summary statistics for the whole test
  • Measure performance of aggregator process and compute statistics
  • Update the README with command to run the stress test
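
To make the statistics task concrete, here is a small sketch of a latency summary. It uses the nearest-rank method for percentiles and the population standard deviation, both of which are choices made for this example rather than decisions taken in the issue. The function names and the selected percentiles are hypothetical.

```python
import math
import statistics


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of a non-empty list sorted in ascending order."""
    rank = max(1, math.ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies_seconds):
    """Return min/max, average, standard deviation and a few percentiles."""
    values = sorted(latencies_seconds)
    if not values:
        raise ValueError("at least one latency is required")
    return {
        "count": len(values),
        "min": values[0],
        "max": values[-1],
        "avg": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "p50": percentile(values, 0.50),
        "p95": percentile(values, 0.95),
        "p99": percentile(values, 0.99),
    }
```

The same function can serve both the per-step statistics and the summary for the whole test by feeding it the corresponding set of latencies.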

Definition of Success/Failure

  • The test fails if:
    • Any step fails with a timeout
    • The aggregator crashes
  • The test succeeds otherwise

We will display statistics for each eligible step (error rates, error distributions, min/max, ...) and for client requests (to be defined)
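
The definition above can be read as a simple rule over the step results. The sketch below follows it literally: only a step timeout or an aggregator crash fails the test, while request errors are reported through the error rate and error distribution. The `StepResult` type and its fields are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class StepResult:
    name: str
    timed_out: bool = False
    # One entry per failed request, e.g. an HTTP status code or an error label.
    errors: list = field(default_factory=list)
    total_requests: int = 0


def run_succeeded(step_results, aggregator_crashed: bool) -> bool:
    """The run fails if the aggregator crashed or any step timed out."""
    return not aggregator_crashed and not any(s.timed_out for s in step_results)


def error_rate(step: StepResult) -> float:
    """Share of failed requests in the step, 0.0 when no request was sent."""
    if step.total_requests == 0:
        return 0.0
    return len(step.errors) / step.total_requests


def error_distribution(step: StepResult) -> Counter:
    """Number of occurrences of each kind of error in the step."""
    return Counter(step.errors)
```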

Design

Here is the design that we have created:
[Image: design diagram]

Parent issue

#904
