
Investigate how Iroha operates under load under Ubuntu/Debian #4830

Closed
dima74 opened this issue Jul 10, 2024 · 5 comments · Fixed by #4893

dima74 commented Jul 10, 2024

It was reported that:

  • the Iroha container under Ubuntu restarts after ~10 minutes
  • the Iroha container under Debian restarts after ~7 hours

This needs to be investigated.


Notes about kura.lock:

  • Kubernetes periodically checks whether Iroha is alive using the /health route
  • if three checks in a row fail, it restarts the container
  • the restart is non-graceful, so Iroha doesn't release kura.lock
  • after the restart Iroha can't take kura.lock and therefore can't start
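The failure mode above can be illustrated with a minimal sketch (this is not Iroha's actual kura code, just a toy model of a lock file that is created at startup and deleted only on graceful shutdown — after a non-graceful kill the stale file survives, so the restarted process refuses to start):

```rust
use std::fs::{self, OpenOptions};
use std::io::ErrorKind;
use std::path::Path;

// Try to create the lock file; fail if a previous (possibly dead)
// instance left it behind.
fn acquire_lock(path: &Path) -> Result<(), String> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(_) => Ok(()),
        Err(e) if e.kind() == ErrorKind::AlreadyExists => Err(format!(
            "{} already exists: previous instance was not shut down gracefully",
            path.display()
        )),
        Err(e) => Err(e.to_string()),
    }
}

// The graceful-shutdown path: remove the lock file.
fn release_lock(path: &Path) {
    let _ = fs::remove_file(path);
}

fn main() {
    let lock = Path::new("kura.lock");
    assert!(acquire_lock(lock).is_ok()); // first start succeeds
    // Simulate a non-graceful restart: release_lock is never called,
    // so the next start fails exactly as described in the notes.
    assert!(acquire_lock(lock).is_err());
    release_lock(lock);
    assert!(acquire_lock(lock).is_ok()); // after cleanup, startup works again
    release_lock(lock);
    println!("ok");
}
```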
@dima74 dima74 self-assigned this Jul 10, 2024

dima74 commented Jul 11, 2024

The kura.lock problem should be fixed by https://github.com/soramitsu/iroha-deploy/pull/166

Update: that only fixes the genesis peer restart problem; if Iroha is shut down non-gracefully (e.g. because of OOM), it still fails to start after the restart.

@dima74 dima74 mentioned this issue Jul 16, 2024

dima74 commented Jul 22, 2024

Reproduced the problem; here are some results:

  • Preconditions:
    • Testing only the FindAllAccounts query
    • Using a genesis with 2500 accounts
    • Using Iroha version 50e8104
    • Using iroha2-perf load generator version 44b2499839ee6a6851dd9f39bb5bfa8d02b4fa6c
    • Testing is performed locally; results in Docker/k8s should be similar
  • Observed that Iroha consumes a lot of memory even under a small load:
    • 10 queries per second: ~2.6 GB
    • 20 queries per second: ~6.6 GB
    • 30 queries per second: ~11.3 GB
  • Background: how queries are handled
    • Suppose Iroha receives the query FindAllAccounts. It executes it and finds all 2500 accounts. It then splits the response into batches (the default batch size is 10), returns the first batch to the client, and saves the remaining batches in the LiveQueryStore. When the client requests the next batch, that batch is removed from the LiveQueryStore; normally the client requests all batches, after which the query's data is gone from the store. If the client doesn't request the remaining batches, the LiveQueryStore relies on a pruning mechanism: every 30 seconds it removes stale queries (those inactive for more than 30 seconds).
  • Background: how load generator works
  • So Iroha uses a lot of memory in this scenario because the LiveQueryStore holds the full results of every query from the last 30 seconds
  • So:
    • I think the load generator should be modified to request the whole result, not only the first batch (either retrieve all batches over multiple requests, or increase the batch size)
    • Iroha can easily be DDoSed by requesting only the first batch of a heavy query. This might be a security problem
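The batching behaviour described above can be modelled with a toy sketch (this is not Iroha's actual LiveQueryStore, and the names `start_query`, `next_batch`, and `prune` are illustrative): a query's full result is computed eagerly, the first batch is returned, and the remaining batches sit in the store until fetched or pruned. With 2500 accounts and a batch size of 10, a single abandoned query leaves 2490 rows resident for up to 30 seconds.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct LiveQueryStore {
    batch_size: usize,
    idle_ttl: Duration,
    // query id -> (remaining batches, last access time)
    pending: HashMap<u64, (Vec<Vec<u32>>, Instant)>,
    next_id: u64,
}

impl LiveQueryStore {
    fn new(batch_size: usize, idle_ttl: Duration) -> Self {
        Self { batch_size, idle_ttl, pending: HashMap::new(), next_id: 0 }
    }

    // Execute a query: split rows into batches, return the first batch,
    // and park the rest under a fresh query id.
    fn start_query(&mut self, rows: &[u32], now: Instant) -> (u64, Vec<u32>) {
        let mut batches: Vec<Vec<u32>> =
            rows.chunks(self.batch_size).map(|c| c.to_vec()).collect();
        let first = if batches.is_empty() { Vec::new() } else { batches.remove(0) };
        let id = self.next_id;
        self.next_id += 1;
        if !batches.is_empty() {
            self.pending.insert(id, (batches, now));
        }
        (id, first)
    }

    // Fetch the next batch; each fetch removes data from the store.
    fn next_batch(&mut self, id: u64, now: Instant) -> Option<Vec<u32>> {
        let (batches, last) = self.pending.get_mut(&id)?;
        *last = now;
        let batch = batches.remove(0);
        if batches.is_empty() {
            self.pending.remove(&id);
        }
        Some(batch)
    }

    // Pruning pass: drop queries idle longer than the TTL.
    fn prune(&mut self, now: Instant) {
        let ttl = self.idle_ttl;
        self.pending.retain(|_, (_, last)| now.duration_since(*last) < ttl);
    }

    // Rows currently held in memory for unfinished queries.
    fn stored_rows(&self) -> usize {
        self.pending
            .values()
            .map(|(b, _)| b.iter().map(Vec::len).sum::<usize>())
            .sum()
    }
}

fn main() {
    let accounts: Vec<u32> = (0..2500).collect();
    let mut store = LiveQueryStore::new(10, Duration::from_secs(30));
    let t0 = Instant::now();

    // A client sends FindAllAccounts but never fetches past the first batch.
    let (_id, first) = store.start_query(&accounts, t0);
    assert_eq!(first.len(), 10);
    // 2490 rows stay resident until pruning kicks in.
    assert_eq!(store.stored_rows(), 2490);

    // After >30s of inactivity the pruning pass frees the memory.
    store.prune(t0 + Duration::from_secs(31));
    assert_eq!(store.stored_rows(), 0);
    println!("ok");
}
```

A well-behaved client would loop on `next_batch` until it returns `None`, which is exactly the load-generator change proposed above.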


mversic commented Jul 22, 2024

Iroha can be easily DDOSed by requesting only first batch of heavy query. This might be a security problem

This is not acceptable. We should at least have some limit on the number of live queries to prevent OOM.


mversic commented Jul 22, 2024

The queue already has some form of DDoS protection implemented: there is a global limit and a per-user limit on the number of transactions that can be in the queue. We should at least do something similar here.
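The suggested protection could look like this hypothetical sketch (modelled on the queue's transaction limits as described, not on Iroha's actual code; the type and method names are illustrative): a new live query is admitted only while both a global cap and the caller's per-account cap have headroom.

```rust
use std::collections::HashMap;

struct LiveQueryLimits {
    max_total: usize,
    max_per_user: usize,
    per_user: HashMap<String, usize>,
    total: usize,
}

impl LiveQueryLimits {
    fn new(max_total: usize, max_per_user: usize) -> Self {
        Self { max_total, max_per_user, per_user: HashMap::new(), total: 0 }
    }

    // Admit a new live query, or reject it instead of letting the
    // store grow without bound.
    fn try_admit(&mut self, account: &str) -> bool {
        let used = self.per_user.get(account).copied().unwrap_or(0);
        if self.total >= self.max_total || used >= self.max_per_user {
            return false;
        }
        *self.per_user.entry(account.to_string()).or_insert(0) += 1;
        self.total += 1;
        true
    }

    // Called when a live query completes or is pruned.
    fn release(&mut self, account: &str) {
        if let Some(n) = self.per_user.get_mut(account) {
            if *n > 0 {
                *n -= 1;
                self.total -= 1;
            }
        }
    }
}

fn main() {
    let mut limits = LiveQueryLimits::new(4, 2);
    // One account cannot exhaust the store on its own...
    assert!(limits.try_admit("mallory"));
    assert!(limits.try_admit("mallory"));
    assert!(!limits.try_admit("mallory")); // per-user cap hit
    // ...and other accounts still get service, up to the global cap.
    assert!(limits.try_admit("alice"));
    assert!(limits.try_admit("bob"));
    assert!(!limits.try_admit("carol")); // global cap hit
    limits.release("mallory");
    assert!(limits.try_admit("carol")); // headroom freed
    println!("ok");
}
```

Since queries are signed, the per-account key is readily available, which is the point DCNick3 raises below.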


DCNick3 commented Jul 22, 2024

Pruning can also lead to DDoS, because non-malicious actors wouldn't be able to complete their queries as the store is being constantly thrashed. Maybe use the fact that queries are signed and add a per-AccountId limit?

UPD: I see that you are proposing a per-user limit too. Sorry, didn't notice that; a global limit plus a per-user limit would be good.

This issue was closed.