Node memory usage is not fully controlled #1136

Closed
aidan-kwon opened this issue Jan 28, 2022 · 1 comment

aidan-kwon commented Jan 28, 2022

A Klaytn node should operate stably with the minimum H/W requirements.
However, some EN operators have experienced OOM when there are a lot of transactions and API calls on the Cypress network.

I am reporting the issue for two different usage patterns:

  1. EN that receives too many API calls
    The following are memory profiling results from a node whose memory usage kept increasing.
    I found that httpServer allocated a lot of memory for the byte buffer of the JSON encoder (see the pprof sketch after this list).

    [memory profiling screenshots]

  2. EN that doesn't receive too many API calls
    There is no detailed profiling result, but some EN operators said their EN crashed when blocks contained a lot of transactions for a long period.
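
For reference, here is a minimal sketch of how such a heap profile can be collected from a Go process via net/http/pprof; the standalone server and port 6060 are assumptions for illustration, not how the Klaytn node actually exposes its profiler:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Expose profiling endpoints; the heap profile can then be inspected with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```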
aidan-kwon self-assigned this Jan 28, 2022
aidan-kwon added the issue/bug label Jan 28, 2022
yoomee1313 added this to To do in Misc Aug 22, 2022

yoomee1313 commented Nov 8, 2022

  1. EN that receives too many API calls

Assuming most of the API calls are not debug APIs, which consume a lot of memory, this problem is solved by PR #1650.

  2. EN that doesn't receive too many API calls

I have measured how much memory the EN uses. Most of the time it stays around the levels below, but sometimes it spikes enough to cause OOM. If the situation gets worse, the machine hangs.

  • m5.2xlarge (32GB) -> 20GB~22GB
  • i3.4xlarge (122GB) -> about 70GB, sometimes it spikes to 90GB

I recommend the following actions to avoid a machine hang and to restart as soon as possible if there's an OOM.

  • Allow at least 60GB of memory when syncing, and 30GB of memory when not syncing.
  • Set a memory limit, such as a cgroup, to cap the ken process's memory usage.
  • Set the restart option in kend.conf.
  • Set --state.trie-cache-limit in the ADDITIONAL field of kend.conf to a value appropriate for your service. This flag determines the memory consumed by the fastcache. The default value is calculated as below (a worked sketch follows this list).
    • if physicalMemorySize < 10GB: a machine with a larger memory size is recommended.
    • if 10GB <= physicalMemorySize < 20GB: it is set to 1GB.
    • if 20GB <= physicalMemorySize <= 100GB: it is set to 0.3 * physicalMemorySize.
    • if physicalMemorySize > 100GB: it is set to 0.35 * physicalMemorySize.
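
To make the sizing concrete, here is a minimal sketch (not Klaytn's actual code; the function name and GB units are assumptions) that maps physical memory to the default trie cache size using the rules above:

```go
package main

import "fmt"

// defaultTrieCacheGB mirrors the sizing rules listed above: it returns the
// default fastcache size in GB for a machine with the given physical memory.
func defaultTrieCacheGB(physicalMemGB float64) float64 {
	switch {
	case physicalMemGB < 10:
		return 0 // too small; a machine with more memory is recommended
	case physicalMemGB < 20:
		return 1
	case physicalMemGB <= 100:
		return 0.3 * physicalMemGB
	default:
		return 0.35 * physicalMemGB
	}
}

func main() {
	// Example machines from the measurements above.
	for _, gb := range []float64{32, 122} {
		fmt.Printf("%.0fGB machine -> %.1fGB trie cache\n", gb, defaultTrieCacheGB(gb))
	}
}
```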

I'll close this issue because there are many duplicated issues, and I want to split 1. and 2. into separate issues.

Misc automation moved this from To do to Done Nov 8, 2022