Node memory usage is not fully controlled #1136

Closed
aidan-kwon opened this issue Jan 28, 2022 · 1 comment

aidan-kwon commented Jan 28, 2022

A Klaytn node should operate stably with the minimum H/W requirements.
However, some EN operators have experienced OOM when there are a lot of transactions and API calls on the Cypress network.

I am reporting the issue for two different usage patterns:

  1. EN that receives too many API calls
    The following are memory profiling results from a node whose memory usage kept increasing.
    I found that httpServer allocated a lot of memory for the byte buffer of the JSON encoder (see the pprof sketch after this list).

    [memory profiling screenshots]

  2. EN that doesn't receive too many API calls
    There is no detailed profiling result, but some EN operators said their EN crashed when blocks contained a lot of transactions for a long period.
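
For reference, here is a minimal sketch of how such a heap profile can be collected from a Go process via net/http/pprof; the standalone server and port 6060 are assumptions for illustration, not how the Klaytn node actually exposes its profiler:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Expose profiling endpoints; the heap profile can then be inspected with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```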
aidan-kwon self-assigned this Jan 28, 2022
aidan-kwon added the issue/bug label Jan 28, 2022
yoomee1313 added this to To do in Misc Aug 22, 2022

yoomee1313 commented Nov 8, 2022

  1. EN that receives too many API calls

Assuming most of the API calls are not debug APIs, which consume a lot of memory, this problem is solved by PR #1650.

  2. EN that doesn't receive too many API calls

I have measured how much memory the EN uses. Most of the time it stays around the levels below, but sometimes it spikes enough to cause OOM. If the situation gets worse, the machine hangs.

  • m5.2xlarge (32GB) -> 20GB~22GB
  • i3.4xlarge (122GB) -> about 70GB, sometimes it spikes to 90GB

I recommend the following actions to avoid a machine hang and to restart as soon as possible if there's an OOM.

  • Allow at least 60GB of memory when syncing, and 30GB of memory when not syncing.
  • Set a memory limit, such as a cgroup, to cap the ken process's memory usage.
  • Set the restart option in kend.conf.
  • Set --state.trie-cache-limit in the ADDITIONAL field of kend.conf to a value appropriate for your service. This flag determines the memory consumed by the fastcache. The default value is calculated as below (a worked sketch follows this list).
    • if physicalMemorySize < 10GB: a machine with a larger memory size is recommended.
    • if 10GB <= physicalMemorySize < 20GB: it is set to 1GB.
    • if 20GB <= physicalMemorySize <= 100GB: it is set to 0.3 * physicalMemorySize.
    • if physicalMemorySize > 100GB: it is set to 0.35 * physicalMemorySize.
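
To make the sizing concrete, here is a minimal sketch (not Klaytn's actual code; the function name and GB units are assumptions) that maps physical memory to the default trie cache size using the rules above:

```go
package main

import "fmt"

// defaultTrieCacheGB mirrors the sizing rules listed above: it returns the
// default fastcache size in GB for a machine with the given physical memory.
func defaultTrieCacheGB(physicalMemGB float64) float64 {
	switch {
	case physicalMemGB < 10:
		return 0 // too small; a machine with more memory is recommended
	case physicalMemGB < 20:
		return 1
	case physicalMemGB <= 100:
		return 0.3 * physicalMemGB
	default:
		return 0.35 * physicalMemGB
	}
}

func main() {
	// Example machines from the measurements above.
	for _, gb := range []float64{32, 122} {
		fmt.Printf("%.0fGB machine -> %.1fGB trie cache\n", gb, defaultTrieCacheGB(gb))
	}
}
```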

I'll close this issue because there are many duplicated issues, and I want to split 1. and 2. into separate issues.

Misc automation moved this from To do to Done Nov 8, 2022