OOM and Utilization Issues when using Prysm v5 #14020
Comments
gm, what flags are you using to run Prysm?
Actually, I see your flags, thanks. Try turning off subscribe-all-subnets. That uses a huge amount of memory and is rarely necessary.
Also, lower your max peers to something sensible, like 100. Both of those flags require more and more memory. We are still investigating the OOMs you have linked.
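The two settings mentioned here can be expressed in the beacon node's YAML config. The following is a hedged sketch, not the reporter's actual file; the keys mirror Prysm's CLI flag names (`subscribe-all-subnets`, `p2p-max-peers`):

```yaml
# prysm.yaml (excerpt) -- illustrative sketch only
# Leave subscribe-all-subnets unset (it defaults to false),
# or disable it explicitly:
subscribe-all-subnets: false
# Cap the peer count at a sensible value; each peer adds memory overhead:
p2p-max-peers: 100
```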
GM, wow, thanks for the quick response. Yeah, my flags are included in the config files/parameters above: prysm.yaml and validator.yaml. Actually, I've already adjusted the max peers. Is there any reason why this is not an issue in 4.2.1, though, if it takes tremendously more memory?
v4 doesn't have subnets for blobs.
Is this subnet required for Deneb validators?
Yes. Blobs are required in Deneb.
I just wanted to say this is an awesome issue report @fhildeb.
Used a lower max peer count. The issue remains. The LUKSO network does not have blobs yet, as it's only up to Shanghai-Capella (as stated in the report), so the configuration should not cause these increases compared to v4.
Hey @prestonvanloon 👋 Is there any new update about this issue? Does https://github.com/prysmaticlabs/prysm/releases/tag/v5.0.4 solve this problem? |
We are still experiencing the issue on v5.0.4 |
We are still experiencing the issue on v5.1.0 |
What flags are you running this with, @git-ljm, and which network is this on?
Describe the bug
I'm running a Prysm validator on LUKSO (Layer 1 EVM up to date with Shanghai-Capella).
Related to the upcoming Cancun-Deneb fork, other homestakers and I upgraded to Prysm v5.0.3.
Since upgrading, I have:
After reaching the maximum memory, the CPU spikes up (to 75%) until Prysm crashes. Until the OOM error from the OS, there are no visible warnings or errors in the logs. I'm using 32GB of RAM, so the Prysm client crashes after 48-55 hours. Other LUKSO community members running Prysm validators reported similar errors after upgrading; for those with only 16GB of RAM, the client crashes after just around a day.
Every time it crashed, I reverted back one version to narrow down where the root cause was introduced. So far, I've hit the same OOM issue on v5.0.3, v5.0.2, v5.0.1, and v5.0.0, leading to the conclusion that it was introduced with v5. When downgrading to v4.2.1, everything returns to normal, and the physical memory of the validator and consensus client combined does not grow beyond 5GB.
Each time Prysm crashed, I started from a clean setup, removing all blockchain data gathered during the previous attempt. I used checkpoint sync to quickly get back online. Therefore, it might be that this memory issue exists while the EL client is syncing. However, I did not investigate this further, and it is purely speculative.
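For reference, the checkpoint-sync restart described above can be configured in the beacon node's YAML file. This is a hedged sketch: the endpoint URL and data directory are placeholders, while the key names follow Prysm's documented flags (`checkpoint-sync-url`, `genesis-beacon-api-url`, `datadir`):

```yaml
# prysm.yaml (excerpt) -- hypothetical checkpoint-sync settings
checkpoint-sync-url: https://checkpoint.example.org    # placeholder endpoint
genesis-beacon-api-url: https://checkpoint.example.org # placeholder endpoint
datadir: /path/to/fresh/prysm-data                     # wiped before each restart
```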
I've also seen other issues being opened about OOM lately:
As well as a draft PR about a potential memory bugfix:
Would love to know:
Monitoring V5.0.2
Returning to v4.2.1 after it crashed
Has this worked before in a previous version?
🔬 Minimal Reproduction
To simplify starting clients, I've used the LUKSO CLI Tool to create a JWT & load the network configuration. However, it just starts up the EL/CL clients and should not be related.
Error
Platform(s)
Linux (x86)
What version of Prysm are you running? (Which release)
v5.0.0 and above
Anything else relevant (validator index / public key)?
Used OS/Hardware: