Claimtrie build taking too much RAM and causing a crash #104

Open
niglomeister opened this issue Jul 7, 2023 · 7 comments

Comments

@niglomeister

niglomeister commented Jul 7, 2023

I'm trying to run a full lbcd node and had synced up to height ~1,000,000, but now when I start lbcd again I can't get past
height ~700,000 in the initial rebuilding of the claimtrie: it eats all my 16GB of RAM + 16GB of swap and crashes.

The ram usage goes up very rapidly from around height 600 000.
Screenshot from 2023-07-07 22-26-05

Screenshot from 2023-07-07 22-26-48

Considering the blockchain is 1,400,000 blocks high at the moment, how much RAM is needed to run a full lbcd node?
The readme says 8GB is needed, but it's already taking 32GB at height ~700,000, and I'm guessing the remaining blocks contain far more claims than the earlier ones.

I know the readme says that RAM usage may increase over time, but if this is the expected usage, I think the baseline should be updated to reflect the current state of things more accurately.

@roylee17
Collaborator

roylee17 commented Jul 8, 2023

It took me ~50 mins to sync from block 0 to 739,000 with 1.4GB of memory.
As I recall, even at 1.3 million blocks back in January, the operational memory required was approximately 7GB.
Chances are your database is corrupted and the sync went rogue.
Remove ~/.lbcd, re-sync, and see if that changes anything.

2023-07-08 16:42:08.963 [INF] SYNC: Processed 762 blocks in the last 10.19s (27678 transactions, height 737332, 2020-03-25 02:06:34 -0700 PDT)
2023-07-08 16:42:18.964 [INF] SYNC: Processed 530 blocks in the last 10s (18964 transactions, height 737862, 2020-03-26 01:41:24 -0700 PDT)
2023-07-08 16:42:22.527 [INF] MAIN: RAM: using 1.4 GB with 7.2 available, DISK: using 13.6 GB with 740.9 available
2023-07-08 16:42:28.979 [INF] SYNC: Processed 668 blocks in the last 10.01s (22528 transactions, height 738530, 2020-03-27 07:26:50 -0700 PDT)
2023-07-08 16:42:38.992 [INF] SYNC: Processed 709 blocks in the last 10.01s (24291 transactions, height 739239, 2020-03-28 15:15:09 -0700 PDT)

@moodyjon
Collaborator

moodyjon commented Jul 9, 2023

You might also try tweaking the GOGC and GOMEMLIMIT environment variables to get behavior that is friendlier to your 16GB memory + 16GB swap environment. The default settings are GOGC=100 and GOMEMLIMIT=math.MaxInt64 (effectively no limit). For example, GOGC=100 means that starting from 1GiB of live memory, the program is allowed to allocate 100% more memory (doubling the heap to 2GiB) before the garbage collector scans the heap and releases unused memory. So out of your 32GB, half or more might be garbage.
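
To make the GOGC arithmetic above concrete, here is a minimal sketch. The helper name heapTarget is made up for illustration, and it deliberately ignores GC roots (stacks and globals), which the Go GC guide also counts toward the target, so treat the numbers as rough estimates rather than what the runtime will do exactly.

```go
package main

import "fmt"

// heapTarget estimates the heap size at which the next GC cycle starts,
// given the live heap left after the previous cycle and a GOGC percentage.
// Assumptions: gogc >= 0, and GC roots (stacks, globals) are ignored.
func heapTarget(liveBytes uint64, gogc int) uint64 {
	return liveBytes + liveBytes*uint64(gogc)/100
}

func main() {
	const GiB = 1 << 30
	// 7 GiB is the approximate live claimtrie size mentioned in this thread.
	for _, gogc := range []int{100, 50, 25} {
		target := heapTarget(7*GiB, gogc)
		fmt.Printf("GOGC=%d: live 7 GiB, next GC near %.2f GiB\n",
			gogc, float64(target)/GiB)
	}
}
```

With the default GOGC=100 the estimate for a 7 GiB live heap is about 14 GiB, and lower GOGC values pull the target closer to the live size.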

The claimtrie build process makes lots of temporary allocations that become garbage immediately. The live memory needed to store the claimtrie is around 7GB, but you might observe the lbcd process using up to 14GB at any given moment (from the OS perspective).
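
The gap between live memory and what the OS sees can be observed from inside any Go process with runtime.ReadMemStats. The sketch below is a standalone toy, not lbcd code: the memSnapshot type and snapshot helper are made-up names, and the 64 MiB of throwaway buffers merely stand in for the temporary allocations described above.

```go
package main

import (
	"fmt"
	"runtime"
)

// memSnapshot keeps a few runtime.MemStats fields, all in bytes.
type memSnapshot struct {
	HeapAlloc uint64 // allocated heap objects (live plus not-yet-collected garbage)
	HeapSys   uint64 // heap memory obtained from the OS
	Sys       uint64 // total memory obtained from the OS
	NextGC    uint64 // heap size target for the next GC cycle
}

// snapshot reads the current memory statistics of this process.
func snapshot() memSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return memSnapshot{m.HeapAlloc, m.HeapSys, m.Sys, m.NextGC}
}

func main() {
	// Allocate buffers that are dropped right away, like temporary allocations.
	garbage := make([][]byte, 0, 64)
	for i := 0; i < 64; i++ {
		garbage = append(garbage, make([]byte, 1<<20))
	}
	before := snapshot()

	garbage = nil // drop the only reference so the buffers become garbage
	runtime.GC()  // force a collection to show the effect
	after := snapshot()

	const MiB = 1 << 20
	fmt.Printf("before GC: HeapAlloc=%d MiB Sys=%d MiB\n", before.HeapAlloc/MiB, before.Sys/MiB)
	fmt.Printf("after  GC: HeapAlloc=%d MiB Sys=%d MiB NextGC=%d MiB\n",
		after.HeapAlloc/MiB, after.Sys/MiB, after.NextGC/MiB)
}
```

HeapAlloc typically falls after the collection while Sys stays roughly where it was, which is why tools that report the OS view can show much more than the live heap.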

See:
https://go.dev/doc/gc-guide#GOGC
https://go.dev/doc/gc-guide#Memory_limit

More on GOMEMLIMIT:
https://pkg.go.dev/runtime/debug#SetMemoryLimit

Running lbcd with a command like env GOGC=50 GOMEMLIMIT=16GiB ./lbcd ... would make the garbage collector more aggressive, especially as total memory use approaches the 16GiB limit. The cost is CPU time spent scanning the heap more often, which is usually not a big deal as long as at least one extra CPU core is idle.
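
For reference, the same two knobs can also be set from inside a Go program through runtime/debug (SetMemoryLimit needs Go 1.19 or newer). This is only a sketch of the standard library calls: applyGCSettings and currentMemoryLimit are made-up helper names, and nothing here claims that lbcd sets these values itself.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// applyGCSettings sets the GC percentage and the soft memory limit at runtime,
// mirroring the GOGC and GOMEMLIMIT environment variables.
// It returns the previous values so they can be restored later.
func applyGCSettings(gcPercent int, limitBytes int64) (prevPercent int, prevLimit int64) {
	prevPercent = debug.SetGCPercent(gcPercent)
	prevLimit = debug.SetMemoryLimit(limitBytes)
	return prevPercent, prevLimit
}

// currentMemoryLimit reports the limit without changing it:
// a negative argument to SetMemoryLimit only queries the current value.
func currentMemoryLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	const GiB = 1 << 30
	prevPercent, prevLimit := applyGCSettings(50, 16*GiB)
	fmt.Printf("previous settings: GOGC=%d, limit=%d bytes\n", prevPercent, prevLimit)
	fmt.Printf("current limit: %d GiB\n", currentMemoryLimit()/GiB)
}
```

Note that the memory limit is a soft limit: the runtime works harder to stay under it but can still exceed it if the live heap itself is larger.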

@niglomeister
Author

niglomeister commented Jul 9, 2023

Thank you.
I've tried running it with env GOGC=50 GOMEMLIMIT=16GiB ./lbcd but I get the same result.
I've deleted the .lbcd folder and am now re-syncing the whole blockchain with the latest release of lbcd. When I'm close to height 1,000,000 like I was before, I'll try building the claimtrie again and report my results.

@roylee17
Collaborator

roylee17 commented Jul 10, 2023

FYI: an M2 Max MacBook Pro took about 20 hours to sync from height 0 to 1,388,729 (2023-07-09 11:25:55 -0700 PDT)

2023-07-09 11:21:03.217 [INF] MAIN: RAM: using 8.9 GB with 6.7 available, DISK: using 176.3 GB with 581.5 available
2023-07-09 11:21:23.105 [INF] SYNC: Syncing to block height 1388727 from peer 5.135.140.105:9246
2023-07-09 11:23:00.183 [INF] SYNC: Processed 14 blocks in the last 15m1.95s (2174 transactions, height 1388728, 2023-07-09 11:22:45 -0700 PDT)
2023-07-09 11:23:03.235 [INF] MAIN: RAM: using 8.9 GB with 6.7 available, DISK: using 176.4 GB with 581.5 available
2023-07-09 11:23:43.235 [INF] MAIN: RAM: using 8.9 GB with 6.7 available, DISK: using 176.4 GB with 581.4 available
2023-07-09 11:24:53.106 [INF] SYNC: Syncing to block height 1388728 from peer 149.56.26.199:9246
2023-07-09 11:25:51.329 [INF] SYNC: Processed 1 block in the last 2m51.14s (472 transactions, height 1388729, 2023-07-09 11:25:55 -0700 PDT)

@niglomeister
Author

> FYI: an M2 Max MacBook Pro took about 20 hours to sync from height 0 to 1,388,729 (2023-07-09 11:25:55 -0700 PDT)
>
> 2023-07-09 11:21:03.217 [INF] MAIN: RAM: using 8.9 GB with 6.7 available, DISK: using 176.3 GB with 581.5 available
> 2023-07-09 11:21:23.105 [INF] SYNC: Syncing to block height 1388727 from peer 5.135.140.105:9246
> 2023-07-09 11:23:00.183 [INF] SYNC: Processed 14 blocks in the last 15m1.95s (2174 transactions, height 1388728, 2023-07-09 11:22:45 -0700 PDT)
> 2023-07-09 11:23:03.235 [INF] MAIN: RAM: using 8.9 GB with 6.7 available, DISK: using 176.4 GB with 581.5 available
> 2023-07-09 11:23:43.235 [INF] MAIN: RAM: using 8.9 GB with 6.7 available, DISK: using 176.4 GB with 581.4 available
> 2023-07-09 11:24:53.106 [INF] SYNC: Syncing to block height 1388728 from peer 149.56.26.199:9246
> 2023-07-09 11:25:51.329 [INF] SYNC: Processed 1 block in the last 2m51.14s (472 transactions, height 1388729, 2023-07-09 11:25:55 -0700 PDT)

Just to be clear, my problem happened not while syncing the blockchain but at the "building the full claimtrie in RAM" step when restarting the node after it had been synced.

But it looks like you were right. My database must have been corrupted: I deleted the whole .lbcd folder, started with a fresh install of the latest version, and have now synced to the height I was at before.

When I restart lbcd now, the claimtrie build only takes about 7GB of RAM, like you told me.

Thanks, everyone, for your help.

@niglomeister
Author

Hey. Just to say that the same bug is happening again. When I build the claimtrie, it takes all my 16GB of RAM + 16GB of swap. I'll delete the database again and resync, since that solved it last time, but it would be good to look into what's causing the issue.

@kaichaosun

kaichaosun commented Oct 30, 2023

I can confirm the issue also exists on my Linux server with 16GB of RAM.

2023-10-30 15:35:52.667 [INF] MAIN: RAM: using 10.4 GB with 4.8 available, DISK: using 56.7 GB with 156.8 available
2023-10-30 15:35:56.161 [INF] CHAN: Rebuilding claim trie data to 936681. At: 650684
2023-10-30 15:36:01.176 [INF] CHAN: Rebuilding claim trie data to 936681. At: 666520
2023-10-30 15:36:06.176 [INF] CHAN: Rebuilding claim trie data to 936681. At: 678459
2023-10-30 15:36:11.178 [INF] CHAN: Rebuilding claim trie data to 936681. At: 689073
2023-10-30 15:36:16.179 [INF] CHAN: Rebuilding claim trie data to 936681. At: 692127
2023-10-30 15:36:21.179 [INF] CHAN: Rebuilding claim trie data to 936681. At: 697614
2023-10-30 15:36:26.180 [INF] CHAN: Rebuilding claim trie data to 936681. At: 703273
2023-10-30 15:36:31.181 [INF] CHAN: Rebuilding claim trie data to 936681. At: 707315
2023-10-30 15:36:32.708 [INF] MAIN: RAM: using 13.1 GB with 2.2 available, DISK: using 56.7 GB with 156.8 available
2023-10-30 15:36:36.182 [INF] CHAN: Rebuilding claim trie data to 936681. At: 711832
2023-10-30 15:36:41.182 [INF] CHAN: Rebuilding claim trie data to 936681. At: 715616
2023-10-30 15:36:46.183 [INF] CHAN: Rebuilding claim trie data to 936681. At: 722001
2023-10-30 15:36:51.183 [INF] CHAN: Rebuilding claim trie data to 936681. At: 727028
2023-10-30 15:36:56.187 [INF] CHAN: Rebuilding claim trie data to 936681. At: 734153
2023-10-30 15:37:01.187 [INF] CHAN: Rebuilding claim trie data to 936681. At: 741013
2023-10-30 15:37:06.559 [INF] CHAN: Rebuilding claim trie data to 936681. At: 743350
2023-10-30 15:37:13.408 [INF] CHAN: Rebuilding claim trie data to 936681. At: 743354
2023-10-30 15:37:18.504 [INF] CHAN: Rebuilding claim trie data to 936681. At: 743356
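
One way to quantify the stall visible in the log above is to compute the rebuild rate between consecutive "Rebuilding claim trie data" lines. The sketch below is a standalone helper, not part of lbcd: parseRebuildLine, blocksPerSecond, and the progress type are made-up names, and the regular expression only assumes the line format shown in this comment.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// rebuildRe matches lines such as:
// 2023-10-30 15:35:56.161 [INF] CHAN: Rebuilding claim trie data to 936681. At: 650684
var rebuildRe = regexp.MustCompile(
	`^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[INF\] CHAN: Rebuilding claim trie data to (\d+)\. At: (\d+)$`)

// progress is one sample of the rebuild: when it was logged and the height reached.
type progress struct {
	At     time.Time
	Height int
}

// parseRebuildLine extracts the timestamp and current height from a log line.
// It returns false for lines that are not rebuild progress lines.
func parseRebuildLine(line string) (progress, bool) {
	m := rebuildRe.FindStringSubmatch(line)
	if m == nil {
		return progress{}, false
	}
	t, err := time.Parse("2006-01-02 15:04:05.000", m[1])
	if err != nil {
		return progress{}, false
	}
	h, err := strconv.Atoi(m[3])
	if err != nil {
		return progress{}, false
	}
	return progress{At: t, Height: h}, true
}

// blocksPerSecond returns the rebuild rate between two samples.
func blocksPerSecond(a, b progress) float64 {
	dt := b.At.Sub(a.At).Seconds()
	if dt <= 0 {
		return 0
	}
	return float64(b.Height-a.Height) / dt
}

func main() {
	// A few lines copied from the log above.
	lines := []string{
		"2023-10-30 15:35:56.161 [INF] CHAN: Rebuilding claim trie data to 936681. At: 650684",
		"2023-10-30 15:36:01.176 [INF] CHAN: Rebuilding claim trie data to 936681. At: 666520",
		"2023-10-30 15:37:06.559 [INF] CHAN: Rebuilding claim trie data to 936681. At: 743350",
		"2023-10-30 15:37:13.408 [INF] CHAN: Rebuilding claim trie data to 936681. At: 743354",
	}
	var prev progress
	havePrev := false
	for _, l := range lines {
		p, ok := parseRebuildLine(l)
		if !ok {
			continue
		}
		if havePrev {
			fmt.Printf("height %d: %.1f blocks/s\n", p.Height, blocksPerSecond(prev, p))
		}
		prev, havePrev = p, true
	}
}
```

On these samples the rate falls from thousands of blocks per second to less than one around height 743,350, which matches the point where memory use climbs.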

This is pprof map:
pprof001
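
For anyone who wants to produce a comparable heap profile from a Go process, the standard library's runtime/pprof package can write one. This is a generic sketch, not a description of how the profile above was captured: heapProfileBytes is a made-up helper name and heap.pprof is an arbitrary output file name.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// heapProfileBytes returns a heap profile of the current process
// in the format that "go tool pprof" reads.
func heapProfileBytes() ([]byte, error) {
	runtime.GC() // run a collection first so the profile reflects up-to-date data
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := heapProfileBytes()
	if err != nil {
		fmt.Fprintln(os.Stderr, "heap profile failed:", err)
		os.Exit(1)
	}
	// heap.pprof is an arbitrary file name chosen for this example.
	if err := os.WriteFile("heap.pprof", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote heap.pprof (%d bytes)\n", len(data))
}
```

The resulting file can be inspected with go tool pprof heap.pprof, where commands such as top list the functions holding the most memory.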


4 participants