
LND restarts with over 1GB channel.db on 32bit linux (RaspberryPi 4) #4811

Closed
openoms opened this issue Nov 30, 2020 · 23 comments

Comments

@openoms
Contributor

openoms commented Nov 30, 2020

Background

LND fails when the channel.db reaches a size of 1GB.
Described previously here:
raspiblitz/raspiblitz#1778

This has now happened repeatedly on two nodes, and one of them has no leeway for further compaction.

Your environment

Steps to reproduce

Run lnd and once the channel.db grows over 1GB it restarts repeatedly and reproducibly.
This last node seems to always fail at these log entries:

2020-11-30 17:36:57.455 [TRC] CHDB: Pruning nodes from graph with no open channels
2020-11-30 17:36:57.502 [INF] CHDB: Pruned unconnected node 038b731b984c3e9ef0d1710ffcd2ecdbddbf1d6097e900107d94ead7b7b1fd4956 from channel graph                                                                 
2020-11-30 17:36:57.503 [INF] CHDB: Pruned unconnected node 03ba8210cea60dc7a98a0b6c87a71622d85a664ec203663dcebca09177de7a2563 from channel graph                                                                 
2020-11-30 17:36:57.503 [INF] CHDB: Pruned unconnected node 021b00feca3ea07602cc12a44423735540d8212041864e9cc3c856a6171be88cc7 from channel graph                                                                 
2020-11-30 17:36:57.504 [INF] CHDB: Pruned unconnected node 02151b861f4825eaed7c0e582c46e6af624ae61819eec7ae31f348e3dfa9615d6b from channel graph                                                                 
2020-11-30 17:36:57.504 [INF] CHDB: Pruned unconnected node 021cd9c8d3cb0f934a6e446fc2a7db8e44c5a31b15d181d7d2f07128faf387dd27 from channel graph                                                                 
2020-11-30 17:36:57.504 [INF] CHDB: Pruned 5 unconnected nodes from the channel graph

@Roasbeef has suggested that this might only be the case after 2GB on a 32bit system, but it seems to have come earlier.

Working to migrate the project to a 64bit OS: raspiblitz/raspiblitz#1199
I will certainly do it for these nodes asap.
Do you think that switching the database between the 32 and 64 bit ARM architectures poses further risks?

This can be a significant problem since the 32bit OS is the default recommendation for Raspberry Pis and many projects are based on it, including RaspiBlitz, RaspiBolt, myNode, Umbrel etc. (Nodl has used different SBCs and the aarch64 architecture from the beginning.)

@Roasbeef
Member

Do you have a breakdown of the file sizes in the entire .lnd directory?

@openoms
Contributor Author

openoms commented Nov 30, 2020

Do you have a breakdown of the file sizes in the entire .lnd directory?

Sure:

$ sudo du -ah /mnt/hdd/lnd/
16K	/mnt/hdd/lnd/data/chain/bitcoin/mainnet/channel.backup
4.0K	/mnt/hdd/lnd/data/chain/bitcoin/mainnet/walletkit.macaroon
4.0K	/mnt/hdd/lnd/data/chain/bitcoin/mainnet/chainnotifier.macaroon
4.0K	/mnt/hdd/lnd/data/chain/bitcoin/mainnet/admin.macaroon
4.0K	/mnt/hdd/lnd/data/chain/bitcoin/mainnet/invoices.macaroon
20K	/mnt/hdd/lnd/data/chain/bitcoin/mainnet/macaroons.db
2.0M	/mnt/hdd/lnd/data/chain/bitcoin/mainnet/wallet.db
4.0K	/mnt/hdd/lnd/data/chain/bitcoin/mainnet/signer.macaroon
4.0K	/mnt/hdd/lnd/data/chain/bitcoin/mainnet/invoice.macaroon
4.0K	/mnt/hdd/lnd/data/chain/bitcoin/mainnet/router.macaroon
4.0K	/mnt/hdd/lnd/data/chain/bitcoin/mainnet/readonly.macaroon
2.1M	/mnt/hdd/lnd/data/chain/bitcoin/mainnet
2.1M	/mnt/hdd/lnd/data/chain/bitcoin
2.1M	/mnt/hdd/lnd/data/chain
9.3M	/mnt/hdd/lnd/data/watchtower/bitcoin/mainnet/watchtower.db
9.3M	/mnt/hdd/lnd/data/watchtower/bitcoin/mainnet
9.3M	/mnt/hdd/lnd/data/watchtower/bitcoin
4.0K	/mnt/hdd/lnd/data/watchtower/v3_onion_private_key
9.3M	/mnt/hdd/lnd/data/watchtower
21M	/mnt/hdd/lnd/data/graph/mainnet/sphinxreplay.db
1008M	/mnt/hdd/lnd/data/graph/mainnet/channel.db
33M	/mnt/hdd/lnd/data/graph/mainnet/wtclient.db
1.0G	/mnt/hdd/lnd/data/graph/mainnet/uncompacted.db
2.1G	/mnt/hdd/lnd/data/graph/mainnet
2.1G	/mnt/hdd/lnd/data/graph
2.1G	/mnt/hdd/lnd/data
4.0K	/mnt/hdd/lnd/v3_onion_private_key
4.0K	/mnt/hdd/lnd/lnd.conf
3.4M	/mnt/hdd/lnd/logs/bitcoin/mainnet/lnd.log
2.0M	/mnt/hdd/lnd/logs/bitcoin/mainnet/lnd.log.21554.gz
2.8M	/mnt/hdd/lnd/logs/bitcoin/mainnet/lnd.log.21553.gz
2.0M	/mnt/hdd/lnd/logs/bitcoin/mainnet/lnd.log.21552.gz
11M	/mnt/hdd/lnd/logs/bitcoin/mainnet
11M	/mnt/hdd/lnd/logs/bitcoin
11M	/mnt/hdd/lnd/logs
4.0K	/mnt/hdd/lnd/tls.key
4.0K	/mnt/hdd/lnd/lnd.conf.save
4.0K	/mnt/hdd/lnd/tls.cert
2.1G	/mnt/hdd/lnd/

Actually I left the backup of the channel.db (uncompacted.db) in there too.
Now moved it out (just 5 mins ago - so no restarts since). Could that be part of the problem?

@guggero
Collaborator

guggero commented Nov 30, 2020

Having the uncompacted.db file in there should have no effect at all. Probably just coincidence that it's running longer now.

As to your question about migrating the channel DB from 32bit to 64bit: this is something that I'd like to have a definitive answer to as well. In the past I've always recommended not migrating between operating systems and/or architectures.
But digging a bit into bbolt, it seems there is at least the intention of being cross-OS/cross-arch compatible. At least according to this issue there only seem to be differences between Windows and the rest, and only for large files. Maybe you could try running the sample code provided there, compiled both as a 32bit and a 64bit binary?

For the affected nodes, I'd probably try compiling the bbolt binary as 64bit and running compaction on the 32bit channel.db (and the other DBs too). If that doesn't result in errors, and neither does bbolt check, I'd then dare to run everything with a 64bit lnd binary.
But perhaps try this with a larger testnet node first? And once you've started the "migrated" channel.db you shouldn't just go back and run the old one, to make sure a stray fee update doesn't risk you publishing an old state when going back to the old DB.
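A rough sketch of that workflow, assuming the standalone bbolt CLI from go.etcd.io/bbolt (the exact module path and flags may differ with the version you install), run on a 64bit machine and always against a copy of the database:

$ go install go.etcd.io/bbolt/cmd/bbolt@latest
$ bbolt compact -o /tmp/channel-compacted.db /path/to/channel.db
$ bbolt check /tmp/channel-compacted.db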

@openoms
Contributor Author

openoms commented Dec 1, 2020

LND failed again today (after I was able to gain some megabytes with compaction the last time) and now even chantools fails:

$ chantools compactdb --sourcedb /mnt/hdd/lnd/data/graph/mainnet/channel.db                 --destdb /mnt/hdd/lnd/data/graph/mainnet/compacted.db
2020-12-01 13:17:03.718 [INF] CHAN: chantools version v0.5.1 commit v0.5.1
unexpected fault address 0x22aa0044
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x22aa0044 pc=0x2eb55c]

goroutine 1 [running]:
runtime.throw(0x67fbcc, 0x5)
	/usr/local/go/src/runtime/panic.go:774 +0x5c fp=0x1c3da28 sp=0x1c3da14 pc=0x3ecec
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:401 +0x310 fp=0x1c3da40 sp=0x1c3da28 pc=0x54010
github.com/coreos/bbolt.(*DB).meta(0x1d7c140, 0x376)
	/home/bitcoin/go/pkg/mod/github.com/coreos/bbolt@v1.3.3/db.go:901 +0x1c fp=0x1c3da5c sp=0x1c3da44 pc=0x2eb55c
github.com/coreos/bbolt.(*DB).hasSyncedFreelist(...)
	/home/bitcoin/go/pkg/mod/github.com/coreos/bbolt@v1.3.3/db.go:323
github.com/coreos/bbolt.(*Tx).rollback(0x1eb2580)
	/home/bitcoin/go/pkg/mod/github.com/coreos/bbolt@v1.3.3/tx.go:279 +0x68 fp=0x1c3da74 sp=0x1c3da5c pc=0x2f493c
github.com/coreos/bbolt.(*Tx).Commit(0x1eb2580, 0x2ac7b21a, 0x8)
	/home/bitcoin/go/pkg/mod/github.com/coreos/bbolt@v1.3.3/tx.go:161 +0x430 fp=0x1c3db14 sp=0x1c3da74 pc=0x2f4374
main.(*compactDBCommand).compact.func2(0x20496a0, 0x1, 0x1, 0x2ac7b374, 0x8, 0x8, 0x2ac7b37c, 0x152, 0x152, 0x0, ...)
	/home/bitcoin/chantools/cmd/chantools/compactdb.go:77 +0x318 fp=0x1c3db58 sp=0x1c3db14 pc=0x559428
main.(*compactDBCommand).walkBucket(0x1c0bd80, 0x206fda0, 0x20496a0, 0x1, 0x1, 0x2ac7b374, 0x8, 0x8, 0x2ac7b37c, 0x152, ...)
	/home/bitcoin/chantools/cmd/chantools/compactdb.go:161 +0x74 fp=0x1c3dbac sp=0x1c3db58 pc=0x54a8b0
main.(*compactDBCommand).walkBucket.func1(0x2ac7b374, 0x8, 0x8, 0x2ac7b37c, 0x152, 0x152, 0x152, 0x0)
	/home/bitcoin/chantools/cmd/chantools/compactdb.go:186 +0x264 fp=0x1c3dc04 sp=0x1c3dbac pc=0x559980
github.com/coreos/bbolt.(*Bucket).ForEach(0x206fda0, 0x1c3dc70, 0x0, 0x0)
	/home/bitcoin/go/pkg/mod/github.com/coreos/bbolt@v1.3.3/bucket.go:388 +0xf8 fp=0x1c3dc38 sp=0x1c3dc04 pc=0x2e5868
main.(*compactDBCommand).walkBucket(0x1c0bd80, 0x206fda0, 0x0, 0x0, 0x0, 0x66c98485, 0x1b, 0x1b, 0x0, 0x0, ...)
	/home/bitcoin/chantools/cmd/chantools/compactdb.go:172 +0x124 fp=0x1c3dc8c sp=0x1c3dc38 pc=0x54a960
main.(*compactDBCommand).walk.func1.1(0x66c98485, 0x1b, 0x1b, 0x206fda0, 0x206fda0, 0x0)
	/home/bitcoin/chantools/cmd/chantools/compactdb.go:151 +0x13c fp=0x1c3dcd4 sp=0x1c3dc8c pc=0x559664
github.com/coreos/bbolt.(*Tx).ForEach.func1(0x66c98485, 0x1b, 0x1b, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/bitcoin/go/pkg/mod/github.com/coreos/bbolt@v1.3.3/tx.go:129 +0x70 fp=0x1c3dcf4 sp=0x1c3dcd4 pc=0x2f77d4
github.com/coreos/bbolt.(*Bucket).ForEach(0x1c8c18c, 0x1c3dd3c, 0x1d6e960, 0x0)
	/home/bitcoin/go/pkg/mod/github.com/coreos/bbolt@v1.3.3/bucket.go:388 +0xf8 fp=0x1c3dd28 sp=0x1c3dcf4 pc=0x2e5868
github.com/coreos/bbolt.(*Tx).ForEach(0x1c8c180, 0x1c63d5c, 0x2eabb4, 0x1d7c000)
	/home/bitcoin/go/pkg/mod/github.com/coreos/bbolt@v1.3.3/tx.go:128 +0x58 fp=0x1c3dd48 sp=0x1c3dd28 pc=0x2f3e38
main.(*compactDBCommand).walk.func1(0x1c8c180, 0x1d7c200, 0x1c8c180)
	/home/bitcoin/chantools/cmd/chantools/compactdb.go:145 +0x54 fp=0x1c3dd68 sp=0x1c3dd48 pc=0x5596f4
github.com/coreos/bbolt.(*DB).View(0x1d7c000, 0x1c63dc0, 0x0, 0x0)
	/home/bitcoin/go/pkg/mod/github.com/coreos/bbolt@v1.3.3/db.go:725 +0x90 fp=0x1c3ddac sp=0x1c3dd68 pc=0x2eac10
main.(*compactDBCommand).walk(0x1c0bd80, 0x1d7c000, 0x1c63e18, 0x0, 0x0)
	/home/bitcoin/chantools/cmd/chantools/compactdb.go:144 +0x54 fp=0x1c3ddcc sp=0x1c3ddac pc=0x54a814
main.(*compactDBCommand).compact(0x1c0bd80, 0x1d7c140, 0x1d7c000, 0x0, 0x0)
	/home/bitcoin/chantools/cmd/chantools/compactdb.go:72 +0xe4 fp=0x1c3de30 sp=0x1c3ddcc pc=0x54a73c
main.(*compactDBCommand).Execute(0x1c0bd80, 0x1c6c0c0, 0x0, 0x5, 0x1c0bd80, 0x1)
	/home/bitcoin/chantools/cmd/chantools/compactdb.go:39 +0x1bc fp=0x1c3de5c sp=0x1c3de30 pc=0x54a488
github.com/jessevdk/go-flags.(*Parser).ParseArgs(0x1c163c0, 0x1c16038, 0x5, 0x5, 0xae, 0x0, 0x0, 0x5ec4a0, 0x1c88220)
	/home/bitcoin/go/pkg/mod/github.com/jessevdk/go-flags@v1.4.0/parser.go:316 +0x664 fp=0x1c3df34 sp=0x1c3de5c pc=0x53bee8
github.com/jessevdk/go-flags.(*Parser).Parse(...)
	/home/bitcoin/go/pkg/mod/github.com/jessevdk/go-flags@v1.4.0/parser.go:186
main.runCommandParser(0x1, 0x14e0c)
	/home/bitcoin/chantools/cmd/chantools/main.go:160 +0x798 fp=0x1c3df70 sp=0x1c3df34 pc=0x54f240
main.main()
	/home/bitcoin/chantools/cmd/chantools/main.go:53 +0x14 fp=0x1c3dfa4 sp=0x1c3df70 pc=0x54e9f4
runtime.main()
	/usr/local/go/src/runtime/proc.go:203 +0x208 fp=0x1c3dfe4 sp=0x1c3dfa4 pc=0x40c6c
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm.s:868 +0x4 fp=0x1c3dfe4 sp=0x1c3dfe4 pc=0x6a734

goroutine 7 [select]:
io.(*pipe).Read(0x1c16390, 0x1d62000, 0x1000, 0x1000, 0xc50039, 0xea24c, 0x1)
	/usr/local/go/src/io/pipe.go:50 +0xac
io.(*PipeReader).Read(0x1c7caf0, 0x1d62000, 0x1000, 0x1000, 0x4b, 0x4, 0x0)
	/usr/local/go/src/io/pipe.go:127 +0x38
bufio.(*Reader).fill(0x2059f84)
	/usr/local/go/src/bufio/bufio.go:100 +0x108
bufio.(*Reader).ReadSlice(0x2059f84, 0xa, 0x1, 0x0, 0x0, 0x0, 0x0)
	/usr/local/go/src/bufio/bufio.go:359 +0x2c
bufio.(*Reader).ReadLine(0x2059f84, 0xc50039, 0x1, 0x1, 0x1, 0x0, 0x0)
	/usr/local/go/src/bufio/bufio.go:388 +0x24
github.com/jrick/logrotate/rotator.(*Rotator).Run(0x1c16360, 0x89bea0, 0x1c7caf0, 0x0, 0x0)
	/home/bitcoin/go/pkg/mod/github.com/jrick/logrotate@v1.0.0/rotator/rotator.go:100 +0x90
github.com/lightningnetwork/lnd/build.(*RotatingLogWriter).InitLogRotator.func1(0x2080ee0, 0x1c7caf0)
	/home/bitcoin/go/pkg/mod/github.com/guggero/lnd@v0.9.0-beta-rc4.0.20200826102054-8c9171307182/build/logrotator.go:80 +0x30
created by github.com/lightningnetwork/lnd/build.(*RotatingLogWriter).InitLogRotator
	/home/bitcoin/go/pkg/mod/github.com/guggero/lnd@v0.9.0-beta-rc4.0.20200826102054-8c9171307182/build/logrotator.go:79 +0x288

I have the 64bit ARM node ready, already restored a (small) database from a 32bit node.
Will try to run the compaction there first.

@openoms
Contributor Author

openoms commented Dec 1, 2020

Went on to migrate the 1GB channel.db to a 64bit ARM system. Looking good so far.
All channels and peers are online and can't see any errors in the LND.log.
Will give it some time now.

@guggero
Collaborator

guggero commented Dec 1, 2020

[signal SIGSEGV: segmentation violation code=0x1 addr=0x22aa0044 pc=0x2eb55c]

Oh oh, that sounds like an actual data corruption issue... Do you get the same error if you start that DB with lnd again?

@openoms
Contributor Author

openoms commented Dec 1, 2020

Oh oh, that sounds like an actual data corruption issue... Do you get the same error if you start that DB with lnd again?

I could not start the 1GB+ channel.db on the 32bit system at all (LND restarted immediately), so I went on to migrate it to the aarch64 system and it has been running there since without errors.

Could the chantools error be the same problem with bbolt and the size of the database?

@guggero
Collaborator

guggero commented Dec 1, 2020

Ah, okay. Yeah, it could be. What build/binary version of chantools did you run the command with? If there's no error when using a 64bit ARM chantools, that's probably the problem.

@openoms
Contributor Author

openoms commented Dec 1, 2020

I built chantools from the source at v0.5.1 on the 32bit system specifying the environment:
CGO_ENABLED=0 GOOS=linux GOARCH=arm GOARM=7 make install
This has worked well before, but repeated compactions could shrink the database less and less. On the last run I could only gain a few megabytes and it grew back within a couple of days.

Testing chantools compactdb from the latest source on aarch64 resulted in no errors and the database went from 1062508 to 1038376 bytes. LND started again without issues.

It seems that using a 64bit system does solve this issue, but it can be a recurring question since many projects are based on the default 32bit Raspbian.
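For reference, building chantools for 64bit ARM should presumably only need the architecture flags swapped (GOARM applies only to 32bit ARM, so it is dropped); something like:

CGO_ENABLED=0 GOOS=linux GOARCH=arm64 make install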

@guggero
Collaborator

guggero commented Dec 1, 2020

Good to know that a 32bit binary can result in a segfault for large DBs and that updating to 64bit helps.

But that doesn't solve the problem for existing users, I agree. We added the gc-canceled-invoices-on-startup flag that does some garbage collection. It's only on master now but will be in the next version. If you run with that set to true, then shutdown and compact again, do you see a big reduction in the size?

@openoms
Contributor Author

openoms commented Dec 2, 2020

Just to confirm, to test the garbage collection one should:

  • stop and run chantools compactdb
  • check channel.db size
  • build LND from source at the latest master (before lnd 0.12)
  • start the daemon with lnd --gc-canceled-invoices-on-startup (how long to wait for or look for some message in the lnd.log?)
  • check channel.db size
  • stop and run chantools compactdb
  • check the compacted.db size

Will the garbage collection and/or the compaction feature be on by default (happen on restarts) in lnd v0.12, or will it remain an option to be run manually?

@guggero
Collaborator

guggero commented Dec 2, 2020

Yes, that's how I would try it as well. Though this might not do much if most data isn't from canceled invoices. It really depends on the usage of the node, but for now that is the only automatic garbage collection we've implemented (AFAIK).
You should see a message in the logs mentioning the garbage collection. I don't recall the exact message.

Just to be clear: If you do this, you won't be able to downgrade the node to lnd v0.11.1-beta because of the database migrations it contains.

There is a new flag to run auto compaction on startup: --db.bolt.auto-compact
You might also want to set --db.bolt.auto-compact-min-age=0 to enable compaction on every restart and not just every week (which is the default). If you use that, you can skip the chantools compactdb calls and instead just restart lnd to achieve the same effect.
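For example, with the flags mentioned above, the combination could be started like this (whether you keep these on the command line or put the equivalent lines into lnd.conf is up to you):

$ lnd --gc-canceled-invoices-on-startup --db.bolt.auto-compact --db.bolt.auto-compact-min-age=0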

@openoms
Contributor Author

openoms commented Dec 2, 2020

Just had a report from another active node (not under my control) experiencing the same immediate restarts with LND:

admin@raspberrypi:~ $ sudo du -h /mnt/hdd/lnd/data/graph/mainnet/channel.db
1.0G    /mnt/hdd/lnd/data/graph/mainnet/channel.db

and then with chantools (compiled for 32bit as described here: https://github.com/openoms/lightning-node-management/blob/master/LNDdatabaseCompaction.md):

bitcoin@raspberrypi:~/chantools$ chantools compactdb --sourcedb /mnt/hdd/lnd/data/graph/mainnet/channel.db \
>                 --destdb /mnt/hdd/lnd/data/graph/mainnet/compacted.db
2020-12-02 18:05:45.502 [INF] CHAN: chantools version v0.6.0 commit v0.6.0-1-gf82d78d
unexpected fault address 0x22a95044
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x22a95044 pc=0x2eb55c]

He did not run chantools compactdb before.

@guggero
Collaborator

guggero commented Dec 2, 2020

That means the DB is too large to be opened by any 32bit process. If the user doesn't run a 64bit OS, they could copy it to a 64bit ARM device and run compaction there, with chantools compiled for 64bit. If compaction was never run, the benefit might be large enough to postpone the problem again (until RaspiBlitz updates to 64bit in general).

@openoms
Contributor Author

openoms commented Dec 2, 2020

Yes, it sounds clear that there is no solution to this on 32bit ARM and I have just been lucky to be able to run compaction when the issue presented itself the first time.

As he has no other ARM device, I advised him to try the already functional 64bit RaspiBlitz version: raspiblitz/raspiblitz#1199 (comment). PR: raspiblitz/raspiblitz#1833

Thank you for the explanation about the upcoming features of LND:
--gc-canceled-invoices-on-startup and --db.bolt.auto-compact will help to delay this issue for most users and hopefully give everyone enough time to update to 64bit pre-emptively.

@openoms
Contributor Author

openoms commented Dec 9, 2020

Another node is reaching a 1GB channel.db on 32bit ARM; just documenting the error message. Nothing is left to do but migrate to 64bit.

[INF] LTND: Version: 0.11.1-beta commit=v0.11.1-beta, build=production, logging=default
[INF] LTND: Active chain: Bitcoin (network=mainnet)
[INF] LTND: Opening the main database, this might take a few minutes...
[INF] LTND: Opening bbolt database, sync_freelist=true
[INF] CHDB: Checking for schema update: latest_version=17, db_version=17
[INF] LTND: Database now open (time_to_open=11.810386ms)!
[INF] RPCS: password gRPC proxy started at 0.0.0.0:8080
[INF] RPCS: password RPC server listening on 0.0.0.0:10009
[INF] LTND: Waiting for wallet encryption password. Use `lncli create` to create a wallet, `lncli unlock` to unlock an existing wallet, or `lncli changepassword` to change the password of an existing wallet and unlock it.
[ERR] LNWL: Failed to open database: cannot allocate memory

$ sudo du  /mnt/hdd/lnd/data/graph/mainnet/channel.db
1048544	/mnt/hdd/lnd/data/graph/mainnet/channel.db

$ lncli unlock
Input wallet password: 
[lncli] rpc error: code = Unknown desc = cannot allocate memory

@openoms
Contributor Author

openoms commented Apr 5, 2021

Update: as there is no other solution proposed here, I have been testing the 64bit Raspberry Pi OS with the RaspiBlitz since opening the issue:
raspiblitz/raspiblitz#1199

The next (v1.7) SDcard release will be based on the 64bit base image and people can already download the RC1 from the dev branch: https://github.com/rootzoll/raspiblitz/tree/dev#downloading-the-software

If a 32bit system has this problem, starting with the 64bit image is tested to solve the issue and LND will start again.

@Roasbeef
Member

Closing this as the issue is with the 32-bit systems.

@renepickhardt

Hey @Roasbeef, sorry but I have to double check on this: are you absolutely certain that you want to keep this closed and unfixed, which would effectively mean that lnd does not support 32 bit systems in the future (with all the (un)intended consequences that may come with such a decision)?

@rkfg

rkfg commented Aug 30, 2021

Two remarks:

  • for me, moving the lnd data posed no issues: I migrated everything (bitcoind, electrumx and lnd data) from an amd64 machine to armv7 (32 bit) and it all just worked; now I need to update the base system on my RPi to aarch64 before channel.db grows big enough... but there's still plenty of time. So I suppose there will be no issues if anyone decides to migrate; just make sure not to use the old data anywhere or nasty things will happen.
  • this size limitation is quite weird indeed; I'd expect 4GB as it's the uint32 limit (or 2GB for int32), but 1GB is too little. It would be interesting to find the reason for these crashes. I only found this vague answer: Max database size boltdb/bolt#535

I suppose it should either be fixed or documented on the main page that lnd explicitly doesn't support 32 bit systems, that it's okay for it to crash as soon as the database reaches 1GB (which isn't that much for a busy node), and that you're left to pick up the pieces. Yes, the mainstream distros like Umbrel and Raspiblitz migrated to aarch64, but the very popular Raspbian (that I use) is still 32 bit by default and the 64 bit version is so unofficial/beta that I had to google it; there's no mention of it on the download page. I didn't even know it existed at all!

@Roasbeef
Member

Roasbeef commented Sep 2, 2021

@renepickhardt there's nothing to resolve on our end, which is why this issue is closed. This is a matter of the architecture that lnd is running on, as well as the kernel-related settings and how those interact with the default database.

The reason why users run into this on 32-bit systems is entirely related to their kernel settings. bbolt uses memory mapping by default to map the entire database into the virtual memory address space. Ignoring everything else, this means that the database size can grow up to 4 GB (not all physically mapped, but simply addressable). Once the DB size gets over 1 GB, bolt attempts to re-size the memory map, doubling it to 2 GB or so. What happens next depends on the memory split between user space and the kernel. If the kernel is set to (as an example) occupy 3 GB of the address space, with 1 GB for user space, then this operation will fail (example for 4 GB of RAM; some Pis have less).

In terms of the kernel, if users only want to run 32-bit systems, then I believe activating the PAE extensions for the "high memory" kernel operating mode can help here. The issue for users that pack everything onto a single Pi (certainly not advised if you want reliability, but again you get what you pay for with a Raspberry Pi) is that other processes (bitcoind, etc.) are also competing for the address space. Depending on swap settings and addressable physical RAM, not everything may be able to fit.
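One way to check how a 32bit kernel is configured in this respect (assuming the kernel exposes its build config; the exact location varies by distro) is to look at the VMSPLIT and HIGHMEM options:

$ zcat /proc/config.gz | grep -E 'VMSPLIT|HIGHMEM'
$ grep -E 'VMSPLIT|HIGHMEM' /boot/config-$(uname -r)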

I wager that only very old or very large nodes running on Pis etc. (that haven't examined their kernel settings at all) need to deal with this. If a node is that large, it would behoove the user to switch to more reliable hardware, as Pis notoriously can run into hardware failure issues.

In practice, if a user ever hits this, it's likely because they have never compacted their database. bolt doesn't actually reclaim the space when things are deleted; instead it keeps it all around on disk, putting the free pages in a free list for new DB operations. Users that run into this can usually just compact their database and the issue goes away. The latest versions of lnd (master, to be made into 0.14) are also better about deleting state they no longer need, which results in smaller database sizes.

Beyond that, lnd 0.14 will ship with a newly optimized etcd backend, as well as initial support for postgres. For users seeking to operate a more reliable setup, both of those options are certainly better than potentially storing all the data on an SD card.

@Roasbeef
Member

Roasbeef commented Sep 2, 2021

To provide a bit more detail as to why the operation can fail (let's assume 4 GB of RAM and that the DB is 2 GB at this point), see this method: https://github.com/etcd-io/bbolt/blob/master/node.go#L517

What happens is that bolt needs to copy the entire database into heap memory temporarily to ensure that once it unmaps and then re-maps the database, the inodes, etc. aren't pointing to stale memory. I think this is the error most people are running into: when the DB needs to double in size initially (and at any 1 GB increment beyond that), there simply isn't enough addressable physical memory.

@Roasbeef
Member

Roasbeef commented Sep 2, 2021

One attempt to mitigate this somewhat in the past was trying to set an initial memory map of the largest DB a 32-bit system can handle: btcsuite/btcwallet#697

This would mean that the DB never needs to be copied over as it would never need to be remapped. IIRC we fell short on testing that, and also generally concluded that in 2021 the effort to make things slightly safer for 32-bit systems may not have been worth it. Since then, IIRC, we entertained removing the compiled 32-bit binaries, but then decided that for smaller nodes (as they should be on a Pi) regular compaction resolves the issue in practice.
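For illustration, pre-sizing the map with bbolt looks roughly like this; a sketch only, where the 1 GiB value and the file path are assumptions rather than what btcsuite/btcwallet#697 actually uses:

package main

import (
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Reserve a large memory map up front so bbolt never has to
	// unmap/remap (and temporarily copy) the file as it grows.
	opts := &bolt.Options{
		InitialMmapSize: 1 << 30, // 1 GiB of address space (assumed value)
	}
	db, err := bolt.Open("channel.db", 0600, opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}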
