Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow bitcoind startup #760

Closed
chemicstry opened this issue Feb 13, 2018 · 27 comments
Closed

Slow bitcoind startup #760

chemicstry opened this issue Feb 13, 2018 · 27 comments
Assignees
Labels
backend Related to the node backend software/interface (e.g. btcd, bitcoin-core) resource usage Software resource usage improvements/issues

Comments

@chemicstry
Copy link

LND is starting extremely slowly with bitcoind backend doing rescans.

2018-02-13 18:55:18.804 [INF] NTFN: New block epoch subscription
2018-02-13 18:55:18.804 [INF] NTFN: New block epoch subscription
2018-02-13 18:55:18.804 [INF] NTFN: New block epoch subscription
2018-02-13 18:55:18.804 [INF] NTFN: New block epoch subscription
2018-02-13 18:55:18.819 [INF] NTFN: New spend subscription: utxo=28b94cf8a86b3d0060a9890bf8bcfa9d0dc3b6e405833ae5e70837ac776212ee:1
2018-02-13 18:55:18.846 [INF] NTFN: New spend subscription: utxo=0e687f8f0a251399eea30ba0c08efc4b8beef38cac86f48ae08c3932ddd2a969:0
2018-02-13 18:55:18.948 [INF] NTFN: New spend subscription: utxo=3d3a25393eec52cb153cc12bae9dc0aa77323606c38f39742fa820e2dd4afe59:1
2018-02-13 18:55:19.048 [INF] NTFN: New spend subscription: utxo=e559f4c7225310d6f4a4f8ce450009a96e1b070a5877910bcffc8b132938f553:1
2018-02-13 18:55:19.153 [INF] LNWL: Starting rescan from block 00000000000000000010c18be77f4c3237bf7fa77805bf689396c7fe5453d476
2018-02-13 18:56:35.605 [INF] LNWL: Rescan finished at 508505 (00000000000000000010c18be77f4c3237bf7fa77805bf689396c7fe5453d476)
2018-02-13 18:56:35.605 [INF] LNWL: Starting rescan from block 0000000000000000000327a1d30680756ef1111de04532647223e2432a198a30
2018-02-13 19:03:07.445 [INF] LNWL: Rescan finished at 506390 (0000000000000000000327a1d30680756ef1111de04532647223e2432a198a30)
2018-02-13 19:03:07.445 [INF] LNWL: Starting rescan from block 0000000000000000001cf7d8fd084aaa08fd5ff6f6feef379232622553b4633b
2018-02-13 19:03:07.445 [INF] NTFN: New spend subscription: utxo=253d41023ff55c99804a246be32ac02be91a935408f4f755bcb2ba986457a15a:0
2018-02-13 19:08:57.196 [INF] LNWL: Rescan finished at 506012 (0000000000000000001cf7d8fd084aaa08fd5ff6f6feef379232622553b4633b)
2018-02-13 19:08:57.196 [INF] NTFN: New spend subscription: utxo=aff36bb2f8fe08e9ceee8080d2d0e777a6ae766df83413b3e653f1ce8801c802:1
2018-02-13 19:08:57.196 [INF] LNWL: Starting rescan from block 000000000000000000025b581ba7e6bbd5219b09243a8a5911f6f8a2c75eb285
2018-02-13 19:14:23.570 [INF] LNWL: Rescan finished at 506206 (000000000000000000025b581ba7e6bbd5219b09243a8a5911f6f8a2c75eb285)
2018-02-13 19:14:23.571 [INF] NTFN: New spend subscription: utxo=f6e2e8af276823f5b638d4235ef6dadd86c3b6072559ec55ed2cbfa76cf747b4:1
2018-02-13 19:14:23.571 [INF] LNWL: Starting rescan from block 00000000000000000005c6c1b12537172b3d81e689c13affc72699d3eefde603
2018-02-13 19:14:23.660 [ERR] CNCT: unable to register for spend: Received blocks out of order: current height=506012, disconnected height=506517
2018-02-13 19:19:21.619 [INF] LNWL: Rescan finished at 506392 (00000000000000000005c6c1b12537172b3d81e689c13affc72699d3eefde603)
2018-02-13 19:19:21.620 [INF] LNWL: Starting rescan from block 0000000000000000005dee5fc1bccec86695871837444af26a08c435e254c8d1
2018-02-13 19:23:04.476 [INF] LNWL: Rescan finished at 506517 (0000000000000000005dee5fc1bccec86695871837444af26a08c435e254c8d1)

As you can see in the log, it took 30 minutes and after that it just idles. Restarting lnd does rescanning again but with different subscriptions/blocks. After a numerous restarts it finally starts up and operates normally.

During rescanning LND is loading single cpu core to 100%, bitcoind sits at about 20%. I thought this is due to slow HDD, but iowait is just 1-3%.

Full log of multiple starts: https://pastebin.com/UeC6fCzH

I have heard reports that switching from bitcoind to btcd makes it start instantly, which makes me believe that this issue is exclusive to bitcoind. I have not personally tested with btcd yet (doing it now).

LND version: 7abdd30

@Roasbeef
Copy link
Member

The issue is that bitcoind doesn't have a proper rescan API like btcd does. With btcd, we issue the call and get any relevant notifications of spends, etc. While with bitcoind we're forced to manually fetch each block and scan it in memory ourselves. Once bitcoind implements BIP 158, we'll be able to use those filters to skip actually fetching most of those blocks.

@Roasbeef
Copy link
Member

It shouldn't be that slow though, it appears that there's something else going on there. Can you try a restart, but logging bumped up a level or two? Then make another paste with the results, thanks.

@chemicstry
Copy link
Author

I see. Thanks for explanation.

I temporarily started lnd on neutrino just to sync and now it starts on bitcoind without rescans. I get these rescans when I do not restart lnd for longer periods. I will do debug log once they reappear.

@meshcollider meshcollider added resource usage Software resource usage improvements/issues backend Related to the node backend software/interface (e.g. btcd, bitcoin-core) labels Feb 14, 2018
@chemicstry
Copy link
Author

chemicstry commented Feb 21, 2018

Here is a trace log of a rescan: https://pastebin.com/VmTUjHUN
This one was quick because there wasn't that much to rescan.

But I think 10 blocks/sec is quite slow and it is the issue here.
CPU usage was roughly the same (LND 100%, bitcoind 15%, iowait ~2% on i7-4770).

I have no idea when BIP 158 is coming, but if its soon, then there is no point wasting time on fixing this the old way.

@Roasbeef
Copy link
Member

Roasbeef commented Feb 21, 2018

Where do you see 10 seconds a block?

2018-02-21 15:19:43.048 [DBG] LNWL: Filtering block 509420 (0000000000000000004489ac5522cc40dca9db8f54f72ec8a5e1bd5e9696d2aa) with 479 transactions
2018-02-21 15:19:43.144 [DBG] LNWL: Filtering block 509421 (000000000000000000017baaf4072b9ed6554ea43b75d11442b8c4e9006ad795) with 2377 transactions

Looks to actually be a few ms?

It's just due to the fact that bitcoind doesn't provide hooks for application developers in the same manner that btcd does.

From that log, the whole start up took ~2 minutes, which isn't too bad considering that it had to scan a few hundred blocks.

@Roasbeef
Copy link
Member

I have no idea when BIP 158 is coming, but if its soon, then there is no point wasting time on fixing this the old way.

There's a PR up on the bitcoind repo to add the initial indexing for the filters, the next step would be to expose it on the rpc interface. Optimistically, it may land in the next release. But we'll see...

@acurious
Copy link

acurious commented Apr 3, 2018

Seems like I have similar issue: raspibolt/raspibolt#42
I'm a beginner witth 0.2 btc stuck due to that. I wonder should I try to install and run btcd on my Raspberry Pi to be able to close my channels now? Or is there better option?

@Roasbeef
Copy link
Member

Roasbeef commented Apr 3, 2018

@acurious BTC stuck due to the slow start up time? You should only see this when creating new lnd instance backed by bitcoind for the first time. Subsequent time should be pretty snappy unless you've been offline for like several months.

@acurious
Copy link

acurious commented Apr 3, 2018

It used to start up normally. My lightning node does not start up anymore from yesterday. Keeps starting rescans and hangs. My bitcoind is synced with the Bitcoin network. I have restated the whole node several times. I let the node run for 6 hours without any progress.

Edit: some logs: https://pastebin.com/nyvG7xa4

@chemicstry
Copy link
Author

My node does deep rescanning after long runs without restart, even though it was running normally up to that point. Maybe it loses some state, I don't know.

Other problem is that rescans occasionally stop. It runs for ~15mins on 100% cpu, then drops to 0% and does nothing. Logs end with [INF] LNWL: Rescan finished at (...). Restarting multiple times and letting it run eventually complete the resync process and it starts up normally. I think this is the exact behaviour @acurious is observing.

Starting with neutrino or btcd backend makes resync much faster. I can start with bitcoind right away after syncing with neutrino.

@acurious
Copy link

acurious commented Apr 4, 2018

My node does deep rescannning for ~15mins on 100% cpu, then drops to 1% cpu and does nothing. Logs end with [INF] LNWL: Rescan finished at (...). Restarting multiple times and letting it run does not complete the resync process and the node never starts up. As I understand, neutrino is not an option for LND in mainnet yet. I do not have btcd backend available at the moment. Any ideas how to proceed?

@acurious
Copy link

acurious commented Apr 4, 2018

Update. I set debug level to trace. Last lines from log before hanging are:

Apr 04 17:34:30 raspberrypi lnd[2169]: 2018-04-04 17:34:30.588 [INF] RPCS: [debuglevel] changing debug level to: trace
Apr 04 17:54:41 raspberrypi lnd[2169]: 2018-04-04 17:54:41.660 [INF] LNWL: Rescan finished at 516600 (000000000000000000113417d5a37e463f27a538ff0f9c92f75976fe94569952)
Apr 04 17:54:41 raspberrypi lnd[2169]: 2018-04-04 17:54:41.679 [INF] LNWL: Starting rescan from block 0000000000000000000253eb008c988272de462e74a2e18da24d136ba117562d
Apr 04 17:54:41 raspberrypi lnd[2169]: 2018-04-04 17:54:41.688 [INF] NTFN: New spend subscription: utxo=a85bbfcc66595ad3cf07a0c6816a2d97c73deadb545a5e8db4b8d6e65458a38f:1
Apr 04 17:54:41 raspberrypi lnd[2169]: 2018-04-04 17:54:41.729 [INF] CNCT: Close observer for ChannelPoint(ffbd58efb475292c43ae4721d6ef5c7479f07f95a3bbd0a59cfd65a918e298da:1) active
Apr 04 17:54:41 raspberrypi lnd[2169]: 2018-04-04 17:54:41.729 [DBG] CNCT: Starting chain watcher for ChannelPoint(44bb2bec5242e1b8590832af24fdbc97d0a4034faf58c10777a11ee4cb74daac:0)
Apr 04 18:05:21 raspberrypi lnd[2169]: 2018-04-04 18:05:21.727 [INF] LNWL: Rescan finished at 516601 (00000000000000000018f2563d7c4e7d6a7f265bd103cfdb431e8e7a13086a8c)
Apr 04 18:05:21 raspberrypi lnd[2169]: 2018-04-04 18:05:21.759 [INF] NTFN: New spend subscription: utxo=44bb2bec5242e1b8590832af24fdbc97d0a4034faf58c10777a11ee4cb74daac:0
Apr 04 18:05:21 raspberrypi lnd[2169]: 2018-04-04 18:05:21.944 [INF] CNCT: Close observer for ChannelPoint(44bb2bec5242e1b8590832af24fdbc97d0a4034faf58c10777a11ee4cb74daac:0) active
Apr 04 18:05:21 raspberrypi lnd[2169]: 2018-04-04 18:05:21.944 [DBG] CNCT: Starting chain watcher for ChannelPoint(78b85973703897bc9183b975817fdf0dedcef2f95ea82dc69b64e645da96a289:1)
Apr 04 18:15:45 raspberrypi lnd[2169]: 2018-04-04 18:15:45.785 [DBG] LNWL: Possible reorg at block 00000000000000000010fd16c7a6942a3c7446e62899de56c5373184fdc35911
Apr 04 18:15:48 raspberrypi lnd[2169]: 2018-04-04 18:15:48.108 [DBG] LNWL: Disconnecting block 516599 (0000000000000000001bc4f462080af4e5025aeb9b8416b9f8a04a94bfae0720)

Edit: Seems like it hangs every time with same message about possible reorg and disconnecting block

Apr 04 20:19:43 raspberrypi lnd[2381]: 2018-04-04 20:19:43.714 [DBG] LNWL: Possible reorg at block 0000000000000000003ee16a6a9faddf5fb9a2f3ccf8b728f6b909c65a36cf63
Apr 04 20:19:47 raspberrypi lnd[2381]: 2018-04-04 20:19:47.970 [DBG] LNWL: Disconnecting block 516611 (0000000000000000000ea4d26e86d542aa438d70fb5330bd4f7e60dba82ee978)

@Roasbeef can you see any issues with lnd here? I do not know if it's related, but the issue started after I seemingly lost some funds for reasons not known to me so far.

@Roasbeef
Copy link
Member

Roasbeef commented Apr 4, 2018

Is your bitcoind fully synced? It looks like it's sending lnd the same re-org over and over again. As I'm typing this response, the current block height is 516635.

@Roasbeef
Copy link
Member

Roasbeef commented Apr 4, 2018

If you're running both bitcoind and lnd on a rasp pi, it could be the case that the bitcoind verifies blocks very slowly leading to verification lag between lnd and bitcoind.

@acurious
Copy link

acurious commented Apr 4, 2018

Yes, my bitcoind is fully synced. I am running both bitcoind and lnd on a rasp pi. I can try to move databases to my laptop to check if the issue persists.

@aakselrod
Copy link
Contributor

@acurious, any chance you could put up a copy of your bitcoin.conf? if you have rpcuser and rpcpassword in there, just put * instead of the actual data please.

@aakselrod
Copy link
Contributor

@chemicstry, you as well please?

@chemicstry
Copy link
Author

txindex=1
server=1
rpcuser=*
rpcpassword=*
zmqpubrawblock=tcp://0.0.0.0:28332
zmqpubrawtx=tcp://0.0.0.0:28332

I have since updated lnd and wiped the database so I no longer have these problems (yet). I think they started to appear when I had 80+ channels open. I'll report if I see them again. I was running on a i7 with 8 gigs of ram.

@acurious
Copy link

acurious commented Apr 5, 2018

# Bitcoind options
server=1
daemon=1
txindex=1
disablewallet=1

# Connection settings
rpcuser=*
rpcpassword=*
zmqpubrawblock=tcp://127.0.0.1:29000
zmqpubrawtx=tcp://127.0.0.1:29000

# Raspberry Pi optimizations
dbcache=100
maxorphantx=10
maxmempool=50
maxconnections=40
maxuploadtarget=5000

I run into issue while having 40+ channels open. I have updated to lnd version 0.4.1-beta after problem occured.

@aakselrod
Copy link
Contributor

OK, looks like it might be having some issues dealing with large numbers of channels, even on high-performance machines. Thanks, this gives me a way to try to reproduce!

@Roasbeef
Copy link
Member

Roasbeef commented Apr 5, 2018 via email

@aakselrod
Copy link
Contributor

Thanks, will take a look and try to optimize, though it might be a larger refactor btcwallet-side to allow the back-end to support multiple rescan/notification clients at one time.

@acurious
Copy link

acurious commented Apr 5, 2018

Tried same setup on my laptop with i5 processor and 4gigs of ram. Similar high load on processor and then no progress. Logs: https://pastebin.com/B2S6TQjW

@Roasbeef
Copy link
Member

Roasbeef commented Apr 7, 2018

Fixed by #1041.

@Roasbeef Roasbeef closed this as completed Apr 7, 2018
@acurious
Copy link

acurious commented Apr 7, 2018

Installed latest version with go. Run make check. Final response lines are

Makefile:150: recipe for target 'unit' failed
make: *** [unit] Error 123

Did run lnd anyway. It performed a lot of tasks, but alway exits with:

2018-04-07 20:48:16.552 [INF] CNCT: ChannelArbitrator(d2667a681f62c9b7b58d2ca94168604f9bf1d70483eb561361bbb2271286f28d:0): starting state=StateBroadcastCommit
2018-04-07 20:48:16.553 [TRC] CNCT: ChannelArbitrator(d2667a681f62c9b7b58d2ca94168604f9bf1d70483eb561361bbb2271286f28d:0): attempting state step with trigger=chainTrigger
2018-04-07 20:48:16.553 [INF] CNCT: ChannelArbitrator(d2667a681f62c9b7b58d2ca94168604f9bf1d70483eb561361bbb2271286f28d:0): force closing chan
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x8e9554]

goroutine 1 [running]:
github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).stateStep(0xc4207c2c00, 0x7e3ee, 0xc4202df580, 0xc420221400, 0x2, 0x2, 0xc4202df580, 0x0)
	/home/bitcoin/go/src/github.com/lightningnetwork/lnd/contractcourt/channel_arbitrator.go:432 +0x744
github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).advanceState(0xc4207c2c00, 0xc40007e3ee, 0xc4202df580, 0x0, 0x0, 0x2, 0x0, 0x0, 0x0)
	/home/bitcoin/go/src/github.com/lightningnetwork/lnd/contractcourt/channel_arbitrator.go:642 +0x167
github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).Start(0xc4207c2c00, 0x0, 0x0)
	/home/bitcoin/go/src/github.com/lightningnetwork/lnd/contractcourt/channel_arbitrator.go:250 +0x30c
github.com/lightningnetwork/lnd/contractcourt.(*ChainArbitrator).Start(0xc42016a5a0, 0x0, 0x0)
	/home/bitcoin/go/src/github.com/lightningnetwork/lnd/contractcourt/chain_arbitrator.go:421 +0xa03
main.(*server).Start(0xc420510a00, 0xec3e34, 0x2e)
	/home/bitcoin/go/src/github.com/lightningnetwork/lnd/server.go:522 +0xf5
main.lndMain(0x0, 0x0)
	/home/bitcoin/go/src/github.com/lightningnetwork/lnd/lnd.go:574 +0x1aeb
main.main()
	/home/bitcoin/go/src/github.com/lightningnetwork/lnd/lnd.go:623 +0x36

Wonder, if I've done something wrong myself

@Roasbeef
Copy link
Member

Roasbeef commented Apr 7, 2018

If you apply this patch: #975 (comment)

You should be able to restart. The particular bug you're running into will be fixed by a pending PR. Optimistically, it should land in master next week.

@acurious
Copy link

acurious commented Apr 9, 2018

Manually changed this line in my file, then run make clean and make && make install. Still closes with same error

Edit. logs: https://pastebin.com/yeKjhwKj

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Related to the node backend software/interface (e.g. btcd, bitcoin-core) resource usage Software resource usage improvements/issues
Projects
None yet
Development

No branches or pull requests

5 participants