@Roasbeef Roasbeef released this Apr 17, 2019 · 4899 commits to master since this release

This release marks a new major release of lnd that includes several important bug fixes, numerous performance optimizations, static channel backups (SCB), reduced bandwidth usage for larger nodes, an overhaul of the internals of the autopilot system, and a new batch sweeping sub-system. Due to the nature of some of the bug fixes which were made during the implementation of the new SCB feature, users are highly encouraged to upgrade to this new version.

Database migrations

This version includes a single migration to modify the message store format, used to send messages to remote peers reliably when attempting to construct channel proofs. The migration should appear as below:

2019-04-03 22:35:44.596 [INF] LTND: Version: 0.6.0-beta commit=v0.6-beta, build=production, logging=default
2019-04-03 22:35:44.596 [INF] LTND: Active chain: Bitcoin (network=mainnet)
2019-04-03 22:35:44.597 [INF] CHDB: Checking for schema update: latest_version=8, db_version=7
2019-04-03 22:35:44.597 [INF] CHDB: Performing database schema migration
2019-04-03 22:35:44.597 [INF] CHDB: Applying migration #8
2019-04-03 22:35:44.597 [INF] CHDB: Migrating to the gossip message store new key format
2019-04-03 22:35:44.597 [INF] CHDB: Migration to the gossip message store new key format complete!

Verifying the Release

In order to verify the release, you'll need to have gpg or gpg2 installed on your system. Once you've obtained a copy (and hopefully verified that as well), you'll first need to import the keys that have signed this release if you haven't done so already:

curl https://keybase.io/roasbeef/pgp_keys.asc | gpg --import

Once you have his PGP key you can verify the release (assuming manifest-v0.6-beta.txt and manifest-v0.6-beta.txt.sig are in the current directory) with:

gpg --verify manifest-v0.6-beta.txt.sig

You should see the following if the verification was successful:

gpg: assuming signed data in 'manifest-v0.6-beta.txt'
gpg: Signature made Tue Apr 16 14:35:13 2019 PDT
gpg:                using RSA key F8037E70C12C7A263C032508CE58F7F8E20FD9A2
gpg: Good signature from "Olaoluwa Osuntokun <laolu32@gmail.com>" [ultimate]

That will verify the signature on the main manifest page which ensures integrity and authenticity of the binaries you've downloaded locally. Next, depending on your operating system you should then re-calculate the sha256 sum of the binary, and compare that with the following hashes (which are included in the manifest file):

e9f3ff0f551f3ce4ad113ad10f8009a3aaca82fb6cd0244a994c299602b29334  lnd-darwin-386-v0.6-beta.tar.gz
76816b2d0d0e3f4c0e7d41c0ecb1457afe2ba95b8f37f4aa1adebbd9bc19aa4b  lnd-darwin-amd64-v0.6-beta.tar.gz
f7650749dc50c3f8c1957680333d95562b159ed13fea26737dc29bff76212925  lnd-dragonfly-amd64-v0.6-beta.tar.gz
7b77ecbfcffb3e2151ff54c27133aebe0d9b6324c80594bce4df265b5f990f61  lnd-freebsd-386-v0.6-beta.tar.gz
d48ce7ed7cc71e988af65e4175e949a5e52f2b8109f5239ae595edc3b8442f05  lnd-freebsd-amd64-v0.6-beta.tar.gz
c783ce9987577d2b7a5e95b6a16767158ef98f48d0eeedf58f4c3a1ce7500e6d  lnd-freebsd-arm-v0.6-beta.tar.gz
cde995167b428696cd6e78733fd934ebda9e03c0b63938af4654c42bd2d86e88  lnd-linux-386-v0.6-beta.tar.gz
ef37b3658fd864dfb3af6af29404d92337229378c24bfb78aa2010ede4cd06af  lnd-linux-amd64-v0.6-beta.tar.gz
2f31b13a4da6217ed7e27a44e1705103d7ed846aa2f599b7e5de0e6033a66c19  lnd-linux-arm64-v0.6-beta.tar.gz
ae8571de0e033a05279469348102982fcfbd3f88c83d549a3d43165ab8ab5aa0  lnd-linux-armv6-v0.6-beta.tar.gz
effea372c207293fd42b0cc27800da3a70c22f8c9a0e7b5eb8dbe56b5b98e1a3  lnd-linux-armv7-v0.6-beta.tar.gz
61038e6cd67562ba3d832de38917d6d37b0cb74fe5e32d4d41fb6d9193f8109d  lnd-linux-mips64-v0.6-beta.tar.gz
28e5be6510fbae4f893253b34db0fcc92d720016f46abe00684a00d2d11a1be3  lnd-linux-mips64le-v0.6-beta.tar.gz
5c13f83344d2634763cf4e178a2d2ca69031a985030713d585d3b37f7a261c06  lnd-linux-ppc64-v0.6-beta.tar.gz
07e91fc56cb0cfcfe52dcaa2bdec008c401b04fe466e970449bcdb4ebb6bb077  lnd-netbsd-386-v0.6-beta.tar.gz
465f4649bdb1393543de52b0dc60aa6121fad0bcf5ad8e7ff62a72c2484dd264  lnd-netbsd-amd64-v0.6-beta.tar.gz
f75b70cf657bffef6cbf2147f69e4296fb98adb48bd18e26950aedb7802748e9  lnd-openbsd-386-v0.6-beta.tar.gz
e5ce0e16a815d5ad98a60c9f7a148efcdb083c0af73965962156f2e3fc03e0df  lnd-openbsd-amd64-v0.6-beta.tar.gz
2c46e9e1f519fe7b0177f30c77611591023442df63c0a1e186154baf7dd9a284  lnd-source-v0.6-beta.tar.gz
77069e9971cd3240891698c02d821ae28254f765c77b5f943b6b88b4943434e7  lnd-windows-386-v0.6-beta.zip
b84de3702074f7e6ecab5f60a1489fb4ee9cd83bcf7c7e9a44604c600ff1d37e  lnd-windows-amd64-v0.6-beta.zip
31205f95fcf7bab7eeff043807ef0485ca14f4506e2e3864db81411ef637aebc  vendor.tar.gz

One can use the shasum -a 256 <file name here> tool in order to re-compute the sha256 hash of the target binary for your operating system. The produced hash should be compared with the hashes listed above and they should match exactly.

Finally, you can also verify the tag itself with the following command:

git verify-tag v0.6-beta

Building the Contained Release

With this new version of lnd, we've modified our release process to ensure the bundled release is now fully self-contained. As a result, with only the payload attached to this release, users will be able to rebuild the target release themselves without having to fetch any of the dependencies. Note that at this stage, binaries aren't yet fully reproducible (even with go modules). This is because, by default, Go includes the full directory path where the binary was built in the binary itself. As a result, unless your file system exactly mirrors the machine used to build the binary, you'll get a different binary, as it includes artifacts from your local file system. This will be fixed in go1.13, and before then we may modify our release system to do this automatically.

In order to re-build from scratch, assuming that vendor.tar.gz and lnd-source-v0.6-beta.tar.gz are in the current directory:

tar -xvzf vendor.tar.gz
tar -xvzf lnd-source-v0.6-beta.tar.gz
GO111MODULE=on go install -v -mod=vendor -ldflags "-X github.com/lightningnetwork/lnd/build.Commit=v0.6-beta"
GO111MODULE=on go install -v -mod=vendor -ldflags "-X github.com/lightningnetwork/lnd/build.Commit=v0.6-beta" ./cmd/lncli

The -mod=vendor flag tells the go build command that it doesn't need to fetch the dependencies, and instead, they're all enclosed in the local vendor directory.

Additionally, it's now possible to use the enclosed release.sh script to bundle a release for a specific system like so:

LNDBUILDSYS="linux-arm64 darwin-amd64" ./release.sh

The release.sh script will now also properly include the commit hash once again, as a regression caused by a change to the internal build system has been fixed.

⚡️⚡️⚡️ OK, now to the rest of the release notes! ⚡️⚡️⚡️

Release Notes

Protocol and Cross-Implementation Compatibility Fixes

We’ll now properly validate our own announcement signatures for NodeAnnouncements before writing them to disk and propagating them to other peers.

A bug has been fixed that caused us to send a FinalFailExpiryTooSoon error rather than a FinalFailIncorrectCltvExpiry when the last HTLC of a route has an expiration height that is deemed too soon by the final destination of the HTLC.

Aliases received on the wire are now properly validated. Additionally, we’ll no longer disconnect peers that send us invalid aliases.

A bug has been fixed that would at times cause commitments to desynchronize in the face of multiple concurrent updates that included an UpdateFee message. The fix generalizes the existing commitment state machine logic to treat an UpdateFee message as we would any other Update* messages.

We’ll now reject funding requests that require an unreasonable confirmation depth before the channel can be used.

We’ll now space out our broadcast batches more in order to save bandwidth and consolidate more updates behind a single batch.

We’ll now require that all peers we connect to have the DLP (Data Loss Protection) bit set. This is required for the new SCB (Static Channel Backups) to function properly.

For private channels, we’ll now always resend the latest ChannelUpdate to the remote peer on reconnecting. This update is required to properly make invoices with hop hints which are required for receiving over a non-advertised channel.

Reject and Channel Caches

A number of internal caches have been added to reduce idle memory usage with a large number of peers, and also to reduce idle CPU usage due to stale channel updates.

In this release, lnd now maintains a small reject cache for detecting stale ChannelAnnouncement and ChannelUpdate messages from its peers. Prior versions of lnd would perform a database lookup for each incoming message, which produced a huge amount of contention under load and as the channel graph exploded.

The reject cache maintains just 25 bytes per edge, and easily holds today's graph in memory. Users on low power devices or with a large number of peers will benefit immensely from lnd's improved ability to filter gossip traffic for the latest information and clear large backlogs received from their peers.

The number of items in the cache is configurable using the --caches.reject-cache-size flag. The default value of 50,000 comfortably fits all known channels in the reject cache, requiring 1.2MB.
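
To make the idea concrete, here is a rough Go sketch of a bounded reject cache. It is not lnd's implementation: the rejectEntry layout, the method names, and the choice to evict an arbitrary entry when the cache is full are all assumptions made for illustration. The sketch only shows how a stale update can be rejected from memory without touching the database.

```go
package main

import "fmt"

// rejectEntry is a hypothetical compact record holding the newest update
// timestamp seen for each of the two directions of a channel.
type rejectEntry struct {
	updateTime [2]int64
}

// rejectCache is a size-bounded map from channel ID to rejectEntry.
type rejectCache struct {
	capacity int
	entries  map[uint64]rejectEntry
}

func newRejectCache(capacity int) *rejectCache {
	return &rejectCache{
		capacity: capacity,
		entries:  make(map[uint64]rejectEntry, capacity),
	}
}

// isStale reports whether an update for the given channel and direction is
// no newer than what the cache has already seen.
func (c *rejectCache) isStale(chanID uint64, dir int, ts int64) bool {
	e, ok := c.entries[chanID]
	return ok && ts <= e.updateTime[dir]
}

// record stores the timestamp, evicting an arbitrary entry when a new
// channel would push the cache past its capacity (an assumed policy).
func (c *rejectCache) record(chanID uint64, dir int, ts int64) {
	e, ok := c.entries[chanID]
	if !ok && len(c.entries) >= c.capacity {
		for victim := range c.entries {
			delete(c.entries, victim)
			break
		}
	}
	e.updateTime[dir] = ts
	c.entries[chanID] = e
}

func main() {
	cache := newRejectCache(50000)
	cache.record(1234, 0, 1555500000)
	fmt.Println("replayed update stale:", cache.isStale(1234, 0, 1555500000))
	fmt.Println("newer update stale:", cache.isStale(1234, 0, 1555500001))
	// 50,000 entries at 25 bytes each, as quoted above.
	fmt.Println("approximate bytes:", 50000*25)
}
```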

Additionally, we now maintain a separate channel cache, which contains in-memory copies of ChannelAnnouncements, ChannelUpdates, and NodeAnnouncements for a given channel. This cache is used to satisfy queries in hot paths of our peers’ gossip queries, allowing us to serve more responses from memory and perform fewer database reads and allocations in deserialization.

The size of the channel cache is also configurable via the --caches.chan-cache-size flag. The default value of 20,000 stores about half of all known channels in memory and constitutes about 40MB.

Graceful Shutdown via SIGTERM

It was discovered that prior versions of lnd didn’t attempt to catch the SIGTERM signal to execute a graceful shutdown. When possible, users should prefer to shut down lnd gracefully via either SIGTERM or SIGINT to ensure the database is closed and any outstanding transactions are committed, in order to avoid database corruption. Commonly used process management systems such as Docker or systemd typically send SIGTERM, then wait for a period of time to allow the process to respond before forcefully killing the process. Before this release, lnd would always be forcefully killed by these platforms, rendering it unable to properly execute a graceful shutdown.

This new release of lnd will now properly catch these signals to ensure that we’re more likely to be able to execute a graceful shutdown. We believe that many reports of partial database corruption, typically reported by those running on Raspberry Pis, should be addressed by this change.

Static Channel Backups

In this release, we’ve implemented a new safe scheme for static channel backups (SCBs) for lnd. We say safe, as care has been taken to ensure that there are no foot guns in this method of backing up channels, vs doing things like rsyncing or copying the channel.db file periodically. Those methods can be dangerous as one never knows if they have the latest state of a channel or not. Instead, we aim to provide a simple, safe scheme that allows users to recover the settled funds in their channels in the case of partial or complete data loss. The backups themselves are encrypted using a key derived from the user's seed. This way we protect the privacy of the user's channels in the backed-up state, and ensure that a random node can't attempt to import another user's channels. Given their seed and the latest backup file, the user will be able to recover both their on-chain funds and the funds that are fully settled within their channels. By "fully settled" we mean funds that are in the base commitment outputs, and not HTLCs. We can only restore these funds because the backup is made from data that is all available right after the channel is created.

We call these “static” backups, as they only need to be obtained once for a given channel and are valid until the channel has been closed. One can view this backup as a final method of recovery in the case of total data loss. It’s important to note that during recovery the channels must be closed in order to recover the funds fully. This setup ensures that there’s no way to incorrectly use an SCB that would result in the broadcast of a revoked commitment state. Recovery documentation for both on-chain and off-chain coins can be found in the recovery.md file.

Backup + Recovery Methods

The SCB feature exposes multiple safe ways to back up and recover a channel. We expect only one of them to be used primarily by unsophisticated end users, but have provided other mechanisms for more advanced users and businesses that already script lnd via the gRPC system.

First, the easiest method for backup+recovery. lnd will now maintain a channels.backup file in the same location where we store all the other files. Users will at any time be able to safely copy and back up this file. Each time a channel is opened or closed, lnd will update this file with the latest channel state. Users can use scripts to detect changes to the file, and upload them to their backup location. Something like fsnotify can notify a script each time the file changes so that it can be backed up once again. The file is encrypted using an AEAD scheme, so it can safely be stored plainly in cloud storage, your SD card, etc. The file uses a special format and can be imported via any of the recovery methods described below.

The second mechanism is via the new SubscribeChanBackups streaming gRPC method. Each time a channel is opened or closed, you'll get a new notification with all the chanbackup.Single files (described below), and a single chanbackup.Multi that contains all the information for all channels.

Finally, users are able to request a backup of a single channel, or all the channels via the cli and RPC methods. Here's an example of a few ways users can obtain backups:

⛰ lncli --network=simnet exportchanbackup --chan_point=29be6d259dc71ebdf0a3a0e83b240eda78f9023d8aeaae13c89250c7e59467d5:0
{
    "chan_point": "29be6d259dc71ebdf0a3a0e83b240eda78f9023d8aeaae13c89250c7e59467d5:0",
    "chan_backup": "02e7b423c8cf11038354732e9696caff9d5ac9720440f70a50ca2b9fcef5d873c8e64d53bdadfe208a86c96c7f31dc4eb370a02631bb02dce6611c435753a0c1f86c9f5b99006457f0dc7ee4a1c19e0d31a1036941d65717a50136c877d66ec80bb8f3e67cee8d9a5cb3f4081c3817cd830a8d0cf851c1f1e03fee35d790e42d98df5b24e07e6d9d9a46a16352e9b44ad412571c903a532017a5bc1ffe1369c123e1e17e1e4d52cc32329aa205d73d57f846389a6e446f612eeb2dcc346e4590f59a4c533f216ee44f09c1d2298b7d6c"
}

⛰ lncli --network=simnet exportchanbackup --all
{
    "chan_points": [
        "29be6d259dc71ebdf0a3a0e83b240eda78f9023d8aeaae13c89250c7e59467d5:0"
    ],
    "multi_chan_backup": "fd73e992e5133aa085c8e45548e0189c411c8cfe42e902b0ee2dec528a18fb472c3375447868ffced0d4812125e4361d667b7e6a18b2357643e09bbe7e9110c6b28d74f4f55e7c29e92419b52509e5c367cf2d977b670a2ff7560f5fe24021d246abe30542e6c6e3aa52f903453c3a2389af918249dbdb5f1199aaecf4931c0366592165b10bdd58eaf706d6df02a39d9323a0c65260ffcc84776f2705e4942d89e4dbefa11c693027002c35582d56e295dcf74d27e90873699657337696b32c05c8014911a7ec8eb03bdbe526fe658be8abdf50ab12c4fec9ddeefc489cf817721c8e541d28fbe71e32137b5ea066a9f4e19814deedeb360def90eff2965570aab5fedd0ebfcd783ce3289360953680ac084b2e988c9cbd0912da400861467d7bb5ad4b42a95c2d541653e805cbfc84da401baf096fba43300358421ae1b43fd25f3289c8c73489977592f75bc9f73781f41718a752ab325b70c8eb2011c5d979f6efc7a76e16492566e43d94dbd42698eb06ff8ad4fd3f2baabafded"
}

⛰ lncli --network=simnet exportchanbackup --all --output_file=channels.backup

⛰ ll channels.backup
-rw-r--r--  1 roasbeef  staff   381B Dec  9 18:16 channels.backup

SCBs can be viewed as a last-ditch method for recovering funds from channels after total data loss. In future releases, we plan to implement methods that require more sophistication with respect to operational architecture, yet allow for dynamic backups. Even with these dynamic backups in place, SCBs will still serve as a fallback method if a dynamic backup is known to be out of date, or in a partial state of consistency.

Future protocol changes will make the SCB recovery method more robust, as it will no longer rely on the remote peer to send the normal channel reestablishment handshake upon reconnection. Instead, given the SCB, lnd will be able to find the closing output directly on the chain after a force close by the remote party.

For further details w.r.t the lower level implementation of SCBs as well as the new RPC calls, users can check out the new recovery.md file which goes over methods to recover both on-chain and off-chain funds from lnd.

New Channel Status Manager

Within the protocol, nodes can mark a channel as enabled or disabled. A disabled channel signals to other nodes that the channel isn’t to be used for routing for whatever reason. This allows clients to avoid these channels during path finding, and also lets routing nodes signal any faults in a channel to other nodes, allowing them to ignore such channels and possibly remove them from their graph view. lnd has a system to automatically detect when a channel has been inactive for too long, and disable it, signalling to other peers that they can ignore it when routing. The system will also eventually re-enable a channel if it has been stable for long enough.

The prior version of this sub-system had a number of flaws which would cause channels to be excessively enabled/disabled, causing ChannelUpdate spam in the network. In this release, this system has been revamped, resulting in a much more conservative, stable channel status manager. We’ll now only disable channels programmatically, and channels will only be re-enabled once the peer is stable for a long enough period of time. This period of time is now configurable.

Server and P2P Improvements

The max reconnection backoff interval is now configurable. We cap this value by default to ensure we don’t wait an eternity before attempting to reconnect to a peer. However, on laptops and mobile platforms, users may want the value to be much lower to ensure they maintain connectivity in the face of roaming or Wi-Fi drops. The new field is: --maxbackoff=. A new complementary --minbackoff field has also been added.

We’ll now attempt to retry when faced with a write timeout rather than disconnect the peer immediately. This serves to generally make peer connections more stable to/from lnd.

Users operating larger lnd nodes may find that at times restarts can be rather load heavy due to the rapid burst of potentially hundreds of new p2p connections. In this new version of lnd, we’ve added a new flag (--stagger-initial-reconnect) to space out these connection attempts by several seconds, rather than trying to establish all the connections at once on start up.

Performance Enhancements

Outgoing Message Queue Prioritization

A new distinct queue of gossip messages has been added to the outgoing write queue system within lnd. We’ll now maintain two distinct queues: one for gossip messages, and one for everything else. Upon reconnection, certain messages are time sensitive, such as the Channel Reestablishment message, which causes a channel to shift from inactive to active. This queue optimization also means that making new channels, or updating existing channels, will no longer be blocked by any outgoing gossip traffic, improving the quality of service.
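
One way to picture the two-queue arrangement is the Go sketch below, where the writer always drains the priority queue before it looks at gossip. The use of buffered channels, the nextMessage helper, and the message names are assumptions for illustration; lnd's actual queue implementation may be structured differently, and a real writer would block while both queues are empty instead of returning.

```go
package main

import "fmt"

// nextMessage returns the next message to write, preferring the priority
// queue. The boolean is false when both queues are currently empty.
func nextMessage(priority, gossip <-chan string) (string, bool) {
	// First pass: take a priority message if one is waiting.
	select {
	case m := <-priority:
		return m, true
	default:
	}
	// Second pass: fall back to whichever queue has something.
	select {
	case m := <-priority:
		return m, true
	case m := <-gossip:
		return m, true
	default:
		return "", false
	}
}

func main() {
	priority := make(chan string, 8)
	gossip := make(chan string, 8)

	// Gossip was queued first, but channel messages still go out first.
	gossip <- "channel_update"
	gossip <- "node_announcement"
	priority <- "channel_reestablish"
	priority <- "update_add_htlc"

	for {
		m, ok := nextMessage(priority, gossip)
		if !ok {
			break
		}
		fmt.Println("write:", m)
	}
}
```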

Batched Pre-Image Writing in the HTLCSwitch

This new release will now batch writes for witnesses discovered in HTLC forwarding. At the same time, we correct a nuanced consistency issue related to a lack of synchronization with the channel state machine. Naively, forcing the individual preimage writes to be synchronized with the link incurs a heavy performance penalty (about 80% in profiling). Batching these allows us to minimize the number of db transactions required to write the preimages, allowing us to reinsert the batched write into the link's critical path and resolve the possible inconsistency. In fact, the benchmarks actually showed a slight performance improvement, even with the extra write in the critical path.

Unified Global SigPool

lnd uses a pool of goroutines that are tasked with signing and validating commitment and HTLC signatures for new channel updates. This pool allows us to process these commitment updates in parallel, rather than in a serial manner, which would reduce payment throughput. Rather than using a single SigPool per channel, we now use a single global SigPool (#2329). With this change, we ensure that as the number of channels grows, the number of goroutines idling in the sigPool stays constant. Currently, most channels in the daemon are likely inactive, with only a handful consistently carrying out channel updates. As a result, this change should reduce the amount of idle CPU usage, as we have fewer active goroutines in select loops.

Read and Write Buffer Pools

In this release, we implement a write buffer pool for LN peers. Previously, each peer object would embed a 65KB byte array, which is used to serialize messages before writing them to the wire. As a result, every new peer causes a large memory allocation, which places an unnecessary burden on the garbage collector when faced with short-lived or flapping peers. We’ll now use a buffer pool that dynamically grows and shrinks based on the demand for write buffers corresponding to active peers. This greatly helps when there is a high level of churn in peer activity, or even if there is a single flapping peer.

Similarly, whenever a new peer would connect, we would allocate a 65KB+16 byte array to use as a read buffer for each connection object. The read buffer stores the ciphertext and MAC read from the wire, and is used to decrypt and then decode messages from the peer. Because the read buffer is implemented at the connection level, as opposed to the peer level like write buffers, simply opening a TCP connection would cause this allocation. Therefore peers that send no messages, or do not complete the handshake, will add to this memory overhead even if they are released promptly. To avoid this, we now use a similar read buffer pool to tend towards a steady working set of read buffers, which drastically reduces memory usage.

Finally, we introduce a set of read/write worker pools, which are responsible for scheduling access to the read/write buffers in the underlying buffer pools. With the read and write pools, we modify the memory requirements to be at most linear in the number of specified workers. More importantly, these changes completely decouple read and write buffer allocations from the peer/connection lifecycle, allowing lnd to tolerate flapping peers with minimal overhead.

Nodes that have a large number of peers will see the most drastic benefit. In testing, we were able to create stable connections (w/o gossip queries) to over 900 unique nodes, all while keeping lnd's total memory allocations due to read/write buffers under 15 MB. This configuration could have easily connected to more nodes, though that was all that was reachable via the bootstrapper.
This same test would have used between 90...