Automatic deterministic backups #188

Merged
merged 47 commits into from
Aug 30, 2020

Member

@lukechilds lukechilds commented Aug 29, 2020

Replaces #165

This utilises the new Umbrel seed from getumbrel/umbrel-manager#39 to derive a deterministic backup id and encryption key from the user's mnemonic for automated remote backups of static channel backups and user settings.

The backups are encrypted client side before being uploaded over Tor and are padded with random data. Backups are made as soon as any relevant data changes, such as user settings or a channel open/close. In addition, Umbrel makes decoy backups at random intervals.

These features combined ensure that the backup server doesn't learn any sensitive information about the Umbrel.

  • The IP is hidden due to Tor.
  • The Umbrel's channel data and settings are encrypted client side with a key only known to the Umbrel device.
  • Random interval decoy backups ensure the server can't correlate backup activity with channel open/close activity on the Lightning network, and so can't link a backup ID to a channel pubkey.
  • Random padding obscures whether the backup size has increased, decreased, or stayed the same because the backup is a decoy.

Due to the key/id being deterministically derived from the Umbrel seed, all that's needed to fully recover an Umbrel is the mnemonic seed phrase. Upon recovery the device can automatically regenerate the same backup id/encryption key, request the latest backup from the backup server, decrypt it, and restore the user's settings and Lightning network channel data.
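The recovery property described above can be sketched in a few lines of bash. This is a minimal illustration, assuming an HMAC-SHA256 derivation like the one in the diff further down; the `derive` helper, the seed value, and the `example_encryption_key` identifier are made up for the example and are not taken from the PR. The sketch emits the full 256-bit digest and makes no claim about how the PR trims it.

```bash
# Hypothetical stand-in for the PR's derive_entropy, taking the seed as an argument.
derive () {
  seed="${1}"
  identifier="${2}"
  # HMAC-SHA256 keyed with the seed; sed trims the "(stdin)= " style prefix
  printf '%s' "${identifier}" | openssl dgst -sha256 -hmac "${seed}" | sed 's/^.* //'
}

# Made-up seed value, for illustration only
seed="0123456789abcdef0123456789abcdef"

backup_id="$(derive "${seed}" "umbrel_backup_id")"
# "example_encryption_key" is an invented identifier, not one from the PR
encryption_key="$(derive "${seed}" "example_encryption_key")"

# A recovered device holding the same seed arrives at the same values
recovered_id="$(derive "${seed}" "umbrel_backup_id")"
[ "${backup_id}" = "${recovered_id}" ] && echo "backup id matches after recovery"
```

Because each use case has its own identifier, the backup id and the encryption key are unrelated values even though both come from one seed.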

@lukechilds lukechilds changed the title Automated Backups Automatic deterministic backups Aug 29, 2020
@getumbrel getumbrel deleted a comment from nolim1t Aug 30, 2020
meeDamian previously approved these changes Aug 30, 2020

@meeDamian meeDamian left a comment

Most are slight improvements, still looking into the important parts

docker-compose.yml Outdated
@@ -97,6 +99,7 @@ services:
DOCKER_COMPOSE_DIRECTORY: $PWD
DEVICE_HOST: ${DEVICE_HOST:-http://umbrel.local}
MIDDLEWARE_API_URL: "http://10.11.2.2"
UMBREL_SEED_FILE: "/db/umbrel-seed/seed"


It's beyond the scope of this PR, but I've been experimenting with a setup where:

  • secrets are stored encrypted at rest,
  • decrypted into /run/umbrel/ (which by default is a tmpfs),
  • umbrel-startup.service would have DirectoryNotEmpty= added, so that
  • magic starts when secrets are unleashed 🧙🏻‍♂️.

Even if the decryption key initially lives in plaintext in storage, it makes a later transition easier (e.g. to smartphone decryption, or even to secrets being sent from the smartphone to the device: the device boots and sends a notification to the paired phone, the user confirms, and the secrets are transferred from 📱 to tmpfs in the device's RAM), and it prevents some of the more naive secret-extraction approaches.
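For reference, `DirectoryNotEmpty=` is a setting of systemd path units rather than service units, so the trigger would most likely sit in a `.path` unit that activates the service. A rough sketch under stated assumptions: the `umbrel-startup.path` name and the `write_path_unit` helper are hypothetical, and the helper only writes the unit into a directory of your choosing instead of installing it.

```bash
# Hypothetical helper: write an illustrative path unit into the given directory.
write_path_unit () {
  unit_dir="${1}"
  cat > "${unit_dir}/umbrel-startup.path" <<'EOF'
[Unit]
Description=Start Umbrel once decrypted secrets appear (illustrative)

[Path]
# Activate the service as soon as the directory contains at least one file
DirectoryNotEmpty=/run/umbrel
Unit=umbrel-startup.service

[Install]
WantedBy=multi-user.target
EOF
}

# Write the unit somewhere harmless to inspect it
write_path_unit "$(mktemp -d)"
```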

Member Author

Sounds interesting.

We've already had some internal discussion on how to transition to alternative entropy sources in the future.

For example, this PR is using umbrel-seed, which is persisted on disk for non-interactive use cases, e.g. automated backups without requiring user interaction.

There could be another root seed, umbrel-interactive-seed, that requires umbrel-seed + user password + use-case identifier for derivation. This would never be stored on disk. So each time an app requires it, the user is prompted for their password to derive the entropy for that specific use case, the entropy is in memory while it's being used, and then destroyed.


For example, this PR is using umbrel-seed, which is persisted on disk for non-interactive use cases, e.g. automated backups without requiring user interaction.

If encryption is the only thing the backup needs a secret for, asymmetric crypto can solve that altogether (totally not shilling age here again).

There could be another root seed, umbrel-interactive-seed, that requires umbrel-seed + user password + use-case identifier for derivation. This would never be stored on disk. So each time an app requires it, the user is prompted for their password to derive the entropy for that specific use case, the entropy is in memory while it's being used, and then destroyed.

Not sure if you caught that contents of /run/ are in RAM only already 🤔. So that all decrypted stuff is always ephemeral. What can also be done is to have said tmpfs encrypted, so even if RBP gets deep-frozen extracting secrets from RAM becomes less easy.

Member Author

Age was actually what I initially planned to use for this, but due to time constraints just went with symmetric PGP encryption cos it's super simple to use and I'm very familiar with it.

We can completely change the algorithm/keys used for backups at any point in the future without too much trouble btw.

@@ -0,0 +1,5 @@
#!/usr/bin/env bash

UMBREL_ROOT="$(readlink -f $(dirname "${BASH_SOURCE[0]}")/../..)"


That really should be a global env var 😅

Member Author

@lukechilds lukechilds Aug 30, 2020

Definitely 😅

There are lots of common bash helpers/variables that should be sourced from a single location. I'm planning to clean this up after we've got some higher priority features shipped.


check_dependencies () {
for cmd in "$@"; do
if ! command -v "$cmd" >/dev/null 2>&1; then


[nit] but maybe if ! test -x "$(command -v "$cmd")"?

Member Author

Yep, I'll address this as part of #188 (comment)
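To make the nit concrete, here is a minimal sketch of the dependency check with the suggested `test -x` variant. The diff only shows the first lines of `check_dependencies`, so the error message and the `return 1` are assumptions rather than the PR's actual behaviour.

```bash
check_dependencies () {
  for cmd in "$@"; do
    # command -v prints the resolved path for external commands;
    # test -x then confirms that path is an executable file
    if ! test -x "$(command -v "${cmd}")"; then
      # Assumed error handling, not taken from the PR
      echo "This script requires \"${cmd}\" to be installed" >&2
      return 1
    fi
  done
}

check_dependencies sh ls && echo "all dependencies found"
```

One behavioural difference worth knowing: for shell builtins and functions `command -v` prints a bare name instead of a path, so the stricter `test -x` form generally rejects them, while the plain `command -v` check accepts them.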


# Deterministically derives 128 bits of cryptographically secure entropy
derive_entropy () {
identifier="${1}"


Still hate putting so much mustache { on a simple $1 😝

Member Author

See the light, it's the future.


echo "Deriving keys..."

backup_id=$(derive_entropy "umbrel_backup_id")


Nit, and for later, but I feel these should be exported and referenced somewhere instead of being hidden in a script, as they are really important!

Member Author

Yes, we should definitely be documenting all the deterministic derivation schemes somewhere.

I also think we should put up some simple web app that takes a seed and derives all the resulting keys for recovery purposes. People could run it offline, it serves as a simple reference for all the derivation schemes, and we should add all schemes to it and never remove the old schemes, even after they're removed from Umbrel.

scripts/backup/backup Outdated

mkdir -p "${BACKUP_ROOT}"

cp --archive "${UMBREL_ROOT}/lnd/data/chain/bitcoin/${BITCOIN_NETWORK}/channel.backup" "${BACKUP_ROOT}/channel.backup"

This comment was marked as outdated.

Member Author

@lukechilds lukechilds Aug 30, 2020

That's the file we're watching for changes and it's explicitly designed to be safe to copy:

On-Disk channel.backup
There are multiple ways of obtaining SCBs from lnd. The most commonly used method will likely be via the channels.backup file that's stored on-disk alongside the rest of the chain data. This is a special file that contains SCB entries for all currently open channels. Each time a channel is opened or closed, this file is updated on disk in a safe manner (atomic file rename). As a result, unlike the channel.db file, it's always safe to copy this file for backup at ones desired location.

https://github.com/lightningnetwork/lnd/blob/master/docs/recovery.md#on-disk-channelbackup
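The "atomic file rename" in the quoted docs is what makes a plain `cp` safe: a reader sees either the old file or the new one, never a half-written one. A minimal shell sketch of that write pattern follows; it is not lnd's code, and the `atomic_write` helper is hypothetical.

```bash
# Hypothetical helper: replace a file's contents atomically from stdin.
atomic_write () {
  target="${1}"
  # Create the temp file next to the target so the rename stays on one filesystem
  tmp="$(mktemp "${target}.XXXXXX")"
  cat > "${tmp}"
  # The rename swaps the new contents in as a single step
  mv -f "${tmp}" "${target}"
}

file="$(mktemp -d)/channel.backup"
printf 'old' | atomic_write "${file}"
printf 'new' | atomic_write "${file}"
cat "${file}"
```

A copy taken at any moment between the two writes would contain the complete old contents, which is the guarantee the backup script relies on.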


TIL! thank you

scripts/backup/backup

# Up to 10KB of random binary data
padding="$(shuf -i 0-10240 -n 1)"
dd if=/dev/urandom bs="${padding}" count=1 > "${BACKUP_ROOT}/.padding"


Or bs="$((RANDOM % 10<<10))" since it's BASH anyway.

Member Author

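$RANDOM % 10 is not uniformly random: $RANDOM is a random integer between 0 and 32767, and 32768 is not a multiple of 10, so the modulo skews slightly towards the lower values.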

I know we don't need this to be a cryptographically secure source since it's just adding some noise, but shuf is uniformly random, widely available, and the ranges are much more readable.
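Two details are worth spelling out here. In shell arithmetic `%` binds tighter than `<<`, so `RANDOM % 10<<10` evaluates as `(RANDOM % 10) << 10` and can only produce ten sizes (0, 1024, ..., 9216), whereas `shuf -i 0-10240` can pick any byte count in the range. The sketch below keeps the `shuf` approach; the `write_padding` helper and the use of `head -c` are assumptions for illustration, chosen because `head -c 0` simply yields an empty file while GNU `dd`, as far as I'm aware, rejects `bs=0`.

```bash
# Hypothetical helper: write between 0 and max_bytes random bytes to a file.
write_padding () {
  out_file="${1}"
  max_bytes="${2:-10240}"
  # shuf picks one integer uniformly from the inclusive range
  size="$(shuf -i "0-${max_bytes}" -n 1)"
  # head -c copes with a size of 0, producing an empty file
  head -c "${size}" /dev/urandom > "${out_file}"
}

padding_file="$(mktemp)"
write_padding "${padding_file}"
# Print the number of padding bytes written
wc -c < "${padding_file}"
```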

fi

# We need `sed 's/^.* //'` to trim the "(stdin)= " prefix from some versions of openssl
echo -n "${identifier}" | openssl dgst -sha256 -hmac "${umbrel_seed}" | sed 's/^.* //'

@meeDamian meeDamian Aug 30, 2020

Why seed as hmac and identifier as message?

Member Author

You mean why those input parameters or why are they that way round?


Why are they that way round

Member Author

@lukechilds lukechilds Aug 30, 2020

I guess because the official parameters are secret and message. The identifier is not secret, it's just a text string that's unique to that use case to result in a unique seed. It's not really a "message" but it's closer to being a message than a secret.

The seed is definitely not a message, and it is secret, so seemed more sensible to set it as the secret parameter.

But it's an HMAC not just a hash, so it's resistant to length extension attacks, so really I think either way would work just as well.

Had to pick one way and this seemed like the most logical to me.

Do you think it makes more sense the other way?

@lukechilds lukechilds marked this pull request as ready for review August 30, 2020 14:38
Member

@mayankchhabra mayankchhabra left a comment

Looks great, @lukechilds! Amazing execution and speed, man.

Ran into an edge case where if the decoy-trigger runs before the channel.backup even exists, the cp command fails and the lock file isn't deleted, thus preventing all future backups.

No worries, I'll make another PR (#193). We can merge this for now!
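One common way to guard against this class of failure is to release the lock from an EXIT trap and to skip cleanly when there is nothing to copy yet. The sketch below illustrates that idea only; it is not necessarily what #193 does, and the `run_backup` helper and its arguments are hypothetical.

```bash
# Hypothetical backup step; the body runs in a subshell so the trap stays local.
run_backup () (
  lock_file="${1}"
  source_file="${2}"
  dest_file="${3}"

  # noclobber makes the redirect fail if the lock already exists
  if ! ( set -o noclobber; : > "${lock_file}" ) 2>/dev/null; then
    echo "Backup already in progress" >&2
    exit 1
  fi
  # Remove the lock however this subshell exits, including on failure
  trap 'rm -f "${lock_file}"' EXIT

  # Nothing to back up yet: exit cleanly instead of letting cp fail
  if [ ! -f "${source_file}" ]; then
    echo "No channel backup yet, skipping" >&2
    exit 0
  fi
  cp "${source_file}" "${dest_file}"
)

work_dir="$(mktemp -d)"
run_backup "${work_dir}/lock" "${work_dir}/channel.backup" "${work_dir}/copy"
```

With the trap in place a failing `cp` can no longer leave a stale lock behind, so a decoy trigger that fires before `channel.backup` exists does not block later backups.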

@mayankchhabra mayankchhabra merged commit 473380e into getumbrel:master Aug 30, 2020
Contributor

@vindard vindard left a comment

Really interesting PR. I learned a few things for sure from reading through the code changes and comments 🙏🏽

@AndySchroder

Does a user have to actively opt out of this feature or is there a manual step to opt in to using this feature?

Ahmed262111 pushed a commit to Ahmed262111/umbrel that referenced this pull request May 27, 2024
Switching to BlueWallet-based ElectrumX connector which is a maintained fork of old ElectrumX connection code. Seems that some extra effort has been put into reconnecting and this also drops some manual error handling that my old code incorporated.

6 participants