Automatic deterministic backups #188
This reverts commit 853ee81.
Most are slight improvements, still looking into the important parts
@@ -97,6 +99,7 @@ services:
DOCKER_COMPOSE_DIRECTORY: $PWD
DEVICE_HOST: ${DEVICE_HOST:-http://umbrel.local}
MIDDLEWARE_API_URL: "http://10.11.2.2"
UMBREL_SEED_FILE: "/db/umbrel-seed/seed"
It's beyond the scope of this PR, but I've been experimenting with a setup where:
- secrets are stored encrypted at rest,
- decrypted into /run/umbrel/ (which by default is a tmpfs),
- umbrel-startup.service would have DirectoryNotEmpty= added, so that magic starts when the secrets are unleashed 🧙🏻‍♂️.
Even if the decryption key initially lives in plaintext on storage, this makes it easier to transition later (e.g. to smartphone decryption, or even to secrets actually being sent from the smartphone to the device: the device boots and sends a notification to the paired phone, the user confirms, and the secrets are transferred from 📱 to the tmpfs in the device's RAM), and it defeats some of the more naive secret-extraction approaches.
Sounds interesting.
We've already had some internal discussion on how to transition to alternative entropy sources in the future.
For example, this PR uses umbrel-seed, which is persisted on disk for non-interactive use cases, e.g. automated backups that don't require user interaction.
There could be another root seed, umbrel-interactive-seed, that requires umbrel-seed + user password + use-case identifier for derivation. This would never be stored on disk. So each time an app requires it, the user is prompted for their password to derive the entropy for that specific use case; the entropy is held in memory while it's being used, and then destroyed.
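A minimal sketch of how umbrel-interactive-seed could be derived, purely hypothetical: the two-step HMAC-SHA256 chain, the helper names and the argument order are all assumptions, not anything in this PR. A real scheme should stretch the password with a slow KDF (scrypt, Argon2) instead of a single HMAC, and note that unset in bash removes a variable but doesn't guarantee its memory is wiped.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL sketch of an "umbrel-interactive-seed" derivation (not in this PR).
# Assumption: a plain two-step HMAC-SHA256 chain via openssl; a real design
# should run the password through a slow KDF first.

hmac_sha256 () {
  local key="${1}" message="${2}"
  # sed trims the "...(stdin)= " prefix that openssl prints before the digest
  printf '%s' "${message}" | openssl dgst -sha256 -hmac "${key}" | sed 's/^.* //'
}

# derive_interactive_entropy SEED PASSWORD IDENTIFIER
# Prints 64 hex characters; nothing is written to disk.
derive_interactive_entropy () {
  local umbrel_seed="${1}" password="${2}" identifier="${3}" interactive_seed
  # Step 1: mix the on-disk seed with the user's password
  interactive_seed="$(hmac_sha256 "${umbrel_seed}" "${password}")"
  # Step 2: bind the result to one specific use case
  hmac_sha256 "${interactive_seed}" "${identifier}"
}

# Possible usage (interactive, so left as comments):
#   read -rsp "Password: " password
#   key="$(derive_interactive_entropy "$(cat "${UMBREL_SEED_FILE}")" "${password}" "some_app")"
#   ... use "${key}" ...
#   unset password key
```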
For example this PR is using umbrel-seed that is persisted on disk for non-interactive use-cases. e.g. automated backups without requiring user interaction.
If encryption is the only thing the backups need a secret for, using asymmetric crypto can solve it altogether, since the device would only need a public key to encrypt (totally not shilling age here again).
There could be another root seed, umbrel-interactive-seed that requires umbrel-seed + user password + use-case identifier for derivation. This would never be stored on disk. So each time an app requires it, the user is prompted for their password to derive the entropy for that specific use case, the entropy is in memory while it's being used, and then destroyed.
Not sure if you caught that the contents of /run/ are already RAM-only 🤔, so all decrypted material is always ephemeral. That tmpfs could also be encrypted, so that even if the Raspberry Pi gets deep-frozen, extracting secrets from RAM becomes harder.
Age was actually what I initially planned to use for this, but due to time constraints I just went with symmetric PGP encryption cos it's super simple to use and I'm very familiar with it.
We can completely change the algorithm/keys used for backups at any point in the future without too much trouble, btw.
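For reference, symmetric PGP encryption of a backup archive with GnuPG can be as small as the sketch below. It's an illustration, not the PR's actual code: it assumes GnuPG 2.1+ (for --pinentry-mode loopback), AES256 as the cipher, and hypothetical function names. The passphrase is fed on file descriptor 0 so it doesn't show up on gpg's command line.

```shell
#!/usr/bin/env bash
# Sketch: symmetric OpenPGP encrypt/decrypt with GnuPG 2.1+ (assumed).
# encrypt_backup / decrypt_backup are hypothetical names.

encrypt_backup () {
  local passphrase="${1}" input="${2}" output="${3}"
  # --passphrase-fd 0 reads the passphrase from stdin, the input comes from the file argument
  printf '%s' "${passphrase}" | gpg --batch --yes --quiet \
    --pinentry-mode loopback --passphrase-fd 0 \
    --symmetric --cipher-algo AES256 \
    --output "${output}" "${input}"
}

decrypt_backup () {
  local passphrase="${1}" input="${2}" output="${3}"
  printf '%s' "${passphrase}" | gpg --batch --yes --quiet \
    --pinentry-mode loopback --passphrase-fd 0 \
    --output "${output}" --decrypt "${input}"
}
```

Swapping this for age later would only touch these two functions, which matches the point that the backup algorithm/keys can be changed without much trouble.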
@@ -0,0 +1,5 @@
#!/usr/bin/env bash

UMBREL_ROOT="$(readlink -f $(dirname "${BASH_SOURCE[0]}")/../..)"
That really should be a global env var 😅
Definitely 😅
There are lots of common bash helpers/variables that should be sourced from a single location. I'm planning to clean this up after we've got some higher priority features shipped.
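One possible shape for that single location is a small helper that every script sources. This is a sketch under assumptions: the file location (something like scripts/lib/common.sh, two levels below the repo root) is hypothetical, and an already-exported UMBREL_ROOT wins over the computed one. It also quotes the inner $(dirname ...), which the line above leaves unquoted, so paths containing spaces resolve correctly.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL shared helper, e.g. scripts/lib/common.sh (path assumed).
# Adjust "../.." to the helper's depth below the repository root.

# Respect an UMBREL_ROOT that is already set; otherwise derive it from this
# file's location. Both command substitutions are quoted.
if [ -z "${UMBREL_ROOT:-}" ]; then
  UMBREL_ROOT="$(readlink -f "$(dirname "${BASH_SOURCE[0]}")/../..")"
fi
export UMBREL_ROOT
```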

check_dependencies () {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
[nit] but maybe if ! test -x "$(command -v "$cmd")" ?
Yep, I'll address this as part of #188 (comment)
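A complete helper using the test -x variant could look like the sketch below. The error message and returning a status (instead of exiting) are assumptions, since the rest of the original function isn't shown. One behavioral difference to be aware of: for a shell builtin or function, command -v prints just the name, so test -x fails and the command is reported as missing; only executable files on PATH pass.

```shell
#!/usr/bin/env bash
# Sketch of check_dependencies with the suggested `test -x` check.
# Returns 1 if any listed command is missing; message wording is assumed.

check_dependencies () {
  local cmd missing=0
  for cmd in "$@"; do
    # `command -v` prints the executable's path (empty if not found);
    # `test -x` then requires that path to be an executable file.
    if ! test -x "$(command -v "$cmd")"; then
      echo "This script requires \"${cmd}\" to be installed" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```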

# Deterministically derives 128 bits of cryptographically secure entropy
derive_entropy () {
  identifier="${1}"
Still hate putting so much mustache { on a simple $1 😝
See the light, it's the future.
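For $1 the braces are purely stylistic, but they do matter in two places: positional parameters past $9, and expansions followed by characters that could be part of a name (e.g. "${1}_suffix"). A tiny illustration, with hypothetical function names:

```shell
#!/usr/bin/env bash
# `$10` expands to "$1" followed by a literal "0"; `${10}` is the tenth argument.

tenth_without_braces () { echo "$10"; }
tenth_with_braces () { echo "${10}"; }
```

Calling both with the arguments a through j shows the difference: the first prints a0, the second prints j.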

echo "Deriving keys..."

backup_id=$(derive_entropy "umbrel_backup_id")
Nit, and for later, but I feel these should be exported and referenced somewhere instead of hidden in a script, as they are really important!
Yes, we should definitely be documenting all the deterministic derivation schemes somewhere.
I also think we should put up a simple web app that takes a seed and derives all the resulting keys for recovery purposes. People could run it offline, and it would serve as a simple reference for all the derivation schemes. We should add every scheme to it and never remove the old ones, even after they're removed from Umbrel.
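A command-line take on that reference tool could start as small as this. It's a hypothetical sketch: umbrel_backup_id is the identifier used in scripts/backup/backup and derive_entropy mirrors the HMAC-SHA256 construction shown in this PR, but umbrel_backup_encryption_key is a placeholder (the real key identifier isn't in this excerpt), and reading the seed with $(cat ...), which strips trailing newlines, is an assumption about how the seed file is consumed.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL offline derivation reference: print every value derived from a seed.

# Mirrors the PR's construction: HMAC-SHA256(key = umbrel_seed, message = identifier)
derive_entropy () {
  identifier="${1}"
  printf '%s' "${identifier}" | openssl dgst -sha256 -hmac "${umbrel_seed}" | sed 's/^.* //'
}

print_derivations () {
  local seed_file="${1}" identifier
  umbrel_seed="$(cat "${seed_file}")"
  # Append new identifiers here and never remove old ones,
  # so that old backups always stay recoverable.
  for identifier in "umbrel_backup_id" "umbrel_backup_encryption_key"; do
    printf '%s: %s\n' "${identifier}" "$(derive_entropy "${identifier}")"
  done
}
```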

mkdir -p "${BACKUP_ROOT}"

cp --archive "${UMBREL_ROOT}/lnd/data/chain/bitcoin/${BITCOIN_NETWORK}/channel.backup" "${BACKUP_ROOT}/channel.backup"
That's the file we're watching for changes and it's explicitly designed to be safe to copy:
On-Disk channel.backup
There are multiple ways of obtaining SCBs from lnd. The most commonly used method will likely be via the channels.backup file that's stored on-disk alongside the rest of the chain data. This is a special file that contains SCB entries for all currently open channels. Each time a channel is opened or closed, this file is updated on disk in a safe manner (atomic file rename). As a result, unlike the channel.db file, it's always safe to copy this file for backup at ones desired location.
https://github.com/lightningnetwork/lnd/blob/master/docs/recovery.md#on-disk-channelbackup
TIL! thank you

# Up to 10KB of random binary data
padding="$(shuf -i 0-10240 -n 1)"
dd if=/dev/urandom bs="${padding}" count=1 > "${BACKUP_ROOT}/.padding"
Or bs="$((RANDOM % 10<<10))" since it's BASH anyway.
$RANDOM only gives an integer between 0 and 32767, so reducing it to a different range with % isn't uniformly random.
I know we don't need this to be a cryptographically secure source since it's just adding some noise, but shuf is uniformly random, widely available, and the ranges are much more readable.
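Two details back this up. In bash arithmetic % binds tighter than <<, so RANDOM % 10<<10 is (RANDOM % 10) << 10, which yields only ten sizes (0, 1024, ..., 9216); and even RANDOM % (10<<10) is slightly biased, because 32768 isn't a multiple of 10240. Separately, if shuf ever picks 0, the dd call above would fail, since GNU dd rejects bs=0. The sketch below keeps shuf but writes the padding with head -c, which handles a size of 0; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: uniform padding size via shuf, padding written with head -c.

random_padding_size () {
  # Uniform integer in [0, 10240], same range as the PR
  shuf -i 0-10240 -n 1
}

write_padding () {
  local file="${1}" size
  size="$(random_padding_size)"
  # head -c copes with size 0, whereas GNU dd refuses bs=0
  head -c "${size}" /dev/urandom > "${file}"
}
```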
scripts/backup/backup (outdated)
  fi

  # We need `sed 's/^.* //'` to trim the "(stdin)= " prefix from some versions of openssl
  echo -n "${identifier}" | openssl dgst -sha256 -hmac "${umbrel_seed}" | sed 's/^.* //'
Why seed as hmac and identifier as message?
You mean why those input parameters or why are they that way round?
Why are they that way round?
I guess because the official parameters are secret and message. The identifier is not secret, it's just a text string that's unique to that use case to result in a unique seed. It's not really a "message" but it's closer to being a message than a secret.
The seed is definitely not a message, and it is secret, so it seemed more sensible to set it as the secret parameter.
But it's an HMAC, not just a hash, so it's resistant to length extension attacks; really, I think either way would work just as well.
Had to pick one way and this seemed like the most logical to me.
Do you think it makes more sense the other way?
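To make the trade-off concrete, the sketch below computes both orders with the same openssl pipeline. Both are deterministic; they just give unrelated outputs, so whichever order is chosen has to stay fixed once backups exist. The test checks the pipeline against RFC 4231 test case 2 (key "Jefe"), confirming it computes standard HMAC-SHA256. Two side observations, hedged since only an excerpt is shown: the output is 256 bits (64 hex characters), not the 128 bits the derive_entropy comment mentions, unless it's truncated elsewhere; and passing the seed via -hmac puts it on openssl's command line, where other local users may be able to read it from the process list.

```shell
#!/usr/bin/env bash
# Sketch: HMAC-SHA256 with key/message in either order (function names hypothetical).

hmac_sha256 () {
  local key="${1}" message="${2}"
  printf '%s' "${message}" | openssl dgst -sha256 -hmac "${key}" | sed 's/^.* //'
}

# Current choice in this PR: HMAC(key = seed, message = identifier)
seed_as_key () { hmac_sha256 "${1}" "${2}"; }
# Alternative: HMAC(key = identifier, message = seed)
identifier_as_key () { hmac_sha256 "${2}" "${1}"; }
```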
Looks great, @lukechilds! Amazing execution and speed, man.
Ran into an edge case where, if the decoy-trigger runs before channel.backup even exists, the cp command fails and the lock file isn't deleted, which prevents all future backups.
No worries, I'll make another PR (#193). We can merge this for now!
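One way to guard against that edge case, sketched under assumptions (the real lock location and the fix in #193 aren't shown here, so all paths are parameters and the function name is hypothetical): take the lock atomically with noclobber, release it from an EXIT trap so it's removed however the step ends, and skip cleanly when channel.backup doesn't exist yet.

```shell
#!/usr/bin/env bash
# Sketch: lock-protected channel.backup copy that never leaves a stale lock.
# The function body is a subshell, so `exit` and the EXIT trap stay local to it.

backup_channels () (
  source_file="${1}"; backup_root="${2}"; lock_file="${3}"

  # Take the lock atomically: with noclobber, `>` fails if the file already exists.
  if ! ( set -o noclobber; : > "${lock_file}" ) 2>/dev/null; then
    echo "Backup already in progress" >&2
    exit 1
  fi
  # From here on, remove the lock however this subshell exits (success or error).
  trap 'rm -f "${lock_file}"' EXIT

  # channel.backup may not exist yet, e.g. before LND has written it.
  if [ ! -f "${source_file}" ]; then
    echo "No channel.backup yet, skipping" >&2
    exit 0
  fi

  mkdir -p "${backup_root}"
  cp --archive "${source_file}" "${backup_root}/channel.backup"
)
```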
Really interesting PR. I learned a few things for sure from reading through the code changes and comments 🙏🏽
Does a user have to actively opt out of this feature, or is there a manual step to opt in to using it?
Switching to the BlueWallet-based ElectrumX connector, which is a maintained fork of the old ElectrumX connection code. It seems some extra effort has been put into reconnecting, and this also drops some manual error handling that my old code incorporated.
Replaces #165
This utilises the new Umbrel seed from getumbrel/umbrel-manager#39 to derive a deterministic backup id and encryption key from the user's mnemonic for automated remote backups of static channel backups and user settings.
The backups are encrypted client side before being uploaded over Tor and are padded with random data. Backups are made immediately as soon as any relevant data has changed such as user settings or channel open/close. However, Umbrel will also make decoy backups at random intervals.
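The decoy backups can be pictured as a loop that sleeps for a random interval and then triggers a backup even though nothing changed, so upload timing stops correlating with channel opens/closes or settings changes. This is a hypothetical sketch: the 1 to 12 hour bounds, the function names, and passing the trigger as a command are assumptions, not values from this PR.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL decoy-backup scheduler; interval bounds and trigger are assumed.

random_delay () {
  local min="${1}" max="${2}"
  # Uniform integer number of seconds in [min, max]
  shuf -i "${min}-${max}" -n 1
}

run_decoy_loop () {
  local trigger_command="${1}"
  while true; do
    sleep "$(random_delay 3600 43200)"   # between 1 and 12 hours
    "${trigger_command}"                 # e.g. a function that starts a normal backup
  done
}
```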
These features combined ensure that the backup server doesn't learn any sensitive information about the Umbrel.
Due to the key/id being deterministically derived from the Umbrel seed, all that's needed to fully recover an Umbrel is the mnemonic seed phrase. Upon recovery the device can automatically regenerate the same backup id/encryption key, request the latest backup from the backup server, decrypt it, and restore the user's settings and Lightning network channel data.