
Replicating encrypted child dataset + change-key + incremental receive overwrites master key of replica, causes permission denied on remount #12614

Open
brenc opened this issue Oct 5, 2021 · 21 comments
Labels
Component: Encryption ("native encryption" feature)
Component: Send/Recv ("zfs send/recv" feature)
Type: Defect (Incorrect behavior, e.g. crash, hang)

Comments

@brenc

brenc commented Oct 5, 2021

Type                  Version/Name
Distribution Name     several listed below, but mainly Debian (Proxmox VE 7)
Distribution Version  11
Kernel Version        5.11.22-4-pve
Architecture          x86_64
OpenZFS Version       zfs-2.0.5-pve1

This was posted in a comment to #12000. I was asked to open up a new bug report.

I just started using ZoL with native encryption and think I have hit the same or a similar bug (related to #6624 as well).

truncate -s 100M /root/src.img
truncate -s 100M /root/replica.img

zpool create src /root/src.img
zpool create replica /root/replica.img

zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt src/encrypted
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt replica/encrypted

zfs create src/encrypted/a

dd if=/dev/urandom of=/src/encrypted/a/test1.bin bs=1M count=1

zfs snap src/encrypted/a@test1

zfs send -Rvw src/encrypted/a@test1 | zfs receive -svF replica/encrypted/a

zfs mount -l replica/encrypted

zfs mount -l replica/encrypted/a

zfs change-key -i replica/encrypted/a

zfs umount -u replica/encrypted

zfs mount -l replica/encrypted

zfs mount replica/encrypted/a

All good at this point. Everything works as expected. Now, do an incremental send:

dd if=/dev/urandom of=/src/encrypted/a/test2.bin bs=1M count=1

zfs snap src/encrypted/a@test2

zfs send -RvwI @test1 src/encrypted/a@test2 | zfs receive -svF replica/encrypted/a

# ls -al /replica/encrypted/a/
total 2056
drwxr-xr-x 2 root root       4 Sep 26 03:59 .
drwxr-xr-x 3 root root       3 Sep 26 03:57 ..
-rw-r--r-- 1 root root 1048576 Sep 26 03:55 test1.bin
-rw-r--r-- 1 root root 1048576 Sep 26 03:59 test2.bin

Again, all good. Now unmount/mount:

zfs umount -u replica/encrypted

zfs mount -l replica/encrypted

# zfs get encryptionroot,keystatus -rt filesystem replica/encrypted
NAME                 PROPERTY        VALUE              SOURCE
replica/encrypted    encryptionroot  replica/encrypted  -
replica/encrypted    keystatus       available          -
replica/encrypted/a  encryptionroot  replica/encrypted  -
replica/encrypted/a  keystatus       available          -

# zfs mount -l replica/encrypted/a
cannot mount 'replica/encrypted/a': Permission denied

Yikes! This appears to have corrupted 10TB of backup filesystems. I've been trying to recover from this but no luck so far.

If I don't run change-key then I can send incrementals, unmount, and mount with no problem (I just have to enter the password twice). If I run change-key, unmount/mount is still no problem. It's running change-key and then sending an incremental snapshot that renders the filesystem unmountable.

After running change-key and sending an incremental, once the filesystem is unmounted it can't be mounted again. It looks like the encryption root absolutely has to be replicated to prevent this from happening. If I replicate the encryption root then everything works as expected.
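For reference, a minimal sketch of the working variant, replicating the encryption root itself instead of only the child (the snapshot names @r1/@r2 and the target name replica/encrypted_recv are just examples; the flags mirror the commands above):

zfs snap -r src/encrypted@r1
# raw-replicate the encryption root and its children; the replica keeps
# src/encrypted's wrapping key, so later raw incrementals stay consistent
zfs send -Rvw src/encrypted@r1 | zfs receive -sv replica/encrypted_recv

zfs snap -r src/encrypted@r2
zfs send -RvwI @r1 src/encrypted@r2 | zfs receive -svF replica/encrypted_recv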

I may have also uncovered another bug in trying to recover from this. If I run zfs change-key -o keylocation=prompt -o keyformat=passphrase replica/encrypted/a, after entering the new passwords the command hangs forever due to a panic. I have to completely reset the system.

[ 7080.228309] VERIFY3(0 == spa_keystore_dsl_key_hold_dd(dp->dp_spa, dd, FTAG, &dck)) failed (0 == 13)
[ 7080.228369] PANIC at dsl_crypt.c:1450:spa_keystore_change_key_sync_impl()
[ 7080.228399] Showing stack for process 1120
[ 7080.228403] CPU: 2 PID: 1120 Comm: txg_sync Tainted: P           O      5.11.0-36-generic #40-Ubuntu
[ 7080.228406] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
[ 7080.228408] Call Trace:
[ 7080.228414]  show_stack+0x52/0x58
[ 7080.228424]  dump_stack+0x70/0x8b
[ 7080.228431]  spl_dumpstack+0x29/0x2b [spl]
[ 7080.228448]  spl_panic+0xd4/0xfc [spl]
[ 7080.228459]  ? dsl_wrapping_key_rele.constprop.0+0x12/0x20 [zfs]
[ 7080.228597]  ? spa_keystore_dsl_key_hold_dd+0x1a8/0x200 [zfs]
[ 7080.228687]  spa_keystore_change_key_sync_impl+0x3c0/0x3d0 [zfs]
[ 7080.228776]  ? zap_lookup+0x16/0x20 [zfs]
[ 7080.228899]  spa_keystore_change_key_sync+0x157/0x3c0 [zfs]
[ 7080.228988]  ? dmu_buf_rele+0xe/0x10 [zfs]
[ 7080.229064]  ? dsl_dir_rele+0x30/0x40 [zfs]
[ 7080.229189]  ? spa_keystore_change_key_check+0x178/0x4f0 [zfs]
[ 7080.229324]  dsl_sync_task_sync+0xb5/0x100 [zfs]
[ 7080.229418]  dsl_pool_sync+0x365/0x3f0 [zfs]
[ 7080.229507]  spa_sync_iterate_to_convergence+0xe0/0x1e0 [zfs]
[ 7080.229609]  spa_sync+0x305/0x5b0 [zfs]
[ 7080.229718]  txg_sync_thread+0x26c/0x2f0 [zfs]
[ 7080.229835]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[ 7080.229952]  thread_generic_wrapper+0x79/0x90 [spl]
[ 7080.229963]  kthread+0x11f/0x140
[ 7080.229970]  ? __thread_exit+0x20/0x20 [spl]
[ 7080.229980]  ? set_kthread_struct+0x50/0x50
[ 7080.229984]  ret_from_fork+0x22/0x30

I've tested this on (all x86_64):

  • Proxmox VE 7 (Debian Bullseye with zfs-2.0.5-pve1, 5.11.22-4-pve)
  • Stock Debian Bullseye (zfs-2.0.3-9, 5.10.0-8-amd64)
  • Stock Ubuntu 20.04 LTS (zfs-2.0.2-1ubuntu5.1, 5.11.0-36-generic)
  • FreeBSD 13.0-RELEASE-p4 (zfs-2.0.0-FreeBSD_gf11b09dec)
@AttilaFueloep
Contributor

AttilaFueloep commented Oct 6, 2021

I can reproduce this issue on a somewhat current master. In short, the problem is that the incremental receive overwrites the master key of the replica/encrypted/a dataset with the received key, which is encrypted with the wrapping key from src/encrypted. Since the unencrypted master key is cached in memory, this goes unnoticed until it is unloaded by the unmount. A subsequent mount then tries to decrypt the master key with the wrapping key from replica/encrypted, which obviously fails.

Please see #12000 (comment) for the terminology used above.

For incremental receives we need to detect whether the encryption root on the receiving side has changed since the last receive, and refuse to receive if so. This would break replication from that point on but keep the existing data intact. I'm not sure how to accomplish this yet, but I'll have a look.
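Until something like that exists, a user-side guard along these lines could catch the specific re-parenting case from this issue before a raw incremental receive (a sketch only, using the dataset names from the reproducer above; it assumes the child was received standalone and would also refuse the safe case where the whole encryption root was replicated):

# refuse a raw incremental receive if the destination is no longer its own
# encryption root, i.e. it was re-parented with `zfs change-key -i` after
# the original full raw receive created it
dst=replica/encrypted/a
eroot=$(zfs get -H -o value encryptionroot "$dst")
if [ "$eroot" != "$dst" ]; then
    echo "refusing receive: $dst now inherits its key from $eroot" >&2
    exit 1
fi
zfs receive -svF "$dst"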

@behlendorf added the Component: Encryption ("native encryption" feature) label Oct 6, 2021
@robszy

robszy commented Oct 13, 2021

For incremental receives I think there should be no need to update/change the wrapped master key as is done now (the wrapped key hasn't changed on the source in the example above, yet it gets updated on the replica?), so there would be no problem decrypting the replica. There is only one master key, so we should be able to decrypt it whether the encryption root is inherited or not, just as in the change-key-only case.

@endotronic

Hey all, I hit this issue today. I was trying to send incremental snapshots to my backup pool

zfs send -Rw -I snap1 snap2 | zfs recv -dsu -x dedup -x compression receiver

After this, I could not mount. Since this is my backup pool, I tried to fix it by changing the key, and that gave me the panic and stack trace mentioned in this issue. Given how I got into this state, I can't tell whether my issue is this one or #12000.

While I'm in this state, I'm happy to provide anything that helps resolve the issue.

Also, if someone with a better understanding of ZFS encryption can help me resolve the issue, I'd really appreciate it. I reeeaaaally don't want to have to rebuild my backup pool if I can help it. I can't even say for sure right now that I haven't lost data, though I don't think I have, luckily.

My setup is something like:
pool <- encryption root
pool/fs1 <- inherits encryption key
pool/fs2 <- inherits encryption key

I'm pretty sure I got into this mess by doing a zfs recv pool, which I hoped would be one command that would receive all the descendant filesystems. It didn't actually succeed; I didn't realize it at the time, but that root filesystem didn't have the snapshot I thought it had. Still, it seems that it changed the encryption key or IVs? This data all came from my main pool, so if zfs send/recv doesn't decrypt over the wire and re-encrypt on the receiver, then maybe I can recover these from the main pool. If data is decrypted in flight and re-encrypted on the receiver, then I am not sure how to recover.

@endotronic

Ah, I think I understand the situation from @AttilaFueloep's comment in the other issue:

to recover the replica/encrypted/a dataset one would need a custom binary which decrypts the master key with the wrapping key from src/encrypted and reencrypts it with the replica/encrypted wrapping key.

Has anyone built such a tool? Is the master key accessible via zfs get/set or similar?

The plaintext from which the key was derived is the same on both of my systems, but I'm guessing the generated key must differ due to a salt or IV, or else the problem doesn't make sense to me. A fix feels so close, yet so far right now...

@brenc
Author

brenc commented Apr 8, 2022

I reeeaaaally don't want to have to rebuild my backup pool if I can help it

@endotronic I tried everything to rescue my 10TB of backup data but did not succeed. I had to start over.

@endotronic

@brenc I really appreciate the response here! Real bummer. I'm going to learn from your efforts (thank you so much) and just rebuild my backup pool then. I'm so glad I didn't actually lose anything.

@rincebrain
Contributor

rincebrain commented Jan 4, 2023

I have a really nasty workaround that prevents this that I'd like to refine further.

(In this specific case, the problem is that it resets the pbkdf2 salt, and only the pbkdf2 salt, on the child while it's still the same encryptionroot, without actually rewrapping the key.)

(It also should be trivial to work around, I think, if someone is burned by this...I have another hacky branch that would allow that.)

@marshalleq

marshalleq commented Oct 23, 2023

So in summary, we can't trust native (raw) send / receive with ZFS encryption. And an open issue for two years so far. Lovely.

@rincebrain added the Component: Send/Recv ("zfs send/recv" feature) label Oct 23, 2023
@endotronic

I haven't had any issues sending and receiving encrypted datasets as long as I don't try to do it recursively (zfs send -R). So in my opinion, this is a way to massively mess things up, but if you know about the issue and avoid -R, you'll be fine.

@digitalsignalperson

I haven't had any issues sending and receiving encrypted datasets as long as I don't try to do it recursively (zfs send -R). So in my opinion, this is a way to massively mess things up, but if you know about the issue and avoid -R, you'll be fine.

I documented some testing with -R raw sends back and forth here and didn't encounter any issues #12123 (comment)

@systemofapwne

systemofapwne commented Jan 3, 2024

I just fell for the same trap. Luckily, I noticed it quite quickly. My backup of a 2.4 TB dataset was simply unmountable. I wonder if my zvols (another TB) suffer from the same problem. They look fine so far; I will try to mount them soon to see if they are readable or utterly garbage. [Edit] The zvols are also affected. The full backup is completely garbage now.

I really wonder why this hasn't been solved after all these years. It sounds like a major, breaking bug.

@digitalsignalperson

digitalsignalperson commented Jan 4, 2024

@systemofapwne I'm planning an encrypted zfs system and am trying to figure out if I will run into this problem. My testing so far did not result in any issues (per my last comment, and more details in this FR: #15687).

Did you do a zfs change-key -i simply after receiving a new dataset from one identical encryption root to another? Or something else? Also, was it a replication stream with -R?

One mitigation I'm considering is to create all datasets in advance (E.g. 100-1000) so I never have to add new ones and thus never need to zfs change-key -i when receiving them on a replica system.
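A minimal sketch of that pre-creation idea (assuming a zero-padded numbering scheme under an encryption root rpool/enc):

# pre-create a fixed set of placeholder datasets under the encryption root;
# they inherit encryption from rpool/enc and no new ones are added later
for i in $(seq -w 0 999); do
    zfs create -o canmount=off "rpool/enc/$i"
done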

@systemofapwne

@systemofapwne I'm planning an encrypted zfs system and am trying to figure out if I will run into this problem. My testing so far did not result in any issues (per my last comment, and more details in this FR: #15687).

Did you do a zfs change-key -i simply after receiving a new dataset from one identical encryption root to another? Or something else? Also was it a replication steam with -R?

First: my zfs version is

zfs-2.2.2-1
zfs-kmod-2.2.2-1

Since I am using TrueNAS and not ZFS directly via the CLI, I will try to answer your questions as well as possible.

The Tank pool's root dataset is encrypted, as are all child datasets/zvols, which inherit from the pool's root dataset.

NAME  PROPERTY        VALUE        SOURCE
Tank  encryption      aes-256-gcm  -
Tank  encryptionroot  Tank         -

The Backup pool's root dataset is encrypted too, but distinct from Tank's (hence: different keying / encryptionroot).

NAME    PROPERTY        VALUE        SOURCE
Backup  encryption      aes-256-gcm  -
Backup  encryptionroot  Backup       -

Child datasets have been replicated from Tank -> Backup. I don't remember if I had recursive snapshots on at first, but it is definitely off now; the first replication went fine and the system went into production.

Those snapshots sent to Backup were encrypted and could be unlocked with the password that I used on Tank. I then enabled inheritance via the UI (equivalent to zfs change-key -i) on all datasets and zvols on Backup.
The Backup pool and all child datasets/zvols unlocked happily with this. No trouble whatsoever.

I then locked the Backup pool. Replicated snapshots kept flying in. Today, a few weeks later, I randomly decided to unlock the backup pool and bam: datasets couldn't be mounted, and the zvols seem to be unusable too.

One mitigation I'm considering is to create all datasets in advance (E.g. 100-1000) so I never have to add new ones and thus never need to zfs change-key -i when receiving them on a replica system.

Can you elaborate a bit on this?

@digitalsignalperson

digitalsignalperson commented Jan 4, 2024

Can you elaborate a bit on this?
@systemofapwne

I'll have maybe 4 systems I want to keep in sync with ZFS replication.

Let rpool/enc be the encryption root on the main server.
Assume there is a bunch of child datasets initially on this main server.

rpool/enc/project_1
rpool/enc/project_2
rpool/enc/application_x
rpool/enc/application_y

Initially, the other 3 servers do not have this filesystem structure. I'll need to replicate this to the other 3 servers.
I can do

zfs snapshot -r rpool/enc@initial
zfs send -R --raw rpool/enc@initial | ssh server2 zfs recv rpool/enc
zfs send -R --raw rpool/enc@initial | ssh server3 zfs recv rpool/enc
zfs send -R --raw rpool/enc@initial | ssh server4 zfs recv rpool/enc

And now whenever a snapshot occurs on any of my datasets, my replication can push incremental snapshots to the servers.
E.g., say I've been taking snapshots every day on the project_1 dataset. The update sends would be

zfs snapshot rpool/enc/project_1@2024-01-03
zfs send --raw -i @2024-01-02 rpool/enc/project_1@2024-01-03 | ssh server2 zfs recv rpool/enc/project_1
zfs send --raw -i @2024-01-02 rpool/enc/project_1@2024-01-03 | ssh server3 zfs recv rpool/enc/project_1
zfs send --raw -i @2024-01-02 rpool/enc/project_1@2024-01-03 | ssh server4 zfs recv rpool/enc/project_1

But now in the future, all of a sudden we have a new project! So on the main server we create rpool/enc/project_3. Now we need to push that to the other servers, but it will break the encryption root on the receive, and require loading the key on the received side and doing zfs change-key -i.

# create the new dataset
zfs create rpool/enc/project_3
zfs snapshot rpool/enc/project_3@new

# send it to all the other servers
zfs send --raw rpool/enc/project_3@new | ssh server2 zfs recv rpool/enc/project_3
zfs send --raw rpool/enc/project_3@new | ssh server3 zfs recv rpool/enc/project_3
zfs send --raw rpool/enc/project_3@new | ssh server4 zfs recv rpool/enc/project_3

# if you check the encryptionroot on the received side (zfs get encryptionroot rpool/enc/project_3)
# it will be rpool/enc/project_3 instead of rpool/enc
# so we need to fix it

ssh server2 zfs load-key rpool/enc/project_3
ssh server2 zfs change-key -i rpool/enc/project_3
ssh server3 zfs load-key rpool/enc/project_3
ssh server3 zfs change-key -i rpool/enc/project_3
ssh server4 zfs load-key rpool/enc/project_3
ssh server4 zfs change-key -i rpool/enc/project_3

So that's where I understand I might need zfs change-key -i, and where the possible risk from this bug lies. Especially if the "main" server role is taken over by any of the others, and different servers become the snapshot source.

And aside from any data loss risk, it's extra complexity to have to do these extra operations. And on some servers you might not ever want to load the key (a major benefit of ZFS encryption is having an "untrusted" server that never sees the keys).

So my workaround idea is to create say 1000 datasets.

rpool/enc/0000
rpool/enc/0001
rpool/enc/0002
...
rpool/enc/9997
rpool/enc/9998
rpool/enc/9999

and then the initial zfs snapshot -r rpool/enc@initial and zfs send -R --raw rpool/enc@initial will send all datasets that will ever be created. There will never be a new dataset added, and thus never a need to call zfs change-key -i on the received side.

I would use zfs properties or maybe a privileged text file containing the map of dataset names to actual filesystem names, and use some script to sort out mountpoints. (No need to mount the empty slots).
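A sketch of the property-based mapping idea (the user property name com.example:alias and the example values are hypothetical):

# tag a pre-created slot with the logical name it is currently used for
zfs set com.example:alias=project_1 rpool/enc/0000
zfs set com.example:alias=application_x rpool/enc/0001

# list the current slot-to-name mapping (only locally set values)
zfs get -r -t filesystem -s local -H -o name,value com.example:alias rpool/enc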

@digitalsignalperson

Sorry for spam @systemofapwne, FYI I accidentally smashed Send midway through typing my last response and had to spend another 5 mins editing it. So if you read it via email it won't make sense.

@systemofapwne

Can you elaborate a bit on this?
@systemofapwne

I'll have maybe 4 systems I want to keep in sync with ZFS replication. [...] So that's where I understand I might need zfs change-key -i, and where the possible risk from this bug lies. Especially if the "main" server role is taken over by any of the others, and different servers become the snapshot source.

I understand. That is almost my situation, but for a subtle difference:

  • rpool on your end seems to be the unencrypted pool root dataset, and enc a dataset that serves as the encryption root for its children.
  • On my end, the pool's root dataset is already encrypted (individually on source and backup target), so I only replicate child datasets but never the original encryptionroot.
  • By then using zfs change-key -i on the received child datasets on my backup target, I switch the encryptionroot of those replicated datasets to the backup pool's root dataset. This works fine until new snapshots are replicated, which somehow mess up the keys, as demonstrated in the OP.

And aside from any data loss risk, it's extra complexity to have to do these extra operations. [...] So my workaround idea is to create say 1000 datasets [...] There will never be a new dataset added, and thus never a need to call zfs change-key -i on the received side.

I see what you suggest here. TBH, I myself wouldn't mind so much having to use zfs change-key -i on newly received datasets, as long as it didn't render the datasets unusable as in my case. Still, I see the inconvenience.
But for your setup (unencrypted pool root, with a dedicated encrypted dataset serving as the replicated encryptionroot), you might not suffer from this bug as I do right now, right?

@digitalsignalperson

@systemofapwne thanks for explaining your situation. I re-read the original issue, which I realize is more like your situation.

I think what I missed/confused when reading it before was: yes, they are using raw sends, but the replica's encryption root is not the same (different IV set), and they are running zfs change-key -i on the received dataset under this different encryptionroot after the raw receive.

I.e. here the src and replica encryption roots are created differently:

zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt src/encrypted
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt replica/encrypted

So I appreciate this is still a valid issue, but I wonder why make the encryption roots different initially, and why not raw-send src/encrypted to replica/encrypted so it's the same encryption root? In my testing of the latter approach, sending data back and forth between src and replica, I did not encounter any unmountable datasets (so far). But it still requires load-key and change-key, as described, to fix the received encryptionroot for newly received datasets.

@systemofapwne

systemofapwne commented Jan 4, 2024

@systemofapwne thanks for explaining your situation. I re-read the original issue, which I realize is more like your situation.

I think what I missed/confused reading it before was, yes they are using raw sends, but the replica encryption root is not the same (different IV set), and they are doing the zfs change-key -i to the received dataset in this different encryptionroot after receiving the raw send.

I.e. here the src and replica encryption roots are created differently:

zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt src/encrypted
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt replica/encrypted

Exactly!

So I appreciate this is still a valid issue, but I wonder why make the encryption roots different initially, and why not raw send src/encrypted to replica/encrypted so it's the same encryption root? In my testing of the latter way, and sending data back and forth between the src/replica, I did not encounter any unmountable datasets (so far). But it still requires load-key and change-key, as described, to fix the received encryptionroot for new datasets received.

I completely understand how you avoid this problem now and why your strategy works. The problem here is that people without an in-depth insight into ZFS encryption inheritance (and replication messing with it) easily fall into this trap, like I and others did.
Especially when, as in my case, the pools' root datasets are (individually) encrypted and sending snapshots + making them inherit encryption seems to work at first (until it stops working on the next replication): ZFS should simply make sure that once zfs change-key -i has been used, replication won't overwrite the dataset's cryptographic properties when receiving incremental snapshots.


My current mitigation strategy is similar to this one, which is basically what you stated: use the same encryptionroot.

Luckily, I can easily change my production pool to this setup and redo the replication to my backups, which should then not break.

My current pool setup, which breaks encryption when replicating:

  Tank              <-- Main Pool: Encrypted root dataset
     zroot          <-- pseudo root dataset. Inherits encryption (encryptionroot is Tank)
       dataset1     <-- inherits enc (encryptionroot is Tank)
       zvol1        <-- inherits enc (encryptionroot is Tank)
       ...

  Backup            <-- Backup Pool: Encrypted root dataset (different keying than 'Tank')
     zroot          <-- pseudo root dataset. Inherits encryption (encryptionroot is Backup)
       dataset1     <-- replicated from Tank. inherits enc. (encryptionroot is Backup)
       zvol1        <-- replicated from Tank. inherits enc. (encryptionroot is Backup)
       ...

My future setup:
Remove inheritance on Tank/zroot and replicate Tank/zroot -> Backup/zroot. Then replicate Tank/zroot/* to Backup/zroot/* and run zfs change-key -i on the replicated data under Backup/zroot/*, but never on Backup/zroot itself (this must stay absolutely unchanged!):

  Tank              <-- Main Pool: Encrypted root dataset
     zroot          <-- pseudo root dataset. Unlocks via password or keyfile. **No inherited encryption**
       dataset1     <-- inherits enc (encryptionroot is Tank/zroot)
       zvol1        <-- inherits enc (encryptionroot is Tank/zroot)
       ...

  Backup            <-- Backup Pool: Either unencrypted or encrypted root dataset (different keying than 'Tank')
     zroot          <-- replicated from Tank. Unlocks via password or keyfile. **No inherited encryption**
       dataset1     <-- replicated from Tank, inheritance turned on (encryptionroot now is Backup/zroot)
       zvol1        <-- replicated from Tank, inheritance turned on (encryptionroot now is Backup/zroot)
       ...

On a test system, all datasets replicated from Tank/zroot/* -> Backup/zroot/* now seem to remain functional, since they share the same encryptionroot (Tank/zroot and Backup/zroot respectively).

NOTE:
If I were creating the 'Tank' pool from scratch, I would not enable encryption on it directly, but only on Tank/zroot.
For my current situation, though, changing inheritance just on Tank/zroot is an easy in-place operation, leaving my production pool running and enabling working replication now.
This however won't spare me from copying all data to a (fresh) Backup pool again. But that is OK for me.

NOTE2:
If one ever changes the encryption properties (password, keyfile, iterations, etc.) on the encryptionroot (here: Tank/zroot), one absolutely needs to replicate this change to the backup pools. Otherwise, child datasets will become unreadable again.
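A minimal sketch of the in-place change and re-replication described above (assuming passphrase keys; the snapshot name @migrate is just an example, and Backup/zroot is assumed not to exist yet on the backup pool):

# make Tank/zroot its own encryption root instead of inheriting from Tank
# (prompts for a new passphrase; the children below it then inherit from Tank/zroot)
zfs change-key -o keyformat=passphrase -o keylocation=prompt Tank/zroot

# raw-replicate the new encryption root and everything below it to the backup pool
zfs snapshot -r Tank/zroot@migrate
zfs send -Rw Tank/zroot@migrate | zfs recv -u Backup/zroot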

@digitalsignalperson

For the original "permission denied" issue, I think I have a couple of solutions that work so the data can be mounted again.

Setup to reproduce the bug

  1. Set up the test pool
dd if=/dev/zero of=/root/zpool bs=1M count=4096
zpool create testpool /root/zpool -m /mnt/testpool
  2. Create encryption root and data
echo "12345678" | zfs create -o canmount=off -o encryption=on -o keylocation=prompt -o keyformat=passphrase testpool/enc
zfs create testpool/enc/data
zfs snapshot -r testpool/enc@1
  3. Send data to replica
zfs send -Rw testpool/enc@1 | zfs recv testpool/enc_copy
echo "12345678" | zfs load-key testpool/enc_copy
zfs mount -a
zfs mount

the output is

testpool                        /mnt/testpool
testpool/enc/data               /mnt/testpool/enc/data
testpool/enc_copy/data          /mnt/testpool/enc_copy/data
  4. Change key on origin, take new snapshot, and incremental send to replica
echo "87654321" | zfs change-key testpool/enc
touch /mnt/testpool/enc/data/x
zfs snapshot testpool/enc/data@2
zfs send -wi @1 testpool/enc/data@2 | zfs recv testpool/enc_copy/data

Oops, this may give "cannot receive incremental stream: destination testpool/enc_copy/data has been modified since most recent snapshot", which is another interesting bug, so use -F on zfs recv instead.

zfs send -wi @1 testpool/enc/data@2 | zfs recv testpool/enc_copy/data -F
zfs umount -a
zfs mount -a

Now we have the error: cannot mount 'testpool/enc_copy/data': Permission denied

zfs unload-key testpool/enc_copy
echo "12345678" | zfs load-key testpool/enc_copy
zfs mount -a
zfs mount

still permission denied.

Solution 1

This involves sending an incremental change from the source encryption root after the key has changed, which may not be possible if for some reason you didn't keep common snapshots on the enc / enc_copy encryptionroots. If that is the case, see Solution 2.

zfs snapshot testpool/enc@newkey
zfs send -wi @1 testpool/enc@newkey | zfs recv testpool/enc_copy
zfs mount -a
# still denied. reload keys
zfs unload-key testpool/enc_copy
echo "87654321" | zfs load-key testpool/enc_copy
zfs mount -a
zfs mount

this results in everything mounting

testpool                        /mnt/testpool
testpool/enc/data               /mnt/testpool/enc/data
testpool/enc_copy/data          /mnt/testpool/enc_copy/data

Solution 2

Send a completely new copy of the encryption root with the new key. Luckily this is probably an empty dataset.

zfs send -w testpool/enc@1 | zfs recv testpool/enc_copy2

We need to transplant the broken dataset over to this, and then inherit the encryption root.

The following doesn't work:

zfs rename testpool/enc_copy/data testpool/enc_copy2/data

"cannot rename 'testpool/enc_copy/data': cannot move encrypted child outside of its encryption root"

We could always zfs send it, but that is costly. The solution I found is to use the crypt command force_new_key to turn the encrypted child into a standalone encryption root, then move it.

Current encryptionroots:

zfs get encryptionroot -t filesystem
NAME                    PROPERTY        VALUE               SOURCE
testpool                encryptionroot  -                   -
testpool/enc            encryptionroot  testpool/enc        -
testpool/enc/data       encryptionroot  testpool/enc        -
testpool/enc_copy       encryptionroot  testpool/enc_copy   -
testpool/enc_copy/data  encryptionroot  testpool/enc_copy   -
testpool/enc_copy2      encryptionroot  testpool/enc_copy2  -

Now force new key

python -c 'import libzfs_core; libzfs_core.lzc_change_key(b"testpool/enc_copy/data", "force_new_key")'

zfs get encryptionroot -t filesystem
NAME                    PROPERTY        VALUE                   SOURCE
testpool                encryptionroot  -                       -
testpool/enc            encryptionroot  testpool/enc            -
testpool/enc/data       encryptionroot  testpool/enc            -
testpool/enc_copy       encryptionroot  testpool/enc_copy       -
testpool/enc_copy/data  encryptionroot  testpool/enc_copy/data  -
testpool/enc_copy2      encryptionroot  testpool/enc_copy2      -

Now we can move it

zfs rename testpool/enc_copy/data testpool/enc_copy2/data

zfs get encryptionroot -t filesystem
NAME                     PROPERTY        VALUE                    SOURCE
testpool                 encryptionroot  -                        -
testpool/enc             encryptionroot  testpool/enc             -
testpool/enc/data        encryptionroot  testpool/enc             -
testpool/enc_copy        encryptionroot  testpool/enc_copy        -
testpool/enc_copy2       encryptionroot  testpool/enc_copy2       -
testpool/enc_copy2/data  encryptionroot  testpool/enc_copy2/data  -

Final step is to force inherit

python -c 'import libzfs_core; libzfs_core.lzc_change_key(b"testpool/enc_copy2/data", "force_inherit")'

zfs get encryptionroot -t filesystem
NAME                     PROPERTY        VALUE               SOURCE
testpool                 encryptionroot  -                   -
testpool/enc             encryptionroot  testpool/enc        -
testpool/enc/data        encryptionroot  testpool/enc        -
testpool/enc_copy        encryptionroot  testpool/enc_copy   -
testpool/enc_copy2       encryptionroot  testpool/enc_copy2  -
testpool/enc_copy2/data  encryptionroot  testpool/enc_copy2  -

Now if we load the key, mounting works

echo "87654321" | zfs load-key testpool/enc_copy2
zfs mount -a
zfs mount

output:

testpool                        /mnt/testpool
testpool/enc/data               /mnt/testpool/enc/data
testpool/enc_copy2              /mnt/testpool/enc_copy2
testpool/enc_copy2/data         /mnt/testpool/enc_copy2/data

the bad one can be deleted

zfs destroy testpool/enc_copy -r

Maybe there are some other solutions based on similar ideas. Note the force_inherit/force_new_key commands aren't available in the zfs cli right now, but see #15821

@olidal

olidal commented Mar 3, 2024

Same issue here, with kernel 5.15.143-1-pve (proxmox) and zfsutils-linux 2.1.14-pve1:

Dataset fails to mount with permission denied after reboot.

This dataset had a somewhat complicated history, as described above:

  • involving a replica of encrypted dataset (using send -Rw)
  • switching encryption root at some point between child dataset and parent
  • worked fine... until I rebooted

I suspected an encryption key issue and tried to change the key on the non-mounting replica, and got this nice oops (this one was with kernel 5.15.126, but the same happens on all versions I tested: 5.15.107, 5.15.126, 5.15.131, 5.15.143):
[Sun Mar 3 14:06:47 2024] VERIFY3(0 == spa_keystore_dsl_key_hold_dd(dp->dp_spa, dd, FTAG, &dck)) failed (0 == 13)
[Sun Mar 3 14:06:47 2024] PANIC at dsl_crypt.c:1450:spa_keystore_change_key_sync_impl()
[Sun Mar 3 14:06:47 2024] Showing stack for process 1424
[Sun Mar 3 14:06:47 2024] CPU: 3 PID: 1424 Comm: txg_sync Tainted: PO 5.15.126-1-pve #1
[Sun Mar 3 14:06:47 2024] Hardware name: Micro-Star International Co., Ltd. MS-7C89/H410M PRO (MS-7C89), BIOS 1.80 11/16/2020
[Sun Mar 3 14:06:47 2024] Call Trace:
[Sun Mar 3 14:06:47 2024]
[Sun Mar 3 14:06:47 2024] dump_stack_lvl+0x4a/0x63
[Sun Mar 3 14:06:47 2024] dump_stack+0x10/0x16
[Sun Mar 3 14:06:47 2024] spl_dumpstack+0x29/0x2f [spl]
[Sun Mar 3 14:06:47 2024] spl_panic+0xd1/0xe9 [spl]
[Sun Mar 3 14:06:47 2024] ? spa_keystore_dsl_key_hold_dd.isra.0+0xf4/0x270 [zfs]
[Sun Mar 3 14:06:47 2024] spa_keystore_change_key_sync_impl+0x42d/0x440 [zfs]
[Sun Mar 3 14:06:47 2024] spa_keystore_change_key_sync+0x18e/0x480 [zfs]
[Sun Mar 3 14:06:47 2024] ? dmu_buf_rele+0x3d/0x50 [zfs]
[Sun Mar 3 14:06:47 2024] ? dsl_dir_rele+0x30/0x40 [zfs]
[Sun Mar 3 14:06:47 2024] ? spa_keystore_change_key_check+0x1a5/0x550 [zfs]
[Sun Mar 3 14:06:47 2024] dsl_sync_task_sync+0xb7/0x110 [zfs]

@digitalsignalperson, am I right in understanding that your Solution 2 only works if you still have both the source and the replica? So if I am left with only the replica, I have no option to recover my data?

@lschuermann

Coincidentally, I also just ran into this issue and thought I lost a bunch of data. Managed to recover thanks to @digitalsignalperson's advice and I figured it's worth noting that Solution 2 also works for partially broken streams.

In my case, I still had the original source of the dataset, but only intermittent access to it (it would crash every couple of minutes). However, even the first couple of MBs of the zfs send -w stream, when received with zfs recv -s, created a dataset that could not itself be mounted but could be used as an encryptionroot for the full dataset that was unmountable.

I think this might also come in handy for folks who do not have an empty encryptionroot and for whom sending the full dataset is too expensive. Sending the first couple of megabytes seems to be enough to transfer the encrypted key material into the partially received dataset, which can then be used to decrypt the complete dataset (though I'm not sure whether there are any risks with this that I'm not aware of).
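A rough sketch of that partial-receive trick (the dataset names and the 64 MiB cutoff are just examples; the receive is expected to abort on the truncated stream, with -s keeping the partial state around):

# send only the beginning of the raw stream; the receive fails on the
# truncated stream, but -s preserves the partially received dataset
zfs send -w src/encrypted@snap | dd bs=1M count=64 | zfs recv -s replica/keysource

# the partial dataset carries the key material wrapped for the source's
# encryption root and may then serve as the encryptionroot donor in Solution 2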
