zfs receive crashed with "internal error: Invalid argument" #7755

Closed
philwo opened this issue Jul 29, 2018 · 5 comments · Fixed by #7757
philwo commented Jul 29, 2018

System information

Type Version/Name
Distribution Name Arch Linux
Distribution Version current
Linux Kernel 4.14.56-1-lts
Architecture x86_64
ZFS Version 0.7.9-1
SPL Version 0.7.9-1

Describe the problem you're observing

"zfs receive" crashed just when it was done receiving 875G of data. Although this worries me a bit, the destination seems to be fine and all data is there.

[root@philwo-nas ~]# zfs snapshot -r pool@now
[root@philwo-nas ~]# zfs send -R pool@now | zfs recv -Fdu -o checksum=sha512 tank
internal error: Invalid argument
Aborted (core dumped)

Describe how to reproduce the problem

I can reproduce it with the two steps shown above. Here's what I did in full. (I'm playing around with ZFS on a test server and am migrating some data from a two-disk mirror to a four-disk RAIDZ1.)

  1. Starting with a two-disk mirror pool called 'pool'.
  2. Detach one of the mirrored disks: zpool detach pool ata-WDC_WD40EZRX-00SPEB0_WD-WCC4E5KU5YRA. Add two more disks to the server. Now I have three free disks and can create a degraded four-disk RAIDZ1 for the migration.
  3. Create a sparse disk image that represents the (to be replaced later) fourth disk of my RAIDZ1: truncate -s 4000787030016 /fake-WDC_WD40EZRX-00SPEB0_WD-WCC4E4UVASH2
  4. Create the new pool 'tank' with the new RAIDZ1:
zpool create -f -o ashift=12 -O checksum=sha512 -O compression=lz4 -O normalization=formD -O xattr=sa -O relatime=on tank raidz1 \
  /fake-WDC_WD40EZRX-00SPEB0_WD-WCC4E4UVASH2 \
  /dev/disk/by-id/ata-WDC_WD40EZRX-00SPEB0_WD-WCC4E5KU5YRA \
  /dev/disk/by-id/ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ0913328 \
  /dev/disk/by-id/ata-WDC_WD30EZRX-00SPEB0_WD-WCC4E1VNT317
  5. Take the disk image offline, so that it won't be used: zpool offline tank /fake-WDC_WD40EZRX-00SPEB0_WD-WCC4E4UVASH2
  6. Create a snapshot on the old pool: zfs snapshot -r pool@now
  7. Migrate the data to the new pool (and also migrate to sha512 as the checksum): zfs send -R pool@now | zfs recv -Fdu -o checksum=sha512 tank
  8. Got the error message shown above (see the workaround sketch after this list).
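Since 'tank' was itself created with -O checksum=sha512, the received datasets inherit sha512 even without the receive-time override; a plausible workaround sketch, assuming the -o flag is what triggers the abort (as is confirmed further down in this thread):

# Workaround sketch: rely on the pool-level -O checksum=sha512 instead of
# overriding the property at receive time:
zfs send -R pool@now | zfs recv -Fdu tank
# Afterwards, confirm the property on the received datasets:
zfs get -r -o name,value,source checksum tank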

Include any warning/errors/backtraces from the system logs

Jul 30 00:04:31 philwo-nas systemd-coredump[7394]: Process 10773 (zfs) of user 0 dumped core.
                                                   
Stack trace of thread 10773:
#0  0x00007fe14cf9a86b raise (libc.so.6)
#1  0x00007fe14cf8540e abort (libc.so.6)
#2  0x00007fe14d77d329 n/a (libzfs.so.2)
#3  0x00007fe14d77d979 zfs_standard_error_fmt (libzfs.so.2)
#4  0x00007fe14d776b99 n/a (libzfs.so.2)
#5  0x00007fe14d778e05 n/a (libzfs.so.2)
#6  0x00007fe14d77930f n/a (libzfs.so.2)
#7  0x00007fe14d77b9d2 zfs_receive (libzfs.so.2)
#8  0x000055a96d5f0f91 n/a (zfs)
#9  0x000055a96d5e8d38 n/a (zfs)
#10 0x00007fe14cf8706b __libc_start_main (libc.so.6)
#11 0x000055a96d5e8f5a n/a (zfs)
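Most frames resolve to n/a because the stripped binaries lack debug symbols. Since systemd-coredump captured the dump, the core can be reopened once symbols are available; a sketch, assuming coredumpctl can match on the PID from the log line above:

# Reopen the captured core in gdb:
coredumpctl list zfs
coredumpctl gdb 10773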

There's nothing else of relevance in the syslog (e.g. no messages from the kernel).

Please let me know if I can provide more information that would help.

@rincebrain (Contributor)

My immediate suggestion would be to cut a new set of ZFS userland binaries with the same version but with debug symbols, so you can get a usable backtrace. (That crash is probably something inane like trying to string format and dereferencing into a null struct, though, if I had to guess.)

What's the state of the receive on the destination - e.g. is there a receive_resume_token property on the dataset, and does it crash again in the same fashion if you try resuming with it?

This reminds me of #7576.
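A minimal sketch of that check, assuming the receive was started with zfs recv -s (only resumable receives leave a token behind; 'tank' stands in for whichever dataset was being received into):

# Look for a partially-received state on the destination:
zfs get -H -o value receive_resume_token tank
# If a token is present, resuming looks roughly like this:
token=$(zfs get -H -o value receive_resume_token tank)
zfs send -t "$token" | zfs recv -s tank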

philwo (Author) commented Jul 30, 2018

Thanks @rincebrain!

I found a nice way to reproduce this in just a few steps. The culprit is passing the "-o checksum=sha512" flag to "zfs recv". If I remove that flag, it works. Interestingly, the filesystem does have the correct checksum set when I verify it afterwards via "zfs get checksum". (However, I don't know of a way to verify that the contents were actually hashed with that checksum.)
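One possible way to check, added here as a hedged sketch: zdb can dump a dataset's block pointers, and each block pointer names the checksum algorithm it was written with (the exact output format varies between versions):

# Dump object metadata for the received dataset; the blkptr lines should
# name the checksum actually used (e.g. "sha512"):
zdb -ddddd test2/sub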

Repro steps:

truncate -s 1000000000 /disk1
truncate -s 1000000000 /disk2
zpool create -f test1 /disk1
zpool create -f test2 /disk2
zfs create test1/sub
echo hello > /test1/sub/world.txt
zfs snapshot -r test1@now
zfs send -R test1@now > /stream
# Crash:
zfs recv -Fd -o checksum=sha512 test2 < /stream
# Debugged with:
# gdb zfs -ex "run recv -Fd -o checksum=sha512 test2 < /stream"
zpool destroy test1
zpool destroy test2
rm -f /disk{1,2}

Attaching the backtrace that I got from gdb after I rebuilt the zfs-utils with full debugging symbols and -O0: gdb.txt
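For reference, one way such a build can be produced from the upstream source tree (plain autoconf flags assumed; on Arch these would go into the zfs-utils PKGBUILD):

# Build the userland tools unoptimized and with debug info:
./configure CFLAGS="-g -O0"
make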

Regarding the state of the receive: There's no receive_resume_token property set on the destination filesystem, even if I pass the "-s" flag to "zfs recv". All files show up in the destination filesystem and the contents seem to be fine.

I agree that this looks to be the same issue as your #7576.

@rincebrain (Contributor)

I think this might be similar to the problem that #7478 was supposed to fix; tagging @loli10K to see if they agree.

loli10K (Contributor) commented Jul 31, 2018

I think this might be similar to the problem that #7478 was supposed to fix

#7478 fixes how we identify a top-level dataset in an incremental replication stream with intermediary snapshots (-I), and that change is entirely in userland code.

This new issue seems to affect only some properties, the reproducer doesn't use -I (or -i), and the error appears to be coming from the kernel.
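If that's right, other properties verified in the same kernel path should fail the same way, while unverified ones should be accepted. A hedged probe, reusing the saved stream from above (dedup is assumed to be affected because the eventual fix below verifies it alongside checksum):

# Assumed to abort like checksum=sha512, since 'dedup' is verified in the
# same zfs_check_settable() path:
zfs recv -Fd -o dedup=on test2 < /stream
# A property not verified there is assumed to be accepted, e.g.:
zfs recv -Fd -o compression=gzip test2 < /stream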

@rincebrain (Contributor)

Well, that will teach me to try and debug problems too late at night, but TY for finding and fixing the actual problem anyway! :)

behlendorf pushed a commit that referenced this issue Aug 3, 2018
This change modifies how 'checksum' and 'dedup' properties are verified
in zfs_check_settable() handling the case where they are explicitly
inherited in the dataset hierarchy when receiving a recursive send
stream.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7755 
Closes #7576 
Closes #7757
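To illustrate what "explicitly inherited in the dataset hierarchy" means for the reproducer above (the property sources shown are assumed; exact output varies):

# After 'zfs recv -Fd -o checksum=sha512 test2 < /stream', the override is
# set on the top-level dataset and the child inherits it; it is this
# inherit request that previously tripped the verification:
zfs get -r -o name,property,value,source checksum test2
# Assumed output:
# NAME        PROPERTY  VALUE   SOURCE
# test2       checksum  sha512  received
# test2/sub   checksum  sha512  inherited from test2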