
cannot use zfs send recursively #1388

Closed · prometheanfire opened this issue Apr 4, 2013 · 19 comments

@prometheanfire (Contributor)

zfs send -Rvvvvvvvv khorne-m4-zp00@20130403-backup | pigz -c > /mnt/backups/${date}-backup-zfssend.gz
send from @ to khorne-m4-zp00@20130403-backup estimated size is 15K
send from @ to khorne-m4-zp00/ROOT@20130403-backup estimated size is 15K
send from @ to khorne-m4-zp00/ROOT/rootfs@20130403-backup estimated size is 70.9M
send from @ to khorne-m4-zp00/ROOT/rootfs/USR@20130403-backup estimated size is 5.30G
send from @ to khorne-m4-zp00/ROOT/rootfs/OPT@20130403-backup estimated size is 595M
send from @ to khorne-m4-zp00/ROOT/rootfs/VAR@20130403-backup estimated size is 1.04G
send from @ to khorne-m4-zp00/GENTOO@20130403-backup estimated size is 14.5K
send from @ to khorne-m4-zp00/GENTOO/portage@20130403-backup estimated size is 340M
send from @ to khorne-m4-zp00/GENTOO/distfiles@20130403-backup estimated size is 4.67G
send from @ to khorne-m4-zp00/GENTOO/packages@20130403-backup estimated size is 2.69G
send from @ to khorne-m4-zp00/HOME@20130403-backup estimated size is 15.5K
send from @ to khorne-m4-zp00/HOME/root@20130403-backup estimated size is 30.3M
send from @ to khorne-m4-zp00/HOME/mthode@20130403-backup estimated size is 95.1G
total estimated size is 110G
TIME SENT SNAPSHOT
internal error: Invalid argument

here, have a core dump
http://dev.gentoo.org/~prometheanfire/crash-logs/zfs-core-0.6.1-20130403.dump

@behlendorf (Contributor)

This looks new to me. The core would be helpful, but I don't have the symbols for your build of the zfs tools. Can you fire up gdb and post the backtrace from the core?

gdb /sbin/zfs <core>
bt

@prometheanfire (Contributor, Author)

here you go

#0 0x00000307a4291b85 in raise () from /lib64/libc.so.6
#1 0x00000307a4292ffb in abort () from /lib64/libc.so.6
#2 0x00000307a484bdf2 in ?? () from /lib64/libzfs.so.1
#3 0x00000307a484c1f8 in zfs_standard_error_fmt () from /lib64/libzfs.so.1
#4 0x00000307a4845185 in ?? () from /lib64/libzfs.so.1
#5 0x00000307a4849b48 in ?? () from /lib64/libzfs.so.1
#6 0x00000307a483a121 in zfs_iter_snapshots_sorted () from /lib64/libzfs.so.1
#7 0x00000307a48454e2 in ?? () from /lib64/libzfs.so.1
#8 0x00000307a48495a9 in ?? () from /lib64/libzfs.so.1
#9 0x00000307a484a445 in zfs_send () from /lib64/libzfs.so.1
#10 0x00000000004094c1 in ?? ()
#11 0x00000000004050b2 in ?? ()
#12 0x00000307a427e5dd in __libc_start_main () from /lib64/libc.so.6
#13 0x00000000004051e5 in ?? ()
#14 0x000003ca7b71d208 in ?? ()
#15 0x00000000ffffffff in ?? ()
#16 0x0000000000000004 in ?? ()
#17 0x000003ca7b71d4cd in ?? ()
#18 0x000003ca7b71d4d1 in ?? ()
#19 0x000003ca7b71d4d6 in ?? ()
#20 0x000003ca7b71d4d9 in ?? ()
#21 0x0000000000000000 in ?? ()

@prometheanfire (Contributor, Author)

now with more debug symbols

#0 0x00000307a4291b85 in raise () from /lib64/libc.so.6
#1 0x00000307a4292ffb in abort () from /lib64/libc.so.6
#2 0x00000307a484bdf2 in zfs_verror (error=2070, hdl=0x3c2a480, fmt=<optimized out>, ap=<optimized out>) at ../../lib/libzfs/libzfs_util.c:270
#3 zfs_verror (hdl=0x3c2a480, error=2070, fmt=<optimized out>, ap=<optimized out>) at ../../lib/libzfs/libzfs_util.c:255
#4 0x00000307a484c1f8 in zfs_standard_error_fmt (hdl=0x3c2a480, error=22, fmt=0x307a4855811 "%s") at ../../lib/libzfs/libzfs_util.c:395
#5 0x00000307a4845185 in dump_ioctl (zhp=0x3c43180, fromsnap=0x3ca7b71c9e0 "", fromsnap_obj=0, fromorigin=<optimized out>, outfd=1, debugnv=0x0) at ../../lib/libzfs/libzfs_sendrecv.c:940
#6 0x00000307a4849b48 in dump_snapshot (zhp=0x3c43180, arg=0x3ca7b71c9d0) at ../../lib/libzfs/libzfs_sendrecv.c:1177
#7 0x00000307a483a121 in zfs_iter_snapshots_sorted (zhp=<optimized out>, callback=0x307a4849880 <dump_snapshot>, data=0x3ca7b71c9d0) at ../../lib/libzfs/libzfs_iter.c:245
#8 0x00000307a48454e2 in dump_filesystem (zhp=0x3c2e9b0, arg=0x3ca7b71c9d0) at ../../lib/libzfs/libzfs_sendrecv.c:1231
#9 0x00000307a48495a9 in dump_filesystems (rzhp=0x3c2b2e0, arg=0x3ca7b71c9d0) at ../../lib/libzfs/libzfs_sendrecv.c:1338
#10 0x00000307a484a445 in dump_filesystems (arg=0x3ca7b71c9d0, rzhp=0x3c2b2e0) at ../../lib/libzfs/libzfs_sendrecv.c:1267
#11 zfs_send (zhp=0x3c2b2e0, fromsnap=<optimized out>, tosnap=<optimized out>, flags=0x3ca7b71d090, outfd=1, filter_func=0x0, cb_arg=0x0, debugnvp=0x0) at ../../lib/libzfs/libzfs_sendrecv.c:1576
#12 0x00000000004094c1 in ?? ()
#13 0x0000000000000000 in ?? ()

@behlendorf (Contributor)

This indicates that the ZFS_IOC_SEND ioctl which was passed to the kernel contained an invalid argument. It would be useful to see what those arguments were; perhaps there was an invalid snapshot name. Can you either:

  • Patch dump_ioctl() with a few printfs to output the zc.zc_name, zc.zc_cookie, zc.zc_obj, zc.zc_sendobj, and zc.zc_fromobj values before the call to zfs_ioctl() (see the sketch after this list). Or,
  • Grab a call graph with ftrace using trace-cmd. You should be able to run the following, which will show where the EINVAL is being returned in the kernel.
trace-cmd record -p function_graph -g zfs_ioc_send zfs send ...
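
A minimal sketch of the first option, assuming the 0.6.1 layout of dump_ioctl() in lib/libzfs/libzfs_sendrecv.c (the placement and field list are illustrative, not an official patch):

/* Illustrative debugging only: dump the zfs_cmd_t fields right
 * before dump_ioctl() issues the ZFS_IOC_SEND ioctl. */
(void) fprintf(stderr, "ZFS_IOC_SEND: name=%s cookie=%llu obj=%llu "
    "sendobj=%llu fromobj=%llu\n", zc.zc_name,
    (u_longlong_t)zc.zc_cookie, (u_longlong_t)zc.zc_obj,
    (u_longlong_t)zc.zc_sendobj, (u_longlong_t)zc.zc_fromobj);

Note the fprintf(stderr, ...) rather than printf(): the send stream itself is written to stdout here, so debug output on stdout would corrupt the stream.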

@prometheanfire (Contributor, Author)

Here's the patch http://bpaste.net/show/89072/

#0 0x00000301ae670b18 in vfprintf () from /lib64/libc.so.6
#1 0x00000301ae7241e8 in __printf_chk () from /lib64/libc.so.6
#2 0x00000301aec1205c in printf (__fmt=0x1 <Address 0x1 out of bounds>) at /usr/include/bits/stdio2.h:105
#3 dump_ioctl (zhp=0x1cb8a10, fromsnap=0x3b588237850 "", fromsnap_obj=0, fromorigin=<optimized out>, outfd=1, debugnv=0x1ca56a0) at ../../lib/libzfs/libzfs_sendrecv.c:894
#4 0x00000301aec16bc8 in dump_snapshot (zhp=0x1cb8a10, arg=0x3b588237840) at ../../lib/libzfs/libzfs_sendrecv.c:1188
#5 0x00000301aec07121 in zfs_iter_snapshots_sorted (zhp=<optimized out>, callback=0x301aec16900 <dump_snapshot+32>, data=0x3b588237840) at ../../lib/libzfs/libzfs_iter.c:245
#6 0x00000301aec12562 in dump_filesystem (zhp=0x1ca02c0, arg=0x3b588237840) at ../../lib/libzfs/libzfs_sendrecv.c:1247
#7 0x00000301aec16629 in dump_filesystems (rzhp=0x1ca0020, arg=0x3b588237840) at ../../lib/libzfs/libzfs_sendrecv.c:1347
#8 0x00000301aec174c5 in zfs_send (zhp=0x1ca0020, fromsnap=0x1 <Address 0x1 out of bounds>, tosnap=0x0, flags=0x3b588237f00, outfd=1, filter_func=0x0, cb_arg=0x0, debugnvp=0x3b588237f30) at ../../lib/libzfs/libzfs_sendrecv.c:1405
#9 0x0000000000409415 in zfs_do_send (argc=<optimized out>, argv=<optimized out>) at ../../cmd/zfs/zfs_main.c:3614
#10 0x00000000004050b2 in main (argc=4, argv=0x3b588238088) at ../../cmd/zfs/zfs_main.c:6402

@prometheanfire (Contributor, Author)

did this help at all?

@ryao (Contributor) commented Apr 8, 2013

Static analysis from issue #1392 suggests a possible cause:

[C:\repo\zfs\lib\libzfs\libzfs_sendrecv.c:2824]: (error) Dereferencing 'cp' after it is deallocated / released

@behlendorf As easy as it should be to fix that, I do not have time right now, so it is all yours.

@behlendorf (Contributor)

@prometheanfire Let me get you a patch that provides the needed debugging (maybe today). Also, if you get a chance, the trace-cmd log would be very helpful.

@ryao I don't think #1392 is the root cause, since the core is due to an explicit abort(), not a dereference of an invalid address.
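
For context, the "internal error: Invalid argument" message followed by an abort() is libzfs's standard handling of an errno it has no specific mapping for. A condensed paraphrase of the relevant logic in lib/libzfs/libzfs_util.c (not the verbatim source):

/* Paraphrased: zfs_standard_error_fmt() maps known errno values to
 * EZFS_* codes; anything unexpected falls through to EZFS_UNKNOWN,
 * which is reported and then deliberately aborted so that a core is
 * left behind for analysis. */
if (error == EZFS_UNKNOWN) {
        (void) fprintf(stderr, "internal error: %s\n",
            libzfs_error_description(hdl));
        abort();
}

This matches the backtraces above, where frames #2-#4 are zfs_verror() and zfs_standard_error_fmt().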

@prometheanfire (Contributor, Author)

@behlendorf I don't have access to trace-cmd, but do you have that patch for me to test?

@prometheanfire (Contributor, Author)

I narrowed it down to a specific dataset that I cannot send.

zfs list
NAME USED AVAIL REFER MOUNTPOINT
khorne-m4-zp00 107G 125G 136K none
khorne-m4-zp00/GENTOO 10.5G 125G 136K none
khorne-m4-zp00/GENTOO/distfiles 5.97G 125G 5.97G /usr/portage/distfiles
khorne-m4-zp00/GENTOO/packages 3.61G 125G 3.61G /usr/portage/packages
khorne-m4-zp00/GENTOO/portage 983M 125G 923M /usr/portage
khorne-m4-zp00/HOME 91.0G 125G 144K /home
khorne-m4-zp00/HOME/mthode 91.0G 125G 90.9G /home/mthode
khorne-m4-zp00/HOME/root 30.9M 125G 30.8M /root
khorne-m4-zp00/ROOT 5.76G 125G 136K none
khorne-m4-zp00/ROOT/rootfs 5.76G 125G 48.8M /
khorne-m4-zp00/ROOT/rootfs/OPT 372M 125G 372M /opt
khorne-m4-zp00/ROOT/rootfs/USR 5.09G 125G 5.06G /usr
khorne-m4-zp00/ROOT/rootfs/VAR 269M 125G 267M legacy

zfs list -t snap
NAME USED AVAIL REFER MOUNTPOINT
khorne-m4-zp00@20130506-backup 0 - 136K -
khorne-m4-zp00/GENTOO@20130506-backup 0 - 136K -
khorne-m4-zp00/GENTOO/distfiles@20130506-backup 0 - 5.97G -
khorne-m4-zp00/GENTOO/packages@20130506-backup 0 - 3.61G -
khorne-m4-zp00/GENTOO/portage@20130506-backup 15.8M - 923M -
khorne-m4-zp00/HOME@20130506-backup 0 - 144K -
khorne-m4-zp00/HOME/mthode@20130506-backup 35.7M - 90.9G -
khorne-m4-zp00/HOME/root@20130506-backup 0 - 30.8M -
khorne-m4-zp00/ROOT@20130506-backup 0 - 136K -
khorne-m4-zp00/ROOT/rootfs@20130506-backup 184K - 48.8M -
khorne-m4-zp00/ROOT/rootfs/OPT@20130506-backup 0 - 372M -
khorne-m4-zp00/ROOT/rootfs/USR@20130506-backup 13.1M - 5.06G -
khorne-m4-zp00/ROOT/rootfs/VAR@20130506-backup 520K - 267M -

works
zfs send -R khorne-m4-zp00/GENTOO@${date}-backup | pigz -c > /mnt/backups/laptop/${date}-GENTOO-zfssend.gz
zfs send -R khorne-m4-zp00/HOME@${date}-backup | pigz -c > /mnt/backups/laptop/${date}-HOME-zfssend.gz
zfs send -R khorne-m4-zp00/ROOT/rootfs@${date}-backup | pigz -c > /mnt/backups/laptop/${date}-rootfs-zfssend.gz

doesn't work
zfs send -R khorne-m4-zp00/ROOT@${date}-backup | pigz -c > /mnt/backups/laptop/${date}-ROOT-zfssend.gz
internal error: Invalid argument

zfs send khorne-m4-zp00/ROOT@20130506-backup > asd
internal error: Invalid argument
Aborted

zdb khorne-m4-zp00/ROOT@20130506-backup
Dataset khorne-m4-zp00/ROOT@20130506-backup [ZPL], ID 271, cr_txg 2143498, 136K, 6 objects

Object  lvl   iblk   dblk  dsize  lsize   %full  type
     0    7    16K    16K  56.0K    16K   18.75  DMU dnode
    -1    1    16K    512     8K    512  100.00  ZFS user/group used
    -2    1    16K    512     8K    512  100.00  ZFS user/group used
     1    1    16K    512     8K    512  100.00  ZFS master node
     2    1    16K    512     8K    512  100.00  SA master node
     3    1    16K    512     8K    512  100.00  ZFS delete queue
     4    1    16K    512     8K    512  100.00  ZFS directory
     5    1    16K  1.50K     8K  1.50K  100.00  SA attr registration
     6    1    16K    16K    16K    32K  100.00  SA attr layouts

zdb khorne-m4-zp00/ROOT
Dataset khorne-m4-zp00/ROOT [ZPL], ID 38, cr_txg 11, 136K, 6 objects

Object  lvl   iblk   dblk  dsize  lsize   %full  type
     0    7    16K    16K  56.0K    16K   18.75  DMU dnode
    -1    1    16K    512     8K    512  100.00  ZFS user/group used
    -2    1    16K    512     8K    512  100.00  ZFS user/group used
     1    1    16K    512     8K    512  100.00  ZFS master node
     2    1    16K    512     8K    512  100.00  SA master node
     3    1    16K    512     8K    512  100.00  ZFS delete queue
     4    1    16K    512     8K    512  100.00  ZFS directory
     5    1    16K  1.50K     8K  1.50K  100.00  SA attr registration
     6    1    16K    16K    16K    32K  100.00  SA attr layouts

@prometheanfire (Contributor, Author)

I also noticed a couple of odd things about the dataset that is giving the errors.

NAME                             TYPE        CREATION                USED  AVAIL  REFER  RATIO  MOUNTED  ORIGIN  QUOTA  RESERV  VOLSIZE  VOLBLOCK  RECSIZE  MOUNTPOINT              SHARENFS   CHECKSUM  COMPRESS  ATIME  DEVICES  EXEC  SETUID  RDONLY  ZONED  SNAPDIR     ACLINHERIT  CANMOUNT  XATTR  COPIES  VERSION  UTF8ONLY  NORMALIZATION         CASE  VSCAN  NBMAND  SHARESMB  REFQUOTA  REFRESERV  PRIMARYCACHE  SECONDARYCACHE  USEDSNAP  USEDDS  USEDCHILD  USEDREFRESERV  DEFER_DESTROY  USERREFS     LOGBIAS          DEDUP  MLSLABEL      SYNC  REFRATIO  WRITTEN  CLONES  SNAPDEV
khorne-m4-zp00/GENTOO            filesystem  Fri Aug 31 15:55 2012  12.5G  72.1G   136K  1.00x       no  -        none    none        -         -     128K  none                    off              on       off     on       on    on      on     off    off   hidden     restricted        on     on       1        5        on          formD    sensitive    off     off  off           none       none           all             all         0    136K      12.5G              0              -         -     latency            off  none      standard     1.00x        0  -        hidden
khorne-m4-zp00/ROOT              filesystem  Fri Aug 31 15:55 2012  8.91G  72.1G   136K  1.93x       no  -        none    none        -         -     128K  none                    off              on        on     on       on    on      on     off    off   hidden     restricted        on     on       1        -         -              -            -    off     off  off           none       none           all             all         0    136K      8.91G              0              -         -     latency            off  none      standard     1.00x        0  -        hidden

these pop out at me

version = -
utf8only = -
normalization = -
case = -

@behlendorf (Contributor)

The output of trace-cmd revealed that the failing call path in the kernel is the following, which returns EINVAL.

zfs_ioc_send()->dmu_send()->zfs_get_zplprop()

Unfortunately the error code from zfs_get_zplprop() is lost, so we don't know exactly why it was unable to fetch the version. However, based on the previous comment, I suspect the property is somehow outright missing. How that could happen isn't at all clear, but this chunk of kernel code can't cope with it and correctly errors out.

You can apply the following patch to get the exact error code.

diff --git a/module/zfs/dmu_send.c b/module/zfs/dmu_send.c
index b256803..79ce9da 100644
--- a/module/zfs/dmu_send.c
+++ b/module/zfs/dmu_send.c
@@ -464,7 +464,10 @@ dmu_send(objset_t *tosnap, objset_t *fromsnap, boolean_t fr
 #ifdef _KERNEL
        if (dmu_objset_type(tosnap) == DMU_OST_ZFS) {
                uint64_t version;
-               if (zfs_get_zplprop(tosnap, ZFS_PROP_VERSION, &version) != 0) {
+               err = zfs_get_zplprop(tosnap, ZFS_PROP_VERSION, &version);
+               if (err) {
+                       printk("zfs_get_zplprop(%p, %d) = %d\n",
+                              tosnap, ZFS_PROP_VERSION, err);
                        kmem_free(drr, sizeof (dmu_replay_record_t));
                        return (EINVAL);
                }
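
Once the patched module is rebuilt and reloaded, the printk() output goes to the kernel ring buffer, so the exact error code should show up in dmesg immediately after reproducing the failed send.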

If you're more ambitious, you can simply apply this patch to comment out the version check and set version=ZPL_VERSION_SA for the purposes of a test. It may then be able to successfully send the dataset.

diff --git a/module/zfs/dmu_send.c b/module/zfs/dmu_send.c
index b256803..3157b37 100644
--- a/module/zfs/dmu_send.c
+++ b/module/zfs/dmu_send.c
@@ -463,10 +463,13 @@ dmu_send(objset_t *tosnap, objset_t *fromsnap, boolean_t f

 #ifdef _KERNEL
        if (dmu_objset_type(tosnap) == DMU_OST_ZFS) {
-               uint64_t version;
+               uint64_t version = ZPL_VERSION_SA;
+
+#if 0
                if (zfs_get_zplprop(tosnap, ZFS_PROP_VERSION, &version) != 0) {
                        kmem_free(drr, sizeof (dmu_replay_record_t));
                        return (EINVAL);
                }
+#endif
                if (version == ZPL_VERSION_SA) {
                        DMU_SET_FEATUREFLAGS(

@prometheanfire (Contributor, Author)

I was more ambitious

./tmp/trace-cmd/trace-cmd record -p function_graph -F zfs send khorne-m4-zp00/ROOT@20130807 > /mnt/20130807-backup-zfssend.gz
plugin function_graph
warning: cannot send 'khorne-m4-zp00/ROOT@20130807': Input/output error
^Ctrace-cmd: Interrupted system call
trace-cmd: Interrupted system call
recorder error in splice input
recorder error in splice input

trace-cmd: Interrupted system call
recorder error in splice input
offset=200000
offset=3c0000
offset=3d9000
offset=4eb000

http://dev.gentoo.org/~prometheanfire/crash-logs/zfs-send-trace-2.out.xz

@prometheanfire (Contributor, Author)

closing this since it seems like it's just corruption on my end

zpool status -v

pool: khorne-m4-zp00
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub in progress since Thu Aug 8 14:40:30 2013
152G scanned out of 156G at 166M/s, 0h0m to go
0 repaired, 96.95% done
config:

NAME        STATE     READ WRITE CKSUM
khorne-m4-zp00  ONLINE       0     0   266
  root      ONLINE       0     0 1.04K

errors: Permanent errors have been detected in the following files:

    khorne-m4-zp00:<0x0>
    khorne-m4-zp00/ROOT:<0x1>
    khorne-m4-zp00/ROOT:<0x3>
    <0xcd>:<0x1>
    <0xea>:<0x0>

@bobziuchkovski

I ran into something that looks quite similar to this. I've been using the ZFS Stable Ubuntu PPA (Raring) and today updated my ZFS debs from 0.6.1 to 0.6.2. After the update, my recursive zfs sends no longer worked:

sudo zfs send -R -v -I griffin-ssd@20130906_020002 griffin-ssd@20130906_221801 | sudo zfs receive griffin-sata/REP/griffin-ssd
internal error: Invalid argument

I downgraded to 0.6.1 and the send worked properly. Unfortunately I'm in the middle of a couple of projects and am not totally comfortable running additional tests/patches right now. However, I would be happy to do so in a few weeks if you would like additional testing.

@behlendorf (Contributor)

@bobziuchkovski There was an accidental incompatibility introduced between the 0.6.2 user space utilities and the 0.6.1 kernel modules related to zfs send. We saw the same thing after the 0.6.2 tag was made. Make sure you're using the 0.6.2 modules if you install the 0.6.2 user space utilities; you'll need to manually unload and load the new kernel modules. Rolling the user space utilities back to 0.6.1 avoids the issue as well, as you've already discovered.
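
For reference, a reload sequence along these lines should work (a sketch; it assumes the pool is not the root filesystem and can be exported):

zfs umount -a     # unmount all ZFS filesystems
zpool export -a   # export pools so the modules can be unloaded
modprobe -r zfs   # unload zfs and its now-unused dependencies
modprobe zfs      # load the newly installed modules
zpool import -a   # re-import the pools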

@bobziuchkovski

@behlendorf Thanks, that was the problem. Userspace/module incompatibility didn't occur to me. Reloading the modules after the 0.6.2 upgrade fixed the issue.

@behlendorf (Contributor)

@bobziuchkovski Glad to hear it. Unfortunately, you should expect similar breakage when updating to 0.6.3; it looks like it's going to be unavoidable. Sorry.

@bobziuchkovski

@behlendorf Good to know, thanks. Now I know what to look for.
