zfs send segfault in fletcher_4_native for encrypted replication (-Rw) sends #13620

Open
implr opened this issue Jul 3, 2022 · 6 comments
Labels
Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@implr

implr commented Jul 3, 2022

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | gentoo |
| Distribution Version | ~amd64 |
| Kernel Version | 5.18.8-gentoo (also fails on 5.17.9) |
| Architecture | amd64 |
| OpenZFS Version | zfs-2.1.5-r2-gentoo |

zfs send -Rw dataset@snap consistently crashes before writing any output. I first noticed it through this dmesg message:

[  489.458480] traps: zfs[16083] general protection fault ip:7f0bd178b940 sp:7ffe5c380440 error:0 in libzfs.so.4.1.0[7f0bd174f000+44000]

Full backtrace:

# gdb /sbin/zfs
GNU gdb (Gentoo 12.1 vanilla) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /sbin/zfs...
Reading symbols from /usr/lib/debug//sbin/zfs.debug...
(gdb) r send  -Rw  zslow/crypt@tape3-220703 > /dev/null
Starting program: /sbin/zfs send  -Rw  zslow/crypt@tape3-220703 > /dev/null
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
fletcher_4_native (buf=0x555555596310, size=3164, ctx_template=<optimized out>, zcp=0x7fffffffbdb0) at ../../module/zcommon/zfs_fletcher.c:482
482                             fletcher_4_scalar_native((fletcher_4_ctx_t *)zcp,
(gdb) bt
#0  fletcher_4_native (buf=0x555555596310, size=3164, ctx_template=<optimized out>, zcp=0x7fffffffbdb0) at ../../module/zcommon/zfs_fletcher.c:482
#1  0x00007ffff7f5c9f3 in fletcher_4_incremental_impl (zcp=<optimized out>, size=3164, buf=<optimized out>, native=<optimized out>) at ../../module/zcommon/zfs_fletcher.c:565
#2  fletcher_4_incremental_native (buf=buf@entry=0x555555596310, size=size@entry=3164, data=data@entry=0x7fffffffbe80) at ../../module/zcommon/zfs_fletcher.c:584
#3  0x00007ffff7f471d1 in dump_record (outfd=0, zc=0x7fffffffbe80, payload_len=3164, payload=0x555555596310, drr=0x7fffffffbea0) at libzfs_sendrecv.c:106
#4  send_prelim_records (zhp=zhp@entry=0x555555584950, from=from@entry=0x0, fd=fd@entry=1, gather_props=<optimized out>, recursive=<optimized out>, verbose=verbose@entry=B_FALSE, dryrun=<optimized out>, raw=<optimized out>, replicate=<optimized out>, skipmissing=<optimized out>, backup=<optimized out>, 
    holds=<optimized out>, props=<optimized out>, doall=<optimized out>, fssp=<optimized out>, fsavlp=<optimized out>) at libzfs_sendrecv.c:2087
#5  0x00007ffff7f4c8db in zfs_send (zhp=zhp@entry=0x55555557f690, fromsnap=fromsnap@entry=0x0, tosnap=tosnap@entry=0x55555557f5fc "tape3-220703", flags=flags@entry=0x7fffffffd090, outfd=outfd@entry=1, filter_func=filter_func@entry=0x0, cb_arg=<optimized out>, debugnvp=<optimized out>) at libzfs_sendrecv.c:2179
#6  0x0000555555562dec in zfs_do_send (argc=<optimized out>, argv=<optimized out>) at zfs_main.c:4725
#7  0x000055555555b37d in main (argc=4, argv=<optimized out>) at zfs_main.c:8711
@implr implr added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jul 3, 2022
@rincebrain
Contributor

I think this is just #13605. In that reporter's case, the problem appeared when they overrode CFLAGS with -march. In particular, I suspect the compiler is turning the non-SIMD (scalar) version of the code into SIMD instructions, and something then goes wrong because that code is used somewhere that SIMD would not be safe.
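
For readers unfamiliar with the code in frame #0, a scalar Fletcher-4 loop has roughly the shape below: four 64-bit running sums updated once per 32-bit input word. This is a simplified sketch, not the OpenZFS source; `fletcher_4_sketch` and `cksum_sketch_t` are hypothetical names chosen for illustration. A loop of this form is the kind of code a compiler may auto-vectorize when built with flags such as -O2 -march=native.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 256-bit checksum held as four 64-bit words. */
typedef struct {
	uint64_t word[4];
} cksum_sketch_t;

/*
 * Simplified scalar Fletcher-4 over native-endian 32-bit words.
 * The running sums continue from the values already stored in *out,
 * so the function can be called incrementally on successive buffers.
 */
static void
fletcher_4_sketch(const uint32_t *buf, size_t size, cksum_sketch_t *out)
{
	const uint32_t *ip = buf;
	const uint32_t *end = buf + size / sizeof (uint32_t);
	uint64_t a = out->word[0];
	uint64_t b = out->word[1];
	uint64_t c = out->word[2];
	uint64_t d = out->word[3];

	/* The input length must be a whole number of 32-bit words. */
	assert(size % sizeof (uint32_t) == 0);

	for (; ip < end; ip++) {
		a += *ip;
		b += a;
		c += b;
		d += c;
	}

	out->word[0] = a;
	out->word[1] = b;
	out->word[2] = c;
	out->word[3] = d;
}
```

Starting from a zeroed checksum, the input words 1, 2, 3 give the sums 6, 10, 15, 21. The 3164-byte payload in the backtrace is 791 such words.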

@implr
Author

implr commented Jul 4, 2022

That is possible; I missed that issue. I'm building with -O2 -march=native -pipe -ggdb, which is equivalent to -march=znver2 in my case. I've been using those flags for over a year, though, so the crash is caused either by my recent upgrade to GCC 12 or by a change in ZFS.

I'll try without -march.

@implr
Author

implr commented Jul 4, 2022 via email

@KungFuJesus

> I think this is just #13605, which in that person's case was a problem when they used overridden CFLAGS for -march - in particular, I suspect what's happening is that it's compiling the non-SIMD version of the code to a SIMD version, but then something unsafe is ensuing because it's using it somewhere that that would not be safe.

Hmm, I would imagine auto-vectorization in userspace is fair game, no? This looks like it could be a compiler bug, or possibly undefined behavior in the code.
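
One kind of undefined behavior that would fit both the pointer cast visible at zfs_fletcher.c:482 and a general protection fault is an over-aligned cast: if a pointer is cast to a type with stricter alignment than the object it points at, the compiler is allowed to emit aligned SIMD loads and stores, which fault on a misaligned address. Whether that is the cause here is not established in this thread. The sketch below only illustrates the mechanism; `plain_cksum_t`, `wide_ctx_t`, and `is_aligned_to` are hypothetical names, and the 64-byte alignment is an assumption rather than something taken from the OpenZFS headers.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plain checksum: four 64-bit words, natural alignment. */
typedef struct {
	uint64_t word[4];
} plain_cksum_t;

/*
 * Hypothetical context union that also holds SIMD-sized state.
 * The over-aligned member raises the alignment of the whole union
 * (64 bytes is an assumption made for this sketch).
 */
typedef union {
	plain_cksum_t scalar;
	alignas(64) uint64_t simd[8];
} wide_ctx_t;

/* Returns nonzero if addr is a multiple of align. */
static int
is_aligned_to(uint64_t addr, size_t align)
{
	return (addr % align == 0);
}
```

Casting a `plain_cksum_t *` to a `wide_ctx_t *` is only valid when the address satisfies `alignof(wide_ctx_t)`. For comparison, the `zcp` value in the backtrace, 0x7fffffffbdb0, is 16-byte aligned but not 32-byte or 64-byte aligned.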

@rincebrain
Contributor

In theory, yes.

I have a few guesses about what broke, but haven't looked into it yet.

gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Jul 4, 2022
Workaround issue with GCC 12 until solved upstream. Segfault
occurs w/ 'zfs send' otherwise (and very possibly other commands).

Bug: openzfs/zfs#13605
Bug: openzfs/zfs#13620
Closes: https://bugs.gentoo.org/856373
Signed-off-by: Sam James <sam@gentoo.org>
gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Jul 4, 2022
Workaround issue with GCC 12 until solved upstream. Segfault
occurs w/ 'zfs send' otherwise (and very possibly other commands).

Let's backport for older versions to be safe after discussion
w/ gyakovlev.

Bug: openzfs/zfs#13605
Bug: openzfs/zfs#13620
Closes: https://bugs.gentoo.org/856373
See: 1cbf3fb
Signed-off-by: Sam James <sam@gentoo.org>
algitbot pushed a commit to alpinelinux/aports that referenced this issue Aug 13, 2022
gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jul 2, 2023
Workaround issue with GCC 12 until solved upstream. Segfault
occurs w/ 'zfs send' otherwise (and very possibly other commands).

Bug: openzfs/zfs#13605
Bug: openzfs/zfs#13620
Closes: https://bugs.gentoo.org/856373
Signed-off-by: Sam James <sam@gentoo.org>
gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jul 2, 2023
Workaround issue with GCC 12 until solved upstream. Segfault
occurs w/ 'zfs send' otherwise (and very possibly other commands).

Let's backport for older versions to be safe after discussion
w/ gyakovlev.

Bug: openzfs/zfs#13605
Bug: openzfs/zfs#13620
Closes: https://bugs.gentoo.org/856373
See: 1cbf3fbc336adfdcd122da5b0989c2993de358dc
Signed-off-by: Sam James <sam@gentoo.org>
@stale

stale bot commented Aug 10, 2023

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
