SEEK_DATA fails randomly #11900

ismell · 2021-04-13T23:05:35Z

System information

Type	Version/Name
Distribution Name	Debian
Distribution Version	rodete 20210324.05.05RL
Linux Kernel	5.10.19-1rodete1
Architecture	amd64
ZFS Version	zfs-2.0.3-1~bpo10+1
SPL Version	2.0.3-1~bpo10+1

Describe the problem you're observing

lseek + SEEK_DATA is randomly returning -ENXIO.

Code showing the problem: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/portage_tool/src/portage_util_file_copy_reflink_linux.c;l=145

I patched it with the following:

$ git diff
diff --git a/src/portage_util_file_copy_reflink_linux.c b/src/portage_util_file_copy_reflink_linux.c
index 352342c06..f261777fc 100644
--- a/src/portage_util_file_copy_reflink_linux.c
+++ b/src/portage_util_file_copy_reflink_linux.c
@@ -140,8 +140,9 @@ do_lseek_data(int fd_out, int fd_in, off_t *off_out) {
     /* Use lseek SEEK_DATA/SEEK_HOLE for sparse file support,
      * as suggested in the copy_file_range man page.
      */
-    off_t offset_data, offset_hole;
+    off_t offset_data, offset_hole, current;
 
+    current = lseek(fd_in, 0, SEEK_CUR);
     offset_data = lseek(fd_in, *off_out, SEEK_DATA);
     if (offset_data < 0) {
         if (errno == ENXIO) {

On a successful run this is what the strace looks like(depthcharge.elf):

[pid 3014182] lseek(3, 0, SEEK_CUR)     = 0
[pid 3014182] lseek(3, 0, SEEK_DATA)    = 0
[pid 3014182] lseek(3, 0, SEEK_HOLE)    = 2439852
[pid 3014182] copy_file_range(3, [0], 4, [0], 2439852, 0) = 2439852
[pid 3014182] lseek(3, 0, SEEK_CUR)     = 2439852
[pid 3014182] lseek(3, 2439852, SEEK_DATA) = -1 ENXIO (No such device or address)
[pid 3014182] lseek(3, 0, SEEK_END)     = 2439852
[pid 3014182] ftruncate(4, 2439852)     = 0
[pid 3014182] close(4)                  = 0
[pid 3014182] close(3)                  = 0

When a failure happens this is what it looks like (netboot.elf):

[pid 3014182] lseek(3, 0, SEEK_CUR)     = 0
[pid 3014182] lseek(3, 0, SEEK_DATA)    = -1 ENXIO (No such device or address)
[pid 3014182] lseek(3, 0, SEEK_END)     = 1824516
[pid 3014182] lseek(4, 1824516, SEEK_SET) = 1824516
[pid 3014182] ftruncate(4, 1824516)     = 0
[pid 3014182] close(4)                  = 0
[pid 3014182] close(3)                  = 0

man lseek shows:

SEEK_DATA
              Adjust the file offset to the next location in the file greater than or equal to offset containing data.  If offset points to data, then the file offset is set to offset.

ENXIO  whence is SEEK_DATA or SEEK_HOLE, and offset is beyond the end of the file, or whence is SEEK_DATA and offset is within a hole at the end of the file.

Describe how to reproduce the problem

I don't have a simple way to reproduce this yet. It happens as part of a portage build. I have attached a full strace log. The problem shows up randomly, so running the same command multiple times will cause different files to fail to copy.

Include any warning/errors/backtraces from the system logs

depthcharge-strace-patch1.log

The text was updated successfully, but these errors were encountered:

vapier · 2021-04-13T23:10:35Z

@zmedico fyi

h1z1 · 2021-04-16T19:46:39Z

Any chance #11910 is related?

ismell · 2021-04-19T15:28:34Z

See https://bugs.chromium.org/p/chromium/issues/detail?id=1198679 for more of the nitty-gritty debugging.

behlendorf · 2021-04-19T17:41:07Z

@ismell it's not clear to me from the comment above, but does the patch you posted above avoid the issue in your testing?

One plausible way you might try and reproduce this is to write to the file while performing the lseek(). This case is handled by the code but it's one of the less likely code paths so it's possible there might be an issue here.

ismell · 2021-04-19T18:59:02Z

No the patch above was just to confirm that the pointer was at the start of the file. I haven't yet been able to find an easy way to reproduce the problem. There should be no writers to the input file at this point. I'll keep working on trying to find an easy way to reproduce it. Is there anything else I should be tracing that will help?

behlendorf · 2021-04-19T20:46:58Z

One thing you could try to rule out the dirty data case would be to add an fsync(fd) before the the lseek() in the portage code. That would help us narrow it down a bit, if you no longer see failures we'll have a good idea where to look. Getting a good strace of exactly the IO pattern being issued to cause the problem would also be helpful.

ismell · 2021-04-20T17:06:02Z

Thanks for the tip.

So I wrote a script that calls emerge depthcharge in a loop and verifies the build files are not NULL.

No patch

$ test-zfs
>>> 10:51:18 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:51:38 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:51:42 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:51:42 Jobs: 1 of 1 complete                           Load avg: 2.13, 1.59, 1.44

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 1, Success: 2, Failure: 1
>>> 10:51:44 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:52:04 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:52:08 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:52:08 Jobs: 1 of 1 complete                           Load avg: 2.09, 1.62, 1.46

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 2, Success: 3, Failure: 3
>>> 10:52:10 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:52:30 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:52:34 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:52:34 Jobs: 1 of 1 complete                           Load avg: 1.85, 1.60, 1.45

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 3, Success: 4, Failure: 5
>>> 10:52:36 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:52:56 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:53:00 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:53:00 Jobs: 1 of 1 complete                           Load avg: 1.56, 1.55, 1.44

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 4, Success: 5, Failure: 7 <-- Everything's not awesome :(

With fsync

diff --git a/src/portage_util_file_copy_reflink_linux.c b/src/portage_util_file_copy_reflink_linux.c
index 352342c06..004b8618a 100644
--- a/src/portage_util_file_copy_reflink_linux.c
+++ b/src/portage_util_file_copy_reflink_linux.c
@@ -140,8 +140,10 @@ do_lseek_data(int fd_out, int fd_in, off_t *off_out) {
     /* Use lseek SEEK_DATA/SEEK_HOLE for sparse file support,
      * as suggested in the copy_file_range man page.
      */
-    off_t offset_data, offset_hole;
+    off_t offset_data, offset_hole, current;
 
+    fsync(fd_in);
+    current = lseek(fd_in, 0, SEEK_CUR);
     offset_data = lseek(fd_in, *off_out, SEEK_DATA);
     if (offset_data < 0) {
         if (errno == ENXIO) {

$ test-zfs
>>> 10:55:17 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:55:37 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:55:41 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:55:41 Jobs: 1 of 1 complete                           Load avg: 3.36, 2.60, 1.87

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 1, Success: 3, Failure: 0
>>> 10:55:43 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:56:06 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:56:10 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:56:10 Jobs: 1 of 1 complete                           Load avg: 2.57, 2.48, 1.85

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 2, Success: 6, Failure: 0
>>> 10:56:12 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:56:32 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:56:36 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:56:36 Jobs: 1 of 1 complete                           Load avg: 3.10, 2.59, 1.91

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 3, Success: 9, Failure: 0
>>> 10:56:39 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:56:58 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:57:02 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:57:02 Jobs: 1 of 1 complete                           Load avg: 4.32, 2.96, 2.05

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 4, Success: 12, Failure: 0
>>> 10:57:05 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:57:24 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:57:28 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:57:28 Jobs: 1 of 1 complete                           Load avg: 3.42, 2.86, 2.05

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 5, Success: 15, Failure: 0 <--- Everything is awesome!

It looks like adding fsync fixes the problem. straceing emerge results in a huge log. I'll try and find the relevant parts.

behlendorf · 2021-04-20T20:50:30Z

@ismell if you're comfortable rebuilding ZFS for the testing I could provide a debug patch which should give us a little more insight.

ismell · 2021-04-20T21:07:59Z

I'm sure I can figure it out :)

behlendorf · 2021-04-20T22:53:39Z

In that case, would you mind testing commit f73e0a6 from the https://github.com/behlendorf/zfs/tree/zfs-holey branch. It implements a possible fix by closing a small race. It's only been lightly tested since I wasn't able to reproduce the issue.

ismell · 2021-05-10T19:32:03Z

Sorry for the long cycle time. I was waiting until I was forced to reboot my computer.

So I tried f73e0a6~1 to verify I could reproduce the issue without your patch. I was not able to reproduce it....

$ zfs version
zfs-2.1.99-117_g44287ffa0
zfs-kmod-2.1.99-117_g44287ffa0

$ ./test-zfs
Iteration: 100, Success: 300, Failure: 0
Done
  Success: 300
  Failure: 0

This normally reproduced within a few iterations.

So maybe it was fixed already?

behlendorf · 2021-05-12T16:47:17Z

So I tried f73e0a6~1 to verify I could reproduce the issue without your patch. I was not able to reproduce it....

To be clear you're saying that even without the path you're unable to reproduce the issue now? There weren't any other recent changes in this area which I would have expected to fix this.

The dn_struct_rwlock lock which prevents the dnode structure from changing should be held over the dirty check and the subsequent call to dnode_next_offset(). This ensures that a clean dnode can't be dirtied before the the data/hole is located. Furthermore, refactor the code to provide a dnode_is_dirty() helper function which performs the multilist_link_active() check while holding the correct multilist lock. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#11900

ismell · 2021-06-21T17:34:10Z

So I went back to the original version:

$ zfs version
zfs-2.0.3-1~bpo10+1
zfs-kmod-2.0.3-1~bpo10+1

It looks like I can't reproduce the issue anymore...

>>> 11:31:58 Emerging (1 of 1) sys-boot/depthcharge-9999::chromiumos for /build/guybrush/
>>> 11:32:20 Installing (1 of 1) sys-boot/depthcharge-9999::chromiumos to /build/guybrush/
>>> 11:32:25 Completed (1 of 1) sys-boot/depthcharge-9999::chromiumos to /build/guybrush/
>>> 11:32:25 Jobs: 1 of 1 complete                           Load avg: 12.2, 10.9, 8.3

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 7, Success: 21, Failure: 0

I'm wondering if rebooting helped clear what ever bad state I was in...

thesamesam · 2021-09-27T06:23:42Z

I've started hitting a similar issue which manifests as go segfaulting immediately on launch (initially reported at golang/go#48636 - I'd assumed the Go compiler was somehow spitting out a completely malformed binary).

More precisely:

I emerge (install) Go on my Gentoo system with ZFS as rootfs;
Within the temporary directory where Go is first built, the binary runs fine;
The binary copied to the image directory (to be merged to the live filesystem) is broken. Same size, different SHA256 hashes, and completely different (corrupted) ELF headers in the latter. Segfaults when executed.

I eventually somehow figured I should try downgrading coreutils from 9.0 -> 8.32(-r1 in Gentoo) which made the issue disappear: rebuilding Go worked! coreutils 9.0 changed some default behaviours around reflinking, which ZFS doesn't have, and sparse files, but I tried forcing them off in the cp invocation in the ebuilds with no luck anyway.

I then decided to bisect and arrived at:

a6eaee501f6ec0c152abe88640203a64c390993e is the first bad commit
commit a6eaee501f6ec0c152abe88640203a64c390993e
Author: Paul Eggert <eggert@cs.ucla.edu>
Date:   Thu Jun 25 16:31:44 2020 -0700

    cp: use SEEK_DATA/SEEK_HOLE if available
    
    If it works, prefer lseek with SEEK_DATA and SEEK_HOLE to FIEMAP,
    as lseek is simpler and more portable (will be in next POSIX).
    Problem reported in 2011 by Jeff Liu (Bug#8061).
    * NEWS: Mention this.
    * src/copy.c (lseek_copy) [SEEK_HOLE]: New function.
    (enum scantype): New constants ERROR_SCANTYPE, LSEEK_SCANTYPE.
    (union scan_inference): New type.
    (infer_scantype): Last arg is now union scan_inference *,
    not struct extent_scan *.  All callers changed.
    Prefer SEEK_HOLE to FIEMAP if both work, since
    SEEK_HOLE is simpler and more portable.
    (copy_reg): Do the fdadvise after initial scan, in case the scan
    fails.  Report an error if the initial scan fails.
    (copy_reg) [SEEK_HOLE]: Use lseek_copy if scantype says so.

 NEWS       |   3 +
 src/copy.c | 209 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 201 insertions(+), 11 deletions(-)
bisect run success

I've straced Portage (just for the lseek syscall) while installing Go and it is filled with:

[...]
11327 lseek(3, 5310, SEEK_DATA)         = -1 ENXIO (No such device or address)
11327 lseek(3, 9987, SEEK_DATA)         = -1 ENXIO (No such device or address)
11327 lseek(3, 8334, SEEK_DATA)         = -1 ENXIO (No such device or address)
11327 lseek(3, 18386, SEEK_DATA)        = -1 ENXIO (No such device or address)
11327 lseek(3, 33472, SEEK_DATA)        = -1 ENXIO (No such device or address)
11327 lseek(3, 15270, SEEK_DATA)        = -1 ENXIO (No such device or address)
11327 lseek(3, 9132, SEEK_DATA)         = -1 ENXIO (No such device or address)
11327 lseek(3, 15471, SEEK_DATA)        = -1 ENXIO (No such device or address)
11327 lseek(3, 7389, SEEK_DATA)         = -1 ENXIO (No such device or address)
11327 lseek(3, 7876, SEEK_DATA)         = -1 ENXIO (No such device or address)
[...]

TL;DR: I seem to be hitting this bug as a result of changes in coreutils which exposes the problem. Please let me know if you feel this is a different issue and I should open a new bug.

system information (emerge --info)

Portage 3.0.23 (python 3.10.0-candidate-2, default/linux/amd64/17.1/hardened, gcc-11.2.0, glibc-2.33-r7, 5.14.7-adry x86_64) ================================================================= System uname: Linux-5.14.7-adry-x86_64-11th_Gen_Intel-R-_Core-TM-_i7-11800H_@_2.30GHz-with-glibc2.33 KiB Mem: 16105692 total, 1991680 free KiB Swap: 0 total, 0 free Timestamp of repository gentoo: Sun, 26 Sep 2021 20:51:27 +0000 Head commit of repository gentoo: fe49ef8e5b7b42f9fcb1f7bf36fbf4c571b8b120

Timestamp of repository kde: Thu, 23 Sep 2021 11:06:23 +0000
Head commit of repository kde: 0bd2f887a7bdfe01ae0c753908bf1132f253263a

sh dash 0.5.11.5
ld GNU ld (Gentoo 2.37_p1 p0) 2.37
ccache version 4.4.1 [disabled]
app-shells/bash: 5.1_p8::gentoo
dev-java/java-config: 2.3.1::gentoo
dev-lang/perl: 5.34.0-r2::gentoo
dev-lang/python: 3.8.12::gentoo, 3.9.7::gentoo, 3.10.0_rc2::gentoo
dev-lang/rust-bin: 1.55.0::gentoo
dev-util/ccache: 4.4.1::gentoo
dev-util/cmake: 3.21.3::gentoo
sys-apps/baselayout: 2.7-r3::gentoo
sys-apps/openrc: 0.44.5::gentoo
sys-apps/sandbox: 2.25::gentoo
sys-devel/autoconf: 2.13-r1::gentoo, 2.71-r1::gentoo
sys-devel/automake: 1.16.4::gentoo
sys-devel/binutils: 2.37_p1::gentoo
sys-devel/gcc: 10.3.0-r2::gentoo, 11.2.0::gentoo
sys-devel/gcc-config: 2.4::gentoo
sys-devel/libtool: 2.4.6-r6::gentoo
sys-devel/make: 4.3::gentoo
sys-kernel/linux-headers: 5.14::gentoo (virtual/os-headers)
sys-libs/glibc: 2.33-r7::gentoo
Repositories:

gentoo
location: /var/db/repos/gentoo
sync-type: git
sync-uri: git://github.com/gentoo-mirror/gentoo.git
priority: -1000
sync-git-clone-extra-opts: -b stable -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now
sync-git-verify-commit-signature: yes

kde
location: /var/db/repos/kde
sync-type: git
sync-uri: https://github.com/gentoo-mirror/kde.git
masters: gentoo

local
location: /var/db/repos/local
masters: gentoo

ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches"
DISTDIR="/var/cache/distfiles"
EMERGE_DEFAULT_OPTS="--keep-going=y --complete-graph --deep --changed-deps-report=y"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox mount-sandbox multilib-strict network-sandbox news parallel-fetch parallel-install pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches"
GENTOO_MIRRORS="http://mirror.bytemark.co.uk/gentoo/ http://www.mirrorservice.org/sites/distfiles.gentoo.org/ http://mirrors.soeasyto.com/distfiles.gentoo.org/ http://mirrors.gethosted.online/gentoo"
LANG="en_GB.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,--defsym=gentoo_check_ldflags=0"
LINGUAS="en en_GB"
MAKEOPTS="-j8"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="PIC X a52 aac acl acpi activities aes alsa amd64 avx avx2 avx512bw avx512cd avx512dq avx512f avx512vbmi avx512vl bluetooth branding bzip2 cairo caps cdda cdr clang crypt dbus declarative dist-kernel dri dts dvd dvdr elogind emacs emboss encode exif f16c flac fma3 fortran gif gles2 gmp gpm graphite gtk gui hardened hunspell iconv icu ipv6 iwd jit jpeg kde kipi kwallet lcms libglvnd libnotify libtirpc mad mmx mmxext mng mp3 mp4 mpeg multilib ncurses networkmanager nptl ogg opengl openmp pam pango pclmul pcre pdf pic pie plasma png policykit popcnt ppds pulseaudio qml qt5 rdrand readline screencast sdl seccomp semantic-desktop sha smartcard spell split-usr sse sse2 sse3 sse4_1 sse4_2 ssl ssp ssse3 startup-notification svg syslog system-av1 system-binutils system-boost system-bootstrap system-cairo system-clang system-digest system-ffmpeg system-harfbuzz system-heimdal system-icu system-jpeg system-jsoncpp system-leveldb system-libevent system-libs system-libvpx system-libyaml system-lz4 system-mitkrb5 system-sqlite system-ssl system-tbb system-uulib system-webp system-zlib threads tiff truetype udev udisks unicode upower usb vaapi verify-sig vorbis wayland widgets x264 xattr xcb xml xtpax xv xvid zlib" ABI_X86="32 64" ADA_TARGET="gnat_2019" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl avx512vbmi f16c fma3 mmx mmxext pclmul popcnt rdrand sha sse sse2 sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="efi-64" INPUT_DEVICES="libinput" KERNEL="linux" L10N="en en-GB" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-3 php7-4" POSTGRES_TARGETS="postgres12 postgres13" PYTHON_SINGLE_TARGET="python3_9" PYTHON_TARGETS="python3_9 python3_10 python3_8" RUBY_TARGETS="ruby26" USERLAND="GNU" VIDEO_CARDS="intel i965 iris nvidia" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LC_ALL, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RUSTFLAGS

EDIT: Interestingly, upstream made a kind-of-related remark in a commit after the release:

Upstream commit mentioning ZFS

tests: cp/sparse-perf: make more robust and add zfs comments

init.cfg (seek_data_capable_): Add a timeout to ensure failure for
slow lseek(...SEEK_DATA) calls (even if that syscall isn't interrupted).
tests/cp/sparse-perf.sh: Run the SEEK_DATA check on the
1TiB empty file to exclude both FreeBSD 9.1 which takes 35s,
and ZFS which requires a delay of about 5s between file creation
and use of SEEK_DATA to correctly determine it's empty (return ENXIO).
Also remove the stat size checks as they invalidate the test
due to cp never writing data due to it being always zeros,
and thus converted to holes in the output.

@behlendorf I've tried applying the patch you suggested to ZFS 2.1.1 but unfortunately, while it applies, it ends up failing at build time. Is there any chance you're able to please rebase it?

The failure looks like:

/bin/sh ../../libtool  --tag=CC --silent  --mode=compile x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -include ../../zfs_config.h -I../../include -I../../include -I../../module/icp/include -I../../lib/libspl/include -I../../lib/libspl/include/os/linux  -I../../include/os/linux/zfs  -D_GNU_SOURCE -D_REENTRANT -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -DHAVE_LARGE_STACKS=1 -DLIBEXECDIR=\"/usr/libexec\" -DRUNSTATEDIR=\"/var/run\" -DSBINDIR=\"/sbin\" -DSYSCONFDIR=\"/etc\" -DPKGDATADIR=\"/usr/share/zfs\" -UDEBUG -DNDEBUG  -DTEXT_DOMAIN=\"zfs-linux-user\"  -DDEBUG -UNDEBUG -DZFS_DEBUG  -std=gnu99 -Wall -Wstrict-prototypes -Wmissing-prototypes -fno-strict-aliasing -fno-omit-frame-pointer    -Wno-format-zero-length  -Wno-unused-but-set-variable -Wframe-larger-than=4096  -DLIB_ZPOOL_BUILD -O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches -c -o dnode.lo ../../module/zfs/dnode.c
../../module/zfs/dnode.c: In function ‘dnode_is_dirty’:
../../module/zfs/dnode.c:1661:29: error: incompatible types when assigning to type ‘multilist_t *’ {aka ‘struct multilist *’} from type ‘multilist_t’ {aka ‘struct multilist’}
 1661 |                 dirtylist = dn->dn_objset->os_dirty_dnodes[i];
      |                             ^~
make[3]: *** [Makefile:1326: dnode.lo] Error 1
make[3]: *** Waiting for unfinished jobs....

The dn_struct_rwlock lock which prevents the dnode structure from changing should be held over the dirty check and the subsequent call to dnode_next_offset(). This ensures that a clean dnode can't be dirtied before the data/hole is located. Furthermore, refactor the code to provide a dnode_is_dirty() helper function which performs the multilist_link_active() check while holding the correct multilist lock. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#11900

behlendorf · 2021-09-28T22:58:58Z

@thesamesam thanks for continuing the investigation of this issue.

I've rebased my https://github.com/behlendorf/zfs/tree/zfs-holey branch and resolved the build failure (commit 1722451). If you can give it a try that would be great.

thesamesam · 2021-09-28T23:40:28Z

@behlendorf Thanks!

(Note: I hadn't been able to reproduce this on other ZFS'd machines with pretty similar setups, so I/O speed might be a factor, which I guess is obvious given it's racey).

Applied to 2.1.1 and built fine (rebuilt initramfs and so on, and rebooted) but no luck. Same corrupted /usr/bin/go.

I'm happy to enable more debugging as necessary, especially given I know this is tricky given you can't reproduce it on your end.

Here's a partial strace during this cp line from the ebuild (I've just picked out the bits around copying the corrupted binaries, but I can get you more if it's useful):

8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/api", {st_mode=S_IFDIR|0755, st_size=23, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  newfstatat(AT_FDCWD, "bin", {st_mode=S_IFDIR|0755, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", 0x7ffdc15b9730, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", 0x7ffdc15b9850, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
8005  mkdir("/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", 0755) = 0
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", {st_mode=S_IFDIR|0755, st_size=2, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  openat(AT_FDCWD, "bin", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
8005  newfstatat(3, "", {st_mode=S_IFDIR|0755, st_size=4, ...}, AT_EMPTY_PATH) = 0
8005  getdents64(3, 0x5619626c61c0 /* 4 entries */, 131072) = 104
8005  getdents64(3, 0x5619626c61c0 /* 0 entries */, 131072) = 0
8005  close(3)                          = 0
8005  newfstatat(AT_FDCWD, "bin/gofmt", {st_mode=S_IFREG|0755, st_size=3318813, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  openat(AT_FDCWD, "bin/gofmt", O_RDONLY|O_NOFOLLOW) = 3
8005  newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=3318813, ...}, AT_EMPTY_PATH) = 0
8005  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin/gofmt", O_WRONLY|O_CREAT|O_EXCL, 0755) = 4
8005  newfstatat(4, "", {st_mode=S_IFREG|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
8005  ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = -1 EOPNOTSUPP (Operation not supported)
8005  lseek(3, 0, SEEK_DATA)            = 0
8005  fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
8005  lseek(3, 0, SEEK_HOLE)            = 131072
8005  lseek(3, 0, SEEK_SET)             = 0
8005  read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\300\33F\0\0\0\0\0"..., 131072) = 131072
8005  write(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\300\33F\0\0\0\0\0"..., 131072) = 131072
8005  lseek(3, 131072, SEEK_DATA)       = -1 ENXIO (No such device or address)
8005  ftruncate(4, 3318813)             = 0
8005  close(4)                          = 0
8005  close(3)                          = 0
8005  newfstatat(AT_FDCWD, "bin/go", {st_mode=S_IFREG|0755, st_size=14013626, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  openat(AT_FDCWD, "bin/go", O_RDONLY|O_NOFOLLOW) = 3
8005  newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=14013626, ...}, AT_EMPTY_PATH) = 0
8005  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin/go", O_WRONLY|O_CREAT|O_EXCL, 0755) = 4
8005  newfstatat(4, "", {st_mode=S_IFREG|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
8005  ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = -1 EOPNOTSUPP (Operation not supported)
8005  lseek(3, 0, SEEK_DATA)            = 0
8005  fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
8005  lseek(3, 0, SEEK_HOLE)            = 131072
8005  lseek(3, 0, SEEK_SET)             = 0
8005  read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\0jF\0\0\0\0\0"..., 131072) = 131072
8005  write(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\0jF\0\0\0\0\0"..., 131072) = 131072
8005  lseek(3, 131072, SEEK_DATA)       = -1 ENXIO (No such device or address)
8005  lseek(3, 131072, SEEK_DATA)       = -1 ENXIO (No such device or address)
8005  ftruncate(4, 14013626)            = 0
8005  close(4)                          = 0
8005  close(3)                          = 0
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", {st_mode=S_IFDIR|0755, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  newfstatat(AT_FDCWD, "doc", {st_mode=S_IFDIR|0755, st_size=6, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/doc", 0x7ffdc15b96f0, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/doc", 0x7ffdc15b9810, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
8005  mkdir("/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/doc", 0755) = 0
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/doc", {st_mode=S_IFDIR|0755, st_size=2, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  openat(AT_FDCWD, "doc", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
8005  newfstatat(3, "", {st_mode=S_IFDIR|0755, st_size=6, ...}, AT_EMPTY_PATH) = 0
8005  getdents64(3, 0x5619626c62b0 /* 6 entries */, 131072) = 176
8005  getdents64(3, 0x5619626c62b0 /* 0 entries */, 131072) = 0
8005  close(3)                          = 0
8005  newfstatat(AT_FDCWD, "doc/go_mem.html", {st_mode=S_IFREG|0644, st_size=13565, ...}, AT_SYMLINK_NOFOLLOW) = 0

EDIT: I did out of curiosity try echo 1 > /sys/module/zfs/parameters/zfs_dmu_offset_next_sync but it didn't make any difference.

openzfs/zfs#11900 Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>

behlendorf · 2021-10-06T23:06:37Z

out of curiosity try echo 1 > /sys/module/zfs/parameters/zfs_dmu_offset_next_sync but it didn't make any difference.

That's very surprising. When set ZFS should always report the correct holes at the cost of performance. Were you able to reproduce this behavior with both the unmodified and patched code?

thesamesam · 2021-10-06T23:47:26Z

Yeah, I just tried again unpatched and it happens regardless of the value of the toggle (0 or 1).

Happy to try enabling more debugging if can think of a way to toggle it on/off easily during the merge.

behlendorf · 2021-10-07T00:38:20Z

Looking at the strace output what appears to be happening is:

The bin/go file is opened and determined to be 14,013,626 bytes is size
ZFS reports that the first hole in the file starts at 128k (which looks wrong given I don't see any holes in the file on my system)
That 128k is copied to a new file, and the new file is truncated to 14,013,626 bytes zero filling the rest.

The story is similar for bin/gofmt. Let me take another look and see if I can provide you a small debugging patch to determine what's going wrong. If you still have the strace logs it would be helpful to know how the bin/go file being copied here was originally written (mmap or write syscalls) and when.

thesamesam · 2021-10-07T01:09:23Z

Ah, it gets renamed during the build process:

# grep -rsin '/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out'
go_nuts_zfs.log:4923824:4931  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0xc0000a92e8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4923825:4931  unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4923826:4931  unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", AT_REMOVEDIR <unfinished ...>
go_nuts_zfs.log:4923832:4931  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0775 <unfinished ...>
go_nuts_zfs.log:4936086:1309  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDONLY|O_CLOEXEC) = 3
go_nuts_zfs.log:4936699:1309  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDWR|O_CLOEXEC) = 3
go_nuts_zfs.log:4936733:1309  fchmodat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0755) = 0
go_nuts_zfs.log:4936735:1309  renameat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/work/go/bin/go") = 0

Here's the output for 10 lines after every instance of the path being mentioned:

# grep -rsin '/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out' -A 10
go_nuts_zfs.log:4923824:4931  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0xc0000a92e8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4923825:4931  unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4923826:4931  unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", AT_REMOVEDIR <unfinished ...>
go_nuts_zfs.log-4923827-716   <... epoll_pwait resumed>[], 128, 5, NULL, 1033473606543) = 0
go_nuts_zfs.log-4923828-4932  <... nanosleep resumed>NULL)      = 0
go_nuts_zfs.log-4923829-4931  <... unlinkat resumed>)           = -1 ENOENT (No such file or directory)
go_nuts_zfs.log-4923830-716   epoll_pwait(4,  <unfinished ...>
go_nuts_zfs.log-4923831-4932  nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
go_nuts_zfs.log:4923832:4931  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0775 <unfinished ...>
go_nuts_zfs.log-4923833-716   <... epoll_pwait resumed>[], 128, 0, NULL, 140732691746960) = 0
go_nuts_zfs.log-4923834-716   epoll_pwait(4,  <unfinished ...>
go_nuts_zfs.log-4923835-4931  <... openat resumed>)             = 3
go_nuts_zfs.log-4923836-4931  epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3840094264, u64=140698378776632}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4923837-4932  <... nanosleep resumed>NULL)      = 0
go_nuts_zfs.log-4923838-4931  mmap(0xc000400000, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0 <unfinished ...>
go_nuts_zfs.log-4923839-4932  nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
go_nuts_zfs.log-4923840-4931  <... mmap resumed>)               = 0xc000400000
go_nuts_zfs.log-4923841-4931  madvise(0xc000200000, 2097152, MADV_HUGEPAGE) = 0
go_nuts_zfs.log-4923842-4932  <... nanosleep resumed>NULL)      = 0
--
go_nuts_zfs.log:4936086:1309  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDONLY|O_CLOEXEC) = 3
go_nuts_zfs.log-4936087-1309  epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3136383272, u64=140470041798952}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4936088-1309  pread64(3, "\177ELF", 4, 0)       = 4
go_nuts_zfs.log-4936089-1309  read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\0jF\0\0\0\0\0"..., 31744) = 31744
go_nuts_zfs.log-4936090-1309  read(3, "H\205\322t$H\211\\$8H\213A H\211\336H\211\323H\211\361\350\4\323\0\0H\213\214$"..., 31744) = 31744
go_nuts_zfs.log-4936091-1309  read(3, "\0G\17\266\24\32I\203\372D\17\203\34\1\0\0L\215\35\311\24\222\0C\17\267\fS\353\"I\215"..., 31744) = 31744
go_nuts_zfs.log-4936092-1309  read(3, "L\211D$(L\211L$0L\211T$8L\211\\$@\350G\276\4\0H\213D$\10\213\\"..., 31744) = 31744
go_nuts_zfs.log-4936093-1309  read(3, "\0\273\21\0\0\0\350\225a\1\0L\213\204$\320\1\0\0M\213@(L\211\204$\320\1\0\0L"..., 31744) = 31744
go_nuts_zfs.log-4936094-1309  read(3, "i\1\0\0\0\0\0\203=\362X\231\0\0u\tH\211\220Xi\1\0\353\fH\215\270Xi\1\0"..., 31744) = 31744
go_nuts_zfs.log-4936095-1309  read(3, "\350{\344\375\377H\213l$\30H\203\304 \303\350LJ\3\0\351G\377\377\377\314\314\314\314\314\314\314"..., 31744) = 31744
go_nuts_zfs.log-4936096-1309  read(3, "\1H\213\5p\221Z\0\3503\271\1\0\350\16\306\1\0\211D$\24H\215\rS\221Z\0\204\1H"..., 31744) = 31744
--
go_nuts_zfs.log:4936699:1309  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDWR|O_CLOEXEC) = 3
go_nuts_zfs.log-4936700-1309  epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3136383272, u64=140470041798952}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4936701-1309  pwrite64(3, "Zcbx5r99nvfz0xObj6_4/q7UMKiu3A_b"..., 83, 3984) = 83
go_nuts_zfs.log-4936702-1309  pread64(3, "\177ELF", 4, 0)       = 4
go_nuts_zfs.log-4936703-1309  close(3)                          = 0
go_nuts_zfs.log-4936704-1309  write(2, "", 0)                   = 0
go_nuts_zfs.log-4936705-1309  futex(0xc00089a550, FUTEX_WAKE_PRIVATE, 1) = 1
go_nuts_zfs.log-4936706-31945 <... futex resumed>)              = 0
go_nuts_zfs.log-4936707-1309  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/work/go/bin",  <unfinished ...>
go_nuts_zfs.log-4936708-31945 futex(0xc00089ad50, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
go_nuts_zfs.log-4936709-1612  <... epoll_pwait resumed>[], 128, 1, NULL, 1033883350377) = 0
--
go_nuts_zfs.log:4936733:1309  fchmodat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0755) = 0
go_nuts_zfs.log-4936734-1309  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/work/go/bin/go", 0xc00048b7c8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4936735:1309  renameat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/work/go/bin/go") = 0
go_nuts_zfs.log-4936736-1309  unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/", 0) = -1 EISDIR (Is a directory)
go_nuts_zfs.log-4936737-1309  unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/", AT_REMOVEDIR) = -1 ENOTEMPTY (Directory not empty)
go_nuts_zfs.log-4936738-1309  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169", O_RDONLY|O_CLOEXEC) = 3
go_nuts_zfs.log-4936739-1309  epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3136383272, u64=140470041798952}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4936740-1309  unlinkat(3, "b303", 0)            = -1 EISDIR (Is a directory)
go_nuts_zfs.log-4936741-1309  newfstatat(3, "b303", {st_mode=S_IFDIR|0755, st_size=6, ...}, AT_SYMLINK_NOFOLLOW) = 0
go_nuts_zfs.log-4936742-1309  openat(3, "b303", O_RDONLY|O_CLOEXEC) = 7
go_nuts_zfs.log-4936743-1309  epoll_ctl(4, EPOLL_CTL_ADD, 7, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3136383272, u64=140470041798952}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4936744-1309  getdents64(7, 0xc000aa6000 /* 6 entries */, 8192) = 176
go_nuts_zfs.log-4936745-1309  getdents64(7, 0xc000aa6000 /* 0 entries */, 8192) = 0

You're welcome to the full 550MB of strace spew if you want it rather than me trying to clumsily filter, but I appreciate that's no holiday for you either!

The dn_struct_rwlock lock which prevents the dnode structure from changing should be held over the dirty check and the subsequent call to dnode_next_offset(). This ensures that a clean dnode can't be dirtied before the data/hole is located. Furthermore, refactor the code to provide a dnode_is_dirty() helper function which checks the dnode for any dirty records to determine its dirtiness. Finally, hold the znodes range lock to prevent any raciness with concurrent reads or writes. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#11900

behlendorf · 2021-10-08T22:18:55Z

@thesamesam would you mind trying this updated patch with your test setup, 6338227. It would be great if you could try it with zfs_dmu_offset_next_sync=0 first, if you still observe a failure then please try zfs_dmu_offset_next_sync=1.

rincebrain · 2021-12-30T09:07:20Z

This can probably be closed now, unless someone's reproduced it since the fixes?

When using lseek(2) to report data/holes memory mapped regions of the file were ignored. This could result in incorrect results. To handle this zfs_holey_common() was updated to asynchronously writeback any dirty mmap(2) regions prior to reporting holes. Additionally, while not strictly required, the dn_struct_rwlock is now held over the dirty check to prevent the dnode structure from changing. This ensures that a clean dnode can't be dirtied before the data/hole is located. The range lock is now also taken to ensure the call cannot race with zfs_write(). Furthermore, the code was refactored to provide a dnode_is_dirty() helper function which checks the dnode for any dirty records to determine its dirtiness. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#11900 Closes openzfs#12724

In addition to flushing memory mapped regions when checking holes, commit de198f2 modified the dirty dnode detection logic to check the dn->dn_dirty_records instead of the dn->dn_dirty_link. Relying on the dirty record has not be reliable, switch back to the previous method. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#11900 Closes openzfs#12745

This reverts commit 4b3133e671b958fa2c915a4faf57812820124a7b upstream. See #14753 - possible corruption again, very similar symptoms to the nightmare that was #11900 and same area of code. We can safely revert it as it's an optimisation rather than a bugfix in itself. Keen Gentoo users may remember the following commits (in Gentoo): - 8e5626d - 9fb275f - 95c250a Bug: openzfs/zfs#14753 Bug: openzfs/zfs#11900 Signed-off-by: Sam James <sam@gentoo.org>

openzfs/zfs#11900 Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>

openzfs/zfs#12724 openzfs/zfs#11900 Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>

Seems corruption still happening with coreutils-9 even with the SEEK patch applied Bug: https://bugs.gentoo.org/815469 openzfs/zfs#11900 Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>

Preliminary investigation suggests this patch (and earlier iterations) may introduce regressions. Let's yank it for now while we continue the investigation with upstream. Bug: https://bugs.gentoo.org/815469 See: openzfs/zfs#11900 Signed-off-by: Sam James <sam@gentoo.org>

take two Bug: https://bugs.gentoo.org/815469 openzfs/zfs#11900 Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>

This reverts commit 4b3133e671b958fa2c915a4faf57812820124a7b upstream. See #14753 - possible corruption again, very similar symptoms to the nightmare that was #11900 and same area of code. We can safely revert it as it's an optimisation rather than a bugfix in itself. Keen Gentoo users may remember the following commits (in Gentoo): - 8e5626dc90e4e6166c2e296371b6ff5a9d13a8c4 - 9fb275f656de639e25acc9497b70b4cae593d35d - 95c250a3f3986b2bc2091dd3981ff1e1d3de0c73 Bug: openzfs/zfs#14753 Bug: openzfs/zfs#11900 Signed-off-by: Sam James <sam@gentoo.org>

As a mitigation until more is understood and fixes are tested & reviewed, change the default of zfs_dmu_offset_next_sync from 1 to 0, as it was before 05b3eb6d232009db247882a39d518e7282630753 upstream. There are no reported cases of The Bug being hit with zfs_dmu_offset_next_sync=1: that does not mean this is a cure or a real fix, but it _appears_ to be at least effective in reducing the chances of it happening. By itself, it's a safe change anyway, so it feels worth us doing while we wait. Note that The Bug has been reproduced on 2.1.x as well, hence we do it for both 2.1.13 and 2.2.1. Bug: openzfs/zfs#11900 Bug: openzfs/zfs#15526 Bug: https://bugs.gentoo.org/917224 Signed-off-by: Sam James <sam@gentoo.org>

This helps mitigate a data corruption bug. This was previously defaulted to zero prior to upstream commit 05b3eb6d232009db247882a39d518e7282630753, and it is already a tunable, so doing this seems safe. Initially data corruption was thought to be introduced with v2.2, but further upstream investigation currently believes that changes in v2.2 only exacerbated the issue that already existed. A longer-term fix is likely to be openzfs/zfs#15571, though that is not yet merged. The zfs_2_1 package has already backported that, so do not apply the tunable default change there. Positioning of `extraPatches` is to avoid merge conflicts with NixOS#269097. Patch is nearly identical to the [Gentoo][1] patch, but better patch formatting. See openzfs/zfs#11900 See openzfs/zfs#15526 [1]: https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-fs/zfs-kmod/files/zfs-kmod-2.2.1-Disable-zfs_dmu_offset_next_sync-tunable-by-default.patch

This helps mitigate a data corruption bug. This was previously defaulted to zero prior to upstream commit 05b3eb6d232009db247882a39d518e7282630753, and it is already a tunable, so doing this seems safe. Initially data corruption was thought to be introduced with v2.2, but further upstream investigation currently believes that changes in v2.2 only exacerbated the issue that already existed. A longer-term fix is likely to be openzfs/zfs#15571, though that is not yet merged. The zfs_2_1 package has already backported that, so do not apply the tunable default change there. Positioning of `extraPatches` is to avoid merge conflicts with NixOS#269097. Patch is nearly identical to the [Gentoo][1] patch, but better patch formatting. See openzfs/zfs#11900 See openzfs/zfs#15526 [1]: https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-fs/zfs-kmod/files/zfs-kmod-2.2.1-Disable-zfs_dmu_offset_next_sync-tunable-by-default.patch (cherry picked from commit 9670ddc)

This helps mitigate a data corruption bug. This was previously defaulted to zero prior to upstream commit 05b3eb6d232009db247882a39d518e7282630753, and it is already a tunable, so doing this seems safe. Initially data corruption was thought to be introduced with v2.2, but further upstream investigation currently believes that changes in v2.2 only exacerbated the issue that already existed. A longer-term fix is likely to be openzfs/zfs#15571, though that is not yet merged. The zfs_2_1 package has already backported that, so do not apply the tunable default change there. Positioning of `extraPatches` is to avoid merge conflicts with NixOS#269097. Patch is nearly identical to the [Gentoo][1] patch, but better patch formatting. See openzfs/zfs#11900 See openzfs/zfs#15526 [1]: https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-fs/zfs-kmod/files/zfs-kmod-2.2.1-Disable-zfs_dmu_offset_next_sync-tunable-by-default.patch

As a mitigation until more is understood and fixes are tested & reviewed, change the default of zfs_dmu_offset_next_sync from 1 to 0, as it was before 05b3eb6d232009db247882a39d518e7282630753 upstream. There are no reported cases of The Bug being hit with zfs_dmu_offset_next_sync=1: that does not mean this is a cure or a real fix, but it _appears_ to be at least effective in reducing the chances of it happening. By itself, it's a safe change anyway, so it feels worth us doing while we wait. Note that The Bug has been reproduced on 2.1.x as well, hence we do it for both 2.1.13 and 2.2.1. Bug: openzfs/zfs#11900 Bug: openzfs/zfs#15526 Bug: https://bugs.gentoo.org/917224 Signed-off-by: Sam James <sam@gentoo.org>

ismell added Status: Triage Needed New issue which needs to be triaged Type: Defect Incorrect behavior (e.g. crash, hang) labels Apr 13, 2021

zmedico mentioned this issue Apr 14, 2021

Add runtime fallback for _optimized_copyfile(). gentoo/portage#647

Open

gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Sep 29, 2021

sys-fs/zfs: revbump 2.0.6, add coreutils-9 blocker

fb6dc96

openzfs/zfs#11900 Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>

behlendorf removed the Status: Triage Needed New issue which needs to be triaged label Oct 6, 2021

behlendorf self-assigned this Oct 6, 2021

Mic92 mentioned this issue Nov 16, 2021

Revert "zfs: unlock for 5.15" NixOS/nixpkgs#146310

Merged

1 task

behlendorf closed this as completed Jan 7, 2022

rincebrain mentioned this issue Apr 15, 2022

MySQL's ib_logfile1 cannot be created properly #13329

Closed

jittygitty mentioned this issue Apr 30, 2022

FAST-Tracking REFLINK and Offline Deduplication, first for LINUX only #13349

Closed

pittagurneyi mentioned this issue Jul 4, 2022

ZFS 2.0.7 unsupported on LTS >= 5.15.37, Missing (supported) ZFS LTS Release #13624

Open

rincebrain mentioned this issue Mar 9, 2023

severe performance regression on virtual disk migration for qcow2 on ZFS with ZFS 2.1.5 #14594

Open

rincebrain mentioned this issue Apr 15, 2023

Data corruption with 519851122b1703b8 ("ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()") #14753

Closed

gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jul 2, 2023

sys-fs/zfs: revbump 2.0.6, add coreutils-9 blocker

5a836d3

openzfs/zfs#11900 Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>

gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jul 2, 2023

sys-fs/zfs-kmod: revbump, add HOLE patch and dist-kernel version limit

41e11b5

openzfs/zfs#12724 openzfs/zfs#11900 Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>

gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jul 2, 2023

sys-fs/zfs-kmod: revbump 2.1.1 with SEEK/mmap/dnode fixes

9dd5744

take two Bug: https://bugs.gentoo.org/815469 openzfs/zfs#11900 Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>

thesamesam mentioned this issue Nov 14, 2023

some copied files are corrupted (chunks replaced by zeros) #15526

Closed

rincebrain mentioned this issue Nov 22, 2023

How to detect bad block cloning? #15554

Open

amarshall mentioned this issue Nov 26, 2023

zfs: default disable zfs_dmu_offset_next_sync to avoid data corruption NixOS/nixpkgs#270275

Merged

13 tasks

rincebrain mentioned this issue Mar 16, 2024

Add a kill switch for SEEK_HOLE/SEEK_DATA #16000

Closed

13 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SEEK_DATA fails randomly #11900

SEEK_DATA fails randomly #11900

ismell commented Apr 13, 2021

vapier commented Apr 13, 2021

h1z1 commented Apr 16, 2021 •

edited

ismell commented Apr 19, 2021

behlendorf commented Apr 19, 2021

ismell commented Apr 19, 2021

behlendorf commented Apr 19, 2021

ismell commented Apr 20, 2021

behlendorf commented Apr 20, 2021

ismell commented Apr 20, 2021

behlendorf commented Apr 20, 2021

ismell commented May 10, 2021

behlendorf commented May 12, 2021 •

edited

ismell commented Jun 21, 2021

thesamesam commented Sep 27, 2021 •

edited

behlendorf commented Sep 28, 2021

thesamesam commented Sep 28, 2021 •

edited

behlendorf commented Oct 6, 2021

thesamesam commented Oct 6, 2021 •

edited

behlendorf commented Oct 7, 2021

thesamesam commented Oct 7, 2021 •

edited

behlendorf commented Oct 8, 2021

rincebrain commented Dec 30, 2021

SEEK_DATA fails randomly #11900

SEEK_DATA fails randomly #11900

Comments

ismell commented Apr 13, 2021

System information

Describe the problem you're observing

Describe how to reproduce the problem

Include any warning/errors/backtraces from the system logs

vapier commented Apr 13, 2021

h1z1 commented Apr 16, 2021 • edited

ismell commented Apr 19, 2021

behlendorf commented Apr 19, 2021

ismell commented Apr 19, 2021

behlendorf commented Apr 19, 2021

ismell commented Apr 20, 2021

No patch

With fsync

behlendorf commented Apr 20, 2021

ismell commented Apr 20, 2021

behlendorf commented Apr 20, 2021

ismell commented May 10, 2021

behlendorf commented May 12, 2021 • edited

ismell commented Jun 21, 2021

thesamesam commented Sep 27, 2021 • edited

behlendorf commented Sep 28, 2021

thesamesam commented Sep 28, 2021 • edited

behlendorf commented Oct 6, 2021

thesamesam commented Oct 6, 2021 • edited

behlendorf commented Oct 7, 2021

thesamesam commented Oct 7, 2021 • edited

behlendorf commented Oct 8, 2021

rincebrain commented Dec 30, 2021

h1z1 commented Apr 16, 2021 •

edited

behlendorf commented May 12, 2021 •

edited

thesamesam commented Sep 27, 2021 •

edited

thesamesam commented Sep 28, 2021 •

edited

thesamesam commented Oct 6, 2021 •

edited

thesamesam commented Oct 7, 2021 •

edited