some copied files are corrupted (chunks replaced by zeros) #15526

Closed
terinjokes opened this issue Nov 14, 2023 · 338 comments · Fixed by #15571
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@terinjokes

System information

Type                   Version/Name
Distribution Name      Gentoo
Distribution Version   (rolling)
Kernel Version         6.5.11
Architecture           amd64
OpenZFS Version        2.2.0
Reference              https://bugs.gentoo.org/917224

Describe the problem you're observing

When installing the Go compiler with Portage, many of the internal compiler commands end up corrupted, with most of each file replaced by zeros.

$  file /usr/lib/go/pkg/tool/linux_amd64/* | grep data
/usr/lib/go/pkg/tool/linux_amd64/asm:       data
/usr/lib/go/pkg/tool/linux_amd64/cgo:       data
/usr/lib/go/pkg/tool/linux_amd64/compile:   data
/usr/lib/go/pkg/tool/linux_amd64/covdata:   ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=xHCzRQtrkEP-Bbxql0SF/zxsofCJFlBoPlUclgwBG/TrsgK6SKiY4q6TIhyBjU/UwcISvZgqfQaEf3Kr_Tq, not stripped
/usr/lib/go/pkg/tool/linux_amd64/cover:     data
/usr/lib/go/pkg/tool/linux_amd64/link:      data
/usr/lib/go/pkg/tool/linux_amd64/vet:       data

$ hexdump /usr/lib/go/pkg/tool/linux_amd64/compile
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000fa0 0000 0000 0000 0000 0000 0000 5a41 3447
0000fb0 336a 3933 5a49 4f2d 6641 6342 7a6d 3646
0000fc0 582f 5930 5a4d 6761 5659 6f34 6d39 4130
0000fd0 4957 6555 2f67 686d 6a63 6675 5976 4e6a
0000fe0 346c 3070 5157 494e 5f41 5a2f 336d 6342
0000ff0 4e6d 4a4f 306c 4277 4a72 774d 4d41 006c
0001000 0000 0000 0000 0000 0000 0000 0000 0000
*
0ac9280 5a41 3447 336a 3933 5a49 4f2d 6641 6342
0ac9290 7a6d 3646 582f 5930 5a4d 6761 5659 6f34
0ac92a0 6d39 4130 4957 6555 2f67 686d 6a63 6675
0ac92b0 5976 4e6a 346c 3070 5157 494e 5f41 5a2f
0ac92c0 336d 6342 4e6d 4a4f 306c 4277 4a72 774d
0ac92d0 4d41 006c 0000 0000 0000 0000 0000 0000
0ac92e0 0000 0000 0000 0000 0000 0000 0000 0000
*
1139380 0000 0000 0000 0000 0000
1139389

I'm able to reproduce on two separate machines running 6.5.11 and ZFS 2.2.0.

ZFS does not see any errors with the pool.

$ zpool status
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:07:24 with 0 errors on Wed Nov  1 00:06:45 2023
config:

        NAME                                          STATE     READ WRITE CKSUM
        zroot                                         ONLINE       0     0     0
          nvme-WDS100T1X0E-XXXXXX_XXXXXXXXXXXX-part2  ONLINE       0     0     0

errors: No known data errors

$ zpool status -t
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:07:24 with 0 errors on Wed Nov  1 00:06:45 2023
config:

        NAME                                          STATE     READ WRITE CKSUM
        zroot                                         ONLINE       0     0     0
          nvme-WDS100T1X0E-XXXXXX_XXXXXXXXXXXX-part2  ONLINE       0     0     0  (100% trimmed, completed at Tue 31 Oct 2023 11:15:47 PM GMT)

errors: No known data errors

Describe how to reproduce the problem

  1. On a system running ZFS 2.2.0, upgrade pools to enable the block cloning feature.
  2. emerge -1 dev-lang/go, where Portage's TMPDIR is on ZFS.
  3. After a successful install of Go, files in /usr/lib/go/pkg/tool/linux_amd64/ (such as compile) are corrupted.

I was able to reproduce with and without Portage's "native-extensions" feature. I was unable to reproduce after changing Portage's TMPDIR to another filesystem (such as tmpfs).

Include any warning/errors/backtraces from the system logs

@terinjokes terinjokes added the Type: Defect Incorrect behavior (e.g. crash, hang) label Nov 14, 2023
@thulle

thulle commented Nov 14, 2023

I'm also hit by this bug, happened after upgrade to ZFS 2.2.0. Mounting tmpfs on portage TMPDIR (build directory for go) solves the issue. Currently on kernel 6.5.9, originally happened on 6.5.7.

No idea if it's worth noting, but the remaining data in the files seems to be a repetition of the same base64 encoded data. Decoding it gave me nothing comprehensible.

@terinjokes
Author

After downgrading coreutils from 9.3 to 8.32, I am no longer able to reproduce this corruption. My understanding is 8.32 predates the switch to automatically using reflink?

> No idea if it's worth noting, but the remaining data in the files seems to be a repetition of the same base64 encoded data. Decoding it gave me nothing comprehensible.

This is the Go BuildID. Compare the output of a non-corrupted build:

$ file /usr/lib/go/pkg/tool/linux_amd64/compile
/usr/lib/go/pkg/tool/linux_amd64/compile: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=AZG4j339IZ-OAfBcmzF6/X0YMZagYV4o9m0AWIUeg/mhcjufvYjNl4p0WQNIA_/Zm3BcmNOJl0wBrJMwAMl, not stripped

With the data remaining in the file:

$ hexdump -C /var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/compile
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000fa0  00 00 00 00 00 00 00 00  00 00 00 00 41 5a 47 34  |............AZG4|
00000fb0  6a 33 33 39 49 5a 2d 4f  41 66 42 63 6d 7a 46 36  |j339IZ-OAfBcmzF6|
00000fc0  2f 58 30 59 4d 5a 61 67  59 56 34 6f 39 6d 30 41  |/X0YMZagYV4o9m0A|
00000fd0  57 49 55 65 67 2f 6d 68  63 6a 75 66 76 59 6a 4e  |WIUeg/mhcjufvYjN|
00000fe0  6c 34 70 30 57 51 4e 49  41 5f 2f 5a 6d 33 42 63  |l4p0WQNIA_/Zm3Bc|
00000ff0  6d 4e 4f 4a 6c 30 77 42  72 4a 4d 77 41 4d 6c 00  |mNOJl0wBrJMwAMl.|
00001000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00ac9280  41 5a 47 34 6a 33 33 39  49 5a 2d 4f 41 66 42 63  |AZG4j339IZ-OAfBc|
00ac9290  6d 7a 46 36 2f 58 30 59  4d 5a 61 67 59 56 34 6f  |mzF6/X0YMZagYV4o|
00ac92a0  39 6d 30 41 57 49 55 65  67 2f 6d 68 63 6a 75 66  |9m0AWIUeg/mhcjuf|
00ac92b0  76 59 6a 4e 6c 34 70 30  57 51 4e 49 41 5f 2f 5a  |vYjNl4p0WQNIA_/Z|
00ac92c0  6d 33 42 63 6d 4e 4f 4a  6c 30 77 42 72 4a 4d 77  |m3BcmNOJl0wBrJMw|
00ac92d0  41 4d 6c 00 00 00 00 00  00 00 00 00 00 00 00 00  |AMl.............|
00ac92e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01139380  00 00 00 00 00 00 00 00  00                       |.........|
01139389

@RichardBelzer

Had the same issue here... Upgraded from 2.1.13 to 2.2.0 last month right after it came out. I'm on Ubuntu and built the ZFS package from the 2.2.0 tag.

I am discovering random files that were being stored after the 2.2.0 upgrade have suffered silent corruption and either have repetitious data like @thulle reported or have large blocks of contiguous zeroes instead of the data that they should be filled with. I don't use or build go like the other posters.

zpool scrub reports no errors, zpool status reports no errors. Let me know if there is any other information needed.

@thulle

thulle commented Nov 14, 2023

> After downgrading coreutils from 9.3 to 8.32, I am no longer able to reproduce this corruption.

This seems to solve the corruption issue on my end too.

@thesamesam
Contributor

thesamesam commented Nov 14, 2023

#11900 (comment) onwards is highly relevant. See also #14753 and #11900 as a whole.

It would be interesting to know if people here have all upgraded their pool for the new block cloning feature or if there's a mix here.

@thulle

thulle commented Nov 14, 2023

feature@block_cloning active here

@RichardBelzer

RichardBelzer commented Nov 14, 2023

My understanding is that block cloning is enabled by default when upgrading to 2.2.0, but may not be getting used unless an application like coreutils is actually leveraging it.

Can people post the result of this:

zpool get all tank | grep bclone

(where tank is the name of their pool with the corruption)

@thulle

thulle commented Nov 14, 2023

kc3000    bcloneused                     442M                           -
kc3000    bclonesaved                    1.42G                          -
kc3000    bcloneratio                    4.30x                          -

@RichardBelzer

RichardBelzer commented Nov 14, 2023

My understanding is this: If the result is 0 for both bcloneused and bclonesaved then it's safe to say that you don't have silent corruption.

Any non-zero number for either field and you may OR may not have silent corruption.
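
For anyone wanting to script this check, here is a minimal sketch that parses `zpool get`-style output such as the sample posted above. The function names `parse_bclone` and `bclone_in_use` are hypothetical, the set of "zero" spellings is an assumption about how zpool prints empty counters, and a zero result is at best a hint, not proof that a pool is unaffected.

```python
def parse_bclone(output):
    """Map bclone* property names to their raw string values."""
    props = {}
    for line in output.splitlines():
        fields = line.split()
        # Expected columns: NAME PROPERTY VALUE SOURCE
        if len(fields) >= 3 and fields[1].startswith("bclone"):
            props[fields[1]] = fields[2]
    return props


def bclone_in_use(props):
    """True if bcloneused or bclonesaved looks non-zero."""
    zero_values = {"0", "0B"}  # assumption: how zpool renders an empty counter
    return any(props.get(name, "0") not in zero_values
               for name in ("bcloneused", "bclonesaved"))


# Sample taken from the output posted in this thread.
sample = """\
kc3000    bcloneused                     442M                           -
kc3000    bclonesaved                    1.42G                          -
kc3000    bcloneratio                    4.30x                          -
"""
props = parse_bclone(sample)
print(props["bcloneused"], bclone_in_use(props))  # -> 442M True
```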

@rincebrain
Contributor

That is likely but not really safe to say yet, we don't have enough data.

@terinjokes
Author

> but may not be getting used unless an application like coreutils is actually leveraging it.

Is it accurate to say it's being used when the FICLONE ioctl is used, and possibly when copy_file_range is used?

@rincebrain
Contributor

I wouldn't even feel comfortable saying that yet, I agree it's likely, but I'd really like more certainty before declaring "you're safe if not this".

@RichardBelzer

Agreed, we don't know what we don't know, we need to:

  1. Get a machine / test case to repro it most of the time (ideally 100%). I think we're already here with newer coreutils and building go.
  2. Determine what we think is the bug and produce a patch
  3. Patch the source, rerun the test(s), verify that this now happens 0% of the time

I don't know if there's a more robust way of figuring this out outside of that. And of course I'm handwaving 2 which could be a heavy lift.

@rincebrain
Contributor

I'd assume, not having looked, that it's something like copy_file_range isn't dirtying things so the thing that triggers a force txg sync on SEEK_DATA/SEEK_HOLE with a dirty thing isn't firing.

But as I'm somewhat occupied recovering from being badly ill at home, my time to test and debug this is rather finite, so I wouldn't assume I'll be doing that in a timely fashion.

@robn
Contributor

robn commented Nov 14, 2023

I agree with @rincebrain on the rough theory. I did look, and landed on code I've looked at before while trying to get my head around corner cases in block cloning. I have been suspicious of it for a while now, and remain so.

When we start to clone a block, we indicate intent to modify the target dbuf by calling dmu_buf_will_clone, which is a special case that unwinds any changes on the dbuf, and then sets it to DB_NOFILL.

In dmu.c:

void
dmu_buf_will_clone(dmu_buf_t *db_fake, dmu_tx_t *tx)
{
	dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake;

	/*
	 * Block cloning: We are going to clone into this block, so undirty
	 * modifications done to this block so far in this txg. This includes
	 * writes and clones into this block.
	 */
	mutex_enter(&db->db_mtx);
	VERIFY(!dbuf_undirty(db, tx));
	ASSERT0P(dbuf_find_dirty_eq(db, tx->tx_txg));
	if (db->db_buf != NULL) {
		arc_buf_destroy(db->db_buf, db);
		db->db_buf = NULL;
	}
	mutex_exit(&db->db_mtx);

	dmu_buf_will_not_fill(db_fake, tx);
}

void
dmu_buf_will_not_fill(dmu_buf_t *db_fake, dmu_tx_t *tx)
{
	dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake;

	mutex_enter(&db->db_mtx);
	db->db_state = DB_NOFILL;
	DTRACE_SET_STATE(db, "allocating NOFILL buffer");
	mutex_exit(&db->db_mtx);

	dbuf_noread(db);
	(void) dbuf_dirty(db, tx);
}

It seems there's a window there where the lock has been dropped and the dbuf is not dirty. That may be a place where a second thread can add a change to that block, which then gets trampled.
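
To make the suspected window concrete, here is a toy model in Python. It is not ZFS code and makes no claim about where the real bug lives: `ToyBuf`, its methods, and the `interloper` callback are all hypothetical stand-ins that only illustrate how a change made between two separately locked steps can be lost.

```python
import threading


class ToyBuf:
    """Toy stand-in for a buffer with per-transaction pending changes."""

    def __init__(self):
        self.lock = threading.Lock()
        self.pending = []
        self.state = "CACHED"

    def undirty(self):
        with self.lock:
            self.pending.clear()
        # Lock released: the buffer is momentarily clean and unprotected.

    def write(self, data):
        with self.lock:
            self.pending.append(data)

    def mark_nofill(self):
        with self.lock:
            self.state = "NOFILL"
            # In this toy model, switching to NOFILL discards queued changes.
            lost = list(self.pending)
            self.pending = ["<clone>"]
        return lost


def clone_with_window(buf, interloper=None):
    """Run the two steps, optionally letting another actor run in between."""
    buf.undirty()
    if interloper is not None:
        interloper()  # simulates a second thread scheduled inside the window
    return buf.mark_nofill()


buf = ToyBuf()
lost = clone_with_window(buf, lambda: buf.write(b"new data"))
print(lost)  # -> [b'new data']
```

The interleaving is forced deterministically here; in a real system it would depend on scheduling, which is why such a gap would be hard to hit without a tight test case.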

Unfortunately dbuf_dirty and dbuf_undirty are extremely complex, and the gap is tiny, and without a tight test case it's gonna take me a lot of time to track this down, which I don't have at the moment [1]. So this is an unhelpful comment but maybe it gives someone else an idea of where to look.

Footnotes

  1. but you can buy some of my time 😉

@grahamperrin
Contributor

#15526 (comment)

> My understanding is that block cloning is enabled by default when upgrading to 2.2.0, …

With FreeBSD 14.0-RELEASE, block cloning is additionally controlled by the vfs.zfs.bclone_enabled sysctl, which is set to 0 (zero) here:

root@mowa219-gjp4-freebsd-14-zfs-vm:~ # sysctl -d vfs.zfs.bclone_enabled
vfs.zfs.bclone_enabled: Enable block cloning
root@mowa219-gjp4-freebsd-14-zfs-vm:~ # sysctl vfs.zfs.bclone_enabled
vfs.zfs.bclone_enabled: 0
root@mowa219-gjp4-freebsd-14-zfs-vm:~ # freebsd-version -kru ; uname -aKU
14.0-RELEASE
14.0-RELEASE
14.0-RELEASE
FreeBSD mowa219-gjp4-freebsd-14-zfs-vm 14.0-RELEASE FreeBSD 14.0-RELEASE releng/14.0-n265380-f9716eee8ab GENERIC amd64 1400097 1400097
root@mowa219-gjp4-freebsd-14-zfs-vm:~ # pkg upgrade -r FreeBSD-base
Updating FreeBSD-base repository catalogue...
FreeBSD-base repository is up to date.
All repositories are up to date.
Checking for upgrades (0 candidates): 100%
Processing candidates (0 candidates): 100%
Checking integrity... done (0 conflicting)
Your packages are up to date.
root@mowa219-gjp4-freebsd-14-zfs-vm:~ # exit
logout
grahamperrin@mowa219-gjp4-freebsd-14-zfs-vm:~ % pkg -vv | grep -e url -e enabled
    url             : "pkg+https://pkg.freebsd.org/FreeBSD:14:amd64/latest",
    enabled         : yes,
    url             : "pkg+https://pkg.freebsd.org/FreeBSD:14:amd64/base_release_0",
    enabled         : yes,
grahamperrin@mowa219-gjp4-freebsd-14-zfs-vm:~ % 

sysctl(8)

Whilst 14.0 is not yet announced, I can't imagine a change to the value at this time.

@rincebrain
Contributor

Time to upstream that and add Linux support, with how broken this is.

@KungFuJesus

KungFuJesus commented Nov 15, 2023

I am curious if it is broken in the same way on FreeBSD. Has anyone managed to reproduce this with intentional reflink copies? From what I recall the implementer's main platform was FreeBSD.

Also some of these stack traces are blown assertions in the ZIL. Are we certain all of these are in regard to the BRT feature? I know a lot of the more recent optimizations with regard to ZIL messed with lock granularity, do we know that these things haven't happened with BRT not being leveraged?

@rincebrain
Contributor

Here is probably not the right place to debate whether the other bugs around the ZIL are or are not from BRT. At least in 15529, I only tagged the ones where it was explicitly the case that block cloning was happening.

I'm also not sure what you mean by "intentional reflink support" here - in both FreeBSD and Linux's case, they're calling a copy_file_range analogue that is going to reflink if it can and do a boring copy otherwise; nobody here is explicitly trying FICLONE, which, last I checked, has no analogue on FreeBSD at all.

@terinjokes
Author

The cp from coreutils-9.3 is using FICLONE, and falling back to copy_file_range if it fails.
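
As a rough illustration of that sequence (not coreutils' actual code), the Python sketch below tries the FICLONE ioctl first and falls back to copy_file_range. The helper name `clone_or_copy` is hypothetical, the final plain read/write fallback is an addition for portability, and the FICLONE constant is the Linux value of `_IOW(0x94, 9, int)`.

```python
import fcntl
import os

FICLONE = 0x40049409  # Linux <linux/fs.h>: _IOW(0x94, 9, int)


def _copy_fd(src, dst):
    """Copy between open descriptors; return the method that succeeded."""
    try:
        fcntl.ioctl(dst, FICLONE, src)  # whole-file reflink
        return "FICLONE"
    except OSError:
        pass  # filesystem cannot reflink these files

    size = os.fstat(src).st_size
    if hasattr(os, "copy_file_range"):
        try:
            copied = 0
            while copied < size:
                n = os.copy_file_range(src, dst, size - copied, copied, copied)
                if n == 0:
                    break
                copied += n
            if copied == size:
                return "copy_file_range"
        except OSError:
            pass  # fall through to a plain copy

    # Last resort (not part of the sequence described above): read/write loop.
    os.lseek(src, 0, os.SEEK_SET)
    os.lseek(dst, 0, os.SEEK_SET)
    os.ftruncate(dst, 0)
    while True:
        chunk = memoryview(os.read(src, 1 << 20))
        if not chunk:
            break
        while chunk:
            chunk = chunk[os.write(dst, chunk):]
    return "read/write"


def clone_or_copy(src_path, dst_path):
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            return _copy_fd(src, dst)
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

Calling `clone_or_copy("a", "b")` returns which path was taken, which can help tell whether a given copy on a given filesystem went through the cloning code at all.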

@KungFuJesus

By intentional I mean doing cp with a reflink argument, assuming the BSD variant of cp has it. I'd expect that by default it doesn't try to use it automatically like coreutils' --reflink=auto behavior. When something like make is copying around quickly generated build artifacts, I expect you're dealing with code that is more likely to hit race conditions than when a file that's been stable on disk for a few seconds is copied once with the BRT-backed clone feature.

@rincebrain
Contributor

My understanding is that cp on FreeBSD just invokes copy_file_range in every case, and as I said, has no FICLONE analogue.

@KungFuJesus

Ah gotcha, so the benefits of the reduced syscall count are there without requiring repeated userspace / kernelspace round trips, but the BRT-based shortcut and space savings are gated behind that sysctl. Makes sense. I am curious whether FICLONE is the issue here and copy_file_range worked just fine?

@rincebrain
Contributor

Since the code, if I'm reading it right, for FICLONE just invokes the same backend function, it shouldn't be FICLONE specifically. I would guess differing semantics about how easy races are to trigger between the two platforms' memory management, but I really don't know.

@tonyhutter
Contributor

Pinging @pjd on block cloning.

I'm seeing if I can make a non-gentoo reproducer.

@thulle

thulle commented Nov 16, 2023

Since I have 1.42GB of possibly affected files I thought I'd check for stretches of zeroes in the other files to see if anything other than compiling go might have triggered this.
The block cloning commit only seems to change zdb code regarding the ZIL, so I just want to verify that we have no way of dumping the BRT to check which files contain cloned blocks, right?

Currently dumping a list of all file blocks to check for multiple references to the same block.
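
A scan for stretches of zeroes could be sketched roughly as follows. The function name `zero_block_runs` and the 4096-byte block size are assumptions chosen for illustration; legitimately sparse or zero-padded files will also show up, so any hit is only a candidate for closer inspection.

```python
def zero_block_runs(path, block_size=4096):
    """Return [(offset, length)] for each run of consecutive all-zero blocks."""
    runs = []
    start = None   # offset where the current zero run began, if any
    offset = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            if block.count(0) == len(block):
                if start is None:
                    start = offset
            elif start is not None:
                runs.append((start, offset - start))
                start = None
            offset += len(block)
    if start is not None:
        runs.append((start, offset - start))  # run extends to end of file
    return runs
```

For the corrupted `compile` binary shown earlier, a block-aligned scan like this would be expected to report long runs starting at offset 0x1000, since the first block still holds the BuildID string near its end.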

@rincebrain
Contributor

I don't think zdb knows how to dump it as it is now, no. I don't think it'd be hard to teach it, though.

@robn
Contributor

robn commented Nov 16, 2023

It's not hard to dump, but also it doesn't really show what you want. Each entry is just the offset within a vdev (half a DVA), and the number of references to it. It doesn't know what a file is, nor even really what a block pointer is.

Probably the dumbest version is to extend the in-memory mini-BRT used in zdb -b to note the object in which cloned blocks were seen, and then dump those out at the end. It does mean a full scan though (about the same heft as a scrub).

Hmm, now I think about it, it might not help anyway. If the problem is that there was a race, and we applied a change in the wrong order, such that zeroes got written, then the clone itself was never performed - we did a real write, just with zero content. So there's no clone to look for.

@numinit
Contributor

numinit commented Dec 4, 2023

correct, it will generally only replicate with a coreutils that is trying the SEEK_HOLE optimization.

@ipkpjersi

I'm using coreutils 8.32, so basically any distro that isn't using coreutils 9.x is completely unaffected by this issue?

@classabbyamp

classabbyamp commented Dec 4, 2023

not completely, but it takes away many possible points of creating corruption (this also assumes the distro didn't backport the SEEK_HOLE changes, like EL9 did, iirc)

@Ukko-Ylijumala

> I'm using coreutils 8.32, so basically any distro that isn't using coreutils 9.x is completely unaffected by this issue?

To add to @classabbyamp's answer, any software (besides coreutils) which tries to find holes in files it is both reading and writing at the same time is potentially affected. However, the potential timing window is so small and the access pattern so specific that the likelihood of actually tripping on this bug is rather low.
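
For context, "finding holes" usually means calling lseek with SEEK_DATA and SEEK_HOLE. The sketch below lists the byte ranges a filesystem reports as data; the function name `data_extents` is hypothetical and this is not how any particular tool is implemented. A copier built on such a loop skips everything outside the reported ranges, so a freshly written region that is wrongly reported as a hole would come out as zeros in the copy.

```python
import errno
import os


def data_extents(path):
    """Return [(start, end)] byte ranges the filesystem reports as data."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # no more data after offset
                    break
                raise
            # End of file counts as an implicit hole, so this always returns.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents
```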

@ipkpjersi

ipkpjersi commented Dec 4, 2023

In that case, would zfs send/recv be affected and would rsync be affected? Or would that only really be the case if coreutils 9.x is present?

I did find some ppas I could look into like ppa:arter97/zfs and ppa:patrickdk/zfs and ppa:ofthesun/zfs at least so I can probably patch my Ubuntu systems, but I have a feeling patching my Proxmox systems may require me learning how to build zfs from source, although I did find kneutron/ansitest has a couple zfs build scripts so maybe that will help me learn building zfs from source.

Thanks for all the additional information, it's quite interesting learning about this bug even though it's been patched already.

Edit: Sounds like there's a newer Proxmox with this bug already fixed/backported so that's good at least.

Edit 2: I have gone with ppa:arter97/zfs and upgraded my Ubuntu systems to zfs 2.2.2 and upgraded my Proxmox systems to Proxmox 8.1 which supposedly includes the zfs fix backported to 2.2.0, so I suppose all is well for me now.

Relms12345 added a commit to Relms12345/buildroot that referenced this issue Dec 4, 2023
commit 5abe7bd
Author: Peter Korsgaard <peter@korsgaard.com>
Date:   Mon Dec 4 14:06:08 2023 +0100

    Update for 2023.08.4

    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 6b68ace
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Sun Dec 3 19:44:00 2023 +0100

    package/mariadb: security bump to version 10.11.6

    This bump will fix the following build failure raised since bump of fmt
    to version 10.1.0 in commit 619b558
    thanks to
    MariaDB/server@f4cec36:

    -- Performing Test HAVE_SYSTEM_LIBFMT
    -- Performing Test HAVE_SYSTEM_LIBFMT - Failed

    [...]

    -- Downloading...
       dst='/home/buildroot/autobuild/instance-3/output-1/build/mariadb-10.11.4/extra/libfmt/src/8.0.1.zip'
       timeout='none'
       inactivity timeout='none'
    -- Using src='https://github.com/fmtlib/fmt/archive/refs/tags/8.0.1.zip'
    CMake Error at libfmt-stamp/download-libfmt.cmake:170 (message):
      Each download failed!

        error: downloading 'https://github.com/fmtlib/fmt/archive/refs/tags/8.0.1.zip' failed
              status_code: 1
              status_string: "Unsupported protocol"
              log:
              --- LOG BEGIN ---
              Protocol "https" not supported or disabled in libcurl

    This bump will also fix CVE-2023-22084

    https://mariadb.com/kb/en/mariadb-10-11-5-release-notes/
    https://mariadb.com/kb/en/mariadb-10-11-6-release-notes/

    Fixes:
     - http://autobuild.buildroot.org/results/9cb577195aa939289102116df5a2eac03f0d5017

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit d20329e)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit b1509f7
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Sun Dec 3 18:42:04 2023 +0100

    package/libmemcached: fix static build

    Fix the following static build failure raised since bump to version
    1.1.4 in commit 7205df8:

    CMake Error at /home/autobuild/autobuild/instance-13/output-1/build/libmemcached-1.1.4/src/bin/cmake_install.cmake:60 (file):
      file RPATH_CHANGE could not write new RPATH:

        $ORIGIN/../lib

      to the file:

        /home/autobuild/autobuild/instance-13/output-1/host/arc-buildroot-linux-uclibc/sysroot/usr/bin/memcapable

      No valid ELF RPATH or RUNPATH entry exists in the file;
    Call Stack (most recent call first):
      /home/autobuild/autobuild/instance-13/output-1/build/libmemcached-1.1.4/src/cmake_install.cmake:52 (include)
      /home/autobuild/autobuild/instance-13/output-1/build/libmemcached-1.1.4/cmake_install.cmake:52 (include)

    Fixes:
     - http://autobuild.buildroot.org/results/778ff517d465896f54a3cd5316a66c54f66fd4cb

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit b47b206)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit dedfab8
Author: Peter Korsgaard <peter@korsgaard.com>
Date:   Fri Dec 1 22:14:01 2023 +0100

    toradex_apalis_imx6_defconfig: add download hashes for linux/uboot

    The defconfig fetches Linux and U-Boot from a git repo using the
    unauthenticated git:// protocol, so add download hashes for them to ensure
    we get the right sources by adding a global patch dir and running
    utils/add-custom-hashes.

    The defconfig uses the Linux sources for the kernel headers, so make
    linux-headers/linux-headers.hash a symlink to linux/linux.hash so the same
    hash file is used.

    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit cdc9b8a)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 100ba32
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Sun Dec 3 15:54:18 2023 +0100

    package/xenomai: fix build with gcc >= 12

    Fix the following build failure with gcc >= 12:

    task.c: In function 't_start':
    task.c:398:16: error: 'ret' may be used uninitialized [-Werror=maybe-uninitialized]
      398 |         return ret;
          |                ^~~
    task.c:364:13: note: 'ret' was declared here
      364 |         int ret;
          |             ^~~
    task.c: In function 't_resume':
    task.c:444:16: error: 'ret' may be used uninitialized [-Werror=maybe-uninitialized]
      444 |         return ret;
          |                ^~~
    task.c:428:13: note: 'ret' was declared here
      428 |         int ret;
          |             ^~~

    Fixes:
     - http://autobuild.buildroot.org/results/bc1b40de22e563b704ad7f20b6bf4d1f73a6ed8a

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit a3db1dd)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit ce9b0d5
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Sun Dec 3 15:15:18 2023 +0100

    package/speechd: fix NLS build

    Fix the following NLS build failure raised since the addition of the
    package in commit 9f4f8c5:

    /home/buildroot/autobuild/run/instance-2/output-1/host/lib/gcc/arm-buildroot-linux-musleabihf/12.3.0/../../../../arm-buildroot-linux-musleabihf/bin/ld: ../../src/common/.libs/libcommon.a(libcommon_la-i18n.o): undefined reference to symbol 'libintl_bindtextdomain'

    Fixes:
     - http://autobuild.buildroot.org/results/8ab13cf474d732c95a1da65592d950b24b3d474b

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit f6a7050)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 37dfdda
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Sun Dec 3 09:44:45 2023 +0100

    package/libmemcached: fix build with gcc 4.8

    Fix the following build failure with gcc 4.8 raised since bump to
    version 1.1.4 in commit 7205df8:

    /home/buildroot/autobuild/run/instance-0/output-1/build/libmemcached-1.1.4/src/libmemcachedprotocol/ascii_handler.c: In function 'ascii_get_response_handler':
    /home/buildroot/autobuild/run/instance-0/output-1/build/libmemcached-1.1.4/src/libmemcachedprotocol/ascii_handler.c:249:3: error: 'for' loop initial declarations are only allowed in C99 mode
       for (int x = 0; x < keylen; ++x) {
       ^

    Fixes:
     - http://autobuild.buildroot.org/results/202aeec4dda822ac341d8882f84f968a303697c3

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 5eb79ff)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 50abc2e
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Sun Dec 3 15:20:11 2023 +0100

    package/libde265: security bump to version 1.0.14

    Fix CVE-2023-43887: Libde265 v1.0.12 was discovered to contain multiple
    buffer overflows via the num_tile_columns and num_tile_row parameters in
    the function pic_parameter_set::dump.

    Fix CVE-2023-47471: Buffer Overflow vulnerability in strukturag libde265
    v1.10.12 allows a local attacker to cause a denial of service via the
    slice_segment_header function in the slice.cc component.

    https://github.com/strukturag/libde265/releases/tag/v1.0.14
    https://github.com/strukturag/libde265/releases/tag/v1.0.13

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 4cf5d91)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 2369c3b
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Sun Dec 3 09:02:14 2023 +0100

    package/libmemcached: link with -latomic when needed

    Fix the following build failure raised since bump to version 1.1.4 in
    commit 7205df8:

    /home/buildroot/autobuild/instance-2/output-1/host/opt/ext-toolchain/bin/../lib/gcc/sparc-buildroot-linux-uclibc/11.3.0/../../../../sparc-buildroot-linux-uclibc/bin/ld: CMakeFiles/aslap.dir/ms_conn.c.o: undefined reference to symbol '__atomic_fetch_add_4@@LIBATOMIC_1.0'

    Fixes:
     - http://autobuild.buildroot.org/results/c8e4e1f9609d1339fe070afe440c63660892600e

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit a73cbe6)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 55678b8
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Sat Dec 2 22:45:29 2023 +0100

    package/putty: disable gssapi

    PUTTY_GSSAPI is enabled by default resulting in the following build
    failure since bump to version 0.78 in commit
    5673ea3:

     /home/fabrice/buildroot/output/build/putty-0.79/unix/gss.c:133:10: fatal error: gssapi/gssapi.h: No such file or directory
      133 | #include <gssapi/gssapi.h>
          |          ^~~~~~~~~~~~~~~~~

    Fixes:
     - http://autobuild.buildroot.org/results/d6d06b5aa0df070c3880399e044fb3cd3a830aec

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 499b4d6)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 49da7a4
Author: Francois Perrad <fperrad@gmail.com>
Date:   Sun Dec 3 09:42:51 2023 +0100

    package/perl: security bump to version 5.36.3

    fix CVE-2023-47038 - Write past buffer end via illegal user-defined Unicode property

    note: 5.36.2 was a broken release
    Signed-off-by: Francois Perrad <francois.perrad@gadz.org>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit bc7b0e1)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 0b3f844
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Fri Dec 1 22:23:18 2023 +0100

    package/libpjsip: security bump to version 2.14

    Fix CVE-2023-38703: PJSIP is a free and open source multimedia
    communication library written in C with high level API in C, C++, Java,
    C#, and Python languages. SRTP is a higher level media transport which
    is stacked upon a lower level media transport such as UDP and ICE.
    Currently a higher level transport is not synchronized with its lower
    level transport that may introduce use-after-free issue. This
    vulnerability affects applications that have SRTP capability
    (`PJMEDIA_HAS_SRTP` is set) and use underlying media transport other
    than UDP. This vulnerability’s impact may range from unexpected
    application termination to control flow hijack/memory corruption. The
    patch is available as a commit in the master branch.

    GHSA-f76w-fh7c-pc66
    https://github.com/pjsip/pjproject/releases/tag/2.14

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 38c4aa2)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 275d74b
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Fri Dec 1 21:38:22 2023 +0100

    package/putty: fix static build

    Fix the following static build failure raised since bump to version 0.78
    in commit 5673ea3:

    In file included from /home/buildroot/autobuild/instance-0/output-1/build/putty-0.78/putty.h:8,
                     from /home/buildroot/autobuild/instance-0/output-1/build/putty-0.78/callback.c:8:
    /home/buildroot/autobuild/instance-0/output-1/build/putty-0.78/unix/platform.h:11:10: fatal error: dlfcn.h: No such file or directory
       11 | #include <dlfcn.h>                     /* Dynamic library loading */
          |          ^~~~~~~~~

    Fixes:
     - http://autobuild.buildroot.org/results/06f0b14bd0414f97b06070198e290fb3253348c5

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 3d8e0a2)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 758b779
Author: Bernd Kuhls <bernd@kuhls.net>
Date:   Fri Dec 1 21:34:15 2023 +0100

    package/samba4: security bump version to 4.18.9

    Fixes CVE-2018-14628:
    https://www.samba.org/samba/security/CVE-2018-14628.html

    Release notes:
    https://www.samba.org/samba/history/samba-4.18.9.html

    Signed-off-by: Bernd Kuhls <bernd@kuhls.net>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 75abb66
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Thu Nov 30 23:49:04 2023 +0100

    package/rtty: fix wolfssl build

    Fix the following wolfssl build failure raised at least since bump to
    version 7.4.0 in commit 6b5907b:

    /home/autobuild/autobuild/instance-4/output-1/build/rtty-8.1.0/src/ssl/openssl.c: In function 'ssl_last_error_string':
    /home/autobuild/autobuild/instance-4/output-1/build/rtty-8.1.0/src/ssl/openssl.c:143:24: error: implicit declaration of function 'ERR_peek_error_line_data'; did you mean 'wolfSSL_ERR_get_error_line_data'? [-Werror=implicit-function-declaration]
      143 |         ssl_err_code = ERR_peek_error_line_data(&file, &line, &data, &flags);
          |                        ^~~~~~~~~~~~~~~~~~~~~~~~
          |                        wolfSSL_ERR_get_error_line_data

    Fixes:
     - http://autobuild.buildroot.org/results/9db9f1dcc6760de4b78771bb79f109c4efd06c36
     - http://autobuild.buildroot.org/results/16422af9469de114e552124542508c3b18ea8f19

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    [yann.morin.1998@free.fr: don't force wolfssl-all]
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit 67cb7d8)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 4073574
Author: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Date:   Fri Dec 1 08:33:05 2023 +0100

    package/zfs: bump version to 2.2.2

    This release contains an important fix for a data corruption
    bug. Full details are in the issue [1] and bug fix [2].

    1. openzfs/zfs#15526
    2. openzfs/zfs#15571

    Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit c068fc4)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 9e2e2cb
Author: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Date:   Mon Nov 13 01:58:34 2023 +0100

    package/zfs: bump version to 2.2.0

    Removed backported patch:
    - https://github.com/openzfs/zfs/commit/bc3f12bfac152a0c28951cec92340ba14f9ccee9.patch

    Updated ZFS test to pass this new version; drop the explicit /pool
    mountpoint option to rely on the default location (which happens to be
    /pool already).

    Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    [yann.morin.1998@free.fr:
      - needed on master to further bump to a data-corruption fix
    ]
    (cherry picked from commit d153e58)
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit a44d1a1)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 236a009
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Wed Nov 29 18:39:01 2023 +0100

    package/xtables-addons: bump to version 3.24

    This bump will fix the following build failure with kernel >= 6.2 thanks
    to
    https://codeberg.org/jengelh/xtables-addons/commit/51761c3fe2454e0b4bc25274dd55d4ab72c54bf0:

    /home/buildroot/autobuild/instance-1/output-1/build/xtables-addons-3.22/extensions/xt_TARPIT.c:
    In function 'xttarpit_honeypot':
    /home/buildroot/autobuild/instance-1/output-1/build/xtables-addons-3.22/extensions/xt_TARPIT.c:110:26:
    error: implicit declaration of function 'prandom_u32_max'; did you mean
    'prandom_u32_state'? [-Werror=implicit-function-declaration]
      110 |                         (prandom_u32_max(0x20) - 0xf);
          |                          ^~~~~~~~~~~~~~~
          |                          prandom_u32_state

    Fixes:
     - http://autobuild.buildroot.org/results/e8f2a0cb5b38ff98da97268c4b642554a0a732e1
     - http://autobuild.buildroot.org/results/0191ee0590c08b73f17b35a5c8521796693772b5

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit 84b721c)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 49e3269
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Wed Nov 29 18:39:00 2023 +0100

    package/xtables-addons: drop unrecognized option

    --with-xtables is an unrecognized option since the addition of the
    package in commit 4909173:
    https://github.com/nawawi/xtables-addons/blob/a576f4d43e80f9f91705c9e6a86f2d58c283df14/configure.ac

    configure: WARNING: unrecognized options: --disable-gtk-doc, --disable-gtk-doc-html, --disable-doc, --disable-docs, --disable-documentation, --with-xmlto, --with-fop, --enable-ipv6, --disable-nls, --with-xtables

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit e81dc9d)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 0ffbc8e
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Wed Nov 29 22:43:08 2023 +0100

    package/imagemagick: security bump to version 7.1.1-21

    Fix CVE-2023-1289, CVE-2023-2157, CVE-2023-34151, CVE-2023-34152,
    CVE-2023-34153, CVE-2023-3428, CVE-2023-34474 and CVE-2023-34475

    https://github.com/ImageMagick/Website/blob/main/ChangeLog.md

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 758d79f)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit fb3f6d1
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Mon Nov 27 23:11:19 2023 +0100

    package/gsl: fix musl build on m68k

    Update patch to fix the following musl build failure with m68k which is
    only raised (for an unknown reason) since bump to version 2.7.1 in commit
    3e48f83:

    In file included from fp.c:6:
    fp-gnum68k.c:21:10: fatal error: fpu_control.h: No such file or directory
       21 | #include <fpu_control.h>
          |          ^~~~~~~~~~~~~~~

    Add also upstream link to first patch iteration which was sent in
    November 2022 but didn't get any reply (like most of the other emails
    sent to bug-gsl@gnu.org ...)

    Fixes:
     - http://autobuild.buildroot.org/results/e59636f6ac148807c1c67f09eef0e0a9f5d52303

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 02e80e0)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit a17063e
Author: Yann E. MORIN <yann.morin@orange.com>
Date:   Mon Nov 27 10:40:44 2023 +0100

    package/erlang: disable for uclibc, fix glibc-build

    Commit 2cfa86a (package/erlang: bump version to 26.0.2) added a
    patch to restore building on uClibc.

    However, that patch is not upstream, and has been rejected:

        erlang/otp#7500

        Please open a PR to https://github.com/asmjit/asmjit instead and we
        will get the fix next time we sync with upstream. We do not want
        theirs and our implementation to diverge.

    Furthermore, it happens to work on uClibc, because uClibc does not
    expose sys/auxv.h, but it fails to work on glibc, because the define is
    not propagated to "sub-trees", and thus is never defined where it is
    checked for, even when sys/auxv.h is available. This causes build
    failures such as:

        asmjit/core/cpuinfo.cpp: In function ‘void asmjit::_abi_1_10::detectHWCaps(CpuInfo&, long unsigned int, const LinuxHWCapMapping*, size_t)’:
        asmjit/core/cpuinfo.cpp:840:24: error: ‘getauxval’ was not declared in this scope
          840 |   unsigned long mask = getauxval(type);
              |                        ^~~~~~~~~
        asmjit/core/cpuinfo.cpp: In function ‘void asmjit::_abi_1_10::detectARMCpu(CpuInfo&)’:
        asmjit/core/cpuinfo.cpp:972:21: error: ‘AT_HWCAP’ was not declared in this scope
          972 |   detectHWCaps(cpu, AT_HWCAP, hwCapMapping, ASMJIT_ARRAY_SIZE(hwCapMapping));
              |                     ^~~~~~~~
        asmjit/core/cpuinfo.cpp:973:21: error: ‘AT_HWCAP2’ was not declared in this scope
          973 |   detectHWCaps(cpu, AT_HWCAP2, hwCapMapping2, ASMJIT_ARRAY_SIZE(hwCapMapping2));
              |                     ^~~~~~~~~

    Yet, sys/auxv.h was detected at configure time:

        checking for sys/auxv.h... yes

    This defconfig is enough to reproduce the error:

        BR2_aarch64=y
        BR2_TOOLCHAIN_EXTERNAL=y
        BR2_TOOLCHAIN_EXTERNAL_BOOTLIN=y
        BR2_PACKAGE_ERLANG=y

    Since upstream refused the patch, and there is no fix that was submitted
    to the actual upstream (asmjit), drop the rejected patch, and disable
    for uClibc: the patch is incorrect, and we can't fix a build issue on
    uClibc by introducing another on glibc.

    Fixes:
        http://autobuild.buildroot.org/results/fc1/fc19bad2263bdfacea594217d5ddfde0e27895b1/
        http://autobuild.buildroot.org/results/114/11416d81d5b27fc0627b335a971154c088d5754a/

    Signed-off-by: Yann E. MORIN <yann.morin@orange.com>
    Cc: Bernd Kuhls <bernd@kuhls.net>
    Cc: Maxim Kochetkov <fido_max@inbox.ru>

    Changes v1 -> v2:
      - update comment when unavailable

    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit fb72418)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 7867302
Author: Francois Perrad <fperrad@gmail.com>
Date:   Mon Nov 27 04:26:39 2023 +0100

    package/perl: security bump to 5.36.2

    fix CVE-2023-47038 - Write past buffer end via illegal user-defined Unicode property

    Signed-off-by: Francois Perrad <francois.perrad@gadz.org>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 127986f)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit d353e51
Author: Bernd Kuhls <bernd@kuhls.net>
Date:   Tue Nov 28 18:51:25 2023 +0100

    {linux, linux-headers}: bump 4.{14, 19}.x / 5.{4, 10, 15}.x / 6.{1, 5, 6}.x series

    Signed-off-by: Bernd Kuhls <bernd@kuhls.net>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit c9222fe)
    [Peter: drop 6.5.x / 6.6.x bump]
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit fe30c57
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Tue Nov 28 21:30:59 2023 +0100

    package/libxml2: security bump to version 2.11.6

    Fix CVE-2023-45322: libxml2 through 2.11.5 has a use-after-free that can
    only occur after a certain memory allocation fails. This occurs in
    xmlUnlinkNode in tree.c. NOTE: the vendor's position is "I don't think
    these issues are critical enough to warrant a CVE ID ... because an
    attacker typically can't control when memory allocations fail."

    https://gitlab.gnome.org/GNOME/libxml2/-/blob/v2.11.6/NEWS

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit e5af07d)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 11be509
Author: Bernd Kuhls <bernd@kuhls.net>
Date:   Sat Oct 7 12:25:00 2023 +0200

    package/libxml2: bump version to 2.11.5

    Release notes:
    https://download.gnome.org/sources/libxml2/2.11/libxml2-2.11.5.news

    Signed-off-by: Bernd Kuhls <bernd@kuhls.net>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 622698d)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 7241abc
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Tue Nov 28 21:23:52 2023 +0100

    package/vim: security bump to version 9.0.2136

    Fix CVE-2023-46246, CVE-2023-48231, CVE-2023-48232, CVE-2023-48233,
    CVE-2023-48234, CVE-2023-48235, CVE-2023-48236 and CVE-2023-48237

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 6bd302c)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit e6eda1b
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Tue Nov 28 21:21:13 2023 +0100

    package/squid: security bump to version 6.5

    Fix CVE-2023-5824, CVE-2023-46724, CVE-2023-46846, CVE-2023-46847 and
    CVE-2023-46848

    GHSA-543m-w2m2-g255
    GHSA-j83v-w3p4-5cqh
    GHSA-73m6-jm96-c6r3
    GHSA-phqj-m8gv-cq4g
    GHSA-2g3c-pg7q-g59w

    https://github.com/squid-cache/squid/blob/SQUID_6_5/ChangeLog

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 7fb3c96)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 7223351
Author: Waldemar Brodkorb <wbx@openadk.org>
Date:   Thu Oct 5 08:14:09 2023 +0200

    package/squid: bump version to 6.3

    Signed-off-by: Waldemar Brodkorb <wbx@openadk.org>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 0e15854)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit bc63929
Author: Waldemar Brodkorb <wbx@openadk.org>
Date:   Thu Aug 10 11:58:55 2023 +0200

    package/squid: update to 6.2

    See the release notes for Squid 6 for any news:
    http://www.squid-cache.org/Versions/v6/RELEASENOTES.html

    Tested with qemu_aarch64_virt_defconfig.

    Signed-off-by: Waldemar Brodkorb <wbx@openadk.org>
    Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
    (cherry picked from commit 2a7c681)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit c06c127
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Tue Nov 28 21:14:33 2023 +0100

    package/memcached: security bump to version 1.6.22

    Fix CVE-2023-46852: In Memcached before 1.6.22, a buffer overflow exists
    when processing multiget requests in proxy mode, if there are many
    spaces after the "get" substring.

    Fix CVE-2023-46853: In Memcached before 1.6.22, an off-by-one error
    exists when processing proxy requests in proxy mode, if \n is used
    instead of \r\n.

    https://github.com/memcached/memcached/wiki/ReleaseNotes1622

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit bc96e9d)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit f86173d
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Sun Oct 1 15:04:59 2023 +0200

    package/memcached: fix uclibc-ng build

    Fix the following uclibc-ng build failure raised since bump to version
    1.6.21 in commit 6ce55ab and
    memcached/memcached@875371a:

    /home/buildroot/autobuild/instance-2/output-1/host/lib/gcc/arc-buildroot-linux-uclibc/10.2.0/../../../../arc-buildroot-linux-uclibc/bin/ld: memcached-thread.o: in function `thread_setname':
    thread.c:(.text+0xea2): undefined reference to `pthread_setname_np'

    Fixes:
     - http://autobuild.buildroot.org/results/e856d381f5ec7d2727f21c8bd46dacb456984416

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
    (cherry picked from commit bfa3cd7)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 1cdd069
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Sun Sep 24 17:09:26 2023 +0200

    package/memcached: bump to version 1.6.21

    - Send first patch upstream
    - Drop second and third patches (already in version) and so drop
      autoreconf

    https://github.com/memcached/memcached/wiki/ReleaseNotes1618
    https://github.com/memcached/memcached/wiki/ReleaseNotes1619
    https://github.com/memcached/memcached/wiki/ReleaseNotes1620
    https://github.com/memcached/memcached/wiki/ReleaseNotes1621

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 6ce55ab)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 8b0ba84
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Tue Nov 28 21:12:50 2023 +0100

    package/vlc: security bump to version 3.0.20

    Fix CVE-2023-47359: Videolan VLC prior to version 3.0.20 contains an
    incorrect offset read that leads to a Heap-Based Buffer Overflow in
    function GetPacket() and results in a memory corruption.

    Fix CVE-2023-47360: Videolan VLC prior to version 3.0.20 contains an
    Integer underflow that leads to an incorrect packet length.

    https://code.videolan.org/videolan/vlc/-/blob/3.0.20/NEWS

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit d675873)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 31ddad9
Author: Bernd Kuhls <bernd@kuhls.net>
Date:   Tue Oct 17 22:20:57 2023 +0200

    package/vlc: bump version to 3.0.19

    Rebased patch 0006 due to upstream commit
    https://code.videolan.org/videolan/vlc/-/commit/3f9fc44176cc5505132977885799fa988c5e7701

    Release notes: https://code.videolan.org/videolan/vlc/-/blob/3.0.19/NEWS

    Signed-off-by: Bernd Kuhls <bernd@kuhls.net>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit f45fa3b)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 69f4ee8
Author: Brandon Maier <Brandon.Maier@collins.com>
Date:   Tue Nov 28 19:55:07 2023 +0000

    docs/website: fix favicon

    When the favicon image was added in f26e613 (docs/website: add
    favicon.png), it was added to a different directory than where the header's
    icon link points. This causes the favicon to fail to load with 404.

    While we are here, remove the "shortcut" rel attribute as it is non-standard
    and it's recommended not to use it[1].

    [1] https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes/rel#sect4

    Signed-off-by: Brandon Maier <brandon.maier@collins.com>
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    (cherry picked from commit 8ad1a2e)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 66acf39
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Mon Nov 27 22:27:12 2023 +0100

    package/motion: fix webp build

    Fix the following build failure raised since bump of webp to version
    1.3.2 in commit c88c1d3:

    /home/autobuild/autobuild/instance-9/output-1/host/lib/gcc/aarch64_be-buildroot-linux-uclibc/13.2.0/../../../../aarch64_be-buildroot-linux-uclibc/bin/ld: picture.o: undefined reference to symbol 'WebPMemoryWriterClear'
    /home/autobuild/autobuild/instance-9/output-1/host/lib/gcc/aarch64_be-buildroot-linux-uclibc/13.2.0/../../../../aarch64_be-buildroot-linux-uclibc/bin/ld: /home/autobuild/autobuild/instance-9/output-1/host/aarch64_be-buildroot-linux-uclibc/sysroot/usr/lib64/libwebp.so.7: error adding symbols: DSO missing from command line

    Fixes:
     - http://autobuild.buildroot.org/results/9b859a701debeaddf1f9909e16adc6811a620576

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit 1267a23)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 30bfbf6
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Mon Nov 27 22:25:58 2023 +0100

    package/exfatprogs: security bump to version 1.2.2

    Fix CVE-2023-45897: exfatprogs before 1.2.2 allows out-of-bounds memory
    access, such as in read_file_dentry_set.

    https://github.com/exfatprogs/exfatprogs/blob/1.2.2/NEWS

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit 07dad08)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit b68a880
Author: Peter Seiderer <ps.report@gmx.net>
Date:   Tue Aug 8 20:09:58 2023 +0200

    board/raspberrypi/config_4_64bit.txt: remove testing dtoverlay entries (vc4-kms-v3d-pi4, imx219)

    Remove private/testing dtoverlay entries (vc4-kms-v3d-pi4, imx219 and
    commented out ov5647) wrongly introduced by commit 689b9ac
    ("package/rpi-firmware: rework boot/config file handling") [1].

    [1] https://git.buildroot.net/buildroot/commit/?id=689b9ac439ab7b507c8982b6102bddf59d03efbf

    Signed-off-by: Peter Seiderer <ps.report@gmx.net>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit fbf0a6e)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit ec866af
Author: Gaël PORTAY <gael.portay@rtone.fr>
Date:   Mon Nov 20 22:41:50 2023 +0100

    board/raspberrypi: fix autoprobing of bluetooth driver

    The commit 689b9ac (package/rpi-firmware: rework boot/config file
    handling) has split in two the property:

    	dtoverlay=miniuart-bt,krnbt=on

    Into:

    	dtoverlay=miniuart-bt
    	dtoverlay=krnbt=on

    The initial property contained the dtbo file miniuart-bt[1] and its
    parameter krnbt=on[2][3].

    The first syntax is correct while the second is not. The krnbt=on is not
    a dtoverlay[4] but a dtparam[5]. Therefore the property dtparam must be
    used instead.

    This fixes:

    	# cat /sys/firmware/devicetree/base/chosen/user-warnings
    	Failed to load overlay 'krnbt=on'

    [1]: https://github.com/raspberrypi/linux/blob/rpi-5.10.y/arch/arm/boot/dts/overlays/miniuart-bt-overlay.dts
    [2]: https://github.com/raspberrypi/linux/blob/rpi-5.10.y/arch/arm/boot/dts/overlays/miniuart-bt-overlay.dts#L91
    [3]: https://github.com/raspberrypi/linux/blob/rpi-5.10.y/arch/arm/boot/dts/overlays/README#L213-L215
    [4]: https://www.raspberrypi.com/documentation/computers/config_txt.html#dtoverlay
    [5]: https://www.raspberrypi.com/documentation/computers/config_txt.html#dtparam

    Signed-off-by: Gaël PORTAY <gael.portay@rtone.fr>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit 5be42d8)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit d8bc17f
Author: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Date:   Sun Nov 26 23:57:17 2023 +0100

    package/exfatprogs: add EXFATPROGS_CPE_ID_VENDOR

    cpe:2.3:a:namjaejeon:exfatprogs is a valid CPE identifier for this
    package:

      https://nvd.nist.gov/products/cpe/detail/F174A846-F275-4AD8-A0E3-6D0CEFDFF308

    Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit 3da6267)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit ec2238b
Author: Maxim Kochetkov <fido_max@inbox.ru>
Date:   Thu Nov 23 09:15:00 2023 +0300

    package/postgresql: security bump version to 15.5

    Release notes:
    https://www.postgresql.org/about/news/postgresql-161-155-1410-1313-1217-and-1122-released-2749/

    Fixes CVE-2023-5868, CVE-2023-5869, CVE-2023-5870.

    Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit 4d549c0)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 8212d48
Author: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Date:   Thu Nov 16 14:51:35 2023 +0100

    package/netsnmp: revert back to 5.9.3, backport security fix

    In commit 13fc9dc, netsnmp was bumped
    from 5.9.3 to 5.9.4 to fix two CVEs.

    However, even though it's a minor version bump, there are actually 163
    commits upstream between those two minor releases, and some of them
    are breaking existing use-cases. In particular upstream
    a2cb167514ac0c7e1b04e8f151e0b015501362e0 now requires that config_()
    macros in MIB files are terminated with a semicolon, causing a build
    breakage with existing MIB files that were totally valid with 5.9.3.

    This commit therefore proposes to revert back to 5.9.3, by reverting
    those two commits:

    56caafc package/netsnmp: fix musl build
    13fc9dc package/netsnmp: security bump to version 5.9.4

    and instead backport the one upstream commit that fixes both CVEs.

    Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
    [yann.morin.1998@free.fr: fix typo as reported by Baruch]
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit 44243b4)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit bc63ab9
Author: Gaël PORTAY <gael.portay@rtone.fr>
Date:   Wed Nov 22 02:04:08 2023 +0100

    board/raspberrypi/readme.txt: fix typos

    Signed-off-by: Gaël PORTAY <gael.portay@rtone.fr>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit acd833c)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 29e2700
Author: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Date:   Sun Nov 12 23:11:17 2023 +0100

    package/zfs: fix zfs autotools cross-compilation

    This commit addresses a long-standing bug encountered during ZFS
    compilation in cross-platform environments. The issue arises because ZFS
    autoconf triggers a `make modules` to detect if the kernel can compile
    modules [1]. The problem occurs when autoconf uses the host environment
    instead of the cross-platform environment.

    To fix this, we export necessary environment variables to ensure that ZFS
    autoconf utilizes the cross-platform environment correctly.

    This patch resolves ZFS cross-platform compilations:
    - http://autobuild.buildroot.net/results/ebeab256101bcba38c35fd55075c414e62f92caa/
    - http://autobuild.buildroot.net/results/03b9f12a106bf100eec695a92b83bf09b22c68b0/
    - http://autobuild.buildroot.net/results/c2da90337463607c2fadfeac7ad72e5c3899a61f/
    - http://autobuild.buildroot.net/results/465a249f92d2f5db7ac4b61b4111e6cbaaa15688/
    - http://autobuild.buildroot.net/results/7e2d3277e26fa5b0c8073a0e8b9e82f47ade9697/
    - http://autobuild.buildroot.net/results/a8fb87336b09fef8787a7889dfcccf14fe1215b9/
    - https://gitlab.com/kubu93/buildroot/-/jobs/1522848483

    And fix a few emails:
    - alpine.DEB.2.22.394.2108181630280.2028262@ridzo [build zfs into buildroot for raspberry pi 4]
    - https://lists.buildroot.org/pipermail/buildroot/2021-August/621696.html
    - https://lists.buildroot.org/pipermail/buildroot/2021-August/621345.html
    - https://lists.buildroot.org/pipermail/buildroot/2022-July/646379.html
    - https://lists.buildroot.org/pipermail/buildroot/2023-June/668467.html

    [1] This is the full callback, you can just check the last link:
    - https://github.com/openzfs/zfs/blob/zfs-2.1.12/config/kernel-declare-event-class.m4#L7C11-L7C11
    - https://github.com/openzfs/zfs/blob/zfs-2.1.12/config/kernel.m4#L883
    - https://github.com/openzfs/zfs/blob/zfs-2.1.12/config/kernel.m4#L868
    - https://github.com/openzfs/zfs/blob/zfs-2.1.12/config/kernel.m4#L668

    Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit 7fe685c)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit 76699a7
Author: Yann E. MORIN <yann.morin.1998@free.fr>
Date:   Sun Nov 26 17:11:18 2023 +0100

    package/zfs: don't download patch generated from github

    Git-generated patches embed the short-hash of the objects in the
    repository. The length of those short hashes is subject to change
    in at least three cases:

      - the number of objects in the repository increases, so git increases
        the length of short hashes to get a good chance there is no
        collision;

      - the git configuration changes, see core.abbrev in git-config;

      - the heuristic to compute the length changes in a newer git version.

    Since the bump to zfs 2.1.4 in commit 68dfd09, the patch generated
    by github has changed, causing download failures:

        wget --passive-ftp -nd -t 3 -O '/home/ymorin/dev/buildroot/O/master/build/.bc3f12bfac152a0c28951cec92340ba14f9ccee9.patch.uoFq9e/output' 'https://github.com/openzfs/zfs/commit/bc3f12bfac152a0c28951cec92340ba14f9ccee9.patch'
        --2023-11-26 16:53:25--
        https://github.com/openzfs/zfs/commit/bc3f12bfac152a0c28951cec92340ba14f9ccee9.patch
        Resolving github.com (github.com)... 140.82.121.3
        Connecting to github.com (github.com)|140.82.121.3|:443...  connected.
        HTTP request sent, awaiting response... 200 OK
        Length: 2976 (2.9K) [text/plain]
        Saving to: ‘/home/ymorin/dev/buildroot/O/master/build/.bc3f12bfac152a0c28951cec92340ba14f9ccee9.patch.uoFq9e/output’

        /home/ymorin/dev/buildroot/O/ 100%[================================================>]   2.91K --.-KB/s in 0s

        2023-11-26 16:53:25 (15.0 MB/s) - ‘/home/ymorin/dev/buildroot/O/master/build/.bc3f12bfac152a0c28951cec92340ba14f9ccee9.patch.uoFq9e/output’ saved [2976/2976]

        ERROR: while checking hashes from package/zfs//zfs.hash
        ERROR: bc3f12bfac152a0c28951cec92340ba14f9ccee9.patch has wrong sha256 hash:
        ERROR: expected: 96a27353fe717ff2c8b95deb8b009c4eb750303c6400e2d8a2582ab1ec12b25a
        ERROR: got     : 246c80f66abca5a7e0c41cc7c56eec0b4cb7f16b142262480401142bbc2f999f
        ERROR: Incomplete download, or man-in-the-middle (MITM) attack

    And indeed, the length of short hashes has increased by one since then.

    Fix that by bundling the patch, with the short hashes that were known
    then, so that it matches the sha256 we had for it.

    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit 2c3946f)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
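
The failure mode described in the commit above can be illustrated with a minimal sketch. It assumes a made-up patch and made-up blob IDs (`OLD_BLOB`, `NEW_BLOB`, and `render_patch` are hypothetical names, not anything from zfs or Buildroot). It renders the same change twice, once with 10-character and once with 11-character abbreviated hashes on the `index` line, and shows that the sha256 of the patch file differs even though the hunk is identical.

```python
import hashlib

# Hypothetical full blob IDs, invented for this illustration only.
OLD_BLOB = "1a2b3c4d5e6f708192a3b4c5d6e7f8091a2b3c4d"
NEW_BLOB = "5d6e7f8091a2b3c4d5e6f708192a3b4c5d6e7f80"


def render_patch(abbrev: int) -> bytes:
    """Render a tiny git-style patch whose index line uses `abbrev`-long hashes."""
    lines = [
        "diff --git a/example.c b/example.c",
        f"index {OLD_BLOB[:abbrev]}..{NEW_BLOB[:abbrev]} 100644",
        "--- a/example.c",
        "+++ b/example.c",
        "@@ -1 +1 @@",
        "-old line",
        "+new line",
        "",
    ]
    return "\n".join(lines).encode()


def sha256_hex(data: bytes) -> str:
    """Return the hex sha256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()


patch_then = render_patch(10)  # abbreviation length when the hash was recorded
patch_now = render_patch(11)   # one character longer, as in the commit message

# The change itself is the same; only the index line differs.
print(sha256_hex(patch_then) == sha256_hex(patch_now))  # False
```

Bundling the patch in the tree, as the commit does, pins the exact bytes so the recorded sha256 stays valid regardless of how the patch would be regenerated later.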

commit b1a3096
Author: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Date:   Wed Nov 22 16:47:36 2023 +0100

    package/gcc: fix disabling the documentation

    gcc.mk attempts to disable building the documentation by setting
    MAKEINFO=missing, but it is not working.  If makeinfo is installed
    and recent enough, gcc still uses it.  This can be checked easily:

    grep BUILD_INFO='info' host-gcc-initial-*/build/gcc/config.log

    It happens because the root ./configure script will check
    $MAKEINFO --version (aka 'missing --version') and will overwrite it with
    MAKEINFO='missing makeinfo' because the version does not match.

    Having MAKEINFO='missing makeinfo' is a problem because
    'missing makeinfo' will actually attempt to run 'makeinfo' before
    failing with an error message.  If makeinfo is installed on the host,
    then 'missing makeinfo' will successfully run makeinfo anyway.

    Many gcc subprojects will check $MAKEINFO --version and enable building
    the documentation if it is recent enough.  This patch overrides these
    checks by forcing gcc_cv_prog_makeinfo_modern=no.

    Building the GCC documentation can fail with the wrong makeinfo version.
    It happened at least when building GCC 11.3.0 with makeinfo 7.1.

    Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit f7b9d3a)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

commit d3302c3
Author: Peter Korsgaard <peter@korsgaard.com>
Date:   Wed Nov 15 12:26:42 2023 +0100

    package/intel-microcode: security bump to version 20231114

    Includes fixes for INTEL-SA-00950:
    https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00950.html
    https://lock.cmpxchg8b.com/reptar.html
    https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases/tag/microcode-20231114

    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
    Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
    (cherry picked from commit c544075)
    Signed-off-by: Peter Korsgaard <peter@korsgaard.com>

alpha754293 commented Dec 4, 2023

I have been testing this, using this command: parallel --lb --halt-on-error now,fail=1 ./zhammer.sh /data 10000000 16k 10000 ::: $(seq $(nproc))

I haven't been able to replicate this issue using zhammer on a PC running Ubuntu 22.04 with kernel 6.2.0-37-generic, zfs-2.1.6-0york1~22.04, and zfs-kmod-2.1.9-2ubuntu1.1.

I also haven't been able to replicate it using zhammer on another PC running Proxmox pve-manager/7.4-17/513c62be (running kernel: 5.15.131-1-pve) with zfs-2.1.11-pve1 and zfs-kmod-2.1.11-pve1.

I thought that this issue definitely affected version 2.1.11 and above, and possibly even earlier versions (like the 2.1.6/2.1.9 I am running on my other PC) - or am I mistaken and this is not the case?

Same thing here.

I am using Proxmox 7.4-3 with coreutils 8.32-4+b1, zfsutils-linux 2.1.11-pve1, zfs-2.1.11-pve1, and zfs-kmod-2.1.11-pve1. I've tested 640 million files (64 threads * 10 million files/thread) and haven't been able to replicate the issue on my end either.

When I check my Solaris 10 1/13 system, GNU coreutils is NOT installed by default.

As far as I can tell (until I find information otherwise, or someone can educate me), the core software libraries used for system administration are installed under the Solaris package SUNWadmc, so I am guessing that is where the Solaris cp tool comes from (rather than from GNU coreutils).

This suggests that the probability of Solaris ZFS being affected by this specific issue is relatively low, given that, as far as I can tell, it doesn't use the same method to copy files as GNU coreutils cp does -- but again, I could be wrong.

Thanks.


0x5c commented Dec 5, 2023

> This suggests that the probability of Solaris ZFS being affected by this specific issue is relatively low, given that, as far as I can tell, it doesn't use the same method to copy files as GNU coreutils cp does -- but again, I could be wrong.

The bug could be hit by anything that tries to find holes in a file that is also being written to, which includes but is not limited to coreutils 9.x. Anything that seeks to the next hole in a file, while it is being written to, during a very specific moment of the write operation, could end up incorrectly reading zeroes.


robszy commented Dec 5, 2023

Can anyone answer the question in my comment: #15526 (comment)?

@alpha754293

> > This suggests that the probability of Solaris ZFS being affected by this specific issue is relatively low, given that, as far as I can tell, it doesn't use the same method to copy files as GNU coreutils cp does -- but again, I could be wrong.
>
> The bug could be hit by anything that tries to find holes in a file that is also being written to, which includes but is not limited to coreutils 9.x. Anything that seeks to the next hole in a file, while it is being written to, during a very specific moment of the write operation, could end up incorrectly reading zeroes.

Thank you, @0x5c.

Unfortunately, Solaris has been closed source since 2010, after Oracle bought Sun Microsystems, so it would be impossible to tell from the outside.


0x5c commented Dec 6, 2023

> > > This suggests that the probability of Solaris ZFS being affected by this specific issue is relatively low, given that, as far as I can tell, it doesn't use the same method to copy files as GNU coreutils cp does -- but again, I could be wrong.
> >
> > The bug could be hit by anything that tries to find holes in a file that is also being written to, which includes but is not limited to coreutils 9.x. Anything that seeks to the next hole in a file, while it is being written to, during a very specific moment of the write operation, could end up incorrectly reading zeroes.
>
> Thank you, @0x5c.
>
> Unfortunately, Solaris has been closed source since 2010, after Oracle bought Sun Microsystems, so it would be impossible to tell from the outside.

If the bug is present in that version of ZFS, then it should theoretically be possible to trigger it with anything that seeks to the next hole in a file, regardless of what that program may be. If coreutils 9.x can be built for Solaris, that could be a way to try to reproduce the bug. Even a minimal program that just does the equivalent of what coreutils cp does would be enough.
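
A minimal sketch of such a program, in Python rather than C for brevity. It walks a file with `lseek(SEEK_DATA)` / `lseek(SEEK_HOLE)` the way a sparse-aware copy would, then checks whether any range reported as a hole actually contains non-zero bytes. The names `data_segments` and `has_unreported_data` are made up for this illustration; this is not zhammer or seekflood.c, and on its own it does not create the write/seek race needed to trigger the bug.

```python
import errno
import os


def data_segments(fd):
    """Return the (start, end) byte ranges the filesystem reports as data."""
    size = os.fstat(fd).st_size
    segments = []
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
            end = os.lseek(fd, start, os.SEEK_HOLE)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                break  # no data at or after offset: the rest is reported as a hole
            raise
        if end <= start:
            break  # defensive: the file changed underneath us
        segments.append((start, end))
        offset = end
    return segments


def has_unreported_data(path, chunk_size=1 << 20):
    """True if a range reported as a hole contains non-zero bytes."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        hole_start = 0
        # A sentinel (size, size) segment makes the trailing hole get checked too.
        for start, end in data_segments(fd) + [(size, size)]:
            pos = hole_start
            while pos < start:
                chunk = os.pread(fd, min(chunk_size, start - pos), pos)
                if not chunk:
                    break
                if chunk.strip(b"\0"):
                    return True
                pos += len(chunk)
            hole_start = end
        return False
    finally:
        os.close(fd)
```

To have any chance of hitting the race, a check like this would need to run in a tight loop against files that another process is writing at the same moment.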

behlendorf pushed a commit that referenced this issue Dec 11, 2023
Add a test for the dirty dnode SEEK_HOLE/SEEK_DATA bug described in
#15526

The bug was fixed in #15571 and
was backported to 2.2.2 and 2.1.14.  This test case is just to
make sure it does not come back.

seekflood.c originally written by Rob Norris.

Reviewed-by: Graham Perrin <grahamperrin@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15608
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Dec 12, 2023
Over its history the dirty dnode test has been changed between
checking for a dnode being on `os_dirty_dnodes` (`dn_dirty_link`) and
checking `dn_dirty_record`.

  de198f2 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
  2531ce3 Revert "Report holes when there are only metadata changes"
  ec4f9b8 Report holes when there are only metadata changes
  454365b Fix dirty check in dmu_offset_next()
  66aca24 SEEK_HOLE should not block on txg_wait_synced()

Also illumos/illumos-gate@c543ec060d illumos/illumos-gate@2bcf0248e9

It turns out both are actually required.

In the case of appending data to a newly created file, the dnode proper
is dirtied (at least to change the blocksize) and dirty records are
added.  Thus, a single logical operation is represented by separate
dirty indicators, and must not be separated.

The incorrect dirty check becomes a problem when the first block of a
file is being appended to while another process is calling lseek to skip
holes. There is a small window where the dnode part is undirtied while
there are still dirty records. In this case, `lseek(fd, 0, SEEK_DATA)`
would not know that the file is dirty, and would go to
`dnode_next_offset()`. Since the object has no data blocks yet, it
returns `ESRCH`, indicating no data found, which results in `ENXIO`
being returned to `lseek()`'s caller.

Since coreutils 9.2, `cp` performs sparse copies by default, that is, it
uses `SEEK_DATA` and `SEEK_HOLE` against the source file and attempts to
replicate the holes in the target. When it hits the bug, its initial
search for data fails, and it goes on to call `fallocate()` to create a
hole over the entire destination file.
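
To illustrate the failure mode described above, here is a rough Python sketch of a sparse copy loop. It is not the coreutils implementation: the name `sparse_copy` is hypothetical, and it extends the destination with `ftruncate()` rather than `fallocate()`. The point is only that an `ENXIO` from the very first `SEEK_DATA` call leaves nothing to copy, so the destination ends up as one large hole that reads back as zeros.

```python
import errno
import os


def sparse_copy(src_path, dst_path, chunk_size=1 << 20):
    """Copy only the ranges reported as data; everything else stays a hole."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(src, offset, os.SEEK_DATA)
                end = os.lseek(src, start, os.SEEK_HOLE)
            except OSError as exc:
                if exc.errno != errno.ENXIO:
                    raise
                # ENXIO means "no data at or after offset". If the filesystem
                # wrongly reports this on the first call, nothing is copied.
                break
            if end <= start:
                break  # defensive: the file changed underneath us
            pos = start
            while pos < end:
                chunk = os.pread(src, min(chunk_size, end - pos), pos)
                if not chunk:
                    break
                view = memoryview(chunk)
                while view:  # pwrite may write fewer bytes than requested
                    written = os.pwrite(dst, view, pos)
                    view = view[written:]
                    pos += written
            offset = end
        # Extend to full length; ranges never written read back as zeros.
        os.ftruncate(dst, size)
    finally:
        os.close(src)
        os.close(dst)
```

A copy that always reads the whole file would not depend on the hole information and so would not be affected in the same way, at the cost of losing sparseness.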

This has come up more recently as users upgrade their systems, getting
OpenZFS 2.2 as well as a newer coreutils. However, this problem has been
reproduced against 2.1, as well as on FreeBSD 13 and 14.

This change simply updates the dirty check to check both types of dirty.
If there's anything dirty at all, we immediately go to the "wait for
sync" stage. It doesn't really matter after that; both changes are on
disk, so the dirty fields should be correct.
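
As a toy model only (Python, not the actual C code in the DMU), the difference between the historical checks and the combined check can be sketched like this. `ToyDnode` and the three function names are invented for illustration; the two fields merely stand in for the dnode's dirty-list membership and its dirty records.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDnode:
    # Stand-in for the dnode being linked on the objset's dirty list.
    on_dirty_list: bool = False
    # Stand-in for the dnode's pending dirty records.
    dirty_records: list = field(default_factory=list)


def dirty_by_link_only(dn):
    """One historical variant: look only at dirty-list membership."""
    return dn.on_dirty_list


def dirty_by_records_only(dn):
    """The other historical variant: look only at dirty records."""
    return bool(dn.dirty_records)


def dirty_combined(dn):
    """The approach described above: anything dirty at all counts."""
    return dn.on_dirty_list or bool(dn.dirty_records)


# The window described earlier: the dnode part has been undirtied,
# but a dirty record for the appended block is still pending.
in_window = ToyDnode(on_dirty_list=False, dirty_records=["append to block 0"])
print(dirty_by_link_only(in_window))   # False
print(dirty_combined(in_window))       # True
```

In this model, a caller relying on the link-only check would treat the object as clean during the window and go straight to the on-disk lookup, which is the situation the combined check is meant to rule out.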

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes openzfs#15571 
Closes openzfs#15526
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Dec 12, 2023
Previously, dmu_buf_will_clone() would roll back any dirty record, but
would not clean out the modified data nor reset the state before
releasing the lock. That leaves the last-written data in db_data, but
the dbuf in the wrong state.

This is eventually corrected when the dbuf state is made NOFILL, and
dbuf_noread() called (which clears out the old data), but at this point
it's too late, because the lock was already dropped with that invalid
state.

Any caller acquiring the lock before the call into
dmu_buf_will_not_fill() can find what appears to be a clean, readable
buffer, and would take the wrong state from it: it should be getting the
data from the cloned block, not from earlier (unwritten) dirty data.

Even after the state was switched to NOFILL, the old data was still not
cleaned out until dbuf_noread(), which is another gap for a caller to
take the lock and read the wrong data.

This commit fixes all this by properly cleaning up the previous state
and then setting the new state before dropping the lock. The
DBUF_VERIFY() calls confirm that the dbuf is in a valid state when the
lock is down.

Sponsored-by: Klara, Inc.
Sponsored-By: OpenDrives Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes openzfs#15566
Closes openzfs#15526

robszy commented Dec 13, 2023

I found out why the second copy from my comment #15526 (comment) is done by seeking for holes and data:

ZFS reports a number of blocks that is smaller than the size of the file, which is why the second copy is treated as sparse even though it shouldn't be. Of course, that wouldn't be good for the reproducer, but the real question is:

Do we have another bug in ZFS in how it reports the number of blocks?

Because of that, cp doesn't use copy_file_range as it should, which would be faster :)
In my view, it should report the real number of blocks.
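
For illustration, the kind of check being described can be sketched in Python as follows. This is an approximation of the heuristic as I understand it, not code taken from coreutils; the names `looks_sparse` and `file_looks_sparse` are hypothetical, and the 512-byte unit is the usual meaning of `st_blocks` on Linux.

```python
import os

BLOCK_UNIT = 512  # st_blocks is normally counted in 512-byte units


def looks_sparse(size, blocks):
    """True if fewer blocks are allocated than the file size would need."""
    return blocks < size // BLOCK_UNIT


def file_looks_sparse(path):
    """Apply the heuristic to a real file using its stat() data."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)
```

Under this heuristic, a fully written file whose block count is still low (for example because of compression, or because the blocks have not been accounted yet) is indistinguishable from a genuinely sparse file.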

Maxwell-lt added a commit to Maxwell-lt/machine-configuration that referenced this issue Dec 18, 2023
gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jan 27, 2024
As a mitigation until more is understood and fixes are tested & reviewed, change
the default of zfs_dmu_offset_next_sync from 1 to 0, as it was before
05b3eb6d232009db247882a39d518e7282630753 upstream.

There are no reported cases of The Bug being hit with zfs_dmu_offset_next_sync=0:
that does not mean this is a cure or a real fix, but it _appears_ to be at least
effective in reducing the chances of it happening. By itself, it's a safe change
anyway, so it feels worth us doing while we wait.

Note that The Bug has been reproduced on 2.1.x as well, hence we do it for both
2.1.13 and 2.2.1.
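
As a small aid for checking which value is in effect, here is a hedged Python sketch that reads the tunable from sysfs. It assumes a Linux system with the zfs module loaded, where module parameters appear under `/sys/module/zfs/parameters/`; the function name `read_offset_next_sync` is made up for this example.

```python
from pathlib import Path

# Assumed location of the tunable on Linux with the zfs module loaded.
PARAM_PATH = Path("/sys/module/zfs/parameters/zfs_dmu_offset_next_sync")


def read_offset_next_sync(path=PARAM_PATH):
    """Return the current value as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (FileNotFoundError, PermissionError):
        return None


if __name__ == "__main__":
    value = read_offset_next_sync()
    if value is None:
        print("zfs_dmu_offset_next_sync not readable (is the zfs module loaded?)")
    else:
        print(f"zfs_dmu_offset_next_sync = {value}")
```

As the message above says, a value of 0 should be read as a mitigation rather than a fix.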

Bug: openzfs/zfs#11900
Bug: openzfs/zfs#15526
Bug: https://bugs.gentoo.org/917224
Signed-off-by: Sam James <sam@gentoo.org>
ixhamza pushed a commit to truenas/zfs that referenced this issue Jan 30, 2024
Add a test for the dirty dnode SEEK_HOLE/SEEK_DATA bug described in
openzfs#15526

The bug was fixed in openzfs#15571 and
was backported to 2.2.2 and 2.1.14.  This test case is just to
make sure it does not come back.

seekflood.c originally written by Rob Norris.

Reviewed-by: Graham Perrin <grahamperrin@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes openzfs#15608
behlendorf pushed a commit that referenced this issue Feb 13, 2024
Add a test for the dirty dnode SEEK_HOLE/SEEK_DATA bug described in
#15526

The bug was fixed in #15571 and
was backported to 2.2.2 and 2.1.14.  This test case is just to
make sure it does not come back.

seekflood.c originally written by Rob Norris.

Reviewed-by: Graham Perrin <grahamperrin@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15608
@RichardBelzer

We're still hitting this bug in 2.2.3, so it hasn't been fully fixed yet. I found another issue where more people are still experiencing it: #15933

Just wanted to make people aware, and to remind everyone to keep checking their data for silent corruption.

lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Mar 13, 2024
Add a test for the dirty dnode SEEK_HOLE/SEEK_DATA bug described in
openzfs#15526

The bug was fixed in openzfs#15571 and
was backported to 2.2.2 and 2.1.14.  This test case is just to
make sure it does not come back.

seekflood.c originally written by Rob Norris.

Reviewed-by: Graham Perrin <grahamperrin@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes openzfs#15608