zloop fixes and improvements#374

Closed
blktests-ci[bot] wants to merge 7 commits into linus-master_base from series/1023830=>linus-master

@blktests-ci blktests-ci Bot commented Nov 15, 2025

Pull request for series with
subject: zloop fixes and improvements
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1023830

blktests-ci Bot commented Nov 15, 2025

Upstream branch: 6da43bb
series: https://patchwork.kernel.org/project/linux-block/list/?series=1023830
version: 1

blktests-ci Bot commented Nov 16, 2025

Upstream branch: f824272
series: https://patchwork.kernel.org/project/linux-block/list/?series=1023830
version: 1

@blktests-ci blktests-ci Bot force-pushed the series/1023830=>linus-master branch from 2ca62c3 to 6fca0f5 Compare November 16, 2025 07:44
@blktests-ci blktests-ci Bot force-pushed the linus-master_base branch from 00d5e5c to d782508 Compare November 17, 2025 00:45

blktests-ci Bot commented Nov 17, 2025

Upstream branch: f824272
series: https://patchwork.kernel.org/project/linux-block/list/?series=1023830
version: 1

@blktests-ci blktests-ci Bot force-pushed the series/1023830=>linus-master branch from 6fca0f5 to 3c6d4db Compare November 17, 2025 00:54
@blktests-ci blktests-ci Bot force-pushed the linus-master_base branch from d782508 to 6099a4d Compare November 17, 2025 23:44

blktests-ci Bot commented Nov 17, 2025

Upstream branch: e7c375b
series: https://patchwork.kernel.org/project/linux-block/list/?series=1023830
version: 1

@blktests-ci blktests-ci Bot force-pushed the series/1023830=>linus-master branch from 3c6d4db to c76ab0e Compare November 17, 2025 23:55
@blktests-ci blktests-ci Bot force-pushed the linus-master_base branch from 6099a4d to 5121c4d Compare November 18, 2025 02:19

blktests-ci Bot commented Nov 18, 2025

Upstream branch: e7c375b
series: https://patchwork.kernel.org/project/linux-block/list/?series=1023830
version: 1

@blktests-ci blktests-ci Bot force-pushed the series/1023830=>linus-master branch from c76ab0e to 99d3cc4 Compare November 18, 2025 02:29
@blktests-ci blktests-ci Bot force-pushed the linus-master_base branch from 5121c4d to 4458758 Compare November 19, 2025 00:24

blktests-ci Bot commented Nov 19, 2025

Upstream branch: 8b69055
series: https://patchwork.kernel.org/project/linux-block/list/?series=1023830
version: 1

@blktests-ci blktests-ci Bot force-pushed the series/1023830=>linus-master branch from 99d3cc4 to bf56958 Compare November 19, 2025 00:32
@blktests-ci blktests-ci Bot force-pushed the linus-master_base branch from 4458758 to 6f43942 Compare November 21, 2025 09:45

The write pointer of zones that are in the full condition is always
invalid. Reflect that fact by setting the write pointer of full zones
to ULLONG_MAX.
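
As a rough illustration of this rule, the user-space C sketch below models
how a write pointer could be invalidated once a zone fills up. The struct
zone type, the enum values and zone_advance_wp() are hypothetical stand-ins
and not the driver's definitions; only the ULLONG_MAX convention comes from
this patch.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical, simplified stand-ins for the driver's zone state. */
enum zone_cond { ZONE_COND_EMPTY, ZONE_COND_IMP_OPEN, ZONE_COND_FULL };

struct zone {
	unsigned long long start;	/* first sector of the zone */
	unsigned long long nr_sectors;	/* zone size, in sectors */
	unsigned long long wp;		/* write pointer, in sectors */
	enum zone_cond cond;
};

/* Advance the write pointer after a write of nr_sectors sectors. */
static void zone_advance_wp(struct zone *z, unsigned long long nr_sectors)
{
	unsigned long long zone_end = z->start + z->nr_sectors;

	z->cond = ZONE_COND_IMP_OPEN;
	z->wp += nr_sectors;
	if (z->wp == zone_end) {
		/* A full zone has no valid write position. */
		z->cond = ZONE_COND_FULL;
		z->wp = ULLONG_MAX;
	}
}
```

With this convention, any later comparison of a write start sector against
the write pointer of a full zone fails, since no valid sector equals
ULLONG_MAX.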

Fixes: eb0570c ("block: new zoned loop block device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

zloop_rw() will fail any regular write operation that targets a full
sequential zone. The check for this is indirect and achieved by checking
the write pointer alignment of the write operation. But this check is
ineffective for zone append operations since these are always
automatically directed at a zone write pointer.

Prevent zone append operations from being executed in a full zone with
an explicit check of the zone condition.
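
The difference between the two checks can be sketched in user-space C as
follows. This is only a model under stated assumptions: struct zone,
zone_check_write() and the -EIO return value are hypothetical and do not
reproduce the code of zloop_rw().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the driver's zone state. */
enum zone_cond { ZONE_COND_EMPTY, ZONE_COND_IMP_OPEN, ZONE_COND_FULL };

struct zone {
	unsigned long long wp;	/* ULLONG_MAX when the zone is full */
	enum zone_cond cond;
};

/*
 * Return 0 if the write may proceed, -EIO (an assumed error code)
 * otherwise. For an append, *sector is set to the write pointer.
 */
static int zone_check_write(const struct zone *z, bool is_append,
			    unsigned long long *sector)
{
	if (is_append) {
		/*
		 * An append carries no caller-chosen sector that could
		 * mismatch the write pointer, so test the condition.
		 */
		if (z->cond == ZONE_COND_FULL)
			return -EIO;
		*sector = z->wp;
	} else if (*sector != z->wp) {
		/* Regular write: rejected indirectly for a full zone. */
		return -EIO;
	}
	return 0;
}
```

A regular write to a full zone is rejected by the alignment branch alone,
while an append needs the explicit condition test because its sector is
taken from the write pointer itself.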

Fixes: eb0570c ("block: new zoned loop block device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

The function zloop_rw() already checks early that a request is fully
contained within the target zone. So this check does not need to be done
again for regular writes to sequential zones. Furthermore, since zone
append operations are always directed to the zone write pointer
location, we do not need to check for their alignment to that value
after setting it. So turn the "if" checking the write pointer alignment
into an "else if".

While at it, improve the comment describing the write pointer
modification and how this value is corrected in case of error.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

A zloop zoned block device declares to the block layer that it supports
zone append operations. That is, a zloop device resembles an NVMe ZNS
device supporting zone append.

This native support is fine but it does not allow exercising the block
layer zone write plugging emulation of zone append, as is done with SCSI
or ATA SMR HDDs.

Introduce the zone_append configuration parameter to allow creating a
zloop device without native support for zone append, thus relying on the
block layer zone append emulation. If not specified, zone append support
is enabled by default. Otherwise, a value of 0 disables native zone
append and a value of 1 enables it.
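
The default and the accepted values can be modelled with a small user-space
C sketch. parse_zone_append() is a hypothetical helper written for
illustration; it is not the driver's option parser, and the other option
names appearing in the usage strings are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Scan a comma-separated option string for "zone_append=<0|1>".
 * Returns 0 on success and -1 on an invalid value. Native zone
 * append stays enabled when the option is absent.
 */
static int parse_zone_append(const char *opts, bool *zone_append)
{
	static const char key[] = "zone_append=";
	const size_t klen = sizeof(key) - 1;
	const char *p = opts;

	*zone_append = true;	/* default: native zone append enabled */
	while (*p) {
		size_t len = strcspn(p, ",");	/* length of this token */

		if (len >= klen && !strncmp(p, key, klen)) {
			/* Exactly one character, '0' or '1', may follow. */
			if (len != klen + 1 ||
			    (p[klen] != '0' && p[klen] != '1'))
				return -1;
			*zone_append = (p[klen] == '1');
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}
```

For instance, the string "id=0,zone_append=0" yields a device model without
native zone append, while "id=0" keeps it enabled.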

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

The zone append operation processing for zloop devices is similar to any
other command, that is, the operation is processed as a command work
item, without any special serialization between the work items (besides
the zone mutex for mutually exclusive code sections).

This processing is fine and gives excellent performance. However, it has
a side effect: zone append operations are very often reordered and
processed in a sequence that is very different from the order in which
the user issued them. This effect is very visible using an XFS file
system on top of a zloop device. A simple file write leads to many file
extents because the data writes using zone append are reordered, so the
physical order differs from the file logical order.
E.g. executing:

$ dd if=/dev/zero of=/mnt/test bs=1M count=10 && sync
$ xfs_bmap /mnt/test
/mnt/test:
	0: [0..4095]: 2162688..2166783
	1: [4096..6143]: 2168832..2170879
	2: [6144..8191]: 2166784..2168831
	3: [8192..10239]: 2170880..2172927
	4: [10240..12287]: 2174976..2177023
	5: [12288..14335]: 2172928..2174975
	6: [14336..20479]: 2177024..2183167

For 10 IOs, 7 extents are created.

This is fine and actually allows exercising XFS zone garbage collection
very well. However, this also makes debugging/working on XFS data
placement harder as the underlying device will most of the time reorder
IOs, resulting in many file extents.

Allow a user to mitigate this with the new ordered_zone_append
configuration parameter. For a zloop device created with this parameter
specified, the sector of a zone append command is set early, when the
command is submitted by the block layer with the zloop_queue_rq()
function, instead of in the zloop_rw() function which is executed later
in the command work item context. This change ensures that more often
than not, the data of zone append operations ends up being written in
the same order as the command submission by the user.

In the case of XFS, this leads to far fewer file data extents. E.g., for
the previous example, we get a single file data extent for the written
file.

$ dd if=/dev/zero of=/mnt/test bs=1M count=10 && sync
$ xfs_bmap /mnt/test
/mnt/test:
	0: [0..20479]: 2162688..2183167

Since we cannot use a mutex in the context of the zloop_queue_rq()
function to atomically set a zone append operation sector to the target
zone write pointer location and increment that write pointer, a new
per-zone spinlock is introduced to protect accesses to and modifications
of a zone write pointer. To check a zone write pointer location and set
a zone append operation target sector to that value, the function
zloop_set_zone_append_sector() is introduced and called from
zloop_queue_rq().
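
The submission-time reservation described above can be approximated in
user-space C as below. This is a sketch under assumptions, not the driver
code: a C11 atomic_flag stands in for the per-zone spinlock, and struct
zone, set_zone_append_sector(), the -EIO error code and the handling of the
full condition are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

enum zone_cond { ZONE_COND_EMPTY, ZONE_COND_IMP_OPEN, ZONE_COND_FULL };

struct zone {
	atomic_flag wp_lock;	/* stand-in for the per-zone spinlock */
	unsigned long long start, nr_sectors, wp;
	enum zone_cond cond;
};

static void wp_lock(struct zone *z)
{
	/* Busy-wait: unlike a mutex, this never sleeps. */
	while (atomic_flag_test_and_set_explicit(&z->wp_lock,
						 memory_order_acquire))
		;
}

static void wp_unlock(struct zone *z)
{
	atomic_flag_clear_explicit(&z->wp_lock, memory_order_release);
}

/*
 * Reserve nr_sectors at the current write pointer for an append.
 * The sector is assigned and the write pointer advanced in one
 * critical section, so reservations follow submission order.
 */
static int set_zone_append_sector(struct zone *z,
				  unsigned long long nr_sectors,
				  unsigned long long *sector)
{
	unsigned long long zone_end = z->start + z->nr_sectors;
	int ret = 0;

	wp_lock(z);
	if (z->cond == ZONE_COND_FULL || nr_sectors > zone_end - z->wp) {
		ret = -EIO;	/* assumed error code */
	} else {
		*sector = z->wp;
		z->wp += nr_sectors;
		z->cond = ZONE_COND_IMP_OPEN;
		if (z->wp == zone_end) {
			z->cond = ZONE_COND_FULL;
			z->wp = ULLONG_MAX;
		}
	}
	wp_unlock(z);
	return ret;
}
```

Because each caller leaves the critical section with its sector already
fixed, the order in which the work items later run does not change where
the data lands.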

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

blktests-ci Bot commented Nov 21, 2025

Upstream branch: fd95357
series: https://patchwork.kernel.org/project/linux-block/list/?series=1023830
version: 1

In Documentation/admin-guide/blockdev/zoned_loop.rst, add the
description of the zone_append and ordered_zone_append configuration
arguments of the zloop "add" command (device creation).

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

@blktests-ci blktests-ci Bot force-pushed the series/1023830=>linus-master branch from bf56958 to a03d1ab Compare November 21, 2025 09:54

blktests-ci Bot commented Dec 8, 2025

Upstream branch: c2f2b01
series: https://patchwork.kernel.org/project/linux-block/list/?series=1023830
version: 1

Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/linux-block/list/?series=1023830
error message:

Cmd('git') failed due to: exit code(128)
  cmdline: git am --3way
  stdout: 'Applying: zloop: make the write pointer of full zones invalid
Using index info to reconstruct a base tree...
M	drivers/block/zloop.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/block/zloop.c
CONFLICT (content): Merge conflict in drivers/block/zloop.c
Patch failed at 0001 zloop: make the write pointer of full zones invalid'
  stderr: 'error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"'

conflict:

diff --cc drivers/block/zloop.c
index 3f50321aa4a7,a975b1d07f1c..000000000000
--- a/drivers/block/zloop.c
+++ b/drivers/block/zloop.c
@@@ -472,19 -428,15 +472,27 @@@ static void zloop_rw(struct zloop_cmd *
  			zone->cond = BLK_ZONE_COND_IMP_OPEN;
  
  		/*
 -		 * Advance the write pointer of sequential zones. If the write
 -		 * fails, the wp position will be corrected when the next I/O
 -		 * copmpletes.
 +		 * Advance the write pointer, unless ordered zone append is in
 +		 * use. If the write fails, the write pointer position will be
 +		 * corrected when the next I/O starts execution.
  		 */
++<<<<<<< HEAD
 +		if (!is_append || !zlo->ordered_zone_append) {
 +			zone->wp += nr_sectors;
 +			if (zone->wp == zone_end) {
 +				zone->cond = BLK_ZONE_COND_FULL;
 +				zone->wp = ULLONG_MAX;
 +			}
 +		}
 +
 +		spin_unlock_irqrestore(&zone->wp_lock, flags);
++=======
+ 		zone->wp += nr_sectors;
+ 		if (zone->wp == zone_end) {
+ 			zone->cond = BLK_ZONE_COND_FULL;
+ 			zone->wp = ULLONG_MAX;
+ 		}
++>>>>>>> zloop: make the write pointer of full zones invalid
  	}
  
  	rq_for_each_bvec(tmp, rq, rq_iter)

blktests-ci Bot commented Dec 28, 2025

At least one diff in series https://patchwork.kernel.org/project/linux-block/list/?series=1023830 irrelevant now for [{'archived': False, 'project': 241}] search patterns

@blktests-ci blktests-ci Bot closed this Dec 28, 2025
@blktests-ci blktests-ci Bot deleted the series/1023830=>linus-master branch January 9, 2026 05:03
blktests-ci Bot pushed a commit that referenced this pull request Mar 18, 2026

To pick up the changes in these csets:

  5ca243f ("prctl: add arch-agnostic prctl()s for indirect branch tracking")
  28621ec ("rseq: Add prctl() to enable time slice extensions")

That introduce these new prctls:

  $ tools/perf/trace/beauty/prctl_option.sh > before.txt
  $ cp include/uapi/linux/prctl.h tools/perf/trace/beauty/include/uapi/linux/prctl.h
  $ tools/perf/trace/beauty/prctl_option.sh > after.txt
  $ diff -u before.txt after.txt
  --- before.txt	2026-02-27 09:07:16.435611457 -0300
  +++ after.txt	2026-02-27 09:07:28.189816531 -0300
  @@ -73,6 +73,10 @@
   	[76] = "LOCK_SHADOW_STACK_STATUS",
   	[77] = "TIMER_CREATE_RESTORE_IDS",
   	[78] = "FUTEX_HASH",
  +	[79] = "RSEQ_SLICE_EXTENSION",
  +	[80] = "GET_INDIR_BR_LP_STATUS",
  +	[81] = "SET_INDIR_BR_LP_STATUS",
  +	[82] = "LOCK_INDIR_BR_LP_STATUS",
   };
   static const char *prctl_set_mm_options[] = {
   	[1] = "START_CODE",
  $

That now will be used to decode the syscall option and also to compose
filters, for instance:

  [root@five ~]# perf trace -e syscalls:sys_enter_prctl --filter option==SET_NAME
       0.000 Isolated Servi/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23f13b7aee)
       0.032 DOM Worker/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23deb25670)
       7.920 :3474328/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fbb10)
       7.935 StreamT~s #374/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fb970)
       8.400 Isolated Servi/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24bab10)
       8.418 StreamT~s #374/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24ba970)
  ^C[root@five ~]#

This addresses these perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/perf/trace/beauty/include/uapi/linux/prctl.h include/uapi/linux/prctl.h

Please see tools/include/uapi/README for further details.

Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Paul Walmsley <pjw@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>