zloop fixes and improvements#374
Conversation
|
Upstream branch: 6da43bb |
83d3e2f to
00d5e5c
Compare
|
Upstream branch: f824272 |
2ca62c3 to
6fca0f5
Compare
00d5e5c to
d782508
Compare
|
Upstream branch: f824272 |
6fca0f5 to
3c6d4db
Compare
d782508 to
6099a4d
Compare
|
Upstream branch: e7c375b |
3c6d4db to
c76ab0e
Compare
6099a4d to
5121c4d
Compare
|
Upstream branch: e7c375b |
c76ab0e to
99d3cc4
Compare
5121c4d to
4458758
Compare
|
Upstream branch: 8b69055 |
99d3cc4 to
bf56958
Compare
4458758 to
6f43942
Compare
The write pointer of zones that are in the full condition is always invalid. Reflect that fact by setting the write pointer of full zones to ULLONG_MAX. Fixes: eb0570c ("block: new zoned loop block device driver") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
zloop_rw() will fail any regular write operation that targets a full sequential zone. The check for this is indirect and achieved by checking the write pointer alignment of the write operation. But this check is ineffective for zone append operations since these are alwasy automatically directed at a zone write pointer. Prevent zone append operations from being executed in a full zone with an explicit check of the zone condition. Fixes: eb0570c ("block: new zoned loop block device driver") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
The function zloop_rw() already checks early that a request is fully contained within the target zone. So this check does not need to be done again for regular writes to sequential zones. Furthermore, since zone append operations are always directed to the zone write pointer location, we do not need to check for their alignment to that value after setting it. So turn the "if" checking the write pointer alignment into an "else if". While at it, improve the comment describing the write pointer modification and how this value is corrected in case of error. Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
A zloop zoned block device declares to the block layer that it supports zone append operations. That is, a zloop device ressembles an NVMe ZNS devices supporting zone append. This native support is fine but it does not allow exercising the block layer zone write plugging emulation of zone append, as is done with SCSI or ATA SMR HDDs. Introduce the zone_append configuration parameter to allow creating a zloop device without native support for zone append, thus relying on the block layer zone append emulation. If not specified, zone append support is enabled by default. Otherwise, a value of 0 disables native zone append and a value of 1 enables it. Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
The zone append operation processing for zloop devices is similar to any other command, that is, the operation is processed as a command work item, without any special serialization between the work items (beside the zone mutex for mutually exclusive code sections). This processing is fine and gives excellent performance. However, it has a side effect: zone append operation are very often reordered and processed in a sequence that is very different from their issuing order by the user. This effect is very visible using an XFS file system on top of a zloop device. A simple file write leads to many file extents as the data writes using zone append are reordered and so result in the physical order being different than the file logical order. E.g. executing: $ dd if=/dev/zero of=/mnt/test bs=1M count=10 && sync $ xfs_bmap /mnt/test /mnt/test: 0: [0..4095]: 2162688..2166783 1: [4096..6143]: 2168832..2170879 2: [6144..8191]: 2166784..2168831 3: [8192..10239]: 2170880..2172927 4: [10240..12287]: 2174976..2177023 5: [12288..14335]: 2172928..2174975 6: [14336..20479]: 2177024..2183167 For 10 IOs, 6 extents are created. This is fine and actually allows to exercise XFS zone garbage collection very well. However, this also makes debugging/working on XFS data placement harder as the underlying device will most of the time reorder IOs, resulting in many file extents. Allow a user to mitigate this with the new ordered_zone_append configuration parameter. For a zloop device created with this parameter specified, the sector of a zone append command is set early, when the command is submitted by the block layer with the zloop_queue_rq() function, instead of in the zloop_rw() function which is exectued later in the command work item context. This change ensures that more often than not, zone append operations data end up being written in the same order as the command submission by the user. In the case of XFS, this leads to far less file data extents. E.g., for the previous example, we get a single file data extent for the written file. $ dd if=/dev/zero of=/mnt/test bs=1M count=10 && sync $ xfs_bmap /mnt/test /mnt/test: 0: [0..20479]: 2162688..2183167 Since we cannot use a mutex in the context of the zloop_queue_rq() function to atomically set a zone append operation sector to the target zone write pointer location and increment that the write pointer, a new per-zone spinlock is introduced to protect a zone write pointer access and modifications. To check a zone write pointer location and set a zone append operation target sector to that value, the function zloop_set_zone_append_sector() is introduced and called from zloop_queue_rq(). Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
Upstream branch: fd95357 |
In Documentation/admin-guide/blockdev/zoned_loop.rst, add the description of the zone_append and ordered_zone_append configuration arguments of zloop "add" command (device creation). Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
bf56958 to
a03d1ab
Compare
6f43942 to
ec9caac
Compare
|
Upstream branch: c2f2b01 Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/linux-block/list/?series=1023830 conflict: |
ec9caac to
4a5ddea
Compare
|
At least one diff in series https://patchwork.kernel.org/project/linux-block/list/?series=1023830 irrelevant now for [{'archived': False, 'project': 241}] search patterns |
To pick up the changes in these csets: 5ca243f ("prctl: add arch-agnostic prctl()s for indirect branch tracking") 28621ec ("rseq: Add prctl() to enable time slice extensions") That don't introduced these new prctls: $ tools/perf/trace/beauty/prctl_option.sh > before.txt $ cp include/uapi/linux/prctl.h tools/perf/trace/beauty/include/uapi/linux/prctl.h $ tools/perf/trace/beauty/prctl_option.sh > after.txt $ diff -u before.txt after.txt --- before.txt 2026-02-27 09:07:16.435611457 -0300 +++ after.txt 2026-02-27 09:07:28.189816531 -0300 @@ -73,6 +73,10 @@ [76] = "LOCK_SHADOW_STACK_STATUS", [77] = "TIMER_CREATE_RESTORE_IDS", [78] = "FUTEX_HASH", + [79] = "RSEQ_SLICE_EXTENSION", + [80] = "GET_INDIR_BR_LP_STATUS", + [81] = "SET_INDIR_BR_LP_STATUS", + [82] = "LOCK_INDIR_BR_LP_STATUS", }; static const char *prctl_set_mm_options[] = { [1] = "START_CODE", $ That now will be used to decode the syscall option and also to compose filters, for instance: [root@five ~]# perf trace -e syscalls:sys_enter_prctl --filter option==SET_NAME 0.000 Isolated Servi/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23f13b7aee) 0.032 DOM Worker/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23deb25670) 7.920 :3474328/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fbb10) 7.935 StreamT~s #374/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fb970) 8.400 Isolated Servi/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24bab10) 8.418 StreamT~s #374/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24ba970) ^C[root@five ~]# This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/prctl.h include/uapi/linux/prctl.h Please see tools/include/uapi/README for further details. Cc: Deepak Gupta <debug@rivosinc.com> Cc: Paul Walmsley <pjw@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull request for series with
subject: zloop fixes and improvements
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1023830