Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paging filter #1

Merged
merged 3 commits into from
Nov 30, 2023
Merged

Paging filter #1

merged 3 commits into from
Nov 30, 2023

Conversation

rmalmain
Copy link
Owner

No description provided.

@rmalmain rmalmain merged commit 0089b52 into main Nov 30, 2023
rmalmain pushed a commit that referenced this pull request Apr 11, 2024
We can fail the blk_insert_bs() at init_blk_migration(), leaving the
BlkMigDevState without a dirty_bitmap and BlockDriverState. Account
for the possibly missing elements when doing cleanup.

Fix the following crashes:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359
359         BlockDriverState *bs = bitmap->bs;
 #0  0x0000555555ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359
 #1  0x0000555555bba331 in unset_dirty_tracking () at ../migration/block.c:371
 AFLplusplus#2  0x0000555555bbad98 in block_migration_cleanup_bmds () at ../migration/block.c:681

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
7073        QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) {
 #0  0x0000555555e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
 #1  0x0000555555e9734a in bdrv_op_unblock_all (bs=0x0, reason=0x0) at ../block.c:7095
 AFLplusplus#2  0x0000555555bbae13 in block_migration_cleanup_bmds () at ../migration/block.c:690

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-id: 20230731203338.27581-1-farosas@suse.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit f187609)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
rmalmain pushed a commit that referenced this pull request Apr 11, 2024
virtio_load() as a whole should run in coroutine context because it
reads from the migration stream and we don't want this to block.

However, it calls virtio_set_features_nocheck() and devices don't
expect their .set_features callback to run in a coroutine and therefore
call functions that may not be called in coroutine context. To fix this,
drop out of coroutine context for calling virtio_set_features_nocheck().

Without this fix, the following crash was reported:

  #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
  #1  0x00007efc738c05d3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
  AFLplusplus#2  0x00007efc73873d26 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
  AFLplusplus#3  0x00007efc738477f3 in __GI_abort () at abort.c:79
  AFLplusplus#4  0x00007efc7384771b in __assert_fail_base (fmt=0x7efc739dbcb8 "", assertion=assertion@entry=0x560aebfbf5cf "!qemu_in_coroutine()",
     file=file@entry=0x560aebfcd2d4 "../block/graph-lock.c", line=line@entry=275, function=function@entry=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:92
  AFLplusplus#5  0x00007efc7386ccc6 in __assert_fail (assertion=0x560aebfbf5cf "!qemu_in_coroutine()", file=0x560aebfcd2d4 "../block/graph-lock.c", line=275,
     function=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:101
  AFLplusplus#6  0x0000560aebcd8dd6 in bdrv_register_buf ()
  AFLplusplus#7  0x0000560aeb97ed97 in ram_block_added.llvm ()
  AFLplusplus#8  0x0000560aebb8303f in ram_block_add.llvm ()
  AFLplusplus#9  0x0000560aebb834fa in qemu_ram_alloc_internal.llvm ()
  AFLplusplus#10 0x0000560aebb2ac98 in vfio_region_mmap ()
  AFLplusplus#11 0x0000560aebb3ea0f in vfio_bars_register ()
  AFLplusplus#12 0x0000560aebb3c628 in vfio_realize ()
  AFLplusplus#13 0x0000560aeb90f0c2 in pci_qdev_realize ()
  AFLplusplus#14 0x0000560aebc40305 in device_set_realized ()
  AFLplusplus#15 0x0000560aebc48e07 in property_set_bool.llvm ()
  AFLplusplus#16 0x0000560aebc46582 in object_property_set ()
  AFLplusplus#17 0x0000560aebc4cd58 in object_property_set_qobject ()
  AFLplusplus#18 0x0000560aebc46ba7 in object_property_set_bool ()
  AFLplusplus#19 0x0000560aeb98b3ca in qdev_device_add_from_qdict ()
  AFLplusplus#20 0x0000560aebb1fbaf in virtio_net_set_features ()
  AFLplusplus#21 0x0000560aebb46b51 in virtio_set_features_nocheck ()
  AFLplusplus#22 0x0000560aebb47107 in virtio_load ()
  AFLplusplus#23 0x0000560aeb9ae7ce in vmstate_load_state ()
  AFLplusplus#24 0x0000560aeb9d2ee9 in qemu_loadvm_state_main ()
  AFLplusplus#25 0x0000560aeb9d45e1 in qemu_loadvm_state ()
  AFLplusplus#26 0x0000560aeb9bc32c in process_incoming_migration_co.llvm ()
  AFLplusplus#27 0x0000560aebeace56 in coroutine_trampoline.llvm ()

Cc: qemu-stable@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-832
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230905145002.46391-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 92e2e6a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
rmalmain pushed a commit that referenced this pull request Apr 11, 2024
Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555888630 in dpy_ui_info_supported (con=0x0) at ../ui/console.c:812
812	    return con->hw_ops->ui_info != NULL;
(gdb) bt
#0  0x0000555555888630 in dpy_ui_info_supported (con=0x0) at ../ui/console.c:812
#1  0x00005555558a44b1 in protocol_client_msg (vs=0x5555578c76c0, data=0x5555581e93f0 <incomplete sequence \373>, len=24) at ../ui/vnc.c:2585
AFLplusplus#2  0x00005555558a19ac in vnc_client_read (vs=0x5555578c76c0) at ../ui/vnc.c:1607
AFLplusplus#3  0x00005555558a1ac2 in vnc_client_io (ioc=0x5555581eb0e0, condition=G_IO_IN, opaque=0x5555578c76c0) at ../ui/vnc.c:1635

Fixes:
https://issues.redhat.com/browse/RHEL-2600

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Albert Esteve <aesteve@redhat.com>
(cherry picked from commit 48a35e1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
rmalmain pushed a commit that referenced this pull request Apr 11, 2024
Replace the return path retry logic with finishing and restarting the
thread. This fixes a race when resuming the migration that leads to a
segfault.

Currently when doing postcopy we consider that an IO error on the
return path file could be due to a network intermittency. We then keep
the thread alive but have it do cleanup of the 'from_dst_file' and
wait on the 'postcopy_pause_rp' semaphore. When the user issues a
migrate resume, a new return path is opened and the thread is allowed
to continue.

There's a race condition in the above mechanism. It is possible for
the new return path file to be setup *before* the cleanup code in the
return path thread has had a chance to run, leading to the *new* file
being closed and the pointer set to NULL. When the thread is released
after the resume, it tries to dereference 'from_dst_file' and crashes:

Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
154         return f->last_error;

(gdb) bt
 #0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
 #1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
 AFLplusplus#2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
 AFLplusplus#3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
 AFLplusplus#4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
 AFLplusplus#5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Here's the race (important bit is open_return_path happening before
migration_release_dst_files):

migration                 | qmp                         | return path
--------------------------+-----------------------------+---------------------------------
			    qmp_migrate_pause()
			     shutdown(ms->to_dst_file)
			      f->last_error = -EIO
migrate_detect_error()
 postcopy_pause()
  set_state(PAUSED)
  wait(postcopy_pause_sem)
			    qmp_migrate(resume)
			    migrate_fd_connect()
			     resume = state == PAUSED
			     open_return_path <-- TOO SOON!
			     set_state(RECOVER)
			     post(postcopy_pause_sem)
							(incoming closes to_src_file)
							res = qemu_file_get_error(rp)
							migration_release_dst_files()
							ms->rp_state.from_dst_file = NULL
  post(postcopy_pause_rp_sem)
							postcopy_pause_return_path_thread()
							  wait(postcopy_pause_rp_sem)
							rp = ms->rp_state.from_dst_file
							goto retry
							qemu_file_get_error(rp)
							SIGSEGV
-------------------------------------------------------------------------------------------

We can keep the retry logic without having the thread alive and
waiting. The only piece of data used by it is the 'from_dst_file' and
it is only allowed to proceed after a migrate resume is issued and the
semaphore released at migrate_fd_connect().

Move the retry logic to outside the thread by waiting for the thread
to finish before pausing the migration.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-8-farosas@suse.de>
(cherry picked from commit ef796ee)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
rmalmain pushed a commit that referenced this pull request Apr 11, 2024
On LoongArch host,  we got an Aborted from tcg_out_mov().

qemu-x86_64 configure with '--enable-debug'.

> (gdb) b /home1/gaosong/code/qemu/tcg/loongarch64/tcg-target.c.inc:312
> Breakpoint 1 at 0x2576f0: file /home1/gaosong/code/qemu/tcg/loongarch64/tcg-target.c.inc, line 312.
> (gdb) run hello
[...]
> Thread 1 "qemu-x86_64" hit Breakpoint 1, tcg_out_mov (s=0xaaaae91760 <tcg_init_ctx>, type=TCG_TYPE_V128, ret=TCG_REG_V2,
>     arg=TCG_REG_V0) at /home1/gaosong/code/qemu/tcg/loongarch64/tcg-target.c.inc:312
> 312           g_assert_not_reached();
> (gdb) bt
> #0  tcg_out_mov (s=0xaaaae91760 <tcg_init_ctx>, type=TCG_TYPE_V128, ret=TCG_REG_V2, arg=TCG_REG_V0)
>     at /home1/gaosong/code/qemu/tcg/loongarch64/tcg-target.c.inc:312
> #1  0x000000aaaad0fee0 in tcg_reg_alloc_mov (s=0xaaaae91760 <tcg_init_ctx>, op=0xaaaaf67c20) at ../tcg/tcg.c:4632
> AFLplusplus#2  0x000000aaaad142f4 in tcg_gen_code (s=0xaaaae91760 <tcg_init_ctx>, tb=0xffe8030340 <code_gen_buffer+197328>,
>     pc_start=4346094) at ../tcg/tcg.c:6135
[...]
> (gdb) c
> Continuing.
> **
> ERROR:/home1/gaosong/code/qemu/tcg/loongarch64/tcg-target.c.inc:312:tcg_out_mov: code should not be reached
> Bail out! ERROR:/home1/gaosong/code/qemu/tcg/loongarch64/tcg-target.c.inc:312:tcg_out_mov: code should not be reached
>
> Thread 1 "qemu-x86_64" received signal SIGABRT, Aborted.
> 0x000000fff7b1c390 in raise () from /lib64/libc.so.6
> (gdb) q

Fixes: 16288de ("tcg/loongarch64: Lower basic tcg vec ops to LSX")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20231120065916.374045-1-gaosong@loongson.cn>
rmalmain pushed a commit that referenced this pull request Apr 11, 2024
…ock_status

Using fleecing backup like in [0] on a qcow2 image (with metadata
preallocation) can lead to the following assertion failure:

> bdrv_co_do_block_status: Assertion `!(ret & BDRV_BLOCK_ZERO)' failed.

In the reproducer [0], it happens because the BDRV_BLOCK_RECURSE flag
will be set by the qcow2 driver, so the caller will recursively check
the file child. Then the BDRV_BLOCK_ZERO set too. Later up the call
chain, in bdrv_co_do_block_status() for the snapshot-access driver,
the assertion failure will happen, because both flags are set.

To fix it, clear the recurse flag after the recursive check was done.

In detail:

> #0  qcow2_co_block_status

Returns 0x45 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID.

> #1  bdrv_co_do_block_status

Because of the data flag, bdrv_co_do_block_status() will now also set
BDRV_BLOCK_ALLOCATED. Because of the recurse flag,
bdrv_co_do_block_status() for the bdrv_file child will be called,
which returns 0x16 = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
BDRV_BLOCK_ZERO. Now the return value inherits the zero flag.

Returns 0x57 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.

> AFLplusplus#2  bdrv_co_common_block_status_above
> AFLplusplus#3  bdrv_co_block_status_above
> AFLplusplus#4  bdrv_co_block_status
> AFLplusplus#5  cbw_co_snapshot_block_status
> AFLplusplus#6  bdrv_co_snapshot_block_status
> AFLplusplus#7  snapshot_access_co_block_status
> AFLplusplus#8  bdrv_co_do_block_status

Return value is propagated all the way up to here, where the assertion
failure happens, because BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO are
both set.

> AFLplusplus#9  bdrv_co_common_block_status_above
> AFLplusplus#10 bdrv_co_block_status_above
> AFLplusplus#11 block_copy_block_status
> AFLplusplus#12 block_copy_dirty_clusters
> AFLplusplus#13 block_copy_common
> AFLplusplus#14 block_copy_async_co_entry
> AFLplusplus#15 coroutine_trampoline

[0]:

> #!/bin/bash
> rm /tmp/disk.qcow2
> ./qemu-img create /tmp/disk.qcow2 -o preallocation=metadata -f qcow2 1G
> ./qemu-img create /tmp/fleecing.qcow2 -f qcow2 1G
> ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
> --blockdev qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2 \
> --blockdev qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } }
> {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": "node1", "sync": "full", "job-id": "backup0" } }
> EOF

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-id: 20240116154839.401030-1-f.ebner@proxmox.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 8a9be79)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
rmalmain pushed a commit that referenced this pull request Apr 11, 2024
There is a bug in the blklogwrites driver pertaining to logging "write
zeroes" operations, causing log corruption. This can be easily observed
by setting detect-zeroes to something other than "off" for the driver.

The issue is caused by a concurrency bug pertaining to the fact that
"write zeroes" operations have to be logged in two parts: first the log
entry metadata, then the zeroed-out region. While the log entry
metadata is being written by bdrv_co_pwritev(), another operation may
begin in the meanwhile and modify the state of the blklogwrites driver.
This is as intended by the coroutine-driven I/O model in QEMU, of
course.

Unfortunately, this specific scenario is mishandled. A short example:
    1. Initially, in the current operation (#1), the current log sector
number in the driver state is only incremented by the number of sectors
taken by the log entry metadata, after which the log entry metadata is
written. The current operation yields.
    2. Another operation (AFLplusplus#2) may start while the log entry metadata is
being written. It uses the current log position as the start offset for
its log entry. This is in the sector right after the operation #1 log
entry metadata, which is bad!
    3. After bdrv_co_pwritev() returns (#1), the current log sector
number is reread from the driver state in order to find out the start
offset for bdrv_co_pwrite_zeroes(). This is an obvious blunder, as the
offset will be the sector right after the (misplaced) operation AFLplusplus#2 log
entry, which means that the zeroed-out region begins at the wrong
offset.
    4. As a result of the above, the log is corrupt.

Fix this by only reading the driver metadata once, computing the
offsets and sizes in one go (including the optional zeroed-out region)
and setting the log sector number to the appropriate value for the next
operation in line.

Signed-off-by: Ari Sundholm <ari@tuxera.com>
Cc: qemu-stable@nongnu.org
Message-ID: <20240109184646.1128475-1-megari@gmx.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit a9c8ea9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
rmalmain added a commit that referenced this pull request Apr 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant