Skip to content

Commit

Permalink
monitor: only run coroutine commands in qemu_aio_context
Browse files Browse the repository at this point in the history
monitor_qmp_dispatcher_co() runs in the iohandler AioContext that is not
polled during nested event loops. The coroutine currently reschedules
itself in the main loop's qemu_aio_context AioContext, which is polled
during nested event loops. One known problem is that QMP device-add
calls drain_call_rcu(), which temporarily drops the BQL, leading to all
sorts of havoc like other vCPU threads re-entering device emulation code
while another vCPU thread is waiting in device emulation code with
aio_poll().

Paolo Bonzini suggested running non-coroutine QMP handlers in the
iohandler AioContext. This avoids trouble with nested event loops. His
original idea was to move coroutine rescheduling to
monitor_qmp_dispatch(), but I resorted to moving it to qmp_dispatch()
because we don't know if the QMP handler needs to run in coroutine
context in monitor_qmp_dispatch(). monitor_qmp_dispatch() would have
been nicer since it's associated with the monitor implementation and not
as general as qmp_dispatch(), which is also used by qemu-ga.

A number of qemu-iotests need updated .out files because the order of
QMP events vs QMP responses has changed.

Solves Issue #1933.

Cc: qemu-stable@nongnu.org
Fixes: 7bed899 ("device_core: use drain_call_rcu in in qmp_device_add")
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
Buglink: https://issues.redhat.com/browse/RHEL-17369
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240118144823.1497953-4-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit effd60c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: omit changes to tests missing in 7.2)
  • Loading branch information
stefanhaRH authored and Michael Tokarev committed Jan 26, 2024
1 parent 219cea6 commit 8ec9059
Show file tree
Hide file tree
Showing 30 changed files with 201 additions and 173 deletions.
17 changes: 0 additions & 17 deletions monitor/qmp.c
Original file line number Diff line number Diff line change
Expand Up @@ -296,14 +296,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
qemu_coroutine_yield();
}

/*
* Move the coroutine from iohandler_ctx to qemu_aio_context for
* executing the command handler so that it can make progress if it
* involves an AIO_WAIT_WHILE().
*/
aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
qemu_coroutine_yield();

/* Process request */
if (req_obj->req) {
if (trace_event_get_state(TRACE_MONITOR_QMP_CMD_IN_BAND)) {
Expand All @@ -330,15 +322,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
}

qmp_request_free(req_obj);

/*
* Yield and reschedule so the main loop stays responsive.
*
* Move back to iohandler_ctx so that nested event loops for
* qemu_aio_context don't start new monitor commands.
*/
aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
qemu_coroutine_yield();
}
}

Expand Down
24 changes: 23 additions & 1 deletion qapi/qmp-dispatch.c
Original file line number Diff line number Diff line change
Expand Up @@ -206,9 +206,31 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
assert(!(oob && qemu_in_coroutine()));
assert(monitor_cur() == NULL);
if (!!(cmd->options & QCO_COROUTINE) == qemu_in_coroutine()) {
if (qemu_in_coroutine()) {
/*
* Move the coroutine from iohandler_ctx to qemu_aio_context for
* executing the command handler so that it can make progress if it
* involves an AIO_WAIT_WHILE().
*/
aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
qemu_coroutine_yield();
}

monitor_set_cur(qemu_coroutine_self(), cur_mon);
cmd->fn(args, &ret, &err);
monitor_set_cur(qemu_coroutine_self(), NULL);

if (qemu_in_coroutine()) {
/*
* Yield and reschedule so the main loop stays responsive.
*
* Move back to iohandler_ctx so that nested event loops for
* qemu_aio_context don't start new monitor commands.
*/
aio_co_schedule(iohandler_get_aio_context(),
qemu_coroutine_self());
qemu_coroutine_yield();
}
} else {
/*
* Actual context doesn't match the one the command needs.
Expand All @@ -232,7 +254,7 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
.errp = &err,
.co = qemu_coroutine_self(),
};
aio_bh_schedule_oneshot(qemu_get_aio_context(), do_qmp_dispatch_bh,
aio_bh_schedule_oneshot(iohandler_get_aio_context(), do_qmp_dispatch_bh,
&data);
qemu_coroutine_yield();
}
Expand Down
4 changes: 2 additions & 2 deletions tests/qemu-iotests/060.out
Original file line number Diff line number Diff line change
Expand Up @@ -421,8 +421,8 @@ QMP_VERSION
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "none0", "msg": "Preventing invalid write on metadata (overlaps with refcount table)", "offset": 65536, "node-name": "drive", "fatal": true, "size": 65536}}
write failed: Input/output error
{"return": ""}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}

=== Testing incoming inactive corrupted image ===

Expand All @@ -432,8 +432,8 @@ QMP_VERSION
qcow2: Image is corrupt: L2 table offset 0x2a2a2a00 unaligned (L1 index: 0); further non-fatal corruption events will be suppressed
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "", "msg": "L2 table offset 0x2a2a2a00 unaligned (L1 index: 0)", "node-name": "drive", "fatal": false}}
{"return": ""}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}

corrupt: false
*** done
4 changes: 2 additions & 2 deletions tests/qemu-iotests/071.out
Original file line number Diff line number Diff line change
Expand Up @@ -45,8 +45,8 @@ QMP_VERSION
{"return": {}}
read failed: Input/output error
{"return": ""}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}


=== Testing blkverify on existing block device ===
Expand Down Expand Up @@ -84,9 +84,9 @@ wrote 512/512 bytes at offset 0
{"return": ""}
read failed: Input/output error
{"return": ""}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
QEMU_PROG: Failed to flush the L2 table cache: Input/output error
QEMU_PROG: Failed to flush the refcount block cache: Input/output error
{"return": {}}

*** done
16 changes: 8 additions & 8 deletions tests/qemu-iotests/081.out
Original file line number Diff line number Diff line change
Expand Up @@ -35,8 +35,8 @@ QMP_VERSION
read 10485760/10485760 bytes at offset 0
10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
{"return": ""}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}


== using quorum rewrite corrupted mode ==
Expand Down Expand Up @@ -67,8 +67,8 @@ QMP_VERSION
read 10485760/10485760 bytes at offset 0
10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
{"return": ""}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}

-- checking that the image has been corrected --
read 10485760/10485760 bytes at offset 0
Expand Down Expand Up @@ -106,17 +106,17 @@ QMP_VERSION
{"return": {}}
{"return": {}}
{"return": {}}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}

Testing:
QMP_VERSION
{"return": {}}
{"return": {}}
{"return": {}}
{"error": {"class": "GenericError", "desc": "Cannot add a child to a quorum in blkverify mode"}}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}


== dynamically removing a child from a quorum ==
Expand All @@ -125,31 +125,31 @@ QMP_VERSION
{"return": {}}
{"return": {}}
{"return": {}}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}

Testing:
QMP_VERSION
{"return": {}}
{"return": {}}
{"error": {"class": "GenericError", "desc": "The number of children cannot be lower than the vote threshold 2"}}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}

Testing:
QMP_VERSION
{"return": {}}
{"error": {"class": "GenericError", "desc": "blkverify=on can only be set if there are exactly two files and vote-threshold is 2"}}
{"error": {"class": "GenericError", "desc": "Cannot find device='drive0-quorum' nor node-name='drive0-quorum'"}}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}

Testing:
QMP_VERSION
{"return": {}}
{"return": {}}
{"error": {"class": "GenericError", "desc": "The number of children cannot be lower than the vote threshold 2"}}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}

*** done
12 changes: 6 additions & 6 deletions tests/qemu-iotests/087.out
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,8 @@ Testing:
QMP_VERSION
{"return": {}}
{"error": {"class": "GenericError", "desc": "'node-name' must be specified for the root node"}}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}


=== Duplicate ID ===
Expand All @@ -18,8 +18,8 @@ QMP_VERSION
{"return": {}}
{"error": {"class": "GenericError", "desc": "node-name=disk is conflicting with a device id"}}
{"error": {"class": "GenericError", "desc": "Duplicate nodes with node-name='test-node'"}}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}


=== aio=native without O_DIRECT ===
Expand All @@ -28,8 +28,8 @@ Testing:
QMP_VERSION
{"return": {}}
{"error": {"class": "GenericError", "desc": "aio=native was specified, but it requires cache.direct=on, which was not specified."}}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}


=== Encrypted image QCow ===
Expand All @@ -40,8 +40,8 @@ QMP_VERSION
{"return": {}}
{"return": {}}
{"error": {"class": "GenericError", "desc": "Use of AES-CBC encrypted IMGFMT images is no longer supported in system emulators"}}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}


=== Encrypted image LUKS ===
Expand All @@ -52,8 +52,8 @@ QMP_VERSION
{"return": {}}
{"return": {}}
{"return": {}}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}


=== Missing driver ===
Expand All @@ -63,7 +63,7 @@ Testing: -S
QMP_VERSION
{"return": {}}
{"error": {"class": "GenericError", "desc": "Parameter 'driver' is missing"}}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}

*** done
2 changes: 1 addition & 1 deletion tests/qemu-iotests/108.out
Original file line number Diff line number Diff line change
Expand Up @@ -173,8 +173,8 @@ OK: Reftable is where we expect it
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "create"}}
{"return": {}}
{ "execute": "quit" }
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}

wrote 65536/65536 bytes at offset 0
64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
Expand Down
4 changes: 2 additions & 2 deletions tests/qemu-iotests/109
Original file line number Diff line number Diff line change
Expand Up @@ -57,13 +57,13 @@ run_qemu()
_launch_qemu -drive file="${source_img}",format=raw,cache=${CACHEMODE},aio=${AIOMODE},id=src
_send_qemu_cmd $QEMU_HANDLE "{ 'execute': 'qmp_capabilities' }" "return"

_send_qemu_cmd $QEMU_HANDLE \
capture_events="$qmp_event" _send_qemu_cmd $QEMU_HANDLE \
"{'execute':'drive-mirror', 'arguments':{
'device': 'src', 'target': '$raw_img', $qmp_format
'mode': 'existing', 'sync': 'full'}}" \
"return"

_send_qemu_cmd $QEMU_HANDLE '' "$qmp_event"
capture_events="$qmp_event JOB_STATUS_CHANGE" _wait_event $QEMU_HANDLE "$qmp_event"
if test "$qmp_event" = BLOCK_JOB_ERROR; then
_send_qemu_cmd $QEMU_HANDLE '' '"status": "null"'
fi
Expand Down

0 comments on commit 8ec9059

Please sign in to comment.