-
I'm currently debugging a migration flow with SPDK and libvfio-user using VFIO Migration V1 and noticed that device_state is explicitly updated in the code path; could someone clarify the design rationale behind this update?

```c
static int
device_reset(vfu_ctx_t *vfu_ctx, vfu_reset_type_t reason)
{
    int ret;

    ret = call_reset_cb(vfu_ctx, reason);
    if (ret < 0) {
        return ret;
    }

    // reset device migration state here
    if (vfu_ctx->migration != NULL) {
        return handle_device_state(vfu_ctx, vfu_ctx->migration,
                                   VFIO_DEVICE_STATE_RUNNING, false);
    }
    return 0;
}
```

In my testing, this operation appears to inadvertently clear the high bit of the SAVING state, which risks disrupting the migration state machine, so I'm wondering whether this is intentional per the V1 spec (and if so, how the flag should be properly preserved) or if it might be an oversight that requires masking/workaround. Any insights into the intended behavior or references to relevant documentation would be greatly appreciated.
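To make the bit arithmetic behind the question concrete, here is a minimal sketch of what an unconditional write of `VFIO_DEVICE_STATE_RUNNING` does to a device that is in the pre-copy state. The bit values are re-declared locally from the VFIO migration v1 definitions in the Linux UAPI header so that the snippet compiles on its own; `reset_to_running` and `saving_bit_lost` are hypothetical helpers for illustration only and are not part of libvfio-user.

```c
#include <stdint.h>

/* VFIO migration v1 device_state bits, re-declared here for a standalone
 * sketch (assumed to match the v1 definitions in linux/vfio.h). */
#define VFIO_DEVICE_STATE_STOP      (0)
#define VFIO_DEVICE_STATE_RUNNING   (1 << 0)
#define VFIO_DEVICE_STATE_SAVING    (1 << 1)
#define VFIO_DEVICE_STATE_RESUMING  (1 << 2)

/* Hypothetical helper: models a reset path that overwrites the whole
 * device_state with RUNNING, ignoring whatever was set before. */
static uint32_t
reset_to_running(uint32_t current_state)
{
    (void)current_state;
    return VFIO_DEVICE_STATE_RUNNING;
}

/* Hypothetical helper: returns 1 if SAVING was set before the transition
 * and is no longer set afterwards, 0 otherwise. */
static int
saving_bit_lost(uint32_t before, uint32_t after)
{
    return (before & VFIO_DEVICE_STATE_SAVING) != 0 &&
           (after & VFIO_DEVICE_STATE_SAVING) == 0;
}
```

With these definitions, pre-copy is `VFIO_DEVICE_STATE_RUNNING | VFIO_DEVICE_STATE_SAVING` (0x3) and stop-and-copy is `VFIO_DEVICE_STATE_SAVING` alone (0x2). Passing either through `reset_to_running` yields 0x1, so `saving_bit_lost` reports the SAVING bit as dropped in both cases. Whether dropping it is the intended V1 behavior is exactly what this thread is asking.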
Replies: 3 comments 10 replies
-
Sadly I didn't document this well enough, but I think I did it this way because that's what the kernel did (and I believe it's what QEMU expected). If it's in SAVING state the guest isn't running, so it can't be the guest triggering it; most likely it's the VMM resetting the device after a failed migration. Have you actually seen an error because of this?
How come? I presume you know this is eventually going away?
-
Ah sorry, I didn't pay enough attention to what happens after
If the guest reboots then the device state is reset, so any pre-copy state that has been captured so far must be discarded. So the current transition looks fine; however, the actual implications of this aren't clear. What's the problem this causes?
-
Thanks for explaining the problem in such great detail, it's now obvious what's broken. My earlier statement:
should be corrected to:
@Hooollin would you be interested in fixing this?