migration: Unify reset of last_rb on destination node when recover
When postcopy recovery happens, we need to reset last_rb after each return of
postcopy_pause_fault_thread(), because a return means the postcopy migration
has just been resumed.

Unify this reset into a single place, right before we kick the fault thread
again, when we receive the command MIG_CMD_POSTCOPY_RESUME from the source.

This is more than a cleanup: the main thread on the destination can now call
migrate_send_rp_req_pages_pending() too, so the fault thread is no longer the
only user of last_rb.  Moving the reset earlier lets the first call to
migrate_send_rp_req_pages_pending() use the reset value even when it is made
from the main thread.

(NOTE: this is not a real fix for 0c26781 mentioned below; it only marks that
 anyone picking up 0c26781 should pick up this commit too.  The real fix will
 come later.)

Fixes: 0c26781 ("migration: Sync requested pages after postcopy recovery")
Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201102153010.11979-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
xzpeter authored and dagrh committed Nov 2, 2020
1 parent b139d11 commit cc5ab87
Showing 2 changed files with 6 additions and 2 deletions.
2 changes: 0 additions & 2 deletions migration/postcopy-ram.c
@@ -903,7 +903,6 @@ static void *postcopy_ram_fault_thread(void *opaque)
  * the channel is rebuilt.
  */
 if (postcopy_pause_fault_thread(mis)) {
-    mis->last_rb = NULL;
     /* Continue to read the userfaultfd */
 } else {
     error_report("%s: paused but don't allow to continue",
@@ -985,7 +984,6 @@ static void *postcopy_ram_fault_thread(void *opaque)
 /* May be network failure, try to wait for recovery */
 if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
     /* We got reconnected somehow, try to continue */
-    mis->last_rb = NULL;
     goto retry;
 } else {
     /* This is a unavoidable fault */
6 changes: 6 additions & 0 deletions migration/savevm.c
@@ -2061,6 +2061,12 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
     return 0;
 }
 
+/*
+ * Reset the last_rb before we resend any page req to source again, since
+ * the source should have it reset already.
+ */
+mis->last_rb = NULL;
+
 /*
  * This means source VM is ready to resume the postcopy migration.
  * It's time to switch state and release the fault thread to
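
The reasoning in the commit message can be illustrated with a small standalone model. This is only a sketch and not QEMU code: it assumes last_rb acts as a cache that lets the destination omit the block name from repeat page requests, and that the source forgets its remembered block across a reconnect (the new comment in savevm.c says the source "should have it reset already"). The types Block, Incoming and Source, and the functions request_page(), reconnect() and first_request_after_recovery(), are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not the QEMU types. */
typedef struct { const char *name; } Block;
typedef struct { const Block *last_rb; } Incoming;  /* destination state */
typedef struct { const Block *current; } Source;    /* source state */

/*
 * Destination asks the source for a page in block rb.
 * Returns true if the source could tell which block was meant.
 */
static bool request_page(Incoming *mis, Source *src, const Block *rb)
{
    assert(rb != NULL);
    if (rb != mis->last_rb) {
        /* Named request: tells the source which block follows. */
        mis->last_rb = rb;
        src->current = rb;
    }
    /* An unnamed request relies on the source still remembering the block. */
    return src->current == rb;
}

/* Model a reconnect: the source forgets its block; the destination
 * resets last_rb only when reset_on_resume is true. */
static void reconnect(Incoming *mis, Source *src, bool reset_on_resume)
{
    src->current = NULL;
    if (reset_on_resume) {
        mis->last_rb = NULL;
    }
}

/* Returns whether the first request after a reconnect can be resolved. */
static bool first_request_after_recovery(bool reset_on_resume)
{
    static const Block ram = { "ram0" };
    Incoming mis = { NULL };
    Source src = { NULL };

    request_page(&mis, &src, &ram);          /* before the failure */
    reconnect(&mis, &src, reset_on_resume);  /* channel is rebuilt */
    return request_page(&mis, &src, &ram);   /* first request after resume */
}
```

In this model the first request after recovery is resolved only when the reset happens before any request is sent, regardless of which thread sends it. That is the ordering the commit message argues for.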