Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: validate transfer phase #92

Merged
merged 1 commit into from
Sep 13, 2022
Merged

Conversation

bennyz
Copy link
Member

@bennyz bennyz commented Feb 21, 2022

Do not update the phase if the incoming phase is invalid

Bug-Url: https://bugzilla.redhat.com/2092816

@bennyz bennyz added the storage label Feb 21, 2022
@bennyz bennyz requested a review from nirs February 21, 2022 12:33
validTransitions.put(FINALIZING_SUCCESS, EnumSet.of(FINALIZING_FAILURE, FINISHED_SUCCESS));
validTransitions.put(FINALIZING_FAILURE, EnumSet.of(FINISHED_FAILURE));
validTransitions.put(FINALIZING_CLEANUP, EnumSet.of(FINISHED_CLEANUP));
validTransitions.put(FINISHED_CLEANUP, EnumSet.of(FINISHED_FAILURE));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

I'm not sure the valid transitions are correct, but the validator is nice.

Copy link
Member Author

@bennyz bennyz Feb 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I "guessed" this based on the code, I hope eventually the spec document will provide a clear understanding of which transitions are valid. For now, I guess it would be good enough to block transitions to FINALIZING_SUCCESS when a user incorrectly finalizes a transfer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is very risky - if you got the valid transitions wrong, this will break
valid code in engine trying to change the phase.

Another issue is failing user request which should succeed. For example:

More than one finalize:

  1. user ask to finalize succeeds, start finalize flow
  2. user ask to finalize again - fail

More than one cancel:

  1. user ask to cancel succeeds, start finalize flow
  2. user ask to cancel again - fail

The system will be much easier to use if calling finalize() or cacnel() more than
once is allowed and does not change anything.

So this cannot be done using this validation. I think adding flags like:

  • finalized
  • canceled

When we handle finalize/cancel request we can consider the flags before we
modify the phase.

Copy link
Member Author

@bennyz bennyz Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this is a draft to showcase a possible direction for validating, and I am in no rush to merge this. I still need to add unit tests and test this extensively, but even before that, we need to complete the spec, at least transitions part.

Multiple user requests will not fail (same goes for internal phase updates), but won't change the phase. But I am not sure if that is correct all the time, maybe in some cases the entire entity update should be discarded...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problematic flow I see is this:

  1. User ask to finalize
  2. System change phase to finalizing success
  3. System change state to the next phase
  4. User ask to finalize
  5. System changes state to finalizing success
  6. System remain stuck in finalizing success

And same for cancel.

I could reproduce this issue with virt-v2v:

  1. virt-v2v ask to finalize without closing imageio socket
  2. engine waits for imageio to remove the ticket
  3. imageio times out, because of the open socket
  4. virt-v2v request times out before engine replies
  5. engine wrongly advance to the next phase after imageio timeout
  6. virt-v2v cancel the transfer because of the finalize timeout
  7. engine get stuck in finalizing failure

This is a combinations of bugs in virt-v2v, imageio and engine
that cannot happen now because virt-v2v does close the socket now
before calling finalize, and imageio does not time out when there
are not active requests, and will remove the ticket immediately, but
this can happen with another clients for other reasons.

To fix these issues, we must ensure that requests to finalize
or cancel are handled exactly once.

Copy link
Member

@ahadas ahadas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice approach

validTransitions.put(PAUSED_USER, EnumSet.of(RESUMING, CANCELLED_USER, CANCELLED_SYSTEM));
validTransitions.put(CANCELLED_USER, EnumSet.of(FINALIZING_CLEANUP));
validTransitions.put(CANCELLED_SYSTEM, EnumSet.of(FINALIZING_FAILURE));
validTransitions.put(FINALIZING_SUCCESS, EnumSet.of(FINALIZING_FAILURE, FINISHED_SUCCESS));
Copy link
Member

@ahadas ahadas Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the possible transition from FINALIZING_SUCCESS to FINALIZING_FAILURE seems suspicious but yeah, as Nir commented below - I take your word on those transitions :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's based on the code :)

if (isImageVerified) {
log.info("Image transfer '{}' was successful for {}", getCommandId(), getTransferDescription());
setCommandStatus(CommandStatus.SUCCEEDED);
} else {
updateEntityPhase(ImageTransferPhase.FINALIZING_FAILURE);
}

That said, the explicit transitions could very well be wrong

@bennyz bennyz force-pushed the validate-transfer-phase branch 2 times, most recently from 1bc13fc to 7f9a793 Compare July 28, 2022 08:02
@bennyz bennyz force-pushed the validate-transfer-phase branch 3 times, most recently from e4a7fc3 to 6c0fd3d Compare September 11, 2022 12:27
Copy link
Member

@ahadas ahadas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, minor comment

@@ -26,17 +26,39 @@ public enum ImageTransferPhase implements Identifiable {
private String description;
private static final Map<Integer, ImageTransferPhase> valueToPhase = new HashMap<>();

// TODO: revisit this in the future to validate all transition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO: revisit this in the future to validate all transition
// TODO: revisit this in the future to validate all transitions

Do not update the phase if the incoming phase is invalid. This only
affects external phase updates, when the user attempts to cancel or
finalize a transfer.

Bug-Url: bugzilla.redhat.com/2092816
@ahadas ahadas marked this pull request as ready for review September 13, 2022 12:14
@ahadas
Copy link
Member

ahadas commented Sep 13, 2022

/ost

@ahadas ahadas merged commit 622567a into oVirt:master Sep 13, 2022
@bennyz bennyz deleted the validate-transfer-phase branch October 7, 2022 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants