Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for new hotplug attachment pod to be ready #10275

Merged
merged 4 commits into from Aug 14, 2023

Conversation

awels
Copy link
Member

@awels awels commented Aug 11, 2023

What this PR does / why we need it:
Before deleting the old attachment pod, wait for the new attachment pod to be ready, so k8s will not detach the volume from the node since there will always be a pod using the volume from its perspective. This should prevent issues with certain CSI drivers that don't respect the device being in use before detaching it from the node. This will now break csi drivers that don't allow multiple pods on the same node to use the volume (this is a bug in that driver).

Fixed issue where when adding or removing a volume the existing volumes would still have the UID of the old attachment pod in the VMI status which caused errors to appear in the virt-handler logs about not being able to find the device or image. This can be observed with SyncFailed events on the VMI

Fixed issue where the cleanup would attempt to remove a volume that was already gone causing errors to appear in the virt-handler log. This can be observed with SyncFailed events on the VMI

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

Ensure new hotplug attachment pod is ready before deleting old attachment pod

Before deleting old attachment pod, wait for new attachment
pod to be ready, so k8s should not detach the volume from the
node since there will always be a pod using the volume from
its perspective.

Fixed issue where when adding or removing a volume the existing
volumes would still have the UID of the old attachment pod in
the VMI status which caused errors to appear in the virt-handler
logs about not being able to find the device or image.

Fixed issue where the cleanup would attempt to remove a volume
that was already gone causing errors to appear in the virt-handler
log.

Signed-off-by: Alexander Wels <awels@redhat.com>
@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Aug 11, 2023
@awels
Copy link
Member Author

awels commented Aug 11, 2023

Looks like I have some flaky tests, investigating

Copy link
Member

@davidvossel davidvossel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 11, 2023
@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidvossel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 11, 2023
@davidvossel
Copy link
Member

/hold

feel free to remove hold when flaky test is sorted out

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 11, 2023
Signed-off-by: Alexander Wels <awels@redhat.com>
@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 11, 2023
Copy link
Contributor

@prnaraya prnaraya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good, I just left a couple small comments/nits.

@@ -1816,15 +1812,29 @@ func (c *VMIController) waitForFirstConsumerTemporaryPods(vmi *virtv1.VirtualMac
}

func (c *VMIController) needsHandleHotplug(hotplugVolumes []*virtv1.Volume, hotplugAttachmentPods []*k8sv1.Pod) bool {
if len(hotplugAttachmentPods) > 1 {
return true
}
// Determine if the ready volumes have changed compared to the current pod
for _, attachmentPod := range hotplugAttachmentPods {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are returning when len(hotplugAttachmentPods) > 1, is this loop still necessary? Since the number of pods in hotplugAttachmentPods at this point would only be either 0 or 1, you might be able to change this to an if/else (to check if length is 0 or 1), rather than a loop that only runs a single iteration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 507 to 514
} else {
} else if err != nil {
return err
}
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this can be kept as is. (If err is nil, it will return nil anyway).

} else {
     return err
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: Alexander Wels <awels@redhat.com>
@awels
Copy link
Member Author

awels commented Aug 12, 2023

/retest-required
/hold cancel

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 12, 2023
@awels
Copy link
Member Author

awels commented Aug 13, 2023

/test pull-kubevirt-e2e-k8s-1.26-sig-compute

@awels
Copy link
Member Author

awels commented Aug 13, 2023

/test pull-kubevirt-e2e-k8s-1.26-sig-storage
/test pull-kubevirt-e2e-k8s-1.27-sig-storage

Co-authored-by: Vasiliy Ulyanov <vulyanov@suse.de>
Signed-off-by: Alexander Wels <awels@redhat.com>
@awels
Copy link
Member Author

awels commented Aug 14, 2023

Agreed, I ran through a bunch of tests and I believe you are right not sure what we were testing there, but already exists cannot happen.

Copy link
Member

@davidvossel davidvossel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2023
@awels
Copy link
Member Author

awels commented Aug 14, 2023

/test pull-kubevirt-e2e-kind-1.27-vgpu

@kubevirt-bot kubevirt-bot merged commit d4fdeef into kubevirt:main Aug 14, 2023
37 checks passed
@awels
Copy link
Member Author

awels commented Aug 14, 2023

/cherrypick release-1.0

@kubevirt-bot
Copy link
Contributor

@awels: new pull request created: #10292

In response to this:

/cherrypick release-1.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@watermelon-brother
Copy link

Similar Design like this pr #6728

@awels
Copy link
Member Author

awels commented Oct 27, 2023

Yes, with one difference, we ensure the new pod is in running state before deleting the old pod. IIRC that PR didn't do that. #10567 is an improvement adding a rate limit to how fast we switch attachment pods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants