Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

virt-launcher, Fix Guest Agent updates causing an event handling deadlock #4253

Merged
merged 4 commits into from Sep 30, 2020

Conversation

oshoval
Copy link
Contributor

@oshoval oshoval commented Sep 24, 2020

When guest-agent is installed, it will send update events,
which will send events to a channel of virt-launcher.
This channel has two purposes only:

  1. Catch the first ADD event.
  2. Catch the delete event.

Since this channel is sampled for checking of delete event
only after the qemu PID is gone, this channel can get full, in case wrong events are sent to it.

In such case the event handler (eventCallback) will hang forever
(because it tries to send to a full channel).
Guest agent wont be restarted and events wont be handled.

Since we just need ADD events and DELETE events
we dont need to send events to this channel upon
guest agent updates.

Moreover there was a double send to the virt-handler
which isnt needed as well.

Add mechanism to avoid sending unnecessary events to the channel.

Found as part of working on
https://bugzilla.redhat.com/show_bug.cgi?id=1874812

Signed-off-by: Or Shoval oshoval@redhat.com

None

@kubevirt-bot kubevirt-bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Sep 24, 2020
@oshoval oshoval changed the title virt-launcher, fix reboot loop causing event handling hang virt-launcher, Fix reboot loop causing event handling hang Sep 24, 2020
@oshoval oshoval changed the title virt-launcher, Fix reboot loop causing event handling hang WIP virt-launcher, Fix reboot loop causing event handling hang Sep 24, 2020
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 24, 2020
…lock

When guest-agent is installed, it will send update events,
which will send events to a channel of virt-launcher.
This channel has two meanings only:
1. Catch the first ADD event.
2. Catch the delete event.

Since this channel is sampled for checking of delete event
only after the qemu PID is gone, this channel can get full.

In such case the event handler (eventCallback) will hang forever.
Guest agent wont be restarted and events wont be handled.

Since we just need ADD events and DELETE events
we dont need to send events to this channel upon
guest agent updates.

Signed-off-by: Or Shoval <oshoval@redhat.com>
There is no need to send the event twice to virt-handler

Signed-off-by: Or Shoval <oshoval@redhat.com>
The log also is not right, it should be logged only
in case the osInfo indeed changed, and it doesnt add
important info beside that we are in this section,
and we have prints before and after it anyway.

Signed-off-by: Or Shoval <oshoval@redhat.com>
@oshoval oshoval changed the title WIP virt-launcher, Fix reboot loop causing event handling hang virt-launcher, Fix Guest Agent updates causing an event handling deadlock Sep 24, 2020
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 24, 2020
Copy link
Member

@davidvossel davidvossel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch.

maybe we should make a wrapper function around the events <- event logic that ensures we only send the event to that channel if it is an ADD event or a MODIFY event with a domain with DeletionTimestamp != nil.

@oshoval
Copy link
Contributor Author

oshoval commented Sep 24, 2020

good catch.

maybe we should make a wrapper function around the events <- event logic that ensures we only send the event to that channel if it is an ADD event or a MODIFY event with a domain with DeletionTimestamp != nil.

This is exactly what i did at first and removed
d22c7ba
i will return it, thanks

Copy link
Contributor

@qinqon qinqon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's check if we need to make this thread safe.

pkg/virt-launcher/notify-client/client.go Outdated Show resolved Hide resolved
Copy link
Contributor

@qinqon qinqon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simplify the closure since the caller knows the kind of event.

pkg/virt-launcher/notify-client/client.go Show resolved Hide resolved
Since the channel is meant only for the first add and delete
events, its better to protected it, because only the first
events are needed, and the delete event will be consumed
only after the qemu PID is gone.
So this channel should not receive unexpected events,
because it will cause a deadlock.

Signed-off-by: Or Shoval <oshoval@redhat.com>
@oshoval
Copy link
Contributor Author

oshoval commented Sep 29, 2020

/test pull-kubevirt-e2e-k8s-cnao-1.17

@openshift-ci-robot
Copy link
Collaborator

@oshoval: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test images

Use /test all to run all jobs.

In response to this:

/test pull-kubevirt-e2e-k8s-cnao-1.17

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@qinqon
Copy link
Contributor

qinqon commented Sep 29, 2020

/lgtm

Although the function updateEVents smell a little since it has mixed responsabilities, without proper naming.

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 29, 2020
@oshoval
Copy link
Contributor Author

oshoval commented Sep 29, 2020

/lgtm

Although the function updateEVents smell a little since it has mixed responsabilities, without proper naming.

imho its in order to keep things simpler

@alonSadan
Copy link
Contributor

looks good. Thanks.
One thing I didn't understand is why a closure is needed and not a regular func, can you please explain?

@oshoval
Copy link
Contributor Author

oshoval commented Sep 29, 2020

looks good. Thanks.
One thing I didn't understand is why a closure is needed and not a regular func, can you please explain?

just in order to have the variables static (so they wont die between runs, because we need them to be persistent), and not use global for this purpose, so a closure gives this behavior

https://stackoverflow.com/questions/30558071/static-local-variable-in-go

@alonSadan
Copy link
Contributor

looks good. Thanks.
One thing I didn't understand is why a closure is needed and not a regular func, can you please explain?

just in order to have the variables static (so they wont die between runs, because we need them to be persistent), and not use global for this purpose, so a closure gives this behavior

https://stackoverflow.com/questions/30558071/static-local-variable-in-go

Thanks for explaining:)
/lgtm

Copy link
Member

@davidvossel davidvossel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidvossel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 29, 2020
@oshoval
Copy link
Contributor Author

oshoval commented Sep 29, 2020

Thanks
/retest

@kubevirt-bot kubevirt-bot merged commit 507ce92 into kubevirt:master Sep 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants