Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

Waiting event of vsock running before use it when use_vsock enabled #1918

Closed
wants to merge 1 commit into from

Conversation

BetaXOi
Copy link

@BetaXOi BetaXOi commented Jul 29, 2019

Fixes: #1917

kata-runtime needs wait qemu report event of vsock is running.
see BetaXOi/qemu@9b536b6

Signed-off-by: Ning Bo n.b@live.com

@grahamwhaley
Copy link
Contributor

Nice find @BetaXOi - pulling in some folks to review...

Gopkg.toml Outdated Show resolved Hide resolved
virtcontainers/qemu.go Outdated Show resolved Hide resolved
Gopkg.lock Outdated Show resolved Hide resolved
@BetaXOi
Copy link
Author

BetaXOi commented Jul 29, 2019

Hello everyone, this PR is a simple methed to fix #1917, it's a demo just.
I think we should wait the 'report vsock event' has been merged in Qemu first, then we talk about method of QMP version check next.

@devimc
Copy link

devimc commented Jul 29, 2019

thanks @BetaXOi I'll mark this as WIP and DNM

@devimc devimc added do-not-merge PR has problems or depends on another wip Work in Progress (PR incomplete - needs more work or rework) labels Jul 29, 2019
@BetaXOi
Copy link
Author

BetaXOi commented Jul 31, 2019

@jodh-intel @markdryan I find a new way to fix it without check Qemu version, but need modify govmm, please help review.

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @BetaXOi - thanks for raising, but the govmm change needs to be raised as a PR for https://github.com/intel/govmm.

vendor/github.com/intel/govmm/qemu/qmp.go Show resolved Hide resolved
vendor/github.com/intel/govmm/qemu/qmp.go Show resolved Hide resolved
@amshinde
Copy link
Member

@stefanha This fell off my radar for a bit. Can you take a look at this PR?

BetaXOi added a commit to BetaXOi/qemu that referenced this pull request Aug 1, 2019
Report vsock running event so that the upper application can
control boot sequence.
see kata-containers/runtime#1918

Signed-off-by: Ning Bo <ning.bo9@zte.com.cn>
BetaXOi pushed a commit to BetaXOi/govmm that referenced this pull request Aug 1, 2019
The upper hyervisor manager application maybe need to wait some
QMP event to control boot sequence, but the event we wanted maybe
not exist in some older version, so we need query all QMP ABI and
check the event is supported or not.

related: kata-containers/runtime#1918

Signed-off-by: Ning Bo <ning.bo9@zte.com.cn>
BetaXOi pushed a commit to BetaXOi/govmm that referenced this pull request Aug 1, 2019
The upper hyervisor manager application maybe need to wait some
QMP event to control boot sequence, but the event we wanted maybe
not exist in some older version, so we need query all QMP ABI and
check the event is supported or not.

related: kata-containers/runtime#1918

Signed-off-by: Ning Bo <ning.bo9@zte.com.cn>
@BetaXOi
Copy link
Author

BetaXOi commented Aug 2, 2019

I have updated the commit with revendor govmm, please help review

@jodh-intel
Copy link
Contributor

Thanks @BetaXOi. Travis is failing and if you look at the log:

Found 1 commit between commit bdbecabafd419bdb6a992502553e45c9cc10f4bf and branch master
ERROR: Commit bdbecabafd419bdb6a992502553e45c9cc10f4bf: Failed to find subsystem in subject: "Waiting event of vsock running before use it when use_vsock enabled"
ERROR: checkcommits failed. See the document below for help on formatting
commits for the project.

https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md#patch-format

https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md#patch-format gives a full explanation and examples but in your case, I'd change your commit message to the following to keep Travis happy:

vsock: Waiting event of vsock running before use it when use_vsock enabled

Wait for qemu to report that vsock is running. See BetaXOi/qemu@9b536b6.

Fixes: #1917.

Signed-off-by: Ning Bo <ning.bo9@zte.com.cn>


q.Logger().Info("waiting event of vsock running")
for {
ev := <-eventCh
Copy link
Member

@amshinde amshinde Aug 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ningbo9 @BetaXOi Is there a possibility of a race here ie you may have missed the vosck "running" event while you are querying for the schema? If so, I would suggest listening on the channel in a separate go-routine that is started as early as possible.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current PR, query-qmp-schema is almost the first QMP command (because qmp_capabilities must be executed before any other QMP commands are executed) and is called immediately after LaunchQemu, and eventCh receives the event until we read it.

I have no idea how to get this process earlier.

…abled

If `kata-runtime` connect vsock device in guest but the device is not start
ready, the 'connect' will block and timeout after 2 sencond. This will cause
the boot is slower 2 second when use vsock than serial.
So we should wait 'VSOCK_RUNNING' QMP event before connect to avoid this case.

Fixes: kata-containers#1917

Signed-off-by: Ning Bo <ning.bo9@zte.com.cn>
@BetaXOi
Copy link
Author

BetaXOi commented Aug 5, 2019

@devimc I have updated the commit as suggested, the PR is ready to be merge, could you remove the 'wip' label and trigger the CI?

@grahamwhaley grahamwhaley removed the do-not-merge PR has problems or depends on another label Aug 5, 2019
@grahamwhaley grahamwhaley removed the wip Work in Progress (PR incomplete - needs more work or rework) label Aug 5, 2019
@grahamwhaley
Copy link
Contributor

/test

EventCh: eventCh,
Logger: newQMPLogger(),
// response of qmp.ExecuteQMPCapabilities bigger than default 64k in bufio.NewScanner()
MaxCapacity: 512 * 1024,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be a constant ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most cases, we don't need to modify this value. We only need to keep the default value. In special cases, we need to modify the value according to the actual situation, but we can't find a suitable value to adapt to any situation.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, can 512 * 1024 be a constant and use it here
for example:

MaxCapacity: QMPCapabilitiesMaxCap,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the benefit of doing this is that it is convenient to do unit testing, right?
So, do I need to add an additional test case?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, defining constant is useful for unit testing, not sure if it's possible to add unit tests for this function, but if you can, go for it please

break Loop
}
case <-time.After(time.Duration(timeout)*time.Second - time.Since(timeStart)):
q.Logger().Info("waiting event of vsock running")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this log is not needed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove it in next commit

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or you can move it before for

@amshinde
Copy link
Member

amshinde commented Aug 6, 2019

Hi @BetaXOi, if I understand correctly, this patch depends on a qemu patch to send the qmp event when vsock is running.
Is the patch merged in the qemu upstream?

@ningbo9 @BetaXOi Is there a possibility of a race here ie you may have missed the vosck "running" event while you are querying for the schema? If so, I would suggest listening on the channel in a separate go-routine that is started as early as possible.

Also, can you address the above comment? This seems to have been marked as resolved.

@BetaXOi
Copy link
Author

BetaXOi commented Aug 7, 2019

Hi @BetaXOi, if I understand correctly, this patch depends on a qemu patch to send the qmp event when vsock is running.

Yes, if qemu does not support reporting the VSOCK_RUNNING event, kata will not wait for the event, the entire process rolls back to the original state, as I described in #1917.

Is the patch merged in the qemu upstream?

No, the patch has not been merged yet
https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg00492.html

@stefanha
Copy link

stefanha commented Aug 9, 2019

@amshinde Thanks for letting me know. I'll discuss this upstream in QEMU.

@amshinde
Copy link
Member

@BetaXOi Since there is discussion on the qemu patch on the right way to fix this, lets wait on qemu patch being merged first.
@BetaXOi Let us know if you plan to look at the vhost_vsock.ko problem that @stefanha has pointed out.
Marking this as "do-not-merge" till then.

@amshinde amshinde added the do-not-merge PR has problems or depends on another label Aug 14, 2019
@egernst
Copy link
Member

egernst commented Sep 12, 2019

Any updates here @amshinde @stefanha ?

@stefanha
Copy link

@BetaXOi Do you want to respond to my qemu-devel mailing list reply so we can make progress on this issue? https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg01672.html

@raravena80
Copy link
Member

@BetaXOi @stefanha any updates on this PR? Thx!

@stefanha
Copy link

@raravena80 I haven't seen a response from @BetaXOi:
https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg01672.html

@raravena80
Copy link
Member

@BetaXOi ping.

Your weekly Kata herder.

@BetaXOi
Copy link
Author

BetaXOi commented Dec 21, 2019

After some discussion through the mail, it may be possible to solve the problem with a reverse connection. Close this PR first, I will submit a new one.

FYI: https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg04233.html

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
do-not-merge PR has problems or depends on another
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sandbox creating is slower when use_vsock enabled
9 participants