snabbswitch traffic crash #574
@virtualopensystems-nnikolaev Understood. This sounds related to issues that @mwiget has seen recently too. You will need to accept a GitHub invitation that I sent before I can assign the issue to you. It's an interesting development to see corporate-sponsored usernames on GitHub. I wonder if I could rename to coke-redbull-lukego and rake in some sponsorship bucks ;-)
@virtualopensystems-nnikolaev btw: you guys could consider creating a branch called e.g. I definitely see it as a positive for contributors to get proper credit (both as individuals and as companies), so that positive contributions to the upstream repo will be a valuable marketing tool.
@virtualopensystems-nnikolaev This is a very good issue you have written :). I am also interested in fixing this. Can I help somehow?
@lukego, first we need to understand what it means for the kernel to send those Then we should be able to detect that a re-initialization is happening and shut down our virtqs properly. I also find it very interesting that
Here is a relevant qemu-devel thread suggesting that the intended vhost-user behavior in QEMU is not fully locked down with respect to guest reboot vs. guest driver unload/reload. It could be related to the issue we are seeing. @eugeneia is working on adding a CI test case where the guest reloads the
I built and got the following:
@eugeneia OK. Looks like an unrelated bug. I think we should cover this in the CI tests. I think we should also Dockerize the bootstrapping of kernels and images: I don't think we should ever be doing this stuff by hand anymore. Are you interested in handling that? There is a Docker feature called "volumes" that seems relevant: one container could build the artifacts (kernel and guest image) and then export them as a "volume" that another container (SnabbBot) could access. This should make it possible to mix and match the same SnabbBot with many different kernel/image combos. (It could also be that this is the wrong solution.)
@lukego Can we move the Docker discussion into a separate issue?
Tracing down the issue, we found out that under high load the This behavior is not seen provided that the following It seems that the critical part is to enable X2APIC support in the guest, which forwards the interrupts to KVM instead of using the QEMU-emulated interrupt controller.
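Since the thread points at X2APIC support in the guest kernel, one cheap safeguard is to check the guest's `.config` before booting it. The following is a minimal Python sketch, not part of snabbswitch; it assumes `CONFIG_X86_X2APIC` is the relevant Kconfig symbol (the exact option list from the comment above is not shown) and lets a later line override an earlier one, because the issue body builds `.config` by concatenating several fragments with `cat`.

```python
import re

# Kernel .config lines look like "CONFIG_FOO=value" or "# CONFIG_FOO is not set".
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map each CONFIG_* symbol to its value; a later line overrides an earlier one."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_RE.match(line)
        if match:
            values[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            # Record explicitly disabled options as "n".
            values[match.group(1)] = "n"
    return values


def has_x2apic(text):
    """True only if CONFIG_X86_X2APIC (assumed relevant symbol) ends up set to 'y'."""
    return parse_kconfig(text).get("CONFIG_X86_X2APIC") == "y"
```

For example, `has_x2apic(open(".config").read())` on the guest kernel tree. This only checks the build configuration; it does not tell you whether the running guest actually ended up in x2APIC mode.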
@nnikolaev-virtualopensystems Does this mean this issue is “fixed”? Can we do anything further than ensuring the kernel was compiled with the right parameters?
This issue is related to #560, but this one describes a method to reproduce it with a Linux guest.
While traffic is being sent to a VM connected to snabbswitch, it crashes with:
To reproduce, check out the Linux Ubuntu Trusty kernel:
Enable the following in the `.config` file:
Then compile with `make bzImage -j8`.
The QEMU command to run the test is:
The snabbswitch ports file is:
The traffic is a TCP stream sent from an external machine like this:
Following are the observations made so far:
When `-smp 1` is used, the issue does not appear.
`cat debian.master/config/config.common.ubuntu debian.master/config/amd64/config.common.amd64 debian.master/config/amd64/config.flavour.generic > .config`
`vhost_user: request set_vring_call 13`: this is probably a sign of the guest kernel trying to disable interrupts. After sending this request several times, the guest finally sends:
Then the crash occurs.
`get_features` and `set_vring_num` are part of virtq initialization, which probably means that the device is being reset.
@lukego we are investigating the issue further and will post more information and, eventually, patches. Please assign this issue to me.
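The reset hypothesis suggests the backend could watch for virtq initialization requests that arrive while the rings are already running, and tear the rings down before the guest reconfigures them. Below is a minimal Python sketch of that idea; it is not snabbswitch's actual `vhost_user` code (which is Lua). The request numbers follow the vhost-user protocol specification in QEMU, where `SET_VRING_CALL` is 13; the `VhostDevice` class, `stop_vrings()`, and the single `running` flag per device are hypothetical simplifications.

```python
from enum import IntEnum


class Req(IntEnum):
    """Subset of vhost-user request codes (vhost-user protocol specification)."""
    GET_FEATURES = 1
    SET_FEATURES = 2
    SET_OWNER = 3
    RESET_OWNER = 4
    SET_MEM_TABLE = 5
    SET_VRING_NUM = 8
    SET_VRING_ADDR = 9
    SET_VRING_BASE = 10
    GET_VRING_BASE = 11
    SET_VRING_KICK = 12
    SET_VRING_CALL = 13


# Requests that, per the observation in this issue, belong to virtq initialization.
INIT_REQUESTS = {Req.GET_FEATURES, Req.SET_VRING_NUM}


class VhostDevice:
    """Hypothetical backend state: one 'running' flag instead of per-ring state."""

    def __init__(self):
        self.running = False
        self.unexpected_resets = 0

    def stop_vrings(self):
        # Placeholder for real teardown (stop polling the rings, drop guest mappings).
        self.running = False

    def handle(self, req):
        if req in INIT_REQUESTS and self.running:
            # Initialization while the rings are live: assume the guest is
            # re-initializing the device and tear down the old rings first.
            self.unexpected_resets += 1
            self.stop_vrings()
        elif req == Req.GET_VRING_BASE:
            # Orderly stop: the spec has the backend stop the ring on this request.
            self.stop_vrings()
        elif req == Req.SET_VRING_KICK:
            # Simplification: treat the kick fd as the point where the ring starts.
            self.running = True
        # Other requests (e.g. SET_VRING_CALL) do not change the running state here.
```

Fed the sequence `get_features`, `set_vring_num`, `set_vring_kick`, `set_vring_call`, `get_features`, the sketch counts one unexpected reset and leaves the rings stopped, whereas an orderly `get_vring_base` before re-initialization counts none. In a real backend, that branch is where the rings would be quiesced and the event logged.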