ovs-vswitchd deadlock #175
Comments
|
I also had a deadlock, but sadly I don't know how to debug it. Everything went back to normal after a restart.
|
I can confirm this bug; the manifestation is exactly the same. We can reproduce it easily, as it occurs several times a day. It is also in an OpenStack environment after the update to Bionic. We are at 2.12.0-0ubuntu1 (compiled from the latest Ubuntu dsc) and 5.0.0-31-generic. Nothing has helped so far.
Backtrace of stuck thread
I have reported this to the OVS mailing list as well, with the full backtrace attached. Any idea what we can try first? We can definitely reproduce it, since the bug occurs pretty frequently. From what I see, the thread seems to be blocked for no apparent reason. What is also very strange is how we mitigate the problem: we simply take a gcore coredump of ovs-vswitchd, which unblocks the stuck thread, and ovs-vswitchd continues to work normally. Even so, it is extremely annoying, especially if you hit it while creating many VMs. Does anyone have a suggestion on what we can do about this now?
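For anyone who wants to script the gcore workaround described above, here is a minimal sketch. It assumes `gcore` (shipped with gdb) and `pidof` are installed and that the script runs with enough privileges to attach to the daemon. The function names and the output prefix are hypothetical, chosen only for illustration.

```python
import subprocess


def build_gcore_command(pid, prefix):
    """Return the gcore argument list for dumping process `pid`.

    gcore writes the dump to "<prefix>.<pid>".
    """
    return ["gcore", "-o", prefix, str(pid)]


def find_pid(process_name="ovs-vswitchd"):
    """Look up the first PID reported by pidof; raises if none is running."""
    result = subprocess.run(
        ["pidof", process_name], capture_output=True, text=True, check=True
    )
    return int(result.stdout.split()[0])


def dump_core(pid, prefix="/tmp/ovs-vswitchd.core"):
    """Take a core dump of a running process without killing it."""
    subprocess.run(build_gcore_command(pid, prefix), check=True)


# Example use (not executed here):
#     dump_core(find_pid())
```

This only automates the mitigation reported in this thread; it does not address whatever causes the thread to block in the first place.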
|
Also to note: we tested 4.x kernels and even lower versions of OVS, and nothing seems to help. So the problem is most probably an interaction between Ubuntu Bionic and some C libraries that OVS uses. We will need help from somebody familiar with the OVS code to figure out what is going on here. We are ready to patch and test, but so far have no idea where to begin.
|
@jharbott @ionutbiru anything new regarding this from you?
|
This issue might be related, and it seems that it is not fixed yet in Ubuntu 18.04:
|
@igsilya thanks for the lead; from what I see, this is the most likely cause. I will see about testing the patch in our environment.
|
@igsilya @zdenekjanda That looks pretty interesting and would match with what we see here. I'll also trigger some testing at our side, but it will take a couple of days. |
|
So I did some testing with the patch for the glibc bug added to the Ubuntu Bionic version of it (2.27-3ubuntu1), but it did not resolve the ovs lockups for us. Going to try with newer glibc versions now, maybe there were some similar issues that got fixed. |
|
@jharbott I have tested the same and it didn't resolve the lockup. But two days ago we deployed 2.30-0ubuntu2, which is the first version to include the patch natively (a brave step: we added the repos from eoan and installed libc-dev-bin libc6-dbg libc6-dev multiarch-support libc-bin libc6 libidn2-0 locales), and since then there have been no occurrences of the bug (before, we had a steady 10+ occurrences a day). The issue is resolved, so it is indeed not just this patch but also something else. I strongly suggest avoiding openvswitch with glibc 2.27-2.29. We will keep it running a few more days to see whether the bug appears again or whether such a mixed system causes any issues.
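To check quickly whether a host falls into the glibc range mentioned above, something like the following sketch could help. The 2.27 to 2.29 range is taken only from the reports in this thread, not from an official advisory, and distro builds may carry backported fixes, so treat the result as a hint. The helper names are hypothetical.

```python
import platform

# Assumption: range as reported in this thread, not an official advisory.
SUSPECT_MIN = (2, 27)
SUSPECT_MAX = (2, 29)


def parse_version(text):
    """Turn '2.27' or a package version like '2.27-3ubuntu1' into (2, 27)."""
    upstream = text.split("-", 1)[0]
    major, minor = upstream.split(".")[:2]
    return int(major), int(minor)


def is_suspect(text):
    """True if the glibc version lies inside the range reported here."""
    return SUSPECT_MIN <= parse_version(text) <= SUSPECT_MAX


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        state = "inside" if is_suspect(version) else "outside"
        print(f"glibc {version} is {state} the range reported in this thread")
    else:
        print("could not detect a glibc version on this host")
```

Because the comparison only looks at the upstream major and minor numbers, it cannot tell a patched distro build from an unpatched one.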
|
@zdenekjanda that is good news. We started a similar test with 2.29-0ubuntu2 from disco before I read your reply, we'll see on Monday whether that might already be good enough. If not, we will switch to eoan like you did. |
|
So 2.29-0ubuntu2 solves the issue for us, too. With that, I'm closing this issue, as it isn't caused by ovs. I will work with Ubuntu to try to get the fix(es) into Bionic.
|
Do you have pretty busy network traffic? We also seem to be hit by the same issue in production, but cannot reproduce it in our dev environment. Curious to find the trigger.
|
https://sourceware.org/bugzilla/show_bug.cgi?id=23861 I think this bug report is pretty interesting.... |
This is similar to #153, but the backtrace that we are seeing is slightly different: it happens on the "else" branch and thus not with multicast traffic. We haven't found a way to trigger this, but have been seeing irregular occurrences since we upgraded our OpenStack setup from Xenial to Bionic. The version of openvswitch is unchanged by this upgrade, though, since on Xenial it was pulled in from the Queens UCA: 2.9.2-0ubuntu0.18.04.3, kernel 4.15.0-55.60.