[BUG] LiveMigration fails because of same product_uuid on same model hardware servers #4025
Comments
@staedter Per the source code, the kubelet reads this value from sysfs. If you modify it (I am not sure Linux even allows that, or whether it reverts your change), the new value is not synced to the Node object automatically; if you kill the kubelet process on that node, the restarted kubelet will pick it up. But I suspect the change will not survive a node-level reboot. https://utcc.utoronto.ca/~cks/space/blog/linux/DMIDataInSysfs
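For reference, the value in question is just a sysfs read of the firmware's SMBIOS/DMI data. A minimal sketch for checking it on a node (the fallback message is ours, for environments such as containers where no readable DMI data is exposed):

```shell
#!/bin/sh
# Print the machine UUID that libvirt/kubelet effectively compare.
# Note: on many distributions product_uuid is readable by root only.
if [ -r /sys/class/dmi/id/product_uuid ]; then
    cat /sys/class/dmi/id/product_uuid
else
    echo "no readable DMI data at /sys/class/dmi/id/product_uuid"
fi
```

Running this on every node and comparing the output is a quick way to spot the duplicate-UUID condition before attempting a migration.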
Hello and thank you for the quick response. Manually editing the information at /sys/class/dmi/id/product_uuid is unfortunately not possible, because the file is, AFAIK, just an interface to kernel data. In any case, none of our attempts to change, modify, or override this file actually worked in the sense that a migration then went through successfully. Mounting another file in place of the original looked promising and even updated the "More Information" section in the Harvester GUI, but after a while it reverted to the original value, and in the meantime the migrations did not succeed either. The only way to change it from inside the OS would be to patch those kernel files and recompile the whole kernel of the underlying OS, which is of course not a viable option for us.

We are already in contact with our hardware vendor to change the SMBIOS values that the kernel reads from the mainboard by reflashing the whole BIOS, and it looks like a promising avenue for us because our vendor is very helpful and knowledgeable. Nonetheless, that might not be the case for every vendor, so perhaps this kind of edge case should be considered and caught on the Harvester/hypervisor side as well. For example: would it be possible to make the default value read from product_uuid overridable, maybe with a special annotation or similar, so that such a case would not require reflashing the BIOS on each bare-metal server?
It seems to be the first time Harvester has encountered this: several machines with the same product_uuid. I tried reading it on my local PC and on a VM running on it, and each value is unique.

The relevant source code, and an earlier discussion of this:
@guangbochen @bk201 We could also add this to the Harvester requirements.
Pre Ready-For-Testing Checklist
Automation e2e test issue: harvester/tests#844
We could fix the issue on our servers by changing the SMBIOS information on our ASUS mainboards via the amidmi.exe tool. We had to create a FreeDOS boot stick and then use this command.
After a reboot, /sys/class/dmi/id/product_uuid finally showed different values on each node and KubeVirt started working correctly. Thank you for the help. The issue has been resolved.
Describe the bug
We are in the process of setting up an on-prem Harvester bare-metal cluster, and when we try to use the LiveMigration feature, we get the following error message in the events of the virtualmachine resource:
VirtualMachineInstance migration uid c9091bc7-60ce-49b4-84dd-83adb17fbd9d failed. reason:Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=10, Message='internal error: Attempt to migrate guest to the same host 03000200-0400-0500-0006-000700080009')
This is consistent with the fact that all our hosts show the same UUID in the "More Information" section of the host resource.
We have verified that on all nodes the content of the SMBIOS interface file /sys/class/dmi/id/product_uuid is exactly the same value, 03000200-0400-0500-0006-000700080009, while the contents of /sys/class/dmi/id/product_serial are consecutive numbers such as 9000160160 and 9000160161.
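To confirm this condition across a whole cluster, the per-node values can be collected (e.g. over SSH; the node names in the comment below are placeholders) and checked for duplicates. A minimal sketch, using the UUID from this report as sample data:

```shell
#!/bin/sh
# In a real cluster, gather the values first, e.g.:
#   for n in node1 node2 node3; do ssh "$n" cat /sys/class/dmi/id/product_uuid; done > uuids.txt
# Sample data below reproduces the situation reported in this issue.
cat > uuids.txt <<'EOF'
03000200-0400-0500-0006-000700080009
03000200-0400-0500-0006-000700080009
03000200-0400-0500-0006-000700080009
EOF

# Any UUID printed here is shared by more than one node, which is what
# triggers "Attempt to migrate guest to the same host" during migration.
sort uuids.txt | uniq -d
rm -f uuids.txt
```

With the sample data above, `uniq -d` prints the shared UUID once; with healthy, distinct per-node UUIDs it prints nothing.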
We would like to know how we can fix this so that, as far as Rancher is concerned, all nodes are recognized as different nodes and we can use the LiveMigration feature.
Best regards
Chris
To Reproduce
Steps to reproduce the behavior:
Expected behavior
When I use the LiveMigration feature the chosen VM is migrated successfully to another node with a different UUID
Support bundle
I sent the support bundle to harvester-support-bundle@suse.com with the correct issue ID.
Environment
Additional context
As the screenshot shows, we are also experiencing the AMD KVM issue described in #3900, but we are waiting for the release of the announced patch.