After updating to RHEL 8.5 + the latest virt:av from 8.5, libvirt IPI no longer works #5401
The complete rpm transaction was:
In the system's log, the following messages appear (for workers):
This happens 100% of the time when the libvirt packages are updated from 7.0.0 to 7.6.0 (see versions above) and qemu-kvm is updated from 5.2.0 to 6.0.0.
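Libvirt reports its own version as a single integer, so telling the 7.0.0 and 7.6.0 daemons apart comes down to comparing two numbers. A minimal sketch of that encoding (major*1,000,000 + minor*1,000 + micro, as returned by `virConnectGetLibVersion`); the helper names are hypothetical and only illustrate the arithmetic.

```go
package main

import "fmt"

// encodeLibvirtVersion packs a version the way libvirt's
// virConnectGetLibVersion reports it: major*1,000,000 + minor*1,000 + micro.
// (Hypothetical helper, for illustration only.)
func encodeLibvirtVersion(major, minor, micro uint64) uint64 {
	return major*1_000_000 + minor*1_000 + micro
}

// decodeLibvirtVersion reverses encodeLibvirtVersion.
func decodeLibvirtVersion(v uint64) (major, minor, micro uint64) {
	return v / 1_000_000, (v / 1_000) % 1_000, v % 1_000
}

func main() {
	// The two libvirt versions reported in this issue.
	before := encodeLibvirtVersion(7, 0, 0)
	after := encodeLibvirtVersion(7, 6, 0)
	fmt.Println(before, after) // 7000000 7006000

	major, minor, micro := decodeLibvirtVersion(after)
	fmt.Printf("%d.%d.%d\n", major, minor, micro) // 7.6.0
}
```

A version number alone may not tell you which behaviour a daemon has, though: later in this thread the stock RHEL 8.5 libvirt 6.0.0 build is reported to change behaviour between releases -37 and -37.1 without any change to the upstream version.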
@luisarizmendi for awareness
The bootstrap and masters are launched by Terraform with the freshly rebuilt installer, and this still works. Only the workers (launched by the 3-master cluster talking to libvirt) no longer launch successfully.
On a fresh cluster which failed to launch the workers, I see this:
And:
And then:
In the system's journal, I see these messages (I didn't get them before, when it worked):
Are there errors reported in the status of the worker Machines?
@staebler Yes, there are errors; let me get them to you. This is really strange, as I can set LIBVIRT_DEFAULT_URI to the same value as the one in my install-config and I can 'virsh start/stop/shutdown/whatever':
It first starts like this:
At that time, I am getting the following YAML:
I'm waiting for the machine config to fail and will provide another YAML.
There's also one machineset (workers only):
I'm also seeing these messages in the system's log: `Nov 25 16:52:36 daltigoth libvirtd[1099001]: Operation not supported: can't update 'bridge' section of network 'ocp4d-c5tvf'`
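When checking many hypervisors for this symptom, it can help to pull the failing section and network names out of the libvirtd messages quoted above. A minimal Go sketch, assuming the log text (for example the output of `journalctl -u libvirtd`) is already in a string; the type and function names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// sectionErr matches the libvirtd message quoted above and captures
// the section name and the network name.
var sectionErr = regexp.MustCompile(`can't update '([^']+)' section of network '([^']+)'`)

// networkUpdateFailure is a hypothetical record of one matching log line.
type networkUpdateFailure struct {
	Section string
	Network string
}

// findNetworkUpdateFailures scans the log text line by line and returns
// one entry per matching message.
func findNetworkUpdateFailures(journal string) []networkUpdateFailure {
	var out []networkUpdateFailure
	sc := bufio.NewScanner(strings.NewReader(journal))
	for sc.Scan() {
		if m := sectionErr.FindStringSubmatch(sc.Text()); m != nil {
			out = append(out, networkUpdateFailure{Section: m[1], Network: m[2]})
		}
	}
	return out
}

func main() {
	journal := "Nov 25 16:52:36 daltigoth libvirtd[1099001]: Operation not supported: can't update 'bridge' section of network 'ocp4d-c5tvf'"
	fmt.Println(findNetworkUpdateFailures(journal)) // [{bridge ocp4d-c5tvf}]
}
```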
This seems somewhat similar to:
The network created by libvirt IPI looks like this:
As this is an issue with creating workers, it does not appear to be an installer issue. I recommend opening an issue in https://github.com/openshift/cluster-api-provider-libvirt.
@staebler This is interesting. Where does the OpenShift installer get that provider from? All I'm doing is downloading the openshift-installer source code. Is it among the dependencies that Go downloads when building the installer?
The installer just sets up some infrastructure and gives some configuration to the bootstrap VM, which ultimately builds the cluster. The various images that make up the cluster come from the release payload to which the installer belongs. For example, you can find details for the latest OCP release image (4.9.9) at https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.9.9/release.txt.
Just ran into this as well; I'll look into it Monday.
Hi @jaypoulz, I was working with @cfergeau on this and we already have a BZ for it. In the interim, I've reverted from virt:av to virt:rhel.
@ElCoyote27 I encountered it in stock RHEL 8.5. I had to downgrade libvirt-6.0.0-37.1.module+el8.5.0+13858+39fdc467.aarch64 to 6.0.0-37.module+el8.5.0+12162+40884dd2.aarch64, so watch out for the latest 8.5 updates. CC @cfergeau
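Both libvirt builds in the comment above carry the same upstream version (6.0.0) and differ only in the RPM release field, which is why a plain version check cannot tell them apart. A minimal Go sketch that splits such name-version-release.arch strings; it assumes there is no epoch prefix and relies on the RPM rule that version and release contain no '-'. The `libvirt-` prefix on the second string is added here because the comment gives that build without a package name; the type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// rpmPkg holds the parts of an RPM name of the form name-version-release.arch.
type rpmPkg struct {
	Name, Version, Release, Arch string
}

// parseNVRA splits on the last '.' (arch) and then on the last two '-'
// (release, then version). It assumes there is no epoch ("N:") prefix.
func parseNVRA(s string) (rpmPkg, error) {
	dot := strings.LastIndex(s, ".")
	if dot < 0 {
		return rpmPkg{}, fmt.Errorf("no arch in %q", s)
	}
	nvr, arch := s[:dot], s[dot+1:]
	relDash := strings.LastIndex(nvr, "-")
	if relDash < 0 {
		return rpmPkg{}, fmt.Errorf("no release in %q", s)
	}
	verDash := strings.LastIndex(nvr[:relDash], "-")
	if verDash < 0 {
		return rpmPkg{}, fmt.Errorf("no version in %q", s)
	}
	return rpmPkg{
		Name:    nvr[:verDash],
		Version: nvr[verDash+1 : relDash],
		Release: nvr[relDash+1:],
		Arch:    arch,
	}, nil
}

func main() {
	for _, s := range []string{
		"libvirt-6.0.0-37.1.module+el8.5.0+13858+39fdc467.aarch64",
		"libvirt-6.0.0-37.module+el8.5.0+12162+40884dd2.aarch64",
	} {
		p, err := parseNVRA(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("version=%s release=%s arch=%s\n", p.Version, p.Release, p.Arch)
	}
}
```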
@jaypoulz OMG, this sucks if we backported the problematic patch down to virt:rhel too. My 8.5 systems got the update on Feb 2nd; I'm trying to confirm whether this broke OCP for me too...
Our systems got upgraded today, so the search for the breaking change was short. 😸 The OpenShift installer was built from the latest 4.10 preview. We can use that to look up the libvirt-terraform version if need be. I'll try to get a reproducer Monday.
My RHEL 8.5 system was running libvirt 6.0.0-37.module+el8.5.0+12162+40884dd2 |
I've updated https://bugzilla.redhat.com/show_bug.cgi?id=2038812 to provide the information you reported (and confirm your findings).
I can confirm that the original issue I got on virt:av is back:
In the libvirtd log:
Is it possible that the installer uses terraform-provider-libvirt, which in turn has this commit: dmacvicar/terraform-provider-libvirt@0d74474? If so, it's actually terraform-provider that swaps the arguments. BTW, that commit is horribly wrong; let me comment on it.
@zippy2 Hi Michael, the Terraform part of the install process (bootstrap + masters) works fine; it is the phase where OCP drives libvirt that broke recently in virt:rhel. At that point OCP is using the machine-config-api with an unencrypted private libvirt URI.
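The LIBVIRT_DEFAULT_URI value mentioned earlier and the unencrypted URI here both follow libvirt's `driver[+transport]://[user@][host][:port]/path` form; a `tcp` transport is libvirt's unencrypted TCP socket. A minimal Go sketch that splits such a URI with the standard library only; the example address is hypothetical, and the struct and function names are made up for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// libvirtURI holds the pieces of a libvirt connection URI of the form
// driver[+transport]://[user@][host][:port]/path.
type libvirtURI struct {
	Driver    string // e.g. "qemu"
	Transport string // e.g. "tcp"; empty means the URI's default transport
	Host      string // host[:port], empty for a local connection
	Path      string // e.g. "/system"
}

// parseLibvirtURI splits the scheme on '+' into driver and transport.
func parseLibvirtURI(raw string) (libvirtURI, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return libvirtURI{}, err
	}
	driver, transport, _ := strings.Cut(u.Scheme, "+")
	return libvirtURI{Driver: driver, Transport: transport, Host: u.Host, Path: u.Path}, nil
}

func main() {
	raw := os.Getenv("LIBVIRT_DEFAULT_URI")
	if raw == "" {
		raw = "qemu+tcp://192.168.122.1/system" // hypothetical example value
	}
	p, err := parseLibvirtURI(raw)
	if err != nil {
		fmt.Println("invalid URI:", err)
		return
	}
	// A "tcp" transport means the connection is not encrypted.
	fmt.Printf("%+v unencrypted=%v\n", p, p.Transport == "tcp")
}
```

`strings.Cut` needs Go 1.18 or later.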
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now, please do so with /close. /lifecycle stale
/remove-lifecycle stale. |
@zippy2 Looks like it; grepping for dmacvicar in the source of 4.10.0, I see all this:
In that case I'm not sure I can help, sorry. Fixed packages were shipped ~3 months ago. I guess your best bet is to ask the installer developers to fix their code. There's a suggested solution: dmacvicar/terraform-provider-libvirt@0d74474#commitcomment-68720367. Since terraform-provider talks RPC directly, they have to take that extra step and check whether they are talking to a daemon that is fixed or not. That is another advantage of using the client library rather than talking RPC directly.
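The extra step described above amounts to choosing the argument order per daemon. A minimal Go sketch of that decision, assuming (as the discussion implies) that an unfixed daemon reads the command and section slots of the network-update call swapped while a fixed one reads them in documented order. `updateArgs` is a hypothetical helper, not terraform-provider-libvirt's or go-libvirt's API, and how `daemonFixed` gets determined (for example by probing the daemon) is left out.

```go
package main

import "fmt"

// updateArgs returns the values to put in the first and second argument
// slots of a network-update RPC. daemonFixed reports whether the daemon
// reads the slots in documented (command, section) order; an unfixed
// daemon is assumed to read them the other way round.
func updateArgs(command, section uint32, daemonFixed bool) (first, second uint32) {
	if daemonFixed {
		return command, section
	}
	// Compensate for a daemon that interprets the two slots swapped.
	return section, command
}

func main() {
	const command, section = 3, 4 // arbitrary example values
	for _, fixed := range []bool{true, false} {
		a, b := updateArgs(command, section, fixed)
		fmt.Printf("daemonFixed=%v -> wire args (%d, %d)\n", fixed, a, b)
	}
}
```

Hard-coding either branch is what breaks: always swapping (as the vendored commit does) fails against a fixed daemon, and never swapping fails against an old one.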
Nope, the installer is not using this commit
The problematic commit is dmacvicar/terraform-provider-libvirt@0d74474, which is vendored in installer/vendor/github.com/dmacvicar/terraform-provider-libvirt/libvirt/network_def.go, lines 83 to 92 in 0adfce2.
I've filed dmacvicar/terraform-provider-libvirt#950 which should fix the issue introduced in dmacvicar/terraform-provider-libvirt@0d74474 |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now, please do so with /close. /lifecycle rotten
The digitalocean patch was merged at about the time the stale bot was triggered on this issue :) I've updated the terraform-provider-libvirt PR dmacvicar/terraform-provider-libvirt#950 to make use of it.
/remove-lifecycle rotten |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now, please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now, please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close
@openshift-bot: Closing this issue.
Hi,
I have been using the Ansible-based ocp_libvirt_ipi role for some time on RHEL (7.8, then 7.9, 8.2 and 8.3).
The role builds on code from the OpenShift installer source.
Ever since patching my RHEL 8.5 hypervisors to the latest libvirt* packages from the 'virt:av' stream, ocp_libvirt_ipi has been unable to deploy successfully: Terraform works, but the freshly installed set of masters is unable to spawn workers.
A broken cluster looks like this:
A working cluster looks like this (for me):
I have reproduced this with the code from OCP 4.6, 4.7 and 4.8, and the results are the same.
The issue started occurring when the libvirt packages on my RHEL 8.5 hypervisors were updated from:
to: