Fail to move cluster with associated IPAddresses #9478
Comments
/triage accepted It looks like an IPAddress object with the same name already exists in your target cluster. Do you know how it got there? Did you try to run move before?
I guess the fix would be to ignore …
We should definitely fix this up if there's a specific order expected. @schrej - do you have any opinions on a good solution here?
For the in-cluster provider we'll definitely have to move the IPAddresses. Otherwise new addresses will be assigned in a random order, since the in-cluster provider, as the name suggests, only stores the state within the API. For providers interacting with external IPAM systems this is less of an issue, depending on implementation. Unfortunately I'm not very familiar with …
If we have to move them, I guess the IPAddresses must be moved first, and then the claims.
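For reference, here is a sketch of how a claim and its address relate under the in-cluster provider. Field names follow my reading of the CAPI IPAM contract; all object names, pools, and addresses below are hypothetical:

```yaml
# Hypothetical claim/address pair; names, pool, and addresses are invented.
apiVersion: ipam.cluster.x-k8s.io/v1alpha1
kind: IPAddressClaim
metadata:
  name: my-machine-net0
spec:
  poolRef:
    apiGroup: ipam.cluster.x-k8s.io
    kind: InClusterIPPool
    name: my-pool
---
# The fulfilling IPAddress conventionally shares the claim's name and
# points back at it via spec.claimRef.
apiVersion: ipam.cluster.x-k8s.io/v1alpha1
kind: IPAddress
metadata:
  name: my-machine-net0
spec:
  claimRef:
    name: my-machine-net0
  poolRef:
    apiGroup: ipam.cluster.x-k8s.io
    kind: InClusterIPPool
    name: my-pool
  address: 10.0.0.10
  prefix: 24
  gateway: 10.0.0.1
```

If the IPAddress isn't moved along with its claim, a fresh reconcile on the target cluster can hand out a different address for the same claim, which is the random-reassignment problem described above.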
Thanks for the input @schrej - sounds like we might need to handle this case in … One question on:
How does this cause the move to fail? The objects should be moved independently one after another and reconciliation should be paused. What's causing the spec to be mutated in this case?
Probably because reconciliation for IPAddressClaim is not paused.
Oh, that's a bug then - did it get the annotation?
I have to recreate the cluster and try it again.
Thank you! Just out of interest, what infrastructure provider are you using (and what version)? I guess this issue could be in the reconciler not correctly processing a paused annotation.
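For context, clusterctl signals "paused" during a move by annotating the objects it copies; a reconciler that honors the annotation should no-op on them. A minimal sketch (object name hypothetical):

```yaml
apiVersion: ipam.cluster.x-k8s.io/v1alpha1
kind: IPAddressClaim
metadata:
  name: my-machine-net0   # hypothetical
  annotations:
    # Set by clusterctl during move; reconcilers should skip the object
    # while this annotation is present.
    cluster.x-k8s.io/paused: ""
```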
I'm building a provider for Proxmox, and I'm injecting static IPs with IPAM in-cluster.
We don't have this issue in CAPV. Pausing on the IPAddressClaim isn't explicitly taken care of, but the claims are only reconciled as part of the flow for VSphereVM. As long as those are paused, the reconciliation for IPAddressClaims is paused. This is the same pattern as for other dependent objects in reconcilers.
Some comments after looking at the threads:
- Move order is driven by owner references; see this recent thread about how it works.
- This is being discussed in #8388.
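A sketch of the owner-reference chain clusterctl would walk, assuming (as the IPAM contract suggests) that the provider sets the claim as an owner of the address; names and the uid are placeholders:

```yaml
apiVersion: ipam.cluster.x-k8s.io/v1alpha1
kind: IPAddress
metadata:
  name: my-machine-net0   # hypothetical
  ownerReferences:
    - apiVersion: ipam.cluster.x-k8s.io/v1alpha1
      kind: IPAddressClaim
      name: my-machine-net0
      uid: 00000000-0000-0000-0000-000000000000   # placeholder
```

If this reference is in place, moving the claim should pull the address along; if it's missing, clusterctl's discovery won't see the address at all.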
+1 to this. If a machine is paused, it should not pick up a new address.
Not sure if that helps here. As far as I know, the IPAddress resource is created by the IPAM provider, not by CAPV. So it shouldn't help at all that CAPV is not reconciling at this point.
@lubronzhan For awareness. I wonder if we have the same issue as well. But not sure if we use IPAM in the bootstrap cluster (yet).
I remember this was taken care of by setting the move order or pausing the resources inside our own product or in CAPI, but I think it should be part of clusterctl.
Yeah, probably the …
In the in-cluster IPAM provider, the IPAddressClaims will be paused if the IPPool is paused or if the cluster is paused. There was a PR to make the cluster pause reading more consistent here: kubernetes-sigs/cluster-api-ipam-provider-in-cluster#167 (this change should be in v0.1.0-alpha.3 based on commit history). If that isn't working and you are using the latest version of the in-cluster provider, then perhaps there is a bug in the provider; I'll have to try to reproduce the issue myself and see what might be going on.
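The "cluster is paused" case mentioned above reads the standard pause field on the owning Cluster, e.g. (cluster name hypothetical):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster   # hypothetical
spec:
  paused: true   # providers honoring the contract stop reconciling dependent objects
```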
We added a change in the cluster-api-ipam-provider to respect the paused status of the parent cluster, but it needs the cluster annotation on the claim object to find the owning cluster. That work was done for CAPV but not any other infrastructure providers, as far as I know. What infrastructure provider are you using?
Well, the clusterName annotation doesn't work for me either.
Still failing: kubernetes-sigs/cluster-api-ipam-provider-in-cluster#207 (comment). The …
Hi @mcbenjemaa! Sorry for the delay getting back on this. I mentioned the annotation, but it looks like I was wrong. This got changed to labels at some point; I was thinking of the old behavior. See: kubernetes-sigs/cluster-api-provider-vsphere@ba50ef2. Reading through the claim reconciliation code, it looks like it is relying on the label for the cluster name: https://github.com/kubernetes-sigs/cluster-api-ipam-provider-in-cluster/blob/b8ccdf8c362cd51ee120e50f8ec2b637f02e3755/internal/controllers/ipaddressclaim.go#L180
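So, per the linked code, the in-cluster provider finds the owning Cluster through the standard cluster-name label rather than an annotation; a claim would need something like this (names hypothetical):

```yaml
apiVersion: ipam.cluster.x-k8s.io/v1alpha1
kind: IPAddressClaim
metadata:
  name: my-machine-net0   # hypothetical
  labels:
    # Used by the provider to look up the owning Cluster's paused state.
    cluster.x-k8s.io/cluster-name: my-cluster
```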
I was able to move my cluster (Proxmox w/ IPAM) by doing the following. On the destination cluster:

```shell
# Disable the IPAM deployment
kubectl edit deploy/caip-in-cluster-controller-manager -n caip-in-cluster-system

# Back up and delete the mutating and validating webhooks
# (note: -o yaml is needed so the backup can be re-applied later)
kubectl get mutatingwebhookconfiguration caip-in-cluster-mutating-webhook-configuration -o yaml > mutatingwebhook.yaml
kubectl delete mutatingwebhookconfiguration caip-in-cluster-mutating-webhook-configuration
kubectl get validatingwebhookconfiguration caip-in-cluster-validating-webhook-configuration -o yaml > webhook.yaml
kubectl delete validatingwebhookconfiguration caip-in-cluster-validating-webhook-configuration
```

I then executed the move. After the destination cluster was updated, revert the few steps above to re-enable the IPAM controller:

```shell
kubectl edit deploy/caip-in-cluster-controller-manager -n caip-in-cluster-system
kubectl apply -f mutatingwebhook.yaml
kubectl apply -f webhook.yaml
```

Then I resumed cluster reconciliation on the destination; all of the Proxmox / IPAM resources synced, and both the workload cluster in CAPI and the Proxmox cluster itself were green. YMMV: I'm not sure if I'll see any unintended side effects from disabling some of the webhooks, but I'll be working through an upgrade of the management cluster in the next few weeks.
/priority important-longterm
What steps did you take and what happened?
I was trying to move a cluster from a bootstrap cluster to the new mgmt-cluster, which failed because of the webhook.
logs:
What did you expect to happen?
I expected the clusterctl move to succeed.
Cluster API version
v1.5.1
Kubernetes version
v1.26.7
Anything else you would like to add?
Label(s) to be applied
/kind bug