Fail to move cluster with associated IPAddresses #9478

mcbenjemaa opened this issue Sep 21, 2023 · 24 comments

@mcbenjemaa
Member

What steps did you take and what happened?

I was trying to move a cluster from the bootstrap cluster to the new management cluster, which failed because of the IPAddress validation webhook.

logs:

Creating IPAddress="mgmt-cluster-control-plane-9p4r4" Namespace="default"
Object already exists, updating IPAddress="mgmt-cluster-control-plane-9p4r4" Namespace="default"
Retrying with backoff Cause="error updating \"ipam.cluster.x-k8s.io/v1alpha1, Kind=IPAddress\" default/mgmt-cluster-control-plane-9p4r4: admission webhook \"validation.ipaddress.ipam.cluster.x-k8s.io\" denied the request: spec: Forbidden: the spec of IPAddress is immutable

What did you expect to happen?

I expected the clusterctl move to succeed.

Cluster API version

v1.5.1

Kubernetes version

v1.26.7

Anything else you would like to add?

Error: [action failed after 10 attempts: error updating "ipam.cluster.x-k8s.io/v1alpha1, Kind=IPAddress" default/mgmt-cluster-control-plane-pwhfv: admission webhook "validation.ipaddress.ipam.cluster.x-k8s.io" denied the request: spec: Forbidden: the spec of IPAddress is immutable, action failed after 10 attempts: error updating "ipam.cluster.x-k8s.io/v1alpha1, Kind=IPAddress" default/mgmt-cluster-control-plane-6xk8h: admission webhook "validation.ipaddress.ipam.cluster.x-k8s.io" denied the request: spec: Forbidden: the spec of IPAddress is immutable, action failed after 10 attempts: error updating "ipam.cluster.x-k8s.io/v1alpha1, Kind=IPAddress" default/mgmt-cluster-control-plane-9p4r4: admission webhook "validation.ipaddress.ipam.cluster.x-k8s.io" denied the request: spec: Forbidden: the spec of IPAddress is immutable]

Label(s) to be applied

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 21, 2023
@killianmuldoon
Contributor

/triage accepted

It looks like an IPAddress object with the same name already exists in your target cluster. Do you know how it got there? Did you try to run move before?
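(A quick way to check, assuming kubectl access to the target cluster, is to look for the address reported in the error; the name and namespace below are taken from the logs above, and the kubeconfig context name is a placeholder:)

kubectl --context target-mgmt get ipaddresses.ipam.cluster.x-k8s.io -n default mgmt-cluster-control-plane-9p4r4 -o yaml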

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 21, 2023
@mcbenjemaa
Member Author

Because clusterctl move actually moves the IPAddressClaim before the IPAddress, and that makes it fail.

I guess the fix would be to ignore IPAddresses, as they will be re-created by the controller via the IPAddressClaim.

@killianmuldoon
Contributor

We should definitely fix this up if there's a specific order expected.

@schrej - do you have any opinions on a good solution here?

@schrej
Member

schrej commented Sep 21, 2023

For the in-cluster provider we'll definitely have to move the IPAddresses. Otherwise new addresses will be assigned in a random order, since the in-cluster provider, as the name suggests, only stores the state within the API. For providers interacting with external IPAM systems this is less of an issue, depending on implementation.

Unfortunately I'm not very familiar with clusterctl move, so I'll probably need to learn a bit before I can give a good answer, and I don't know when I'll have time to do so, sorry.
If the order can be influenced, that might be the easiest solution. If there are annotations present on moved resources, we could add exceptions to the validation.

@mcbenjemaa
Member Author

If we have to move them, I guess IPAddresses must be moved first, and then the claims.

@killianmuldoon
Contributor

Thanks for the input @schrej - sounds like we might need to handle this case in clusterctl move.

One question on:

because the clusterctl move actually moves the IPAddressClaim before IPAddress, and that makes it fail.

How does this cause the move to fail? The objects should be moved independently one after another and reconciliation should be paused. What's causing the spec to be mutated in this case?

@mcbenjemaa
Member Author

How does this cause the move to fail? The objects should be moved independently one after another and reconciliation should be paused. What's causing the spec to be mutated in this case?

Probably because reconciliation for IPAddressClaim is not paused

@killianmuldoon
Contributor

Probably because reconciliation for IPAddressClaim is not paused

Oh that's a bug then - did it get the annotation?
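(One way to verify, assuming the standard cluster.x-k8s.io/paused annotation is the one in question here and that the claim shares the address name, as the in-cluster provider does:)

kubectl get ipaddressclaims.ipam.cluster.x-k8s.io -n default mgmt-cluster-control-plane-9p4r4 \
  -o jsonpath='{.metadata.annotations.cluster\.x-k8s\.io/paused}'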

@mcbenjemaa
Member Author

I have to recreate the cluster and try it again.

@killianmuldoon
Contributor

I have to recreate the cluster and try it again.

Thank you! Just out of interest what infrastructure provider are you using (and what version)? I guess this issue could be in the reconciler not correctly processing a paused annotation.

@mcbenjemaa
Member Author

I'm building a provider for Proxmox, and I'm injecting static IPs with the in-cluster IPAM provider.

@killianmuldoon
Contributor

killianmuldoon commented Sep 22, 2023

It looked to me at first that we might have this issue in CAPV as well, but after a closer look:

We don't have this issue in CAPV. Pausing on the IPAddressClaim isn't explicitly taken care of, but the claims are only reconciled as part of the flow for VSphereVM. As long as those are paused, the reconciliation for IPAddressClaims is paused. This is the same pattern as for other dependent objects in reconcilers.

@fabriziopandini
Member

Some comments looking at the thread:

because the clusterctl move actually moves the IPAddressClaim before IPAddress, and that makes it fail

Move order is driven by owner references; see this recent thread about how it works

Pausing on the IPAddressClaim isn't explicitly taken care of

This is being discussed in #8388

The claims are only reconciled as part of the flow for VSphereVM. As long as those are paused the reconciliation for IPAddressClaims is paused. This is the same pattern for other dependent objects in reconcilers.

+1 to this, if a machine is paused, it should not pick up a new address.

@sbueringer
Member

The claims are only reconciled as part of the flow for VSphereVM. As long as those are paused the reconciliation for IPAddressClaims is paused. This is the same pattern for other dependent objects in reconcilers.

+1 to this, if a machine is paused, it should not pick up a new address.

Not sure if that helps here. As far as I know, the IPAddress resource is created by the IPAM provider, not by CAPV. So it shouldn't help at all that CAPV is not reconciling at this point.

@sbueringer
Member

@lubronzhan For awareness. I wonder if we have the same issue as well. But not sure if we use IPAM in the bootstrap cluster (yet).

@lubronzhan
Contributor

lubronzhan commented Nov 17, 2023

I remember this was taken care of by setting the move order or pausing the resources inside our own product or in CAPI, but I think it should be part of clusterctl.
cc @christianang, could you confirm? Thank you

@mcbenjemaa
Member Author

Yeah, probably the IPAddressClaim has no pausing, and the order should also be reconsidered.

@christianang
Contributor

In the in-cluster IPAM provider, the IPAddressClaims will be paused if the IP pool is paused or if the cluster is paused. There was a PR to make the cluster pause handling more consistent here: kubernetes-sigs/cluster-api-ipam-provider-in-cluster#167 (this change should be in v0.1.0-alpha.3 based on the commit history).

If that isn't working and you are using the latest version of the in-cluster provider, then perhaps there is a bug in the provider. I'll have to try to reproduce the issue myself and see what might be going on.
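(As a stopgap during a move, pausing the pool directly might help; this is a sketch assuming the provider honors the standard cluster.x-k8s.io/paused annotation on InClusterIPPool objects, with my-pool as a placeholder name:)

kubectl annotate inclusterippools.ipam.cluster.x-k8s.io my-pool cluster.x-k8s.io/paused=""
# and remove the annotation again after the move:
kubectl annotate inclusterippools.ipam.cluster.x-k8s.io my-pool cluster.x-k8s.io/paused-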

@adobley

adobley commented Nov 17, 2023

We added a change in the cluster-api-ipam-provider to respect the paused status of the parent cluster, but it needs the cluster annotation on the claim object to find the owning cluster.
kubernetes-sigs/cluster-api-ipam-provider-in-cluster#105

That work was done for CAPV but not any other infrastructure providers, as far as I know.
kubernetes-sigs/cluster-api-provider-vsphere#1857

What infrastructure provider are you using?
Do you have an example of one of your claims? Does it have the cluster annotation?

@mcbenjemaa
Member Author

mcbenjemaa commented Nov 22, 2023

Well, the clusterName annotation doesn't work for me either.

@mcbenjemaa
Member Author

mcbenjemaa commented Nov 30, 2023

Still failing: kubernetes-sigs/cluster-api-ipam-provider-in-cluster#207 (comment)

The IPAddresses are not paused even though the cluster-name annotation is there.

@adobley

adobley commented Dec 13, 2023

Hi @mcbenjemaa!

Sorry for the delay getting back on this. I mentioned the annotation, but it looks like I was wrong: this got changed to labels at some point, and I was thinking of the old behavior. See: kubernetes-sigs/cluster-api-provider-vsphere@ba50ef2

Reading through the claim reconciliation code, it looks like it relies on the cluster-name label: https://github.com/kubernetes-sigs/cluster-api-ipam-provider-in-cluster/blob/b8ccdf8c362cd51ee120e50f8ec2b637f02e3755/internal/controllers/ipaddressclaim.go#L180
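(To check whether a claim carries that label, assuming kubectl access to the cluster and with the claim name as a placeholder:)

kubectl get ipaddressclaims.ipam.cluster.x-k8s.io -n default <claim-name> \
  -o jsonpath='{.metadata.labels.cluster\.x-k8s\.io/cluster-name}'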

@robwittman

I was able to move my cluster (Proxmox w/ IPAM) by doing the following. On the destination cluster:

# Disable the IPAM deployment (e.g. set spec.replicas to 0 in the editor)
kubectl edit deploy/caip-in-cluster-controller-manager -n caip-in-cluster-system
# Back up, then delete the mutating and validating webhooks
kubectl get mutatingwebhookconfiguration caip-in-cluster-mutating-webhook-configuration -o yaml > mutatingwebhook.yaml
kubectl delete mutatingwebhookconfiguration caip-in-cluster-mutating-webhook-configuration
kubectl get validatingwebhookconfiguration caip-in-cluster-validating-webhook-configuration -o yaml > webhook.yaml
kubectl delete validatingwebhookconfiguration caip-in-cluster-validating-webhook-configuration

I then executed the clusterctl move command, which was able to create all the required resources on the destination cluster. The last few delete steps still failed for me on the source cluster, but that cluster is ephemeral anyway.

After the destination cluster was updated, revert the steps above to re-enable the IPAM controller:

kubectl edit deploy/caip-in-cluster-controller-manager -n caip-in-cluster-system 
kubectl apply -f mutatingwebhook.yaml
kubectl apply -f webhook.yaml

Then I resumed cluster reconciliation on the destination; all of the Proxmox / IPAM resources synced, and both the workload cluster in CAPI and the Proxmox cluster itself were green.

YMMV: I'm not sure if I'll see any unintended side effects from disabling some of the webhooks, but I'll be working through an upgrade of the management cluster in the next few weeks.
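(A possibly less invasive variant of the first step, instead of hand-editing the deployment, would be to scale the IPAM controller down for the move and back up afterwards; same deployment name as above:)

kubectl scale deploy/caip-in-cluster-controller-manager -n caip-in-cluster-system --replicas=0
# ... run clusterctl move ...
kubectl scale deploy/caip-in-cluster-controller-manager -n caip-in-cluster-system --replicas=1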

@fabriziopandini
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 11, 2024
@fabriziopandini fabriziopandini added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 8, 2024
@chrischdi chrischdi added the area/ipam Issues or PRs related to ipam label May 8, 2024