Fail to move cluster with associated IPAddresses #9478

mcbenjemaa opened this issue Sep 21, 2023 · 24 comments

@mcbenjemaa
Member

What steps did you take and what happened?

I was trying to move a cluster from the bootstrap cluster to the new management cluster, which failed because of the IPAddress validation webhook.

logs:

Creating IPAddress="mgmt-cluster-control-plane-9p4r4" Namespace="default"
Object already exists, updating IPAddress="mgmt-cluster-control-plane-9p4r4" Namespace="default"
Retrying with backoff Cause="error updating \"ipam.cluster.x-k8s.io/v1alpha1, Kind=IPAddress\" default/mgmt-cluster-control-plane-9p4r4: admission webhook \"validation.ipaddress.ipam.cluster.x-k8s.io\" denied the request: spec: Forbidden: the spec of IPAddress is immutable

What did you expect to happen?

I expected the clusterctl move to succeed.

Cluster API version

v1.5.1

Kubernetes version

v1.26.7

Anything else you would like to add?

Error: [action failed after 10 attempts: error updating "ipam.cluster.x-k8s.io/v1alpha1, Kind=IPAddress" default/mgmt-cluster-control-plane-pwhfv: admission webhook "validation.ipaddress.ipam.cluster.x-k8s.io" denied the request: spec: Forbidden: the spec of IPAddress is immutable, action failed after 10 attempts: error updating "ipam.cluster.x-k8s.io/v1alpha1, Kind=IPAddress" default/mgmt-cluster-control-plane-6xk8h: admission webhook "validation.ipaddress.ipam.cluster.x-k8s.io" denied the request: spec: Forbidden: the spec of IPAddress is immutable, action failed after 10 attempts: error updating "ipam.cluster.x-k8s.io/v1alpha1, Kind=IPAddress" default/mgmt-cluster-control-plane-9p4r4: admission webhook "validation.ipaddress.ipam.cluster.x-k8s.io" denied the request: spec: Forbidden: the spec of IPAddress is immutable]

Label(s) to be applied

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 21, 2023
@killianmuldoon
Contributor

/triage accepted

It looks like an IPAddress object with the same name already exists in your target cluster. Do you know how it got there? Did you try to run move before?
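(A quick way to check, assuming kubectl access to the target cluster, is to look for the address reported in the error; the name and namespace below are taken from the logs above, and the kubeconfig context name is a placeholder:)

kubectl --context target-mgmt get ipaddresses.ipam.cluster.x-k8s.io -n default mgmt-cluster-control-plane-9p4r4 -o yaml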

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 21, 2023
@mcbenjemaa
Member Author

Because clusterctl move actually moves the IPAddressClaim before the IPAddress, and that makes it fail.

I guess the fix would be to ignore IPAddresses, as they will be re-created by the controller via the IPAddressClaim.

@killianmuldoon
Contributor

We should definitely fix this up if there's a specific order expected.

@schrej - do you have any opinions on a good solution here?

@schrej
Member

schrej commented Sep 21, 2023

For the in-cluster provider we'll definitely have to move the IPAddresses. Otherwise new addresses will be assigned in a random order, since the in-cluster provider, as the name suggests, only stores the state within the API. For providers interacting with external IPAM systems this is less of an issue, depending on implementation.

Unfortunately I'm not very familiar with clusterctl move, so I'll probably need to learn a bit before I can give a good answer, and I don't know when I'll have time to do so, sorry.
If the order can be influenced, that might be the easiest solution. If there are annotations present on moved resources, we could add exceptions to the validation.

@mcbenjemaa
Member Author

If we have to move them, I guess IPAddresses must be moved first, and then the claims.

@killianmuldoon
Contributor

Thanks for the input @schrej - sounds like we might need to handle this case in clusterctl move.

One question on:

because the clusterctl move actually moves the IPAddressClaim before IPAddress, and that makes it fail.

How does this cause the move to fail? The objects should be moved independently one after another and reconciliation should be paused. What's causing the spec to be mutated in this case?

@mcbenjemaa
Member Author

How does this cause the move to fail? The objects should be moved independently one after another and reconciliation should be paused. What's causing the spec to be mutated in this case?

Probably because reconciliation for IPAddressClaim is not paused

@killianmuldoon
Contributor

Probably because reconciliation for IPAddressClaim is not paused

Oh that's a bug then - did it get the annotation?
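(One way to verify, assuming the standard cluster.x-k8s.io/paused annotation is the one in question here and that the claim shares the address name, as the in-cluster provider does:)

kubectl get ipaddressclaims.ipam.cluster.x-k8s.io -n default mgmt-cluster-control-plane-9p4r4 \
  -o jsonpath='{.metadata.annotations.cluster\.x-k8s\.io/paused}'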

@mcbenjemaa
Member Author

I have to recreate the cluster and try it again.

@killianmuldoon
Contributor

I have to recreate the cluster and try it again.

Thank you! Just out of interest what infrastructure provider are you using (and what version)? I guess this issue could be in the reconciler not correctly processing a paused annotation.

@mcbenjemaa
Member Author

I'm building a provider for Proxmox, and I'm injecting static IPs with the in-cluster IPAM provider.

@killianmuldoon
Contributor

killianmuldoon commented Sep 22, 2023

It looked to me at first that we might have this issue in CAPV as well, but after a closer look:

We don't have this issue in CAPV. Pausing on the IPAddressClaim isn't explicitly taken care of, but the claims are only reconciled as part of the flow for VSphereVM. As long as those are paused, the reconciliation for IPAddressClaims is paused. This is the same pattern as for other dependent objects in reconcilers.

@fabriziopandini
Member

Some comments looking at the thread:

because the clusterctl move actually moves the IPAddressClaim before IPAddress, and that makes it fail

Move order is driven by owner references; see this recent thread about how it works

Pausing on the IPAddressClaim isn't explicitly taken care of

This is being discussed in #8388

The claims are only reconciled as part of the flow for VSphereVM. As long as those are paused the reconciliation for IPAddressClaims is paused. This is the same pattern for other dependent objects in reconcilers.

+1 to this, if a machine is paused, it should not pick up a new address.

@sbueringer
Member

The claims are only reconciled as part of the flow for VSphereVM. As long as those are paused the reconciliation for IPAddressClaims is paused. This is the same pattern for other dependent objects in reconcilers.

+1 to this, if a machine is paused, it should not pick up a new address.

Not sure if that helps here. As far as I know, the IPAddress resource is created by the IPAM provider, not by CAPV. So it shouldn't help at all that CAPV is not reconciling at this point.

@sbueringer
Member

@lubronzhan For awareness. I wonder if we have the same issue as well. But not sure if we use IPAM in the bootstrap cluster (yet).

@lubronzhan
Contributor

lubronzhan commented Nov 17, 2023

I remember this was taken care of by setting the move order or pausing the resources inside our own product or in CAPI, but I think it should be part of clusterctl.
cc @christianang, could you confirm? Thank you

@mcbenjemaa
Member Author

Yeah, probably the IPAddressClaim has no pausing, and the order should also be reconsidered.

@christianang
Contributor

In the in-cluster IPAM provider, the IPAddressClaims will be paused if the IP pool is paused or if the cluster is paused. There was a PR to make the cluster pause handling more consistent here: kubernetes-sigs/cluster-api-ipam-provider-in-cluster#167 (this change should be in v0.1.0-alpha.3 based on the commit history).

If that isn't working and you are using the latest version of the in-cluster provider, then perhaps there is a bug in the provider. I'll have to try to reproduce the issue myself and see what might be going on.
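(As a stopgap during a move, pausing the pool directly might help; this is a sketch assuming the provider honors the standard cluster.x-k8s.io/paused annotation on InClusterIPPool objects, with my-pool as a placeholder name:)

kubectl annotate inclusterippools.ipam.cluster.x-k8s.io my-pool cluster.x-k8s.io/paused=""
# and remove the annotation again after the move:
kubectl annotate inclusterippools.ipam.cluster.x-k8s.io my-pool cluster.x-k8s.io/paused-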

@adobley

adobley commented Nov 17, 2023

We added a change in the cluster-api-ipam-provider to respect the paused status of the parent cluster, but it needs the cluster annotation on the claim object to find the owning cluster.
kubernetes-sigs/cluster-api-ipam-provider-in-cluster#105

That work was done for CAPV but not any other infrastructure providers, as far as I know.
kubernetes-sigs/cluster-api-provider-vsphere#1857

What infrastructure provider are you using?
Do you have an example of one of your claims? Does it have the cluster annotation?

@mcbenjemaa
Member Author

mcbenjemaa commented Nov 22, 2023

Well, the clusterName annotation doesn't work for me either.

@mcbenjemaa
Member Author

mcbenjemaa commented Nov 30, 2023

Still failing: kubernetes-sigs/cluster-api-ipam-provider-in-cluster#207 (comment)

The IPAddresses are not paused even though the cluster-name annotation is there.

@adobley

adobley commented Dec 13, 2023

Hi @mcbenjemaa!

Sorry for the delay getting back on this. I mentioned the annotation, but it looks like I was wrong: this got changed to labels at some point, and I was thinking of the old behavior. See: kubernetes-sigs/cluster-api-provider-vsphere@ba50ef2

Reading through the claim reconciliation code, it looks like it relies on the cluster-name label: https://github.com/kubernetes-sigs/cluster-api-ipam-provider-in-cluster/blob/b8ccdf8c362cd51ee120e50f8ec2b637f02e3755/internal/controllers/ipaddressclaim.go#L180
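(To check whether a claim carries that label, assuming kubectl access to the cluster and with the claim name as a placeholder:)

kubectl get ipaddressclaims.ipam.cluster.x-k8s.io -n default <claim-name> \
  -o jsonpath='{.metadata.labels.cluster\.x-k8s\.io/cluster-name}'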

@robwittman

I was able to move my cluster (Proxmox w/ IPAM) by doing the following. On the destination cluster:

# Disable the IPAM deployment (e.g. set spec.replicas to 0 in the editor)
kubectl edit deploy/caip-in-cluster-controller-manager -n caip-in-cluster-system
# Back up, then delete the mutating and validating webhooks
kubectl get mutatingwebhookconfiguration caip-in-cluster-mutating-webhook-configuration -o yaml > mutatingwebhook.yaml
kubectl delete mutatingwebhookconfiguration caip-in-cluster-mutating-webhook-configuration
kubectl get validatingwebhookconfiguration caip-in-cluster-validating-webhook-configuration -o yaml > webhook.yaml
kubectl delete validatingwebhookconfiguration caip-in-cluster-validating-webhook-configuration

I then executed the clusterctl move command, which was able to create all the required resources on the destination cluster. The last few delete steps still failed for me on the source cluster, but that cluster is ephemeral anyway.

After the destination cluster was updated, revert the steps above to re-enable the IPAM controller:

kubectl edit deploy/caip-in-cluster-controller-manager -n caip-in-cluster-system 
kubectl apply -f mutatingwebhook.yaml
kubectl apply -f webhook.yaml

Then I resumed cluster reconciliation on the destination; all of the Proxmox / IPAM resources synced, and both the workload cluster in CAPI and the Proxmox cluster itself were green.

YMMV: I'm not sure if I'll see any unintended side effects from disabling some of the webhooks, but I'll be working through an upgrade of the management cluster in the next few weeks.
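(A possibly less invasive variant of the first step, instead of hand-editing the deployment, would be to scale the IPAM controller down for the move and back up afterwards; same deployment name as above:)

kubectl scale deploy/caip-in-cluster-controller-manager -n caip-in-cluster-system --replicas=0
# ... run clusterctl move ...
kubectl scale deploy/caip-in-cluster-controller-manager -n caip-in-cluster-system --replicas=1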

@fabriziopandini
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 11, 2024
@fabriziopandini fabriziopandini added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 8, 2024
@chrischdi chrischdi added the area/ipam Issues or PRs related to ipam label May 8, 2024