
Refactor EIP handling #230

Merged: 1 commit into kubernetes-sigs:master on Mar 21, 2022
Conversation

@detiber (Member) commented Mar 1, 2022

  • Reconcile Service/endpoint updates separately
  • Provide an option to perform the health check against the assigned control plane node's non-EIP external address
  • Attempt to proactively migrate the EIP when a node is deleted or becomes unschedulable (see the sketch below)
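
As a sketch of the third item above (with hypothetical helper names, not the exact code in this PR), the idea is to react to node lifecycle events directly instead of waiting for a failed health check:

    handler := cache.ResourceEventHandlerFuncs{
        UpdateFunc: func(_, newObj interface{}) {
            node, ok := newObj.(*v1.Node)
            // Move the EIP as soon as the node holding it is cordoned.
            if ok && node.Spec.Unschedulable && holdsEIP(node) { // holdsEIP is hypothetical
                migrateEIP(node) // migrateEIP is hypothetical
            }
        },
        DeleteFunc: func(obj interface{}) {
            // Move the EIP immediately when the node holding it is deleted.
            if node, ok := obj.(*v1.Node); ok && holdsEIP(node) {
                migrateEIP(node)
            }
        },
    }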

These changes have been running as part of an automated test suite against my fork of cluster-api-provider-packet. Prior to these changes, the tests involving upgrade workflows or control plane node remediation flaked much more frequently.

A test deployment with these changes can be found at https://raw.githubusercontent.com/detiber/packet-ccm/test/deploy/template/deployment.yaml, which uses a pre-built image containing these changes: quay.io/detiber/cloud-provider-equinix-metal:dev

I suspect there is still room to improve the EIP handling, since there seem to be a few edge cases where the EIP takes a while to migrate, but overall these changes are a substantial improvement over the previous behavior.

Resolved review threads: Dockerfile_builder, metal/bgp.go, metal/cloud.go, metal/config.go, metal/devices.go, metal/eip_controlplane_reconciliation.go (four threads)
Inline review comment on lines 650 to 543:
for _, a := range node.Status.Addresses {
// Find the non EIP external address for the node to use for the health check
if a.Type == v1.NodeExternalIP && a.Address != controlPlaneEndpoint.Address {
controlPlaneHealthURL = fmt.Sprintf("https://%s:%d", a.Address, m.nodeAPIServerPort)
}
}
@detiber (Member, Author) commented:

With Cluster API, we currently set the EIP address as a loopback address on all control plane nodes to work around the bootstrapping issue of the Service/Endpoints not yet being set, so that kubeadm init doesn't fail.

This was causing an issue with the health checks not actually checking the proper endpoints in some circumstances...

@detiber (Member, Author) commented Mar 2, 2022

Rebased now that #231 has been merged, but this PR also includes a fix from #232 as well.

@deitch (Contributor) commented Mar 3, 2022

Heh, a lot smaller now.

I will hold off until we resolve #232 and then we can address this one.

@deitch (Contributor) commented Mar 7, 2022

The merge of #232 should make this even smaller post-rebase?

@detiber changed the title from "[WIP] Refactor EIP handling" to "Refactor EIP handling" on Mar 7, 2022
@detiber (Member, Author) commented Mar 7, 2022

@deitch rebased on latest.

@deitch (Contributor) commented Mar 8, 2022

@detiber I get most of it (looks good) except for the core. The existing approach depends on reconcileNodes() being called once in a while (e.g. every 30 seconds) and then checking the health of the EIP. If that check failed, it went on to check the control plane node IPs and moved the EIP around as needed.

This proposal appears to change that. It only checks health in doHealthCheck(), which is only called by UpdateFunc:, and only if the node on which UpdateFunc was called currently has the EIP assigned.

How does this regularly check the status of the EIP and update it if necessary? I get that it would work when UpdateFunc is called, but why would it update it with any regularity?

@deitch (Contributor) commented Mar 8, 2022

Also, it doesn't look like it checks the EIP address, only the node's?

@davidspek commented Mar 9, 2022

I was just running through another test of my PR that adds kube-vip support to the cluster-api provider, and I noticed that the Equinix Metal CCM is assigning the EIP to one of the control plane nodes. However, I don't think the CCM should be doing this. When using kube-vip to load-balance the control plane, the EIP shouldn't be assigned to one of the nodes, as far as I know. Also, when using Cluster API, the CCM won't be available in time, since the EIP for the control plane must be accessible during kubeadm bootstrapping. Expanding on that, the CCM won't actually be available until the user deploys a CNI.

So I believe there are three routes that could be taken here.

  1. The CCM takes care of deploying kube-vip somehow. However, this might be difficult to do at the right time during cluster bootstrapping.
  2. The CCM stops assigning the control plane EIP to one of the control plane nodes.
  3. We provide a way to disable the CCM from assigning the control plane EIP to one of the control plane nodes. Maybe we could do this by changing the tag on the EIP?

Edit:
It turns out it is easy to keep the CCM from taking control of the control plane EIP, which would be the route to take when using kube-vip with Cluster API.

@deitch (Contributor) commented Mar 9, 2022

> I was just running through another test of detiber/cluster-api-provider-packet#65 that adds kube-vip support to the cluster-api provider, and I noticed that the Equinix Metal CCM is assigning the EIP to one of the control plane nodes. However, I don't think the CCM should be doing this.

It only does this if you pass it the right tag for the EIP. If you do not, the CCM ignores it. That tag is the flag that says either "you should control the EIP" or "you should ignore it".

If you are using kube-vip in its mode of managing the control plane EIP, then the CCM should not have the tag set.
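
(Schematically, the gate amounts to something like this sketch; the eipTag field name is illustrative, not necessarily the actual CPEM config field:)

    // If no EIP tag is configured, skip control plane EIP management
    // entirely, leaving the EIP to external tooling such as kube-vip.
    if config.eipTag == "" {
        return nil
    }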

@davidspek commented:
@deitch Thanks for the explanation, I just found that as well and updated my PR to reflect the change.

@detiber (Member, Author) commented Mar 10, 2022

> @detiber I get most of it (looks good) except for the core. The existing approach depends on reconcileNodes() being called once in a while (e.g. every 30 seconds) and then checking the health of the EIP. If that check failed, it went on to check the control plane node IPs and moved the EIP around as needed.
>
> This proposal appears to change that. It only checks health in doHealthCheck(), which is only called by UpdateFunc:, and only if the node on which UpdateFunc was called currently has the EIP assigned.
>
> How does this regularly check the status of the EIP and update it if necessary? I get that it would work when UpdateFunc is called, but why would it update it with any regularity?

Because the resync interval on the informer is set, every time a resync happens UpdateFunc is called for each of the Nodes. That keeps us doing a health check roughly every minute, which is approximately the same as the old timer loop that fired every 60 seconds.
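
For reference, a minimal sketch of that mechanism (the package layout and checkAndReassignEIP are illustrative, not the code in this PR): a SharedInformerFactory created with a non-zero resync period re-delivers every cached Node to UpdateFunc on each resync, even when nothing has changed:

    package sketch

    import (
        "time"

        v1 "k8s.io/api/core/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/cache"
    )

    // checkAndReassignEIP stands in for CPEM's health check plus EIP
    // migration logic.
    func checkAndReassignEIP(node *v1.Node) {}

    func watchNodes(clientset kubernetes.Interface, stop <-chan struct{}) {
        // A 60s resync means UpdateFunc fires for every Node roughly once
        // a minute even without real changes, replacing the old timer loop.
        factory := informers.NewSharedInformerFactory(clientset, 60*time.Second)
        nodeInformer := factory.Core().V1().Nodes().Informer()
        nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
            UpdateFunc: func(_, newObj interface{}) {
                if node, ok := newObj.(*v1.Node); ok {
                    checkAndReassignEIP(node)
                }
            },
        })
        factory.Start(stop)
        factory.WaitForCacheSync(stop)
    }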

@detiber (Member, Author) commented Mar 10, 2022

> Also, it doesn't look like it checks the EIP address, only the node's?

Yeah, that was the bit I suspected would get some pushback... In the case of Cluster API, we configure the EIP as a local loopback address: https://github.com/kubernetes-sigs/cluster-api-provider-packet/blob/main/templates/cluster-template.yaml#L30-L36. This was needed to work around kubeadm expecting the address to be resolvable during bootstrapping (the external Service/Endpoints are not configured yet, since CPEM is not yet deployed).

This resulted in issues during upgrades, where the local etcd instance is removed from the cluster before the machine is deleted, leaving the local apiserver unresponsive. That is not necessarily a problem if CPEM is running on the machine being deleted, but it becomes one when the EIP is assigned to a control plane node that isn't running CPEM: CPEM health checks the EIP, which resolves to localhost and passes, so it never attempts to reassign the EIP.
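
For illustration, the node-address health check might look roughly like this (a sketch assuming a plain HTTPS probe of /healthz; the function name and TLS handling are illustrative):

    import (
        "crypto/tls"
        "fmt"
        "net/http"
        "time"
    )

    // healthCheckNode probes the apiserver via the node's non-EIP external
    // address. The EIP itself cannot be trusted for this check because, on
    // control plane nodes, it resolves to the local loopback and therefore
    // always looks healthy.
    func healthCheckNode(controlPlaneHealthURL string) error {
        client := &http.Client{
            Timeout: 5 * time.Second,
            Transport: &http.Transport{
                // The serving certificate is issued for the EIP and cluster
                // names, not the raw node IP, so skip verification here.
                TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
            },
        }
        resp, err := client.Get(controlPlaneHealthURL + "/healthz")
        if err != nil {
            return err // unreachable: trigger EIP reassignment
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return fmt.Errorf("apiserver unhealthy: %s", resp.Status)
        }
        return nil
    }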

@deitch (Contributor) commented Mar 10, 2022

> Because the resync interval on the informer is set, every time a resync happens UpdateFunc is called for each of the Nodes. That keeps us doing a health check roughly every minute, which is approximately the same as the old timer loop that fired every 60 seconds.

Ah, I didn't realize that. So basically, on every resync interval it will automatically call the health check? Then that makes sense.

@deitch (Contributor) commented Mar 10, 2022

> Yeah, that was the bit I suspected would get some pushback... In the case of Cluster API, we configure the EIP as a local loopback address: https://github.com/kubernetes-sigs/cluster-api-provider-packet/blob/main/templates/cluster-template.yaml#L30-L36. This was needed to work around kubeadm expecting the address to be resolvable during bootstrapping (the external Service/Endpoints are not configured yet, since CPEM is not yet deployed).

Oh yes, I have suffered through that. I do wish kubeadm were a bit more configurable in that regard.

> CPEM health checks the EIP, which resolves to localhost and passes, so it never attempts to reassign the EIP.

So the reason you did it is to avoid a situation where the CPEM health check thinks everything is healthy (because it is checking the EIP, which is on the local loopback) when it really isn't OK when coming from the Internet?

@detiber (Member, Author) commented Mar 10, 2022

> So the reason you did it is to avoid a situation where the CPEM health check thinks everything is healthy (because it is checking the EIP, which is on the local loopback) when it really isn't OK when coming from the Internet?

Exactly

@detiber (Member, Author) commented Mar 10, 2022

@deitch Do you think it would make sense to make whether or not to use the EIP URL a configurable option?

I'm not sure how others may be using CPEM outside of Cluster API to know if it's a broader issue or just a localized one.

@deitch (Contributor) commented Mar 10, 2022

I don't understand.

@detiber (Member, Author) commented Mar 10, 2022

@deitch Rather than just automatically avoiding the use of the EIP for health checks (as this PR does), would it make sense to make that behavior opt-in, with the default continuing to use the EIP URL for health checks?
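
(For concreteness, such an opt-in might look like the following sketch; the env var and variable names are purely hypothetical, and the final name landed in the README:)

    // Opt-in: when set, health check the node's own external address
    // instead of the EIP URL. The env var name here is hypothetical.
    useHostIP := os.Getenv("EIP_HEALTH_CHECK_USE_HOST_IP") == "true"
    healthURL := eipHealthURL
    if useHostIP {
        healthURL = nodeHealthURL // the non-EIP external address URL
    }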

@deitch (Contributor) commented Mar 10, 2022

Would it make sense to split the conversation? This PR changes two things simultaneously: how we invoke health checks, and what the health checks test.

Does this PR have value doing just the first, with a separate PR focused on the second? Or must they go together?

@deitch (Contributor) commented Mar 10, 2022

> @deitch Rather than just automatically avoiding the use of the EIP for health checks (as this PR does), would it make sense to make that behavior opt-in, with the default continuing to use the EIP URL for health checks?

Now that I get it, yes, I think so. My concern is a lack of consistent EIP behaviour by CPEM, but I think that is acceptable.

@detiber (Member, Author) commented Mar 10, 2022

> Would it make sense to split the conversation? This PR changes two things simultaneously: how we invoke health checks, and what the health checks test.
>
> Does this PR have value doing just the first, with a separate PR focused on the second? Or must they go together?

Both are really needed in order to unblock the CAPI v1 support for CAPP, so I'm not sure there is much value in breaking it up.

That said, more than happy to break it up if that is your preference.

@detiber (Member, Author) commented Mar 14, 2022

This should be ready to go.

@deitch (Contributor) commented Mar 16, 2022

Reviewing now. @detiber, can you add the new option's env var to the README?

Resolved review threads: metal/config.go, metal/eip_controlplane_reconciliation.go (two threads)
Inline review comment on lines 103 to 120:
FilterFunc: func(obj interface{}) bool {
n, _ := obj.(*v1.Node)
return m.nodeFilter(n)
},
@deitch (Contributor) commented:

Given that the apiServerPort might not be set the first time a node is added, will this work the next time? Is the FilterFunc called on every sync?

@deitch (Contributor) left a review:

This is rather nicely done, and I think simplifies a lot.

I mostly have some questions, README requests, and some small changes.

@detiber (Member, Author) commented Mar 17, 2022

I still need to address the README updates, but did quite a bit of cleanup based on the other comments.

Working on validating that the Server Side Apply changes didn't break anything, though 😬

@detiber (Member, Author) commented Mar 17, 2022

The Server Side Apply changes seem to be working well; I will pick back up on the README changes tomorrow.

@deitch (Contributor) commented Mar 18, 2022

This all looks solid. It just needs the README updated and that nodeFilter fixed, and we should be good to go.

@detiber (Member, Author) commented Mar 18, 2022

@deitch The README is updated and we are no longer filtering anything.

I also squashed down the commits and cleaned up the commit message.

@detiber self-assigned this on Mar 18, 2022
@deitch (Contributor) left a review:

This all looks good, other than the single bug I think I might have found.

Comment here when that is patched, and I can run another series of tests on it.

Resolved review thread: metal/eip_controlplane_reconciliation.go

Commit message:

- Reconcile Service/endpoint updates separately
- Provide an option to perform the health check against the assigned control plane node's non-EIP external address
- Attempt to proactively migrate the EIP when a node is deleted or becomes unschedulable

Signed-off-by: Jason DeTiberus <detiber@users.noreply.github.com>
@deitch (Contributor) left a review:
LGTM. Let's let CI go green. If you are comfortable with it @detiber, then once green, we can merge in.

@detiber (Member, Author) commented Mar 21, 2022

@deitch sgtm. I've been testing this code pretty regularly from an EIP perspective with the automated testing I've been doing for CAPP.

@deitch merged commit f8c11af into kubernetes-sigs:master on Mar 21, 2022
@deitch (Contributor) commented Mar 21, 2022

Really nice on this one, thank you.
