feat(load-balancer): ignore nodes that don't use known provider IDs #530
This is not supported right now, I would actually expect HCCM to remove any non-Hetzner Cloud nodes from your cluster. We plan to add support for Robot servers, please subscribe to #523 if you are interested in this. |
Besides Hetzner's root servers, our cluster also includes non-Hetzner edge servers that run in different datacenters. Naturally, we wouldn't want HCCM to remove these nodes. In its current state HCCM works fine, but we are currently required to add the Would be easier if HCCM could just ignore nodes that don't have a valid |
What version of HCCM are you running? HCCM does remove any nodes that it can not associate with known servers. Or rather, In general, I would recommend you talk to |
I'm running the latest version (v1.18). What part of the codebase is responsible for removing the unassociated nodes? |
That would be the This in turn calls hcloud-cloud-controller-manager/hcloud/instances.go Lines 74 to 84 in 8775196
Which tries to look up the server in the Hetzner Cloud API by ID or by name: hcloud-cloud-controller-manager/hcloud/instances.go Lines 49 to 72 in 8775196
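To make the lookup described here easier to follow, below is a minimal, self-contained sketch. It is not the code from instances.go: `fakeAPI`, `node`, `lookupServer` and `instanceExists` are hypothetical names, the API is an in-memory stand-in, and it assumes the ID is used whenever a ProviderID is set and the name is used otherwise. The point it illustrates is the difference between "server not found" (reported as `false` with no error) and "ProviderID could not be parsed" (reported as an error).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// server and node are hypothetical, trimmed-down stand-ins for the
// Hetzner Cloud server and the Kubernetes Node objects.
type server struct {
	ID   int64
	Name string
}

type node struct {
	Name       string
	ProviderID string
}

// fakeAPI is an in-memory replacement for the cloud API (assumption).
type fakeAPI struct{ servers []server }

func (a fakeAPI) byID(id int64) *server {
	for i := range a.servers {
		if a.servers[i].ID == id {
			return &a.servers[i]
		}
	}
	return nil
}

func (a fakeAPI) byName(name string) *server {
	for i := range a.servers {
		if a.servers[i].Name == name {
			return &a.servers[i]
		}
	}
	return nil
}

// lookupServer resolves a node by ID when a ProviderID is set and by name
// otherwise. A nil server with a nil error means "not found".
func lookupServer(api fakeAPI, n node) (*server, error) {
	if n.ProviderID == "" {
		return api.byName(n.Name), nil
	}
	const prefix = "hcloud://"
	if !strings.HasPrefix(n.ProviderID, prefix) {
		return nil, fmt.Errorf("providerID %q is missing prefix %q", n.ProviderID, prefix)
	}
	id, err := strconv.ParseInt(strings.TrimPrefix(n.ProviderID, prefix), 10, 64)
	if err != nil {
		return nil, fmt.Errorf("providerID %q has an invalid server ID: %w", n.ProviderID, err)
	}
	return api.byID(id), nil
}

// instanceExists reports false only when the lookup itself succeeded and
// found nothing; lookup errors are passed on to the caller.
func instanceExists(api fakeAPI, n node) (bool, error) {
	s, err := lookupServer(api, n)
	if err != nil {
		return false, err
	}
	return s != nil, nil
}

func main() {
	api := fakeAPI{servers: []server{{ID: 1, Name: "cloud-1"}}}
	for _, n := range []node{
		{Name: "cloud-1"},
		{Name: "gone", ProviderID: "hcloud://2"},
		{Name: "edge-1", ProviderID: "k3s://edge-1"},
	} {
		exists, err := instanceExists(api, n)
		fmt.Println(n.Name, exists, err)
	}
}
```

In this sketch only the second node would be reported as non-existent; the third one produces an error instead, so a caller that deletes nodes solely on "does not exist" would leave it alone.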
|
I see. Nodes are only removed when the lookup process returns no errors. Given that all our servers are configured with a ProviderID (if none is provided, K3s uses a default, btw), the error message Now, circling back to the initial issue: we aim for a seamless integration with the load balancer without the need for the
// Extract HC server IDs of all K8S nodes assigned to the K8S cluster.
for _, node := range nodes {
	id, err := providerIDToServerID(node.Spec.ProviderID)
	if err != nil {
		// Previously: return changed, fmt.Errorf("%s: %w", op, err)
		// Skip nodes whose ProviderID cannot be parsed instead of aborting.
		continue
	}
	k8sNodeIDs[id] = true
	k8sNodeNames[id] = node.Name
}
This change ensures that nodes without a valid Hetzner Cloud ID are simply skipped, rather than causing the entire process to fail. It's a more graceful way to handle such scenarios and provides better flexibility for mixed-node environments like ours. I genuinely believe this adjustment will benefit not only us but also other users with similar hybrid setups. It's about enhancing the adaptability of HCCM without compromising its core principles. |
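For readers who want to try the skip-instead-of-fail idea in isolation, here is a minimal sketch under stated assumptions. `node`, `parseServerID` and `collectServerIDs` are hypothetical names; `parseServerID` is a simplified stand-in for `providerIDToServerID` that only accepts `hcloud://<number>`, and the `k3s://` ID is just an illustrative non-Hetzner value.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// node is a hypothetical stand-in for the Kubernetes Node object; only the
// fields used below are modelled.
type node struct {
	Name       string
	ProviderID string
}

// parseServerID is a simplified, assumed version of providerIDToServerID:
// it accepts only IDs of the form "hcloud://<number>".
func parseServerID(providerID string) (int64, error) {
	const prefix = "hcloud://"
	if !strings.HasPrefix(providerID, prefix) {
		return 0, fmt.Errorf("providerID %q is missing prefix %q", providerID, prefix)
	}
	id, err := strconv.ParseInt(strings.TrimPrefix(providerID, prefix), 10, 64)
	if err != nil {
		return 0, fmt.Errorf("providerID %q has an invalid server ID: %w", providerID, err)
	}
	return id, nil
}

// collectServerIDs maps server IDs to node names and returns the names of
// all nodes that were skipped because their ProviderID could not be parsed.
func collectServerIDs(nodes []node) (map[int64]string, []string) {
	names := make(map[int64]string)
	var skipped []string
	for _, n := range nodes {
		id, err := parseServerID(n.ProviderID)
		if err != nil {
			// Skip the node instead of aborting the whole reconciliation.
			skipped = append(skipped, n.Name)
			continue
		}
		names[id] = n.Name
	}
	return names, skipped
}

func main() {
	nodes := []node{
		{Name: "cloud-1", ProviderID: "hcloud://12345"},
		{Name: "edge-1", ProviderID: "k3s://edge-1"},
	}
	names, skipped := collectServerIDs(nodes)
	fmt.Println("load balancer targets:", names)
	fmt.Println("skipped nodes:", skipped)
}
```

Note that this sketch skips on every parse error, including a malformed `hcloud://` ID. Returning the skipped names makes it possible to log or report them rather than dropping them silently.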
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs. |
v1.19.0 added support for Robot servers in HCCM, so users that only want to have Hetzner Cloud & Robot servers in the same cluster, check out Clusters with Robot Servers.
Please talk to sig-cloud-provider about this. The current behavior in regards to non-Hetzner servers is not guaranteed and comes from an upstream Kubernetes library. Making the changes you suggested sounds good for two reasons:
Especially now that we have Events in place, this will be properly communicated to cluster operators. The code around this changed a bit; I would suggest the following changes (as well as tests to verify them):
// Extract HC server IDs of all K8S nodes assigned to the K8S cluster.
for _, node := range nodes {
	id, isCloudServer, err := providerid.ToServerID(node.Spec.ProviderID)
	if err != nil {
+		var unknownPrefixErr *providerid.UnknownPrefixError
+		if errors.As(err, &unknownPrefixErr) {
+			// ProviderID has unknown prefix, cluster might have non-hccm nodes that can not be added to the
+			// Load Balancer. Emitting an event and ignoring that Node in this reconciliation loop.
+			l.Recorder.Eventf(node, corev1.EventTypeWarning, "UnknownProviderIDPrefix", "Node could not be added to Load Balancer for service %s because the provider ID does not match any known format", svc.Name)
+			continue
+		}
		return changed, fmt.Errorf("%s: %w", op, err)
	}
	if isCloudServer {
		k8sNodeIDsHCloud[id] = true
	} else {
		k8sNodeIDsRobot[int(id)] = true
	}
	k8sNodes[id] = node
}
In addition we need to modify package providerid
// UnknownPrefixError is returned when a provider ID has none of the known prefixes.
type UnknownPrefixError struct {
	ProviderID string
}

func (e *UnknownPrefixError) Error() string {
	return fmt.Sprintf("providerID does not have one of the expected prefixes (%s, %s, %s): %s", prefixCloud, prefixRobot, prefixRobotLegacy, e.ProviderID)
}
And update default:
+	return 0, false, &UnknownPrefixError{ProviderID: providerID}
-	return 0, false, fmt.Errorf("providerID does not have one of the the expected prefixes (%s, %s, %s): %s", prefixCloud, prefixRobot, prefixRobotLegacy, providerID)
} |
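The typed-error approach suggested above can be tried out on its own with the following sketch. It is an illustration, not the providerid package: `toServerID` and `isUnknownPrefix` are hypothetical helpers, and only the `hcloud://` prefix is handled (the Robot prefixes are left out for brevity). Because `UnknownPrefixError` is a struct type with a pointer receiver, the check uses `errors.As`, which also matches when the error has been wrapped with `%w`.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// prefixCloud is the only prefix modelled in this sketch (assumption).
const prefixCloud = "hcloud://"

// UnknownPrefixError signals that a provider ID has no known prefix.
type UnknownPrefixError struct {
	ProviderID string
}

func (e *UnknownPrefixError) Error() string {
	return fmt.Sprintf("providerID does not have the expected prefix (%s): %s", prefixCloud, e.ProviderID)
}

// toServerID is a hypothetical, simplified parser: an unknown prefix yields
// *UnknownPrefixError, a malformed numeric part yields a plain wrapped error.
func toServerID(providerID string) (int64, error) {
	if !strings.HasPrefix(providerID, prefixCloud) {
		return 0, &UnknownPrefixError{ProviderID: providerID}
	}
	id, err := strconv.ParseInt(strings.TrimPrefix(providerID, prefixCloud), 10, 64)
	if err != nil {
		return 0, fmt.Errorf("parse server ID from %q: %w", providerID, err)
	}
	return id, nil
}

// isUnknownPrefix reports whether err is, or wraps, an *UnknownPrefixError.
func isUnknownPrefix(err error) bool {
	var target *UnknownPrefixError
	return errors.As(err, &target)
}

func main() {
	// Unknown prefix, wrapped once more by a caller: still detected.
	_, err := toServerID("k3s://edge-1")
	wrapped := fmt.Errorf("reconcile targets: %w", err)
	fmt.Println("unknown prefix:", isUnknownPrefix(wrapped))

	// Known prefix but malformed ID: a different kind of error.
	_, err = toServerID("hcloud://abc")
	fmt.Println("unknown prefix:", isUnknownPrefix(err))
}
```

Keeping the two failure modes distinct lets the caller skip nodes from other providers while still failing loudly on a malformed `hcloud://` ID.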
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs. |
@apricote Sorry, I seem to have missed your reply.
I'm a bit confused; the changes in question are not in the upstream library, right? I'm willing to work on a PR to get this in. |
There are two issues here:
|
Thanks for clarifying. I did some digging and it seems this is already being discussed upstream (kubernetes/cloud-provider#35). I'll try to make some time over the weekend or in the evenings to work on a PR for the LB part. |
I'm running a cluster with cloud and bare-metal nodes. I've disabled networking, but still want to use the load balancer support. Currently, if a node doesn't have a valid hcloud:// provider ID, the EnsureLoadBalancer method returns early on this line: https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/51b83ecf12a1673f35b7d8f35ee0659c3e2d59bb/internal/hcops/load_balancer.go#L599C11-L599C18
Instead of returning early, couldn't we just continue and exclude the node from selection?