
Make the Node Controller optional #156

Closed

displague opened this issue Mar 16, 2021 · 11 comments
Labels: kind/feature (Categorizes issue or PR as related to a new feature.) · lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.)


@displague (Member)

In some settings, CPEM's LoadBalancer helper functionality is desired while its Node labeling is not.

  • Introduce a flag to disable the Node controller (a rough sketch follows at the end of this description).
  • The Service controller must be allowed to function independently of any labels, node spec values, or annotations that the Node controller would produce.

Originally from equinix/terraform-equinix-metal-anthos-on-baremetal#56 (comment)
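
A minimal sketch of what the toggle could look like, assuming a hypothetical disableNodeController option (the field name is ours, not an existing CPEM flag): in the k8s.io/cloud-provider framework, a provider that returns false from InstancesV2() signals that it does not manage instances, and the cloud-node controller then leaves providerID and instance labels alone.

```go
package provider

import (
	cloudprovider "k8s.io/cloud-provider"
)

// cloud wires CPEM's implementations into the cloud-provider interface.
type cloud struct {
	instancesV2           cloudprovider.InstancesV2
	disableNodeController bool // hypothetical option, e.g. from a flag or env var
}

// InstancesV2 reports whether instance management is supported. Returning
// (nil, false) keeps the cloud-node controller from setting providerID,
// instance-type labels, or topology labels on nodes.
func (c *cloud) InstancesV2() (cloudprovider.InstancesV2, bool) {
	if c.disableNodeController {
		return nil, false
	}
	return c.instancesV2, true
}
```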

@displague (Member, Author) commented Apr 17, 2021

@deitch - we've been talking about node annotations to propagate information needed for the LoadBalancer (BGP settings); this issue might be affected by that. Presumably there are existing dependencies that this issue would have to work out.

If the Service controller depends on the Node controller, we can't offer a toggle to turn the Node controller off (or, specifically, to stop setting the providerID).
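
For context on why these are coupled: the providerID is written once into the Node spec and treated as immutable afterwards, so whichever CCM sets it first owns it. A quick illustration of the shape involved (the equinixmetal:// scheme follows CPEM's convention; the helper name is ours):

```go
package provider

import (
	"fmt"
	"strings"
)

// deviceIDFromProviderID extracts the Equinix Metal device UUID from a
// Node.Spec.ProviderID value such as "equinixmetal://<device-uuid>".
func deviceIDFromProviderID(providerID string) (string, error) {
	const scheme = "equinixmetal://"
	if !strings.HasPrefix(providerID, scheme) {
		return "", fmt.Errorf("unexpected providerID scheme in %q", providerID)
	}
	return strings.TrimPrefix(providerID, scheme), nil
}
```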

@deitch (Contributor) commented Apr 18, 2021

I don't understand this. Why do we want to disable the node controller and node labeling? The idea of the node controller being distinct from the service controller is something we constructed internally, but they do go together.

What is the need?

@displague (Member, Author) commented Apr 18, 2021

@deitch In Anthos, a bare-metal CCM node controller wants to take this responsibility, but it can't if the node's providerID has already been claimed by another CCM's node controller.

equinix/terraform-equinix-metal-anthos-on-baremetal#56 (comment) discusses the need, this idea was based on your suggestion :-)

@deitch (Contributor) commented Apr 18, 2021

Oh yes, now I remember. Just because I suggested it doesn't mean I would have any memory of it.

I didn't love the approach, but it seemed the only possible one (short of Anthos actually working nicely with official cloud provider CCMs).

What precisely would we want CCM to do and not to do?

@displague (Member, Author)

> What precisely would we want CCM to do and not to do?

That's the question. If we made the node controller optional (a simple flag that keeps the node controller out of the manager, as in the sketch below), what would break? I think those are the issues we need to solve for.
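
A rough sketch of "keeping the node controller out of the manager", assuming the generic k8s.io/cloud-provider/app scaffolding (controller names and types here match recent releases but should be checked against the vendored version; the stock cloud-controller-manager binary also exposes a --controllers flag where a leading "-" disables a named controller):

```go
package main

import (
	"k8s.io/cloud-provider/app"
)

// controllerInitializers copies the framework's default controller
// constructors, optionally dropping the node-related ones so another
// CCM can own node initialization and lifecycle.
func controllerInitializers(disableNode bool) map[string]app.ControllerInitFuncConstructor {
	inits := make(map[string]app.ControllerInitFuncConstructor, len(app.DefaultInitFuncConstructors))
	for name, constructor := range app.DefaultInitFuncConstructors {
		inits[name] = constructor
	}
	if disableNode {
		delete(inits, "cloud-node")           // sets providerID, labels, addresses
		delete(inits, "cloud-node-lifecycle") // deletes nodes removed from the cloud
	}
	return inits
}
```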


We may have to assume that, with the node controller functionality intentionally disabled, conventional node labels such as instance-type (https://kubernetes.io/docs/reference/labels-annotations-taints/#nodekubernetesioinstance-type) become the responsibility of another CCM. Perhaps in this case EM uses an alternative label/annotation name to identify the instance type, topology, etc., as in the sketch below.
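
For illustration only, since we'd need to confirm what the other CCM wants to own: the first key below is the well-known label the node controller normally manages; the second is a hypothetical provider-scoped alternative.

```go
package provider

const (
	// Well-known label normally set by the cloud node controller.
	labelInstanceType = "node.kubernetes.io/instance-type"
	// Hypothetical provider-scoped alternative CPEM could own instead.
	labelMetalInstanceType = "metal.equinix.com/instance-type"
)
```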

We probably shouldn't guess about what a competing CCM wants to manage. We may need more information here.


One of the challenges would be in providing BGP information to the cluster.

Referring to the CCM list of responsibilities:

* Node controller - responsible for updating kubernetes nodes using cloud APIs and deleting kubernetes nodes that were deleted on your cloud.
* Service controller - responsible for loadbalancers on your cloud against services of type LoadBalancer.
* Route controller - responsible for setting up network routes on your cloud.
* Any other features you would like to implement if you are running an out-of-tree provider.

I wonder if there is lexical wiggle room to migrate node annotation functionality to an "any other features" controller, or perhaps to use the "route controller" to serve this purpose (I don't know what facilities that controller is expected to offer, or whether this would be a good fit).

Perhaps a "Metadata controller" - responsible for updating Kubernetes nodes (secrets and/or configmaps) with BGP and IP configuration and secrets discovered through Equinix Metal metadata.

The metadata service is not available (for now) without public addresses, so this may not be a great solution. We can't even assume that one node in the cluster would have public addresses. Then again, the EM API functionality in this CCM is broken without public addresses, so layer2-only is not a supported workflow.

Perhaps a "BGP controller" - responsible for updating Kubernetes nodes (annotations, secrets, and/or configmaps) with BGP configuration and secrets discovered through the Equinix Metal API. A rough sketch of such a controller follows.
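
A minimal sketch of that idea, assuming a hypothetical annotation key and a hypothetical lookupBGP helper standing in for the Equinix Metal API call; only the client-go patch mechanics are standard:

```go
package bgp

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// annotationPeerASN is a hypothetical key; a real controller would need an
// agreed naming scheme.
const annotationPeerASN = "metal.equinix.com/bgp-peer-asn"

// lookupBGP stands in for an Equinix Metal API call that returns the BGP
// peer ASN for the device backing this node.
func lookupBGP(ctx context.Context, nodeName string) (int, error) {
	return 65530, nil // placeholder value
}

// reconcileNode copies BGP details onto the node as annotations, without
// touching providerID or any other node-controller-owned fields.
func reconcileNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	asn, err := lookupBGP(ctx, nodeName)
	if err != nil {
		return err
	}
	patch := []byte(fmt.Sprintf(
		`{"metadata":{"annotations":{"%s":"%d"}}}`, annotationPeerASN, asn))
	_, err = client.CoreV1().Nodes().Patch(
		ctx, nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```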

@displague (Member, Author) commented Apr 18, 2021

Maybe we can wait this problem out :-) equinix/terraform-equinix-metal-anthos-on-baremetal#54 (comment)

(I think we should still figure this out in the meantime, since it may be related to the BGP annotations and will help us keep separation of concerns and independence in our controllers.)

@deitch (Contributor) commented Apr 22, 2021

This also came up in the CAPP/CPEM upgrade discussion (cc @detiber ). There are two distinct conversations going on here:

  • enabling some flexibility around what the provider ID should be. I instinctively dislike this, but I recognize that it isn't all that hard to do, and doesn't go against the internal design of CP. It simply moves it from hard-coding to default+options
  • re-architecting the various controllers, what each one does, what is required vs optional, etc.

Anthos, IIRC, has had a bit of a hard time with this, and actually ended up copying and modifying its own versions of each CSP's CCM. That is not a route I would want to go down; I would sooner work with whichever SIG manages cloud-provider and see if we can standardize these capabilities.

We should be open to being more flexible than the official CP standards, as long as we don't actually go against them.

If we can come up with a better design, I'm all for it.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 18, 2024.
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 17, 2024.
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot closed this as not planned on Mar 19, 2024.
@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
>   • After 90d of inactivity, lifecycle/stale is applied
>   • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
>   • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
>
> You can:
>
>   • Reopen this issue with /reopen
>   • Mark this issue as fresh with /remove-lifecycle rotten
>   • Offer to help out with Issue Triage
>
> Please send feedback to sig-contributor-experience at kubernetes/community.
>
> /close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
