
Make the Node Controller optional #156

Closed

displague opened this issue Mar 16, 2021 · 11 comments
Labels: kind/feature (Categorizes issue or PR as related to a new feature.) · lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.)


@displague (Member)

In some settings, CPEM's LoadBalancer helper functionality is desired while its Node labeling is not.

  • Introduce a flag to disable the Node controller (a rough sketch follows at the end of this description).
  • The Service controller must be allowed to function independently of any labels, node spec values, or annotations that the Node controller would produce.

Originally from equinix/terraform-equinix-metal-anthos-on-baremetal#56 (comment)
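
A minimal sketch of what the toggle could look like, assuming a hypothetical disableNodeController option (the field name is ours, not an existing CPEM flag): in the k8s.io/cloud-provider framework, a provider that returns false from InstancesV2() signals that it does not manage instances, and the cloud-node controller then leaves providerID and instance labels alone.

```go
package provider

import (
	cloudprovider "k8s.io/cloud-provider"
)

// cloud wires CPEM's implementations into the cloud-provider interface.
type cloud struct {
	instancesV2           cloudprovider.InstancesV2
	disableNodeController bool // hypothetical option, e.g. from a flag or env var
}

// InstancesV2 reports whether instance management is supported. Returning
// (nil, false) keeps the cloud-node controller from setting providerID,
// instance-type labels, or topology labels on nodes.
func (c *cloud) InstancesV2() (cloudprovider.InstancesV2, bool) {
	if c.disableNodeController {
		return nil, false
	}
	return c.instancesV2, true
}
```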

@displague (Member, Author) commented Apr 17, 2021

@deitch - we've been talking about node annotations to propagate information needed for the LoadBalancer (BGP settings); this issue might be affected by that. Presumably there are existing dependencies that this issue would have to work out.

If the Service controller depends on the Node controller, we can't offer a toggle to turn the Node controller off (or, specifically, to stop setting the providerID).
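
For context on why these are coupled: the providerID is written once into the Node spec and treated as immutable afterwards, so whichever CCM sets it first owns it. A quick illustration of the shape involved (the equinixmetal:// scheme follows CPEM's convention; the helper name is ours):

```go
package provider

import (
	"fmt"
	"strings"
)

// deviceIDFromProviderID extracts the Equinix Metal device UUID from a
// Node.Spec.ProviderID value such as "equinixmetal://<device-uuid>".
func deviceIDFromProviderID(providerID string) (string, error) {
	const scheme = "equinixmetal://"
	if !strings.HasPrefix(providerID, scheme) {
		return "", fmt.Errorf("unexpected providerID scheme in %q", providerID)
	}
	return strings.TrimPrefix(providerID, scheme), nil
}
```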

@deitch (Contributor) commented Apr 18, 2021

I don't understand this. Why do we want to disable the node controller and node labeling? The idea of the node controller being distinct from the service controller is something we constructed internally, but they do go together.

What is the need?

@displague (Member, Author) commented Apr 18, 2021

@deitch In Anthos, a bare-metal CCM node controller wants to take this responsibility, but it can't if the node's providerID has already been claimed by another CCM's node controller.

equinix/terraform-equinix-metal-anthos-on-baremetal#56 (comment) discusses the need, this idea was based on your suggestion :-)

@deitch (Contributor) commented Apr 18, 2021

Oh yes, now I remember. Just because I suggested it doesn't mean I would have any memory of it.

I didn't love the approach, but it seemed the only possible one (short of Anthos actually working nicely with official cloud provider CCMs).

What precisely would we want CCM to do and not to do?

@displague (Member, Author)

> What precisely would we want CCM to do and not to do?

That's the question. If we made the node controller optional (a simple flag that keeps the node controller out of the manager, as in the sketch below), what would break? I think those are the issues we need to solve for.
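
A rough sketch of "keeping the node controller out of the manager", assuming the generic k8s.io/cloud-provider/app scaffolding (controller names and types here match recent releases but should be checked against the vendored version; the stock cloud-controller-manager binary also exposes a --controllers flag where a leading "-" disables a named controller):

```go
package main

import (
	"k8s.io/cloud-provider/app"
)

// controllerInitializers copies the framework's default controller
// constructors, optionally dropping the node-related ones so another
// CCM can own node initialization and lifecycle.
func controllerInitializers(disableNode bool) map[string]app.ControllerInitFuncConstructor {
	inits := make(map[string]app.ControllerInitFuncConstructor, len(app.DefaultInitFuncConstructors))
	for name, constructor := range app.DefaultInitFuncConstructors {
		inits[name] = constructor
	}
	if disableNode {
		delete(inits, "cloud-node")           // sets providerID, labels, addresses
		delete(inits, "cloud-node-lifecycle") // deletes nodes removed from the cloud
	}
	return inits
}
```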


We may have to assume that, with the node controller functionality intentionally disabled, conventional node labels such as instance-type (https://kubernetes.io/docs/reference/labels-annotations-taints/#nodekubernetesioinstance-type) become the responsibility of another CCM. Perhaps in this case EM uses an alternative label/annotation name to identify the instance type, topology, etc., as in the sketch below.
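
For illustration only, since we'd need to confirm what the other CCM wants to own: the first key below is the well-known label the node controller normally manages; the second is a hypothetical provider-scoped alternative.

```go
package provider

const (
	// Well-known label normally set by the cloud node controller.
	labelInstanceType = "node.kubernetes.io/instance-type"
	// Hypothetical provider-scoped alternative CPEM could own instead.
	labelMetalInstanceType = "metal.equinix.com/instance-type"
)
```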

We probably shouldn't guess about what a competing CCM wants to manage. We may need more information here.


One of the challenges would be in providing BGP information to the cluster.

Referring to the CCM list of responsibilities:

* Node controller - responsible for updating kubernetes nodes using cloud APIs and deleting kubernetes nodes that were deleted on your cloud.
* Service controller - responsible for loadbalancers on your cloud against services of type LoadBalancer.
* Route controller - responsible for setting up network routes on your cloud.
* Any other features you would like to implement if you are running an out-of-tree provider.

I wonder if there is lexical wiggle room to migrate node annotation functionality to an "any other features" controller, or perhaps to use the "route controller" to serve this purpose (I don't know what facilities that controller is expected to offer, or whether this would be a good fit).

Perhaps a "Metadata controller" - responsible for updating Kubernetes nodes (secrets and/or configmaps) with BGP and IP configuration and secrets discovered through Equinix Metal metadata.

The metadata service is not available (for now) without public addresses, so this may not be a great solution. We can't even assume that one node in the cluster would have public addresses. Then again, the EM API functionality in this CCM is broken without public addresses, so layer2-only is not a supported workflow.

Perhaps a "BGP controller" - responsible for updating Kubernetes nodes (annotations, secrets, and/or configmaps) with BGP configuration and secrets discovered through the Equinix Metal API. A rough sketch of such a controller follows.
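
A minimal sketch of that idea, assuming a hypothetical annotation key and a hypothetical lookupBGP helper standing in for the Equinix Metal API call; only the client-go patch mechanics are standard:

```go
package bgp

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// annotationPeerASN is a hypothetical key; a real controller would need an
// agreed naming scheme.
const annotationPeerASN = "metal.equinix.com/bgp-peer-asn"

// lookupBGP stands in for an Equinix Metal API call that returns the BGP
// peer ASN for the device backing this node.
func lookupBGP(ctx context.Context, nodeName string) (int, error) {
	return 65530, nil // placeholder value
}

// reconcileNode copies BGP details onto the node as annotations, without
// touching providerID or any other node-controller-owned fields.
func reconcileNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	asn, err := lookupBGP(ctx, nodeName)
	if err != nil {
		return err
	}
	patch := []byte(fmt.Sprintf(
		`{"metadata":{"annotations":{"%s":"%d"}}}`, annotationPeerASN, asn))
	_, err = client.CoreV1().Nodes().Patch(
		ctx, nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```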

@displague (Member, Author) commented Apr 18, 2021

Maybe we can wait this problem out :-) equinix/terraform-equinix-metal-anthos-on-baremetal#54 (comment)

(I think we should still figure this out in the meantime, since it may be related to the BGP annotations and will help us keep separation of concerns and independence in our controllers.)

@deitch (Contributor) commented Apr 22, 2021

This also came up in the CAPP/CPEM upgrade discussion (cc @detiber ). There are two distinct conversations going on here:

  • enabling some flexibility around what the provider ID should be. I instinctively dislike this, but I recognize that it isn't all that hard to do, and doesn't go against the internal design of CP. It simply moves it from hard-coding to default+options
  • re-architecting the various controllers, what each one does, what is required vs optional, etc.

Anthos, IIRC, has had a bit of a hard time with this, and actually ended up copying and modifying its own versions of each CSP's CCM. That is not a route I would want to go down; I would sooner work with whichever SIG manages cloud-provider and see if we can standardize these capabilities.

We should be open to being more flexible than the official CP standards, as long as we don't actually go against them.

If we can come up with a better design, I'm all for it.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 18, 2024.
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 17, 2024.
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot closed this as not planned on Mar 19, 2024.
@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
>   • After 90d of inactivity, lifecycle/stale is applied
>   • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
>   • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
>
> You can:
>
>   • Reopen this issue with /reopen
>   • Mark this issue as fresh with /remove-lifecycle rotten
>   • Offer to help out with Issue Triage
>
> Please send feedback to sig-contributor-experience at kubernetes/community.
>
> /close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
