
Load balancer behavior changed for unmanaged nodes #112313

Open
yangl900 opened this issue Sep 8, 2022 · 16 comments
Assignees
Labels
  • area/provider/azure: Issues or PRs related to azure provider
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/cloud-provider: Categorizes an issue or PR as relevant to SIG Cloud Provider.

Comments

@yangl900
Contributor

yangl900 commented Sep 8, 2022

What happened?

Before commit aecaac0, the LB behavior for unmanaged nodes (labeled with kubernetes.azure.com/managed: false) was:

  • the cloud provider ignored the VMSS and VMSS VMs when ensuring hosts in the pool
  • if the VMSS had already joined the LB backend pool, it remained in the pool

So unmanaged nodes were simply ignored by cloud provider operations, which makes sense because they are "unmanaged". However, starting from commit aecaac0, the cloud provider reconciles unmanaged VMs and removes them from the LB.

I believe the original intention was to remove nodes carrying the "node.kubernetes.io/exclude-from-external-load-balancers" label; however, ShouldNodeExcludedFromLoadBalancer() matches both unmanaged nodes and exclude-lb nodes (see the sketch below).
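For reference, here is a minimal Go sketch of the check as described above. It is my simplified approximation (the helper name is lower-cased to signal it is not the exact upstream code), not the actual implementation:

package azureprovider

import (
	"strings"

	v1 "k8s.io/api/core/v1"
)

// shouldNodeExcludedFromLoadBalancer sketches the exclusion check described
// above: it conflates the explicit opt-out label with the unmanaged label.
func shouldNodeExcludedFromLoadBalancer(node *v1.Node) bool {
	// Intended case: the node explicitly opted out of external LBs.
	if _, ok := node.Labels["node.kubernetes.io/exclude-from-external-load-balancers"]; ok {
		return true
	}
	// Side effect: unmanaged nodes also match, so reconciliation removes
	// them from the backend pool instead of ignoring them.
	if v, ok := node.Labels["kubernetes.azure.com/managed"]; ok && strings.EqualFold(v, "false") {
		return true
	}
	return false
}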

This caused a behavior change starting from v1.20.15, because the commit was cherry-picked into the 1.20 branch. I don't think we should force unmanaged nodes to be removed from the LB. They are "unmanaged", so why should we manage them?

What did you expect to happen?

Nodes with the label kubernetes.azure.com/managed: false should not be forcibly removed from LB backend pools.

How can we reproduce it (as minimally and precisely as possible)?

Label a node with kubernetes.azure.com/managed: false and observe that it gets removed from the LB.
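For example (using a placeholder node name, in the template's command style):

$ kubectl label node <node-name> kubernetes.azure.com/managed=false

On the next reconciliation, the node's ipconfiguration disappears from the LB backend pool.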

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
v1.20.15

Cloud provider

Azure

OS version

Ubuntu 18.04

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@yangl900 yangl900 added the kind/bug Categorizes issue or PR as related to a bug. label Sep 8, 2022
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 8, 2022
@yangl900
Contributor Author

yangl900 commented Sep 8, 2022

/sig cloud-provider
/area provider/azure

@k8s-ci-robot k8s-ci-robot added sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. area/provider/azure Issues or PRs related to azure provider and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 8, 2022
@yangl900
Contributor Author

yangl900 commented Sep 8, 2022

@feiskyer @nilo19 could you please take a look and check whether my understanding is correct?

@Harsh0018511

Hi @yangl900
I am a newbie in this domain but would like to learn this topic and work on this issue. If possible, could you please point me to some useful resources so that I can get started?
/assign

@feiskyer
Member

We made that change because the LoadBalancers (both the external LB and the internal LB) are fully managed by the cloud provider. Refer to the documentation at https://cloud-provider-azure.sigs.k8s.io/topics/cross-resource-group-nodes/: "unmanaged nodes will not be part of the load balancer managed by cloud provider".

If putting those nodes into cloud-provider-managed LBs is required, is it possible to move them to managed nodes?

@yangl900
Contributor Author

@feiskyer I don't think there is any technical blocker for an "unmanaged" cloud node to join a load balancer. There are legitimate reasons why some nodes cannot be managed by AKS. For example, if someone wants to manage their own OS images, using unmanaged nodes is the only option.

@yangl900
Contributor Author

btw @feiskyer I understand the LB is managed by the cloud provider, and I don't expect the LB to reconcile the backend pool for these "unmanaged" nodes. But if these "unmanaged" or "externally managed" nodes were added to the backend pool externally, why does the cloud provider force-remove them?

For self-hosted k8s there is actually a workaround: if we configure the cloud provider to say "the LB is pre-configured", it will not manage the LB at all. But this workaround won't work for AKS.
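For reference, a minimal sketch of that workaround in the cloud provider config file (azure.json). I am assuming preConfiguredBackendPoolLoadBalancerTypes is the relevant knob here, so treat the field name and value as illustrative rather than authoritative:

{
  "cloud": "AzurePublicCloud",
  "loadBalancerSku": "standard",
  "preConfiguredBackendPoolLoadBalancerTypes": "all"
}

With something like this set, the provider treats the backend pools as externally managed and skips reconciling their membership.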

@nckturner
Contributor

/assign @bridgetkromhout

@nckturner
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 12, 2022
@yangl900
Contributor Author

yangl900 commented Dec 7, 2022

Hi @bridgetkromhout - any update on this issue? This blocks people from using "unmanaged" or self-managed nodes.

@bridgetkromhout
Member

I'm asking @feiskyer to weigh in - but as far as I can tell, the best option for self-managed nodes would be CAPZ - see https://capz.sigs.k8s.io/ for details.

@bridgetkromhout
Member

For clarity on when this bug was fixed: kubernetes-sigs/cloud-provider-azure#851

@feiskyer
Member

The original commit was introduced because it makes LB management much easier. There are two problems with allowing unmanaged Nodes in the LB:

  • The biggest issue with unmanaged Nodes is the mapping between the unmanaged Node name and the LB backend pool ipconfigurations. Without the correct mapping, we don't know the ownership of LB ipconfigurations.
  • NotReady Nodes are removed from the LB backend pool per cloud-provider/controllers/service/controller.go; that controller is cloud agnostic and knows nothing about Azure unmanaged Nodes (see the sketch after this list).
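For illustration, a minimal Go sketch (my approximation, not the actual controller.go code) of such a cloud-agnostic eligibility check; note it has no notion of the Azure-specific managed label:

package serviceutil

import v1 "k8s.io/api/core/v1"

// nodeEligibleForLB approximates the generic service controller's node
// filter: it honors the cross-provider exclude label and drops NotReady
// Nodes, but knows nothing about kubernetes.azure.com/managed.
func nodeEligibleForLB(node *v1.Node) bool {
	if _, ok := node.Labels["node.kubernetes.io/exclude-from-external-load-balancers"]; ok {
		return false
	}
	for _, cond := range node.Status.Conditions {
		// A NotReady Node is filtered out even if it is Azure-unmanaged.
		if cond.Type == v1.NodeReady && cond.Status != v1.ConditionTrue {
			return false
		}
	}
	return true
}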

As suggested above, is it possible to use CAPZ for the self-managed cluster?

@yywandb

yywandb commented Jan 9, 2023

hi @feiskyer , I'm working with @yangl900 on this.

Our use case is that we have an AKS cluster (managed by CAPZ) and we want to bring our own nodes (BYON) to that cluster so that we can use our custom image. CAPZ does not currently offer this functionality. The issue linked above, CAPZ #826, seems intended to address this.

@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jan 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 19, 2024