
Load balancer behavior changed for unmanaged nodes #112313

Open
yangl900 opened this issue Sep 8, 2022 · 16 comments
Assignees
Labels
  • area/provider/azure: Issues or PRs related to azure provider
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/cloud-provider: Categorizes an issue or PR as relevant to SIG Cloud Provider.

Comments

@yangl900
Contributor

yangl900 commented Sep 8, 2022

What happened?

Before commit aecaac0, the LB behavior for unmanaged nodes (labeled with kubernetes.azure.com/managed: false) was:

  • the cloud provider ignored the VMSS and VMSS VMs when ensuring hosts in the pool
  • if the VMSS had already joined the LB backend pool, it remained in the pool

So unmanaged nodes were simply ignored by cloud provider operations, which makes sense because they are "unmanaged". However, starting from commit aecaac0, the cloud provider reconciles unmanaged VMs and removes them from the LB.

I believe the original intention was to remove nodes carrying the "node.kubernetes.io/exclude-from-external-load-balancers" label; however, ShouldNodeExcludedFromLoadBalancer() matches both unmanaged nodes and exclude-lb nodes (see the sketch below).
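For reference, here is a minimal Go sketch of the check as described above. It is my simplified approximation (the helper name is lower-cased to signal it is not the exact upstream code), not the actual implementation:

package azureprovider

import (
	"strings"

	v1 "k8s.io/api/core/v1"
)

// shouldNodeExcludedFromLoadBalancer sketches the exclusion check described
// above: it conflates the explicit opt-out label with the unmanaged label.
func shouldNodeExcludedFromLoadBalancer(node *v1.Node) bool {
	// Intended case: the node explicitly opted out of external LBs.
	if _, ok := node.Labels["node.kubernetes.io/exclude-from-external-load-balancers"]; ok {
		return true
	}
	// Side effect: unmanaged nodes also match, so reconciliation removes
	// them from the backend pool instead of ignoring them.
	if v, ok := node.Labels["kubernetes.azure.com/managed"]; ok && strings.EqualFold(v, "false") {
		return true
	}
	return false
}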

This caused a behavior change starting from v1.20.15, because the commit was cherry-picked into the 1.20 branch. I don't think we should force unmanaged nodes to be removed from the LB. They are "unmanaged", so why should we manage them?

What did you expect to happen?

Nodes with the label kubernetes.azure.com/managed: false should not be forcibly removed from LB backend pools.

How can we reproduce it (as minimally and precisely as possible)?

Label a node with kubernetes.azure.com/managed: false and observe that it gets removed from the LB.
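For example (using a placeholder node name, in the template's command style):

$ kubectl label node <node-name> kubernetes.azure.com/managed=false

On the next reconciliation, the node's ipconfiguration disappears from the LB backend pool.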

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
v1.20.15

Cloud provider

Azure

OS version

Ubuntu 18.04

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@yangl900 yangl900 added the kind/bug Categorizes issue or PR as related to a bug. label Sep 8, 2022
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 8, 2022
@yangl900
Contributor Author

yangl900 commented Sep 8, 2022

/sig cloud-provider
/area provider/azure

@k8s-ci-robot k8s-ci-robot added sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. area/provider/azure Issues or PRs related to azure provider and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 8, 2022
@yangl900
Contributor Author

yangl900 commented Sep 8, 2022

@feiskyer @nilo19 could you please take a look and check whether my understanding is correct?

@Harsh0018511

Hi @yangl900
I am a newbie in this domain but would like to learn this topic and work on this issue. If possible, could you please point me to some useful resources so that I can get started?
/assign

@feiskyer
Member

We made that change because the LoadBalancers (both the external LB and the internal LB) are fully managed by the cloud provider. Refer to the documentation at https://cloud-provider-azure.sigs.k8s.io/topics/cross-resource-group-nodes/: "unmanaged nodes will not be part of the load balancer managed by cloud provider".

If putting those nodes into cloud-provider-managed LBs is required, is it possible to move them to managed nodes?

@yangl900
Contributor Author

@feiskyer I don't think there is any technical blocker for an "unmanaged" cloud node to join a load balancer. There are legitimate reasons why some nodes cannot be managed by AKS. For example, if someone wants to manage their own OS images, using unmanaged nodes is the only option.

@yangl900
Contributor Author

btw @feiskyer I understand the LB is managed by the cloud provider, and I don't expect the LB to reconcile the backend pool for these "unmanaged" nodes. But if these "unmanaged" or "externally managed" nodes were added to the backend pool externally, why does the cloud provider force-remove them?

For self-hosted k8s there is actually a workaround: if we configure the cloud provider to say "the LB is pre-configured", it will not manage the LB at all. But this workaround won't work for AKS.
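For reference, a minimal sketch of that workaround in the cloud provider config file (azure.json). I am assuming preConfiguredBackendPoolLoadBalancerTypes is the relevant knob here, so treat the field name and value as illustrative rather than authoritative:

{
  "cloud": "AzurePublicCloud",
  "loadBalancerSku": "standard",
  "preConfiguredBackendPoolLoadBalancerTypes": "all"
}

With something like this set, the provider treats the backend pools as externally managed and skips reconciling their membership.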

@nckturner
Contributor

/assign @bridgetkromhout

@nckturner
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 12, 2022
@yangl900
Contributor Author

yangl900 commented Dec 7, 2022

Hi @bridgetkromhout - any update on this issue? This blocks people from using "unmanaged" or self-managed nodes.

@bridgetkromhout
Member

I'm asking @feiskyer to weigh in - but as far as I can tell, the best option for self-managed nodes would be CAPZ - see https://capz.sigs.k8s.io/ for details.

@bridgetkromhout
Member

For clarity on when this bug was fixed: kubernetes-sigs/cloud-provider-azure#851

@feiskyer
Member

The original commit was introduced because it makes LB management much easier. There are two problems with allowing unmanaged Nodes in the LB:

  • The biggest issue with unmanaged Nodes is the mapping between the unmanaged Node name and the LB backend pool ipconfigurations. Without the correct mapping, we don't know the ownership of LB ipconfigurations.
  • NotReady Nodes are removed from the LB backend pool per cloud-provider/controllers/service/controller.go; that controller is cloud agnostic and knows nothing about Azure unmanaged Nodes (see the sketch after this list).
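For illustration, a minimal Go sketch (my approximation, not the actual controller.go code) of such a cloud-agnostic eligibility check; note it has no notion of the Azure-specific managed label:

package serviceutil

import v1 "k8s.io/api/core/v1"

// nodeEligibleForLB approximates the generic service controller's node
// filter: it honors the cross-provider exclude label and drops NotReady
// Nodes, but knows nothing about kubernetes.azure.com/managed.
func nodeEligibleForLB(node *v1.Node) bool {
	if _, ok := node.Labels["node.kubernetes.io/exclude-from-external-load-balancers"]; ok {
		return false
	}
	for _, cond := range node.Status.Conditions {
		// A NotReady Node is filtered out even if it is Azure-unmanaged.
		if cond.Type == v1.NodeReady && cond.Status != v1.ConditionTrue {
			return false
		}
	}
	return true
}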

As suggested above, is it possible to use CAPZ for the self-managed cluster?

@yywandb

yywandb commented Jan 9, 2023

hi @feiskyer , I'm working with @yangl900 on this.

Our use case is that we have an AKS cluster (managed by CAPZ) and we want to bring our own nodes (BYON) to that cluster so that we can use our custom image. CAPZ does not currently offer this functionality. The issue linked above, CAPZ #826, seems intended to address this.

@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jan 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 19, 2024