Reuse instance-groups for internal load balancers #313

Closed
wants to merge 1 commit

Conversation

Fedosin
Contributor

@Fedosin Fedosin commented Jan 26, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

Based on the docs for internal load balancers [1], backend services [2], and instances in instance groups [3], the following restrictions apply:

  • An internal LB can load balance to VMs in the same region, but in different subnets
  • Instance groups for the backend service must contain instances from the same subnet
  • An instance can only belong to one load-balanced instance group
  • It is a useful use case for the cluster's nodes to belong to more than one subnet, but the current setup fails to create an internal load balancer with nodes in multiple subnets:

I1023 22:05:24.070949       1 gce_loadbalancer_internal.go:478] ensureInternalInstanceGroup(k8s-ig--27083f8254ed83c2, us-west1-b): adding nodes: [jstuev-5hzjp-m-1.c.openshift-dev-installer.internal jstuev-5hzjp-w-b-54qkc.c.openshift-dev-installer.internal]
E1023 22:05:25.385077       1 gce_loadbalancer.go:156] Failed to EnsureLoadBalancer(jstuev-5hzjp, openshift-ingress, router-default, a79c0f796db9e4157af2e2658433b3f6, us-west1), err: googleapi: Error 400: Resource 'projects/openshift-dev-installer/zones/us-west1-b/instances/jstuev-5hzjp-w-b-54qkc' is expected to be in the subnetwork 'projects/openshift-dev-installer/regions/us-west1/subnetworks/jstubyo-master-subnet' but is in the subnetwork 'projects/openshift-dev-installer/regions/us-west1/subnetworks/jstubyo-worker-subnet'., wrongSubnetwork

The use case where some of these nodes (machines) are part of an internal load balancer that is not managed by Kubernetes is also valid:
for example, the machines hosting the control plane (kube-apiserver) may need to belong to a separate ILB that provides access to the apiserver through a load balancer not managed by a Kubernetes Service of type LoadBalancer.
The current setup fails to create an internal load balancer in that case:

r: failed to ensure load balancer: googleapi: Error 400: INSTANCE_IN_MULTIPLE_LOAD_BALANCED_IGS - Validation failed for instance 'projects/openshift-dev-installer/zones/us-west1-a/instances/jstuev-t285j-m-0': instance may belong to at most one load-balanced instance group.

Ideally the subnet limitation would be handled automatically by the k8s cloud provider, but for now, allowing users to pre-create the instance groups for instances that require this special setup should definitely help: the cloud provider can use those groups as-is, while still maintaining the membership and lifecycle of the groups it creates itself.
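
For illustration only (not part of this PR), the sketch below shows how an operator might pre-create such a per-subnet unmanaged instance group through the GCE API before the cloud provider runs. The project, zone, group, and instance names are made-up placeholders; the idea is that the cloud provider could then reuse a group like this as-is.

```go
package main

import (
	"context"
	"log"

	compute "google.golang.org/api/compute/v1"
)

func main() {
	ctx := context.Background()
	// Uses Application Default Credentials for the GCE project.
	svc, err := compute.NewService(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical names: one unmanaged instance group per subnet/zone.
	project, zone, group := "my-project", "us-west1-b", "worker-subnet-ig"

	// Create the empty unmanaged instance group in the zone.
	if _, err := svc.InstanceGroups.Insert(project, zone, &compute.InstanceGroup{Name: group}).Do(); err != nil {
		log.Fatal(err)
	}

	// Add only the instances that live in the worker subnet.
	req := &compute.InstanceGroupsAddInstancesRequest{
		Instances: []*compute.InstanceReference{
			{Instance: "projects/my-project/zones/us-west1-b/instances/worker-0"},
		},
	}
	if _, err := svc.InstanceGroups.AddInstances(project, zone, group, req).Do(); err != nil {
		log.Fatal(err)
	}
}
```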

This change finds pre-existing instance groups that ONLY contain instances belonging to the cluster and uses them for the backend service; it ensures provider-managed instance groups only for the remaining nodes.
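
As a rough sketch of the selection described above (illustrative only, not the PR's actual code; the function name is hypothetical and k8s.io/apimachinery string sets are assumed), the reuse decision could look like:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/sets"
)

// splitInstanceGroups returns the pre-existing groups whose members are all
// cluster nodes (safe to reuse for the backend service) and the nodes that
// still need a provider-managed instance group.
func splitInstanceGroups(existing map[string]sets.String, clusterNodes sets.String) (reuse []string, remaining sets.String) {
	remaining = sets.NewString(clusterNodes.UnsortedList()...)
	for name, members := range existing {
		// Reuse a group only if every instance in it belongs to the cluster.
		if members.Len() > 0 && clusterNodes.IsSuperset(members) {
			reuse = append(reuse, name)
			remaining = remaining.Difference(members)
		}
	}
	return reuse, remaining
}

func main() {
	existing := map[string]sets.String{
		"user-created-worker-ig": sets.NewString("worker-0", "worker-1"),
		"unrelated-ig":           sets.NewString("some-other-vm"),
	}
	nodes := sets.NewString("master-0", "worker-0", "worker-1")

	reuse, remaining := splitInstanceGroups(existing, nodes)
	fmt.Println("reuse as-is:", reuse)
	fmt.Println("still need a provider-managed group for:", remaining.List())
}
```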

Does this PR introduce a user-facing change?:

Reuse pre-existing instance groups that contain cluster nodes as backends for internal load balancers

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

The work is based on kubernetes/kubernetes#84466

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 26, 2022
@Fedosin Fedosin force-pushed the internal_lb branch 4 times, most recently from e556732 to 9f6ed06 Compare January 28, 2022 15:58
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 28, 2022
Based on the docs for internal load balancers [1], backend services [2]
and instances in instance groups [3], the following restrictions apply:

- An internal LB can load balance to VMs in the same region, but in
  different subnets
- Instance groups for the backend service must contain instances from
  the same subnet
- An instance can only belong to one load-balanced instance group
- It is a useful use case for the cluster's nodes to belong to more
  than one subnet, but the current setup fails to create an internal
  load balancer with nodes in multiple subnets.

This change finds pre-existing instance groups that ONLY contain
instances belonging to the cluster and uses them for the backend
service; it ensures provider-managed instance groups only for the
remaining nodes.

[1] https://cloud.google.com/load-balancing/docs/internal
[2] https://cloud.google.com/load-balancing/docs/backend-service#restrictions_and_guidance
[3] https://cloud.google.com/compute/docs/instance-groups/creating-groups-of-unmanaged-instances#addinstances

Co-authored-by: Abhinav Dahiya <abhinav.dahiya@redhat.com>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 28, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Fedosin
To complete the pull request process, please assign saad-ali after the PR has been reviewed.
You can assign the PR to them by writing /assign @saad-ali in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Fedosin Fedosin closed this Apr 19, 2022
@jprzychodzen
Contributor

@Fedosin Is it still needed? If yes, could you reopen?

@JoelSpeed

@Fedosin I believe this bug is still relevant? Can we get it reopened please?
