
[Feature] Expose VIP in the specified VLAN #1762

Closed
Tracked by #2134
janeczku opened this issue Jan 3, 2022 · 21 comments
Labels
area/load-balancer area/network kind/enhancement Issues that improve or augment existing functionality priority/0 Must be fixed in this release require/doc Improvements or additions to documentation require/HEP Require Harvester Enhancement Proposal PR require-ui/small estimate 1-2 working days

Comments

@janeczku
Contributor

janeczku commented Jan 3, 2022

Harvester 1.0.0
Rancher 2.6.3

Describe the bug

  1. Install Harvester on servers with two network interfaces, harvester-mgmt and harvester-vlan
  2. Configure the "Default Network Interface name of the VLAN network" in Harvester -> Settings and select harvester-vlan
  3. Create a VLAN network in Harvester, e.g. VLAN ID 65
  4. Configure a VIP pool with a network range within VLAN 65 in Harvester -> Settings, e.g. cidr-default: 10.65.0.0/24
  5. Create RKE2 guest cluster, selecting the VLAN network 65 and default namespace
  6. Create a Service of type LoadBalancer with IPAM set to Pool (see the example manifest below)
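
For reference, a minimal sketch of what step 6 can look like from inside the guest cluster; the service name, namespace, and app: nginx selector are placeholders, and the cloudprovider.harvesterhci.io/ipam annotation is an assumption about how Pool mode is selected, so check it against the Harvester cloud provider documentation for your version:

# Hypothetical Service of type LoadBalancer requesting an address from the VIP pool.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb                  # placeholder name
  namespace: default
  annotations:
    cloudprovider.harvesterhci.io/ipam: pool   # assumed annotation for Pool IPAM mode
spec:
  type: LoadBalancer
  selector:
    app: nginx                    # placeholder workload label
  ports:
    - port: 80
      targetPort: 80
EOF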

Expected Result:

Loadbalancer VIP is reachable from clients on the VLAN 65 network.

Actual Result:

Loadbalancer VIP is not reachable from the VLAN 65 network. The root cause appears to be that kube-vip is announcing the VIP on the wrong interface:

kubectl logs -n harvester-system kube-vip-9z9g4
level=info msg="Broadcasting ARP update for 10.65.0.172 (3c:ec:ef:2d:32:e4) via harvester-mgmt"
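
For anyone reproducing this, one way to confirm on which host interface the gratuitous ARP is actually emitted; the interface names and VIP below are taken from this report, and the VLAN side may be a sub-interface or bridge on your host:

# Listen for ARP frames mentioning the VIP on the management interface vs. the VLAN side.
tcpdump -nei harvester-mgmt arp and host 10.65.0.172
tcpdump -nei harvester-vlan arp and host 10.65.0.172
# With the behaviour described above, the announcements are expected to show up on harvester-mgmt only.
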
@janeczku janeczku added the kind/bug Issues that are defects reported by users or that we know have reached a real release label Jan 3, 2022
@janeczku
Contributor Author

janeczku commented Jan 3, 2022

Related discussion #1239 (comment)

@guangbochen guangbochen added this to the v1.0.1 milestone Jan 4, 2022
@guangbochen guangbochen added the priority/0 Must be fixed in this release label Jan 4, 2022
@yaocw2020
Contributor

As of version 1.0.0, the VIPs are only reachable directly in the LAN of the Harvester hosts, not across VLANs, so you need to add routes to provide reachability between the VLAN network and the LAN of the Harvester hosts.
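
Purely as an illustration of "add the routes", with made-up addresses: if the Harvester hosts' LAN were 192.168.100.0/24 and a router at 10.65.0.254 were reachable from VLAN 65 and connected to that LAN, a client in VLAN 65 could add something like:

# Hypothetical addresses for illustration only; adapt to your topology.
ip route add 192.168.100.0/24 via 10.65.0.254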

@janeczku
Contributor Author

janeczku commented Jan 5, 2022

@yaocw2020 We are not trying to reach the VIP across VLANs. We just expect the VIP to be reachable from within the VLAN attached to the VMs/cluster. There is no other local network besides the harvester-mgmt network. We are following the recommendation of using a trunk port on the switch.

Could you please describe the Harvester configuration / network architecture required to announce the VIP on a given VLAN available on the switch?

@yaocw2020
Contributor

yaocw2020 commented Jan 5, 2022

kube-vip sends the gratuitous ARP without a VLAN tag from the VIP interface. The ARP is broadcast in the LAN of the Harvester hosts, not in any VLAN network, because it has no VLAN tag, which means the VIPs are only reachable directly in the LAN of the Harvester hosts. This does not seem to be related to the network architecture.

@janeczku
Contributor Author

janeczku commented Jan 7, 2022

@yaocw2020
So what is the solution to make the VIP available in the guest cluster's VLAN network? Announcing the VIP in the management network is not compatible with enterprise requirements for network segmentation.

I tested configuring a VIP pool in the harvester-mgmt network range, and this resulted in unstable connections: #1790

@janeczku
Contributor Author

janeczku commented Jan 7, 2022

Additional information:

Loadbalancer VIP is not reachable from other hosts in the VLAN network:

ubuntu:~$ ping 10.65.0.226
PING 10.65.0.226 (10.65.0.226) 56(84) bytes of data.
^C
--- 10.65.0.226 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3055ms

ubuntu:~$ arping 10.65.0.226
ARPING 10.65.0.226 from 10.65.0.186 enp1s0
^CSent 5 probes (5 broadcast(s))
Received 0 response(s)

@abonillabeeche

@yaocw2020 it sounds like this request may add complexity to kube-vip and may need a separate kube-vip infrastructure to support this use case, which is absolutely necessary.
I.e., an RKE2 cluster is deployed on network vlan65 and should solely expose access to that app via the LB on that VLAN.

@yaocw2020
Contributor

@yaocw2020 it sounds like this request may add complexity to kube-vip and may need a separate kube-vip infrastructure to support this use case, which is absolutely necessary. I.e., an RKE2 cluster is deployed on network vlan65 and should solely expose access to that app via the LB on that VLAN.

How about adding kube-vip to the Rancher marketplace or embedding it into RKE2? A kube-vip deployed in the guest cluster would allow users to expose the app via the LB on the VLAN without a route from the Harvester hosts to the VLAN network.
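
For reference, a rough sketch of what deploying kube-vip inside the guest cluster could look like, following kube-vip's manifest-generation approach; the interface name (eth0), image tag, and exact flags are assumptions and should be verified against the kube-vip documentation for the version you choose:

# On a guest cluster node: generate a kube-vip DaemonSet manifest in ARP mode that
# announces Service addresses on the node's VLAN-facing interface (name assumed).
# kube-vip's docs also require its RBAC manifest before applying the DaemonSet.
kubectl apply -f https://kube-vip.io/manifests/rbac.yaml

export KVVERSION=v0.6.0   # assumed tag; pick the release you want
ctr image pull ghcr.io/kube-vip/kube-vip:$KVVERSION
ctr run --rm --net-host ghcr.io/kube-vip/kube-vip:$KVVERSION vip \
    /kube-vip manifest daemonset \
    --interface eth0 \
    --services \
    --arp \
    --inCluster > kube-vip-ds.yaml

kubectl apply -f kube-vip-ds.yaml
# Note: allocating addresses for Services usually also needs the kube-vip cloud
# provider or explicit loadBalancerIP values; see the kube-vip docs.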

@abonillabeeche

Does this belong into https://github.com/harvester/load-balancer-harvester ?

@abonillabeeche

@yaocw2020 it sounds like this request may add complexity to kube-vip and may need a separate kube-vip infrastructure to support this use case, which is absolutely necessary. I.e., an RKE2 cluster is deployed on network vlan65 and should solely expose access to that app via the LB on that VLAN.

How about adding kube-vip to the Rancher marketplace or embedding it into RKE2? A kube-vip deployed in the guest cluster would allow users to expose the app via the LB on the VLAN without a route from the Harvester hosts to the VLAN network.

Unfortunately, that approach moves us away from the "Cloud Provider" experience.

@yaocw2020
Contributor

yaocw2020 commented Jan 12, 2022

Because the address of the Harvester load balancer is exposed by ARP broadcasting in the LAN of the Harvester hosts, if we want to access it from the VLAN network, we have to provide the network path client in the VLAN network --> Harvester hosts --> guest cluster in the VLAN network. Configuring the network of the client and the guest cluster may be out of the scope of Harvester.

@yaocw2020 yaocw2020 changed the title [BUG] Guest cluster VIP is not working [Feature] Expose VIP in the specified VLAN Jan 20, 2022
@yaocw2020 yaocw2020 added area/load-balancer kind/enhancement Issues that improve or augment existing functionality and removed kind/bug Issues that are defects reported by users or that we know have reached a real release labels Jan 20, 2022
@yaocw2020
Contributor

yaocw2020 commented Jan 20, 2022

@janeczku @abonillabeeche I have converted this issue to a feature instead of a bug.

@harvesterhci-io-github-bot

Pre Ready-For-Testing Checklist

  • If labeled: require/HEP Has the Harvester Enhancement Proposal PR submitted?
    The HEP PR is at:

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Has the backend code been merged (harvester, harvester-installer, etc.) (including backport-needed/*)?
    The PR is at:

    • Does the PR include the explanation for the fix or the feature?

    • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
      The PR for the YAML change is at:
      The PR for the chart change is at:

  • If labeled: area/ui Has the UI issue filed or ready to be merged?
    The UI issue/PR is at:

  • If labeled: require/doc, require/knowledge-base Has the necessary document PR submitted or merged?
    The documentation/KB PR is at:

  • If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?

    • The automation skeleton PR is at:
    • The automation test case PR is at:
  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at:

@harvesterhci-io-github-bot

Automation e2e test issue: harvester/tests#852

@irishgordo irishgordo self-assigned this Jun 8, 2023
@irishgordo

Validation issue
@yaocw2020
I'm noticing with v1.2.0-rc1 and Rancher v2.7.4 that the Load Balancer seems to be hung in a Pending state.

Screenshot from 2023-06-08 17-58-06

And what would be the best test plan to follow for the validation?

@TachunLin

TachunLin commented Jun 9, 2023

Test Plan

After discussion, we came up with the following test plan.

Test Path

  • For external Rancher integration
  • For embedded Rancher integration

Test environment

  • Prepare an external VLAN network that can retrieve an external IP address (e.g. vlan1216, 10.84.99.x)
  • Create the external VLAN network

Test steps (External Rancher)

  1. Import Harvester from external Rancher
  2. Create cloud credential in Rancher
  3. Provision an RKE2 guest cluster rke2 and use vlan network vlan1216
  4. Update the ui-dashboard-index to https://harvester-dev.oss-cn-hangzhou.aliyuncs.com/support-mcm-mode/index.html in Rancher global setting
  5. Set the source to External
  6. Access Harvester in Rancher virtualization management
  7. Create an IP pool in Harvester
  8. On the selector page, set the Project and Namespace to the correct values and select rke2 as the guest Kubernetes cluster
  9. Access RKE2 guest cluster in Rancher
  10. Create an nginx deployment
  11. Create a load balancer in Services
  12. Select Pool as the ipam mode
  13. Check the load balancer status content in yaml
  14. Get the allocated VIP address information of load balancer
  15. Create another VM vm2 on Harvester using the same vlan network vlan1216
  16. ssh to vm2
  17. Ping the VIP address (see the example commands after this list)
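
A hedged sketch of the checks in steps 13-17, assuming the Service lives in the default namespace of the guest cluster and that vm2 is attached to vlan1216:

# In the RKE2 guest cluster: the Service should receive an EXTERNAL-IP from the pool (step 14).
kubectl get svc -n default

# On the Harvester cluster: inspect the LoadBalancer CR created by the cloud provider
# (resource name assumed from the LoadBalancer kind/apiVersion shown later in this issue).
kubectl get loadbalancers.loadbalancer.harvesterhci.io -A -o yaml

# From vm2 on vlan1216 (steps 15-17): the allocated VIP should answer.
ping -c 4 <allocated-VIP>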

Test steps (Embedded Rancher)

  1. Enable rancher-manager-support in settings
  2. Create a cloud credential and select local
  3. Provision an RKE2 guest cluster rke2 and use vlan network vlan1216
  4. Create an IP pool in Harvester
  5. On the selector page, select rke2 as the guest Kubernetes cluster
  6. Access RKE2 guest cluster in Rancher
  7. Create a load balancer in Services
  8. Select Pool as the ipam mode
  9. Check the load balancer status content in yaml
  10. Get the allocated VIP address information of load balancer
  11. Create another VM vm2 on Harvester using the same vlan network vlan1216
  12. ssh to vm2
  13. Ping the VIP address

@TachunLin

TachunLin commented Jun 12, 2023

  • When we select Project Default and Namespace All, the corresponding guest Kubernetes cluster list does not filter accordingly
    (In this case, we would expect the Guest Kubernetes cluster field to display None)
    image
    image

  • And when we select the correct Project, Namespace, and Kubernetes cluster
    image

  • The load balancer on Rancher would keep in Pending state
    image

  • We need to set project, namespace, and Kubernetes cluster to All to make the load balancer work

Test Information

We suggest enhancing the selector display relationship filter and the binding mechanism.
After discussion, this was moved to implementation for the follow-up release candidate testing.

@iosifnicolae2

iosifnicolae2 commented Jun 21, 2023

We also need this functionality.

This is our use-case:

  • we have one NIC with two interfaces (mgmt-br has internet access and mgmt-br.4000 is the internal network)
    by default Harvester creates a default route to mgmt-br.4000 -> everything works great, but we don’t have internet access
  • in order to have internet access, I've configured DHCP on mgmt-br and then I've created a route: default via 144.76.X.X dev mgmt-br proto dhcp

The problem:

  • we're getting this error in the harvester pod: Failed to dial steve aggregation server: dial tcp 10.53.44.90:443: i/o timeout
  • kube-vip allocates the VIP on mgmt-br instead of mgmt-br.4000 (#1762)

Possible cause:

  • updating the default route to another interface breaks Harvester networking

Any ideas on how we can have internet access on the node? Unfortunately, it's not possible to have internet on mgmt-br.4000 due to some limitations of Hetzner.

What I've tried:

  • configure kube-vip by specifying .spec.values.kube-vip.env.vip_interface = mgmt-br.4000 - the VIP is still deployed on the mgmt-br interface (most likely the interface is picked based on the default IP route) - it doesn't work

Context

  • the internal IP of the node is identified incorrectly; most likely flannel uses the first interface to identify the node's internal IP and, by doing so, it gets a public IP
kubectl get node -o wide
NAME                 STATUS   ROLES                       AGE   VERSION           INTERNAL-IP      EXTERNAL-IP   OS-IMAGE           KERNEL-VERSION                 CONTAINER-RUNTIME
dedicated-server-1   Ready    control-plane,etcd,master   46h   v1.24.11+rke2r1   144.76.X.X   <none>        Harvester v1.1.2   5.14.21-150400.24.60-default   containerd://1.6.15-k3s1

Solution

We've managed to fix the issue by adding:

    - "content": |
        node-ip: 10.1.1.10
        node-external-ip: 144.76.XX.XX
      "encoding": ""
      "group": 0
      "owner": 0
      "ownerstring": ""
      "path": "/etc/rancher/rke2/config.yaml.d/90-harvester-network.yaml"
      "permissions": 384

@yaocw2020
Contributor

@iosifnicolae2 You are right. Modifying the default route will break the whole Harvester network.
In your case, I suggest using another NIC to access the outside network.
Moreover, if you want to access the load balancer IP in the guest cluster, we made an enhancement in Harvester v1.2.0: you can create a guest RKE2 cluster in another accessible VLAN (including VLAN 1), and the load balancer IP will be exposed in the same VLAN network.

@TachunLin

TachunLin commented Jun 28, 2023

Verified as fixed on v1.2.0-rc2 with Rancher v2.7.5-rc5. Closing this issue.

Result

  • We can create a load balancer using Pool mode with a Harvester IP pool, and it runs correctly.
    image

  • The load balancer can retrieve a VIP on the external VLAN network and can be reached from other VM hosts.
    image

  • The correct values of project, namespace, network, and cluster are added to the load balancer YAML content:

    apiVersion: loadbalancer.harvesterhci.io/v1beta1
    kind: LoadBalancer
    metadata:
      annotations:
        loadbalancer.harvesterhci.io/cluster: rke2-v1265
        loadbalancer.harvesterhci.io/namespace: default
        loadbalancer.harvesterhci.io/network: default/vlan1216
        loadbalancer.harvesterhci.io/project: local/p-z7l54
      creationTimestamp: '2023-06-28T08:34:49Z'
      finalizers:
        - wrangler.cattle.io/harvester-lb-controller
      generation: 2
      labels:
        cloudprovider.harvesterhci.io/cluster: rke2-v1265
        cloudprovider.harvesterhci.io/serviceName: lb-pool3
        cloudprovider.harvesterhci.io/serviceNamespace: default
     ...
    status:
      address: 10.84.99.1
      allocatedAddress:
        gateway: 10.84.99.254
        ip: 10.84.99.1
        ipPool: pool1
        mask: 255.255.255.0
      conditions:
        - lastUpdateTime: '2023-06-28T08:34:49Z'
          status: 'True'
          type: Ready
    
  • When we select Project: Default and Namespace: All, the corresponding guest Kubernetes cluster is correctly filtered to None
    image

Test Information

  • Test Environment: 4 nodes Harvester bare machines
  • Harvester version: v1.2.0-rc2
  • Rancher version: v2.7.5-rc5
  • cloud-provider version: 0.2.0

Verify Steps

  1. Import Harvester from external Rancher

  2. Create cloud credential in Rancher

  3. Provision an RKE2 guest cluster rke2 and use vlan network vlan1216

  4. Update the ui-dashboard-index to https://harvester-dev.oss-cn-hangzhou.aliyuncs.com/support-mcm-mode/index.html in Rancher global setting

  5. Set the ui-offline-preferred to Remote

  6. Access Harvester in Rancher virtualization management

  7. Create an IP pool in Harvester

  8. On the selector page, set the Project, Namespace, and Guest Kubernetes cluster to the correct values

  9. Select the correct VLAN network in the Network field (e.g. default/vlan1216)
    image

  10. Access RKE2 guest cluster in Rancher

  11. Ensure the cloud provider version is greater than 0.2.0

  12. If not, please follow the section below to update the cloud provider to the latest version

  13. Create an nginx deployment

  14. Create a load balancer in Services

  15. Select Pool as the ipam mode

  16. Check the load balancer status content in yaml

  17. Get the allocated VIP address information of load balancer

  18. Create another VM vm2 on Harvester using the same vlan network vlan1216

  19. ssh to vm2

  20. Ping the VIP address

Steps to update cloud-provider version

  1. Access rke2 cluster in Rancher
  2. Get the kubeconfig of the rke2 guest cluster
  3. Copy the content and replace it in your local kubeconfig (e.g. ~/.kube/config)
  4. Add harvester helm repo
     helm repo add harvester https://charts.harvesterhci.io/
  5. Update helm repo
     helm repo update
  6. Update cloud provider version (replace cluster name with your rke2 guest cluster)
     helm upgrade harvester-cloud-provider harvester/harvester-cloud-provider -n kube-system --set cloudConfigPath=/var/lib/rancher/rke2/etc/config-files/cloud-provider-config --set global.cattle.clusterName=<cluster-name>
  7. Go to Workload -> Deployments

  8. Remove the previous version of cloud provider
    image

  9. Wait until the new cloud provider is in Active state
    image

Additional Context

Please ensure cloud provider version >= 0.2.0
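
One way to confirm the deployed chart version, assuming the cloud provider was installed into kube-system as in the upgrade steps above:

# Shows the harvester-cloud-provider release with its chart and app version.
helm list -n kube-system --filter harvester-cloud-provider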

@LucasSaintarbor
Contributor

Hi @yaocw2020 @n313893254 @TachunLin, just following up on any issue with a require/doc label for the 1.2.0 release. Does this issue require any of the following?

  • New documentation/KB
  • Updated documentation/KB
  • Release note
  • None of the above

If it does, could you please open a PR or share details on what needs to be created or updated?
Thanks!
