Using different network for loadbalancer #1558

Closed
ayetkin opened this issue May 18, 2023 · 5 comments

Labels
kind/bug Categorizes issue or PR as related to a bug.
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@ayetkin

ayetkin commented May 18, 2023

/kind bug

What steps did you take and what happened:
In our OpenStack environment, networks assigned to projects are announced via BGP, so we have direct access to the network we define for load balancers and instances. Therefore, in the OpenStackMachineTemplate.spec.template.spec.networks section we select the BGP-announced network with a network filter, so that we can reach the nodes directly without a bastion host or floating IPs. We give the same network in the OpenStackCluster.spec.network section. With this setup, however, the VIP that the load balancer receives is unfortunately not announced via BGP. We reported this to Octavia and Neutron by opening the following issues:

Octavia: https://storyboard.openstack.org/#!/story/2010758
Neutron: https://bugs.launchpad.net/neutron/+bug/2020001

To work around this on the Cluster API side, we give the load balancer a FIP network that we announce from L3 to the edge and set disableAPIServerFloatingIP: true. At this point the load balancer is created, but when the first control plane instance is added as a member, we get the following error:

Reconciling load balancer member failed: error create lbmember: Misssing input for argument [Address]

In further experiments we saw that whenever OpenStackCluster.spec.network and the network filter in OpenStackMachineTemplate.spec.template.spec.networks are not the same, we get this error.

We also tried to work around the BGP announcement problem we reported to Octavia and Neutron by having the load balancer VIP receive a FIP from the external network, but at that stage we got the following error:

External network 47906ae8-fb7f-4817-91db-7272174296ac is not reachable from subnet 3fcf3df4-0884-4af5-be8b-e38627afd3f5. Therefore, cannot associate Port 1eea4f1e-83da-4f56-bc1e-869e0ca09f08 with a Floating IP. Neutron server returns request_ids: ['req-ce9bd175-7740-4d1f-94ad-651898a0decd']

This error clearly says that, on the OpenStack side, the BGP network we gave to the instances and the external network we gave to the load balancer are not reachable from each other. Even if we connect the two networks through a router, we think the FIP assigned to the load balancer would still be unreachable because of the asymmetric route.

What did you expect to happen:
When two different networks are given to the instances and the load balancer on the CAPI OpenStack provider side, we think that the resulting Misssing input for argument [Address] error is a coding-related bug.
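
For reference, a minimal sketch of where a message of this shape can originate, assuming it comes from gophercloud's required-field validation firing because the member address ends up empty; the member name and port below are invented for illustration and this is not the actual CAPO code:

package main

import (
    "fmt"

    "github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/pools"
)

func main() {
    // Hypothetical member options with an empty Address, i.e. what would be
    // sent if no instance IP was found on the expected cluster network.
    opts := pools.CreateMemberOpts{
        Name:         "devops-k8s-test-control-plane-0", // made-up name
        Address:      "",                                // no IP found
        ProtocolPort: 6443,
    }

    // Building the request body validates required fields, so an empty
    // Address is rejected before any API call is made.
    _, err := opts.ToMemberCreateMap()
    fmt.Println(err) // prints: Missing input for argument [Address]
}

If that is indeed the code path, the real problem would be that an empty address is passed through in the first place.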

OpenStackCluster

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6
kind: OpenStackCluster
metadata:
  name: devops-k8s-test
spec:
  apiServerLoadBalancer:
    enabled: true
  cloudName: openstack
  dnsNameservers:
    - x.x.x.x
    - y.y.y.y
  externalNetworkId: xxxxxxxx-xxxxx-xxxxx-xxxxxx-xxxxx  # admin-fip-provider-net-01
  identityRef:
    kind: Secret
    name: "devops-k8s-test-cloud-config"
  managedSecurityGroups: false
  disableAPIServerFloatingIP: true
  network:
    name: admin-fip-provider-net-01

OpenStackMachineTemplate

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6
kind: OpenStackMachineTemplate
metadata:
  name: devops-k8s-test-control-plane
spec:
  template:
    spec:
      cloudName: openstack
      flavor: capi-controlplane-default
      identityRef:
        kind: Secret
        name: devops-k8s-test-cloud-config
      image: ubuntu-2004-kube-v1.25.8
      sshKeyName: iacops
      securityGroups:
        - name: default
      networks:
        - filter:
            name: devopstest-k8s-net
      rootVolume:
        diskSize: 60
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6
kind: OpenStackMachineTemplate
metadata:
  name: devops-k8s-test-worker-pool
spec:
  template:
    spec:
      cloudName: openstack
      flavor: capi-worker-small
      identityRef:
        kind: Secret
        name: devops-k8s-test-cloud-config
      image: ubuntu-2004-kube-v1.25.8
      sshKeyName: iacops
      securityGroups:
        - name: default
      networks:
        - filter:
            name: devopstest-k8s-net
      rootVolume:
        diskSize: 60

Environment:

  • Cluster API Provider OpenStack version: v0.7.1

  • Cluster-API version: v1.4.2

  • OpenStack version: Wallaby (cluster installed via kolla-ansible)

  • Kubernetes version (use kubectl version): v1.25.6

  • OS (e.g. from /etc/os-release): Ubuntu 20.04

  • OS Version: Ubuntu 20.04.2 LTS hosts (kernel: 5.4.0-90-generic)

  • Octavia Version: Wallaby - 8.0.1.dev35

  • Neutron Version: 18.1.2.dev118 [“neutron-server”, “neutron-dhcp-agent”, “neutron-openvswitch-agent”, “neutron-l3-agent”, “neutron-bgp-dragent”, “neutron-metadata-agent”]

  • There are 5 controller+network nodes.

  • Open vSwitch is used in DVR mode and router HA is disabled (l3_ha = false).

  • We are using a single centralized Neutron router to connect all tenant networks to the provider network.

  • We are using bgp_dragent to announce unique tenant networks.

  • Tenant network type: vxlan

  • External network type: vlan

@k8s-ci-robot k8s-ci-robot added the kind/bug label May 18, 2023
@mdbooth
Contributor

mdbooth commented May 18, 2023

A quick look at the code suggests that

Reconciling load balancer member failed: error create lbmember: Misssing input for argument [Address]

is due to

ip := instanceNS.IP(openStackCluster.Status.Network.Name)
loadbalancerService, err := loadbalancer.NewService(scope)

Here we are assuming that:

  • The server has an IP address on the cluster network
  • This IP address is the one we will use for the LB member

It looks like your workers don't have an interface on this network, though, which is why this is breaking. Can you think of a way to do what you need within these constraints?
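
For illustration, a minimal standalone sketch of that assumption (simplified, with invented names and an invented IP; not the actual CAPO code):

package main

import "fmt"

// firstIPOnNetwork mimics looking up an instance address by network name:
// if the machine has no port on that network, there is nothing to return.
func firstIPOnNetwork(addressesByNetwork map[string][]string, networkName string) string {
    if ips := addressesByNetwork[networkName]; len(ips) > 0 {
        return ips[0]
    }
    return "" // no interface on the cluster network
}

func main() {
    // This machine only has a port on the network selected by the template filter.
    instance := map[string][]string{
        "devopstest-k8s-net": {"10.0.0.12"}, // invented address
    }

    // The cluster network in status is admin-fip-provider-net-01, so the
    // lookup comes back empty and the LB member would be created with an
    // empty Address.
    fmt.Printf("%q\n", firstIPOnNetwork(instance, "admin-fip-provider-net-01")) // ""
}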

I would like to improve this situation, btw, but it's going to require architecture changes, possibly even outside the scope of just CAPO (i.e. a platform-independent API loadbalancer provider). This is an awesome write-up which I will try to ensure is re-used when we're looking at use cases, but at first glance I don't think we're going to be able to fix this quickly.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jan 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Feb 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Mar 20, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
