Azure: Existing load balancer is not looked up correctly #85805

Open
dkistner opened this issue Dec 2, 2019 · 2 comments

@dkistner dkistner commented Dec 2, 2019

What happened:
I have a Kubernetes cluster on Azure (v1.16.3) and I want to use an existing load balancer that is located in a different resource group than the other cluster resources.

I use the in-tree k8s cloud-controller-manager and configured it with the following flags:

/hyperkube cloud-controller-manager
  --cloud-provider=azure
  --cloud-config=<path-to-cloud-provider-config>
  ...

The passed cloud provider config contains the following fields:

cloud: AZUREPUBLICCLOUD
aadClientId: "<client-id>"
aadClientSecret: "<client-secret>"
tenantId: "<tenant-id>"
subscriptionId: "<subscription-id>"
location: "westeurope"
resourceGroup: "<cluster-resource-group>"
loadBalancerSku: "standard"
loadBalancerName: "testlb"
loadBalancerResourceGroup: "<external-resource-group>"
...

The existing load balancer uses the Standard SKU and is deployed into the same subscription and region as the other cluster resources. It has one frontend IP configuration assigned and no backend pool configured. The cluster nodes are plain virtual machines (no VMSS, not assigned to an availability set) distributed across availability zones. The configured user has sufficient privileges to access the resource group that contains the load balancer.

If I create a service of type LoadBalancer in this cluster, the public IP creation stays in the pending state forever.
For the service creation, the cloud-controller-manager logs contain an entry with the code InvalidResourceReference. It seems that the Azure provider does not honor the value of loadBalancerResourceGroup in the cloud provider config.

This is the log entry of the cloud controller manager:

E1202 15:09:32.896802       1 service_controller.go:255] error processing service <namespace>/<lb-service-name> (will retry): failed to ensure load balancer: Code="InvalidResourceReference" Message="Resource /subscriptions/<subscription-id>/resourceGroups/<CLUSTER-RESOURCE-GROUP>/providers/Microsoft.Network/loadBalancers/TESTLB referenced by resource /subscriptions/<subscription-id>/resourceGroups/<external-resource-group>/providers/Microsoft.Network/loadBalancers/testlb was not found. Please make sure that the referenced resource exists, and that both resources are in the same region." Details=[{"code":"NotFound","message":"Resource /subscriptions/<subscription-id>/resourceGroups/<CLUSTER-RESOURCE-GROUP>/providers/Microsoft.Network/loadBalancers/TESTLB not found."}]
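The mismatch in this log entry can be checked mechanically. The following is a minimal sketch, not part of the cloud provider: `resource_group_of` and `same_resource_group` are hypothetical helper names, and the two IDs are copied from the error message with their placeholders left as-is. It compares the resource group segments case-insensitively, since ARM treats resource group names as case-insensitive.

```python
import re

# Hypothetical helper pattern: matches the resource group segment of an ARM resource ID.
_RG_PATTERN = re.compile(r"/resourceGroups/([^/]+)/", re.IGNORECASE)


def resource_group_of(resource_id: str) -> str:
    """Return the resource group segment of an ARM resource ID."""
    match = _RG_PATTERN.search(resource_id)
    if match is None:
        raise ValueError(f"no resource group segment in {resource_id!r}")
    return match.group(1)


def same_resource_group(id_a: str, id_b: str) -> bool:
    """Compare two resource IDs by resource group, ignoring case."""
    return resource_group_of(id_a).casefold() == resource_group_of(id_b).casefold()


# IDs taken from the log entry above (placeholders unchanged).
referenced = (
    "/subscriptions/<subscription-id>/resourceGroups/<CLUSTER-RESOURCE-GROUP>"
    "/providers/Microsoft.Network/loadBalancers/TESTLB"
)
referencing = (
    "/subscriptions/<subscription-id>/resourceGroups/<external-resource-group>"
    "/providers/Microsoft.Network/loadBalancers/testlb"
)

# True here: the referenced ID points at the cluster resource group,
# while the load balancer itself lives in the external one.
mismatch = not same_resource_group(referenced, referencing)
print("resource groups differ:", mismatch)
```

The referenced resource is looked up in the cluster resource group even though the load balancer being updated is in the external one, which is consistent with the NotFound detail in the error.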

If I instead configure an existing load balancer that lives in the cluster resource group, it seems to work.

What you expected to happen:
The Azure cloud provider looks up the existing load balancer in the resource group configured via loadBalancerResourceGroup.
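As an illustration of the expected lookup, here is a minimal sketch of how the load balancer's resource ID would be composed from the config fields shown earlier. This is not the provider's actual code: `load_balancer_id` is a hypothetical helper, and the fallback to `resourceGroup` when `loadBalancerResourceGroup` is unset is an assumption made for the example.

```python
def load_balancer_id(config: dict) -> str:
    """Build the ARM ID of the configured load balancer (illustrative only)."""
    # Assumption: fall back to the cluster resource group when no
    # dedicated load balancer resource group is configured.
    resource_group = config.get("loadBalancerResourceGroup") or config["resourceGroup"]
    return (
        f"/subscriptions/{config['subscriptionId']}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/loadBalancers/{config['loadBalancerName']}"
    )


# Values mirror the cloud provider config above (placeholders unchanged).
config = {
    "subscriptionId": "<subscription-id>",
    "resourceGroup": "<cluster-resource-group>",
    "loadBalancerName": "testlb",
    "loadBalancerResourceGroup": "<external-resource-group>",
}

print(load_balancer_id(config))
```

With the config from this issue, every reference to the load balancer and its child resources would be expected to carry `<external-resource-group>` rather than the cluster resource group.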

Environment:

  • Kubernetes version (use kubectl version): v1.16.3
  • Cloud provider or hardware configuration: azure

/sig cloud-provider
/area provider/azure

@denniszielke denniszielke commented Dec 3, 2019

There seems to be a bug in the cloud provider.
In my setup the cluster is in CLUSTER_RG and the load balancer is in LB_RG, with the following in /etc/kubernetes/azure.json:

    "loadBalancerResourceGroup": "LB_RG",
    "loadBalancerName": "standardload",
    "resourceGroup": "CLUSTER_RG",

When provisioning a service, the requests sent to the ARM network resource provider show that the IDs for the outboundRules are calculated correctly:

{
  "outboundRules": [
    {
      "id": "/subscriptions/SUBSCRIPTION_ID/resourceGroups/LB_RG/providers/Microsoft.Network/loadBalancers/standardload/outboundRules/outbound"
    }
  ]
}

However, the IDs in the loadBalancingRules are calculated incorrectly: they do not respect the value of loadBalancerResourceGroup and are set to CLUSTER_RG instead:

{
  "loadBalancingRules": [
    {
      "name": "a06d540585dc74c988c16ae6d198de23-TCP-80",
      "properties": {
        "frontendIPConfiguration": {
          "id": "/subscriptions/SUBSCRIPTION_ID/resourceGroups/CLUSTER_RG/providers/Microsoft.Network/loadBalancers/standardload/frontendIPConfigurations/a06d540585dc74c988c16ae6d198de23"
        },
        "backendAddressPool": {
          "id": "/subscriptions/SUBSCRIPTION_ID/resourceGroups/CLUSTER_RG/providers/Microsoft.Network/loadBalancers/standardload/backendAddressPools/CLUSTER_RG"
        },
        "probe": {
          "id": "/subscriptions/SUBSCRIPTION_ID/resourceGroups/CLUSTER_RG/providers/Microsoft.Network/loadBalancers/standardload/probes/a06d540585dc74c988c16ae6d198de23-TCP-80"
        },
        "protocol": "Tcp",
        "loadDistribution": "Default",
        "frontendPort": 80,
        "backendPort": 80,
        "enableFloatingIP": true,
        "enableTcpReset": true,
        "disableOutboundSnat": false
      }
    }
  ]
}
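To spot such references without reading the payload by hand, a request body can be scanned for child IDs that point at the wrong resource group. This is a minimal sketch under stated assumptions: `find_foreign_ids` is a hypothetical helper, and the payload is a trimmed copy of the two fragments above that keeps only the ID-bearing fields.

```python
import re

_RG_PATTERN = re.compile(r"/resourceGroups/([^/]+)/", re.IGNORECASE)


def find_foreign_ids(node, expected_rg, path="$"):
    """Yield (json_path, id) for each "id" whose resource group is not expected_rg."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "id" and isinstance(value, str):
                match = _RG_PATTERN.search(value)
                # Resource group names are compared case-insensitively.
                if match and match.group(1).casefold() != expected_rg.casefold():
                    yield child, value
            else:
                yield from find_foreign_ids(value, expected_rg, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_foreign_ids(item, expected_rg, f"{path}[{index}]")


_LB = "/subscriptions/SUBSCRIPTION_ID/resourceGroups/{rg}/providers/Microsoft.Network/loadBalancers/standardload"
_RULE = "a06d540585dc74c988c16ae6d198de23"

# Trimmed copy of the request fragments shown above.
payload = {
    "outboundRules": [{"id": _LB.format(rg="LB_RG") + "/outboundRules/outbound"}],
    "loadBalancingRules": [
        {
            "name": _RULE + "-TCP-80",
            "properties": {
                "frontendIPConfiguration": {
                    "id": _LB.format(rg="CLUSTER_RG") + "/frontendIPConfigurations/" + _RULE
                },
                "backendAddressPool": {
                    "id": _LB.format(rg="CLUSTER_RG") + "/backendAddressPools/CLUSTER_RG"
                },
                "probe": {"id": _LB.format(rg="CLUSTER_RG") + "/probes/" + _RULE + "-TCP-80"},
            },
        }
    ],
}

for json_path, resource_id in find_foreign_ids(payload, "LB_RG"):
    print(json_path, "->", resource_id)
```

For this payload the scan reports the frontend IP configuration, backend address pool, and probe references of the load balancing rule, and leaves the outbound rule alone.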

@feiskyer feiskyer (Member) commented Dec 4, 2019

Thanks for reporting the issue. Confirmed, it's a bug in the Azure cloud provider.
/assign
