Azure LB Availability Set assumptions too restrictive #97375
Comments
/triage accepted
/assign
Notes on OpenShift's Azure infrastructure in ARM templates here, in case that helps clarify the infra we use today. Those are OpenShift 4.6 docs; 4.6 is based on Kubernetes 1.19, and things worked fine there. The trouble started when we took the same Azure-infra approach and tried to use it with Kubernetes 1.20, after openshift#471.
@wking could you share an example resource ID (please replace your subscription/resourceName with fake ones)?
An example CI run from the NIC-renaming openshift/installer#4490 is here; the installer logs there include the relevant resource IDs.
I'm hoping the subscription name is not sensitive, because we've been dumping them in public logs for a long time now 🤞. Those logs should also have IDs for any other resources you're interested in.
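For illustration only (these are placeholder values, not IDs from those logs), an Azure NIC IP-configuration resource ID generally follows this shape:

```
/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Network/networkInterfaces/example-master-0-nic/ipConfigurations/ipconfig1
```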
Looks like this is about the master nodes. Are you using a standard LB or a basic one? With a standard LB, master nodes would be excluded from the LB by default.
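For context, the LB SKU (and, with it, the default master-exclusion behavior) comes from the cloud-provider config. A minimal sketch of the relevant azure.json fields, with illustrative values (check the cloud-provider-azure docs for the authoritative field list):

```json
{
  "loadBalancerSku": "standard",
  "excludeMasterFromStandardLB": true
}
```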
Are you referring to the built-in logic in Kubernetes that avoids targeting master nodes with a Service of type LoadBalancer?
That used to be a problem (#65618), but my understanding is that it has since been fixed.
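As I understand the fix, exclusion is now opt-in via a well-known label rather than implied by the master role; a hypothetical example (the node name is a placeholder):

```
kubectl label node master-0 node.kubernetes.io/exclude-from-external-load-balancers=true
```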
Can we re-open this? kubernetes-sigs/cloud-provider-azure#443 was not sufficient to make the cloud provider happy on our infra, which, again, contains no Availability Sets. There are some more notes about our infrastructure here (with discussion in rhbz#1794839), in case that helps.
@staebler, who has a better grasp on this than I do, also just opened #97467. Fixing that broader issue might obsolete this ticket: "the cloud provider expects Availability Sets, which OpenShift infra lacks, in order to figure out which nodes to remove" doesn't matter if the cloud provider isn't removing any OpenShift nodes in the first place.
What happened: #96111 introduced a new dependency on Availability Sets that is too restrictive for some existing installations: https://github.com/kubernetes/kubernetes/pull/96111/files#diff-0414c3aba906b2c0cdb2f09da32bd45c6bf1df71cbb2fc55950743c99a4a5fe4R1071
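To illustrate the failure mode, here is a simplified Go sketch, not the actual cloud-provider code: the regexp and error text approximate what the linked diff does when it derives node names from NIC IP-configuration IDs, assuming an Availability-Set-style `-nic` naming convention:

```go
package main

import (
	"fmt"
	"regexp"
)

// Approximation of the pattern at issue: it only matches NICs whose names
// end in "-nic", a convention common for Availability Set VMs but not
// guaranteed on arbitrary infrastructure such as OpenShift's.
var nicIDRE = regexp.MustCompile(`(?i)/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft.Network/networkInterfaces/(.+)-nic/ipConfigurations/[^/]+`)

func nodeNameFromIPConfigID(ipConfigID string) (string, error) {
	matches := nicIDRE.FindStringSubmatch(ipConfigID)
	if len(matches) != 2 {
		// This is the branch our infra hits: the NIC name does not follow
		// the expected convention, so LB reconciliation fails outright.
		return "", fmt.Errorf("cannot extract node name from ipConfigurationID %q", ipConfigID)
	}
	return matches[1], nil
}

func main() {
	// Placeholder IDs, not real resources.
	fmt.Println(nodeNameFromIPConfigID("/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Network/networkInterfaces/master-0-nic/ipConfigurations/ipconfig1"))
	fmt.Println(nodeNameFromIPConfigID("/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Network/networkInterfaces/cluster-master-0/ipConfigurations/pipConfig"))
}
```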
What you expected to happen: The Azure cloud provider to successfully manage LBs, as it had been doing before. Instead, it died with:
How to reproduce it (as minimally and precisely as possible): Fails every time in OpenShift CI. Can you point us to documentation on the infrastructure the Azure cloud provider expects for LB management? https://kubernetes-sigs.github.io/cloud-provider-azure/topics/loadbalancer/#load-balancer-selection-modes says nothing about NIC names, gives no specifics on what things should look like without Availability Sets, and doesn't say whether "without Availability Sets" is unsupported (previous iterations of the cloud provider had no problem with our lack of Availability Sets).
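For reference, the "selection modes" on that page are chosen per Service via an annotation; an illustrative sketch (the annotation name is real, the Service itself is hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example
  annotations:
    # "__auto__" selects an LB automatically; a comma-separated list pins it.
    service.beta.kubernetes.io/azure-load-balancer-mode: "__auto__"
spec:
  type: LoadBalancer
  selector:
    app: example
  ports:
  - port: 80
```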
Documenting the cloud provider's assumptions about host hardware would also have helped avoid #97352; CC @nilo19
/sig cloud-provider
/area provider/azure