Delay in processing HNS LB policies on kube-proxy start on Windows nodes results in unreachable services #109162
Labels
kind/bug
Categorizes issue or PR as related to a bug.
sig/network
Categorizes an issue or PR as relevant to SIG Network.
sig/windows
Categorizes an issue or PR as relevant to SIG Windows.
triage/accepted
Indicates an issue or PR is ready to be actively worked on.
What happened?
When Windows nodes start on a cluster with a high number of HNS LB policies/rules, there is a delay in processing them. Services remain unreachable during this delay, which takes about half a minute per policy and can be substantial given enough rules.
This occurs both when restarting kube-proxy and when rebooting the host. Once the system reaches a state where all the policylists are processed, incremental updates to the services (i.e., endpoint changes) are handled fine.
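To put the per-policy figure in perspective, here is a rough back-of-the-envelope sketch of the total delay. It assumes policies are processed one after another at a flat 30 seconds each, which is only an approximation of the "about half a minute per policy" observation above. The function name and the policy counts are made up for illustration.

```python
from datetime import timedelta

# Assumption: a flat 30 s per policy, taken from the "about half a minute"
# observation in this report; the real per-policy cost may vary.
SECONDS_PER_POLICY = 30


def estimated_delay(policy_count: int,
                    seconds_per_policy: float = SECONDS_PER_POLICY) -> timedelta:
    """Estimate the total delay, assuming policies are processed sequentially."""
    if policy_count < 0:
        raise ValueError("policy_count must be non-negative")
    return timedelta(seconds=policy_count * seconds_per_policy)


if __name__ == "__main__":
    # Hypothetical policy counts, chosen only to show how the delay scales.
    for count in (10, 100, 500):
        print(f"{count} policies -> roughly {estimated_delay(count)}")
```

Under these assumptions, 100 policies would already mean roughly 50 minutes of unreachable services after a kube-proxy restart.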
What did you expect to happen?
HNS policies should not cause a large delay for Windows nodes.
How can we reproduce it (as minimally and precisely as possible)?
With a large number of HNS policies in place, restart kube-proxy on a Windows node.
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)