Topology Aware Hints fail without explanation #103888
/sig network |
@aojea I'm using 1.21.3, installed with kubeadm from the deb packages at https://apt.kubernetes.io (kubernetes-xenial/main). Here is the output of "kubectl version":
output of "kubectl get nodes -o wide" with IPs obscured:
|
Some background: what I'm trying to do is keep traffic directed at my database proxy (maxscale) on the same node as the client. In this scenario there is no concern about load balancing because the proxy isn't the bottleneck (the underlying database is). The default traffic-spreading behavior is suboptimal because of the extra network hop and, more importantly, because all database clients likely experience failed queries when a single maxscale pod fails unexpectedly; this results in all my clients missing health checks and being restarted, no matter how many of them I have. To deal with these issues, I'm trying to recreate what one could do using the topologyKeys option:
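Roughly this shape (a sketch; the selector and port are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: maxscale
spec:
  selector:
    app: maxscale      # illustrative
  ports:
  - port: 3306         # illustrative
  # Deprecated ServiceTopology API: prefer endpoints on the same node,
  # fall back to any endpoint when none are local.
  topologyKeys:
  - "kubernetes.io/hostname"
  - "*"
```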
It looks like with the new topology aware hints API you have to do this by making each node its own zone, as the above config shows.

I've also considered using internalTrafficPolicy=Local, but if I understand correctly that would result in pods on node X not being able to connect to the database while X's maxscale pod is being redeployed, which could be disruptive. Another option is to bundle maxscale as a sidecar within my pods, but I have a very "dense" setup with many little app pods running on each node. In addition to the configuration overhead, the bundling-with-pod approach would result in about 40-50 instances of maxscale running on these 5 nodes instead of just 5. This could work poorly for various reasons (multiplication of database connections, monitoring activity, etc.).

It would be nice if internalTrafficPolicy had a "try-local" setting that was equivalent to the topologyKeys option above. That would be a simple way to do what I want that doesn't involve co-opting zone logic for an unintended purpose (if I proceed with the above setup, I won't be able to use zones in the intended way). Thanks for hearing me out!
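For concreteness, the zone-per-node workaround amounts to something like this (node names are hypothetical, and I believe 1.21's controller wants the annotation value Auto with a capital A):

```sh
# Make each node its own "zone" so hints keep traffic node-local.
kubectl label node node-1 topology.kubernetes.io/zone=node-1
kubectl label node node-2 topology.kubernetes.io/zone=node-2
# ...one label per node...

# Opt the Service in to topology aware hints.
kubectl annotate service maxscale service.kubernetes.io/topology-aware-hints=Auto
```
|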
Would you use ``` to quote your YAML and logs? It is hard to read.
|
Silly me! I misread "kube-proxy" as "kube-scheduler". I didn't notice this because I didn't get to the point of worrying about kube-proxy. The problem I was reporting above is that the endpointslice controller isn't adding hints to my endpointslice. That's still the case after adding the feature gate to kube-proxy. The only change is that now I have kube-proxy telling me that there are no zone hints on the endpointslice. Here are my relevant manifests, better formatted.

The maxscale service:
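A representative sketch of that Service (selector and port are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: maxscale
  annotations:
    service.kubernetes.io/topology-aware-hints: Auto
spec:
  selector:
    app: maxscale    # placeholder
  ports:
  - port: 3306       # placeholder
    protocol: TCP
```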
Endpointslice:
Nodes with their region labels
About the safeguards
About the constraints that could be relevant
|
Thanks for the detailed bug report! I'm testing out some theories and hope to have an update soon. This is awfully close to that 20% "overload threshold", but I need to confirm that's what's causing this. |
/assign |
I wrote a test to recreate this but unfortunately the test consistently passed: 621ea5d. Maybe there's something else going on here. Can you increase the log level on kube-controller-manager to see if there are any helpful logs from the EndpointSlice controller? There are a couple of places where we remove hints and return early in the AddHints function; neither of those seems like it should apply here, but I'm really not sure what else could be happening. Maybe after increasing the log level on kube-controller-manager you can toggle the hints annotation on the Service to trigger some EndpointSlice controller syncs for that Service?
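One way to do that toggle, assuming the Service is named maxscale:

```sh
# Remove the annotation, then re-add it, forcing two controller syncs.
kubectl annotate service maxscale service.kubernetes.io/topology-aware-hints-
kubectl annotate service maxscale service.kubernetes.io/topology-aware-hints=Auto
```
|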
@robscott I've tried --v=9 and --v=5. This is all I see after toggling the annotation:
That makes it seem like the issue is that it considers the endpointslice not worth updating despite the change in annotation, so I tried creating a new service instead (like the old one but named "maxscale-local"). This is the output at verbosity level 5:
Interestingly, when I redeploy the daemonset I see log entries like this:
I take it that the "insufficient endpoints" condition is triggered by the pods restarting. To recap:
Thanks for looking into this!
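For anyone following along, the log-level change above is just the --v flag in the kube-controller-manager static pod manifest (kubeadm layout):

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-controller-manager
    - --v=5   # bumped from the default to surface EndpointSlice controller logs
    # ...other flags unchanged...
```
|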
I have the same issue. I found that the annotation should be with
But it doesn't help. I was digging into the code and found that
is nil in my case. I don't understand the calculation behind this method.
How does the amount of cpuRatio per zone depend on the traffic routing?
From the doc:
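Here is my rough understanding of the calculation (I may well be wrong): the controller gives each zone a CPU-weighted share of the endpoints,

$$\mathrm{ratio}_z = \frac{\mathrm{allocatableCPU}_z}{\sum_{z'}\mathrm{allocatableCPU}_{z'}}, \qquad \mathrm{expected}_z = \mathrm{ratio}_z \cdot N_{\mathrm{endpoints}},$$

and it removes all hints if any zone's actual allocation exceeds its expected share by more than the 20% overload threshold mentioned earlier, i.e. if $\mathrm{allocated}_z/\mathrm{expected}_z - 1 > 0.2$ for some zone $z$. With one zone per identically sized node and one maxscale pod per node, each zone's expected share is exactly 1 endpoint; as soon as one pod is down during a rollout there are fewer ready endpoints than zones, which would explain the "insufficient endpoints" message above.
|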
ping @robscott |
Thanks for the reminder on this one! I've been digging through the code and also don't have a great answer for what's happening here. I think that means we need more logging as a starting point. I'll work on adding that. |
/triage accepted |
@dbourget do you mind checking that you indeed enabled the feature gate on the kube-apiserver?

```sh
# grep the kube-apiserver pod spec for the TopologyAwareHints feature gate
kubectl -n kube-system get pod $(kubectl -n kube-system get pod | awk NF=/kube-apiserver/) -o yaml | grep -i topology
```

You can also check whether the apiserver persists hints by creating a test EndpointSlice and reading it back:

```sh
kubectl apply -f - <<EOF
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: test
addressType: IPv4
endpoints:
- addresses:
  - 10.244.3.49
  hints:
    forZones:
    - name: abc
ports:
- name: ""
  port: 80
  protocol: TCP
EOF

kubectl get endpointslice.discovery.k8s.io/test -o yaml | grep -A4 hints
```
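If the hints section comes back empty, the apiserver is most likely stripping the field because the gate is disabled there. On a kubeadm cluster that flag lives in the static pod manifest, roughly:

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-apiserver
    - --feature-gates=TopologyAwareHints=true
    # ...other flags unchanged...
```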
|
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale |
@llhuii thanks for following up, and sorry for the slow response.
That's one line per host except for a host that was rebuilt and doesn't currently have the feature enabled, but did have it back when I was testing this. Second output is:
Does this tell you something interesting? |
@dbourget some fixes were made in v1.23, can you try with that version? I'd also recommend removing |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten |
I believe this has been fixed, please reopen if not. /close |
@robscott: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
After enabling the topology aware hints feature gate, labeling nodes with zones, adding the annotation, and satisfying all the conditions specified in the documentation, the EndpointSlice controller does not add any hints to the relevant endpoints. Nothing is logged to explain the behavior.
Service:
kube-controller-manager manifest (truncated):
similar for apiserver and kube-scheduler
each node has a label with its own name as zone, like so:
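(illustrative; the real node names are obscured)

```yaml
# Example labels on one node, using the hypothetical name "node-1":
labels:
  kubernetes.io/hostname: node-1
  topology.kubernetes.io/zone: node-1
```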
unique endpointslice corresponding to above service:
other notes:
The only potentially relevant log item I see from the kube components is many lines like this from kube-proxy pods:
```
W0724 00:27:37.725694       1 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
```
I hope I'm just doing something wrong.