
Topology Aware Hints fail without explanation #103888

Closed
dbourget opened this issue Jul 24, 2021 · 20 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/network Categorizes an issue or PR as relevant to SIG Network. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@dbourget

dbourget commented Jul 24, 2021

After enabling the TopologyAwareHints feature gate, labeling the nodes with zones, adding the annotation, and satisfying all the conditions specified in the documentation, the EndpointSlice controller does not add any hints to the relevant endpoints. Nothing is logged to explain this behavior.
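
For reference, a quick way to look for hints on the slices is something like the command below; it comes back empty here, matching the describe output further down:

kubectl get endpointslices -l kubernetes.io/service-name=maxscale -o yaml | grep -A3 hints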

Service:

apiVersion: v1
kind: Service
metadata:
  name: maxscale
  annotations:
    service.kubernetes.io/topology-aware-hints: auto
spec:
  type: ClusterIP
  selector:
    app: maxscale
  ports:
    - protocol: TCP
      port: 3308
      targetPort: 3308 
      name: mysql-split
    - protocol: TCP
      port: 3307
      targetPort: 3307 
      name: mysql-slave
    - protocol: TCP
      port: 3306
      name: mysql-master
      targetPort: 3306 

kube-controller-manager manifest (truncated):

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-cidr=10.11.0.0/16
    - --cluster-name=kubernetes
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
    - --port=0
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12
    - --use-service-account-credentials=true
    - --feature-gates=TopologyAwareHints=true

....

similar for apiserver and kube-scheduler

Each node has a topology.kubernetes.io/zone label whose value is its own name, like so:

frege     Ready    control-plane,master   219d   v1.21.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes-host=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=frege,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=,speed=fast,topology.kubernetes.io/zone=frege
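
For reference, a zone label like this can be set per node with kubectl, e.g.:

kubectl label node frege topology.kubernetes.io/zone=frege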

unique endpointslice corresponding to above service:

Name:         maxscale-5pgg4
Namespace:    default
Labels:       endpointslice.kubernetes.io/managed-by=endpointslice-controller.k8s.io
              kubernetes.io/service-name=maxscale
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-07-24T01:28:15Z
AddressType:  IPv4
Ports:
  Name          Port  Protocol
  ----          ----  --------
  mysql-master  3306  TCP
  mysql-split   3308  TCP
  mysql-slave   3307  TCP
Endpoints:
  - Addresses:  10.11.175.69
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/maxscale-lght2
    NodeName:   quokka
    Zone:       quokka
  - Addresses:  10.11.152.224
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/maxscale-p6jt2
    NodeName:   papaki
    Zone:       papaki
etc..

other notes:

  • this cluster currently has 5 nodes in total, each with a zone label that is its own name, as shown above
  • all nodes have similar allocatable CPUs (four show 32 with "kubectl describe node", one shows 24)
  • the endpoints belong to a DaemonSet, so there are 5 endpoints for 5 nodes
  • after enabling the feature gate in the apiserver, kube-scheduler, and controller-manager, I made sure all 3 pods restarted on each node, but kubelet hasn't been restarted; neither has kube-proxy

The only potentially relevant log entries I see from the kube components are many lines like this from the kube-proxy pods:

W0724 00:27:37.725694 1 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice

I hope I'm just doing something wrong.

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 24, 2021
@aojea
Member

aojea commented Jul 25, 2021

/sig network
/cc @robscott
What kubernetes version are you using?

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 25, 2021
@dbourget
Author

dbourget commented Jul 25, 2021

@aojea I'm using 1.21.3, installed with kubeadm using the deb packages from https://apt.kubernetes.io (kubernetes-xenial/main).

Here is the output of "kubectl version":

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T21:04:39Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T20:59:07Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

output of "kubectl get nodes -o wide" with IPs obscured:

NAME      STATUS   ROLES                  AGE    VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
frege     Ready    control-plane,master   220d   v1.21.3   X   <none>        Ubuntu 20.04.1 LTS   5.4.0-77-generic     docker://20.10.2
papaki    Ready    control-plane,master   26d    v1.21.3   X   <none>        Ubuntu 18.04.5 LTS   4.15.0-147-generic   docker://20.10.2
possum    Ready    control-plane,master   156d   v1.21.3   X   <none>        Ubuntu 18.04.3 LTS   4.15.0-147-generic   docker://20.10.3
quokka    Ready    control-plane,master   32d    v1.21.3   X   <none>        Ubuntu 18.04.5 LTS   4.15.0-147-generic   docker://20.10.7
russell   Ready    control-plane,master   163d   v1.21.3   X   <none>        Ubuntu 20.04 LTS     5.4.0-66-generic     docker://20.10.3

@dbourget
Author

dbourget commented Jul 25, 2021

Some background: what I'm trying to do is keep traffic to my database proxy (maxscale) on the same node as the client. In this scenario there is no concern about load balancing because the proxy isn't the bottleneck (the underlying database is). The default traffic-spreading behavior is suboptimal because of the extra network hop and, more importantly, because all database clients likely experience failed queries when a single maxscale pod fails unexpectedly; that leaves all my clients failing health checks and being restarted, no matter how many of them I have.

To deal with these issues, I'm trying to recreate what one could do using the topologyKeys option:

  topologyKeys: ["kubernetes.io/hostname", "*"]

It looks like with the new topology aware hints API you have to do this by making each node its own zone, as the config above shows.

I've also considered using internalTrafficPolicy: Local, but if I understand correctly that would result in pods on node X not being able to connect to the database while X's maxscale pod is being redeployed, which could be disruptive.
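
For reference, if I understand the API correctly, that option would look roughly like this on the Service (in 1.21 it is alpha, behind the ServiceInternalTrafficPolicy feature gate); this is a sketch only, not something I have applied:

apiVersion: v1
kind: Service
metadata:
  name: maxscale
spec:
  type: ClusterIP
  selector:
    app: maxscale
  internalTrafficPolicy: Local  # only routes to endpoints on the same node; no fallback when the local pod is down, as discussed above
  ports:
    - protocol: TCP
      port: 3306
      targetPort: 3306
      name: mysql-master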

Another option is to bundle maxscale as a sidecar within my pods, but I have a very "dense" setup with many little app pods running on each node. In addition to configuration overheads, the bundling-with-pod approach would result in about 40-50 instances of maxscale running on these 5 nodes instead of just 5. This could work poorly for various reasons (multiplication of database connections, monitoring activity etc).

It would be nice if internalTrafficPolicy had a "try-local" setting that was equivalent to the topologyKeys option above. That would be a simple way to do what I want that doesn't involve co-opting zone logic for an unintended purpose (if I proceed with the above setup, I won't be able to use zones in the intended way).

thanks for hearing me out!

@pacoxu
Member

pacoxu commented Jul 26, 2021

Would you use ``` to quote your YAML and logs? They are hard to read as posted.

  • --use-service-account-credentials=true
  • --feature-gates=TopologyAwareHints=true

....

similar for apiserver and kube-scheduler

https://kubernetes.io/docs/tasks/administer-cluster/enabling-topology-aware-hints/#enable-topology-aware-hints

To enable service topology hints, enable the TopologyAwareHints feature gate for the kube-apiserver, kube-controller-manager, and kube-proxy
Have you configured TopologyAwareHints=true for kube-proxy? You don't mention it in your description.
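
For reference, in a kubeadm cluster kube-proxy is usually configured through the kube-proxy ConfigMap (a KubeProxyConfiguration); the gate can be enabled roughly like this, or with --feature-gates=TopologyAwareHints=true on the kube-proxy command line:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  TopologyAwareHints: true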

@dbourget
Author

dbourget commented Jul 26, 2021

Silly me! I misread "kube-proxy" as "kube-scheduler". I didn't notice this because I hadn't gotten to the point of worrying about kube-proxy yet. The problem I was reporting above is that the EndpointSlice controller isn't adding hints to my EndpointSlice, and that's still the case after adding the feature gate to kube-proxy. The only change is that kube-proxy now tells me that there are no zone hints on the EndpointSlice.

Here are my relevant manifests, better formatted:

The maxscale service:

Name:              maxscale
Namespace:         default
Labels:            <none>
Annotations:       service.kubernetes.io/topology-aware-hints: auto
Selector:          app=maxscale
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.99.175.138
IPs:               10.99.175.138
Port:              mysql-split  3308/TCP
TargetPort:        3308/TCP
Endpoints:         10.11.152.250:3308,10.11.175.119:3308,10.11.191.57:3308 + 2 more...
Port:              mysql-slave  3307/TCP
TargetPort:        3307/TCP
Endpoints:         10.11.152.250:3307,10.11.175.119:3307,10.11.191.57:3307 + 2 more...
Port:              mysql-master  3306/TCP
TargetPort:        3306/TCP
Endpoints:         10.11.152.250:3306,10.11.175.119:3306,10.11.191.57:3306 + 2 more...
Session Affinity:  None
Events:            <none>

Endpointslice:

Name:         maxscale-5pgg4
Namespace:    default  
Labels:       endpointslice.kubernetes.io/managed-by=endpointslice-controller.k8s.io
              kubernetes.io/service-name=maxscale
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-07-24T19:13:25Z
AddressType:  IPv4
Ports:
  Name          Port  Protocol
  ----          ----  --------
  mysql-master  3306  TCP
  mysql-split   3308  TCP
  mysql-slave   3307  TCP
Endpoints:
  - Addresses:  10.11.175.119
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/maxscale-g7f8r
    NodeName:   quokka
    Zone:       quokka
  - Addresses:  10.11.152.250
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/maxscale-sfdqb
    NodeName:   papaki
    Zone:       papaki
  - Addresses:  10.11.191.57
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/maxscale-n8qw5
    NodeName:   possum
    Zone:       possum
  - Addresses:  10.11.211.156
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/maxscale-56bjv
    NodeName:   frege
    Zone:       frege
  - Addresses:  10.11.58.226
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/maxscale-phttg
    NodeName:   russell
    Zone:       russell
Events:         <none>

Nodes with their zone labels:

NAME      STATUS   ROLES                  AGE    VERSION   LABELS
frege     Ready    control-plane,master   221d   v1.21.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes-host=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=frege,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=,speed=fast,topology.kubernetes.io/zone=frege
papaki    Ready    control-plane,master   27d    v1.21.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=papaki,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=,speed=medium,topology.kubernetes.io/zone=papaki
possum    Ready    control-plane,master   157d   v1.21.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=possum,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=,speed=slow,topology.kubernetes.io/zone=possum
quokka    Ready    control-plane,master   33d    v1.21.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=quokka,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=,speed=medium,topology.kubernetes.io/zone=quokka
russell   Ready    control-plane,master   164d   v1.21.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,datamaster=yes,kubernetes.io/arch=amd64,kubernetes.io/hostname=russell,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=,speed=fast,topology.kubernetes.io/zone=russell

About the safeguards:

  1. Sufficient number of endpoints: I have 5 for 5 nodes.
  2. Impossible to achieve balanced allocation: 4 nodes have 32 allocatable CPUs and one has 24. Could that be it? (Rough numbers just below.)
  3. One or more Nodes has insufficient information: nodes all have zones as shown above.
  4. One or more endpoints does not have a zone hint <== this is for kube-proxy; it's what I'm trying to fix.
  5. A zone is not represented in hints <== ditto
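
A rough back-of-the-envelope for point 2, assuming the controller splits expected traffic in proportion to allocatable CPU per zone:

total CPU                = 4*32 + 24 = 152
share per large zone     = 32/152 ≈ 0.211  ->  expected endpoints ≈ 5 * 0.211 ≈ 1.05
share for the small zone = 24/152 ≈ 0.158  ->  expected endpoints ≈ 5 * 0.158 ≈ 0.79

Each zone actually holds exactly 1 endpoint, so the imbalance looks small to me; if I'm reading the safeguard correctly it shouldn't be enough on its own to disable hints, but I may be misreading how the threshold is applied.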

About the constraints that could be relevant:

  1. Topology Aware Hints are not used when either externalTrafficPolicy or internalTrafficPolicy is set to Local on a Service. <== neither is in use, as shown above.
  2. The EndpointSlice controller ignores unready nodes as it calculates the proportions of each zone. <== all nodes are Ready.

@robscott
Member

Thanks for the detailed bug report! I'm testing out some theories and hope to have an update soon. This is awfully close to that 20% "overload threshold", but I need to confirm that's what's causing this.

@robscott
Member

/assign

robscott added a commit to robscott/kubernetes that referenced this issue Jul 26, 2021
@robscott
Member

robscott commented Jul 26, 2021

I wrote a test to recreate this but unfortunately the test consistently passed: 621ea5d. Maybe there's something else going on here. Can you increase the log level on kube-controller-manager to see if there are any helpful logs from the EndpointSlice controller? There are a couple of places where we remove hints and return early in the AddHints function; neither of those seems like it should apply here, but I'm really not sure what else could be happening.

Maybe after increasing the log level on kube-controller-manager you can toggle the hints annotation on the Service to trigger some EndpointSlice controller syncs for that Service?
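
For example, assuming a kubeadm static-pod setup like yours, something along these lines (the service name and annotation value are taken from your report):

# bump controller-manager verbosity by editing the static pod manifest on each control-plane node;
# kubelet recreates the pod when the file changes
#   /etc/kubernetes/manifests/kube-controller-manager.yaml  ->  add "- --v=5" to the command list

# toggle the annotation to force a resync of the Service
kubectl annotate service maxscale service.kubernetes.io/topology-aware-hints-
kubectl annotate service maxscale service.kubernetes.io/topology-aware-hints=auto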

@dbourget
Author

@robscott I've tried --v=9 and --v=5. This is all I see after toggling the annotation:

I0726 20:15:23.065046       1 endpointslice_controller.go:344] About to update endpoint slices for service "default/maxscale"
I0726 20:15:23.065074       1 endpoints_controller.go:413] About to update endpoints for service "default/maxscale"
I0726 20:15:23.065455       1 endpoints_controller.go:520] endpoints are equal for default/maxscale, skipping update
I0726 20:15:23.065464       1 endpoints_controller.go:381] Finished syncing service "default/maxscale" endpoints. (399.393µs)
I0726 20:15:23.077506       1 endpointslice_controller.go:318] Finished syncing service "default/maxscale" endpoint slices. (12.454619ms)
I0726 20:15:23.077604       1 graph_builder.go:632] GraphBuilder process object: discovery.k8s.io/v1/EndpointSlice, namespace default, name maxscale-2fx42, uid e27efdec-4d50-46d5-ac1a-784ee97d742a, event type update, virtual=false

That makes it seem like the issue is that the controller considers the EndpointSlice not worth updating despite the change in annotation, so I tried creating a new service instead (like the old one but named "maxscale-local"). This is the output at verbosity level 5:

I0726 20:23:03.089676       1 graph_builder.go:632] GraphBuilder process object: v1/Service, namespace default, name maxscale-local, uid 876e6092-b1e7-46f5-b6be-3ad904b844a5, event type add, virtual=false
I0726 20:23:03.089703       1 endpointslice_controller.go:344] About to update endpoint slices for service "default/maxscale-local"
I0726 20:23:03.089759       1 endpoints_controller.go:413] About to update endpoints for service "default/maxscale-local"
I0726 20:23:03.090502       1 endpoints_controller.go:553] Update endpoints for default/maxscale-local, ready: 15 not ready: 0
I0726 20:23:03.096325       1 graph_builder.go:632] GraphBuilder process object: v1/ConfigMap, namespace ingress-nginx, name ingress-controller-leader-nginx, uid 6219184a-429c-45db-bd25-371a098d331b, event type update, virtual=false
I0726 20:23:03.127457       1 graph_builder.go:632] GraphBuilder process object: discovery.k8s.io/v1/EndpointSlice, namespace default, name maxscale-local-plsq6, uid f998e5ef-abe5-47d9-b775-44f5a3cf6919, event type add, virtual=false
I0726 20:23:03.127483       1 endpoints_controller.go:381] Finished syncing service "default/maxscale-local" endpoints. (37.742698ms)
I0726 20:23:03.127490       1 graph_builder.go:632] GraphBuilder process object: v1/Endpoints, namespace default, name maxscale-local, uid 1e7e556c-35de-4bfa-95f8-0c071a026032, event type add, virtual=false
I0726 20:23:03.127484       1 endpointslicemirroring_controller.go:273] syncEndpoints("default/maxscale-local")
I0726 20:23:03.127515       1 endpointslicemirroring_controller.go:270] Finished syncing EndpointSlices for "default/maxscale-local" Endpoints. (39.582µs)
I0726 20:23:03.127543       1 endpointslice_controller.go:318] Finished syncing service "default/maxscale-local" endpoint slices. (37.853387ms)

Interestingly, when I redeploy the daemonset I see log entries like this:

I0726 20:22:13.793393       1 endpoints_controller.go:413] About to update endpoints for service "default/maxscale-rest"
I0726 20:22:13.793401       1 endpointslice_controller.go:344] About to update endpoint slices for service "default/maxscale"
I0726 20:22:13.793353       1 deployment_controller.go:357] "Pod deleted" pod="default/maxscale-5nr7h"
I0726 20:22:13.793452       1 endpointslice_controller.go:344] About to update endpoint slices for service "default/maxscale-rest"
I0726 20:22:13.793637       1 endpoints_controller.go:520] endpoints are equal for default/maxscale-rest, skipping update
I0726 20:22:13.793652       1 endpoints_controller.go:381] Finished syncing service "default/maxscale-rest" endpoints. (265.796µs)
I0726 20:22:13.793661       1 endpoints_controller.go:520] endpoints are equal for default/maxscale, skipping update
I0726 20:22:13.793667       1 endpoints_controller.go:381] Finished syncing service "default/maxscale" endpoints. (293.181µs)
I0726 20:22:13.793762       1 endpointslice_controller.go:318] Finished syncing service "default/maxscale-rest" endpoint slices. (311.076µs)
I0726 20:22:13.793886       1 topologycache.go:91] Insufficient endpoints, removing hints from default/maxscale Service
I0726 20:22:13.793903       1 endpointslice_controller.go:318] Finished syncing service "default/maxscale" endpoint slices. (507.257µs)

I take it that the "insufficient endpoints" condition is triggered by the pods restarting.

To recap:

  1. Toggling the annotation doesn't seem to trigger an update of the endpointslice.
  2. Creating a new service doesn't result in an endpointslice with hints. There is no comment about insufficient endpoints or anything that explains the lack of hints (at v=5).
  3. I do get a complaint about insufficient endpoints when I'm in the middle of a rolling update and some pods are down.

thanks for looking into this!

@alexandrst88

alexandrst88 commented Jul 27, 2021

I have the same issue. I found that the annotation should have the value Auto, see #100728.

service.kubernetes.io/topology-aware-hints: Auto

But it doesn't help. I was digging into the code and found that

allocations := t.getAllocations(totalEndpoints)

seems to return nil in my case. I don't understand the calculation behind this method:

func (t *TopologyCache) getAllocations(numEndpoints int) map[string]Allocation {

How does the CPU ratio per zone affect the traffic routing?

	if t.cpuRatiosByZone == nil || len(t.cpuRatiosByZone) < 2 || len(t.cpuRatiosByZone) > numEndpoints {
		return nil
	}

From the docs:
Insufficient number of endpoints: If there are less endpoints than zones in a cluster, the controller will not assign any hints.
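
A quick way to double-check the inputs this calculation sees (one zone label per node, plus its allocatable CPU), in case it helps anyone else debugging:

kubectl get nodes -L topology.kubernetes.io/zone
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.cpu}{"\n"}{end}'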

@thockin
Member

thockin commented Aug 5, 2021

ping @robscott

@robscott
Member

robscott commented Aug 5, 2021

Thanks for the reminder on this one! I've been digging through the code and also don't have a great answer for what's happening here. I think that means we need more logging as a starting point. I'll work on adding that.

@robscott
Member

robscott commented Aug 5, 2021

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 5, 2021
@llhuii

llhuii commented Sep 30, 2021

@dbourget do you mind checking that you have indeed enabled the TopologyAwareHints feature gate for kube-apiserver, using the commands below?

kubectl -n kube-system get pod $(kubectl -n kube-system get pod |awk NF=/kube-apiserver/) -o yaml |grep -i topology

kubectl apply -f - <<EOF
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: test
addressType: IPv4
endpoints:
- addresses:
  - 10.244.3.49
  hints:
    forZones:
    - name: abc
ports:
- name: ""
  port: 80
  protocol: TCP
EOF

kubectl get endpointslice.discovery.k8s.io/test -o yaml |grep -A4 hints

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 29, 2021
@dbourget
Author

@llhuii thanks for following up, and sorry for the slow response.

$ kubectl -n kube-system get pod $(kubectl -n kube-system get pod |awk NF=/kube-apiserver/) -o yaml |grep -i topology

      - --feature-gates=TopologyAwareHints=true,ServiceTopology=true
      - --feature-gates=TopologyAwareHints=true,ServiceTopology=true
      - --feature-gates=TopologyAwareHints=true,ServiceTopology=true
      - --feature-gates=TopologyAwareHints=true,ServiceTopology=true

That's one line per host, except for one host that was rebuilt and doesn't currently have the feature gate enabled (it did back when I was testing this).

Second output is:

  hints:
    forZones:
    - name: abc
kind: EndpointSlice
metadata:
--
      {"addressType":"IPv4","apiVersion":"discovery.k8s.io/v1","endpoints":[{"addresses":["10.244.3.49"],"hints":{"forZones":[{"name":"abc"}]}}],"kind":"EndpointSlice","metadata":{"annotations":{},"name":"test","namespace":"default"},"ports":[{"name":"","port":80,"protocol":"TCP"}]}
  creationTimestamp: "2022-01-12T17:20:34Z"
  generation: 1
  name: test
  namespace: default

Does this tell you something interesting?

@robscott
Member

@dbourget some fixes were made in v1.23; can you try with that version? I'd also recommend removing ServiceTopology=true from --feature-gates, as that gate has been removed in later Kubernetes versions.
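
i.e. the flag would become just:

--feature-gates=TopologyAwareHints=true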

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 11, 2022
@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 11, 2022
@robscott
Member

I believe this has been fixed, please reopen if not.

/close

@k8s-ci-robot
Contributor

@robscott: Closing this issue.

In response to this:

I believe this has been fixed, please reopen if not.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
