Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support skip watch Headless Service in server-side. #122394

Closed
Sakuralbj opened this issue Dec 19, 2023 · 16 comments · Fixed by #122541 or #123905 · May be fixed by #126769
Closed

Support skip watch Headless Service in server-side. #122394

Sakuralbj opened this issue Dec 19, 2023 · 16 comments · Fixed by #122541 or #123905 · May be fixed by #126769
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@Sakuralbj
Copy link
Contributor

What would you like to be added?

Support watch service with field-selector spec.clusterIP!=None to skip headless service in kube-apiserver. And kubelet/kube-proxy can skip watch headless-service in the future.
image

Now, endpoints will be added labels service.kubernetes.io/headless for headless service by endpoints-controller, and kube-proxy will ignore event for it by label-selector. https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-proxy/app/server.go#L913. But headless service haven't be added label and kube-proxy still watch for it.

For kubelet, it watch service to add service-env for pods , but it ignores headless service when build service-env-map(https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L549). So kubelet don't need to watch headless service either.

Why is this needed?

In a large-cluster with more then 5000 nodes, when we run kubeflow task( tfjob:pod:headlessService=1:10:10) with qps 10, it can save huge network bandwidth and cpu for kube-apiserver.

@Sakuralbj Sakuralbj added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 19, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 19, 2023
@neolit123
Copy link
Member

/sig network api-machinery

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 19, 2023
@aojea
Copy link
Member

aojea commented Dec 29, 2023

In a large-cluster with more then 5000 nodes, when we run kubeflow task( tfjob:pod:headlessService=1:10:10) with qps 1

how many headless Services does that kubeflow task creates?

@aojea
Copy link
Member

aojea commented Dec 29, 2023

/sig scalability
this falls into that area too

@k8s-ci-robot k8s-ci-robot added the sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. label Dec 29, 2023
@Sakuralbj
Copy link
Contributor Author

In a large-cluster with more then 5000 nodes, when we run kubeflow task( tfjob:pod:headlessService=1:10:10) with qps 1

how many headless Services does that kubeflow task creates?

Under steady-state conditions, the cluster maintains 30,000 headless services. During daily new tasks, an additional 10,000 headless services will be created. The number of nodes in the cluster is around 7,000

@aojea
Copy link
Member

aojea commented Dec 30, 2023

interesting, I can see lately a lot of consumption of headless Services specially in AI/ML workloads, instead of labelling Services we can add a field selector on the ClusterIP field, WDYT @wojtek-t @thockin ?

/cc @alculquicondor @ahg-g

@aojea
Copy link
Member

aojea commented Dec 31, 2023

Submitted #122541 , we are already having a selector for PodIP #50091 so I think is ok having a field selector for the clusterIP field, that will bring the benefit of being able to build informers that filter headless services by using spec.clusterIP!="None"
/assign

@Sakuralbj
Copy link
Contributor Author

Submitted #122541 , we are already having a selector for PodIP #50091 so I think is ok having a field selector for the clusterIP field, that will bring the benefit of being able to build informers that filter headless services by using spec.clusterIP!="None" /assign

@aojea
We firstly consider support the function by field-selector for clusterIP, compared to add label to headless service as we discussed in #122502 , the main disadvantages is:

kube-apiserver needs to support field-selector with clusterIP firstly, and then kubelet/kube-proxy can be updated. Otherwise, clients performing List-watch operations will receive errors like this, which is not suitable for updating on existing clusters.
image
When using the way for adding labels for headless-service, kube-proxy does not need to be updated (As it always list-watch with label-selector service.kubernetes.io/headless not exists) .

@Sakuralbj
Copy link
Contributor Author

@aojea If adding a field-selector for clusterIP is considered to be ok. I think we can add IndexFields for service spec.clusterIP as well (such as node metadata.name),it can reduce events-count converted from kube-apiserver common channel to every cacheWatcher,reduce cpu-usage for kube-apiserver.

@aojea
Copy link
Member

aojea commented Jan 1, 2024

. I think we can add IndexFields for service spec.clusterIP as well (such as node metadata.name),it can reduce events-count converted from kube-apiserver common channel to every cacheWatcher,reduce cpu-usage for kube-apiserver.

That is on sig-scalability area and better @wojtek-t to decide in that, I can see how Headless services are more used lately in Kubernetes , special fo AI/ML workloads, so I think is a fair request to implement this field selector and allow to filter server side headless service, but I do not know the tradeoffs of using IndexFields

@wojtek-t
Copy link
Member

wojtek-t commented Jan 1, 2024

instead of labelling Services we can add a field selector on the ClusterIP field, WDYT @wojtek-t @thockin ?

+1

I would be ok with adding IndexFields for headless services too [re tradeoff, it adds a bit of memory usage, but it is fairly low - there was a similar thing happening recently in https://github.com//issues/120778 that has a lot of good analysis done]

@aojea
Copy link
Member

aojea commented Jan 1, 2024

I would be ok with adding IndexFields for headless services too [re tradeoff, it adds a bit of memory usage, but it is fairly low - there was a similar thing happening recently in https://github.com//issues/120778 that has a lot of good analysis done]

I can see that IndexFields implementation is feature gated, I think we can iterate here and go with the fieldselector first and evaluate later the IndexFields addition

@alculquicondor
Copy link
Member

The documentation currently says that headless services are labeled. Is this not accurate?

https://kubernetes.io/docs/reference/labels-annotations-taints/#servicekubernetesioheadless

@aojea
Copy link
Member

aojea commented Jan 2, 2024

The documentation currently says that headless services are labeled. Is this not accurate?

https://kubernetes.io/docs/reference/labels-annotations-taints/#servicekubernetesioheadless

The control plane adds this label to an Endpoints object when the owning Service is headless.

@alexzielenski
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 2, 2024
@aojea
Copy link
Member

aojea commented Mar 12, 2024

/reopen

We had to revert the fix https://github.com/kubernetes/kubernetes/pull/123896/files

@k8s-ci-robot k8s-ci-robot reopened this Mar 12, 2024
@k8s-ci-robot
Copy link
Contributor

@aojea: Reopened this issue.

In response to this:

/reopen

We had to revert the fix https://github.com/kubernetes/kubernetes/pull/123896/files

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
7 participants