
support pod namespace index in cache #120778

Open
ahutsunshine opened this issue Sep 20, 2023 · 25 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@ahutsunshine
Contributor

ahutsunshine commented Sep 20, 2023

What would you like to be added?

  1. Why doesn't the k8s community have a feature to list pods by namespace index when listing pods from apiserver cache?
  2. When I try to implement the pod namespace index, I see "we don't support multiple trigger functions defined". What's the reason?

Why is this needed?

In our cluster, the number of some resources, like pods, is very large (100k+). When listing namespaced resources such as pods from the apiserver cache, the apiserver first gets all resources across all namespaces and then filters them by the namespace predicate, even for a very small request. This filtering is heavy even when the resource count in the namespace is very small, and under many client requests it consumes a lot of unexpected CPU and memory.
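To illustrate the difference, here is a toy Go sketch (not the actual cacher code; `Pod`, `listByScan`, and `listByIndex` are invented for illustration) contrasting the full-scan filter with a namespace-index lookup:

```go
package main

import "fmt"

// Pod is a minimal stand-in for the real API object.
type Pod struct {
	Namespace, Name string
}

// listByScan mimics today's behavior: walk every pod in the cluster
// and keep only those in the requested namespace.
func listByScan(all []Pod, ns string) []Pod {
	var out []Pod
	for _, p := range all { // O(total pods), even for a tiny namespace
		if p.Namespace == ns {
			out = append(out, p)
		}
	}
	return out
}

// listByIndex consults a pre-built namespace index instead,
// touching only the pods that actually match.
func listByIndex(byNS map[string][]Pod, ns string) []Pod {
	return byNS[ns] // O(pods in the namespace)
}

func main() {
	all := []Pod{{"kube-system", "dns"}, {"pipelines", "build-1"}, {"pipelines", "build-2"}}

	// Build the index once; a cacher would keep it updated on watch events.
	byNS := map[string][]Pod{}
	for _, p := range all {
		byNS[p.Namespace] = append(byNS[p.Namespace], p)
	}

	fmt.Println(len(listByScan(all, "pipelines")), len(listByIndex(byNS, "pipelines"))) // 2 2
}
```

Both paths return the same pods; the difference is how much of the cache each one has to touch per request.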

@ahutsunshine ahutsunshine added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 20, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 20, 2023
@ahutsunshine ahutsunshine changed the title support pod namespace index support pod namespace index in cache Sep 20, 2023
@neolit123
Member

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 20, 2023
@tamilselvan1102

tamilselvan1102 commented Sep 21, 2023

Refer to #61343. You may find details there on why a pod request does not hit the apiserver cache.

@tamilselvan1102

If you are satisfied, please close the issue.

@ahutsunshine
Contributor Author

@tamilselvan1102 Perhaps you misunderstand me. I mean:

  1. When hitting the apiserver cache, why doesn't the k8s community have a feature to list pods by namespace index? Currently it has to list all the pods in all namespaces and filter the target pods by the namespace.
  2. In the k8s code, we see "we don't support multiple trigger functions defined". Why don't we support multiple pod trigger functions?

@jiahuif
Member

jiahuif commented Sep 21, 2023

Could you give the output of a client or kubectl command, showing which metrics you are looking at and their values?
/cc @wojtek-t

@ahutsunshine
Contributor Author

ahutsunshine commented Sep 22, 2023

@jiahuif @wojtek-t @tamilselvan1102

Case description:
We use client-go to get pods in a namespace on the user side, and we want to improve performance on the apiserver side. The problem is that a LIST of pods in a namespace served from the apiserver cache (i.e. hitting the watch cache) costs a lot of CPU once the total pod count across all namespaces reaches 100k+. The apiserver lists all 100k+ pods from the cache, filters them down to the requested namespace, and only then responds to the client. This makes even small namespaced lists heavy, no matter how few pods the namespace contains.

Why doesn't the community build a namespace index for pods on the apiserver side? It would significantly reduce CPU cost and query latency.

Attempted implementation:
When we tried to add a namespace indexer for pods, I found the code doesn't support multiple indexers: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher.go#L349 It shows the pod cacher doesn't support more than one IndexerFunc.

var indexedTrigger *indexedTriggerFunc
if config.IndexerFuncs != nil {
	// For now, we don't support multiple trigger functions defined
	// for a given resource.
	if len(config.IndexerFuncs) > 1 {
		return nil, fmt.Errorf("cacher %s doesn't support more than one IndexerFunc: ", reflect.TypeOf(obj).String())
	}
	for key, value := range config.IndexerFuncs {
		if value != nil {
			indexedTrigger = &indexedTriggerFunc{
				indexName:   key,
				indexerFunc: value,
			}
		}
	}
}

Question:
Why don't we support multiple pod trigger functions? What are the concerns for the community?
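For illustration only, here is a toy sketch of what supporting more than one named index might look like (the types `multiIndex` and `indexerFunc` below are invented, not the real cacher types). The bookkeeping on every write is the extra complexity a multi-index cacher would take on:

```go
package main

import "fmt"

// Pod is a minimal stand-in for the real object.
type Pod struct{ Namespace, NodeName, Name string }

// indexerFunc extracts an index value from an object.
type indexerFunc func(Pod) string

// multiIndex keeps one posting list per registered index.
// Every write must update all of them, which is the extra cost
// on the watch-event path that a multi-index cacher would pay.
type multiIndex struct {
	funcs   map[string]indexerFunc      // index name -> extractor
	indices map[string]map[string][]Pod // index name -> value -> pods
}

func newMultiIndex(funcs map[string]indexerFunc) *multiIndex {
	m := &multiIndex{funcs: funcs, indices: map[string]map[string][]Pod{}}
	for name := range funcs {
		m.indices[name] = map[string][]Pod{}
	}
	return m
}

func (m *multiIndex) add(p Pod) {
	for name, fn := range m.funcs { // update every index on each write
		v := fn(p)
		m.indices[name][v] = append(m.indices[name][v], p)
	}
}

func (m *multiIndex) byIndex(name, value string) []Pod {
	return m.indices[name][value]
}

func main() {
	m := newMultiIndex(map[string]indexerFunc{
		"namespace": func(p Pod) string { return p.Namespace },
		"nodeName":  func(p Pod) string { return p.NodeName },
	})
	m.add(Pod{"pipelines", "node-1", "build-1"})
	m.add(Pod{"pipelines", "node-2", "build-2"})

	fmt.Println(len(m.byIndex("namespace", "pipelines")), len(m.byIndex("nodeName", "node-1"))) // 2 1
}
```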

@ahutsunshine
Contributor Author

Could anyone take a look and help answer?

@alexzielenski
Contributor

alexzielenski commented Sep 26, 2023

IIRC this cache is used when watch cache is enabled.

/cc @wojtek-t
Could you please lend some context if you can to our ability to index this cache by other fields?

@cici37
Contributor

cici37 commented Oct 5, 2023

Hi @ahutsunshine, this might be a good topic to be discussed in sig meetings. Please feel free to bring it there :)

/cc @p0lyn0mial

@wojtek-t
Member

wojtek-t commented Oct 6, 2023

Could you please lend some context if you can to our ability to index this cache by other fields?

You can define index by namespace already today - that works.
What doesn't work is adding this index if there is already some other index defined for a given resource [so more than 1 index].

There aren't any strong technical reasons why we can't do that - the reasoning was basically that:
(a) it adds complexity to already really complex piece of code
(b) we don't have a good rationale to justify it

If you can provide some good data (a benchmark or similar) showing that it would help a lot for some real-world scenario, then we should consider adding it.

@ahutsunshine
Contributor Author

I'm implementing the namespace indexer and will run a benchmark and share the results, because we have already hit this performance issue without a namespace indexer when requests hit the apiserver cache.

@jiahuif
Member

jiahuif commented Oct 10, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 10, 2023
@AxeZhan
Member

AxeZhan commented Oct 12, 2023

/cc

@DrAuYueng
Contributor

/cc

@ahutsunshine
Contributor Author

ahutsunshine commented Nov 10, 2023

I completed the benchmark for the pod namespace indexer. It shows excellent latency, CPU, and memory results in large clusters (100k+ pods) when listing pods within a single namespace. The implementation significantly reduces CPU and memory consumption, especially when listing a small number of pods.
Will share the benchmark document later.
[Charts: "List 100 pods p90 latency (s)", "List 100 pods memory (G)", "List 100 pods CPU"]

@ahutsunshine
Contributor Author

@wojtek-t @cici37 @jiahuif @AxeZhan @DrAuYueng I would like to share the pod namespace indexer benchmark. Feel free to comment and recommend. https://docs.google.com/document/d/1bTxGqlH0tqrpH16NxC2SIUTGMN3_1zNKRiRdIW9Y5VU/edit?usp=sharing

@wojtek-t
Member

Thanks for writing that down - I added a couple of comments.

So there are a couple of questions that I have now:

(1) Can you briefly describe what usage pattern you have so that you're seeing a bunch of namespace-scoped LISTs?
(a) Who/what is sending them?
(b) How many QPS for these you actually observe

(2) The latency impact doesn't convince me - I don't think it really matters much whether something takes 50ms or 5ms.
So we're down to CPU/MEM impact that can justify this change.
Those are much clearer and more convincing, but what I'm missing is the impact on smaller/less list-intensive clusters.

So I'm interested in cases like:

  • there are a lot of pods and high churn of those, but there aren't really namespace-scoped lists
  • there is a medium number of pods (say 1000), no/low traffic - do we see any impact on cpu/memory?

@ahutsunshine
Contributor Author

ahutsunshine commented Nov 16, 2023

@wojtek-t Thank you for your feedback. I've addressed your comments in the document.

(1) In our scenarios, the tekton pipeline controller in our k8s platform performs numerous list-pod operations. With thousands of pipelines and over 7000 pipeline namespaces (e.g. pip-cd), each containing a few pods, listing these pods across various namespaces significantly impacts CPU and memory. Our dashboards indicate a QPS range of 10-40, and these requests typically persist for 1-2 minutes or more.

(2) Even small list requests, taking 5ms or 50ms individually, can lead to noticeable CPU and memory usage when multiplied and sustained over several minutes. In my benchmark document, I simulated a similar scenario where the test client runs for approximately 5 minutes at a fixed QPS. For instance, a QPS of 10 listing 100 pods means the client sends 10 list requests per second, each targeting 100 pods in a specific namespace. I then analyzed the p50, p75, p90, and p99 latency of all requests while collecting CPU and memory data in a stable state.

there is a medium number of pods (say 1000), no/low traffic - do we see any impact on cpu/memory?

The pictures below show the CPU and memory impact at different QPS levels (our client runs for 5 minutes continuously at a fixed QPS, and we record CPU and memory once the process is stable, repeating the test for each QPS).

[Charts: "List 1000 pods CPU", "List 1000 pods memory (G)"]

@ahutsunshine
Contributor Author

@wojtek-t Any other concerns for the feature?

@wojtek-t
Member

The pictures below show the CPU and memory impact at different QPS levels (our client runs for 5 minutes continuously at a fixed QPS, and we record CPU and memory once the process is stable, repeating the test for each QPS).

I guess what I'm missing is the 0 QPS case. Statistically, the majority of clusters in the world are very small or even idle, and I would like to understand whether the impact on those is negligible.
So 0 QPS is one case; the other is a very small cluster with, say, 10 pods.

@ahutsunshine
Contributor Author

@wojtek-t For small clusters, the namespace indexer might not be noticeable, but for large clusters with 100k+ or even 1000k+ pods it shows excellent performance. Some companies, like Ant Group (known for Alipay), have implemented this feature as an optimization in their internal Sigma k8s platform, according to public blogs, where their pod count exceeds 1000k. Listing even a small number of pods, like 10 in one namespace, significantly strains CPU by filtering through 1000k+ pods. Under high request volumes, the absence of a namespace indexer could cause excessive CPU usage and potentially incidents.
If you have further concerns or recommendations, I would be glad to join a SIG meeting for more discussion.

@wojtek-t
Member

@wojtek-t For small clusters, the namespace indexer might not be noticeable,

But I want to see data for that - I want to see that e.g. with 0 QPS we're not observing visibly more memory.
And that for small clusters we're not observing memory increase [or rather that memory increase is negligible].

@ahutsunshine
Contributor Author

@wojtek-t Sorry for the late response. Do you mean the memory data with the indexer versus without it (both at 0 QPS)? If so, we have already compared them and recorded the results in the document. CPU and memory usage is almost identical in a stable state whether the pod namespace indexer is enabled or not.

@wojtek-t
Member

Yes - that's what I was asking for - I somehow missed that. Thanks - that makes sense.

I guess I would be fine with adding the indexer, but I wouldn't do it for as many resources as you have in your PR for now. I would rather start with just pods and really prove it out for quite some time before extending to other resources.

@ahutsunshine
Contributor Author

Thanks @wojtek-t. That's really good news. I'll update my PR to enable the pod namespace indexer first.
