
support pod namespace index in cache #120778

Open
ahutsunshine opened this issue Sep 20, 2023 · 25 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@ahutsunshine
Contributor

ahutsunshine commented Sep 20, 2023

What would you like to be added?

  1. Why doesn't the k8s community have a feature to list pods by namespace index when listing pods from apiserver cache?
  2. When I try to implement the pod namespace index, I see "we don't support multiple trigger functions defined". What's the reason?

Why is this needed?

In our cluster, the number of some resources, like pods, is very large (100k+). When listing namespaced resources such as pods from the apiserver cache, the apiserver first gets all resources across all namespaces and then filters them by the namespace predicate, even for a very small request. This filtering is heavy even when the resource count in the namespace is very small, and under many client requests it consumes a lot of unexpected CPU and memory.
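To illustrate the difference, here is a toy Go sketch (not the actual cacher code; `Pod`, `listByScan`, and `listByIndex` are invented for illustration) contrasting the full-scan filter with a namespace-index lookup:

```go
package main

import "fmt"

// Pod is a minimal stand-in for the real API object.
type Pod struct {
	Namespace, Name string
}

// listByScan mimics today's behavior: walk every pod in the cluster
// and keep only those in the requested namespace.
func listByScan(all []Pod, ns string) []Pod {
	var out []Pod
	for _, p := range all { // O(total pods), even for a tiny namespace
		if p.Namespace == ns {
			out = append(out, p)
		}
	}
	return out
}

// listByIndex consults a pre-built namespace index instead,
// touching only the pods that actually match.
func listByIndex(byNS map[string][]Pod, ns string) []Pod {
	return byNS[ns] // O(pods in the namespace)
}

func main() {
	all := []Pod{{"kube-system", "dns"}, {"pipelines", "build-1"}, {"pipelines", "build-2"}}

	// Build the index once; a cacher would keep it updated on watch events.
	byNS := map[string][]Pod{}
	for _, p := range all {
		byNS[p.Namespace] = append(byNS[p.Namespace], p)
	}

	fmt.Println(len(listByScan(all, "pipelines")), len(listByIndex(byNS, "pipelines"))) // 2 2
}
```

Both paths return the same pods; the difference is how much of the cache each one has to touch per request.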

@ahutsunshine ahutsunshine added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 20, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 20, 2023
@ahutsunshine ahutsunshine changed the title support pod namespace index support pod namespace index in cache Sep 20, 2023
@neolit123
Member

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 20, 2023
@tamilselvan1102

tamilselvan1102 commented Sep 21, 2023

Refer to #61343. You may find details there on why a pod request does not hit the apiserver cache.

@tamilselvan1102

If you are satisfied, please close the issue.

@ahutsunshine
Contributor Author

@tamilselvan1102 Perhaps you misunderstand me. I mean:

  1. When hitting the apiserver cache, why doesn't the k8s community have a feature to list pods by namespace index? Currently it has to list all the pods in all namespaces and filter the target pods by the namespace.
  2. In the k8s code, we see "we don't support multiple trigger functions defined". Why don't we support multiple pod trigger functions?

@jiahuif
Member

jiahuif commented Sep 21, 2023

Could you give the output of a client or kubectl command, showing which metrics you are looking at and their values?
/cc @wojtek-t

@ahutsunshine
Contributor Author

ahutsunshine commented Sep 22, 2023

@jiahuif @wojtek-t @tamilselvan1102

Case description:
We use client-go to get pods in a namespace on the user side, and we want to improve performance on the apiserver side. The problem is that a LIST of pods in a namespace served from the apiserver cache (i.e. hitting the watch cache) costs a lot of CPU once the total pod count across all namespaces reaches 100k+. The apiserver lists all 100k+ pods from the cache, filters them down to the requested namespace, and only then responds to the client. This makes even small namespaced lists heavy, no matter how few pods the namespace contains.

Why doesn't the community build a namespace index for pods on the apiserver side? It would significantly reduce CPU cost and query latency.

Attempted implementation:
When we tried to add a namespace indexer for pods, I found the code doesn't support multiple indexers: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher.go#L349 It shows the pod cacher doesn't support more than one IndexerFunc.

var indexedTrigger *indexedTriggerFunc
if config.IndexerFuncs != nil {
	// For now, we don't support multiple trigger functions defined
	// for a given resource.
	if len(config.IndexerFuncs) > 1 {
		return nil, fmt.Errorf("cacher %s doesn't support more than one IndexerFunc: ", reflect.TypeOf(obj).String())
	}
	for key, value := range config.IndexerFuncs {
		if value != nil {
			indexedTrigger = &indexedTriggerFunc{
				indexName:   key,
				indexerFunc: value,
			}
		}
	}
}

Question:
Why don't we support multiple pod trigger functions? What are the concerns for the community?
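For illustration only, here is a toy sketch of what supporting more than one named index might look like (the types `multiIndex` and `indexerFunc` below are invented, not the real cacher types). The bookkeeping on every write is the extra complexity a multi-index cacher would take on:

```go
package main

import "fmt"

// Pod is a minimal stand-in for the real object.
type Pod struct{ Namespace, NodeName, Name string }

// indexerFunc extracts an index value from an object.
type indexerFunc func(Pod) string

// multiIndex keeps one posting list per registered index.
// Every write must update all of them, which is the extra cost
// on the watch-event path that a multi-index cacher would pay.
type multiIndex struct {
	funcs   map[string]indexerFunc      // index name -> extractor
	indices map[string]map[string][]Pod // index name -> value -> pods
}

func newMultiIndex(funcs map[string]indexerFunc) *multiIndex {
	m := &multiIndex{funcs: funcs, indices: map[string]map[string][]Pod{}}
	for name := range funcs {
		m.indices[name] = map[string][]Pod{}
	}
	return m
}

func (m *multiIndex) add(p Pod) {
	for name, fn := range m.funcs { // update every index on each write
		v := fn(p)
		m.indices[name][v] = append(m.indices[name][v], p)
	}
}

func (m *multiIndex) byIndex(name, value string) []Pod {
	return m.indices[name][value]
}

func main() {
	m := newMultiIndex(map[string]indexerFunc{
		"namespace": func(p Pod) string { return p.Namespace },
		"nodeName":  func(p Pod) string { return p.NodeName },
	})
	m.add(Pod{"pipelines", "node-1", "build-1"})
	m.add(Pod{"pipelines", "node-2", "build-2"})

	fmt.Println(len(m.byIndex("namespace", "pipelines")), len(m.byIndex("nodeName", "node-1"))) // 2 1
}
```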

@ahutsunshine
Contributor Author

Could anyone take a look and help answer?

@alexzielenski
Contributor

alexzielenski commented Sep 26, 2023

IIRC this cache is used when watch cache is enabled.

/cc @wojtek-t
Could you please lend some context if you can to our ability to index this cache by other fields?

@cici37
Contributor

cici37 commented Oct 5, 2023

Hi @ahutsunshine, this might be a good topic to be discussed in sig meetings. Please feel free to bring it there :)

/cc @p0lyn0mial

@wojtek-t
Member

wojtek-t commented Oct 6, 2023

Could you please lend some context if you can to our ability to index this cache by other fields?

You can define index by namespace already today - that works.
What doesn't work is adding this index if there is already some other index defined for a given resource [so more than 1 index].

There aren't any strong technical reasons why we can't do that - the reasoning was basically that:
(a) it adds complexity to already really complex piece of code
(b) we don't have a good rationale to justify it

If you can provide some good data (a benchmark or similar) showing that it would help a lot for some real-world scenario, then we should consider adding it.

@ahutsunshine
Contributor Author

I'm implementing the namespace indexer and will run a benchmark and share the results, because we have already hit this performance issue without a namespace indexer when requests hit the apiserver cache.

@jiahuif
Member

jiahuif commented Oct 10, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 10, 2023
@AxeZhan
Member

AxeZhan commented Oct 12, 2023

/cc

@DrAuYueng
Contributor

/cc

@ahutsunshine
Contributor Author

ahutsunshine commented Nov 10, 2023

I completed the benchmark for the pod namespace indexer. It shows excellent latency, CPU, and memory results in large clusters (100k+ pods) when listing pods within a single namespace. The implementation significantly reduces CPU and memory consumption, especially when listing a small number of pods.
Will share the benchmark document later.
[Charts: "List 100 pods p90 latency (s)", "List 100 pods memory (G)", "List 100 pods CPU"]

@ahutsunshine
Contributor Author

@wojtek-t @cici37 @jiahuif @AxeZhan @DrAuYueng I would like to share the pod namespace indexer benchmark. Feel free to comment and recommend. https://docs.google.com/document/d/1bTxGqlH0tqrpH16NxC2SIUTGMN3_1zNKRiRdIW9Y5VU/edit?usp=sharing

@wojtek-t
Member

Thanks for writing that down - I added a couple of comments.

So there are a couple of questions that I have now:

(1) Can you briefly describe what usage pattern you have so that you're seeing a bunch of namespace-scoped LISTs?
(a) Who/what is sending them?
(b) How many QPS for these you actually observe

(2) The latency impact doesn't convince me - I don't think it really matters much whether something takes 50ms or 5ms.
So we're down to CPU/MEM impact that can justify this change.
Those are much clearer and more convincing, but what I'm missing is the impact on smaller/less list-intensive clusters.

So I'm interested in cases like:

  • there are a lot of pods and high churn of those, but there aren't really namespace-scoped lists
  • there is a medium number of pods (say 1000), no/low traffic - do we see any impact on cpu/memory?

@ahutsunshine
Contributor Author

ahutsunshine commented Nov 16, 2023

@wojtek-t Thank you for your feedback. I've addressed your comments in the document.

(1) In our scenarios, the tekton pipeline controller in our k8s platform performs numerous list-pod operations. With thousands of pipelines and over 7000 pipeline namespaces (e.g. pip-cd), each containing a few pods, listing these pods across various namespaces significantly impacts CPU and memory. Our dashboards indicate a QPS range of 10-40, and these requests typically persist for 1-2 minutes or more.

(2) Even small list requests, taking 5ms or 50ms individually, can lead to noticeable CPU and memory usage when multiplied and sustained over several minutes. In my benchmark document, I simulated a similar scenario where the test client runs for approximately 5 minutes at a fixed QPS. For instance, a QPS of 10 listing 100 pods means the client sends 10 list requests per second, each targeting 100 pods in a specific namespace. I then analyzed the p50, p75, p90, and p99 latency of all requests while collecting CPU and memory data in a stable state.

there is a medium number of pods (say 1000), no/low traffic - do we see any impact on cpu/memory?

The pictures below show the CPU and memory impact at different QPS levels (our client runs for 5 minutes continuously at a fixed QPS, and we record CPU and memory once the process is stable, repeating the test for each QPS).

[Charts: "List 1000 pods CPU", "List 1000 pods memory (G)"]

@ahutsunshine
Contributor Author

@wojtek-t Any other concerns for the feature?

@wojtek-t
Member

The pictures below show the CPU and memory impact at different QPS levels (our client runs for 5 minutes continuously at a fixed QPS, and we record CPU and memory once the process is stable, repeating the test for each QPS).

I guess what I'm missing is the 0 QPS case. Statistically, the majority of clusters in the world are very small or even idle, and I would like to understand whether the impact on those is negligible.
So 0 QPS is one case; the other is a very small cluster with, say, 10 pods.

@ahutsunshine
Contributor Author

@wojtek-t For small clusters, the namespace indexer might not be noticeable, but for large clusters with 100k+ or even 1000k+ pods it shows excellent performance. Some companies, like Ant Group (known for Alipay), have implemented this feature as an optimization in their internal Sigma k8s platform, according to public blogs, where their pod count exceeds 1000k. Listing even a small number of pods, like 10 in one namespace, significantly strains CPU by filtering through 1000k+ pods. Under high request volumes, the absence of a namespace indexer could cause excessive CPU usage and potentially incidents.
If you have further concerns or recommendations, I would be glad to join a SIG meeting for more discussion.

@wojtek-t
Member

@wojtek-t For small clusters, the namespace indexer might not be noticeable,

But I want to see data for that - I want to see that e.g. with 0 QPS we're not observing visibly more memory.
And that for small clusters we're not observing memory increase [or rather that memory increase is negligible].

@ahutsunshine
Contributor Author

@wojtek-t Sorry for the late response. Do you mean the memory data with the indexer versus without it (both at 0 QPS)? If so, we have already compared them and recorded the results in the document. CPU and memory usage is almost identical in a stable state whether the pod namespace indexer is enabled or not.

@wojtek-t
Member

Yes - that's what I was asking for - I somehow missed that. Thanks - that makes sense.

I guess I would be fine with adding the indexer, but I wouldn't do it for as many resources as you have in your PR for now. I would rather start with just pods and really prove it out for quite some time before extending to other resources.

@ahutsunshine
Contributor Author

Thanks @wojtek-t. That's really good news. I'll update my PR to enable the pod namespace indexer first.
