support pod namespace index in cache #120778
/sig api-machinery
Refer to #61343. It explains why pod requests may not hit the apiserver cache.
If you are satisfied, please close the issue.
@tamilselvan1102 Perhaps you misunderstood me. I mean:
Could you give an output of any client or
@jiahuif @wojtek-t @tamilselvan1102 To describe the case clearly: why doesn't the community build a namespace index for pods on the apiserver side? It could significantly reduce CPU cost and query latency. Try to implement:
Question:
Could anyone take a look and help answer?
IIRC this cache is used when the watch cache is enabled. /cc @wojtek-t
Hi @ahutsunshine, this might be a good topic to be discussed in sig meetings. Please feel free to bring it there :) /cc @p0lyn0mial
You can already define an index by namespace today; that works. There aren't any strong technical reasons why we can't do that. If you can provide some good data (a benchmark or similar) showing that it would help a lot in some real-world scenario, then we should consider adding it.
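Conceptually, such a per-namespace index is just a map from namespace to the objects in it, built in one pass when the cache fills. The sketch below illustrates the idea with hypothetical types; it is not the actual apiserver or client-go code:

```go
package main

import "fmt"

// pod is a hypothetical stand-in for the real Pod object.
type pod struct {
	Namespace string
	Name      string
}

// buildNamespaceIndex builds a namespace -> pods index in one O(N) pass,
// so a later LIST for a single namespace costs only the size of that
// namespace instead of a scan over every pod in the cluster.
func buildNamespaceIndex(pods []pod) map[string][]pod {
	idx := make(map[string][]pod)
	for _, p := range pods {
		idx[p.Namespace] = append(idx[p.Namespace], p)
	}
	return idx
}

func main() {
	pods := []pod{
		{Namespace: "pip-cd", Name: "build-1"},
		{Namespace: "pip-cd", Name: "build-2"},
		{Namespace: "default", Name: "web-1"},
	}
	idx := buildNamespaceIndex(pods)
	fmt.Println(len(idx["pip-cd"])) // → 2
}
```

The trade-off is extra memory for the index and extra work on every watch event to keep it current, which is why the 0-QPS overhead discussed later in the thread matters.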
I'm implementing the namespace indexer and will run and share a benchmark, because we have already hit performance issues without a namespace indexer when requests hit the apiserver cache.
/triage accepted
/cc
@wojtek-t @cici37 @jiahuif @AxeZhan @DrAuYueng I would like to share the pod namespace indexer benchmark. Feel free to comment and make recommendations. https://docs.google.com/document/d/1bTxGqlH0tqrpH16NxC2SIUTGMN3_1zNKRiRdIW9Y5VU/edit?usp=sharing
Thanks for writing that down - I added a couple of comments. There are a couple of questions that I have now: (1) Can you briefly describe the usage pattern you have, such that you're seeing a bunch of namespace-scoped LISTs? (2) The latency impact doesn't convince me - I don't think it matters much whether something takes 50ms or 5ms. So I'm interested in cases like:
@wojtek-t Thank you for your feedback. I've addressed your comments in the document. (1) In our scenarios, the Tekton pipeline controller in our k8s platform performs numerous pod list operations. With thousands of pipelines and over 7000 pipeline namespaces (e.g. pip-cd), each containing a few pods, listing these pods across the various namespaces significantly impacts CPU and memory. Our dashboard shows a QPS range of 10-40, and these requests typically persist for 1-2 minutes or more. (2) Even small list requests, taking 5ms or 50ms individually, can add up to noticeable CPU and memory usage when multiplied and sustained for several minutes. In my benchmark document, I simulated a similar scenario in which the test client runs for approximately 5 minutes at a fixed QPS. For instance, QPS 10 with 100 pods means the client sends 10 list requests per second, each targeting the 100 pods in a specific namespace. I then analyzed the p50, p75, p90, and p99 latency of all requests while collecting CPU and memory data in a stable state.
The pictures below show the impact on CPU and memory at different QPS levels (the client runs continuously for 5 minutes at a fixed QPS, we record CPU and memory once stable, and we repeat this for each QPS).
@wojtek-t Any other concerns about the feature?
I guess what I'm missing is the case where there is 0 QPS. Statistically, the majority of clusters in the world are very small or even idle, and I would like to understand whether the impact for those is negligible.
@wojtek-t For small clusters, the namespace indexer feature might not be noticeable, but for large clusters with 100k+ or even 1000k+ pods, it shows excellent performance. Some companies such as Ant Group (known for Alipay) have also implemented this feature as an optimization in their internal Sigma k8s platform, according to public blogs, and their pod count exceeds 1000k. Listing even a few pods, say 10 in one namespace, significantly strains CPU when it means filtering through 1000k+ pods. In scenarios with high request volumes, the absence of the namespace indexer can result in excessive CPU usage, potentially causing incidents.
But I want to see data for that - I want to see that, e.g., with 0 QPS we're not observing visibly more memory.
@wojtek-t Sorry for the late response. Do you mean the memory data with the indexer versus without the indexer (both at 0 QPS)? If yes, we have already compared them and recorded the result in the document. CPU and memory usage is almost the same in the stable state whether the pod namespace indexer feature is enabled or not.
Yes - that's what I was asking for - I somehow missed that. Thanks - that makes sense. I guess I would be fine with adding the indexer, but I wouldn't do it for as many resources as you have in your PR for now. I would rather start with just pods and really prove it out for quite some time before extending it to other resources.
Thanks @wojtek-t. That's really good news. I can update my PR to enable the pod namespace indexer first.
What would you like to be added?
Why is this needed?
In our cluster, the counts of some resources such as pods are very large (100k+). When listing a namespace's resources, such as its pods, from the apiserver cache, the apiserver first gets all the resources across all namespaces and then filters them by the namespace predicate, even for a very small request. This filtering is heavy even when the number of resources in the target namespace is very small, and with many client requests it consumes a lot of CPU and memory, which is not expected.
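The scan-then-filter behavior described above can be illustrated with a small sketch (hypothetical types, not the real watch-cache code): even though only 2 pods match, every pod in the cluster is examined by the namespace predicate.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for the real Pod object.
type pod struct {
	Namespace string
	Name      string
}

// listByScan models the current cache path: every pod in the cluster is
// visited, and the namespace predicate filters most of them out.
func listByScan(all []pod, ns string) (result []pod, examined int) {
	for _, p := range all {
		examined++
		if p.Namespace == ns {
			result = append(result, p)
		}
	}
	return result, examined
}

func main() {
	// 100,000 pods spread over many namespaces; only 2 are in "pip-cd".
	all := make([]pod, 0, 100000)
	for i := 0; i < 99998; i++ {
		all = append(all, pod{
			Namespace: fmt.Sprintf("ns-%d", i%5000),
			Name:      fmt.Sprintf("p-%d", i),
		})
	}
	all = append(all,
		pod{Namespace: "pip-cd", Name: "build-1"},
		pod{Namespace: "pip-cd", Name: "build-2"})

	got, examined := listByScan(all, "pip-cd")
	fmt.Println(len(got), examined) // → 2 100000
}
```

With a namespace index, the same request would touch only the 2 matching pods, which is the saving the proposed indexer targets.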