Introduce pagination and filtering to improve ClusterProfiler memory usage #6511
Hi @knopt. Thanks for your PR. I'm waiting for a kubevirt member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @davidvossel
req := &v1.ClusterProfilerRequest{
	PageSize:      int64(pageSize),
	LabelSelector: labelSelector,
}
counter := 0
- req := &v1.ClusterProfilerRequest{
- 	PageSize:      int64(pageSize),
- 	LabelSelector: labelSelector,
- }
- counter := 0
+ var (
+ 	req = &v1.ClusterProfilerRequest{
+ 		PageSize:      int64(pageSize),
+ 		LabelSelector: labelSelector,
+ 	}
+ 	counter = 0
+ )
Done
if _, err := os.Stat(dir); err == nil {
	oldResultsDstDir := fmt.Sprintf("%s-old-%s", dir, rand.String(4))
	log.Printf("Moving already existing %q => %q\n", dir, oldResultsDstDir)
	os.Rename(dir, oldResultsDstDir)
- os.Rename(dir, oldResultsDstDir)
+ if err := os.Rename(dir, oldResultsDstDir); err != nil {
+ 	return err
+ }
Done
pkg/virt-api/rest/profiler.go
Outdated

for _, pod := range podList.Items {
	if podIsReadyComponent(&pod) {
		pods = append(pods, pod.DeepCopy())
Why do we need `DeepCopy` here? I think we could simply filter in place without allocating and drop the `DeepCopy`, so before the `for` loop:
pods = podList.Items[:0]
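For readers unfamiliar with the idiom, here is a small sketch of filtering a slice in place by reusing its backing array. The `pod` type and `filterReady` helper are hypothetical stand-ins, not the types from `profiler.go`; the pattern itself is the `items[:0]` trick mentioned above.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for the Kubernetes Pod type.
type pod struct {
	Name  string
	Ready bool
}

// filterReady keeps only ready pods and reuses the backing array of items,
// so no new slice is allocated. The caller must not use items afterwards,
// because its leading elements are overwritten.
func filterReady(items []pod) []pod {
	kept := items[:0]
	for _, p := range items {
		if p.Ready {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	pods := []pod{
		{Name: "virt-api-1", Ready: true},
		{Name: "virt-api-2", Ready: false},
		{Name: "virt-handler-1", Ready: true},
	}
	for _, p := range filterReady(pods) {
		fmt.Println(p.Name)
	}
}
```

The write position never overtakes the read position, which is why overwriting the same array while ranging over it is safe.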
Done
	clientutil "kubevirt.io/client-go/util"
)

const (
	maxClusterProfilerResultsPageSize = 20
`20` sounds really low for the `max`, is there a reason for it?
See #6478. Each additional entry adds ~9 MB of memory usage to virt-api, so restricting the max to 20 gives us at most ~200 MB of usage (20 × ~9 MB ≈ 180 MB).
This was just my judgement of how much additional memory is reasonable for this feature. It would probably be good to make it configurable, but I think that should go in a different PR if anyone needs it.
What's more, dumping cluster profiles is not performance-critical, so I would prefer safety (not killing virt-api with OOMs) over the speed of downloading debugging data.
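The arithmetic behind the cap can be sketched as follows. `clampPageSize` is a hypothetical helper (the thread does not show whether virt-api clamps or rejects oversized requests), and the per-result size is the rough ~9 MB estimate quoted above, not a measured constant.

```go
package main

import "fmt"

const (
	maxClusterProfilerResultsPageSize = 20
	// approxBytesPerResult is an assumption based on the ~9 MB per entry
	// estimate from the discussion, expressed here as 9 MiB.
	approxBytesPerResult = 9 << 20
)

// clampPageSize is a hypothetical helper: non-positive or oversized
// requests fall back to the maximum page size.
func clampPageSize(requested int64) int64 {
	if requested <= 0 || requested > maxClusterProfilerResultsPageSize {
		return maxClusterProfilerResultsPageSize
	}
	return requested
}

func main() {
	for _, requested := range []int64{0, 5, 500} {
		size := clampPageSize(requested)
		worstCaseMiB := size * approxBytesPerResult >> 20
		fmt.Printf("requested=%d effective=%d worst case ~%d MiB\n", requested, size, worstCaseMiB)
	}
}
```

Under these assumptions the upper bound for a single page is about 180 MiB, which is where the "~200 MB" figure comes from.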
@VirrageS: changing LGTM is restricted to collaborators In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This is great, I just made a few comments.
The pagination logic in the cluster-profiler tool is way better than I expected. Functionally, you've found a way to reduce the data cached in virt-api at one time while still making it trivial for a user to retrieve all that data using the client tooling.

// +k8s:openapi-gen=true
type ClusterProfilerRequest struct {
	LabelSelector string `json:"labelSelectors"`
there's an api warning because the variable name is `LabelSelector` and the json marker has an `s` on it.
also, does this need `omitempty` as well?
Added omitempty
api/api-rule-violations-known.list
Outdated
@@ -221,6 +221,7 @@ API rule violation: names_match,kubevirt.io/client-go/api/v1,CDRomTarget,ReadOnl
API rule violation: names_match,kubevirt.io/client-go/api/v1,CPU,DedicatedCPUPlacement
API rule violation: names_match,kubevirt.io/client-go/api/v1,CloudInitConfigDriveSource,UserDataSecretRef
API rule violation: names_match,kubevirt.io/client-go/api/v1,CloudInitNoCloudSource,UserDataSecretRef
API rule violation: names_match,kubevirt.io/client-go/api/v1,ClusterProfilerRequest,LabelSelector
either change the variable to LabelSelectors, or change the json marker to labelSelector. The `s` character mismatch in types.go is what causes this. We ideally don't want to add more known violations unless we have to.
Thanks, I've changed the json tag to labelSelector
flag.IntVar(&pageSize, "page-size", defaultDumpPageSize, "Page size used for fetching profile results. Works only with dump command")

// NOTE: To profile specific kubevirt component (for example virt-api) use `kubevirt.io=virt-api` label selector.
flag.StringVar(&labelSelector, "l", "", "Label selector for limiting pods to fetch the profiler results from. kubectl LIST label selector format expected")
does label only work for the dump command as well?
It works only for the `dump` command. I've added the info to the flag description + validation
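The validation itself is not shown in this thread. As a rough sketch of what such a check could look like, the hypothetical `validateFlags` below rejects the dump-only flags when another command is selected. The command names, the error wording, and the value of `defaultDumpPageSize` are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultDumpPageSize is a placeholder; the PR's actual default isn't shown here.
const defaultDumpPageSize = 10

// validateFlags is a hypothetical check that -l and --page-size are only
// combined with the dump command.
func validateFlags(cmd string, pageSize int, labelSelector string) error {
	if cmd == "dump" {
		if pageSize <= 0 {
			return errors.New("page-size must be a positive number")
		}
		return nil
	}
	if labelSelector != "" {
		return fmt.Errorf("-l is supported only by the dump command, not %q", cmd)
	}
	if pageSize != defaultDumpPageSize {
		return fmt.Errorf("page-size is supported only by the dump command, not %q", cmd)
	}
	return nil
}

func main() {
	fmt.Println(validateFlags("dump", 5, "kubevirt.io=virt-api"))
	fmt.Println(validateFlags("start", defaultDumpPageSize, "kubevirt.io=virt-api"))
}
```

Failing fast on an unsupported combination avoids the situation where a user passes a selector to another command and assumes it was honored.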
func fetchAndSaveClusterProfilerResults(c kubecli.KubevirtClient, pageSize int, labelSelector, outputDir string) error {
	if err := prepareDir(outputDir); err != nil {
		return err
	}

	req := &v1.ClusterProfilerRequest{
		PageSize:      int64(pageSize),
		LabelSelector: labelSelector,
	}
	counter := 0

	for {
		fmt.Printf("\rFetching in progress. Downloaded so far: %d ", counter)
		result, err := c.ClusterProfiler().Dump(req)
		if err != nil {
			return err
		}

		if len(result.ComponentResults) == 0 {
			break
		}

		err = writeResultsToDisk(outputDir, result)
		if err != nil {
			return err
		}

		counter += len(result.ComponentResults)
		if result.Continue == "" {
			break
		}
		req.Continue = result.Continue
	}

	log.Printf("\rSUCCESS: Dumped PProf %d results for KubeVirt control plane to [%s]\n", counter, outputDir)
	return nil
ah interesting. So the user never sees the token for the pages, we just always download all the data all at once in a loop using this tool. I like that.
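To make the hidden-token behaviour concrete, here is a self-contained sketch of the same loop against an in-memory fake. All types are hypothetical stand-ins for the API types in the diff, and the fake's continue token is simply an offset; how virt-api actually encodes the token is not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// request and results are hypothetical stand-ins for the API types.
type request struct {
	PageSize int
	Continue string
}

type results struct {
	ComponentResults []string
	Continue         string
}

// fakeServer pages through a fixed list; its token is an offset (assumption).
type fakeServer struct{ items []string }

func (s fakeServer) Dump(req request) (results, error) {
	if req.PageSize <= 0 {
		return results{}, errors.New("page size must be positive")
	}
	start := 0
	if req.Continue != "" {
		n, err := strconv.Atoi(req.Continue)
		if err != nil || n < 0 || n > len(s.items) {
			return results{}, fmt.Errorf("invalid continue token %q", req.Continue)
		}
		start = n
	}
	end := start + req.PageSize
	if end > len(s.items) {
		end = len(s.items)
	}
	res := results{ComponentResults: s.items[start:end]}
	if end < len(s.items) {
		res.Continue = strconv.Itoa(end) // more pages remain
	}
	return res, nil
}

// fetchAll follows the token until it is empty, so callers never see it.
func fetchAll(s fakeServer, pageSize int) ([]string, int, error) {
	var (
		all   []string
		calls int
		req   = request{PageSize: pageSize}
	)
	for {
		res, err := s.Dump(req)
		calls++
		if err != nil {
			return nil, calls, err
		}
		all = append(all, res.ComponentResults...)
		if res.Continue == "" {
			return all, calls, nil
		}
		req.Continue = res.Continue
	}
}

func main() {
	s := fakeServer{items: []string{"virt-api-1", "virt-api-2", "virt-controller-1", "virt-handler-1", "virt-handler-2"}}
	all, calls, err := fetchAll(s, 2)
	if err != nil {
		panic(err)
	}
	fmt.Printf("fetched %d results in %d requests\n", len(all), calls)
}
```

Only one page of results is held by the server per request, while the client still ends up with the complete set.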
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: davidvossel The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi @knopt, thanks a lot for your contribution! We require commits to be signed, you can read more about it here https://github.com/kubevirt/kubevirt/blob/main/CONTRIBUTING.md#contributor-compliance-with-developer-certificate-of-origin-dco Let us know if you need any help :)
…usage Signed-off-by: Tomasz Knopik <tomasz.knopik@gmail.com>
I've rebased the change + adjusted commit messages
/lgtm
/lgtm
/ok-to-test
/retest
What this PR does / why we need it:
This PR significantly reduces the memory usage of virt-api when executing a ClusterProfiler dump request. Without this feature, large-scale clusters might experience virt-api OOMs when using ClusterProfiler.
Which issue(s) this PR fixes:
Fixes #6478
Special notes for your reviewer:
The general idea behind this change is to use pagination to reduce the number of pods processed at one time. Additionally, a label selector is introduced for user convenience and further memory reduction.
The label selector can be used, for instance, to filter pods based on their component type. A field selector could be added as well; the open question is whether it's necessary at the moment.
Release note: