Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rule to collect issues with the pods #11557

Merged
merged 1 commit into from Apr 15, 2024

Conversation

avlitman
Copy link
Contributor

@avlitman avlitman commented Mar 20, 2024

What this PR does

Before this PR:
We currently have rules that calculate the available memory for all pods per container.
After this PR:
This pr adds rule that calculate issues with the pods like the memory of the pod with the highest memory exception value by container based on working set/rss memory (in bytes).

For example, if the virt-api container has 2 pods, one of which has an exception rate of 1000 and the other has -2000: the rule will show 1000.
The rules added is kubevirt_memory_delta_from_requested_bytes which includes both the working_set and rss memory minus the requested memory, see:
Screenshot from 2024-03-31 21-48-41

**cnv_abnormal added to hco repo, see pr: kubevirt/hyperconverged-cluster-operator#2855

Fixes #
Jira-ticket: https://issues.redhat.com/browse/CNV-39598

Why we need it and why it was done in this way

this pr is the next step after deprecating the KubeVirtComponentExceedsRequestedCPU and KubeVirtComponentExceedsRequestedMemory alerts: kubevirt/monitoring#222.
we need it in order to still track memory exceeded issues and other issues in the future without making noise (like the alerts did)

Links to places where the discussion took place:

The rules calculation discussed widely with cnv maintainers.

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note

New memory statistics added named kubevirt_memory_delta_from_requested_bytes

@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/M labels Mar 20, 2024
@kubevirt-bot kubevirt-bot added area/monitoring sig/buildsystem Denotes an issue or PR that relates to changes in the build system. sig/scale labels Mar 20, 2024
@avlitman avlitman changed the title [monitoring] Add rules to collect memory exceed values [wip] Add rules to collect memory exceed values Mar 20, 2024
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 20, 2024
@avlitman avlitman force-pushed the add-mem-metric branch 2 times, most recently from 6391ad9 to 8c08654 Compare March 20, 2024 12:28
@avlitman avlitman marked this pull request as draft March 20, 2024 12:28
@avlitman avlitman marked this pull request as ready for review March 21, 2024 08:15
@avlitman avlitman changed the title [wip] Add rules to collect memory exceed values Add rules to collect issues with the pods Mar 21, 2024
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 21, 2024
@avlitman
Copy link
Contributor Author

@enp0s3 @iholder101 Will appreciate your review (:

@iholder101
Copy link
Contributor

iholder101 commented Mar 21, 2024

@enp0s3 @iholder101 Will appreciate your review (:

Please tag someone that's more familiar with monitoring for an initial review.
Since @sradco and @machadovilaca are going to be SIG-obersvability's approver/reviewer, they might be a good fit.

Copy link

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me what you're going to do with the collected metrics. Will you use them to fine-tune the default memory requests? What's the impact for users if requests << usage?

pkg/monitoring/rules/recordingrules/operator.go Outdated Show resolved Hide resolved
pkg/monitoring/rules/recordingrules/operator.go Outdated Show resolved Hide resolved
pkg/monitoring/rules/recordingrules/operator.go Outdated Show resolved Hide resolved
pkg/monitoring/rules/recordingrules/operator.go Outdated Show resolved Hide resolved
pkg/monitoring/rules/recordingrules/operator.go Outdated Show resolved Hide resolved
pkg/monitoring/rules/recordingrules/operator.go Outdated Show resolved Hide resolved
pkg/monitoring/rules/recordingrules/operator.go Outdated Show resolved Hide resolved
pkg/monitoring/rules/recordingrules/operator.go Outdated Show resolved Hide resolved
pkg/monitoring/rules/recordingrules/operator.go Outdated Show resolved Hide resolved
pkg/monitoring/rules/recordingrules/operator.go Outdated Show resolved Hide resolved
@avlitman avlitman force-pushed the add-mem-metric branch 3 times, most recently from 23c650c to 3296b3a Compare March 23, 2024 13:23
docs/metrics.md Outdated Show resolved Hide resolved
@sradco
Copy link
Contributor

sradco commented Apr 8, 2024

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 8, 2024
@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 10, 2024
@avlitman
Copy link
Contributor Author

/unhold

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 14, 2024
@machadovilaca
Copy link
Member

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 15, 2024
Signed-off-by: alitman <alitman@alitman-thinkpadp1gen4i.tlv.csb>
@kubevirt-bot kubevirt-bot added sig/observability Denotes an issue or PR that relates to observability. and removed lgtm Indicates that a PR is ready to be merged. labels Apr 15, 2024
@machadovilaca
Copy link
Member

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 15, 2024
@enp0s3
Copy link
Contributor

enp0s3 commented Apr 15, 2024

/approve

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enp0s3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 15, 2024
@kubevirt-commenter-bot
Copy link

Required labels detected, running phase 2 presubmits:
/test pull-kubevirt-e2e-windows2016
/test pull-kubevirt-e2e-kind-1.27-vgpu
/test pull-kubevirt-e2e-kind-sriov
/test pull-kubevirt-e2e-k8s-1.29-ipv6-sig-network
/test pull-kubevirt-e2e-k8s-1.27-sig-network
/test pull-kubevirt-e2e-k8s-1.27-sig-storage
/test pull-kubevirt-e2e-k8s-1.27-sig-compute
/test pull-kubevirt-e2e-k8s-1.27-sig-operator
/test pull-kubevirt-e2e-k8s-1.28-sig-network
/test pull-kubevirt-e2e-k8s-1.28-sig-storage
/test pull-kubevirt-e2e-k8s-1.28-sig-compute
/test pull-kubevirt-e2e-k8s-1.28-sig-operator

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 15, 2024
@kubevirt-bot
Copy link
Contributor

@avlitman: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubevirt-e2e-k8s-1.27-sig-performance 007b9e2 link true /test pull-kubevirt-e2e-k8s-1.27-sig-performance
pull-kubevirt-conformance-arm64 8ed18b4 link false /test pull-kubevirt-conformance-arm64

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@avlitman
Copy link
Contributor Author

/cherry-pick release-1.2

@kubevirt-bot
Copy link
Contributor

@avlitman: once the present PR merges, I will cherry-pick it on top of release-1.2 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kubevirt-bot kubevirt-bot merged commit deafbf9 into kubevirt:main Apr 15, 2024
39 of 40 checks passed
@kubevirt-bot
Copy link
Contributor

@avlitman: new pull request created: #11703

In response to this:

/cherry-pick release-1.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/monitoring dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/buildsystem Denotes an issue or PR that relates to changes in the build system. sig/observability Denotes an issue or PR that relates to observability. sig/scale size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants