Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pod_swap_usage_bytes as an expected metric in resource metric e2e test #119421

Merged

Conversation

iholder101
Copy link
Contributor

@iholder101 iholder101 commented Jul 19, 2023

What type of PR is this?

/kind cleanup
/kind failing-test

What this PR does / why we need it:

Recently, swap stats were added to /stats/summary and /metrics/resource.
These changes caused the ResourceMetricsAPI e2e test to fail at periodic jobs [1], that do not expect the pod_swap_usage_bytes metric.

This PR adds expectation for pod_swap_usage_bytes metric to the test, which will now cause it to pass.

[1] https://testgrid.k8s.io/sig-node-cri-o#ci-crio-cgroupv1-node-e2e-features

Which issue(s) this PR fixes:

Fixes #119400

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 19, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @iholder101. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jul 19, 2023
@iholder101 iholder101 force-pushed the swap/adapt-resource-metric-test branch from 27e07f7 to 683e36a Compare July 19, 2023 08:04
@iholder101
Copy link
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 19, 2023
@PiotrProkop
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 19, 2023
@pacoxu
Copy link
Member

pacoxu commented Jul 19, 2023

/assign

@iholder101
Copy link
Contributor Author

/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features

@iholder101 iholder101 force-pushed the swap/adapt-resource-metric-test branch from 683e36a to bbd9786 Compare July 19, 2023 09:27
@iholder101
Copy link
Contributor Author

/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features

@iholder101 iholder101 force-pushed the swap/adapt-resource-metric-test branch from bbd9786 to a4f8892 Compare July 19, 2023 09:55
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 19, 2023
@iholder101
Copy link
Contributor Author

iholder101 commented Jul 19, 2023

@pacoxu @ffromani

Tests pass and PR is ready to review. PTAL.

Changes summary:
The pod_swap_usage_bytes metric is not always reported, as it depends if swap is enabled on the node. Therefore, I've added the IgnoreMissing option to MatchKeys, this way even if pod_swap_usage_bytes is missing the test would pass.

However, I wanted to keep ensuring that all of the other metrics are always reported. Therefore, I've explicitly ensured they were reported by using And(..., haveKeys(...)).

@@ -165,3 +165,18 @@ func boundedSample(lower, upper interface{}) types.GomegaMatcher {
"Histogram": gstruct.Ignore(),
}))
}

func haveKeys(keys ...string) types.GomegaMatcher {
gomega.Expect(keys).ToNot(gomega.BeEmpty())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use ExpectWithOffset - not sure if there's a better method mark this function as helper

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done 👍

@@ -73,7 +73,7 @@ var _ = SIGDescribe("ResourceMetricsAPI [NodeFeature:ResourceMetrics]", func() {
memoryCapacity := node.Status.Capacity["memory"]
memoryLimit := memoryCapacity.Value()

matchResourceMetrics := gstruct.MatchAllKeys(gstruct.Keys{
matchResourceMetrics := gomega.And(gstruct.MatchKeys(gstruct.IgnoreMissing, gstruct.Keys{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how hard is to detect if the swap is neabled in the node programmatically, and just use different MatchAllKeys in the two flows?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can fix the failure and iterate later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know of an easy way to do so.

Even if NodeSwap feature-gate is enabled, it doesn't mean that the node actually has swap provisioned on it. We could always try to find the node that the pod runs on and try executing stuff on it to find this out, but I think it's a big enough complication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can fix the failure and iterate later

Sounds good to me 👍 Thanks!

Copy link
Contributor

@ffromani ffromani Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking aloud, maybe we can use swapon? Anyway, the more I think about it the more it seems wiser to fix the burning issue and improve later, the current approach seems good enough for the time being.

If you resubmit for whatever reason, please add a copy of this content #119421 (comment) as code comment right above the gomega.And

Signed-off-by: Itamar Holder <iholder@redhat.com>
Use haveKeys() matcher from previous commit to ensure
required keys exist.

Signed-off-by: Itamar Holder <iholder@redhat.com>
@iholder101 iholder101 force-pushed the swap/adapt-resource-metric-test branch from 0d74030 to ee82654 Compare July 19, 2023 11:44
@iholder101
Copy link
Contributor Author

/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features

Copy link
Contributor

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 19, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: cfdeabac715824bb51bbbfc184c67161945e9947

@iholder101
Copy link
Contributor Author

/test pull-kubernetes-e2e-kind

@ffromani
Copy link
Contributor

/cc @SergeyKanzhelev @mrunalp

@bart0sh bart0sh moved this from Triage to Needs Approver in SIG Node PR Triage Jul 19, 2023
@bart0sh
Copy link
Contributor

bart0sh commented Jul 19, 2023

/assign @SergeyKanzhelev @mrunalp

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: iholder101, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 19, 2023
@mrunalp
Copy link
Contributor

mrunalp commented Jul 19, 2023

/milestone v1.28

@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Jul 19, 2023
@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot merged commit fa88c0b into kubernetes:master Jul 20, 2023
14 of 15 checks passed
SIG Node CI/Test Board automation moved this from Triage to Done Jul 20, 2023
SIG Node PR Triage automation moved this from Needs Approver to Done Jul 20, 2023
@pacoxu
Copy link
Member

pacoxu commented Jul 20, 2023

https://prow.k8s.io/job-history/gs/kubernetes-jenkins/logs/ci-crio-cgroupv1-node-e2e-features
is green now.

test failures(7 today) link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Archived in project
9 participants