Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WINC-1181: Add pod CPU and Memory metrics #1949

Merged
merged 3 commits into from Dec 13, 2023

Conversation

sebsoto
Copy link
Contributor

@sebsoto sebsoto commented Nov 16, 2023

Adds recording rules which translates the windows exporter's CPU and memory metrics into the time series required for display in the OpenShift console.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 16, 2023
Copy link
Contributor

openshift-ci bot commented Nov 16, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@sebsoto
Copy link
Contributor Author

sebsoto commented Nov 16, 2023

image

@sebsoto
Copy link
Contributor Author

sebsoto commented Nov 16, 2023

/approve cancel

@mtnbikenc
Copy link
Member

@sebsoto This is for Pods not Nodes, correct? That distinction might be helpful in the title and commit message.

@sebsoto
Copy link
Contributor Author

sebsoto commented Nov 17, 2023

Correct, I'll add that to the commit/pr message

Copy link
Contributor

@jrvaldes jrvaldes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Should this be documented? So that users are aware such metrics are not available.

@jrvaldes
Copy link
Contributor

What about backports?

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 20, 2023
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 21, 2023
@sebsoto sebsoto changed the title [metrics] Add CPU and Memory metrics [WIP] [metrics] Add CPU and Memory metrics Nov 21, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 4, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 6, 2023
@sebsoto sebsoto changed the title [WIP] [metrics] Add CPU and Memory metrics WINC-568: Add CPU and Memory metrics Dec 6, 2023
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 6, 2023
@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 6, 2023

@sebsoto: This pull request references WINC-568 which is a valid jira issue.

In response to this:

Adds recording rules which translates the windows exporter's CPU and memory metrics into the time series required for display in the OpenShift console.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sebsoto sebsoto changed the title WINC-568: Add CPU and Memory metrics WINC-568: Add pod CPU and Memory metrics Dec 6, 2023
@sebsoto sebsoto changed the title WINC-568: Add pod CPU and Memory metrics WINC-1181: Add pod CPU and Memory metrics Dec 6, 2023
@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 6, 2023

@sebsoto: This pull request references WINC-1181 which is a valid jira issue.

In response to this:

Adds recording rules which translates the windows exporter's CPU and memory metrics into the time series required for display in the OpenShift console.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

record: pod:container_cpu_usage:sum
- expr: |
label_replace(windows_container_memory_usage_private_working_set_bytes * on(container_id) group_left(namespace, pod, container) kube_pod_container_info{container_id!=""},"container","","","")
record: container_memory_working_set_bytes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It maybe out of scope here, but a quick thing to give the number of running pods

  • expr: |
    windows_container_available
    record: kubelet_running_pods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mansikulkarni96 I'd prefer that be done as a followup PR as this work is targetting the graphs

Copy link
Member

@mansikulkarni96 mansikulkarni96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @sebsoto, PTAL at my comments.

@@ -39,3 +39,9 @@ spec:
- expr: |
windows_cpu_info
record: node_cpu_info
- expr: |
sum(rate(windows_container_cpu_usage_seconds_total[5m]) * on(container_id) group_left(namespace, pod, container) kube_pod_container_info{container_id!=""}) by (pod,namespace)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have to test this out but AFAIK pod label is not available through windows_exporter, an option here to remove * on(container_id) is to do a label mapping in:
[bundle/manifests/windows-exporter_monitoring.coreos.com_v1_servicemonitor.yaml]

relabelings:
        - action: replace
          regex: (.*)
          replacement:
          sourceLabels:
            - __meta_kubernetes_endpoint_address_target_name
          targetLabel: instance
        - action: labelmap
          source_labels: [container_id]
          replacement: pod
```

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct Windows exporter does not add the pod label
I am adding it via this method

@mansikulkarni96
Copy link
Member

@sebsoto changes look good here, I missed this in my previous review but wouldn't it be beneficial to add some tests here to ensure we continue to get the pod metrics like we do here:

func (tc *testContext) testNodeResourceUsage(t *testing.T) {

Adds recording rules which translates the windows exporter's pod CPU
and memory metrics into the time series required for display in the
OpenShift console.
Adds makePrometheusQuery() making code more readable, and allowing for
the reuse of the functionality elsewhere.
@sebsoto
Copy link
Contributor Author

sebsoto commented Dec 11, 2023

/test azure-e2e-operator vsphere-e2e-operator

@mansikulkarni96
Copy link
Member

Thanks for working on this @sebsoto !
/approve

Copy link
Contributor

openshift-ci bot commented Dec 11, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mansikulkarni96

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 11, 2023
@jrvaldes
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 12, 2023
@sebsoto sebsoto marked this pull request as ready for review December 12, 2023 14:24
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 12, 2023
@sebsoto
Copy link
Contributor Author

sebsoto commented Dec 12, 2023

/test remaining-required

Copy link
Contributor

@saifshaikh48 saifshaikh48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks, finally got to understand the query structure.

@@ -70,6 +70,14 @@ func testStorage(t *testing.T) {
}
}()
}
t.Run("pod metrics", func(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would such a test be better suited as part of metrics_test.go? May it'd be best in the long run to split the other metrics tests out of the creation suite

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dependent on a workload, for the sake of not increasing the test runtime anymore I felt it best to put it in the storage test. Ideally in the future we use the same workload for both the storage test, and the network tests.

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 40881cf and 2 for PR HEAD 8926229 in total

@sebsoto
Copy link
Contributor Author

sebsoto commented Dec 12, 2023

/override ci/prow/aws-e2e-upgrade
Test passed, must-gather failed unrelated to WMCO

Copy link
Contributor

openshift-ci bot commented Dec 12, 2023

@sebsoto: Overrode contexts on behalf of sebsoto: ci/prow/aws-e2e-upgrade

In response to this:

/override ci/prow/aws-e2e-upgrade
Test passed, must-gather failed unrelated to WMCO

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Tests that Windows pod metrics are being properly generated. This will
run as part of the storage test in order to not deploy another workload.
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 12, 2023
Copy link
Contributor

@saifshaikh48 saifshaikh48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@@ -70,6 +70,17 @@ func testStorage(t *testing.T) {
}
}()
}
t.Run("pod metrics", func(t *testing.T) {
if skipWorkloadDeletion {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it's worth leaving a TODO to remove this condition in 10.16?

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 12, 2023
@sebsoto
Copy link
Contributor Author

sebsoto commented Dec 13, 2023

/test remaining-required

@sebsoto
Copy link
Contributor Author

sebsoto commented Dec 13, 2023

/test nutanix-e2e-operator

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 40881cf and 2 for PR HEAD 1c3c17a in total

Copy link
Contributor

openshift-ci bot commented Dec 13, 2023

@sebsoto: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 9975407 into openshift:master Dec 13, 2023
17 checks passed
@sebsoto
Copy link
Contributor Author

sebsoto commented Dec 13, 2023

/cherry-pick release-4.15

@openshift-cherrypick-robot

@sebsoto: new pull request created: #1976

In response to this:

/cherry-pick release-4.15

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@sebsoto: new pull request could not be created: failed to create pull request against openshift/windows-machine-config-operator#release-4.15 from head openshift-cherrypick-robot:cherry-pick-1949-to-release-4.15: status code 422 not one of [201], body: {"message":"Validation Failed","errors":[{"resource":"PullRequest","code":"custom","message":"A pull request already exists for openshift-cherrypick-robot:cherry-pick-1949-to-release-4.15."}],"documentation_url":"https://docs.github.com/rest/pulls/pulls#create-a-pull-request"}

In response to this:

/cherry-pick release-4.15

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants