WINC-1181: Add pod CPU and Memory metrics #1949
Skipping CI for Draft Pull Request. |
/approve cancel |
ee5ccc2 to e533c5d
@sebsoto This is for Pods not Nodes, correct? That distinction might be helpful in the title and commit message. |
Correct, I'll add that to the commit/PR message.
/lgtm
Should this be documented? So that users are aware such metrics are not available.
What about backports? |
e533c5d to a763c96
a763c96 to f3a6cd1
@sebsoto: This pull request references WINC-568 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@sebsoto: This pull request references WINC-1181 which is a valid jira issue. In response to this:
      record: pod:container_cpu_usage:sum
    - expr: |
        label_replace(windows_container_memory_usage_private_working_set_bytes * on(container_id) group_left(namespace, pod, container) kube_pod_container_info{container_id!=""},"container","","","")
      record: container_memory_working_set_bytes
It may be out of scope here, but this would be a quick way to expose the number of running pods:
    - expr: |
        windows_container_available
      record: kubelet_running_pods
@mansikulkarni96 I'd prefer that be done in a follow-up PR, as this work is targeting the graphs.
Thanks for working on this @sebsoto, PTAL at my comments.
@@ -39,3 +39,9 @@ spec:
    - expr: |
        windows_cpu_info
      record: node_cpu_info
    - expr: |
        sum(rate(windows_container_cpu_usage_seconds_total[5m]) * on(container_id) group_left(namespace, pod, container) kube_pod_container_info{container_id!=""}) by (pod,namespace)
I may have to test this out, but AFAIK the pod label is not available through windows_exporter. One option that would remove the need for `* on(container_id)`
is to do a label mapping in:
[bundle/manifests/windows-exporter_monitoring.coreos.com_v1_servicemonitor.yaml]
```
relabelings:
- action: replace
  regex: (.*)
  replacement:
  sourceLabels:
  - __meta_kubernetes_endpoint_address_target_name
  targetLabel: instance
- action: labelmap
  source_labels: [container_id]
  replacement: pod
```
Correct, the Windows exporter does not add the pod label.
I am adding it via this method.
@sebsoto The changes look good here. I missed this in my previous review, but wouldn't it be beneficial to add some tests to ensure we continue to get the pod metrics, like we do here:
Adds recording rules which translate the Windows exporter's pod CPU and memory metrics into the time series required for display in the OpenShift console.
f3a6cd1 to ecf531f
Adds makePrometheusQuery(), which makes the code more readable and allows the functionality to be reused elsewhere.
ecf531f to 8926229
/test azure-e2e-operator vsphere-e2e-operator |
Thanks for working on this @sebsoto ! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mansikulkarni96 The full list of commands accepted by this bot can be found here. The pull request process is described here
/lgtm |
/test remaining-required |
/lgtm
Thanks, finally got to understand the query structure.
@@ -70,6 +70,14 @@ func testStorage(t *testing.T) {
			}
		}()
	}
	t.Run("pod metrics", func(t *testing.T) {
Would such a test be better suited as part of metrics_test.go? Maybe it'd be best in the long run to split the other metrics tests out of the creation suite.
This is dependent on a workload. To avoid increasing the test runtime any further, I felt it best to put it in the storage test. Ideally, in the future we would use the same workload for both the storage test and the network tests.
/override ci/prow/aws-e2e-upgrade |
@sebsoto: Overrode contexts on behalf of sebsoto: ci/prow/aws-e2e-upgrade In response to this:
Tests that Windows pod metrics are being properly generated. This will run as part of the storage test in order to not deploy another workload.
8926229 to 1c3c17a
/lgtm
@@ -70,6 +70,17 @@ func testStorage(t *testing.T) {
			}
		}()
	}
	t.Run("pod metrics", func(t *testing.T) {
		if skipWorkloadDeletion {
Think it's worth leaving a TODO to remove this condition in 10.16?
/test remaining-required |
/test nutanix-e2e-operator |
@sebsoto: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/cherry-pick release-4.15 |
@sebsoto: new pull request created: #1976 In response to this:
@sebsoto: new pull request could not be created: failed to create pull request against openshift/windows-machine-config-operator#release-4.15 from head openshift-cherrypick-robot:cherry-pick-1949-to-release-4.15: status code 422 not one of [201], body: {"message":"Validation Failed","errors":[{"resource":"PullRequest","code":"custom","message":"A pull request already exists for openshift-cherrypick-robot:cherry-pick-1949-to-release-4.15."}],"documentation_url":"https://docs.github.com/rest/pulls/pulls#create-a-pull-request"} In response to this:
Adds recording rules which translate the Windows exporter's CPU and memory metrics into the time series required for display in the OpenShift console.