koordlet: fix can not collect psi metrics #1402

Merged
merged 1 commit into from Jun 25, 2023


@lucming lucming commented Jun 21, 2023

Ⅰ. Describe what this PR does

Fixes koordlet failing to collect PSI metrics.

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

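  1. Boot a kernel with the PSI feature enabled:
sudo grubby --update-kernel="/boot/vmlinuz-4.19.91-24.8.an8.x86_64" --args="psi=1 psi_v1=1"

Replace the --update-kernel path with the kernel version actually installed on the node. The new kernel arguments take effect after a reboot.
[Screenshot 2023-06-21 10 31 51]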

  2. Enable koordlet's feature gate for collecting PSI metrics (PSICollector=true):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: koordlet
spec:
  template:
    spec:
      containers:
        - name: koordlet
          image: registry.cn-beijing.aliyuncs.com/koordinator-sh/koordlet:v1.2.0
          command:
            - /koordlet
          args:
            - '-cgroup-root-dir=/host-cgroup/'
            - >-
              -feature-gates=BECPUEvict=true,BEMemoryEvict=true,CgroupReconcile=true,Accelerators=true,CPICollector=true,PSICollector=true
            - '-runtime-hooks-host-endpoint=/var/run/koordlet/koordlet.sock'
            - '--logtostderr=true'
            - '--v=4'
  3. Observe the result:
    3.1. Query koordlet's metrics API; it returns no PSI metrics.
    3.2. Check koordlet's log:
    [Screenshot 2023-06-20 17 04 19]
    The log suggests that there are no pressure-related files under the cpuacct cgroup.
    3.3. Inspect the cgroup directly; the PSI-related files do exist:
    [Screenshot 2023-06-21 10 35 30]

So the problem is likely in how koordlet handles this.

Root cause:
In Go, package-level variables are initialized before main or any other ordinary function runs. koordlet does contain logic that modifies CgroupPathFormatter, but the package-level variables that depend on it are initialized only once, before that modification happens. As a result, the logic in SupportedIfFileExistsInKubepods still uses cgroupPathFormatterInSystemd, the PSI-related files cannot be found, and koordlet decides that PSI is not supported.
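
The initialization-order pitfall can be reproduced in isolation. The following is a minimal sketch, not koordlet's actual code: the formatter type, the variables systemdFormatter, cgroupfsFormatter, currentFormatter and capturedAtInit, and the helpers describe and describeLazily are hypothetical stand-ins for CgroupPathFormatter and the values derived from it.

```go
package main

import "fmt"

// formatter is a simplified, hypothetical stand-in for a cgroup path formatter.
type formatter struct{ name string }

var (
	systemdFormatter  = formatter{name: "systemd"}
	cgroupfsFormatter = formatter{name: "cgroupfs"}

	// currentFormatter defaults to systemd and is meant to be
	// switched at startup once the real cgroup driver is known.
	currentFormatter = systemdFormatter

	// capturedAtInit is evaluated exactly once, during package
	// initialization, so it is derived from the default formatter.
	capturedAtInit = describe(currentFormatter)
)

// describe reports which formatter a value was derived from.
func describe(f formatter) string { return "paths formatted for " + f.name }

// describeLazily reads currentFormatter at call time, so it
// observes any change made after package initialization.
func describeLazily() string { return describe(currentFormatter) }

func main() {
	// Startup logic detects the real driver and switches the formatter.
	currentFormatter = cgroupfsFormatter

	fmt.Println(capturedAtInit)   // prints "paths formatted for systemd"
	fmt.Println(describeLazily()) // prints "paths formatted for cgroupfs"
}
```

The value captured at initialization stays stale after the switch, while the lazily evaluated one follows it. Deferring the lookup until the driver has been detected is one way to avoid this class of bug; whether this PR takes exactly that approach is not stated in the description.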

[Screenshot 2023-06-21 11 07 10]

[Screenshot 2023-06-21 11 14 00]

Ⅳ. Special notes for reviews

Ⅴ. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@lucming lucming force-pushed the fix-get-cgroupdriver branch 4 times, most recently from 6ed709b to 5ff5f04 Compare June 21, 2023 03:41

codecov bot commented Jun 21, 2023

Codecov Report

Patch coverage: 64.58% and project coverage change: +0.02 🎉

Comparison is base (382f7a0) 64.74% compared to head (6c5ebad) 64.76%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1402      +/-   ##
==========================================
+ Coverage   64.74%   64.76%   +0.02%     
==========================================
  Files         333      333              
  Lines       34296    34345      +49     
==========================================
+ Hits        22204    22243      +39     
- Misses      10458    10468      +10     
  Partials     1634     1634              
| Flag | Coverage Δ |
|---|---|
| unittests | 64.76% <64.58%> (+0.02%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pkg/koordlet/util/system/cgroup_driver_linux.go | 36.13% <64.10%> (+13.63%) ⬆️ |
| pkg/koordlet/util/system/cgroup_driver.go | 51.56% <66.66%> (+1.14%) ⬆️ |

... and 2 files with indirect coverage changes


@koordinator-bot koordinator-bot bot added size/L and removed size/M labels Jun 21, 2023
@lucming lucming force-pushed the fix-get-cgroupdriver branch 2 times, most recently from 58d7527 to 386df6d Compare June 21, 2023 11:29
Signed-off-by: lucming <2876757716@qq.com>
@saintube
Member

/lgtm
@lucming Please remember to mark the conversation as resolved.


hormes commented Jun 25, 2023

/approve

@koordinator-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hormes

@koordinator-bot koordinator-bot bot merged commit 28c1925 into koordinator-sh:main Jun 25, 2023
16 checks passed