
koordlet: fix cpi collector bad fd err #782

Merged


songtao98
Contributor

Signed-off-by: songtao98 <songtao2603060@gmail.com>

Ⅰ. Describe what this PR does

This PR fixes an error that occurs in the koordlet CPI collector when it closes container cgroup files after profiling container CPI. The related error message is shown below:

E1102 21:10:28.757191 435643 performance_collector_linux.go:99] close CgroupFd 36, err : bad file descriptor

To call perf_event_open() for a container, we need a file descriptor for the container's cgroup directory, which is passed as the pid parameter. Previously, the PerfCollector struct saved only that integer fd after opening the cgroup path, instead of keeping a reference to the *os.File itself. It turns out that once the *os.File is no longer referenced, Go's garbage collector can collect it, and the file's finalizer closes the fd. performance_collector_linux.go then closes the same fd explicitly a second time, which produces the error above.

The fix is to change PerfCollector to keep the *os.File instead of the int fd and to call File.Fd() whenever the descriptor is needed. To keep the change minimal, this PR stores both.

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@codecov

codecov bot commented Nov 3, 2022

Codecov Report

Base: 68.16% // Head: 68.14% // Decreases project coverage by -0.02% ⚠️

Coverage data is based on head (e36e573) compared to base (94913ea).
Patch coverage: 50.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #782      +/-   ##
==========================================
- Coverage   68.16%   68.14%   -0.03%     
==========================================
  Files         210      210              
  Lines       24084    24096      +12     
==========================================
+ Hits        16418    16421       +3     
- Misses       6525     6534       +9     
  Partials     1141     1141              
Flag Coverage Δ
unittests 68.14% <50.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...dlet/metricsadvisor/performance_collector_linux.go 77.52% <0.00%> (-4.62%) ⬇️
pkg/util/stat.go 81.08% <66.66%> (ø)
pkg/util/perf/perf_linux.go 50.00% <76.92%> (+2.50%) ⬆️
pkg/koordlet/statesinformer/states_pods.go 54.60% <0.00%> (-2.13%) ⬇️
pkg/scheduler/plugins/elasticquota/node_handler.go 68.18% <0.00%> (+0.48%) ⬆️



@jasonliu747
Member

/cc @zwzhang0107 @saintube

@songtao98 songtao98 force-pushed the fix_cpi_collector_badfd_err branch 2 times, most recently from 4b5b38c to 719be34 Compare November 3, 2022 11:16
@saintube
Member

saintube commented Nov 4, 2022

@songtao98 The UT pipeline failed to upload coverage to Codecov, please retry.

You can use git commit --amend and force-push to re-trigger the pipeline without any code modification.

@saintube
Member

saintube commented Nov 4, 2022

/lgtm

Signed-off-by: songtao98 <songtao2603060@gmail.com>
@zwzhang0107
Contributor

/lgtm

@eahydra eahydra changed the title fix koordlet cpi collector bad fd err koordlet: fix cpi collector bad fd err Nov 8, 2022
@koordinator-bot

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:


Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit 3a39f5b into koordinator-sh:main Nov 8, 2022
@FillZpp
Member

FillZpp commented Dec 13, 2022

/milestone v1.1

@koordinator-bot koordinator-bot bot added this to the v1.1 milestone Dec 13, 2022
6 participants