Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix timer GC delays in the Linux filesystem collector #2169

Merged
merged 2 commits into from
Oct 24, 2021

Conversation

jordy1024
Copy link
Contributor

@jordy1024 jordy1024 commented Oct 17, 2021

Hello,I found a potential hazard in filesystem_linux.go that may cause memory leaks,the details are as follows:
function GetStats will loop through mount points, according to the number of all mount points.
Recently I encountered a problem: the number of mount points of a certain system in the production environment is abnormal, as many as more than 500,000 rows; mainly because the for loop is blocked, and then the Timer Heap objects in the statement for {select case->time.After cause accumulation, and eventually a memory leak occurs.
It may be a better way to control the number of mount points to be checked, but this may require more changes; therefore, I only made a slight optimization for this problem: change time.After to time.NewTimer.
Expect to be accepted or answered.

@jordy1024 jordy1024 changed the title Avoid time.after memory leak when calling time.After Avoid memory leak when calling time.After Oct 17, 2021
Signed-off-by: bawenmao <bawenmao@sogou-inc.com>
@discordianfish
Copy link
Member

Hi,

not sure I understand the implications of this change. If you have 500k mountpoints, having 500k timer objects that disappear again after the timeout is expected. And from what I can tell, with your change this would still be the case, right?

@jordy1024 jordy1024 force-pushed the hotfix_filesystem_go branch 2 times, most recently from 89f3c96 to 8a87b71 Compare October 24, 2021 07:35
@jordy1024
Copy link
Contributor Author

@SuperQ hi, why ci/circleci report "Your tests failed on CircleCI" ? What should I do ?

@SuperQ
Copy link
Member

SuperQ commented Oct 24, 2021

I think the issue with CircleCI is because you've selected to "Follow" the node_exporter fork in your CircleCI account. This triggers CircleCI to think that it should build under your account, rather than the Prometheus account.

If you click the "Unfollow Project" for node_exporter in your CircleCI account, we can try to re-trigger builds.

@jordy1024
Copy link
Contributor Author

I think the issue with CircleCI is because you've selected to "Follow" the node_exporter fork in your CircleCI account. This triggers CircleCI to think that it should build under your account, rather than the Prometheus account.

If you click the "Unfollow Project" for node_exporter in your CircleCI account, we can try to re-trigger builds.

Okay, maybe this is my problem;
When I saw a CI error, I immediately clicked and performed the related sign up, and also clicked follow, haha ^_^.
It has been cancelled(Unfollow Project), so I don’t need to do anything for the next build step?

@SuperQ
Copy link
Member

SuperQ commented Oct 24, 2021

You might need to make a commit to trigger the build. Maybe squash up the commit history with git rebase -i origin/master?

@jordy1024 jordy1024 changed the title Avoid memory leak when calling time.After Use time.NewTimer instead of using time.After that maybe cause memory leak. Oct 24, 2021
Signed-off-by: bawenmao <jordy1024@163.com>

Use Timer instead of using time.After that can avoid memory leak.

Signed-off-by: bawenmao <jordy1024@163.com>

Avoid time.after memory leak when calling time.After

Signed-off-by: bawenmao <jordy1024@163.com>

Use Timer instead of using time.After that maybe cause memory leak.

Signed-off-by: bawenmao <bawenmao@sogou-inc.com>
@jordy1024
Copy link
Contributor Author

You might need to make a commit to trigger the build. Maybe squash up the commit history with git rebase -i origin/master?

Okay,Thanks.
Please review again, looking forward to being approved.

Copy link
Member

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, for the record if you read the Go docs for time.After() this is explicitly mentioned.

After waits for the duration to elapse and then sends the current time on the returned channel. It is equivalent to NewTimer(d).C. The underlying Timer is not recovered by the garbage collector until the timer fires. If efficiency is a concern, use NewTimer instead and call Timer.Stop if the timer is no longer needed.

So, yes, it's much better to explicitly call Stop() when there's nothing to wait for anymore.

@SuperQ SuperQ changed the title Use time.NewTimer instead of using time.After that maybe cause memory leak. Fix GC delays in the Linux filesystem collector Oct 24, 2021
@SuperQ SuperQ changed the title Fix GC delays in the Linux filesystem collector Fix timer GC delays in the Linux filesystem collector Oct 24, 2021
@SuperQ SuperQ merged commit fbc2354 into prometheus:master Oct 24, 2021
@roidelapluie
Copy link
Member

Awesome findings!

@SuperQ SuperQ mentioned this pull request Oct 27, 2021
SuperQ added a commit that referenced this pull request Oct 27, 2021
NOTE: In order to support globs in the textfile collector path, filenames exposed by
      `node_textfile_mtime_seconds` now contain the full path name.

* [CHANGE] Add path label to rapl collector #2146
* [CHANGE] Exclude filesystems under /run/credentials #2157
* [FEATURE] Add support for monitoring GPUs on Linux #1998
* [FEATURE] Add Darwin thermal collector #2032
* [FEATURE] Add os release collector #2094
* [FEATURE] Add netdev.address-info collector #2105
* [ENHANCEMENT] Support glob textfile collector directories #1985
* [ENHANCEMENT] ethtool: Expose node_ethtool_info metric #2080
* [ENHANCEMENT] Use include/exclude flags for ethtool filtering #2165
* [ENHANCEMENT] Add flag to disable guest CPU metrics #2123
* [ENHANCEMENT] Add DMI collector #2131
* [ENHANCEMENT] Add threads metrics to processes collector #2164
* [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector #2169
* [BUGFIX] ethtool: Sanitize metric names #2093
* [BUGFIX] Fix ethtool collector for multiple interfaces #2126
* [BUGFIX] Fix possible panic on macOS #2133
* [BUGFIX] Collect flag_info and bug_info only for one core #2156

Signed-off-by: Ben Kochie <superq@gmail.com>
SuperQ added a commit that referenced this pull request Oct 28, 2021
NOTE: In order to support globs in the textfile collector path, filenames exposed by
      `node_textfile_mtime_seconds` now contain the full path name.

* [CHANGE] Add path label to rapl collector #2146
* [CHANGE] Exclude filesystems under /run/credentials #2157
* [FEATURE] Add darwin powersupply collector #1777
* [FEATURE] Add support for monitoring GPUs on Linux #1998
* [FEATURE] Add Darwin thermal collector #2032
* [FEATURE] Add os release collector #2094
* [FEATURE] Add netdev.address-info collector #2105
* [ENHANCEMENT] Support glob textfile collector directories #1985
* [ENHANCEMENT] ethtool: Expose node_ethtool_info metric #2080
* [ENHANCEMENT] Use include/exclude flags for ethtool filtering #2165
* [ENHANCEMENT] Add flag to disable guest CPU metrics #2123
* [ENHANCEMENT] Add DMI collector #2131
* [ENHANCEMENT] Add threads metrics to processes collector #2164
* [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector #2169
* [BUGFIX] ethtool: Sanitize metric names #2093
* [BUGFIX] Fix ethtool collector for multiple interfaces #2126
* [BUGFIX] Fix possible panic on macOS #2133
* [BUGFIX] Collect flag_info and bug_info only for one core #2156

Signed-off-by: Ben Kochie <superq@gmail.com>
SuperQ added a commit that referenced this pull request Oct 29, 2021
NOTE: In order to support globs in the textfile collector path, filenames exposed by
      `node_textfile_mtime_seconds` now contain the full path name.

* [CHANGE] Add path label to rapl collector #2146
* [CHANGE] Exclude filesystems under /run/credentials #2157
* [FEATURE] Add lnstat collector for metrics from  /proc/net/stat/ #1771
* [FEATURE] Add darwin powersupply collector #1777
* [FEATURE] Add support for monitoring GPUs on Linux #1998
* [FEATURE] Add Darwin thermal collector #2032
* [FEATURE] Add os release collector #2094
* [FEATURE] Add netdev.address-info collector #2105
* [ENHANCEMENT] Support glob textfile collector directories #1985
* [ENHANCEMENT] ethtool: Expose node_ethtool_info metric #2080
* [ENHANCEMENT] Use include/exclude flags for ethtool filtering #2165
* [ENHANCEMENT] Add flag to disable guest CPU metrics #2123
* [ENHANCEMENT] Add DMI collector #2131
* [ENHANCEMENT] Add threads metrics to processes collector #2164
* [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector #2169
* [BUGFIX] ethtool: Sanitize metric names #2093
* [BUGFIX] Fix ethtool collector for multiple interfaces #2126
* [BUGFIX] Fix possible panic on macOS #2133
* [BUGFIX] Collect flag_info and bug_info only for one core #2156

Signed-off-by: Ben Kochie <superq@gmail.com>
SuperQ added a commit that referenced this pull request Nov 18, 2021
NOTE: In order to support globs in the textfile collector path, filenames exposed by
      `node_textfile_mtime_seconds` now contain the full path name.

* [CHANGE] Add path label to rapl collector #2146
* [CHANGE] Exclude filesystems under /run/credentials #2157
* [FEATURE] Add lnstat collector for metrics from  /proc/net/stat/ #1771
* [FEATURE] Add darwin powersupply collector #1777
* [FEATURE] Add support for monitoring GPUs on Linux #1998
* [FEATURE] Add Darwin thermal collector #2032
* [FEATURE] Add os release collector #2094
* [FEATURE] Add netdev.address-info collector #2105
* [ENHANCEMENT] Support glob textfile collector directories #1985
* [ENHANCEMENT] ethtool: Expose node_ethtool_info metric #2080
* [ENHANCEMENT] Use include/exclude flags for ethtool filtering #2165
* [ENHANCEMENT] Add flag to disable guest CPU metrics #2123
* [ENHANCEMENT] Add DMI collector #2131
* [ENHANCEMENT] Add threads metrics to processes collector #2164
* [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector #2169
* [BUGFIX] ethtool: Sanitize metric names #2093
* [BUGFIX] Fix ethtool collector for multiple interfaces #2126
* [BUGFIX] Fix possible panic on macOS #2133
* [BUGFIX] Collect flag_info and bug_info only for one core #2156

Signed-off-by: Ben Kochie <superq@gmail.com>
SuperQ added a commit that referenced this pull request Nov 18, 2021
NOTE: In order to support globs in the textfile collector path, filenames exposed by
      `node_textfile_mtime_seconds` now contain the full path name.

* [CHANGE] Add path label to rapl collector #2146
* [CHANGE] Exclude filesystems under /run/credentials #2157
* [FEATURE] Add lnstat collector for metrics from  /proc/net/stat/ #1771
* [FEATURE] Add darwin powersupply collector #1777
* [FEATURE] Add support for monitoring GPUs on Linux #1998
* [FEATURE] Add Darwin thermal collector #2032
* [FEATURE] Add os release collector #2094
* [FEATURE] Add netdev.address-info collector #2105
* [ENHANCEMENT] Support glob textfile collector directories #1985
* [ENHANCEMENT] ethtool: Expose node_ethtool_info metric #2080
* [ENHANCEMENT] Use include/exclude flags for ethtool filtering #2165
* [ENHANCEMENT] Add flag to disable guest CPU metrics #2123
* [ENHANCEMENT] Add DMI collector #2131
* [ENHANCEMENT] Add threads metrics to processes collector #2164
* [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector #2169
* [BUGFIX] ethtool: Sanitize metric names #2093
* [BUGFIX] Fix ethtool collector for multiple interfaces #2126
* [BUGFIX] Fix possible panic on macOS #2133
* [BUGFIX] Collect flag_info and bug_info only for one core #2156

Signed-off-by: Ben Kochie <superq@gmail.com>
@juzhao juzhao mentioned this pull request Jun 12, 2023
oblitorum pushed a commit to shatteredsilicon/node_exporter that referenced this pull request Apr 9, 2024
Use `time.NewTimer()` and explicit `Stop()` to avoid memory bloat / GC problems with `time.After()` in the Linux filesystem collector timeout handling.

Signed-off-by: bawenmao <bawenmao@sogou-inc.com>
oblitorum pushed a commit to shatteredsilicon/node_exporter that referenced this pull request Apr 9, 2024
NOTE: In order to support globs in the textfile collector path, filenames exposed by
      `node_textfile_mtime_seconds` now contain the full path name.

* [CHANGE] Add path label to rapl collector prometheus#2146
* [CHANGE] Exclude filesystems under /run/credentials prometheus#2157
* [FEATURE] Add lnstat collector for metrics from  /proc/net/stat/ prometheus#1771
* [FEATURE] Add darwin powersupply collector prometheus#1777
* [FEATURE] Add support for monitoring GPUs on Linux prometheus#1998
* [FEATURE] Add Darwin thermal collector prometheus#2032
* [FEATURE] Add os release collector prometheus#2094
* [FEATURE] Add netdev.address-info collector prometheus#2105
* [ENHANCEMENT] Support glob textfile collector directories prometheus#1985
* [ENHANCEMENT] ethtool: Expose node_ethtool_info metric prometheus#2080
* [ENHANCEMENT] Use include/exclude flags for ethtool filtering prometheus#2165
* [ENHANCEMENT] Add flag to disable guest CPU metrics prometheus#2123
* [ENHANCEMENT] Add DMI collector prometheus#2131
* [ENHANCEMENT] Add threads metrics to processes collector prometheus#2164
* [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector prometheus#2169
* [BUGFIX] ethtool: Sanitize metric names prometheus#2093
* [BUGFIX] Fix ethtool collector for multiple interfaces prometheus#2126
* [BUGFIX] Fix possible panic on macOS prometheus#2133
* [BUGFIX] Collect flag_info and bug_info only for one core prometheus#2156

Signed-off-by: Ben Kochie <superq@gmail.com>
oblitorum pushed a commit to shatteredsilicon/node_exporter that referenced this pull request Apr 9, 2024
Use `time.NewTimer()` and explicit `Stop()` to avoid memory bloat / GC problems with `time.After()` in the Linux filesystem collector timeout handling.

Signed-off-by: bawenmao <bawenmao@sogou-inc.com>
oblitorum pushed a commit to shatteredsilicon/node_exporter that referenced this pull request Apr 9, 2024
NOTE: In order to support globs in the textfile collector path, filenames exposed by
      `node_textfile_mtime_seconds` now contain the full path name.

* [CHANGE] Add path label to rapl collector prometheus#2146
* [CHANGE] Exclude filesystems under /run/credentials prometheus#2157
* [FEATURE] Add lnstat collector for metrics from  /proc/net/stat/ prometheus#1771
* [FEATURE] Add darwin powersupply collector prometheus#1777
* [FEATURE] Add support for monitoring GPUs on Linux prometheus#1998
* [FEATURE] Add Darwin thermal collector prometheus#2032
* [FEATURE] Add os release collector prometheus#2094
* [FEATURE] Add netdev.address-info collector prometheus#2105
* [ENHANCEMENT] Support glob textfile collector directories prometheus#1985
* [ENHANCEMENT] ethtool: Expose node_ethtool_info metric prometheus#2080
* [ENHANCEMENT] Use include/exclude flags for ethtool filtering prometheus#2165
* [ENHANCEMENT] Add flag to disable guest CPU metrics prometheus#2123
* [ENHANCEMENT] Add DMI collector prometheus#2131
* [ENHANCEMENT] Add threads metrics to processes collector prometheus#2164
* [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector prometheus#2169
* [BUGFIX] ethtool: Sanitize metric names prometheus#2093
* [BUGFIX] Fix ethtool collector for multiple interfaces prometheus#2126
* [BUGFIX] Fix possible panic on macOS prometheus#2133
* [BUGFIX] Collect flag_info and bug_info only for one core prometheus#2156

Signed-off-by: Ben Kochie <superq@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants