Skip to content

node_exporter_metrics: Add file path to error messages#11839

Open
rossigee wants to merge 1 commit into
fluent:masterfrom
rossigee:improve-node-exporter-errors
Open

node_exporter_metrics: Add file path to error messages#11839
rossigee wants to merge 1 commit into
fluent:masterfrom
rossigee:improve-node-exporter-errors

Conversation

@rossigee
Copy link
Copy Markdown

@rossigee rossigee commented May 22, 2026

Problem: When the node_exporter_metrics input plugin fails to read hardware sensors (hwmon, thermal zones, cpufreq), the error logs only show:

[error] [ne_utils.c:132 errno=61] No data available
[error] [ne_utils.c:132 errno=6] No such device or address

This is useless for debugging because users have no way to know which file/path is missing.

Solution: Add the file path to error messages in ne_utils_file_read_uint64, ne_utils_file_read_sds, and ne_utils_file_read_lines functions.

Example after fix:

[error] [input:node_exporter_metrics:node_exporter_metrics.0] could not open '/sys/class/hwmon/hwmon4/fan1_label'
[error] [input:node_exporter_metrics:node_exporter_metrics.0] could not read from '/sys/class/thermal/thermal_zone6/temp'

Closes #11838

Summary by CodeRabbit

  • Refactor
    • Updated internal helper functions in the node exporter metrics plugin to improve code consistency and maintainability across multiple collectors (CPU, memory, network, disk, and thermal monitoring).

Review Change Stack

Improve error logs to indicate which file path failed when reading
metrics, making it easier to debug missing hardware sensors or
permissions issues.

Signed-off-by: Ross Golder <ross@golder.org>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 22, 2026

📝 Walkthrough

Walkthrough

This PR refactors three core filesystem-reading utility functions to accept a context pointer for improved error reporting. All 15 collector modules in the node exporter plugin are updated to pass the context at call sites, enabling better error diagnostics through plugin logging.

Changes

Utility functions context propagation

Layer / File(s) Summary
Core utility function signatures and error reporting
plugins/in_node_exporter_metrics/ne_utils.h, plugins/in_node_exporter_metrics/ne_utils.c
ne_utils_file_read_uint64, ne_utils_file_read_lines, and ne_utils_file_read_sds now accept struct flb_ne *ctx as the first parameter. Error handling uses flb_plg_error(ctx->ins, ...) when context is provided, otherwise falls back to flb_errno().
CPU and thermal zone metrics collection
plugins/in_node_exporter_metrics/ne_cpu_linux.c, plugins/in_node_exporter_metrics/ne_thermalzone_linux.c
Thermal zone reads (core ID, physical package ID, throttle counters) and zone/device numeric value reads are updated to pass ctx to utility functions.
CPU frequency metrics formatting
plugins/in_node_exporter_metrics/ne_cpufreq_linux.c
Blank-line formatting is added before each frequency metric read statement for improved code organization.
Storage and file descriptor metrics
plugins/in_node_exporter_metrics/ne_diskstats_linux.c, plugins/in_node_exporter_metrics/ne_filefd_linux.c, plugins/in_node_exporter_metrics/ne_stat_linux.c, plugins/in_node_exporter_metrics/ne_meminfo_linux.c
File-read return values are explicitly assigned to ret before error checks in diskstats, filefd, and stat; memory info reads are adjusted in configure and update paths.
Network interface and protocol statistics
plugins/in_node_exporter_metrics/ne_netdev_linux.c, plugins/in_node_exporter_metrics/ne_netstat_linux.c, plugins/in_node_exporter_metrics/ne_sockstat_linux.c
Network device (/net/dev), protocol statistics (/net/snmp), and socket counts (/net/sockstat, /net/sockstat6) reads are updated to include context.
Hardware monitoring sensor values
plugins/in_node_exporter_metrics/ne_hwmon_linux.c
Sensor input values, temperature thresholds (max/crit), and chip name reads are updated to pass ctx to utility functions.
Process and system load metrics
plugins/in_node_exporter_metrics/ne_processes_linux.c, plugins/in_node_exporter_metrics/ne_loadavg_linux.c, plugins/in_node_exporter_metrics/ne_vmstat_linux.c
Thread/process stat reads, kernel sysfs values (threads-max, pid_max), load average, and virtual memory stats are updated to pass context; return value assignment is corrected in loadavg.
NVMe device entry values
plugins/in_node_exporter_metrics/ne_nvme_linux.c
NVMe entry value reads are updated to pass ctx as the first argument to ne_utils_file_read_lines.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

Possibly related PRs

  • fluent/fluent-bit#10719: Updates ne_hwmon_linux.c call sites to match the refactored ne_utils_file_read_* helper signatures.
  • fluent/fluent-bit#11253: Modifies ne_utils_file_read_lines buffer size; both PRs touch the same function with related but orthogonal changes.

Suggested labels

backport to v4.0.x

Suggested reviewers

  • edsiper
  • cosmo0920

Poem

🐰 Contexts flow through utilities bright,
Error paths now log the light,
Collectors dance in rhythmic grace,
Context passed from place to place! 🔍

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.69% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective of the pull request: adding file paths to error messages in the node_exporter_metrics plugin to improve debugging.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 44bf2d2665

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +139 to +141
if (ctx) {
flb_plg_error(ctx->ins, "could not read from '%s'", p);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Log path before freeing it in uint64 read errors

In ne_utils_file_read_uint64, the p buffer is destroyed immediately after a successful open, but the new read-error log path still formats %s with p when read() returns -1. That makes the error path dereference freed memory (undefined behavior), which can produce corrupted logs or crash under allocator checks when I/O errors occur (e.g., transient EIO on sysfs/procfs reads).

Useful? React with 👍 / 👎.

Comment on lines +284 to +286
if (ctx) {
flb_plg_error(ctx->ins, "could not read from '%s'", p);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid use-after-free in SDS read error logging

The same pattern exists in ne_utils_file_read_sds: p is freed right after open() succeeds, then reused in flb_plg_error(..., "%s", p) if read() fails. This introduces a use-after-free in an error path, so a read failure can trigger undefined behavior instead of safe diagnostic logging.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/in_node_exporter_metrics/ne_utils.c`:
- Around line 135-141: The code frees the SDS string p via flb_sds_destroy(p)
before calling read(fd, ...) and then uses p in the error log
flb_plg_error(ctx->ins, "could not read from '%s'", p), leading to a
use-after-free; to fix, delay calling flb_sds_destroy(p) until after the read()
result is handled (i.e., after the if (bytes == -1) block and any flb_plg_error
that references p), and apply the same change to the second identical block
around the other read() (the one at the 280-286 region) so both read/error paths
log p while it remains valid.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 780dcfd7-8ada-4a79-b714-4c89712daf30

📥 Commits

Reviewing files that changed from the base of the PR and between f6126eb and 44bf2d2.

📒 Files selected for processing (17)
  • plugins/in_node_exporter_metrics/ne_cpu_linux.c
  • plugins/in_node_exporter_metrics/ne_cpufreq_linux.c
  • plugins/in_node_exporter_metrics/ne_diskstats_linux.c
  • plugins/in_node_exporter_metrics/ne_filefd_linux.c
  • plugins/in_node_exporter_metrics/ne_hwmon_linux.c
  • plugins/in_node_exporter_metrics/ne_loadavg_linux.c
  • plugins/in_node_exporter_metrics/ne_meminfo_linux.c
  • plugins/in_node_exporter_metrics/ne_netdev_linux.c
  • plugins/in_node_exporter_metrics/ne_netstat_linux.c
  • plugins/in_node_exporter_metrics/ne_nvme_linux.c
  • plugins/in_node_exporter_metrics/ne_processes_linux.c
  • plugins/in_node_exporter_metrics/ne_sockstat_linux.c
  • plugins/in_node_exporter_metrics/ne_stat_linux.c
  • plugins/in_node_exporter_metrics/ne_thermalzone_linux.c
  • plugins/in_node_exporter_metrics/ne_utils.c
  • plugins/in_node_exporter_metrics/ne_utils.h
  • plugins/in_node_exporter_metrics/ne_vmstat_linux.c

Comment on lines 135 to +141
flb_sds_destroy(p);

bytes = read(fd, &tmp, sizeof(tmp));
if (bytes == -1) {
flb_errno();
if (ctx) {
flb_plg_error(ctx->ins, "could not read from '%s'", p);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Use-after-free in read error logging path

Line 135 and Line 280 free p before read(), but Line 140 and Line 285 still log %s using p on read failure. Keep p alive until after read/error handling.

Proposed fix
@@
-    flb_sds_destroy(p);
-
     bytes = read(fd, &tmp, sizeof(tmp));
     if (bytes == -1) {
         if (ctx) {
             flb_plg_error(ctx->ins, "could not read from '%s'", p);
         }
@@
         }
         close(fd);
+        flb_sds_destroy(p);
         return -1;
     }
     close(fd);
+    flb_sds_destroy(p);
@@
-    flb_sds_destroy(p);
-
     bytes = read(fd, &tmp, sizeof(tmp));
     if (bytes == -1) {
         if (ctx) {
             flb_plg_error(ctx->ins, "could not read from '%s'", p);
         }
@@
         }
         close(fd);
+        flb_sds_destroy(p);
         return -1;
     }
     close(fd);
+    flb_sds_destroy(p);

Also applies to: 280-286

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/in_node_exporter_metrics/ne_utils.c` around lines 135 - 141, The code
frees the SDS string p via flb_sds_destroy(p) before calling read(fd, ...) and
then uses p in the error log flb_plg_error(ctx->ins, "could not read from '%s'",
p), leading to a use-after-free; to fix, delay calling flb_sds_destroy(p) until
after the read() result is handled (i.e., after the if (bytes == -1) block and
any flb_plg_error that references p), and apply the same change to the second
identical block around the other read() (the one at the 280-286 region) so both
read/error paths log p while it remains valid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

node_exporter_metrics: Error messages don't include file path, making debugging impossible

1 participant