
Feature: NVML encode/decode % included in nvidia GPU usage metric#2

Merged
owaindjones merged 5 commits into main from feat/nvml-encoder-decoder-metrics on Apr 30, 2026

@owaindjones
Owner

No description provided.

…of-three

Query NVML encoder_utilization() and decoder_utilization() alongside
existing utilization_rates() for GPU compute. Combine all three into a
single composite percentage using max()-of-three approach, ensuring the
device appears busy when any engine (compute, encode, decode) is active.
This prevents premature sleep inhibition release during video encoding/
decoding workloads that would otherwise show zero GPU compute usage.
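
The max()-of-three combination described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name `composite_gpu_usage` is hypothetical, and the three inputs are assumed to be 0-100 percentages already read from NVML.

```rust
/// Combine per-engine utilization into a single "busy" percentage.
///
/// Hypothetical helper: each argument is assumed to be a 0-100 value
/// obtained elsewhere (compute, encoder, decoder). The result is the
/// largest of the three, clamped to 100 as a defensive measure.
pub fn composite_gpu_usage(compute_pct: u32, encoder_pct: u32, decoder_pct: u32) -> u32 {
    compute_pct.max(encoder_pct).max(decoder_pct).min(100)
}
```

With this shape, a pure transcoding workload that reports 0% compute but 35% encoder activity yields 35, so the device still registers as busy.
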
NVML utilization_rates() reports SM busy percentage at the current
clock speed, not normalized to the maximum rated frequency. A GPU running
at 200 MHz and reporting 100% busy is effectively only ~6% loaded relative
to its 3200 MHz peak. This is the same principle as the CPU frequency
weighting (cpu.rs:404-436).

Apply identical formula for NVIDIA GPUs:
  effective_max = max(current_freq, rated_max, observed_peak)
  weighted_usage = raw_gpu_pct * (current_freq / effective_max)

Track per-card peak observed frequency to handle turbo boost beyond
rated maximums. Encoder/decoder engines run at fixed clocks and are
combined via max() without freq weighting. AMD/Intel remain unchanged
(their sysfs gpu_busy_percent is already normalized by the kernel).
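
A rough sketch of the weighting formula with per-card peak tracking, under stated assumptions: the type `FreqTracker` and its method are hypothetical names, clocks are given in MHz, and falling back to the raw percentage when no clock information is available is an assumption rather than something the commit message specifies.

```rust
/// Remembers the highest clock observed so far for one card (hypothetical type).
#[derive(Debug, Default)]
pub struct FreqTracker {
    observed_peak_mhz: u32,
}

impl FreqTracker {
    /// Scale a raw busy percentage by current_freq / effective_max, where
    /// effective_max = max(current_freq, rated_max, observed_peak).
    pub fn weighted_usage(&mut self, raw_gpu_pct: f64, current_mhz: u32, rated_max_mhz: u32) -> f64 {
        // Update the peak first so boost clocks above the rated maximum are kept.
        self.observed_peak_mhz = self.observed_peak_mhz.max(current_mhz);

        let effective_max = current_mhz.max(rated_max_mhz).max(self.observed_peak_mhz);
        if effective_max == 0 {
            // Assumption: with no clock data at all, report the raw value unchanged.
            return raw_gpu_pct;
        }
        raw_gpu_pct * (f64::from(current_mhz) / f64::from(effective_max))
    }
}
```

For the example in the text, 100% at 200 MHz against a 3200 MHz rated maximum works out to 100 * (200 / 3200) = 6.25%. Once a 3400 MHz boost clock has been observed, later samples are weighted against 3400 MHz instead of 3200 MHz.
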
…nc config defaults

- Remove redundant debug lines from cpu.rs (core count), gpu.rs (NVML init,
  enumeration count, per-card lines)
- Rewrite network.rs collect() to return NetworkThroughput with per-interface
  Mbps breakdown; remove 'Skipping excluded interface' debug line
- Update disk.rs collect() to return DiskThroughput with interval + total +
  per-device MB/s breakdown using delta tracking in loop
- Add GpuDisplayEntry, sorted_gpu_display(), gpu/display_string(), network/
  disk display formatters in metrics/mod.rs
- Create TickMetrics struct carrying full throughput alongside Metrics
- Refactor service.rs tick() to use new types; GPU log shows sorted entries
  with inline smoothed values per-GPU; Network/Disk show breakdowns
- Sync config defaults: total_threshold=25.0, gpu threshold=15.0, network/
  disk thresholds=10.0 across src/config.rs and all docs
… in display

HashMap iteration order is non-deterministic, causing log output to
show different ordering on every tick. Sort key-value pairs by name
before formatting so each run produces identical alphabetical order.
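
One way to get the stable ordering described above is to collect the map entries into a vector and sort by key before formatting. The function name `sorted_display` and the `name=value%` output format are illustrative assumptions, not the project's actual formatter.

```rust
use std::collections::HashMap;

/// Format name -> percentage pairs in alphabetical order by name.
/// Hypothetical helper; the output layout is only an example.
pub fn sorted_display(values: &HashMap<String, f64>) -> String {
    // HashMap iteration order is unspecified, so sort a borrowed view first.
    let mut entries: Vec<(&String, &f64)> = values.iter().collect();
    entries.sort_by(|a, b| a.0.cmp(b.0));

    entries
        .iter()
        .map(|(name, pct)| format!("{name}={pct:.1}%"))
        .collect::<Vec<_>>()
        .join(", ")
}
```

Storing the values in a `BTreeMap` would give sorted iteration without the explicit sort; sorting at display time keeps the change local to the formatting code.
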
- Implement 3-tier config merging (embedded -> /etc/rouser -> $XDG_CONFIG_HOME)
- Auto-install default config on first startup via load_merged()
- --print-config serializes merged TOML instead of raw embedded string
- Fix tracing init order so auto-install INFO log is captured at startup
- Remove the 'which' crate dependency and the NOTIFY_SOCKET warning (not used)
- Update install.sh to respect $XDG_BIN_HOME, $XDG_CONFIG_HOME
- Remove config file copying from install script (app creates it at startup)
- Remove gh CLI dependency from install script (curl works fine)
- Fix systemd service path: extracted/systemd/rouser.service
- Correct maintainer name: Owain Jones in deb.sh, rpm.sh, pkgbuild.sh
- Add XDG Base Directory Compliance section to AGENTS.md
- Add vibecoded development note to README
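
The 3-tier merge order (embedded, then /etc/rouser, then $XDG_CONFIG_HOME) can be illustrated with a deliberately simplified sketch. It assumes flat string key/value layers instead of real TOML, and the names `Layer`, `merge_layers`, and `user_config_dir` are hypothetical; the commit's `load_merged()` is not reproduced here. The fallback to `$HOME/.config` when `$XDG_CONFIG_HOME` is unset or empty follows the XDG Base Directory Specification.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// One configuration layer as flat key/value pairs (a simplification of TOML).
pub type Layer = BTreeMap<String, String>;

/// Merge layers in order; a key in a later layer overrides earlier ones.
pub fn merge_layers(layers: &[Layer]) -> Layer {
    let mut merged = Layer::new();
    for layer in layers {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}

/// Resolve the user config base directory from already-read environment values.
/// An unset or empty XDG_CONFIG_HOME falls back to `<home>/.config`.
pub fn user_config_dir(xdg_config_home: Option<&str>, home: &str) -> PathBuf {
    match xdg_config_home {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(home).join(".config"),
    }
}
```

Passing the environment values in as arguments, rather than reading them inside the function, keeps the path logic easy to test.
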
@owaindjones owaindjones merged commit 48c7d59 into main Apr 30, 2026
2 checks passed
@owaindjones owaindjones deleted the feat/nvml-encoder-decoder-metrics branch April 30, 2026 21:45
owaindjones added a commit that referenced this pull request May 2, 2026
Bug #1: 'Predictive cooldown extension' info log fired on every tick
while extended cooldown was active because predicted_additional_time
was already set from a previous tick. Added check for
predicted_additional_time.is_zero() so the message only logs once per
transition into below-threshold state, matching how 'Sleep inhibited'
logs only fire on state transitions.
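
A minimal sketch of the log-once guard, assuming the prediction is stored in a `Duration` field that is zero when no extension is active. The struct `CooldownState` and its methods are hypothetical; only the field name `predicted_additional_time` and the `is_zero()` check come from the description above.

```rust
use std::time::Duration;

/// Hypothetical holder for the predictive cooldown state.
pub struct CooldownState {
    pub predicted_additional_time: Duration,
}

impl CooldownState {
    /// Record a prediction and report whether the caller should log it.
    /// Returns true only on the tick where the extension is first set.
    pub fn record_prediction(&mut self, prediction: Duration) -> bool {
        let should_log = self.predicted_additional_time.is_zero() && !prediction.is_zero();
        if should_log {
            self.predicted_additional_time = prediction;
        }
        should_log
    }

    /// Clear the extension, for example when usage rises above the threshold again.
    pub fn reset(&mut self) {
        self.predicted_additional_time = Duration::ZERO;
    }
}
```

The caller would emit the info log only when `record_prediction` returns true, so repeated ticks during an active extension stay silent.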

Bug #2: Predictive cooldown extension had no effect — inhibition was
released after base cooldown_duration (10s) instead of respecting the
predicted +1028s extension. The release logic checked plain
cooldown_duration first and released before reaching the predictive
branch. Replaced two-branch logic with single path using
std::cmp::max(cooldown_duration, predicted_additional_time) so the
prediction always extends (not replaces) the base cooldown period.
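
The single-path release check might look like the sketch below. The function name `should_release` and the `elapsed_below` parameter (time spent below the threshold) are assumptions for illustration; the use of `std::cmp::max` over the two durations is taken from the description above.

```rust
use std::time::Duration;

/// Decide whether sleep inhibition may be released.
/// Hypothetical helper: release only after the longer of the base cooldown
/// and the predicted extension has elapsed.
pub fn should_release(
    elapsed_below: Duration,
    cooldown_duration: Duration,
    predicted_additional_time: Duration,
) -> bool {
    let required = std::cmp::max(cooldown_duration, predicted_additional_time);
    elapsed_below >= required
}
```

With a 10s base cooldown and a 1028s prediction, the check stays false at 11s and becomes true at 1028s. With no prediction, it falls back to the plain 10s cooldown.
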