refactor(obs): migrate metrics runtime/schema and tighten migration guards #2584

Merged
houseme merged 22 commits into main from refactor/improve-metrics-system on Apr 18, 2026

@houseme houseme commented Apr 17, 2026

Type of Change

  • New Feature
  • Bug Fix
  • Documentation
  • Performance Improvement
  • Test/CI
  • Refactor
  • Other:

Related Issues

Summary of Changes

This PR completes the metrics migration by moving runtime/schema responsibilities to rustfs-obs, removing the legacy
rustfs-metrics crate, and tightening migration guardrails.

Additional fixes included after review:

  • Fixed parse_kv_u64 to skip non key:value lines instead of aborting early.
  • Scoped run_command/parse_kv_u64 with platform cfgs to avoid dead_code warnings.
  • Fixed DistributedLockGuard::disarm() to release held-lock counters when disarming.
  • Removed obsolete CI jobs referencing the removed rustfs-metrics package.

Checklist

  • I have read and followed the CONTRIBUTING.md guidelines
  • Passed make pre-commit
  • Added/updated necessary tests
  • Documentation updated (if needed)
  • CI/CD passed (if applicable)

Impact

  • Breaking change (compatibility)
  • Requires doc/config/deployment update
  • Other impact:

Additional Notes


Thank you for your contribution! Please ensure your PR follows the community standards (CODE_OF_CONDUCT.md). If this is your first contribution, review the CLA document and sign it by commenting I have read and agree to the CLA. on the PR.

Copilot AI review requested due to automatic review settings April 17, 2026 19:50
@github-actions

CLA requirements are satisfied for this pull request.

github-actions Bot commented Apr 17, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

OpenSSF Scorecard

Scorecard details
Package | Version | Score | Details
cargo/aws-sdk-sts | 1.102.0 | Unknown | Unknown
cargo/dial9-macro | 0.3.0 | Unknown | Unknown
cargo/dial9-tokio-telemetry | 0.3.0 | Unknown | Unknown
cargo/dial9-trace-format | 0.3.0 | Unknown | Unknown
cargo/dial9-trace-format-derive | 0.3.0 | Unknown | Unknown
cargo/libbz2-rs-sys | 0.2.3 | Unknown | Unknown
cargo/symbolic-common | 12.18.1 | 🟢 5.3
Details
Check | Score | Reason
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Maintained | 🟢 10 | 24 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 10
Packaging | ⚠️ -1 | packaging workflow not detected
Code-Review | 🟢 6 | Found 19/30 approved changesets -- score normalized to 6
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
Binary-Artifacts | ⚠️ 0 | binaries present in source code
Pinned-Dependencies | 🟢 10 | all dependencies are pinned
Fuzzing | 🟢 10 | project is fuzzed
License | 🟢 10 | license file detected
Branch-Protection | 🟢 4 | branch protection is not maximal on development and all release branches
Signed-Releases | ⚠️ 0 | Project has not signed or included provenance with any releases.
Security-Policy | 🟢 10 | security policy file detected
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
cargo/symbolic-demangle | 12.18.1 | 🟢 5.3
Details
Check | Score | Reason
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Maintained | 🟢 10 | 24 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 10
Packaging | ⚠️ -1 | packaging workflow not detected
Code-Review | 🟢 6 | Found 19/30 approved changesets -- score normalized to 6
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
Binary-Artifacts | ⚠️ 0 | binaries present in source code
Pinned-Dependencies | 🟢 10 | all dependencies are pinned
Fuzzing | 🟢 10 | project is fuzzed
License | 🟢 10 | license file detected
Branch-Protection | 🟢 4 | branch protection is not maximal on development and all release branches
Signed-Releases | ⚠️ 0 | Project has not signed or included provenance with any releases.
Security-Policy | 🟢 10 | security policy file detected
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
cargo/symlink | 0.1.0 | 🟢 5.3
Details
Check | Score | Reason
License | ⚠️ -1 | internal error: RepoClient.ListLicenses: error during licensesHandler.setup: couldn't parse gitlab repo license url:
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
Signed-Releases | ⚠️ -1 | no releases found
Code-Review | ⚠️ 0 | found 5 unreviewed changesets out of 5 -- score normalized to 0
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
Pinned-Dependencies | ⚠️ -1 | no dependencies found
Packaging | ⚠️ -1 | packaging workflow not detected
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Security-Policy | ⚠️ 0 | security policy file not detected
Maintained | 🟢 10 | 0 commit(s) and 144 issue activity found in the last 90 days -- score normalized to 10
Fuzzing | ⚠️ 0 | project is not fuzzed
cargo/tracing-appender | 0.2.5 | 🟢 6.1
Details
Check | Score | Reason
Code-Review | 🟢 10 | all changesets reviewed
Maintained | 🟢 10 | 11 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 10
Packaging | ⚠️ -1 | packaging workflow not detected
Security-Policy | 🟢 9 | security policy file detected
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
Fuzzing | ⚠️ 0 | project is not fuzzed
License | 🟢 10 | license file detected
Signed-Releases | ⚠️ -1 | no releases found
Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
cargo/wasip2 | 1.0.3+wasi-0.2.9 | Unknown | Unknown
cargo/wit-bindgen | 0.57.1 | Unknown | Unknown

Scanned Files

  • Cargo.lock

@houseme houseme enabled auto-merge April 17, 2026 19:52

Copilot AI left a comment

Pull request overview

This PR expands RustFS observability by adding cross-platform process/lock statistics gathering and introducing per-target delivery metrics for audit/notification targets, then wiring those into the metrics collectors.

Changes:

  • Add a cross-platform process metrics provider (platform snapshots + lock-held counters) and collect process resource + process stats in a single sysinfo refresh.
  • Add per-target delivery counters/snapshots to notification targets (webhook/mqtt) and propagate “final failure” signals through notification/audit pipelines.
  • Add Prometheus metric descriptors + collectors for notification target metrics and schedule periodic audit/notification metrics collection tasks.
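The per-target delivery counters and snapshots described above could look roughly like the following. This is a minimal sketch, not the PR's code: `DeliveryCounters`, `DeliverySnapshot`, the method names, and the `total_messages` field are assumptions for illustration; only a `failed_messages` field is visible in the review excerpts.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Point-in-time copy of one target's counters.
/// Type and field names are hypothetical, except `failed_messages`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct DeliverySnapshot {
    pub total_messages: u64,
    pub failed_messages: u64,
}

/// Lock-free counters a target updates on each delivery outcome.
#[derive(Debug, Default)]
pub struct DeliveryCounters {
    total: AtomicU64,
    failed: AtomicU64,
}

impl DeliveryCounters {
    /// Record a message that was delivered successfully.
    pub fn record_success(&self) {
        self.total.fetch_add(1, Ordering::Relaxed);
    }

    /// Record a message that was given up on (permanent error or retries exhausted).
    pub fn record_final_failure(&self) {
        self.total.fetch_add(1, Ordering::Relaxed);
        self.failed.fetch_add(1, Ordering::Relaxed);
    }

    /// Copy the current values so a collector can read them without holding a lock.
    pub fn snapshot(&self) -> DeliverySnapshot {
        DeliverySnapshot {
            total_messages: self.total.load(Ordering::Relaxed),
            failed_messages: self.failed.load(Ordering::Relaxed),
        }
    }
}
```

A collector would call `snapshot()` on each tick and translate the fields into metric samples, so the delivery path only ever pays for an atomic increment.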

Reviewed changes

Copilot reviewed 31 out of 32 changed files in this pull request and generated 3 comments.

File Description
crates/targets/src/target/webhook.rs Add shared delivery counters and expose delivery snapshots/failure recording.
crates/targets/src/target/mqtt.rs Add shared delivery counters and expose delivery snapshots/failure recording.
crates/targets/src/target/mod.rs Introduce delivery snapshot/counters types; add Dropped handling + trait hooks for metrics.
crates/targets/src/lib.rs Re-export TargetDeliverySnapshot for downstream metrics usage.
crates/targets/src/error.rs Add TargetError::Dropped for invalid queued payloads.
crates/notify/src/stream.rs Treat dropped payloads as non-retryable; record final failures for permanent/max-retry cases.
crates/notify/src/notifier.rs Thread shared NotificationMetrics through notifier and record skipped/processed/failed counters.
crates/notify/src/lib.rs Export new global snapshot APIs + metric snapshot types.
crates/notify/src/integration.rs Add skipped counters + snapshot structs; add per-target metrics snapshot API.
crates/notify/src/global.rs Provide global snapshot functions for aggregate + per-target notification metrics.
crates/metrics/src/metrics_type/notification_target.rs Define metric descriptors for per-target notification delivery metrics.
crates/metrics/src/metrics_type/mod.rs Register notification_target metrics type module.
crates/metrics/src/metrics_type/logger_webhook.rs Remove legacy logger webhook metric descriptors.
crates/metrics/src/metrics_type/entry/subsystem.rs Remove LoggerWebhook subsystem entries.
crates/metrics/src/metrics_type/entry/metric_name.rs Add notification-target metric names; remove legacy webhook metric names.
crates/metrics/src/metrics_type/cluster_notification.rs Fix current-send-in-progress descriptor to be a gauge; update skipped description text.
crates/metrics/src/constants/mod.rs Add env/config defaults for audit + notification metrics intervals.
crates/metrics/src/collectors/stats_collector.rs Collect resource + process stats together; incorporate platform + lock snapshots.
crates/metrics/src/collectors/notification_target.rs New collector to render per-target notification metrics to Prometheus output.
crates/metrics/src/collectors/mod.rs Register notification-target collector and remove logger webhook collector exports.
crates/metrics/src/collectors/logger_webhook.rs Remove legacy webhook metrics collector.
crates/metrics/src/collectors/global.rs Schedule periodic audit + notification delivery metrics collection; emit new process metrics.
crates/metrics/Cargo.toml Add deps on audit/notify/io-metrics for new collectors and process stats.
crates/lock/src/fast_lock/guard.rs Record lock-held acquire/release events into io-metrics.
crates/lock/src/distributed_lock.rs Record lock-held acquire/release events for distributed locks.
crates/lock/Cargo.toml Add dependency on rustfs-io-metrics.
crates/io-metrics/src/process_lock_metrics.rs New cross-platform platform snapshot + lock-held counters implementation.
crates/io-metrics/src/lib.rs Export process lock/platform snapshot APIs.
crates/audit/src/system.rs Add dropped handling + record final failures; add per-target audit delivery snapshot API/type.
crates/audit/src/lib.rs Re-export AuditTargetMetricSnapshot.
crates/audit/src/global.rs Provide global async snapshot API for audit target metrics.
Cargo.lock Lockfile updates for new intra-workspace dependencies.

Comment on lines +87 to +89
let (k, v) = line.split_once(':')?;
if k.trim().eq_ignore_ascii_case(key) {
    return v.trim().parse::<u64>().ok();
Comment on lines +99 to 103
pub(crate) fn new(lock_id: LockId, entries: Vec<(LockId, Arc<dyn LockClient>)>, lock_type: LockType) -> Self {
    record_lock_held_acquire(lock_type);
    Self {
        lock_id,
        entries,
Contributor Author

Fixed. disarm() now records a held-lock release once when transitioning to disarmed.
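One way to get the "release once when transitioning to disarmed" behavior is an atomic flag whose previous value decides who does the bookkeeping. The sketch below is an assumption-laden illustration: `Guard`, `LOCKS_HELD`, and `locks_held` are hypothetical names, and the real `DistributedLockGuard` carries lock entries and a lock type that are omitted here.

```rust
use std::sync::atomic::{AtomicBool, AtomicI64, Ordering};

/// Hypothetical process-wide gauge of currently held locks.
static LOCKS_HELD: AtomicI64 = AtomicI64::new(0);

pub fn locks_held() -> i64 {
    LOCKS_HELD.load(Ordering::Relaxed)
}

/// Simplified stand-in for a lock guard (not the PR's type).
pub struct Guard {
    disarmed: AtomicBool,
}

impl Guard {
    pub fn new() -> Self {
        // Acquisition is recorded when the guard is created.
        LOCKS_HELD.fetch_add(1, Ordering::Relaxed);
        Self { disarmed: AtomicBool::new(false) }
    }

    /// Give up responsibility for unlocking. Returns true only on the first call.
    pub fn disarm(&self) -> bool {
        // `swap` returns the previous value, so only the first caller sees `false`.
        let first = !self.disarmed.swap(true, Ordering::AcqRel);
        if first {
            LOCKS_HELD.fetch_sub(1, Ordering::Relaxed);
        }
        first
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        // An armed guard releases on drop; a disarmed guard already did its bookkeeping.
        if !self.disarmed.swap(true, Ordering::AcqRel) {
            LOCKS_HELD.fetch_sub(1, Ordering::Relaxed);
        }
    }
}
```

Because both `disarm()` and `Drop` go through the same `swap`, the counter is decremented exactly once per guard regardless of which path runs first.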

Comment on lines +245 to +255
// Spawn task for audit target delivery metrics
let token_clone = token.clone();
tokio::spawn(async move {
    let mut interval = tokio::time::interval(audit_interval);
    loop {
        tokio::select! {
            _ = interval.tick() => {
                let stats = audit_target_metrics().await
                    .into_iter()
                    .map(|snapshot| AuditTargetStats {
                        failed_messages: snapshot.failed_messages,
Copilot AI review requested due to automatic review settings April 18, 2026 06:20
Copilot AI left a comment

Pull request overview

This PR migrates the Prometheus-style metrics runtime and schema into rustfs-obs, adds a cross-platform process stats sampling provider in rustfs-io-metrics, and wires per-target delivery counters into notification/audit targets for richer observability.

Changes:

  • Move metrics scheduling/reporting and metric descriptors into crates/obs/src/metrics/* and update rustfs to initialize metrics via rustfs_obs::init_metrics_runtime.
  • Add per-target delivery counters/snapshots for MQTT/Webhook targets and plumb final-failure recording through notify/audit processing loops.
  • Introduce cross-platform process/system snapshot collection in rustfs-io-metrics and instrument lock guards to track locks held.
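The final-failure plumbing mentioned above (dropped payloads are non-retryable, and a failure is recorded only once a message is given up on) can be sketched as a small retry loop. Everything here is hypothetical except the `Dropped` variant name: the other variants, `Outcome`, and `deliver_with_retry` are illustrative and do not reflect the actual notify/audit code.

```rust
/// Hypothetical error type; only the `Dropped` variant name comes from the PR.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TargetError {
    /// Queued payload is invalid and can never be delivered.
    Dropped(String),
    /// Temporary condition worth retrying.
    Transient(String),
    /// Permanent rejection by the target.
    Permanent(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Delivered,
    FinalFailure,
}

/// Try `send` until it succeeds, fails permanently, or exhausts `max_retries`.
/// `final_failures` is bumped once per message that is given up on.
pub fn deliver_with_retry<F>(mut send: F, max_retries: u32, final_failures: &mut u64) -> Outcome
where
    F: FnMut() -> Result<(), TargetError>,
{
    let mut attempt = 0;
    loop {
        match send() {
            Ok(()) => return Outcome::Delivered,
            // Only transient errors are retried, and only while budget remains.
            Err(TargetError::Transient(_)) if attempt < max_retries => attempt += 1,
            // Dropped, permanent, or retries exhausted: record one final failure.
            Err(_) => {
                *final_failures += 1;
                return Outcome::FinalFailure;
            }
        }
    }
}
```

Counting only final failures keeps the per-target metric from inflating while a flaky target is still being retried.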

Reviewed changes

Copilot reviewed 71 out of 99 changed files in this pull request and generated 3 comments.

File Description
scripts/check_metrics_migration_refs.sh Adds a ripgrep-based guard script to prevent reintroducing legacy metrics references during migration.
rustfs/src/main.rs Switches metrics initialization to rustfs_obs::init_metrics_runtime.
rustfs/src/config/info.rs Updates feature dependency string from rustfs-metrics/gpu to rustfs-obs/gpu.
rustfs/Cargo.toml Removes rustfs-metrics dependency; routes metrics-gpu feature to rustfs-obs/gpu.
crates/targets/src/target/webhook.rs Adds delivery counters and exposes delivery snapshots/final-failure recording for webhook targets.
crates/targets/src/target/mqtt.rs Adds delivery counters and exposes delivery snapshots/final-failure recording for MQTT targets.
crates/targets/src/target/mod.rs Introduces delivery snapshot/counter types and propagates “dropped payload” as an error with metrics hooks.
crates/targets/src/lib.rs Re-exports TargetDeliverySnapshot for downstream metrics collection.
crates/targets/src/error.rs Adds TargetError::Dropped for invalid queued payloads.
crates/obs/src/metrics/stats_collector.rs Switches process stats collection to rustfs-io-metrics snapshots and splits resource vs system stats.
crates/obs/src/metrics/schema/system_process.rs Adds metric descriptors for process/system metrics under the obs schema.
crates/obs/src/metrics/schema/system_network.rs Adds metric descriptors for internode and process network metrics.
crates/obs/src/metrics/schema/system_memory.rs Adds metric descriptors for system memory metrics.
crates/obs/src/metrics/schema/system_gpu.rs Adds GPU metric descriptor(s) behind the obs gpu feature.
crates/obs/src/metrics/schema/system_drive.rs Adds system drive metric descriptors/labels used by drive collectors.
crates/obs/src/metrics/schema/system_cpu.rs Adds system CPU metric descriptors.
crates/obs/src/metrics/schema/scanner.rs Adds scanner metric descriptors.
crates/obs/src/metrics/schema/request.rs Adds API request metric descriptors.
crates/obs/src/metrics/schema/replication.rs Adds replication metric descriptors.
crates/obs/src/metrics/schema/process_resource.rs Adds resource (cpu/memory/uptime) metric descriptors under /process.
crates/obs/src/metrics/schema/notification_target.rs Adds per-notification-target delivery metric descriptors.
crates/obs/src/metrics/schema/node_disk.rs Adds node disk metric descriptors.
crates/obs/src/metrics/schema/node_bucket.rs Adds node/bucket quota/usage metric descriptors.
crates/obs/src/metrics/schema/mod.rs Wires new schema modules and removes the legacy logger_webhook schema module.
crates/obs/src/metrics/schema/ilm.rs Adds ILM metric descriptors.
crates/obs/src/metrics/schema/entry/subsystem.rs Removes the LoggerWebhook subsystem and keeps subsystem path mappings consistent.
crates/obs/src/metrics/schema/entry/path_utils.rs Adds helper to format metric subsystem paths and includes unit tests.
crates/obs/src/metrics/schema/entry/namespace.rs Adds MetricNamespace type used for full metric name generation.
crates/obs/src/metrics/schema/entry/mod.rs Adds descriptor factory helpers (counter/gauge/histogram) and unit tests.
crates/obs/src/metrics/schema/entry/metric_type.rs Adds MetricType enum for schema descriptors.
crates/obs/src/metrics/schema/entry/metric_name.rs Extends metric name enum to include notification target delivery metrics; removes webhook log metric names.
crates/obs/src/metrics/schema/entry/descriptor.rs Adds MetricDescriptor implementation + tests for full metric name conventions.
crates/obs/src/metrics/schema/cluster_usage.rs Adds cluster usage metric descriptors.
crates/obs/src/metrics/schema/cluster_notification.rs Adjusts notification metrics descriptor types/help text (e.g., in-progress is a gauge).
crates/obs/src/metrics/schema/cluster_iam.rs Adds IAM metric descriptors.
crates/obs/src/metrics/schema/cluster_health.rs Adds cluster health metric descriptors.
crates/obs/src/metrics/schema/cluster_erasure_set.rs Adds erasure-set metric descriptors.
crates/obs/src/metrics/schema/cluster_config.rs Adds cluster config metric descriptors.
crates/obs/src/metrics/schema/cluster.rs Adds base cluster capacity/object metric descriptors.
crates/obs/src/metrics/schema/bucket_replication.rs Adds bucket replication metric descriptors.
crates/obs/src/metrics/schema/bucket.rs Adds bucket API metric descriptors.
crates/obs/src/metrics/schema/audit.rs Adds audit delivery metric descriptors.
crates/obs/src/metrics/scheduler.rs Renames/extends runtime scheduling, adds audit/notification collection tasks, and keeps a compatibility alias.
crates/obs/src/metrics/report.rs Refactors reporting to use the new schema module types.
crates/obs/src/metrics/mod.rs Introduces the metrics module tree and re-exports runtime/schema/report APIs.
crates/obs/src/metrics/config.rs Adds audit/notification interval env vars + defaults.
crates/obs/src/metrics/collectors/system_process.rs Updates imports to new schema/report paths for process collectors.
crates/obs/src/metrics/collectors/system_network.rs Updates imports to new schema/report paths for network collectors.
crates/obs/src/metrics/collectors/system_memory.rs Updates imports to new schema/report paths for memory collectors.
crates/obs/src/metrics/collectors/system_gpu.rs Updates docs/imports for GPU collectors under rustfs_obs::metrics.
crates/obs/src/metrics/collectors/system_drive.rs Updates imports to new schema/report paths for drive collectors.
crates/obs/src/metrics/collectors/system_cpu.rs Updates imports to new schema/report paths for CPU collectors.
crates/obs/src/metrics/collectors/scanner.rs Updates imports to new schema/report paths for scanner collectors.
crates/obs/src/metrics/collectors/resource.rs Updates imports to new schema/report paths for resource collectors.
crates/obs/src/metrics/collectors/request.rs Updates imports to new schema/report paths for request collectors.
crates/obs/src/metrics/collectors/replication.rs Updates imports to new schema/report paths for replication collectors.
crates/obs/src/metrics/collectors/notification_target.rs Adds collector for per-target notification delivery metrics (+ unit test).
crates/obs/src/metrics/collectors/notification.rs Updates imports to new schema/report paths for notification collectors.
crates/obs/src/metrics/collectors/node.rs Updates imports to new schema/report paths for node collectors.
crates/obs/src/metrics/collectors/mod.rs Restructures collectors module visibility/exports and adds notification target collector exports.
crates/obs/src/metrics/collectors/ilm.rs Updates imports to new schema/report paths for ILM collector.
crates/obs/src/metrics/collectors/dial9.rs Updates PrometheusMetric import path for dial9 collector.
crates/obs/src/metrics/collectors/cluster_usage.rs Updates imports to new schema/report paths for cluster usage collector.
crates/obs/src/metrics/collectors/cluster_iam.rs Updates imports to new schema/report paths for IAM collector.
crates/obs/src/metrics/collectors/cluster_health.rs Updates imports to new schema/report paths for cluster health collector.
crates/obs/src/metrics/collectors/cluster_erasure_set.rs Updates imports to new schema/report paths for erasure-set collector.
crates/obs/src/metrics/collectors/cluster_config.rs Updates imports to new schema/report paths for cluster config collector.
crates/obs/src/metrics/collectors/cluster.rs Updates imports to new schema/report paths for cluster collector.
crates/obs/src/metrics/collectors/bucket_replication.rs Updates imports to new schema/report paths for bucket replication collector.
crates/obs/src/metrics/collectors/bucket.rs Updates imports to new schema/report paths for bucket collector.
crates/obs/src/metrics/collectors/audit.rs Updates imports to new schema/report paths for audit collector.
crates/obs/src/lib.rs Exposes pub mod metrics, re-exports schema/runtime APIs, and updates docs to reference init_metrics_runtime.
crates/obs/src/global.rs Updates comment to reflect metrics runtime is owned by rustfs_obs.
crates/obs/Cargo.toml Adds dependencies needed for metrics runtime (audit/notify/ecstore/io-metrics/sysinfo) and introduces gpu feature.
crates/notify/src/stream.rs Handles TargetError::Dropped and records per-target final failures.
crates/notify/src/notifier.rs Plumbs NotificationMetrics into EventNotifier and increments skipped/processing/failed counters.
crates/notify/src/lib.rs Exposes notification metric snapshot APIs from the global module.
crates/notify/src/integration.rs Adds snapshot structs and snapshot methods for aggregate/per-target notification delivery metrics.
crates/notify/src/global.rs Adds global snapshot helpers for notification metrics and per-target delivery metrics.
crates/lock/src/fast_lock/guard.rs Instruments fast lock guard acquisition/release using rustfs-io-metrics lock-held counters.
crates/lock/src/distributed_lock.rs Instruments distributed lock guard acquisition/release using rustfs-io-metrics lock-held counters.
crates/lock/Cargo.toml Adds rustfs-io-metrics dependency for lock-held instrumentation.
crates/io-metrics/src/sampler/system.rs Simplifies system sampler surface to a process-platform snapshot helper.
crates/io-metrics/src/sampler/process.rs Adds unified resource + system process snapshot sampling using sysinfo + platform-specific fallbacks.
crates/io-metrics/src/sampler/mod.rs Adds sampler module exports for process/system snapshot APIs.
crates/io-metrics/src/process_lock_metrics.rs Adds lock-held counters + cross-platform platform stats snapshot implementation.
crates/io-metrics/src/lib.rs Re-exports new sampler/platform snapshot APIs and lock-held counters.
crates/io-metrics/Cargo.toml Adds sysinfo dependency required for process sampling.
crates/audit/src/system.rs Handles TargetError::Dropped, records per-target final failures, and adds per-target metric snapshot support.
crates/audit/src/lib.rs Re-exports AuditTargetMetricSnapshot.
crates/audit/src/global.rs Adds a global audit_target_metrics() accessor for per-target delivery metrics.
Cargo.toml Removes crates/metrics from workspace members and removes rustfs-metrics workspace dependency.
Cargo.lock Updates lockfile for dependency graph changes (including removal of rustfs-metrics).
ARCHITECTURE.md Updates documentation to reflect metrics runtime/schema live in rustfs-obs.
.github/workflows/ci.yml Adds metrics migration guard job and a metrics contract snapshot job (currently pointing at removed rustfs-metrics).
crates/metrics/src/metrics_type/logger_webhook.rs Deletes legacy metrics descriptor module from removed rustfs-metrics crate.
crates/metrics/src/global.rs Deletes legacy metrics init entrypoint from removed rustfs-metrics crate.
crates/metrics/src/collectors/logger_webhook.rs Deletes legacy webhook collector from removed rustfs-metrics crate.
crates/metrics/Cargo.toml Removes the rustfs-metrics crate package definition.

Comment thread .github/workflows/ci.yml Outdated
Comment on lines +105 to +127
  metrics-contract-snapshot:
    name: Metrics Contract Snapshot
    needs: skip-check
    if: needs.skip-check.outputs.should_skip != 'true'
    runs-on: ubicloud-standard-2
    timeout-minutes: 20
    env:
      FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Setup Rust environment
        uses: ./.github/actions/setup
        with:
          rust-version: stable
          cache-shared-key: ci-metrics-contract-${{ hashFiles('**/Cargo.lock') }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
          cache-save-if: ${{ github.ref == 'refs/heads/main' }}

      - name: Verify metrics contract snapshot
        run: cargo test -p rustfs-metrics --test metric_contract_snapshot

Contributor Author

Fixed. Removed obsolete metrics migration/snapshot jobs that referenced the deleted rustfs-metrics crate.

Comment on lines +74 to +82
fn run_command(command: &str, args: &[&str]) -> Option<String> {
    let output = Command::new(command).args(args).output().ok()?;
    if !output.status.success() {
        return None;
    }

    let stdout = String::from_utf8_lossy(&output.stdout).trim().to_string();
    if stdout.is_empty() { None } else { Some(stdout) }
}
Contributor Author

Fixed. run_command and Command import are now gated to the platforms that use them.
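As an illustration of the gating pattern, the import and the helper share the same `cfg` predicate, so neither is compiled (and neither triggers `unused_imports` or `dead_code`) on targets that never call it. The predicate below is an assumption chosen so the sketch builds on common hosts; the PR's actual platform list is not shown in this conversation.

```rust
// Assumed predicate for illustration only; substitute the targets that really call the helper.
#[cfg(any(unix, windows))]
use std::process::Command;

/// Run a command and return its trimmed stdout, or `None` on any failure or empty output.
#[cfg(any(unix, windows))]
fn run_command(command: &str, args: &[&str]) -> Option<String> {
    // A missing binary or spawn error becomes `None` via `.ok()?`.
    let output = Command::new(command).args(args).output().ok()?;
    if !output.status.success() {
        return None;
    }

    let stdout = String::from_utf8_lossy(&output.stdout).trim().to_string();
    if stdout.is_empty() { None } else { Some(stdout) }
}
```

Keeping the predicate identical on the `use` and the `fn` is the important part; if they drift apart, one of the two warnings comes back on some target.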

Comment on lines +84 to +93
#[cfg(any(target_os = "linux", target_os = "windows", test))]
fn parse_kv_u64(content: &str, key: &str) -> Option<u64> {
    for line in content.lines() {
        let (k, v) = line.split_once(':')?;
        if k.trim().eq_ignore_ascii_case(key) {
            return v.trim().parse::<u64>().ok();
        }
    }
    None
}
Contributor Author

Fixed. parse_kv_u64 now skips non-matching lines (continue) instead of returning early.
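The problem in the excerpt above is that `?` on `split_once(':')` returns `None` from the whole function at the first line without a colon, even if the key appears later. A corrected version might look like the sketch below; it is a reconstruction from the reply, not the committed code, and the `cfg` attribute is omitted so it compiles anywhere.

```rust
/// Find `key` (case-insensitive) in `key: value` lines and parse its value as u64.
fn parse_kv_u64(content: &str, key: &str) -> Option<u64> {
    for line in content.lines() {
        // Lines without a ':' separator are skipped instead of ending the search.
        let Some((k, v)) = line.split_once(':') else {
            continue;
        };
        if k.trim().eq_ignore_ascii_case(key) {
            return v.trim().parse::<u64>().ok();
        }
    }
    None
}
```

With this shape, a banner or blank line ahead of the wanted key no longer hides it, while a matching key with an unparsable value still yields `None`.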

@houseme houseme changed the title from "feat(metrics): add cross-platform process stats provider" to "refactor(obs): migrate metrics runtime/schema and tighten migration guards" Apr 18, 2026
Copilot AI review requested due to automatic review settings April 18, 2026 07:14
Copilot AI left a comment

Pull request overview

This PR completes the metrics migration by moving metrics runtime scheduling + schema/descriptor ownership into rustfs-obs, removing the legacy rustfs-metrics crate, and adding stronger migration guardrails. It also extends observability/metrics coverage for notification/audit target delivery and process/lock metrics.

Changes:

  • Migrate metrics runtime initialization from rustfs-metrics to rustfs-obs (init_metrics_runtime) and remove the crates/metrics crate from the workspace.
  • Add delivery metrics plumbing for notification/audit targets (per-target counters + snapshots) and integrate them into the rustfs-obs scheduler.
  • Add process/lock sampling utilities in rustfs-io-metrics and wire lock-held counters into fast/distributed locks.

Reviewed changes

Copilot reviewed 70 out of 98 changed files in this pull request and generated no comments.

File Description
scripts/check_metrics_migration_refs.sh Adds ripgrep-based guard to block reintroduction of legacy metrics references (with allowlist).
rustfs/src/main.rs Switches binary startup from init_metrics_system to rustfs_obs::init_metrics_runtime.
rustfs/src/config/info.rs Removes direct-io feature from reported feature list and updates tests; updates metrics-gpu dependency label.
rustfs/Cargo.toml Removes rustfs-metrics dependency and updates feature wiring (default features, metrics-gpu, full).
crates/targets/src/target/webhook.rs Adds per-target delivery counters and records success/final-failure outcomes for webhook target.
crates/targets/src/target/mqtt.rs Adds per-target delivery counters and records success/final-failure outcomes for MQTT target.
crates/targets/src/target/mod.rs Introduces TargetDeliverySnapshot/Counters and changes corrupt queued payload handling to TargetError::Dropped.
crates/targets/src/lib.rs Re-exports TargetDeliverySnapshot for downstream metric collection.
crates/targets/src/error.rs Adds TargetError::Dropped to represent permanently dropped queued payloads.
crates/obs/src/metrics/stats_collector.rs Replaces direct sysinfo sampling with rustfs-io-metrics process snapshot helpers and exposes process/system stat collectors.
crates/obs/src/metrics/schema/system_process.rs Adds system process metric descriptors (locks, CPU, fds, IO, status, GPU-related process metrics, etc.).
crates/obs/src/metrics/schema/system_network.rs Adds internode network metric descriptors.
crates/obs/src/metrics/schema/system_memory.rs Adds system memory metric descriptors.
crates/obs/src/metrics/schema/system_gpu.rs Adds GPU metric descriptor(s) behind feature use.
crates/obs/src/metrics/schema/system_drive.rs Adds drive metric descriptors and drive label constants.
crates/obs/src/metrics/schema/system_cpu.rs Adds system CPU metric descriptors.
crates/obs/src/metrics/schema/scanner.rs Adds scanner metric descriptors.
crates/obs/src/metrics/schema/request.rs Adds API request metric descriptors.
crates/obs/src/metrics/schema/replication.rs Adds replication metric descriptors.
crates/obs/src/metrics/schema/process_resource.rs Adds process resource metric descriptors (custom subsystem).
crates/obs/src/metrics/schema/notification_target.rs Adds per-notification-target delivery metric descriptors.
crates/obs/src/metrics/schema/node_disk.rs Adds node disk descriptor(s) (custom subsystem).
crates/obs/src/metrics/schema/node_bucket.rs Adds bucket usage/quota descriptor(s) (custom naming).
crates/obs/src/metrics/schema/mod.rs Registers schema modules and swaps logger webhook schema for notification target schema.
crates/obs/src/metrics/schema/ilm.rs Adds ILM metric descriptors.
crates/obs/src/metrics/schema/entry/subsystem.rs Removes legacy logger webhook subsystem and keeps canonical subsystem path->name formatting.
crates/obs/src/metrics/schema/entry/path_utils.rs Adds shared path formatting helper with unit tests.
crates/obs/src/metrics/schema/entry/namespace.rs Adds MetricNamespace for name generation.
crates/obs/src/metrics/schema/entry/mod.rs Adds descriptor factory helpers (counter/gauge/histogram) + tests.
crates/obs/src/metrics/schema/entry/metric_type.rs Adds MetricType enum used by descriptors and reporting.
crates/obs/src/metrics/schema/entry/metric_name.rs Adds notification-target metric names and removes webhook-log metric names.
crates/obs/src/metrics/schema/entry/descriptor.rs Adds MetricDescriptor (name/type/help/labels/subsystem) + tests for name generation.
crates/obs/src/metrics/schema/cluster_usage.rs Adds cluster usage metric descriptors.
crates/obs/src/metrics/schema/cluster_notification.rs Fixes notification in-progress metric to gauge and adjusts skipped-events help string.
crates/obs/src/metrics/schema/cluster_iam.rs Adds IAM metric descriptors.
crates/obs/src/metrics/schema/cluster_health.rs Adds cluster health metric descriptors.
crates/obs/src/metrics/schema/cluster_erasure_set.rs Adds erasure-set metric descriptors.
crates/obs/src/metrics/schema/cluster_config.rs Adds cluster config metric descriptors.
crates/obs/src/metrics/schema/cluster.rs Adds base cluster capacity/object/bucket descriptors (custom names).
crates/obs/src/metrics/schema/bucket_replication.rs Adds bucket replication metric descriptors.
crates/obs/src/metrics/schema/bucket.rs Adds per-bucket API metric descriptors (traffic, inflight, totals, errors, histogram).
crates/obs/src/metrics/schema/audit.rs Adds audit target delivery metric descriptors.
crates/obs/src/metrics/scheduler.rs Renames collector init to init_metrics_runtime, adds audit/notification metric tasks, and keeps a compatibility alias.
crates/obs/src/metrics/report.rs Moves reporting to use new schema types and keeps metrics crate integration for counters/gauges/histograms.
crates/obs/src/metrics/mod.rs Introduces metrics module boundary and re-exports runtime + schema/report entrypoints.
crates/obs/src/metrics/config.rs Adds env keys/default intervals for audit + notification metric scheduling.
crates/obs/src/metrics/collectors/system_process.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/system_network.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/system_memory.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/system_gpu.rs Updates docs/imports to new crate/module names and schema paths.
crates/obs/src/metrics/collectors/system_drive.rs Updates imports and descriptor typing to new schema module.
crates/obs/src/metrics/collectors/system_cpu.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/scanner.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/resource.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/request.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/replication.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/notification_target.rs Adds collector to emit per-target notification delivery metrics.
crates/obs/src/metrics/collectors/notification.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/node.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/mod.rs Makes collectors public modules, removes legacy webhook collector export, adds notification target collector export.
crates/obs/src/metrics/collectors/ilm.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/dial9.rs Updates PrometheusMetric import path.
crates/obs/src/metrics/collectors/cluster_usage.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/cluster_iam.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/cluster_health.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/cluster_erasure_set.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/cluster_config.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/cluster.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/bucket_replication.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/bucket.rs Updates imports to new schema/report module paths.
crates/obs/src/metrics/collectors/audit.rs Updates imports to new schema/report module paths.
crates/obs/src/lib.rs Publicly exposes metrics module, re-exports schema, and updates docs to reference init_metrics_runtime.
crates/obs/src/global.rs Updates comments to indicate metrics runtime scheduling is exposed via rustfs_obs.
crates/obs/Cargo.toml Adds dependencies needed for migrated metrics runtime/collectors (audit/notify/ecstore/io-metrics/sysinfo/tokio-util) and defines gpu feature.
crates/notify/src/stream.rs Handles TargetError::Dropped explicitly and records per-target final failures via Target::record_final_failure.
crates/notify/src/notifier.rs Threads NotificationMetrics into notifier, counts skipped events, and refines processed/failed accounting for deferred vs direct delivery.
crates/notify/src/lib.rs Re-exports global snapshot helpers and metric snapshot types.
crates/notify/src/integration.rs Adds snapshot types, adds skipped counter + snapshot helpers, and adds per-target delivery snapshot collection.
crates/notify/src/global.rs Adds global functions to snapshot aggregate and per-target notification delivery metrics.
crates/metrics/src/metrics_type/logger_webhook.rs Deletes legacy rustfs-metrics schema for webhook logs.
crates/metrics/src/global.rs Deletes legacy rustfs-metrics init entrypoint.
crates/metrics/src/collectors/logger_webhook.rs Deletes legacy rustfs-metrics webhook collector.
crates/metrics/Cargo.toml Removes legacy rustfs-metrics crate manifest.
crates/lock/src/fast_lock/guard.rs Records lock-held acquire/release counters via rustfs-io-metrics when acquiring/releasing fast locks.
crates/lock/src/distributed_lock.rs Records lock-held acquire/release counters and ensures disarm() decrements held-lock counters.
crates/lock/Cargo.toml Adds dependency on rustfs-io-metrics for lock-held counters.
crates/io-metrics/src/sampler/system.rs Refactors sampler module API to expose platform snapshot helper.
crates/io-metrics/src/sampler/process.rs Adds sysinfo-based process resource/system snapshot collection (including lock-held counters).
crates/io-metrics/src/sampler/mod.rs Adds sampler module exports for process + platform snapshots.
crates/io-metrics/src/process_lock_metrics.rs Adds lock-held counters and platform-specific process I/O/syscall/VM max sampling.
crates/io-metrics/src/lib.rs Re-exports new process lock/platform snapshot and sampler APIs.
crates/io-metrics/Cargo.toml Adds sysinfo dependency to support new samplers.
crates/audit/src/system.rs Adds per-target audit metric snapshot types and handles dropped payloads + records final failures.
crates/audit/src/lib.rs Re-exports AuditTargetMetricSnapshot.
crates/audit/src/global.rs Adds global async API to fetch audit target metrics for Prometheus collection.
Cargo.toml Removes crates/metrics from workspace members and removes rustfs-metrics workspace dependency entry.
Cargo.lock Updates lockfile for removed crate and dependency graph changes (including sysinfo + other transitive updates).
ARCHITECTURE.md Updates architecture docs to point metrics runtime/schema ownership to rustfs-obs.
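
The schema rows above describe a `MetricDescriptor` carrying name/type/help/labels/subsystem, factory helpers for counter/gauge/histogram, and subsystem path-to-name formatting. A minimal sketch of how those pieces could fit together follows. The field layout, the helper names (`new_counter`, `new_gauge`, `new_histogram`, `full_name`), and the underscore-joined naming scheme are all assumptions made for illustration; the actual rustfs-obs definitions may differ.

```rust
/// Kind of metric a descriptor declares (hypothetical mirror of `MetricType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetricType {
    Counter,
    Gauge,
    Histogram,
}

/// Hypothetical descriptor with the fields named in the summary:
/// name, type, help, labels, and subsystem.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MetricDescriptor {
    pub name: String,
    pub metric_type: MetricType,
    pub help: String,
    pub labels: Vec<String>,
    pub subsystem: String,
}

impl MetricDescriptor {
    fn new(metric_type: MetricType, subsystem: &str, name: &str, help: &str, labels: &[&str]) -> Self {
        Self {
            name: name.to_string(),
            metric_type,
            help: help.to_string(),
            labels: labels.iter().map(|l| l.to_string()).collect(),
            subsystem: subsystem.to_string(),
        }
    }

    /// Assumed naming scheme: `<namespace>_<subsystem path with '/' -> '_'>_<name>`,
    /// skipping any empty component.
    pub fn full_name(&self, namespace: &str) -> String {
        let subsystem = self.subsystem.trim_matches('/').replace('/', "_");
        [namespace, subsystem.as_str(), self.name.as_str()]
            .iter()
            .copied()
            .filter(|part| !part.is_empty())
            .collect::<Vec<&str>>()
            .join("_")
    }
}

/// Factory helper for monotonically increasing values.
pub fn new_counter(subsystem: &str, name: &str, help: &str, labels: &[&str]) -> MetricDescriptor {
    MetricDescriptor::new(MetricType::Counter, subsystem, name, help, labels)
}

/// Factory helper for values that can go up and down (e.g. in-progress work).
pub fn new_gauge(subsystem: &str, name: &str, help: &str, labels: &[&str]) -> MetricDescriptor {
    MetricDescriptor::new(MetricType::Gauge, subsystem, name, help, labels)
}

/// Factory helper for distributions such as request latency.
pub fn new_histogram(subsystem: &str, name: &str, help: &str, labels: &[&str]) -> MetricDescriptor {
    MetricDescriptor::new(MetricType::Histogram, subsystem, name, help, labels)
}
```

Under these assumptions, an in-progress metric would be declared with `new_gauge` rather than `new_counter`, since the value can decrease; this is the kind of correction the cluster_notification.rs row mentions.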

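The crates/lock rows note that lock guards record held-lock acquire/release counters and that `disarm()` now decrements the held-lock counter. The sketch below shows one way such a guard could keep the counter balanced on both paths. `HeldLockGuard`, its fields, and the shared `AtomicI64` counter are hypothetical stand-ins, not the actual `DistributedLockGuard` or rustfs-io-metrics API.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Hypothetical guard that tracks how many locks are currently held.
pub struct HeldLockGuard {
    /// Shared "locks currently held" counter (stand-in for a metrics gauge).
    held: Arc<AtomicI64>,
    /// When true, dropping the guard would also release the underlying lock.
    armed: bool,
    /// True while this guard still contributes +1 to `held`.
    counted: bool,
}

impl HeldLockGuard {
    /// Records the acquisition and returns an armed guard.
    pub fn acquire(held: Arc<AtomicI64>) -> Self {
        held.fetch_add(1, Ordering::Relaxed);
        Self { held, armed: true, counted: true }
    }

    /// Disarms the guard so that drop no longer releases the lock.
    /// The counter is released here; otherwise a disarmed guard would
    /// leave the held-lock count permanently inflated.
    pub fn disarm(&mut self) {
        self.armed = false;
        self.release_counter();
    }

    pub fn is_armed(&self) -> bool {
        self.armed
    }

    /// Decrements the counter at most once per guard.
    fn release_counter(&mut self) {
        if self.counted {
            self.counted = false;
            self.held.fetch_sub(1, Ordering::Relaxed);
        }
    }
}

impl Drop for HeldLockGuard {
    fn drop(&mut self) {
        if self.armed {
            // A real guard would release the underlying lock here.
        }
        // Safe after disarm(): `counted` prevents a double decrement.
        self.release_counter();
    }
}
```

The `counted` flag is the design choice worth noting: it lets both `disarm()` and `Drop` call the same release path without decrementing twice.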
@houseme houseme added this pull request to the merge queue Apr 18, 2026
Merged via the queue into main with commit 1cbf156 Apr 18, 2026
11 checks passed
@houseme houseme deleted the refactor/improve-metrics-system branch April 18, 2026 08:09

2 participants