refactor(obs): migrate metrics runtime/schema and tighten migration guards#2584
CLA requirements are satisfied for this pull request.
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.
Pull request overview
This PR expands RustFS observability by adding cross-platform process/lock statistics gathering and introducing per-target delivery metrics for audit/notification targets, then wiring those into the metrics collectors.
Changes:
- Add a cross-platform process metrics provider (platform snapshots + lock-held counters) and collect process resource + process stats in a single sysinfo refresh.
- Add per-target delivery counters/snapshots to notification targets (webhook/mqtt) and propagate “final failure” signals through notification/audit pipelines.
- Add Prometheus metric descriptors + collectors for notification target metrics and schedule periodic audit/notification metrics collection tasks.
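A minimal sketch of how per-target delivery counters and their snapshots could fit together. `TargetDeliverySnapshot` and the `failed_messages` field are named in this PR; the counter layout, the `successful_messages` field, and the method names below are assumptions for illustration, not the crate's actual API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Point-in-time copy of a target's delivery counters (field set is assumed).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct TargetDeliverySnapshot {
    pub successful_messages: u64,
    pub failed_messages: u64,
}

/// Lock-free counters that a target updates as it delivers (hypothetical layout).
#[derive(Debug, Default)]
pub struct TargetDeliveryCounters {
    successful: AtomicU64,
    failed: AtomicU64,
}

impl TargetDeliveryCounters {
    /// Count one delivered message.
    pub fn record_success(&self) {
        self.successful.fetch_add(1, Ordering::Relaxed);
    }

    /// Count one message that will not be retried again.
    pub fn record_final_failure(&self) {
        self.failed.fetch_add(1, Ordering::Relaxed);
    }

    /// Copy the current values so a collector can render them later.
    pub fn snapshot(&self) -> TargetDeliverySnapshot {
        TargetDeliverySnapshot {
            successful_messages: self.successful.load(Ordering::Relaxed),
            failed_messages: self.failed.load(Ordering::Relaxed),
        }
    }
}
```

Sharing the counters behind an `Arc` would let the delivery path and the metrics collector read the same values without a lock; relaxed ordering is enough here because each counter is independent.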
Reviewed changes
Copilot reviewed 31 out of 32 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/targets/src/target/webhook.rs | Add shared delivery counters and expose delivery snapshots/failure recording. |
| crates/targets/src/target/mqtt.rs | Add shared delivery counters and expose delivery snapshots/failure recording. |
| crates/targets/src/target/mod.rs | Introduce delivery snapshot/counters types; add Dropped handling + trait hooks for metrics. |
| crates/targets/src/lib.rs | Re-export TargetDeliverySnapshot for downstream metrics usage. |
| crates/targets/src/error.rs | Add TargetError::Dropped for invalid queued payloads. |
| crates/notify/src/stream.rs | Treat dropped payloads as non-retryable; record final failures for permanent/max-retry cases. |
| crates/notify/src/notifier.rs | Thread shared NotificationMetrics through notifier and record skipped/processed/failed counters. |
| crates/notify/src/lib.rs | Export new global snapshot APIs + metric snapshot types. |
| crates/notify/src/integration.rs | Add skipped counters + snapshot structs; add per-target metrics snapshot API. |
| crates/notify/src/global.rs | Provide global snapshot functions for aggregate + per-target notification metrics. |
| crates/metrics/src/metrics_type/notification_target.rs | Define metric descriptors for per-target notification delivery metrics. |
| crates/metrics/src/metrics_type/mod.rs | Register notification_target metrics type module. |
| crates/metrics/src/metrics_type/logger_webhook.rs | Remove legacy logger webhook metric descriptors. |
| crates/metrics/src/metrics_type/entry/subsystem.rs | Remove LoggerWebhook subsystem entries. |
| crates/metrics/src/metrics_type/entry/metric_name.rs | Add notification-target metric names; remove legacy webhook metric names. |
| crates/metrics/src/metrics_type/cluster_notification.rs | Fix current-send-in-progress descriptor to be a gauge; update skipped description text. |
| crates/metrics/src/constants/mod.rs | Add env/config defaults for audit + notification metrics intervals. |
| crates/metrics/src/collectors/stats_collector.rs | Collect resource + process stats together; incorporate platform + lock snapshots. |
| crates/metrics/src/collectors/notification_target.rs | New collector to render per-target notification metrics to Prometheus output. |
| crates/metrics/src/collectors/mod.rs | Register notification-target collector and remove logger webhook collector exports. |
| crates/metrics/src/collectors/logger_webhook.rs | Remove legacy webhook metrics collector. |
| crates/metrics/src/collectors/global.rs | Schedule periodic audit + notification delivery metrics collection; emit new process metrics. |
| crates/metrics/Cargo.toml | Add deps on audit/notify/io-metrics for new collectors and process stats. |
| crates/lock/src/fast_lock/guard.rs | Record lock-held acquire/release events into io-metrics. |
| crates/lock/src/distributed_lock.rs | Record lock-held acquire/release events for distributed locks. |
| crates/lock/Cargo.toml | Add dependency on rustfs-io-metrics. |
| crates/io-metrics/src/process_lock_metrics.rs | New cross-platform platform snapshot + lock-held counters implementation. |
| crates/io-metrics/src/lib.rs | Export process lock/platform snapshot APIs. |
| crates/audit/src/system.rs | Add dropped handling + record final failures; add per-target audit delivery snapshot API/type. |
| crates/audit/src/lib.rs | Re-export AuditTargetMetricSnapshot. |
| crates/audit/src/global.rs | Provide global async snapshot API for audit target metrics. |
| Cargo.lock | Lockfile updates for new intra-workspace dependencies. |

    let (k, v) = line.split_once(':')?;
    if k.trim().eq_ignore_ascii_case(key) {
        return v.trim().parse::<u64>().ok();


    pub(crate) fn new(lock_id: LockId, entries: Vec<(LockId, Arc<dyn LockClient>)>, lock_type: LockType) -> Self {
        record_lock_held_acquire(lock_type);
        Self {
            lock_id,
            entries,

Fixed. disarm() now records a held-lock release once when transitioning to disarmed.
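A minimal sketch of the "release once on disarm" behaviour described in this reply. The `Guard` type, the `LOCKS_HELD` counter, and the `locks_held` helper are hypothetical stand-ins for `DistributedLockGuard` and the rustfs-io-metrics counters, and it assumes that `disarm()` suppresses the normal drop-time release.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Process-wide count of currently held locks (stand-in for the real counter).
static LOCKS_HELD: AtomicI64 = AtomicI64::new(0);

pub struct Guard {
    armed: bool,
}

impl Guard {
    /// Creating a guard counts as acquiring one lock.
    pub fn new() -> Self {
        LOCKS_HELD.fetch_add(1, Ordering::Relaxed);
        Self { armed: true }
    }

    /// Stop the guard from releasing on drop. The held-lock count is
    /// decremented only on the armed -> disarmed transition, so repeated
    /// calls have no further effect.
    pub fn disarm(&mut self) {
        if std::mem::replace(&mut self.armed, false) {
            LOCKS_HELD.fetch_sub(1, Ordering::Relaxed);
        }
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        // A disarmed guard has already reported its release.
        if self.armed {
            LOCKS_HELD.fetch_sub(1, Ordering::Relaxed);
        }
    }
}

/// Current number of held locks.
pub fn locks_held() -> i64 {
    LOCKS_HELD.load(Ordering::Relaxed)
}
```

Without the decrement in `disarm()`, every disarmed guard would leave the gauge one too high, which is the leak this review thread points at.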

    // Spawn task for audit target delivery metrics
    let token_clone = token.clone();
    tokio::spawn(async move {
        let mut interval = tokio::time::interval(audit_interval);
        loop {
            tokio::select! {
                _ = interval.tick() => {
                    let stats = audit_target_metrics().await
                        .into_iter()
                        .map(|snapshot| AuditTargetStats {
                            failed_messages: snapshot.failed_messages,

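The excerpt above follows a common shape: sample on every interval tick until a cancellation signal arrives. The sketch below shows the same control flow with only the standard library, using a thread and a channel in place of `tokio::spawn`, `tokio::time::interval`, and the cancellation token. `spawn_periodic` is a hypothetical helper, not part of this PR, and unlike a tokio interval it waits one full period before the first sample.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Runs `collect` every `interval` until a stop message is sent
/// or the returned sender is dropped.
pub fn spawn_periodic<F>(interval: Duration, mut collect: F) -> (Sender<()>, JoinHandle<()>)
where
    F: FnMut() + Send + 'static,
{
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || loop {
        match stop_rx.recv_timeout(interval) {
            // The interval elapsed with no stop signal: take one sample.
            Err(RecvTimeoutError::Timeout) => collect(),
            // Explicit stop, or the controlling side went away.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
        }
    });
    (stop_tx, handle)
}
```

Waiting on the stop channel with a timeout plays the role of the two `select!` branches: whichever happens first, the tick or the cancellation, decides the next step.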
Pull request overview
This PR migrates the Prometheus-style metrics runtime and schema into rustfs-obs, adds a cross-platform process stats sampling provider in rustfs-io-metrics, and wires per-target delivery counters into notification/audit targets for richer observability.
Changes:
- Move metrics scheduling/reporting and metric descriptors into crates/obs/src/metrics/* and update rustfs to initialize metrics via rustfs_obs::init_metrics_runtime.
- Add per-target delivery counters/snapshots for MQTT/Webhook targets and plumb final-failure recording through notify/audit processing loops.
- Introduce cross-platform process/system snapshot collection in rustfs-io-metrics and instrument lock guards to track locks held.
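A minimal sketch of the final-failure rule these changes describe: a dropped payload is never retried, and a failure is recorded once either that happens or the retry budget runs out. Only `TargetError::Dropped` is named in this PR; the `Temporary` variant, the `Outcome` type, and the `deliver` helper are assumptions made for illustration.

```rust
/// Simplified stand-in for the PR's `TargetError`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TargetError {
    /// Transient problem (hypothetical variant): worth retrying.
    Temporary(String),
    /// The queued payload is invalid and was discarded: retrying cannot help.
    Dropped(String),
}

/// What the processing loop would report to the per-target counters.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Delivered { attempts: u32 },
    FinalFailure { attempts: u32 },
}

/// Call `send` up to `max_attempts` times, stopping early on `Dropped`.
pub fn deliver<F>(max_attempts: u32, mut send: F) -> Outcome
where
    F: FnMut(u32) -> Result<(), TargetError>,
{
    for attempt in 1..=max_attempts {
        match send(attempt) {
            Ok(()) => return Outcome::Delivered { attempts: attempt },
            // Non-retryable: report the failure immediately.
            Err(TargetError::Dropped(_)) => return Outcome::FinalFailure { attempts: attempt },
            // Retryable: fall through to the next attempt.
            Err(TargetError::Temporary(_)) => continue,
        }
    }
    // Retry budget exhausted.
    Outcome::FinalFailure { attempts: max_attempts }
}
```

Keeping the classification in one place means the "final failure" counter is incremented once per message, not once per failed attempt.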
Reviewed changes
Copilot reviewed 71 out of 99 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/check_metrics_migration_refs.sh | Adds a ripgrep-based guard script to prevent reintroducing legacy metrics references during migration. |
| rustfs/src/main.rs | Switches metrics initialization to rustfs_obs::init_metrics_runtime. |
| rustfs/src/config/info.rs | Updates feature dependency string from rustfs-metrics/gpu to rustfs-obs/gpu. |
| rustfs/Cargo.toml | Removes rustfs-metrics dependency; routes metrics-gpu feature to rustfs-obs/gpu. |
| crates/targets/src/target/webhook.rs | Adds delivery counters and exposes delivery snapshots/final-failure recording for webhook targets. |
| crates/targets/src/target/mqtt.rs | Adds delivery counters and exposes delivery snapshots/final-failure recording for MQTT targets. |
| crates/targets/src/target/mod.rs | Introduces delivery snapshot/counter types and propagates “dropped payload” as an error with metrics hooks. |
| crates/targets/src/lib.rs | Re-exports TargetDeliverySnapshot for downstream metrics collection. |
| crates/targets/src/error.rs | Adds TargetError::Dropped for invalid queued payloads. |
| crates/obs/src/metrics/stats_collector.rs | Switches process stats collection to rustfs-io-metrics snapshots and splits resource vs system stats. |
| crates/obs/src/metrics/schema/system_process.rs | Adds metric descriptors for process/system metrics under the obs schema. |
| crates/obs/src/metrics/schema/system_network.rs | Adds metric descriptors for internode and process network metrics. |
| crates/obs/src/metrics/schema/system_memory.rs | Adds metric descriptors for system memory metrics. |
| crates/obs/src/metrics/schema/system_gpu.rs | Adds GPU metric descriptor(s) behind the obs gpu feature. |
| crates/obs/src/metrics/schema/system_drive.rs | Adds system drive metric descriptors/labels used by drive collectors. |
| crates/obs/src/metrics/schema/system_cpu.rs | Adds system CPU metric descriptors. |
| crates/obs/src/metrics/schema/scanner.rs | Adds scanner metric descriptors. |
| crates/obs/src/metrics/schema/request.rs | Adds API request metric descriptors. |
| crates/obs/src/metrics/schema/replication.rs | Adds replication metric descriptors. |
| crates/obs/src/metrics/schema/process_resource.rs | Adds resource (cpu/memory/uptime) metric descriptors under /process. |
| crates/obs/src/metrics/schema/notification_target.rs | Adds per-notification-target delivery metric descriptors. |
| crates/obs/src/metrics/schema/node_disk.rs | Adds node disk metric descriptors. |
| crates/obs/src/metrics/schema/node_bucket.rs | Adds node/bucket quota/usage metric descriptors. |
| crates/obs/src/metrics/schema/mod.rs | Wires new schema modules and removes the legacy logger_webhook schema module. |
| crates/obs/src/metrics/schema/ilm.rs | Adds ILM metric descriptors. |
| crates/obs/src/metrics/schema/entry/subsystem.rs | Removes the LoggerWebhook subsystem and keeps subsystem path mappings consistent. |
| crates/obs/src/metrics/schema/entry/path_utils.rs | Adds helper to format metric subsystem paths and includes unit tests. |
| crates/obs/src/metrics/schema/entry/namespace.rs | Adds MetricNamespace type used for full metric name generation. |
| crates/obs/src/metrics/schema/entry/mod.rs | Adds descriptor factory helpers (counter/gauge/histogram) and unit tests. |
| crates/obs/src/metrics/schema/entry/metric_type.rs | Adds MetricType enum for schema descriptors. |
| crates/obs/src/metrics/schema/entry/metric_name.rs | Extends metric name enum to include notification target delivery metrics; removes webhook log metric names. |
| crates/obs/src/metrics/schema/entry/descriptor.rs | Adds MetricDescriptor implementation + tests for full metric name conventions. |
| crates/obs/src/metrics/schema/cluster_usage.rs | Adds cluster usage metric descriptors. |
| crates/obs/src/metrics/schema/cluster_notification.rs | Adjusts notification metrics descriptor types/help text (e.g., in-progress is a gauge). |
| crates/obs/src/metrics/schema/cluster_iam.rs | Adds IAM metric descriptors. |
| crates/obs/src/metrics/schema/cluster_health.rs | Adds cluster health metric descriptors. |
| crates/obs/src/metrics/schema/cluster_erasure_set.rs | Adds erasure-set metric descriptors. |
| crates/obs/src/metrics/schema/cluster_config.rs | Adds cluster config metric descriptors. |
| crates/obs/src/metrics/schema/cluster.rs | Adds base cluster capacity/object metric descriptors. |
| crates/obs/src/metrics/schema/bucket_replication.rs | Adds bucket replication metric descriptors. |
| crates/obs/src/metrics/schema/bucket.rs | Adds bucket API metric descriptors. |
| crates/obs/src/metrics/schema/audit.rs | Adds audit delivery metric descriptors. |
| crates/obs/src/metrics/scheduler.rs | Renames/extends runtime scheduling, adds audit/notification collection tasks, and keeps a compatibility alias. |
| crates/obs/src/metrics/report.rs | Refactors reporting to use the new schema module types. |
| crates/obs/src/metrics/mod.rs | Introduces the metrics module tree and re-exports runtime/schema/report APIs. |
| crates/obs/src/metrics/config.rs | Adds audit/notification interval env vars + defaults. |
| crates/obs/src/metrics/collectors/system_process.rs | Updates imports to new schema/report paths for process collectors. |
| crates/obs/src/metrics/collectors/system_network.rs | Updates imports to new schema/report paths for network collectors. |
| crates/obs/src/metrics/collectors/system_memory.rs | Updates imports to new schema/report paths for memory collectors. |
| crates/obs/src/metrics/collectors/system_gpu.rs | Updates docs/imports for GPU collectors under rustfs_obs::metrics. |
| crates/obs/src/metrics/collectors/system_drive.rs | Updates imports to new schema/report paths for drive collectors. |
| crates/obs/src/metrics/collectors/system_cpu.rs | Updates imports to new schema/report paths for CPU collectors. |
| crates/obs/src/metrics/collectors/scanner.rs | Updates imports to new schema/report paths for scanner collectors. |
| crates/obs/src/metrics/collectors/resource.rs | Updates imports to new schema/report paths for resource collectors. |
| crates/obs/src/metrics/collectors/request.rs | Updates imports to new schema/report paths for request collectors. |
| crates/obs/src/metrics/collectors/replication.rs | Updates imports to new schema/report paths for replication collectors. |
| crates/obs/src/metrics/collectors/notification_target.rs | Adds collector for per-target notification delivery metrics (+ unit test). |
| crates/obs/src/metrics/collectors/notification.rs | Updates imports to new schema/report paths for notification collectors. |
| crates/obs/src/metrics/collectors/node.rs | Updates imports to new schema/report paths for node collectors. |
| crates/obs/src/metrics/collectors/mod.rs | Restructures collectors module visibility/exports and adds notification target collector exports. |
| crates/obs/src/metrics/collectors/ilm.rs | Updates imports to new schema/report paths for ILM collector. |
| crates/obs/src/metrics/collectors/dial9.rs | Updates PrometheusMetric import path for dial9 collector. |
| crates/obs/src/metrics/collectors/cluster_usage.rs | Updates imports to new schema/report paths for cluster usage collector. |
| crates/obs/src/metrics/collectors/cluster_iam.rs | Updates imports to new schema/report paths for IAM collector. |
| crates/obs/src/metrics/collectors/cluster_health.rs | Updates imports to new schema/report paths for cluster health collector. |
| crates/obs/src/metrics/collectors/cluster_erasure_set.rs | Updates imports to new schema/report paths for erasure-set collector. |
| crates/obs/src/metrics/collectors/cluster_config.rs | Updates imports to new schema/report paths for cluster config collector. |
| crates/obs/src/metrics/collectors/cluster.rs | Updates imports to new schema/report paths for cluster collector. |
| crates/obs/src/metrics/collectors/bucket_replication.rs | Updates imports to new schema/report paths for bucket replication collector. |
| crates/obs/src/metrics/collectors/bucket.rs | Updates imports to new schema/report paths for bucket collector. |
| crates/obs/src/metrics/collectors/audit.rs | Updates imports to new schema/report paths for audit collector. |
| crates/obs/src/lib.rs | Exposes pub mod metrics, re-exports schema/runtime APIs, and updates docs to reference init_metrics_runtime. |
| crates/obs/src/global.rs | Updates comment to reflect metrics runtime is owned by rustfs_obs. |
| crates/obs/Cargo.toml | Adds dependencies needed for metrics runtime (audit/notify/ecstore/io-metrics/sysinfo) and introduces gpu feature. |
| crates/notify/src/stream.rs | Handles TargetError::Dropped and records per-target final failures. |
| crates/notify/src/notifier.rs | Plumbs NotificationMetrics into EventNotifier and increments skipped/processing/failed counters. |
| crates/notify/src/lib.rs | Exposes notification metric snapshot APIs from the global module. |
| crates/notify/src/integration.rs | Adds snapshot structs and snapshot methods for aggregate/per-target notification delivery metrics. |
| crates/notify/src/global.rs | Adds global snapshot helpers for notification metrics and per-target delivery metrics. |
| crates/lock/src/fast_lock/guard.rs | Instruments fast lock guard acquisition/release using rustfs-io-metrics lock-held counters. |
| crates/lock/src/distributed_lock.rs | Instruments distributed lock guard acquisition/release using rustfs-io-metrics lock-held counters. |
| crates/lock/Cargo.toml | Adds rustfs-io-metrics dependency for lock-held instrumentation. |
| crates/io-metrics/src/sampler/system.rs | Simplifies system sampler surface to a process-platform snapshot helper. |
| crates/io-metrics/src/sampler/process.rs | Adds unified resource + system process snapshot sampling using sysinfo + platform-specific fallbacks. |
| crates/io-metrics/src/sampler/mod.rs | Adds sampler module exports for process/system snapshot APIs. |
| crates/io-metrics/src/process_lock_metrics.rs | Adds lock-held counters + cross-platform platform stats snapshot implementation. |
| crates/io-metrics/src/lib.rs | Re-exports new sampler/platform snapshot APIs and lock-held counters. |
| crates/io-metrics/Cargo.toml | Adds sysinfo dependency required for process sampling. |
| crates/audit/src/system.rs | Handles TargetError::Dropped, records per-target final failures, and adds per-target metric snapshot support. |
| crates/audit/src/lib.rs | Re-exports AuditTargetMetricSnapshot. |
| crates/audit/src/global.rs | Adds a global audit_target_metrics() accessor for per-target delivery metrics. |
| Cargo.toml | Removes crates/metrics from workspace members and removes rustfs-metrics workspace dependency. |
| Cargo.lock | Updates lockfile for dependency graph changes (including removal of rustfs-metrics). |
| ARCHITECTURE.md | Updates documentation to reflect metrics runtime/schema live in rustfs-obs. |
| .github/workflows/ci.yml | Adds metrics migration guard job and a metrics contract snapshot job (currently pointing at removed rustfs-metrics). |
| crates/metrics/src/metrics_type/logger_webhook.rs | Deletes legacy metrics descriptor module from removed rustfs-metrics crate. |
| crates/metrics/src/global.rs | Deletes legacy metrics init entrypoint from removed rustfs-metrics crate. |
| crates/metrics/src/collectors/logger_webhook.rs | Deletes legacy webhook collector from removed rustfs-metrics crate. |
| crates/metrics/Cargo.toml | Removes the rustfs-metrics crate package definition. |

    metrics-contract-snapshot:
      name: Metrics Contract Snapshot
      needs: skip-check
      if: needs.skip-check.outputs.should_skip != 'true'
      runs-on: ubicloud-standard-2
      timeout-minutes: 20
      env:
        FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
      steps:
        - name: Checkout repository
          uses: actions/checkout@v6

        - name: Setup Rust environment
          uses: ./.github/actions/setup
          with:
            rust-version: stable
            cache-shared-key: ci-metrics-contract-${{ hashFiles('**/Cargo.lock') }}
            github-token: ${{ secrets.GITHUB_TOKEN }}
            cache-save-if: ${{ github.ref == 'refs/heads/main' }}

        - name: Verify metrics contract snapshot
          run: cargo test -p rustfs-metrics --test metric_contract_snapshot

Fixed. Removed obsolete metrics migration/snapshot jobs that referenced the deleted rustfs-metrics crate.

    fn run_command(command: &str, args: &[&str]) -> Option<String> {
        let output = Command::new(command).args(args).output().ok()?;
        if !output.status.success() {
            return None;
        }

        let stdout = String::from_utf8_lossy(&output.stdout).trim().to_string();
        if stdout.is_empty() { None } else { Some(stdout) }
    }

Fixed. run_command and Command import are now gated to the platforms that use them.
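A minimal sketch of the gating this reply describes: the helper and its import are compiled only where something calls them, so other targets see no unused code. The `cfg(unix)` condition and the `kernel_name` caller are assumptions chosen for illustration; the PR does not state which platforms the real gate covers.

```rust
// The import is gated together with its only user.
#[cfg(unix)]
use std::process::Command;

/// Run a command and return its trimmed stdout, or `None` on failure or empty output.
#[cfg(unix)]
fn run_command(command: &str, args: &[&str]) -> Option<String> {
    let output = Command::new(command).args(args).output().ok()?;
    if !output.status.success() {
        return None;
    }
    let stdout = String::from_utf8_lossy(&output.stdout).trim().to_string();
    if stdout.is_empty() { None } else { Some(stdout) }
}

/// Hypothetical caller: shells out where the helper exists.
#[cfg(unix)]
pub fn kernel_name() -> Option<String> {
    run_command("uname", &["-s"])
}

/// On other targets neither `run_command` nor `Command` is compiled at all.
#[cfg(not(unix))]
pub fn kernel_name() -> Option<String> {
    None
}
```

Giving every target its own definition of the public function keeps call sites free of `cfg` attributes while the private helper stays platform-specific.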

    #[cfg(any(target_os = "linux", target_os = "windows", test))]
    fn parse_kv_u64(content: &str, key: &str) -> Option<u64> {
        for line in content.lines() {
            let (k, v) = line.split_once(':')?;
            if k.trim().eq_ignore_ascii_case(key) {
                return v.trim().parse::<u64>().ok();
            }
        }
        None
    }

Fixed. parse_kv_u64 now skips non-matching lines (continue) instead of returning early.
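A sketch of what the corrected loop might look like, based only on the description in this reply. In the reviewed version, `?` on `split_once` returned `None` from the whole function at the first line without a colon; here such lines are skipped. The `cfg` attribute is omitted for brevity, and the exact form of the merged fix is an assumption.

```rust
/// Finds `key` in `key: value` text and parses its value as `u64`.
fn parse_kv_u64(content: &str, key: &str) -> Option<u64> {
    for line in content.lines() {
        // Lines without a ':' are skipped instead of ending the search.
        let Some((k, v)) = line.split_once(':') else {
            continue;
        };
        if k.trim().eq_ignore_ascii_case(key) {
            return v.trim().parse::<u64>().ok();
        }
    }
    None
}
```

A matching key whose value does not parse as a number still yields `None`, which preserves the original behaviour for that case.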
Pull request overview
This PR completes the metrics migration by moving metrics runtime scheduling + schema/descriptor ownership into rustfs-obs, removing the legacy rustfs-metrics crate, and adding stronger migration guardrails. It also extends observability/metrics coverage for notification/audit target delivery and process/lock metrics.
Changes:
- Migrate metrics runtime initialization from rustfs-metrics to rustfs-obs (init_metrics_runtime) and remove the crates/metrics crate from the workspace.
- Add delivery metrics plumbing for notification/audit targets (per-target counters + snapshots) and integrate them into the rustfs-obs scheduler.
- Add process/lock sampling utilities in rustfs-io-metrics and wire lock-held counters into fast/distributed locks.
Reviewed changes
Copilot reviewed 70 out of 98 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/check_metrics_migration_refs.sh | Adds ripgrep-based guard to block reintroduction of legacy metrics references (with allowlist). |
| rustfs/src/main.rs | Switches binary startup from init_metrics_system to rustfs_obs::init_metrics_runtime. |
| rustfs/src/config/info.rs | Removes direct-io feature from reported feature list and updates tests; updates metrics-gpu dependency label. |
| rustfs/Cargo.toml | Removes rustfs-metrics dependency and updates feature wiring (default features, metrics-gpu, full). |
| crates/targets/src/target/webhook.rs | Adds per-target delivery counters and records success/final-failure outcomes for webhook target. |
| crates/targets/src/target/mqtt.rs | Adds per-target delivery counters and records success/final-failure outcomes for MQTT target. |
| crates/targets/src/target/mod.rs | Introduces TargetDeliverySnapshot/Counters and changes corrupt queued payload handling to TargetError::Dropped. |
| crates/targets/src/lib.rs | Re-exports TargetDeliverySnapshot for downstream metric collection. |
| crates/targets/src/error.rs | Adds TargetError::Dropped to represent permanently dropped queued payloads. |
| crates/obs/src/metrics/stats_collector.rs | Replaces direct sysinfo sampling with rustfs-io-metrics process snapshot helpers and exposes process/system stat collectors. |
| crates/obs/src/metrics/schema/system_process.rs | Adds system process metric descriptors (locks, CPU, fds, IO, status, GPU-related process metrics, etc.). |
| crates/obs/src/metrics/schema/system_network.rs | Adds internode network metric descriptors. |
| crates/obs/src/metrics/schema/system_memory.rs | Adds system memory metric descriptors. |
| crates/obs/src/metrics/schema/system_gpu.rs | Adds GPU metric descriptor(s) behind feature use. |
| crates/obs/src/metrics/schema/system_drive.rs | Adds drive metric descriptors and drive label constants. |
| crates/obs/src/metrics/schema/system_cpu.rs | Adds system CPU metric descriptors. |
| crates/obs/src/metrics/schema/scanner.rs | Adds scanner metric descriptors. |
| crates/obs/src/metrics/schema/request.rs | Adds API request metric descriptors. |
| crates/obs/src/metrics/schema/replication.rs | Adds replication metric descriptors. |
| crates/obs/src/metrics/schema/process_resource.rs | Adds process resource metric descriptors (custom subsystem). |
| crates/obs/src/metrics/schema/notification_target.rs | Adds per-notification-target delivery metric descriptors. |
| crates/obs/src/metrics/schema/node_disk.rs | Adds node disk descriptor(s) (custom subsystem). |
| crates/obs/src/metrics/schema/node_bucket.rs | Adds bucket usage/quota descriptor(s) (custom naming). |
| crates/obs/src/metrics/schema/mod.rs | Registers schema modules and swaps logger webhook schema for notification target schema. |
| crates/obs/src/metrics/schema/ilm.rs | Adds ILM metric descriptors. |
| crates/obs/src/metrics/schema/entry/subsystem.rs | Removes legacy logger webhook subsystem and keeps canonical subsystem path->name formatting. |
| crates/obs/src/metrics/schema/entry/path_utils.rs | Adds shared path formatting helper with unit tests. |
| crates/obs/src/metrics/schema/entry/namespace.rs | Adds MetricNamespace for name generation. |
| crates/obs/src/metrics/schema/entry/mod.rs | Adds descriptor factory helpers (counter/gauge/histogram) + tests. |
| crates/obs/src/metrics/schema/entry/metric_type.rs | Adds MetricType enum used by descriptors and reporting. |
| crates/obs/src/metrics/schema/entry/metric_name.rs | Adds notification-target metric names and removes webhook-log metric names. |
| crates/obs/src/metrics/schema/entry/descriptor.rs | Adds MetricDescriptor (name/type/help/labels/subsystem) + tests for name generation. |
| crates/obs/src/metrics/schema/cluster_usage.rs | Adds cluster usage metric descriptors. |
| crates/obs/src/metrics/schema/cluster_notification.rs | Fixes notification in-progress metric to gauge and adjusts skipped-events help string. |
| crates/obs/src/metrics/schema/cluster_iam.rs | Adds IAM metric descriptors. |
| crates/obs/src/metrics/schema/cluster_health.rs | Adds cluster health metric descriptors. |
| crates/obs/src/metrics/schema/cluster_erasure_set.rs | Adds erasure-set metric descriptors. |
| crates/obs/src/metrics/schema/cluster_config.rs | Adds cluster config metric descriptors. |
| crates/obs/src/metrics/schema/cluster.rs | Adds base cluster capacity/object/bucket descriptors (custom names). |
| crates/obs/src/metrics/schema/bucket_replication.rs | Adds bucket replication metric descriptors. |
| crates/obs/src/metrics/schema/bucket.rs | Adds per-bucket API metric descriptors (traffic, inflight, totals, errors, histogram). |
| crates/obs/src/metrics/schema/audit.rs | Adds audit target delivery metric descriptors. |
| crates/obs/src/metrics/scheduler.rs | Renames collector init to init_metrics_runtime, adds audit/notification metric tasks, and keeps a compatibility alias. |
| crates/obs/src/metrics/report.rs | Moves reporting to use new schema types and keeps metrics crate integration for counters/gauges/histograms. |
| crates/obs/src/metrics/mod.rs | Introduces metrics module boundary and re-exports runtime + schema/report entrypoints. |
| crates/obs/src/metrics/config.rs | Adds env keys/default intervals for audit + notification metric scheduling. |
| crates/obs/src/metrics/collectors/system_process.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/system_network.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/system_memory.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/system_gpu.rs | Updates docs/imports to new crate/module names and schema paths. |
| crates/obs/src/metrics/collectors/system_drive.rs | Updates imports and descriptor typing to new schema module. |
| crates/obs/src/metrics/collectors/system_cpu.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/scanner.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/resource.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/request.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/replication.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/notification_target.rs | Adds collector to emit per-target notification delivery metrics. |
| crates/obs/src/metrics/collectors/notification.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/node.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/mod.rs | Makes collectors public modules, removes legacy webhook collector export, adds notification target collector export. |
| crates/obs/src/metrics/collectors/ilm.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/dial9.rs | Updates PrometheusMetric import path. |
| crates/obs/src/metrics/collectors/cluster_usage.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/cluster_iam.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/cluster_health.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/cluster_erasure_set.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/cluster_config.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/cluster.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/bucket_replication.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/bucket.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/metrics/collectors/audit.rs | Updates imports to new schema/report module paths. |
| crates/obs/src/lib.rs | Publicly exposes metrics module, re-exports schema, and updates docs to reference init_metrics_runtime. |
| crates/obs/src/global.rs | Updates comments to indicate metrics runtime scheduling is exposed via rustfs_obs. |
| crates/obs/Cargo.toml | Adds dependencies needed for migrated metrics runtime/collectors (audit/notify/ecstore/io-metrics/sysinfo/tokio-util) and defines gpu feature. |
| crates/notify/src/stream.rs | Handles TargetError::Dropped explicitly and records per-target final failures via Target::record_final_failure. |
| crates/notify/src/notifier.rs | Threads NotificationMetrics into notifier, counts skipped events, and refines processed/failed accounting for deferred vs direct delivery. |
| crates/notify/src/lib.rs | Re-exports global snapshot helpers and metric snapshot types. |
| crates/notify/src/integration.rs | Adds snapshot types, adds skipped counter + snapshot helpers, and adds per-target delivery snapshot collection. |
| crates/notify/src/global.rs | Adds global functions to snapshot aggregate and per-target notification delivery metrics. |
| crates/metrics/src/metrics_type/logger_webhook.rs | Deletes legacy rustfs-metrics schema for webhook logs. |
| crates/metrics/src/global.rs | Deletes legacy rustfs-metrics init entrypoint. |
| crates/metrics/src/collectors/logger_webhook.rs | Deletes legacy rustfs-metrics webhook collector. |
| crates/metrics/Cargo.toml | Removes legacy rustfs-metrics crate manifest. |
| crates/lock/src/fast_lock/guard.rs | Records lock-held acquire/release counters via rustfs-io-metrics when acquiring/releasing fast locks. |
| crates/lock/src/distributed_lock.rs | Records lock-held acquire/release counters and ensures disarm() decrements held-lock counters. |
| crates/lock/Cargo.toml | Adds dependency on rustfs-io-metrics for lock-held counters. |
| crates/io-metrics/src/sampler/system.rs | Refactors sampler module API to expose platform snapshot helper. |
| crates/io-metrics/src/sampler/process.rs | Adds sysinfo-based process resource/system snapshot collection (including lock-held counters). |
| crates/io-metrics/src/sampler/mod.rs | Adds sampler module exports for process + platform snapshots. |
| crates/io-metrics/src/process_lock_metrics.rs | Adds lock-held counters and platform-specific process I/O/syscall/VM max sampling. |
| crates/io-metrics/src/lib.rs | Re-exports new process lock/platform snapshot and sampler APIs. |
| crates/io-metrics/Cargo.toml | Adds sysinfo dependency to support new samplers. |
| crates/audit/src/system.rs | Adds per-target audit metric snapshot types and handles dropped payloads + records final failures. |
| crates/audit/src/lib.rs | Re-exports AuditTargetMetricSnapshot. |
| crates/audit/src/global.rs | Adds global async API to fetch audit target metrics for Prometheus collection. |
| Cargo.toml | Removes crates/metrics from workspace members and removes rustfs-metrics workspace dependency entry. |
| Cargo.lock | Updates lockfile for removed crate and dependency graph changes (including sysinfo + other transitive updates). |
| ARCHITECTURE.md | Updates architecture docs to point metrics runtime/schema ownership to rustfs-obs. |
Type of Change
Related Issues
Summary of Changes
This PR completes the metrics migration by moving runtime/schema responsibilities to rustfs-obs, removing the legacy rustfs-metrics crate, and tightening migration guardrails.
Additional fixes included after review:
- Fixed parse_kv_u64 to skip non key:value lines instead of aborting early.
- Gated run_command/parse_kv_u64 with platform cfgs to avoid dead_code warnings.
- Fixed DistributedLockGuard::disarm() to release held-lock counters when disarming.
- Removed obsolete CI jobs that referenced the deleted rustfs-metrics package.
Checklist
- make pre-commit
Impact
Additional Notes