Add controller RPC metrics export#1053

Merged
theyoprst merged 1 commit into dev from controller-rpc-metrics
Jun 24, 2025

@theyoprst
Collaborator

Summary

Exports SLURM controller RPC metrics so that controller performance can be monitored.

Changes

  • RPC Metrics by Message Type: Exports slurm_controller_rpc_calls_total and slurm_controller_rpc_duration_seconds_total metrics with message_type labels
  • RPC Metrics by User: Exports slurm_controller_rpc_user_calls_total and slurm_controller_rpc_user_duration_seconds_total metrics with user and user_id labels
  • Controller Metrics: Exports slurm_controller_server_thread_count gauge metric
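
As a rough illustration of the per-message-type metrics listed above, the sketch below turns a diagnostics record into the two exposition lines. The `rpcStat` type, its field names, and the assumption that the controller reports cumulative time in microseconds are hypothetical and are not taken from this PR's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// rpcStat is a hypothetical per-message-type record; the field names are
// illustrative and not taken from the exporter's source.
type rpcStat struct {
	MessageType string
	Count       uint64 // number of RPCs handled
	TotalMicros uint64 // cumulative handling time, assumed to be in microseconds
}

// renderRPCMetrics returns two exposition lines per record: one call counter
// and one cumulative duration counter converted to seconds.
func renderRPCMetrics(stats []rpcStat) []string {
	lines := make([]string, 0, 2*len(stats))
	for _, s := range stats {
		seconds := float64(s.TotalMicros) / 1e6
		lines = append(lines,
			// %q quotes the label value; adequate for plain ASCII message types.
			fmt.Sprintf("slurm_controller_rpc_calls_total{message_type=%q} %d",
				s.MessageType, s.Count),
			fmt.Sprintf("slurm_controller_rpc_duration_seconds_total{message_type=%q} %s",
				s.MessageType, strconv.FormatFloat(seconds, 'f', -1, 64)),
		)
	}
	return lines
}

func main() {
	stats := []rpcStat{{MessageType: "REQUEST_PING", Count: 536, TotalMicros: 18569}}
	for _, line := range renderRPCMetrics(stats) {
		fmt.Println(line)
	}
}
```

Both series are cumulative counters, so a consumer would normally look at their rate of change rather than the raw values.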

New Metrics Output

slurm_controller_rpc_calls_total{message_type="MESSAGE_NODE_REGISTRATION_STATUS"} 14
slurm_controller_rpc_calls_total{message_type="REQUEST_JOB_INFO"} 370
slurm_controller_rpc_calls_total{message_type="REQUEST_NODE_INFO"} 742
slurm_controller_rpc_calls_total{message_type="REQUEST_NODE_INFO_SINGLE"} 356
slurm_controller_rpc_calls_total{message_type="REQUEST_PARTITION_INFO"} 1100
slurm_controller_rpc_calls_total{message_type="REQUEST_PING"} 536
slurm_controller_rpc_calls_total{message_type="REQUEST_STATS_INFO"} 4

slurm_controller_rpc_duration_seconds_total{message_type="MESSAGE_NODE_REGISTRATION_STATUS"} 0.001678
slurm_controller_rpc_duration_seconds_total{message_type="REQUEST_JOB_INFO"} 0.03879
slurm_controller_rpc_duration_seconds_total{message_type="REQUEST_NODE_INFO"} 0.077907
slurm_controller_rpc_duration_seconds_total{message_type="REQUEST_NODE_INFO_SINGLE"} 0.020751
slurm_controller_rpc_duration_seconds_total{message_type="REQUEST_PARTITION_INFO"} 0.062916
slurm_controller_rpc_duration_seconds_total{message_type="REQUEST_PING"} 0.018569
slurm_controller_rpc_duration_seconds_total{message_type="REQUEST_STATS_INFO"} 0.00024

slurm_controller_rpc_user_calls_total{user="root",user_id="0"} 3122
slurm_controller_rpc_user_duration_seconds_total{user="root",user_id="0"} 0.220851

slurm_controller_server_thread_count 1
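
Because the call and duration series are both running totals, a mean handling time per RPC can be derived by dividing one by the other. The helper below is a minimal sketch of that arithmetic; `meanSeconds` is a hypothetical name and is not part of the exporter.

```go
package main

import "fmt"

// meanSeconds returns the average handling time per call in seconds,
// or 0 when no calls have been recorded (avoids dividing by zero).
func meanSeconds(durationSecondsTotal float64, callsTotal uint64) float64 {
	if callsTotal == 0 {
		return 0
	}
	return durationSecondsTotal / float64(callsTotal)
}

func main() {
	// Values copied from the REQUEST_PARTITION_INFO samples above.
	mean := meanSeconds(0.062916, 1100)
	fmt.Printf("mean: %.1f microseconds per call\n", mean*1e6)
}
```

For the REQUEST_PARTITION_INFO samples this works out to roughly 57 microseconds per call. In a Prometheus query the same ratio would typically be taken over `rate()` of both counters, so that it reflects a recent window rather than the whole process lifetime.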

Resolves #1027

Contributor

Copilot AI left a comment

Pull Request Overview

Implements controller-side RPC diagnostics export for Prometheus by introducing a new GetDiag API call and wiring its data into the metrics collector.

  • Adds GetDiag to the Slurm API interface, client implementation, and mock.
  • Defines and registers four RPC-related counters and one gauge for server threads.
  • Extends unit tests to verify RPC metrics output and edge‐case handling.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| internal/slurmapi/interface.go | Added GetDiag to the Client interface |
| internal/slurmapi/fake/mock_client.go | Implemented mock methods for GetDiag |
| internal/slurmapi/client.go | Implemented GetDiag using SlurmV0041GetDiagWithResponse and error checks |
| internal/exporter/collector.go | Defined RPC and thread‐count metric descriptors and wired them into Collect |
| internal/exporter/collector_test.go | Extended tests to assert presence/absence of new metrics |

Comments suppressed due to low confidence (1)

internal/exporter/collector_test.go:11

  • You’ve imported the same package twice under two names (api and slurmapispec), which will cause a compile error. Remove the redundant import alias.
	slurmapispec "github.com/SlinkyProject/slurm-client/api/v0041"

theyoprst force-pushed the controller-rpc-metrics branch 2 times, most recently from 70e7ffa to 1ac8c95 on June 24, 2025 14:09
Export SLURM controller diagnostics as Prometheus metrics including:
- RPC calls and duration by message type
- RPC calls and duration by user
- Controller server thread count
theyoprst force-pushed the controller-rpc-metrics branch from 1ac8c95 to 6a684bb on June 24, 2025 14:10
theyoprst merged commit 576d26a into dev on Jun 24, 2025
4 checks passed

Successfully merging this pull request may close these issues.

Controller RPC metrics (like in sdiag)
