Add self-monitoring metrics on separate port 8081 #1382

Merged
theyoprst merged 1 commit into main from exporter/self-monitoring on Aug 6, 2025


@theyoprst commented Aug 6, 2025

Add self-monitoring capabilities:

  • New monitoring server on port 8081 with 5 Prometheus metrics (slurm_exporter_ prefix)
  • Collection performance tracking: duration, attempts, failures, request counts
  • Separate HTTP endpoint to isolate operational metrics from business metrics
  • Single PodMonitor with dual endpoints for efficient Kubernetes resource usage
  • Added --monitoring-bind-address flag and comprehensive documentation

This enables monitoring of the exporter's own health and performance separate from SLURM business metrics.
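The separation described above can be sketched with the standard library alone. This is a minimal illustration, not the exporter's actual code: only `--monitoring-bind-address` and port 8081 come from this PR, while the `--metrics-bind-address` flag, the `:8080` default, the `/metrics` path, and both metric payloads are assumptions made up for the example.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// config holds the two listen addresses. Only --monitoring-bind-address and
// port 8081 come from the PR; the business flag name and its default are
// hypothetical.
type config struct {
	metricsAddr    string
	monitoringAddr string
}

// parseFlags parses the command line into a config.
func parseFlags(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("exporter", flag.ContinueOnError)
	fs.StringVar(&cfg.metricsAddr, "metrics-bind-address", ":8080",
		"business metrics listen address (hypothetical flag)")
	fs.StringVar(&cfg.monitoringAddr, "monitoring-bind-address", ":8081",
		"self-monitoring listen address")
	err := fs.Parse(args)
	return cfg, err
}

// newMux returns a mux that serves a fixed payload on /metrics (assumed path).
func newMux(body string) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, body)
	})
	return mux
}

// scrape performs an in-memory GET against /metrics on the given handler.
func scrape(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	b, _ := io.ReadAll(rec.Result().Body)
	return string(b)
}

func main() {
	cfg, err := parseFlags([]string{"--monitoring-bind-address", ":8081"})
	if err != nil {
		panic(err)
	}
	// Each mux has its own handler set, so neither endpoint can leak the
	// other's metrics. Both payloads are placeholders.
	business := newMux("slurm_placeholder_business_metric 1\n")
	monitoring := newMux("slurm_exporter_metrics_exported 1\n")

	fmt.Print("business   ", cfg.metricsAddr, " -> ", scrape(business))
	fmt.Print("monitoring ", cfg.monitoringAddr, " -> ", scrape(monitoring))
	// In a long-running process each mux would back its own server, e.g.
	// go http.ListenAndServe(cfg.monitoringAddr, monitoring)
}
```

Keeping one mux per listener means a scrape of the business port never returns operational series, which is the isolation the description refers to.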

Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive self-monitoring capabilities to the SLURM exporter by introducing a dedicated monitoring endpoint on port 8081. The changes enable tracking of the exporter's own health and performance metrics separately from SLURM business metrics.

  • Introduces 5 self-monitoring Prometheus metrics with slurm_exporter_ prefix for collection performance tracking
  • Adds separate HTTP server on port 8081 for operational metrics isolation
  • Updates Kubernetes PodMonitor to scrape both business and operational metrics endpoints

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| internal/exporter/monitoring.go | New monitoring metrics implementation with 5 self-tracking metrics |
| internal/exporter/exporter.go | Added monitoring server and instrumented metrics handler |
| internal/exporter/collector.go | Integrated monitoring metrics into collection lifecycle |
| internal/render/exporter/pod_monitor.go | Added second endpoint for monitoring metrics scraping |
| internal/render/exporter/container.go | Added monitoring port to container specification |
| internal/consts/container.go | Added monitoring endpoint constants |
| cmd/exporter/main.go | Added monitoring bind address flag |
| docs/slurm-exporter.md | Comprehensive documentation for self-monitoring features |

Comments suppressed due to low confidence (1)

internal/exporter/monitoring_test.go:125

  • The getMetricValue helper function is defined but never used in the test file. This creates dead code that should either be utilized in tests or removed.
func getMetricValue(families []*dto.MetricFamily, name string, metricType string) float64 {

Implements Phase 2 of issue 1369 by adding comprehensive self-monitoring capabilities:
- New monitoring server on port 8081 with 5 Prometheus metrics (slurm_exporter_ prefix)
- Collection performance tracking: duration, attempts, failures, request counts
- Separate HTTP endpoint to isolate operational metrics from business metrics
- Single PodMonitor with dual endpoints for efficient Kubernetes resource usage
- Added --monitoring-bind-address flag and comprehensive documentation

Main metrics: collection_duration_seconds, collection_attempts_total,
collection_failures_total, metrics_requests_total, metrics_exported
@theyoprst force-pushed the exporter/self-monitoring branch from aafb5a0 to f1d6648 on August 6, 2025 11:00
@theyoprst merged commit bd8d30d into main on Aug 6, 2025
4 checks passed