@gma1k commented on Dec 9, 2025

Add Resource Limit Monitoring

Summary

Implements resource limit monitoring for Kubernetes pods, tracking CPU, memory, and I/O utilization against cgroup limits with configurable alert thresholds and full integration into Podtrace's diagnostic reporting system.
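
As a rough illustration of what "utilization against cgroup limits" can mean in practice, here is a minimal sketch that parses cgroup v2 limit files and computes a percentage. It assumes cgroup v2 (`memory.max`, `cpu.max`); the function names and the `NoLimit` sentinel are hypothetical and are not taken from this PR, which may read limits differently (for example via cgroup v1 files or the BPF maps).

```go
package main

import (
	"errors"
	"strconv"
	"strings"
)

// NoLimit marks a resource with no configured limit ("max" in cgroup v2).
// Hypothetical sentinel, chosen for this sketch only.
const NoLimit = -1

// parseMemoryMax parses the content of a cgroup v2 memory.max file.
// It returns the limit in bytes, or NoLimit when the file contains "max".
func parseMemoryMax(content string) (int64, error) {
	s := strings.TrimSpace(content)
	if s == "max" {
		return NoLimit, nil
	}
	return strconv.ParseInt(s, 10, 64)
}

// parseCPUMax parses the content of a cgroup v2 cpu.max file, which reads
// "<quota> <period>" in microseconds, or "max <period>" when unlimited.
// It returns the limit as a number of CPUs (quota / period), or NoLimit.
func parseCPUMax(content string) (float64, error) {
	fields := strings.Fields(content)
	if len(fields) != 2 {
		return 0, errors.New("cpu.max: expected two fields")
	}
	period, err := strconv.ParseInt(fields[1], 10, 64)
	if err != nil || period <= 0 {
		return 0, errors.New("cpu.max: invalid period")
	}
	if fields[0] == "max" {
		return NoLimit, nil
	}
	quota, err := strconv.ParseInt(fields[0], 10, 64)
	if err != nil {
		return 0, errors.New("cpu.max: invalid quota")
	}
	return float64(quota) / float64(period), nil
}

// utilizationPercent returns usage as a percentage of limit.
// The second result is false when there is no limit to compare against.
func utilizationPercent(usage, limit int64) (float64, bool) {
	if limit <= 0 {
		return 0, false
	}
	return float64(usage) * 100 / float64(limit), true
}
```

In this sketch a caller would read `memory.current` and `memory.max` from the pod's cgroup directory, pass the contents to the parsers, and skip alerting whenever `utilizationPercent` reports that no limit is set.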

Features

  • Resource Monitoring: Tracks CPU, memory, and I/O usage vs. limits from cgroup filesystems
  • Alert System: Configurable utilization thresholds (80% WARNING, 90% CRITICAL, 95% EMERGENCY)
  • eBPF Integration: Uses BPF maps (cgroup_limits, cgroup_alerts) for kernel-space data storage
  • Diagnostic Reports: Resource limit statistics and alerts included in auto-detection summary
  • Prometheus Metrics: Exports podtrace_resource_limit_bytes, podtrace_resource_usage_bytes, podtrace_resource_utilization_percent, and podtrace_resource_alert_level
  • Event System: Emits EVENT_RESOURCE_LIMIT events, integrated with the existing event-processing pipeline
  • Issue Detection: Automatically flags high resource utilization in diagnostic reports
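
The threshold logic in the Alert System item can be sketched as a simple classification of a utilization percentage. This is an illustration only: the `AlertLevel` type, its numeric values, and the choice of inclusive (`>=`) comparisons are assumptions, and the actual encoding behind `podtrace_resource_alert_level` may differ.

```go
package main

// AlertLevel is a hypothetical severity scale for this sketch.
type AlertLevel int

const (
	AlertNone AlertLevel = iota
	AlertWarning
	AlertCritical
	AlertEmergency
)

// Threshold values as listed in the PR description; the PR describes them
// as configurable, so treat these as example values.
const (
	warningPercent   = 80.0
	criticalPercent  = 90.0
	emergencyPercent = 95.0
)

// classify maps a utilization percentage to an alert level.
// Thresholds are treated as inclusive, which is an assumption.
func classify(percent float64) AlertLevel {
	switch {
	case percent >= emergencyPercent:
		return AlertEmergency
	case percent >= criticalPercent:
		return AlertCritical
	case percent >= warningPercent:
		return AlertWarning
	default:
		return AlertNone
	}
}

// String returns the level name used in the PR description.
func (l AlertLevel) String() string {
	switch l {
	case AlertWarning:
		return "WARNING"
	case AlertCritical:
		return "CRITICAL"
	case AlertEmergency:
		return "EMERGENCY"
	default:
		return "NONE"
	}
}
```

Checking the highest threshold first keeps the switch correct without needing upper bounds on each case; for example, `classify(92)` falls through the emergency check and lands on the critical one.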

Testing

  • Unit tests for all parsing functions
  • Integration tests for event formatting, issue detection, and report generation

@gma1k merged commit 3494462 into main on Dec 9, 2025
1 check passed
@gma1k deleted the fe-feat branch on December 9, 2025, 14:45