RSPEED-3017: use custom buckets for response duration histogram #1702
The response_duration_seconds histogram used the prometheus_client default buckets, which max out at 10s, causing histogram_quantile in Grafana to appear capped for requests exceeding 10 seconds. Reuse the existing LLM_INFERENCE_DURATION_BUCKETS (0.1-120s) to cover the full expected response time range.

Signed-off-by: Major Hayden <major@redhat.com>
Description
Add custom bucket boundaries to the response_duration_seconds Prometheus histogram. The default prometheus_client buckets max out at 10s, which caps Grafana histogram_quantile calculations for requests that take longer. This reuses the existing LLM_INFERENCE_DURATION_BUCKETS tuple (0.1-120s) already used by llm_inference_duration_seconds.
Type of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
uv run make test-unit).
response_duration_seconds histogram now uses the custom buckets (0.1, 0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 30.0, 60.0, 120.0, +Inf) matching the llm_inference_duration_seconds histogram.
uv run make verify passes cleanly.
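For reviewers who want to see why the old buckets capped the quantile, here is a minimal, standard-library-only sketch. It is not code from this PR. `estimate_quantile` is a hypothetical, simplified stand-in for PromQL's `histogram_quantile` (linear interpolation inside a bucket, falling back to the highest finite bound when the rank lands in the +Inf bucket), and the sample durations are made up for illustration.

```python
import math

# Upper bounds of the prometheus_client default buckets (max finite bound: 10s).
DEFAULT_BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75,
                   1.0, 2.5, 5.0, 7.5, 10.0, math.inf)
# Upper bounds listed in the Testing notes for the custom buckets.
CUSTOM_BUCKETS = (0.1, 0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 30.0, 60.0, 120.0,
                  math.inf)


def cumulative_counts(observations, bounds):
    """Count observations <= each upper bound, as a Prometheus histogram does."""
    return [sum(1 for o in observations if o <= b) for b in bounds]


def estimate_quantile(q, bounds, counts):
    """Hypothetical, simplified approximation of histogram_quantile."""
    total = counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    for i, (bound, count) in enumerate(zip(bounds, counts)):
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: the best available answer
                # is the highest finite bound, which is the observed "cap".
                return bounds[i - 1]
            lower = bounds[i - 1] if i > 0 else 0.0
            prev = counts[i - 1] if i > 0 else 0
            if count == prev:
                return lower
            # Linear interpolation inside the bucket.
            return lower + (bound - lower) * (rank - prev) / (count - prev)
    return math.nan


# Made-up request durations in seconds; most exceed 10s.
durations = [2.0, 15.0, 25.0, 45.0, 90.0]

capped = estimate_quantile(
    0.95, DEFAULT_BUCKETS, cumulative_counts(durations, DEFAULT_BUCKETS))
estimated = estimate_quantile(
    0.95, CUSTOM_BUCKETS, cumulative_counts(durations, CUSTOM_BUCKETS))

print(f"p95 with default buckets: {capped}")
print(f"p95 with custom buckets:  {estimated}")
```

With the default bounds the p95 estimate cannot exceed 10 seconds, because every slow request lands in the +Inf bucket. With the 0.1-120s bounds the same data yields an estimate inside the 60-120s bucket.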