RSPEED-2943: add SLO monitoring metrics by major · Pull Request #1637 · lightspeed-core/lightspeed-stack

major · 2026-04-29T23:15:04Z

Description

Adds bounded Prometheus metrics for SLO and dashboard coverage across authentication, authorization, quota checks, and LLM inference calls. This covers /v1/responses, /v1/infer, rh-identity and other auth modules, and provider/model inference timing without adding high-cardinality labels.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: OpenCode, CodeRabbit local review
Generated by: N/A

Related Tickets & Documents

Related Issue # RSPEED-2943
Closes # RSPEED-2943

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

uv run pytest tests/unit/metrics/test_recording.py tests/unit/authentication/test_utils.py tests/unit/app/endpoints/test_responses.py
- 89 passed, 1 existing Authlib deprecation warning
uv run make verify
- black, pylint, pyright, ruff, pydocstyle, and mypy passed
uv run radon cc -s src/app/endpoints/responses.py src/app/endpoints/rlsapi_v1.py src/authentication/rh_identity.py src/authentication/k8s.py src/authorization/middleware.py src/metrics/recording.py
- new/changed helper logic stayed A/B; remaining C complexity in responses.py matches baseline
Local CodeRabbit review completed and actionable findings were addressed.
Final Oracle review passed with no blockers.

Summary by CodeRabbit

Release Notes

New Features

Added metrics for authentication, authorization, quota checks, and LLM inference latency
Introduced duration tracking for authentication and authorization operations
Added support for tracking quota check results and inference latency across multiple models

Tests

Added unit tests for metric recording functions

Signed-off-by: Major Hayden <major@redhat.com>

coderabbitai · 2026-04-29T23:15:18Z

Walkthrough

This pull request introduces comprehensive metrics instrumentation across authentication, authorization, quota validation, and LLM inference processing paths. New Prometheus metrics track authentication attempts, authorization checks, quota availability, and inference latency. Metrics recording helpers are added, and authentication/authorization components are instrumented with monotonic timing and structured outcome categorization.

Changes

Cohort / File(s)	Summary
Metrics Infrastructure `src/metrics/__init__.py`, `src/metrics/recording.py`	Introduces new Prometheus counters and histograms for auth attempts, authorization checks, quota checks, and LLM inference duration. Adds six recording helper functions that wrap metric updates with exception handling to prevent failures from propagating.
Authentication Utilities `src/authentication/utils.py`	Adds `record_auth_metrics` helper that records both attempt counters and duration histograms using monotonic timing, with broad exception handling to log (not raise) metric failures.
Authentication Handlers `src/authentication/api_key_token.py`, `src/authentication/noop.py`, `src/authentication/noop_with_token.py`	Instruments each auth dependency with monotonic timing and metrics recording for missing tokens, validation failures, and successful authentication paths, categorizing outcomes with specific reason labels.
Authentication Handlers (Refactored) `src/authentication/jwk_token.py`, `src/authentication/k8s.py`, `src/authentication/rh_identity.py`	Refactors error handling into dedicated helper functions while adding per-request timing and metrics recording for distinct failure modes (missing headers, invalid claims, JWK fetch errors, token decode errors, authorization failures).
Authorization Middleware `src/authorization/middleware.py`	Instruments authorization check with monotonic timing; records denied/success/error outcomes and elapsed duration via new metrics recording functions; wraps all paths (including error cases) in try/finally for consistent metric emission.
Endpoint Quota & Inference Metrics `src/app/endpoints/rlsapi_v1.py`	Adds `_check_infer_quota` helper wrapping quota subject resolution and `check_tokens_available` with duration-bounded metrics; records inference success/failure metrics on handled exception types before error handling continues.
Response Stream Processing `src/app/endpoints/responses.py`	Refactors streaming SSE chunk handling into dedicated helpers; adds monotonic inference timing tracking; records inference success metrics after backend response creation and failure metrics for both streaming exceptions and non-streaming error paths.
Test Coverage `tests/unit/authentication/test_utils.py`, `tests/unit/metrics/test_recording.py`, `tests/unit/app/endpoints/test_responses.py`	Adds unit tests for auth metrics utility, metric recording helpers (with mocked Prometheus interactions and exception paths), and streaming response inference failure scenarios.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'RSPEED-2943: add SLO monitoring metrics' directly and clearly describes the main purpose of the PR: adding metrics for SLO (Service Level Objective) monitoring. This aligns with the substantial changes across authentication, authorization, quota checks, and LLM inference instrumentation detailed in the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 0/1 reviews remaining, refill in 60 minutes.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/rlsapi_v1.py`:
- Around line 538-541: Update the docstring for _check_infer_quota(request:
Request, auth: AuthTuple, endpoint_path: str) to include a Returns section that
documents the return type Optional[str], specifying that it returns a string
error message when quota is exceeded or another blocking condition is detected
and returns None when the check passes; follow the project's docstring style
(brief summary, Args, Returns) and place the Returns section directly under the
summary as with other endpoint helpers.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: cd2fbd70-652a-4f70-a856-0f94e95d585a

📥 Commits

Reviewing files that changed from the base of the PR and between ca125c4 and cc793cd.

📒 Files selected for processing (15)

src/app/endpoints/responses.py
src/app/endpoints/rlsapi_v1.py
src/authentication/api_key_token.py
src/authentication/jwk_token.py
src/authentication/k8s.py
src/authentication/noop.py
src/authentication/noop_with_token.py
src/authentication/rh_identity.py
src/authentication/utils.py
src/authorization/middleware.py
src/metrics/__init__.py
src/metrics/recording.py
tests/unit/app/endpoints/test_responses.py
tests/unit/authentication/test_utils.py
tests/unit/metrics/test_recording.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: build-pr
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: Pylinter

🧰 Additional context used

📓 Path-based instructions (6)

src/**/*.py