RSPEED-2943: add SLO monitoring metrics #1637
major wants to merge 3 commits into lightspeed-core:main
Conversation
Signed-off-by: Major Hayden <major@redhat.com>
Walkthrough

This pull request introduces comprehensive metrics instrumentation across authentication, authorization, quota validation, and LLM inference processing paths. New Prometheus metrics track authentication attempts, authorization checks, quota availability, and inference latency. Metrics recording helpers are added, and authentication/authorization components are instrumented with monotonic timing and structured outcome categorization.
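The "monotonic timing and structured outcome categorization" pattern the walkthrough describes can be sketched as follows. The helper name and label values here are illustrative stand-ins, not the PR's actual API:

```python
import time
from typing import Callable, Tuple


def time_with_outcome(operation: Callable[[], object]) -> Tuple[str, float]:
    """Run an operation, returning a bounded outcome label and elapsed seconds.

    time.monotonic() is used so the measured duration is immune to wall-clock
    adjustments. Only two label values are possible, which keeps metric
    cardinality bounded.
    """
    start = time.monotonic()
    try:
        operation()
    except Exception:  # the real instrumentation records, then re-raises
        return "failure", time.monotonic() - start
    return "success", time.monotonic() - start
```

The caller would feed the returned label and duration into a counter and histogram respectively.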
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/rlsapi_v1.py`:
- Around line 538-541: Update the docstring for _check_infer_quota(request:
Request, auth: AuthTuple, endpoint_path: str) to include a Returns section that
documents the return type Optional[str], specifying that it returns a string
error message when quota is exceeded or another blocking condition is detected
and returns None when the check passes; follow the project's docstring style
(brief summary, Args, Returns) and place the Returns section directly under the
summary as with other endpoint helpers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: cd2fbd70-652a-4f70-a856-0f94e95d585a
📒 Files selected for processing (15)
src/app/endpoints/responses.py
src/app/endpoints/rlsapi_v1.py
src/authentication/api_key_token.py
src/authentication/jwk_token.py
src/authentication/k8s.py
src/authentication/noop.py
src/authentication/noop_with_token.py
src/authentication/rh_identity.py
src/authentication/utils.py
src/authorization/middleware.py
src/metrics/__init__.py
src/metrics/recording.py
tests/unit/app/endpoints/test_responses.py
tests/unit/authentication/test_utils.py
tests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: Pylinter
🧰 Additional context used
📓 Path-based instructions (6)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection
Files:
src/authentication/noop.py, src/authentication/utils.py, src/authorization/middleware.py, src/authentication/api_key_token.py, src/authentication/noop_with_token.py, src/app/endpoints/rlsapi_v1.py, src/authentication/jwk_token.py, src/metrics/recording.py, src/metrics/__init__.py, src/authentication/k8s.py, src/authentication/rh_identity.py, src/app/endpoints/responses.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI dependencies:
from fastapi import APIRouter, HTTPException, Request, status, Depends
Files:
src/app/endpoints/rlsapi_v1.py, src/app/endpoints/responses.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/rlsapi_v1.py, src/app/endpoints/responses.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests
Files:
tests/unit/app/endpoints/test_responses.py, tests/unit/metrics/test_recording.py, tests/unit/authentication/test_utils.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Do not use unittest - pytest is the standard testing framework for this project
Files:
tests/unit/app/endpoints/test_responses.py, tests/unit/metrics/test_recording.py, tests/unit/authentication/test_utils.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package __init__.py files contain brief package descriptions
Files:
src/metrics/__init__.py
🧠 Learnings (4)
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
src/authentication/noop.py, src/authentication/noop_with_token.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.
Applied to files:
src/app/endpoints/rlsapi_v1.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/rlsapi_v1.py, src/app/endpoints/responses.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.
Applied to files:
src/authentication/rh_identity.py
🔇 Additional comments (22)
src/metrics/recording.py (1)
114-206: Metric helper additions are consistent and failure-safe. The new auth/authorization/quota/inference recorders keep telemetry writes non-blocking and use bounded label patterns consistently.
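A minimal sketch of the "non-blocking telemetry" property praised here: every metric write goes through a wrapper so a failing metrics backend can never break the request path. The function name and logger are stand-ins for the PR's real helpers:

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def safe_record(metric_op: Callable[..., Any], *args: Any) -> None:
    """Invoke a metric operation; log a warning instead of raising on failure."""
    try:
        metric_op(*args)
    except Exception:  # noqa: BLE001 - telemetry must never break requests
        logger.warning("failed to record metric", exc_info=True)
```

Request handlers call `safe_record(counter_update, ...)` instead of updating counters directly, so telemetry errors degrade to log noise rather than 5xx responses.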
src/metrics/__init__.py (1)
59-106: New metric definitions align with recorder contracts. Label sets and metric intent are coherent with the new recording helpers and SLO instrumentation goals.
src/authentication/noop.py (1)
54-67: No-op auth now captures branch-specific telemetry cleanly. Both empty-user-id failure and normal bypass paths are instrumented with bounded outcome/reason labels.
src/authentication/utils.py (1)
44-66: Shared auth-metric helper is a solid consolidation. This removes duplication across auth dependencies while preserving non-blocking telemetry behavior.
tests/unit/authentication/test_utils.py (1)
47-59: New unit test covers the helper's essential contract. It verifies both metric calls and confirms elapsed duration is computed from monotonic time.
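In the same spirit, a helper's use of the monotonic clock can be verified by patching time.monotonic with a deterministic sequence. This sketch uses unittest.mock.patch directly; the repo's convention is pytest with the pytest-mock mocker fixture, and the helper name here is illustrative:

```python
import time
from unittest.mock import patch


def elapsed_since(start: float) -> float:
    """Return seconds elapsed since start, using the monotonic clock."""
    return time.monotonic() - start


def test_elapsed_uses_monotonic() -> None:
    # Two successive monotonic readings: one for start, one inside the helper.
    with patch("time.monotonic", side_effect=[100.0, 102.5]):
        start = time.monotonic()
        assert elapsed_since(start) == 2.5
```

Because the patched clock yields exact floats, the assertion is deterministic and also proves the helper reads time.monotonic rather than time.time.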
src/authorization/middleware.py (1)
129-172: Authorization middleware instrumentation is well placed. The finally-based recording ensures outcome/duration metrics are emitted across all control paths.
tests/unit/app/endpoints/test_responses.py (1)
1270-1320: Great coverage for streaming iterator failure telemetry. The new parametrized test validates failure propagation and confirms a single inference-failure metric emission.
src/authentication/noop_with_token.py (1)
65-89: No-op-with-token metrics are integrated cleanly across all branches. Success and error paths now emit consistent auth telemetry without altering endpoint-facing behavior.
src/app/endpoints/rlsapi_v1.py (1)
458-460: LGTM! The LLM inference duration metrics are correctly recorded on both success and failure paths, using bounded result labels ("success" and "failure"). The metric helper signature matches the contract in src/metrics/recording.py.
Also applies to: 754-756
src/authentication/api_key_token.py (1)
64-88: LGTM! The authentication metrics instrumentation correctly:
- Captures start_time using time.monotonic() at the start of the auth flow
- Records metrics with bounded labels before re-raising exceptions on failure paths
- Records success metrics immediately before returning the auth tuple
The pattern is consistent with other auth dependencies in this PR.
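The three bullets above boil down to one control-flow shape, sketched here with the token check and metric helper replaced by stand-in callables (the real code records named Prometheus metrics, not a callback):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def authenticate(
    validate: Callable[[], T],
    record: Callable[[str, float], None],
) -> T:
    """Validate credentials, emitting an outcome metric on every code path."""
    start = time.monotonic()
    try:
        user = validate()
    except Exception:
        # Record the failure before re-raising so no outcome goes unmetered.
        record("failure", time.monotonic() - start)
        raise
    record("authenticated", time.monotonic() - start)
    return user
```

Recording before the re-raise is the key detail: the caller still sees the original exception, but the metric never goes missing.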
src/authentication/jwk_token.py (2)
144-220: LGTM! The refactored helper functions cleanly separate concerns and ensure metrics are recorded before raising exceptions in all failure paths. The cause_map pattern in _decode_jwk_claims provides clear, maintainable error messaging.
271-301: LGTM! The __call__ method correctly captures timing at the start and records bounded auth metrics on all code paths:
- missing_header when the Authorization header is absent
- missing_token when token extraction fails
- Delegates to helpers for JWK fetch, decode, validation, and claim extraction errors
- authenticated on successful completion
tests/unit/metrics/test_recording.py (2)
198-236: LGTM! The parametrized test structure using dataclasses is clean and extensible. Testing both success and failure paths in a single test function with mock reset is a pragmatic approach that ensures both code paths are covered.
290-320: LGTM! The quota check test correctly verifies that both the counter and histogram metrics are updated on success, and that a warning is logged (without crashing) when either metric operation fails. The parametrized failing_metric approach elegantly covers both failure scenarios.
src/authentication/k8s.py (2)
389-420: LGTM! The refactored helpers cleanly encapsulate the kube:admin UID population and SubjectAccessReview creation logic while ensuring metrics are recorded before raising exceptions. The exception categorization (k8s_api_unavailable, k8s_config_error, authorization_check_error, unexpected_error) provides useful bounded labels for monitoring.
Also applies to: 423-459
513-566: LGTM! The __call__ method correctly captures timing at the start and records bounded auth metrics on all code paths, including:
- Skip paths for health probes and metrics endpoints
- Failure paths for token extraction, token review, invalid tokens, and authorization denials
- Success path for fully authenticated and authorized requests
The delegation to _populate_kube_admin_uid and _create_subject_access_review keeps the main flow readable.
28-43: LGTM! The helper functions _record_rh_identity_auth and _get_auth_skip_tuple reduce boilerplate and improve readability by centralizing the auth module constant and skip-path logic.
372-385: LGTM! The try/except block correctly categorizes HTTPException failures from identity validation and entitlement checks:
- 403 status → entitlement_missing (authorization failure)
- Other statuses → invalid_identity (malformed identity data)
This provides meaningful bounded labels for monitoring authentication failures.
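The categorization described above amounts to a small bounded mapping. A sketch, with the label strings taken from the review text and the function name hypothetical:

```python
def categorize_identity_failure(status_code: int) -> str:
    """Map an HTTPException status to a bounded auth-failure label.

    403 means the identity was valid but lacked entitlement; any other
    status is treated as malformed identity data.
    """
    return "entitlement_missing" if status_code == 403 else "invalid_identity"
```

Because the function can only return two strings, the resulting metric label set stays bounded no matter what status codes upstream services produce.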
src/app/endpoints/responses.py (4)
121-128: LGTM! The StreamChunkProcessingResult dataclass cleanly encapsulates the multiple return values from stream chunk processing, including the inference_metric_recorded flag that prevents double-counting metrics when exceptions occur after a terminal event.
131-173: LGTM! The helper functions _record_response_inference_result and _check_response_quota centralize the metric recording logic with proper timing and bounded labels. The record_failure flag in _record_response_inference_result allows callers to optionally increment the failure counter alongside the duration histogram.
1030-1065: LGTM! The response_generator correctly tracks whether an inference metric was recorded via inference_metric_recorded, ensuring that if an exception occurs during stream iteration after the terminal event (e.g., response.completed), the failure metric is not double-counted. The exception handling block only records the failure metric if it wasn't already recorded by the terminal chunk processing.
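The double-count guard can be sketched like this; the string chunk values and the record callable are stand-ins for the real stream events and metric helper:

```python
from typing import Callable, Iterable


def consume_stream(
    chunks: Iterable[str],
    record: Callable[[str], None],
) -> None:
    """Consume stream chunks, emitting exactly one inference metric."""
    recorded = False
    try:
        for chunk in chunks:
            if chunk == "response.completed":
                # Terminal event: record success and remember we did so.
                record("success")
                recorded = True
    except Exception:
        # Only record a failure if the terminal event never fired, so an
        # exception after response.completed cannot double-count.
        if not recorded:
            record("failure")
        raise
```

The flag is what distinguishes "stream died before finishing" (record failure) from "stream finished, then the iterator errored" (success already recorded, emit nothing more).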
1236-1249: LGTM! The non-streaming path correctly records inference duration metrics:
- Success metric recorded immediately after a successful client.responses.create
- Failure metrics recorded in each exception handler before re-raising
This ensures all inference outcomes are tracked for SLO monitoring.
Also applies to: 1269-1276, 1293-1300, 1318-1325
def _check_infer_quota(
    request: Request, auth: AuthTuple, endpoint_path: str
) -> Optional[str]:
    """Check infer quota availability and record bounded quota metrics."""
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Add docstring detail for return value.
The function docstring lacks documentation for the return value. Per coding guidelines, docstrings should include a Returns section.
📝 Suggested docstring improvement
 def _check_infer_quota(
     request: Request, auth: AuthTuple, endpoint_path: str
 ) -> Optional[str]:
-    """Check infer quota availability and record bounded quota metrics."""
+    """Check infer quota availability and record bounded quota metrics.
+
+    Args:
+        request: The FastAPI request object for resolving identity context.
+        auth: Authentication tuple from the configured auth provider.
+        endpoint_path: API endpoint path for metric labeling.
+
+    Returns:
+        The resolved quota subject identifier, or None if quota is disabled.
+
+    Raises:
+        HTTPException: 429 if quota is exhausted.
+    """

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/rlsapi_v1.py` around lines 538 - 541, Update the docstring
for _check_infer_quota(request: Request, auth: AuthTuple, endpoint_path: str) to
include a Returns section that documents the return type Optional[str],
specifying that it returns a string error message when quota is exceeded or
another blocking condition is detected and returns None when the check passes;
follow the project's docstring style (brief summary, Args, Returns) and place
the Returns section directly under the summary as with other endpoint helpers.
Closing this combined PR in favor of five focused replacement PRs:
These are independent PRs against
Description
Adds bounded Prometheus metrics for SLO and dashboard coverage across authentication, authorization, quota checks, and LLM inference calls. This covers /v1/responses, /v1/infer, rh-identity and other auth modules, and provider/model inference timing without adding high-cardinality labels.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
uv run pytest tests/unit/metrics/test_recording.py tests/unit/authentication/test_utils.py tests/unit/app/endpoints/test_responses.py
uv run make verify
uv run radon cc -s src/app/endpoints/responses.py src/app/endpoints/rlsapi_v1.py src/authentication/rh_identity.py src/authentication/k8s.py src/authorization/middleware.py src/metrics/recording.py (responses.py matches baseline)
Release Notes
New Features
Tests