
RSPEED-2943: add SLO monitoring metrics#1637

Closed
major wants to merge 3 commits into lightspeed-core:main from major:feat/slo-monitoring-metrics

Conversation

Contributor

@major major commented Apr 29, 2026

Description

Adds bounded Prometheus metrics for SLO and dashboard coverage across authentication, authorization, quota checks, and LLM inference calls. This covers /v1/responses, /v1/infer, rh-identity and other auth modules, and provider/model inference timing without adding high-cardinality labels.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: OpenCode, CodeRabbit local review
  • Generated by: N/A

Related Tickets & Documents

  • Related Issue: RSPEED-2943
  • Closes: RSPEED-2943

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • uv run pytest tests/unit/metrics/test_recording.py tests/unit/authentication/test_utils.py tests/unit/app/endpoints/test_responses.py
    • 89 passed, 1 existing Authlib deprecation warning
  • uv run make verify
    • black, pylint, pyright, ruff, pydocstyle, and mypy passed
  • uv run radon cc -s src/app/endpoints/responses.py src/app/endpoints/rlsapi_v1.py src/authentication/rh_identity.py src/authentication/k8s.py src/authorization/middleware.py src/metrics/recording.py
    • new/changed helper logic stayed at radon grade A/B; the remaining C-grade complexity in responses.py matches the baseline
  • Local CodeRabbit review completed and actionable findings were addressed.
  • Final Oracle review passed with no blockers.

Summary by CodeRabbit

Release Notes

New Features

  • Added metrics for authentication, authorization, quota checks, and LLM inference latency
  • Introduced duration tracking for authentication and authorization operations
  • Added support for tracking quota check results and inference latency across multiple models

Tests

  • Added unit tests for metric recording functions

major added 3 commits April 29, 2026 17:25
Signed-off-by: Major Hayden <major@redhat.com>
Signed-off-by: Major Hayden <major@redhat.com>
Signed-off-by: Major Hayden <major@redhat.com>
Contributor

coderabbitai Bot commented Apr 29, 2026

Walkthrough

This pull request introduces comprehensive metrics instrumentation across authentication, authorization, quota validation, and LLM inference processing paths. New Prometheus metrics track authentication attempts, authorization checks, quota availability, and inference latency. Metrics recording helpers are added, and authentication/authorization components are instrumented with monotonic timing and structured outcome categorization.

Changes

  • Metrics Infrastructure — src/metrics/__init__.py, src/metrics/recording.py: Introduces new Prometheus counters and histograms for auth attempts, authorization checks, quota checks, and LLM inference duration. Adds six recording helper functions that wrap metric updates with exception handling to prevent failures from propagating.
  • Authentication Utilities — src/authentication/utils.py: Adds a record_auth_metrics helper that records both attempt counters and duration histograms using monotonic timing, with broad exception handling to log (not raise) metric failures.
  • Authentication Handlers — src/authentication/api_key_token.py, src/authentication/noop.py, src/authentication/noop_with_token.py: Instruments each auth dependency with monotonic timing and metrics recording for missing tokens, validation failures, and successful authentication paths, categorizing outcomes with specific reason labels.
  • Authentication Handlers (Refactored) — src/authentication/jwk_token.py, src/authentication/k8s.py, src/authentication/rh_identity.py: Refactors error handling into dedicated helper functions while adding per-request timing and metrics recording for distinct failure modes (missing headers, invalid claims, JWK fetch errors, token decode errors, authorization failures).
  • Authorization Middleware — src/authorization/middleware.py: Instruments the authorization check with monotonic timing; records denied/success/error outcomes and elapsed duration via new metrics recording functions; wraps all paths (including error cases) in try/finally for consistent metric emission.
  • Endpoint Quota & Inference Metrics — src/app/endpoints/rlsapi_v1.py: Adds a _check_infer_quota helper wrapping quota subject resolution and check_tokens_available with duration-bounded metrics; records inference success/failure metrics on handled exception types before error handling continues.
  • Response Stream Processing — src/app/endpoints/responses.py: Refactors streaming SSE chunk handling into dedicated helpers; adds monotonic inference timing tracking; records inference success metrics after backend response creation and failure metrics for both streaming exceptions and non-streaming error paths.
  • Test Coverage — tests/unit/authentication/test_utils.py, tests/unit/metrics/test_recording.py, tests/unit/app/endpoints/test_responses.py: Adds unit tests for the auth metrics utility, metric recording helpers (with mocked Prometheus interactions and exception paths), and streaming response inference failure scenarios.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'RSPEED-2943: add SLO monitoring metrics' directly and clearly describes the main purpose of the PR: adding metrics for SLO (Service Level Objective) monitoring. This aligns with the substantial changes across authentication, authorization, quota checks, and LLM inference instrumentation detailed in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/rlsapi_v1.py`:
- Around line 538-541: Update the docstring for _check_infer_quota(request:
Request, auth: AuthTuple, endpoint_path: str) to include a Returns section that
documents the return type Optional[str], specifying that it returns a string
error message when quota is exceeded or another blocking condition is detected
and returns None when the check passes; follow the project's docstring style
(brief summary, Args, Returns) and place the Returns section directly under the
summary as with other endpoint helpers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: cd2fbd70-652a-4f70-a856-0f94e95d585a

📥 Commits

Reviewing files that changed from the base of the PR and between ca125c4 and cc793cd.

📒 Files selected for processing (15)
  • src/app/endpoints/responses.py
  • src/app/endpoints/rlsapi_v1.py
  • src/authentication/api_key_token.py
  • src/authentication/jwk_token.py
  • src/authentication/k8s.py
  • src/authentication/noop.py
  • src/authentication/noop_with_token.py
  • src/authentication/rh_identity.py
  • src/authentication/utils.py
  • src/authorization/middleware.py
  • src/metrics/__init__.py
  • src/metrics/recording.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/authentication/test_utils.py
  • tests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: build-pr
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: Pylinter
🧰 Additional context used
📓 Path-based instructions (6)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

  • src/authentication/noop.py
  • src/authentication/utils.py
  • src/authorization/middleware.py
  • src/authentication/api_key_token.py
  • src/authentication/noop_with_token.py
  • src/app/endpoints/rlsapi_v1.py
  • src/authentication/jwk_token.py
  • src/metrics/recording.py
  • src/metrics/__init__.py
  • src/authentication/k8s.py
  • src/authentication/rh_identity.py
  • src/app/endpoints/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends

Files:

  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/responses.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/authentication/test_utils.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/authentication/test_utils.py
src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files contain brief package descriptions

Files:

  • src/metrics/__init__.py
🧠 Learnings (4)
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • src/authentication/noop.py
  • src/authentication/noop_with_token.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.

Applied to files:

  • src/app/endpoints/rlsapi_v1.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/responses.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.

Applied to files:

  • src/authentication/rh_identity.py
🔇 Additional comments (22)
src/metrics/recording.py (1)

114-206: Metric helper additions are consistent and failure-safe.

The new auth/authorization/quota/inference recorders keep telemetry writes non-blocking and use bounded label patterns consistently.
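A minimal sketch of that contract, using stdlib stand-ins (`_Metric`, `_Labeled`) in place of `prometheus_client.Counter`/`Histogram` so it runs without the dependency; `record_auth_attempt` and the metric names are illustrative, not the module's actual API:

```python
import logging

logger = logging.getLogger(__name__)


class _Labeled:
    """Stand-in for a labeled Prometheus metric child."""

    def __init__(self, samples: list) -> None:
        self._samples = samples

    def inc(self) -> None:
        self._samples.append(1.0)

    def observe(self, value: float) -> None:
        self._samples.append(value)


class _Metric:
    """Stand-in for prometheus_client.Counter/Histogram."""

    def __init__(self) -> None:
        self.samples: dict = {}

    def labels(self, *values: str) -> _Labeled:
        return _Labeled(self.samples.setdefault(values, []))


# Bounded labels: module/outcome/reason come from small fixed vocabularies
# (never from user input), so series cardinality stays constant.
AUTH_ATTEMPTS = _Metric()          # labels: module, outcome, reason
AUTH_DURATION_SECONDS = _Metric()  # labels: module


def record_auth_attempt(module: str, outcome: str, reason: str, duration: float) -> None:
    """Record one auth attempt; metric errors are logged, never raised."""
    try:
        AUTH_ATTEMPTS.labels(module, outcome, reason).inc()
        AUTH_DURATION_SECONDS.labels(module).observe(duration)
    except Exception:  # broad by design: telemetry must not break auth
        logger.warning("failed to record auth metrics", exc_info=True)
```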

src/metrics/__init__.py (1)

59-106: New metric definitions align with recorder contracts.

Label sets and metric intent are coherent with the new recording helpers and SLO instrumentation goals.

src/authentication/noop.py (1)

54-67: No-op auth now captures branch-specific telemetry cleanly.

Both empty-user-id failure and normal bypass paths are instrumented with bounded outcome/reason labels.

src/authentication/utils.py (1)

44-66: Shared auth-metric helper is a solid consolidation.

This removes duplication across auth dependencies while preserving non-blocking telemetry behavior.
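The helper's essential shape might look like the following; the `emit` callable is injected purely for illustration (the real helper calls the Prometheus recorders directly), and all names are hypothetical:

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


def record_auth_metrics(
    module: str,
    outcome: str,
    reason: str,
    start_time: float,
    emit: Callable[[str, str, str, float], None],
) -> None:
    """Record an auth attempt and its elapsed duration from a monotonic start.

    Failures inside metric emission are logged rather than raised, so the
    auth flow itself can never be broken by telemetry.
    """
    try:
        # monotonic time is immune to wall-clock adjustments mid-request
        elapsed = time.monotonic() - start_time
        emit(module, outcome, reason, elapsed)
    except Exception:  # broad by design
        logger.warning("failed to record auth metrics", exc_info=True)
```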

tests/unit/authentication/test_utils.py (1)

47-59: New unit test covers the helper’s essential contract.

It verifies both metric calls and confirms elapsed duration is computed from monotonic time.

src/authorization/middleware.py (1)

129-172: Authorization middleware instrumentation is well placed.

The finally-based recording ensures outcome/duration metrics are emitted across all control paths.
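The try/finally pattern described here can be sketched as follows; the names and the boolean `check` interface are simplifications, not the middleware's real signature:

```python
import time
from typing import Callable


def authorize_with_metrics(
    check: Callable[[], bool],
    record: Callable[[str, float], None],
) -> None:
    """Run an authorization check, always emitting outcome and duration.

    The finally block fires on allowed, denied, and unexpected-error paths
    alike, so the metric stream has no gaps.
    """
    start = time.monotonic()
    outcome = "error"  # default covers unexpected exceptions from check()
    try:
        if check():
            outcome = "success"
        else:
            outcome = "denied"
            raise PermissionError("access denied")
    finally:
        record(outcome, time.monotonic() - start)
```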

tests/unit/app/endpoints/test_responses.py (1)

1270-1320: Great coverage for streaming iterator failure telemetry.

The new parametrized test validates failure propagation and confirms a single inference-failure metric emission.

src/authentication/noop_with_token.py (1)

65-89: No-op-with-token metrics are integrated cleanly across all branches.

Success and error paths now emit consistent auth telemetry without altering endpoint-facing behavior.

src/app/endpoints/rlsapi_v1.py (1)

458-460: LGTM!

The LLM inference duration metrics are correctly recorded on both success and failure paths, using bounded result labels ("success" and "failure"). The metric helper signature matches the contract in src/metrics/recording.py.

Also applies to: 754-756

src/authentication/api_key_token.py (1)

64-88: LGTM!

The authentication metrics instrumentation correctly:

  • Captures start_time using time.monotonic() at the start of the auth flow
  • Records metrics with bounded labels before re-raising exceptions on failure paths
  • Records success metrics immediately before returning the auth tuple

The pattern is consistent with other auth dependencies in this PR.
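A hedged sketch of that branch-by-branch pattern; the module name, reason labels, and `record` signature are illustrative, and the real dependency validates against configured credentials rather than a literal string:

```python
import time
from typing import Callable, Optional, Tuple

Recorder = Callable[[str, str, str, float], None]


def api_key_auth(token: Optional[str], record: Recorder) -> Tuple[str, str]:
    """Sketch of an instrumented auth dependency.

    Timing starts at entry; each failure branch records its bounded reason
    label before raising, and the success branch records just before return.
    """
    start = time.monotonic()
    module = "api_key_token"
    if not token:
        record(module, "failure", "missing_token", time.monotonic() - start)
        raise ValueError("no token supplied")
    if token != "expected-key":  # stands in for real credential validation
        record(module, "failure", "invalid_token", time.monotonic() - start)
        raise ValueError("token rejected")
    record(module, "success", "authenticated", time.monotonic() - start)
    return ("user-id", token)
```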

src/authentication/jwk_token.py (2)

144-220: LGTM!

The refactored helper functions cleanly separate concerns and ensure metrics are recorded before raising exceptions in all failure paths. The cause_map pattern in _decode_jwk_claims provides clear, maintainable error messaging.


271-301: LGTM!

The __call__ method correctly captures timing at the start and records bounded auth metrics on all code paths:

  • missing_header when Authorization header is absent
  • missing_token when token extraction fails
  • Delegates to helpers for JWK fetch, decode, validation, and claim extraction errors
  • authenticated on successful completion
tests/unit/metrics/test_recording.py (2)

198-236: LGTM!

The parametrized test structure using dataclasses is clean and extensible. Testing both success and failure paths in a single test function with mock reset is a pragmatic approach that ensures both code paths are covered.


290-320: LGTM!

The quota check test correctly verifies that both the counter and histogram metrics are updated on success, and that a warning is logged (without crashing) when either metric operation fails. The parametrized failing_metric approach elegantly covers both failure scenarios.
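That failure-tolerance contract can be sketched with stdlib `unittest.mock`; the real suite uses pytest and pytest-mock with `pytest.mark.parametrize`, and the names here are illustrative:

```python
import logging
from unittest.mock import MagicMock

logger = logging.getLogger(__name__)


def record_quota_check(counter, histogram, result: str, duration: float) -> None:
    """Stand-in for the recorder under test: failures log, never raise."""
    try:
        counter.labels(result).inc()
        histogram.observe(duration)
    except Exception:
        logger.warning("quota metric recording failed", exc_info=True)


def test_quota_metric_failures_do_not_raise() -> None:
    """Loop over failing metrics (parametrize in the real suite)."""
    for failing_metric in ("counter", "histogram"):
        counter, histogram = MagicMock(), MagicMock()
        if failing_metric == "counter":
            counter.labels.side_effect = RuntimeError("boom")
        else:
            histogram.observe.side_effect = RuntimeError("boom")
        # The call must swallow the error rather than propagate it.
        record_quota_check(counter, histogram, "available", 0.02)
```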

src/authentication/k8s.py (2)

389-420: LGTM!

The refactored helpers cleanly encapsulate the kube:admin UID population and SubjectAccessReview creation logic while ensuring metrics are recorded before raising exceptions. The exception categorization (k8s_api_unavailable, k8s_config_error, authorization_check_error, unexpected_error) provides useful bounded labels for monitoring.

Also applies to: 423-459


513-566: LGTM!

The __call__ method correctly captures timing at the start and records bounded auth metrics on all code paths including:

  • Skip paths for health probes and metrics endpoints
  • Failure paths for token extraction, token review, invalid tokens, and authorization denials
  • Success path for fully authenticated and authorized requests

The delegation to _populate_kube_admin_uid and _create_subject_access_review keeps the main flow readable.

src/authentication/rh_identity.py (2)

28-43: LGTM!

The helper functions _record_rh_identity_auth and _get_auth_skip_tuple reduce boilerplate and improve readability by centralizing the auth module constant and skip-path logic.


372-385: LGTM!

The try/except block correctly categorizes HTTPException failures from identity validation and entitlement checks:

  • 403 status → entitlement_missing (authorization failure)
  • Other statuses → invalid_identity (malformed identity data)

This provides meaningful bounded labels for monitoring authentication failures.
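The categorization reduces to a small pure function, sketched here with an assumed name:

```python
def categorize_rh_identity_failure(status_code: int) -> str:
    """Map an HTTPException status to a bounded auth failure reason.

    A 403 means the identity parsed correctly but lacked entitlement; any
    other status is treated as malformed identity data. The fixed output
    vocabulary keeps the metric label set bounded.
    """
    return "entitlement_missing" if status_code == 403 else "invalid_identity"
```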

src/app/endpoints/responses.py (4)

121-128: LGTM!

The StreamChunkProcessingResult dataclass cleanly encapsulates the multiple return values from stream chunk processing, including the inference_metric_recorded flag that prevents double-counting metrics when exceptions occur after a terminal event.


131-173: LGTM!

The helper functions _record_response_inference_result and _check_response_quota centralize the metric recording logic with proper timing and bounded labels. The record_failure flag in _record_response_inference_result allows callers to optionally increment the failure counter alongside the duration histogram.


1030-1065: LGTM!

The response_generator correctly tracks whether an inference metric was recorded via inference_metric_recorded, ensuring that if an exception occurs during stream iteration after the terminal event (e.g., response.completed), the failure metric is not double-counted. The exception handling block only records the failure metric if it wasn't already recorded by the terminal chunk processing.
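The single-emission guarantee can be sketched as a generator; the chunk strings and names below are simplified stand-ins for the real SSE event objects:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class StreamState:
    """Tracks whether a terminal chunk already recorded the inference metric."""

    inference_metric_recorded: bool = False


def stream_with_metrics(
    chunks: Iterable[str], record: Callable[[str], None]
) -> Iterator[str]:
    """Yield chunks while recording exactly one inference outcome per stream."""
    state = StreamState()
    try:
        for chunk in chunks:
            if chunk == "response.completed":  # illustrative terminal event
                record("success")
                state.inference_metric_recorded = True
            yield chunk
    except Exception:
        # A late iterator error after the terminal event must not produce a
        # second (failure) sample for the same inference.
        if not state.inference_metric_recorded:
            record("failure")
        raise
```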


1236-1249: LGTM!

The non-streaming path correctly records inference duration metrics:

  • Success metric recorded immediately after successful client.responses.create
  • Failure metrics recorded in each exception handler before re-raising

This ensures all inference outcomes are tracked for SLO monitoring.

Also applies to: 1269-1276, 1293-1300, 1318-1325

Comment on lines +538 to +541
def _check_infer_quota(
request: Request, auth: AuthTuple, endpoint_path: str
) -> Optional[str]:
"""Check infer quota availability and record bounded quota metrics."""
Contributor


🧹 Nitpick | 🔵 Trivial | 💤 Low value

Add docstring detail for return value.

The function docstring lacks documentation for the return value. Per coding guidelines, docstrings should include a Returns section.

📝 Suggested docstring improvement
 def _check_infer_quota(
     request: Request, auth: AuthTuple, endpoint_path: str
 ) -> Optional[str]:
-    """Check infer quota availability and record bounded quota metrics."""
+    """Check infer quota availability and record bounded quota metrics.
+
+    Args:
+        request: The FastAPI request object for resolving identity context.
+        auth: Authentication tuple from the configured auth provider.
+        endpoint_path: API endpoint path for metric labeling.
+
+    Returns:
+        The resolved quota subject identifier, or None if quota is disabled.
+
+    Raises:
+        HTTPException: 429 if quota is exhausted.
+    """

Contributor Author

major commented Apr 29, 2026

Closing this combined PR in favor of five focused replacement PRs:

These are independent PRs against main so reviewers can handle each metric area separately.

@major major closed this Apr 29, 2026