Skip to content

RSPEED-2957: add endpoint label to LLM Prometheus metrics#1624

Merged
tisnik merged 2 commits into
lightspeed-core:mainfrom
Lifto:feat/rspeed-2849-llm-metrics
Apr 29, 2026
Merged

RSPEED-2957: add endpoint label to LLM Prometheus metrics#1624
tisnik merged 2 commits into
lightspeed-core:mainfrom
Lifto:feat/rspeed-2849-llm-metrics

Conversation

@Lifto
Copy link
Copy Markdown
Contributor

@Lifto Lifto commented Apr 28, 2026

Summary

  • Adds an endpoint label to all 5 LLM Prometheus metrics (llm_calls_total, llm_calls_failures_total, llm_calls_validation_errors_total, llm_token_sent_total, llm_token_received_total) to enable per-endpoint traffic differentiation in Grafana
  • Threads endpoint_path: str through all call chains from the 4 endpoint handlers (/v1/responses, /v1/infer, /v1/streaming_query, /v1/query) down to the metric increment sites
  • Updates unit tests for _increment_llm_call_metric, extract_token_usage, and run_shield_moderation

Why

RSPEED-2849 — enables CLA vs Goose traffic differentiation in Grafana dashboards by labeling each LLM metric with the originating API endpoint.

Cardinality

The endpoint label has exactly 4 bounded values:

  • /v1/infer
  • /v1/responses
  • /v1/streaming_query
  • /v1/query

No cardinality explosion risk.

Stacking

This PR stacks on #1620 (feat/rspeed-2849-user-agent). Please merge #1620 first.

Summary by CodeRabbit

Release Notes

  • Chores
    • Enhanced metrics collection infrastructure to track API endpoint information, enabling improved monitoring and observability of system operations across all endpoints.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 28, 2026

Warning

Rate limit exceeded

@Lifto has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 51 minutes and 58 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4a2423b3-23a8-402a-931a-6d389dd7e93c

📥 Commits

Reviewing files that changed from the base of the PR and between 6827c1f and ee9570a.

📒 Files selected for processing (4)
  • src/app/endpoints/responses.py
  • src/app/endpoints/streaming_query.py
  • src/metrics/recording.py
  • tests/unit/app/endpoints/test_streaming_query.py

Walkthrough

The changes introduce endpoint-path tracking throughout the LLM metrics and moderation system. Endpoint identifiers (e.g., /v1/query, /v1/responses) are now propagated from handler entry points through shield moderation, token usage extraction, and metrics recording, enabling endpoint-specific observability.

Changes

Cohort / File(s) Summary
Endpoint Handlers
src/app/endpoints/query.py, src/app/endpoints/responses.py, src/app/endpoints/rlsapi_v1.py, src/app/endpoints/streaming_query.py
Thread constant endpoint_path values through request handling pipelines, passing them into shield moderation, token usage extraction, and turn summary construction.
Metrics Infrastructure
src/metrics/__init__.py, src/metrics/recording.py
Add endpoint label to Prometheus counters and update recording functions (record_llm_call, record_llm_failure, record_llm_validation_error, record_llm_token_usage) to accept and apply endpoint_path as a metric label.
Utility Functions
src/utils/responses.py, src/utils/shields.py
Extend extract_token_usage and build_turn_summary with endpoint_path parameter; extend run_shield_moderation to accept endpoint_path and apply it to validation error metrics.
Unit Tests
tests/unit/app/endpoints/test_metrics.py, tests/unit/utils/test_responses.py, tests/unit/utils/test_shields.py, tests/unit/metrics/test_recording.py
Update test invocations and mocks to supply and assert endpoint-path arguments in token extraction, shield moderation, and metrics recording calls.

Sequence Diagram(s)

sequenceDiagram
    participant Handler as Endpoint Handler
    participant Shield as Shield Moderation
    participant TokenExt as Token Extraction
    participant Metrics as Metrics Recording
    
    Handler->>Handler: Set endpoint_path constant
    Handler->>Shield: run_shield_moderation(input, endpoint_path)
    Shield->>Metrics: record_llm_validation_error(endpoint_path)
    
    Handler->>TokenExt: extract_token_usage(usage, model, endpoint_path)
    TokenExt->>Metrics: record_llm_call(provider, model, endpoint_path)
    TokenExt->>Metrics: record_llm_token_usage(provider, model, tokens, endpoint_path)
    
    Metrics-->>Metrics: Label counter with endpoint
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit hops through endpoints, threading paths so fine,
Metrics now know where they've been—each query gets a sign.
From shields to tokens tallied, the journey's now on track,
With labels for the endpoints, no data goes to black!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely summarizes the primary change: adding an endpoint label to LLM Prometheus metrics across all relevant modules.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 51 minutes and 58 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/app/endpoints/streaming_query.py (1)

330-335: 🛠️ Refactor suggestion | 🟠 Major

Docstrings need the new endpoint_path parameter.

Both updated function signatures include endpoint_path, but their Args sections do not document it.

Proposed docstring update
     Args:
         responses_params: The Responses API parameters
         context: The response generator context
+        endpoint_path: API endpoint path used for metric labeling.
@@
     Args:
         turn_response: The streaming response from Llama Stack
         context: The response generator context
         turn_summary: TurnSummary to populate during streaming
+        endpoint_path: API endpoint path used for metric labeling.

As per coding guidelines, "All functions require docstrings with brief descriptions and complete type annotations for parameters and return types".

Also applies to: 700-704

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/streaming_query.py` around lines 330 - 335, Two functions
in this module now accept the endpoint_path parameter but their docstrings were
not updated; add an Args entry for endpoint_path to each affected function's
docstring describing it and its type (e.g., endpoint_path: str — the request
endpoint path used by the response generator), keep the existing parameter order
and type annotations intact, and ensure the Args section format matches the
other parameters (including the tuple[AsyncIterator[str], TurnSummary] return
description) for both functions whose signatures were changed to include
endpoint_path.
src/utils/shields.py (1)

74-83: ⚠️ Potential issue | 🟠 Major

detect_shield_violations uses unlabeled .inc() on a labeled Counter at line 80.

The metric llm_calls_validation_errors_total requires an endpoint label (defined in src/metrics/__init__.py line 44), but line 80 calls .inc() without labels. This causes a Prometheus ValueError at runtime when the code path is executed. The function must receive endpoint_path and call labels(endpoint_path).inc() to match the pattern used correctly in run_shield_moderation (line 183).

Proposed fix
-def detect_shield_violations(output_items: list[Any]) -> bool:
+def detect_shield_violations(output_items: list[Any], endpoint_path: str) -> bool:
@@
-                metrics.llm_calls_validation_errors_total.inc()
+                metrics.llm_calls_validation_errors_total.labels(endpoint_path).inc()

Update the function docstring to document the new endpoint_path parameter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/shields.py` around lines 74 - 83, The detect_shield_violations
function is incrementing a labeled Counter with .inc() without labels causing a
Prometheus ValueError; add an endpoint_path parameter to
detect_shield_violations, update its docstring to document endpoint_path, and
replace metrics.llm_calls_validation_errors_total.inc() with
metrics.llm_calls_validation_errors_total.labels(endpoint_path).inc() so the
metric is incremented with the required endpoint label (referencing
detect_shield_violations and metrics.llm_calls_validation_errors_total).
src/utils/responses.py (1)

1450-1499: ⚠️ Potential issue | 🟠 Major

Keep build_turn_summary() free of metric side effects.

This helper still calls extract_token_usage(), so it increments llm_calls_total and token counters while building a summary. Callers in src/app/endpoints/responses.py (Lines 1079-1081) and src/app/endpoints/rlsapi_v1.py (Line 722) already extracted usage earlier, so those paths are counted twice; the non-streaming blocked /v1/responses path also turns a synthetic zero-usage response into a real LLM call.

♻️ Suggested direction
 def build_turn_summary(  # pylint: disable=too-many-arguments,too-many-positional-arguments
     response: Optional[OpenAIResponseObject],
     model: str,
     endpoint_path: str,
     vector_store_ids: Optional[list[str]] = None,
     rag_id_mapping: Optional[dict[str, str]] = None,
     filter_server_tools: bool = False,
+    token_usage: Optional[TokenCounter] = None,
 ) -> TurnSummary:
@@
-    summary.rag_chunks = parse_rag_chunks(response, vector_store_ids, rag_id_mapping)
-    summary.token_usage = extract_token_usage(response.usage, model, endpoint_path)
+    summary.rag_chunks = parse_rag_chunks(response, vector_store_ids, rag_id_mapping)
+    summary.token_usage = token_usage or extract_token_usage(
+        response.usage, model, endpoint_path
+    )
     return summary

Then pass the already-computed token_usage from call sites that need it before summary construction.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 1450 - 1499, build_turn_summary
currently has a side effect of calling extract_token_usage (incrementing
metrics); remove that call and add an explicit token_usage parameter (e.g.,
token_usage: Optional[TokenUsage] = None) so callers supply the already-computed
usage instead of rebuilding it; update call sites that previously relied on
build_turn_summary (notably the handlers in src/app/endpoints/responses.py and
src/app/endpoints/rlsapi_v1.py) to compute token_usage once via
extract_token_usage(response.usage, model, endpoint_path) and pass it into
build_turn_summary, and ensure build_turn_summary simply assigns the provided
token_usage to summary.token_usage without calling extract_token_usage itself.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 318-323: The function signature for retrieve_response_generator
currently uses an empty-string default for endpoint_path which can emit
accidental endpoint="" metrics; change the signature to require endpoint_path:
str (remove the = "" default) in retrieve_response_generator and the other
similar function at the second occurrence (lines ~688-693), then update all call
sites to pass a concrete endpoint_path value and adjust any tests or callers
that relied on the default; also update related docstrings/comments to reflect
the parameter is required.

In `@src/metrics/__init__.py`:
- Around line 31-58: The endpoint label values used by the Prometheus counters
llm_calls_total, llm_calls_failures_total, llm_calls_validation_errors_total,
llm_token_sent_total and llm_token_received_total must be centralized: define a
shared set/enum/Literal of allowed endpoint strings in constants.py (or reuse
existing ones there) and replace any hardcoded/inline endpoint values at each
metric emission site to reference that central constant to prevent new series
from typos/empties; update any code that constructs the label value for the
"endpoint" label to validate/normalize against that central constant set and
fall back to a single explicit "unknown" constant if needed.

---

Outside diff comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 330-335: Two functions in this module now accept the endpoint_path
parameter but their docstrings were not updated; add an Args entry for
endpoint_path to each affected function's docstring describing it and its type
(e.g., endpoint_path: str — the request endpoint path used by the response
generator), keep the existing parameter order and type annotations intact, and
ensure the Args section format matches the other parameters (including the
tuple[AsyncIterator[str], TurnSummary] return description) for both functions
whose signatures were changed to include endpoint_path.

In `@src/utils/responses.py`:
- Around line 1450-1499: build_turn_summary currently has a side effect of
calling extract_token_usage (incrementing metrics); remove that call and add an
explicit token_usage parameter (e.g., token_usage: Optional[TokenUsage] = None)
so callers supply the already-computed usage instead of rebuilding it; update
call sites that previously relied on build_turn_summary (notably the handlers in
src/app/endpoints/responses.py and src/app/endpoints/rlsapi_v1.py) to compute
token_usage once via extract_token_usage(response.usage, model, endpoint_path)
and pass it into build_turn_summary, and ensure build_turn_summary simply
assigns the provided token_usage to summary.token_usage without calling
extract_token_usage itself.

In `@src/utils/shields.py`:
- Around line 74-83: The detect_shield_violations function is incrementing a
labeled Counter with .inc() without labels causing a Prometheus ValueError; add
an endpoint_path parameter to detect_shield_violations, update its docstring to
document endpoint_path, and replace
metrics.llm_calls_validation_errors_total.inc() with
metrics.llm_calls_validation_errors_total.labels(endpoint_path).inc() so the
metric is incremented with the required endpoint label (referencing
detect_shield_violations and metrics.llm_calls_validation_errors_total).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 98aed2a2-7c3d-43f5-9f88-cb41d554dc91

📥 Commits

Reviewing files that changed from the base of the PR and between 93b2bb5 and 8ae0d65.

📒 Files selected for processing (14)
  • AGENTS.md
  • src/app/endpoints/query.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/streaming_query.py
  • src/metrics/__init__.py
  • src/observability/formats/responses.py
  • src/utils/responses.py
  • src/utils/shields.py
  • tests/unit/app/endpoints/test_metrics.py
  • tests/unit/app/endpoints/test_responses_splunk.py
  • tests/unit/observability/formats/test_responses.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_shields.py
💤 Files with no reviewable changes (1)
  • tests/unit/app/endpoints/test_metrics.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: build-pr
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)

Import FastAPI dependencies from fastapi module (e.g., APIRouter, HTTPException, Request, status, Depends)

Import Llama Stack client as from llama_stack_client import AsyncLlamaStackClient

All modules must start with descriptive docstrings explaining purpose

Use logger = get_logger(__name__) from log.py for module logging

Use Final[type] as type hint for all constants

Use @field_validator and @model_validator for custom validation in Pydantic models

All functions require docstrings with brief descriptions and complete type annotations for parameters and return types

Use typing_extensions.Self type hint for model validators

Use modern Union syntax str | int instead of Union[str, int] for type hints

Use Optional[Type] for nullable types in function signatures

Use snake_case naming with descriptive, action-oriented function names (e.g., get_, validate_, check_)

Avoid in-place parameter modification anti-patterns - return new data structures instead of modifying input parameters

Use async def for I/O operations and external API calls in Python functions

Handle APIConnectionError from Llama Stack in error handling logic

Use standard log levels with clear purposes: debug() for diagnostics, info() for general execution, warning() for unexpected events, error() for serious problems

All classes require descriptive docstrings explaining purpose

Use PascalCase class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface

Pydantic configuration classes must extend ConfigurationBase, data models must extend BaseModel

Abstract classes must use ABC with @abstractmethod decorators for interface definitions

Pydantic models must use @model_validator and @field_validator for validation logic

Use complete type annotations for all class attributes with specific types...

Files:

  • src/observability/formats/responses.py
  • src/utils/shields.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/metrics/__init__.py
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • src/app/endpoints/rlsapi_v1.py
AGENTS.md

📄 CodeRabbit inference engine (CLAUDE.md)

Document agent implementations and their configurations in AGENTS.md

Files:

  • AGENTS.md
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest for all unit and integration tests, do not use unittest framework

Use pytest-mock for AsyncMock objects in test files

Use pytest.mark.asyncio marker for asynchronous test functions

Files:

  • tests/unit/app/endpoints/test_responses_splunk.py
  • tests/unit/observability/formats/test_responses.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_shields.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/rlsapi_v1.py
src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files must contain brief package descriptions

Files:

  • src/metrics/__init__.py
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Always check `pyproject.toml` for existing dependencies before adding new ones
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Always verify current library versions in `pyproject.toml` rather than assuming versions
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Check `constants.py` for shared constants before defining new ones
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Unit tests require 60% code coverage, integration tests require 10% coverage
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Run `uv run make format` to auto-format code with black and ruff before completion
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Run `uv run make verify` to execute all linters (black, pylint, pyright, ruff, docstyle, check-types) before completion
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Create unit tests for all new code
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Ensure all tests pass before submission
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Use pylint with `source-roots = "src"` configuration for static analysis
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Never commit secrets/keys, use environment variables for sensitive data
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Use bandit for security issue detection in code
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: PR titles MUST start with a JIRA issue key prefix (LCORE-, RSPEED-, MGTM-, OLS-, RHDHPAI-, LEADS-)
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Use `uv sync --group dev --group llslibdev` for setting up development dependencies
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Always use `uv run` prefix for executing commands in the project
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Follow existing code patterns in the module you're modifying
📚 Learning: 2026-03-02T16:38:30.287Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T16:38:30.287Z
Learning: Applies to AGENTS.md : Document agent implementations and their configurations in AGENTS.md

Applied to files:

  • AGENTS.md
📚 Learning: 2026-03-02T15:57:10.746Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-stack PR: 1250
File: AGENTS.md:11-12
Timestamp: 2026-03-02T15:57:10.746Z
Learning: In AGENTS.md, Linting Tools section should list actual tool names (e.g., mypy, pylint, pyright, ruff, black) instead of make target names (e.g., check-types). Ensure the section documents the individual tools that perform linting, not the Makefile targets that invoke them, so readers understand which tools are used and how to configure/run them directly.

Applied to files:

  • AGENTS.md
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/rlsapi_v1.py
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).

Applied to files:

  • src/app/endpoints/streaming_query.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Import FastAPI dependencies with: `from fastapi import APIRouter, HTTPException, Request, status, Depends`

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.

Applied to files:

  • src/app/endpoints/rlsapi_v1.py
🪛 LanguageTool
AGENTS.md

[uncategorized] ~198-~198: The official name of this software platform is spelled with a capital “H”.
Context: ...es this via pr-title-checker (config: .github/pr-title-checker-config.json). Allowe...

(GITHUB)

🔇 Additional comments (7)
AGENTS.md (1)

196-204: Nice addition for CI title enforcement visibility.

This is clear and gives contributors concrete pass/fail examples for PR naming.

src/utils/shields.py (1)

122-126: Good propagation of endpoint label into shield moderation metrics.

The new endpoint_path threading and labeled increment in the blocked branch are aligned with the endpoint-level observability objective.

Also applies to: 181-184

tests/unit/app/endpoints/test_responses_splunk.py (1)

721-773: User-Agent sanitization tests are well-targeted.

These cases cover the important behavioral edges (empty, control chars, truncation, valid pass-through).

tests/unit/utils/test_responses.py (1)

2232-2300: Good update of metric tests for endpoint-labeled counters.

The call signatures and label assertions are consistent with the new endpoint dimension.

Also applies to: 2553-2596

tests/unit/utils/test_shields.py (1)

121-123: Nice synchronization of shield tests with endpoint-labeled metrics.

The updated call signatures and label assertion reduce drift with the production metric behavior.

Also applies to: 152-154, 194-201, 234-236, 265-267, 331-333, 349-350, 382-383

src/observability/formats/responses.py (1)

27-27: Telemetry schema and payload mapping for user_agent look good.

Defaulting to None keeps compatibility while enabling the new field.

Also applies to: 49-49

tests/unit/observability/formats/test_responses.py (1)

23-35: Good coverage for user_agent defaulting and payload emission.

These tests validate both explicit and omitted-field behavior cleanly.

Also applies to: 117-163

Comment thread src/app/endpoints/streaming_query.py
Comment thread src/metrics/__init__.py
Comment on lines 31 to +58
llm_calls_total = Counter(
"ls_llm_calls_total", "LLM calls counter", ["provider", "model"]
"ls_llm_calls_total", "LLM calls counter", ["provider", "model", "endpoint"]
)

# Metric that counts how many LLM calls failed
llm_calls_failures_total = Counter(
"ls_llm_calls_failures_total", "LLM calls failures", ["provider", "model"]
"ls_llm_calls_failures_total",
"LLM calls failures",
["provider", "model", "endpoint"],
)

# Metric that counts how many LLM calls had validation errors
llm_calls_validation_errors_total = Counter(
"ls_llm_validation_errors_total", "LLM validation errors"
"ls_llm_validation_errors_total", "LLM validation errors", ["endpoint"]
)

# TODO(lucasagomes): Add metric for token usage
# https://issues.redhat.com/browse/LCORE-411
llm_token_sent_total = Counter(
"ls_llm_token_sent_total", "LLM tokens sent", ["provider", "model"]
"ls_llm_token_sent_total", "LLM tokens sent", ["provider", "model", "endpoint"]
)

# TODO(lucasagomes): Add metric for token usage
# https://issues.redhat.com/browse/LCORE-411
llm_token_received_total = Counter(
"ls_llm_token_received_total", "LLM tokens received", ["provider", "model"]
"ls_llm_token_received_total",
"LLM tokens received",
["provider", "model", "endpoint"],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Centralize the allowed endpoint label values.

These counters now depend on endpoint strings supplied from several call chains, so a typo or empty fallback will create a new Prometheus series silently. Please move the four supported endpoint paths behind shared constants (or an enum/Literal) and reuse them at every metric call site to preserve the bounded-cardinality guarantee.

Based on learnings: Check constants.py for shared constants before defining new ones.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/metrics/__init__.py` around lines 31 - 58, The endpoint label values used
by the Prometheus counters llm_calls_total, llm_calls_failures_total,
llm_calls_validation_errors_total, llm_token_sent_total and
llm_token_received_total must be centralized: define a shared set/enum/Literal
of allowed endpoint strings in constants.py (or reuse existing ones there) and
replace any hardcoded/inline endpoint values at each metric emission site to
reference that central constant to prevent new series from typos/empties; update
any code that constructs the label value for the "endpoint" label to
validate/normalize against that central constant set and fall back to a single
explicit "unknown" constant if needed.

@Lifto Lifto force-pushed the feat/rspeed-2849-llm-metrics branch from 8ae0d65 to 6827c1f Compare April 29, 2026 17:52
@Lifto Lifto changed the title RSPEED-2849: add endpoint label to LLM Prometheus metrics RSPEED-2957: add endpoint label to LLM Prometheus metrics Apr 29, 2026
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@Lifto
Copy link
Copy Markdown
Contributor Author

Lifto commented Apr 29, 2026

Re: centralizing endpoint label values as constants — passing on this one. The endpoint_path values come from request.url.path at the handler level, not hardcoded strings that could be typo'd. An enum/constant set wouldn't prevent the actual cardinality risk since the values are runtime-derived. The required parameter change (removing = "" defaults) already prevents the silent empty-label case.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/app/endpoints/streaming_query.py (1)

330-335: Add endpoint_path to the docstring Args section.

The docstring for retrieve_response_generator documents the function parameters but doesn't include the newly added endpoint_path parameter.

Proposed docstring update
     Args:
         responses_params: The Responses API parameters
         context: The response generator context
+        endpoint_path: The API endpoint path for metric labeling.
     Returns:
         tuple[AsyncIterator[str], TurnSummary]: The response generator and turn summary
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/streaming_query.py` around lines 330 - 335, Update the
docstring for retrieve_response_generator to include the new endpoint_path
parameter in the Args section: add a line describing endpoint_path (type: str)
and its purpose (the API endpoint path used to build responses), keeping the
existing format used for responses_params and context and preserving the Returns
section (tuple[AsyncIterator[str], TurnSummary]) and overall docstring style.
src/app/endpoints/responses.py (1)

410-411: Consider making endpoint_path a required parameter to prevent accidental empty-label metrics.

The default value endpoint_path: str = "" on these internal functions could silently emit an endpoint="" metric series if a caller omits the argument. Since the PR aims for exactly four bounded endpoint values, making this parameter required would catch omissions at development time.

However, all call sites in this file correctly pass explicit values, so this is a defensive suggestion rather than a current bug.

Proposed signature changes
 async def handle_streaming_response(
     ...
     user_agent: Optional[str] = None,
-    endpoint_path: str = "",
+    endpoint_path: str,
 ) -> StreamingResponse:
 async def response_generator(
     ...
     filter_server_tools: bool = False,
-    endpoint_path: str = "",
+    endpoint_path: str,
 ) -> AsyncIterator[str]:
 async def handle_non_streaming_response(
     ...
     user_agent: Optional[str] = None,
-    endpoint_path: str = "",
+    endpoint_path: str,
 ) -> ResponsesResponse:

Also applies to: 806-807, 1019-1020

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 410 - 411, Make endpoint_path
required (remove the default empty string) on the internal
StreamingResponse-producing functions that currently declare endpoint_path: str
= "" so callers cannot accidentally omit it; change the signature to
endpoint_path: str (for each occurrence that returns StreamingResponse and uses
the endpoint_path parameter), update any related type hints/imports if needed,
and ensure all call sites (including the other two occurrences) still
compile/run and tests/type checks are updated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/app/endpoints/responses.py`:
- Around line 410-411: Make endpoint_path required (remove the default empty
string) on the internal StreamingResponse-producing functions that currently
declare endpoint_path: str = "" so callers cannot accidentally omit it; change
the signature to endpoint_path: str (for each occurrence that returns
StreamingResponse and uses the endpoint_path parameter), update any related type
hints/imports if needed, and ensure all call sites (including the other two
occurrences) still compile/run and tests/type checks are updated.

In `@src/app/endpoints/streaming_query.py`:
- Around line 330-335: Update the docstring for retrieve_response_generator to
include the new endpoint_path parameter in the Args section: add a line
describing endpoint_path (type: str) and its purpose (the API endpoint path used
to build responses), keeping the existing format used for responses_params and
context and preserving the Returns section (tuple[AsyncIterator[str],
TurnSummary]) and overall docstring style.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1b5b5ec1-3397-465b-9779-1880cce7cf23

📥 Commits

Reviewing files that changed from the base of the PR and between 8ae0d65 and 6827c1f.

📒 Files selected for processing (12)
  • src/app/endpoints/query.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/streaming_query.py
  • src/metrics/__init__.py
  • src/metrics/recording.py
  • src/utils/responses.py
  • src/utils/shields.py
  • tests/unit/app/endpoints/test_metrics.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_shields.py
💤 Files with no reviewable changes (1)
  • tests/unit/app/endpoints/test_metrics.py
✅ Files skipped from review due to trivial changes (1)
  • src/metrics/init.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/utils/shields.py
  • src/app/endpoints/query.py
  • tests/unit/utils/test_shields.py
  • tests/unit/utils/test_responses.py

Copy link
Copy Markdown
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this make sense, TYVM

@tisnik tisnik merged commit ca125c4 into lightspeed-core:main Apr 29, 2026
26 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants