RSPEED-2943: add Responses API inference metrics #1641
tisnik merged 1 commit into lightspeed-core:main
Conversation
Walkthrough
This PR adds monotonic inference timing capture and metric recording to the Responses API endpoints for both streaming and non-streaming calls, with a new metric label normalization layer that bounds result labels to the allowed values ("success" or "failure").

Changes
LLM Inference Timing & Metric Recording
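A minimal sketch of that normalization layer, assuming names drawn from the review comments later in this thread (`normalize_llm_inference_result` and the success/failure constants); the `ALLOWED_RESULTS` name is hypothetical:

```python
from typing import Final

# Allowed values for the metric's "result" label; anything else is coerced
# to "failure" so the histogram's label cardinality stays bounded.
LLM_INFERENCE_RESULT_SUCCESS: Final[str] = "success"
LLM_INFERENCE_RESULT_FAILURE: Final[str] = "failure"
ALLOWED_RESULTS: Final[frozenset[str]] = frozenset(
    {LLM_INFERENCE_RESULT_SUCCESS, LLM_INFERENCE_RESULT_FAILURE}
)


def normalize_llm_inference_result(result: str) -> str:
    """Coerce an arbitrary result label to an allowed value.

    Returns:
        The input label if it is allowed, otherwise "failure".
    """
    return result if result in ALLOWED_RESULTS else LLM_INFERENCE_RESULT_FAILURE
```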
Sequence Diagram
sequenceDiagram
participant Handler as Request Handler
participant Client as Backend Client
participant Generator as Stream Generator
participant Metrics as Metrics Recorder
Handler->>Handler: capture inference_start_time
Handler->>Client: create(...)
Client-->>Generator: return async stream
Handler->>Generator: start consuming chunks
loop Process Chunks
Generator->>Generator: yield chunk
alt Terminal Chunk
Generator->>Metrics: record_llm_inference_duration(result=success/failure, duration)
Metrics->>Metrics: normalize_llm_inference_result()
Metrics->>Metrics: emit histogram
end
end
alt Stream Error Before Terminal Event
Generator->>Generator: exception raised
Generator->>Metrics: record_llm_inference_duration(result=failure, duration)
Metrics->>Metrics: normalize & emit
Generator->>Handler: re-raise exception
end
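As a sketch of this flow (not the PR's actual code): a standalone wrapper in the spirit of the streaming path, with `_record_response_inference_result` stubbed out; terminal event names other than "response.failed" are assumptions.

```python
import time
from collections.abc import AsyncIterator
from typing import Any


def _record_response_inference_result(  # stand-in for the PR's helper
    model: str,
    endpoint_path: str,
    result: str,
    duration: float,
    *,
    record_failure: bool = False,
) -> None:
    """Normalize the result label and observe the duration histogram (stubbed)."""


async def stream_with_inference_metrics(
    stream: AsyncIterator[Any],
    model: str,
    endpoint_path: str,
) -> AsyncIterator[Any]:
    """Yield chunks, recording exactly one inference metric per call."""
    inference_start_time = time.monotonic()
    inference_metric_recorded = False
    try:
        async for chunk in stream:
            yield chunk
            # "response.failed" appears in the review; the other terminal
            # event names here are assumptions.
            if chunk.type in ("response.completed", "response.failed"):
                result = "failure" if chunk.type == "response.failed" else "success"
                _record_response_inference_result(
                    model,
                    endpoint_path,
                    result,
                    time.monotonic() - inference_start_time,
                    record_failure=(result == "failure"),
                )
                inference_metric_recorded = True
    except Exception:
        # The stream died before a terminal event: record a single failure,
        # then re-raise so the caller still sees the error.
        if not inference_metric_recorded:
            _record_response_inference_result(
                model,
                endpoint_path,
                "failure",
                time.monotonic() - inference_start_time,
                record_failure=True,
            )
        raise
```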
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Force-pushed: 15beb51 → c00a020
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/responses.py (1)
1216-1318: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win: Avoid emitting inference failure metrics after a successful backend call.
At line 1223, success is already recorded right after `client.responses.create(...)`. Because the same `try` continues through local post-processing, any later exception (e.g., the append/persistence path) enters these `except` blocks and records a failure for the same inference call.
💡 Suggested fix (gate failure recording once success is emitted):

```diff
-        else:
-            inference_start_time = time.monotonic()
+        else:
+            inference_start_time = time.monotonic()
+            inference_metric_recorded = False
             try:
                 api_response = cast(
                     OpenAIResponseObject,
                     await client.responses.create(
                         **api_params.model_dump(exclude_none=True)
                     ),
                 )
                 _record_response_inference_result(
                     api_params.model,
                     endpoint_path,
                     "success",
                     time.monotonic() - inference_start_time,
                 )
+                inference_metric_recorded = True
                 token_usage = extract_token_usage(
                     api_response.usage, api_params.model, endpoint_path
                 )
@@
             except RuntimeError as e:
-                _record_response_inference_result(
-                    api_params.model,
-                    endpoint_path,
-                    "failure",
-                    time.monotonic() - inference_start_time,
-                    record_failure=True,
-                )
+                if not inference_metric_recorded:
+                    _record_response_inference_result(
+                        api_params.model,
+                        endpoint_path,
+                        "failure",
+                        time.monotonic() - inference_start_time,
+                        record_failure=True,
+                    )
@@
             except APIConnectionError as e:
-                _record_response_inference_result(
-                    api_params.model,
-                    endpoint_path,
-                    "failure",
-                    time.monotonic() - inference_start_time,
-                    record_failure=True,
-                )
+                if not inference_metric_recorded:
+                    _record_response_inference_result(
+                        api_params.model,
+                        endpoint_path,
+                        "failure",
+                        time.monotonic() - inference_start_time,
+                        record_failure=True,
+                    )
@@
             except (LLSApiStatusError, OpenAIAPIStatusError) as e:
-                _record_response_inference_result(
-                    api_params.model,
-                    endpoint_path,
-                    "failure",
-                    time.monotonic() - inference_start_time,
-                    record_failure=True,
-                )
+                if not inference_metric_recorded:
+                    _record_response_inference_result(
+                        api_params.model,
+                        endpoint_path,
+                        "failure",
+                        time.monotonic() - inference_start_time,
+                        record_failure=True,
+                    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/responses.py` around lines 1216-1318: the code records success immediately after client.responses.create(...) but later post-processing (e.g., extract_text_from_response_items, consume_query_tokens, append_turn_items_to_conversation) can raise and trigger the except blocks which then emit a failure metric; add a guard so failure is only recorded if success wasn't already emitted: set a local flag like success_recorded = False before the try, set it True right after the _record_response_inference_result(...) success call (the call that uses api_params.model and endpoint_path), and in every except handler check that flag before calling _record_response_inference_result(..., "failure", ...). This preserves the current success metric when post-response processing fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/app/endpoints/responses.py`:
- Around line 1216-1318: The code records success immediately after
client.responses.create(...) but later post-processing (e.g.,
extract_text_from_response_items, consume_query_tokens,
append_turn_items_to_conversation) can raise and trigger the except blocks which
then emit a failure metric; add a guard so failure is only recorded if success
wasn't already emitted—set a local flag like success_recorded = False before the
try, set it True right after the _record_response_inference_result(...) success
call (the call that uses api_params.model and endpoint_path), and in every
except handler check that flag before calling
_record_response_inference_result(..., "failure", ...). This preserves the
current success metric when post-response processing fails.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6e26c242-f7f0-432b-81d3-31e5059dd352
📒 Files selected for processing (5)
src/app/endpoints/responses.py, src/metrics/__init__.py, src/metrics/recording.py, tests/unit/app/endpoints/test_responses.py, tests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: bandit
- GitHub Check: integration_tests (3.13)
- GitHub Check: integration_tests (3.12)
- GitHub Check: build-pr
- GitHub Check: Pylinter
- GitHub Check: unit_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (6)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
Use `Final[type]` as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: `str | int`
Use `Optional[Type]` for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use `async def` for I/O operations and external API calls
Use `logger.debug()` for detailed diagnostic information
Use `logger.info()` for general information about program execution
Use `logger.warning()` for unexpected events or potential problems
Use `logger.error()` for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Abstract classes use ABC with `@abstractmethod` decorators
Complete type annotations for all class attributes; use specific types, not `Any`
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection
Files: src/metrics/__init__.py, src/metrics/recording.py, src/app/endpoints/responses.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package `__init__.py` files contain brief package descriptions
Files:
src/metrics/__init__.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/unit/**/*.py: Use pytest for all unit and integration tests
Use `pytest-mock` for AsyncMock objects in unit tests
Use marker `pytest.mark.asyncio` for async tests
Files: tests/unit/app/endpoints/test_responses.py, tests/unit/metrics/test_recording.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Do not use unittest - pytest is the standard testing framework for this project
Files: tests/unit/app/endpoints/test_responses.py, tests/unit/metrics/test_recording.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI dependencies:
from fastapi import APIRouter, HTTPException, Request, status, Depends
Files:
src/app/endpoints/responses.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/responses.py
🧠 Learnings (1)
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/responses.py
🔇 Additional comments (4)
tests/unit/app/endpoints/test_responses.py (1)
1270-1319: Strong coverage for streaming iterator failure metrics. Nice addition: this asserts both exception propagation and the single failure metric emission with `record_failure=True`.
tests/unit/metrics/test_recording.py (1)
186-237: Good shared test pattern for histogram recorders. The table-driven structure keeps success/failure-path validation concise while still checking warning behavior.
src/metrics/__init__.py (1)
75-83: Histogram definition looks consistent with the new inference recorder. Label set and bucket wiring are aligned with the recording helper contract.
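For orientation, a histogram of this shape could be declared with `prometheus_client` roughly as follows; the metric name and bucket boundaries here are assumptions, not the PR's actual definition in src/metrics/__init__.py:

```python
from prometheus_client import Histogram

# Hypothetical shape; the real definition lives in src/metrics/__init__.py.
llm_inference_duration_seconds = Histogram(
    "ls_llm_inference_duration_seconds",  # assumed metric name
    "Duration of LLM inference calls in seconds.",
    labelnames=["provider", "model", "result"],  # "result" bounded to success/failure
    buckets=(0.1, 0.5, 1, 2, 5, 10, 30, 60),  # assumed bucket boundaries
)

# Usage: observe one normalized sample.
llm_inference_duration_seconds.labels(
    provider="openai", model="gpt-4o", result="success"
).observe(1.23)
```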
src/metrics/recording.py (1)
135-154: Bounded result normalization and metric recording are implemented cleanly. This keeps label cardinality controlled while preserving non-fatal metric error handling.
Force-pushed: b7af86f → e53dfb8
🧩 Analysis chain: Let me verify the fix in the current branch.
🏁 Script executed. Length of output: 5709
The guard flag cleanly isolates inference failures from post-processing failures.
tisnik left a comment:
Please ignore docs/demos directory completely - the sources are incorrect there on purpose.
Force-pushed: 2b79abf → 1b033fa
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 1034-1074: The streaming path records the inference metric only
after post-processing which can throw and cause a spurious failure metric; move
the success/failure metric recording to immediately after casting
latest_response_object (after latest_response_object = cast(...)) so the LLM
terminal event is recorded before
extract_token_usage/consume_query_tokens/get_available_quotas/extract_text_from_response_items
run, calling _record_response_inference_result(api_params.model,
context.endpoint_path, "failure" if chunk.type == "response.failed" else
"success", time.monotonic() - inference_start_time, record_failure=(chunk.type
== "response.failed")) and set inference_metric_recorded = True there, and
ensure the later metric-call is removed or guarded to avoid double-recording.
In `@tests/unit/app/endpoints/test_responses.py`:
- Around line 2702-2761: Add assertions that _record_response_inference_result
is invoked for non-streaming paths: patch the module-level
_record_response_inference_result in the TestHandleNonStreamingResponse tests
(e.g., test_handle_non_streaming_success_returns_response and a non-streaming
error test) using mocker.patch(f"{MODULE}._record_response_inference_result")
and assert_called_once_with (or assert_called_once) verifying the expected
arguments (result="success" for success path, "failure" and record_failure=True
for error path); update those tests to build api_params/context the same way and
then check the patched _record_response_inference_result was called with the
correct positional and keyword args.
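The requested assertion could look roughly like this sketch; `MODULE` and the handler invocation are placeholders for the file's existing conventions:

```python
import pytest

MODULE = "app.endpoints.responses"  # assumed import path of the endpoint module


@pytest.mark.asyncio
async def test_non_streaming_success_records_inference_metric(mocker):
    """Success path: the metric helper is called exactly once with 'success'."""
    record_mock = mocker.patch(f"{MODULE}._record_response_inference_result")

    # Build api_params/context and invoke the non-streaming handler the same
    # way the existing TestHandleNonStreamingResponse tests do (omitted here).
    ...

    record_mock.assert_called_once()
    # Positional layout per the review excerpts: (model, endpoint_path, result, duration)
    assert record_mock.call_args.args[2] == "success"
```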
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0e11706d-2f5f-4529-962d-feba65eec42b
📒 Files selected for processing (4)
src/app/endpoints/responses.py, src/metrics/recording.py, tests/unit/app/endpoints/test_responses.py, tests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: list_outdated_dependencies
- GitHub Check: black
- GitHub Check: Pylinter
- GitHub Check: unit_tests (3.12)
- GitHub Check: Pyright
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py, tests/unit/**/*.py, tests/**/*.py, src/app/**/*.py, src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (AGENTS.md): same guidelines as in the first review above.
Files: src/metrics/recording.py, src/app/endpoints/responses.py, tests/unit/app/endpoints/test_responses.py, tests/unit/metrics/test_recording.py
🧠 Learnings (1): same PR #1463 learning (intentional `tools: Optional[list[Any]]` typing pattern) as in the first review, applied to src/app/endpoints/responses.py
🔇 Additional comments (4)
src/metrics/recording.py (1)
115-160: LGTM — normalization layer and bounded metric recording are well-structured. Constants use `Final[frozenset[str]]` for O(1) lookup and immutability, `normalize_llm_inference_result` is correctly documented with an explicit `Returns:` section, and `bounded_result` is computed outside the `try` block (no risk of suppression). Good defensive design.
tests/unit/metrics/test_recording.py (1)
221-236: LGTM — precisely tests the label-bounding behavior. The test exercises the only missing dimension (out-of-set result → "failure") while the parametrized test above covers the passthrough of valid "success". The duration is verified to remain unchanged, confirming the normalization only touches the label.
src/app/endpoints/responses.py (2)
635-660: LGTM — `_record_response_inference_result` is well-factored. Clean helper that centralises provider/model extraction, failure counter, and duration histogram recording in one place. The `record_failure=False` default means callers on the success path don't need to pass it explicitly.
1177-1224: LGTM — non-streaming metric recording is correctly placed. Success is recorded immediately after `create()` returns and before post-processing (`extract_token_usage`, `consume_query_tokens`). The `inference_metric_recorded` guard prevents double-recording in all three exception types, and `record_failure=True` is correctly passed only in the failure branch.
Force-pushed: 1b033fa → 23ea1f1
Force-pushed: 23ea1f1 → 31849b3
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 1045-1052: Replace the raw "success"/"failure" string literals
passed to _record_response_inference_result with the shared constants
recording.LLM_INFERENCE_RESULT_SUCCESS and
recording.LLM_INFERENCE_RESULT_FAILURE: determine the result from chunk.type as
before but map it to recording.LLM_INFERENCE_RESULT_FAILURE when chunk.type ==
"response.failed" else recording.LLM_INFERENCE_RESULT_SUCCESS, and pass that
constant into _record_response_inference_result along with the same
api_params.model, context.endpoint_path, elapsed time, and record_failure flag;
apply the same replacement in the other occurrences noted around lines 1188-1193
and 1222-1225 where the raw literals are used.
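Applied at a call site, the substitution would look roughly like this excerpt; the surrounding variables come from the streaming generator described above, and the `recording` import path is an assumption:

```python
import time

from metrics import recording  # the constants live in src/metrics/recording.py

# ... inside the streaming generator, at the terminal chunk ...
result = (
    recording.LLM_INFERENCE_RESULT_FAILURE
    if chunk.type == "response.failed"
    else recording.LLM_INFERENCE_RESULT_SUCCESS
)
_record_response_inference_result(
    api_params.model,
    context.endpoint_path,
    result,
    time.monotonic() - inference_start_time,
    record_failure=(result == recording.LLM_INFERENCE_RESULT_FAILURE),
)
```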
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 71b30a61-501e-4ce1-a82b-0b553edac82b
📒 Files selected for processing (4)
src/app/endpoints/responses.py, src/metrics/recording.py, tests/unit/app/endpoints/test_responses.py, tests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-pr
- GitHub Check: bandit
- GitHub Check: Pylinter
- GitHub Check: Pyright
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (3)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use `pytest.mark.asyncio` marker for async tests
Files: tests/unit/app/endpoints/test_responses.py, tests/unit/metrics/test_recording.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
Llama Stack imports: Use `from llama_stack_client import AsyncLlamaStackClient`
Check `constants.py` for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (`str | int`), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use `async def` for I/O operations and external API calls
Use standard log levels with clear purposes: `debug()` for diagnostic info, `info()` for program execution, `warning()` for unexpected events, `error()` for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Abstract classes must use ABC with `@abstractmethod` decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files: src/metrics/recording.py, src/app/endpoints/responses.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: FastAPI dependencies: Import from the `fastapi` module for `APIRouter`, `HTTPException`, `Request`, `status`, `Depends`
Use FastAPI `HTTPException` with appropriate status codes for API endpoints and handle `APIConnectionError` from Llama Stack
Files:
src/app/endpoints/responses.py
🧠 Learnings (1): same PR #1463 learning as in the first review, applied to src/app/endpoints/responses.py
🔇 Additional comments (4)
src/metrics/recording.py (1)
115-138: Bounded inference-result labels are implemented correctly. Great change: normalizing unknown results to "failure" before histogram observation keeps metric cardinality bounded without losing latency samples.
Also applies to: 153-157
tests/unit/metrics/test_recording.py (1)
221-237: Targeted regression coverage looks solid. This test cleanly proves unexpected inference result labels are coerced to "failure" while preserving the recorded duration.
tests/unit/app/endpoints/test_responses.py (1)
856-879: Instrumentation test coverage is now explicit and well-targeted. The new assertions and the stream-iteration failure test provide strong confidence that inference metrics are recorded exactly once in both non-streaming and streaming failure paths.
Also applies to: 1024-1045, 2711-2770
src/app/endpoints/responses.py (1)
691-717: Inference metric boundary + double-recording guards look correct. Recording terminal streaming metrics before post-processing and guarding fallback failure recording with `inference_metric_recorded` is the right behavior and matches non-streaming semantics.
Also applies to: 1033-1087, 1179-1226
Force-pushed: 31849b3 → fa10cf8
Record inference duration and outcome (success/failure) for both streaming and non-streaming Responses API paths. The metric is recorded at the terminal-event boundary, before post-processing, to prevent spurious failure metrics when post-processing raises after a successful LLM inference. Recording is guarded with an inference_metric_recorded flag to avoid double-recording.

Signed-off-by: Major Hayden <major@redhat.com>
Force-pushed: fa10cf8 → 80c64db
Description
Add bounded inference latency and failure metrics around the Responses API backend LLM calls, including streaming terminal events and stream iteration failures.
Type of change
Tools used to create PR
N/A
Related Tickets & Documents
Related Issue: RSPEED-2943
Checklist before requesting a review
Testing
uv run pytest tests/unit/metrics/test_recording.py tests/unit/app/endpoints/test_responses.py
uv run radon cc -s src/app/endpoints/responses.py src/metrics/recording.py
uv run make verify
Summary by CodeRabbit
New Features
Bug Fixes
Tests