Skip to content

RSPEED-2978: refactor Responses API shared side effects#1649

Merged
tisnik merged 2 commits intolightspeed-core:mainfrom
major:refactor/responses-shared-side-effects
May 5, 2026
Merged

RSPEED-2978: refactor Responses API shared side effects#1649
tisnik merged 2 commits intolightspeed-core:mainfrom
major:refactor/responses-shared-side-effects

Conversation

@major
Copy link
Copy Markdown
Contributor

@major major commented Apr 30, 2026

Description

Refactor the Responses API endpoint so streaming and non-streaming handlers reuse the same side-effect helpers for error telemetry, blocked response persistence, previous response continuation, result storage, and completed telemetry. This brings handle_non_streaming_response down to radon A while keeping the behavior exercised through the existing handler tests.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Hephaestus (OpenAI GPT-5.5), CodeRabbit CLI
  • Generated by: N/A

Related Tickets & Documents

  • Related Issue # RSPEED-2978
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • uv run pyright src/app/endpoints/responses.py tests/unit/app/endpoints/test_responses_splunk.py
  • uv run pytest tests/unit/app/endpoints/test_responses.py tests/unit/app/endpoints/test_responses_splunk.py, 85 passed
  • uv run make verify
  • git diff --check origin/main...HEAD
  • uv run radon cc -s -a src/app/endpoints/responses.py, average A, handle_non_streaming_response A (5)
  • Local CodeRabbit review found one minor completed-telemetry user_agent finding, fixed in this branch with assertions in test_responses_splunk.py.

Summary by CodeRabbit

Release Notes

  • Bug Fixes & Improvements

    • Enhanced error handling for connection failures and capacity constraints with improved exception mapping
    • Improved handling of blocked requests with consistent logging across streaming and non-streaming flows
  • Tests

    • Strengthened telemetry validation tests with enhanced parameter coverage

@major
Copy link
Copy Markdown
Contributor Author

major commented Apr 30, 2026

@tisnik I meant to do this when I added the splunk stuff, but then I got caught up in Summit deliverables. 🤪

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c9b2f1a9-52c9-4db9-a429-8163ba6c1105

📥 Commits

Reviewing files that changed from the base of the PR and between 0c79c65 and 26ce9b9.

📒 Files selected for processing (2)
  • src/app/endpoints/responses.py
  • tests/unit/app/endpoints/test_responses_splunk.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-pr
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (5)
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

  • tests/unit/app/endpoints/test_responses_splunk.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

  • tests/unit/app/endpoints/test_responses_splunk.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

  • src/app/endpoints/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends

Files:

  • src/app/endpoints/responses.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/responses.py
🧠 Learnings (1)
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/responses.py
🔇 Additional comments (16)
src/app/endpoints/responses.py (14)

372-388: Topic-summary generation may still be called for shield-blocked requests.

This was raised in a previous review and marked as addressed, but the current guard at line 385 only checks generate_topic_summary without verifying moderation_result.decision == "passed". Both blocked code paths continue execution to where this helper is called, potentially invoking get_topic_summary() on rejected input.

If the omission is intentional per the parity discussion with query endpoints, consider adding a comment documenting the design decision.


7-16: LGTM!

Import additions align with the new helper functions: Sequence and OpenAIResponseOutput for _append_previous_response_turn, and NoReturn for _raise_response_api_http_exception.


228-251: LGTM!

The helper correctly uses fire_and_forget=True for error telemetry, since FastAPI discards BackgroundTasks on non-2xx responses. Parameters are properly sourced from api_params and context.


254-280: LGTM!

The error mapping logic correctly uses the utility functions from utils/query.py shown in the relevant code snippets. Returning None for unrecognized RuntimeError allows the caller to re-raise the original exception unchanged.


283-303: LGTM!

The NoReturn type hint correctly indicates the function always raises. Exception chaining with from error preserves the traceback context, and telemetry is properly queued before the exception is raised.


306-323: LGTM!

The helper correctly guards on api_params.store and the cast to ShieldModerationBlocked is safe since both call sites verify moderation_result.decision == "blocked" first.


326-348: LGTM!

Correctly uses fire_and_forget=False (default) since shield-blocked requests return a valid 200 response with refusal content, so BackgroundTasks will be executed normally.


351-369: LGTM!

The dual guard condition correctly handles the case where conversation context is being continued from a previous response. The Sequence[OpenAIResponseOutput] type is appropriately specific.


391-421: LGTM!

The helper correctly guards on api_params.store and properly extracts auth tuple components. The empty attachments=[] is appropriate for the Responses API path.


424-454: LGTM!

The guard at lines 440-441 correctly prevents duplicate telemetry—blocked requests emit responses_shield_blocked via _queue_blocked_response_event instead. Token counts and user_agent are properly propagated.


658-682: LGTM!

The streaming handler correctly delegates to the shared helpers for both the blocked path and exception handling. The exception types in the except clause align with what _raise_response_api_http_exception handles.


1035-1040: LGTM!

The guard on latest_response_object before calling _append_previous_response_turn is correct—it prevents attempting to append when no valid response was received.


1067-1082: LGTM!

The ordering is correct—_store_response_query_results executes before _queue_completed_response_event, ensuring telemetry is only emitted after successful persistence. This addresses the previous review concern about ordering consistency.


1099-1182: LGTM!

The non-streaming handler correctly delegates to all shared helpers. The ordering at lines 1169-1182 is correct—_store_response_query_results before _queue_completed_response_event. The simplified user_id = context.auth[0] at line 1099 is appropriate since only the user_id is needed at that point.

tests/unit/app/endpoints/test_responses_splunk.py (2)

425-449: LGTM!

The test correctly verifies user_agent propagation through the non-streaming success path. The assertion at line 449 confirms the parameter reaches _queue_responses_splunk_event.


655-684: LGTM!

The streaming success test mirrors the non-streaming test coverage for user_agent propagation. Both paths are now verified to pass the user agent through to telemetry.


Walkthrough

Refactored responses.py to extract error handling, shield-blocked request persistence, topic-summary generation, response storage, and completion telemetry into reusable private helper functions. Both streaming and non-streaming endpoints now delegate to these consolidated helpers instead of duplicating logic. Tests updated to verify user_agent parameter propagation.

Changes

Error Handling & Telemetry Consolidation

Layer / File(s) Summary
Helper Function Definitions
src/app/endpoints/responses.py (lines 228–455)
Introduced _http_exception_for_response_api_error, _raise_response_api_http_exception, and _queue_responses_error_event for centralized exception-to-HTTPException mapping and responses_error telemetry. Added _persist_blocked_response_turn and _queue_blocked_response_event for shield-blocked request handling. Added _append_previous_response_turn, _maybe_get_topic_summary, _store_response_query_results, and _queue_completed_response_event for response persistence and completion telemetry (suppressed unless moderation passed).
Streaming Flow Refactoring
src/app/endpoints/responses.py (lines 658–1082)
Updated streaming shield-blocked branch to use _persist_blocked_response_turn() and _queue_blocked_response_event(). Consolidated streaming exception handling into _raise_response_api_http_exception(e, api_params, context). Replaced inline response appending, topic summary, storage, and completion telemetry with calls to _append_previous_response_turn(), _maybe_get_topic_summary(), _store_response_query_results(...), and _queue_completed_response_event(...).
Non-Streaming Flow Refactoring
src/app/endpoints/responses.py (lines 1099–1182)
Simplified user_id extraction from tuple unpacking to context.auth[0]. Refactored shield-blocked branch to use consolidated helpers. Consolidated exception handling into _raise_response_api_http_exception(...). Replaced inline storage and completion telemetry with calls to shared helpers.
Test Verification
tests/unit/app/endpoints/test_responses_splunk.py (lines 432–449, 662–684)
Updated test_non_streaming_success and test_streaming_success to pass explicit user_agent="test-agent/1.0" and assert user_agent is present in _queue_responses_splunk_event call arguments.

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and specifically summarizes the main change: refactoring shared side effects in the Responses API endpoint, which aligns directly with the primary objective of reducing code duplication between streaming and non-streaming handlers.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 1168-1181: The completion side effects are run in the wrong order
here—call _store_response_query_results(...) before
_queue_completed_response_event(...) to match generate_response()'s streaming
path; update the sequence in the non-streaming branch so
_store_response_query_results(api_params, context, turn_summary, completed_at,
topic_summary) executes first and only then call
_queue_completed_response_event(api_params, context, turn_summary, completed_at,
output_text) with the same arguments and ordering.
- Around line 384-387: The snippet unconditionally calls get_topic_summary when
context.generate_topic_summary is true, which can trigger another model call for
requests already blocked by the shield; update the guard to also check the
shield-block indicator on the context (e.g., context.shield_blocked or the
existing moderation/shield flag present on Context) and return None if the
request was shield-blocked before calling get_topic_summary(context.input_text,
context.client, api_params.model) so that topic-summary generation is skipped
for rejected inputs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7c607f27-fb46-4faa-8854-81479813f8cb

📥 Commits

Reviewing files that changed from the base of the PR and between be92b89 and 64670e6.

📒 Files selected for processing (2)
  • src/app/endpoints/responses.py
  • tests/unit/app/endpoints/test_responses_splunk.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: build-pr
🧰 Additional context used
📓 Path-based instructions (5)
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

  • tests/unit/app/endpoints/test_responses_splunk.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

  • tests/unit/app/endpoints/test_responses_splunk.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

  • src/app/endpoints/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends

Files:

  • src/app/endpoints/responses.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/responses.py
🧠 Learnings (2)
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/app/**/*.py : Use FastAPI dependencies: `from fastapi import APIRouter, HTTPException, Request, status, Depends`

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/responses.py
🔇 Additional comments (1)
src/app/endpoints/responses.py (1)

298-300: This review comment is based on an incorrect understanding of Python exception re-raise semantics.

The verification script demonstrates that raise error (raising by name) in Python 3.12+ actually preserves the full exception traceback, including both the original exception site and the re-raise site. A bare raise (without the exception variable) would show less traceback context, not more.

Additionally, the function has no internal except block, making a bare raise syntactically impossible in this context. The current raise error is appropriate for re-raising unmapped exceptions unchanged, and the code correctly uses raise http_exception from error to establish explicit chaining for mapped exceptions.

			> Likely an incorrect or invalid review comment.

Comment thread src/app/endpoints/responses.py
Comment thread src/app/endpoints/responses.py Outdated
@major major force-pushed the refactor/responses-shared-side-effects branch from 64670e6 to 21779f4 Compare April 30, 2026 18:04
@tisnik tisnik requested a review from asimurka May 4, 2026 07:17
Copy link
Copy Markdown
Contributor

@asimurka asimurka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work with refactoring in overall.

For discussion: The endpoint has become quite long. We should consider creating separate module for all the utility functions dedicated to this endpoint to keep it lightweight. Anyway, it is not necessary for this PR.

Comment thread src/app/endpoints/responses.py Outdated
Comment thread src/app/endpoints/responses.py Outdated
Signed-off-by: Major Hayden <major@redhat.com>
@major major force-pushed the refactor/responses-shared-side-effects branch from 21779f4 to 0c79c65 Compare May 4, 2026 11:53
@major major requested a review from asimurka May 4, 2026 18:04
Signed-off-by: Major Hayden <major@redhat.com>
@major major force-pushed the refactor/responses-shared-side-effects branch from 0c79c65 to 26ce9b9 Compare May 4, 2026 18:32
@asimurka
Copy link
Copy Markdown
Contributor

asimurka commented May 5, 2026

/ok-to-test

Copy link
Copy Markdown
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tisnik tisnik merged commit fa19cbd into lightspeed-core:main May 5, 2026
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants