Skip to content

LCORE-1262: Use read-only and mutable request copies#1593

Merged
tisnik merged 1 commit into
lightspeed-core:mainfrom
asimurka:refactor_responses_endpoint
Apr 28, 2026
Merged

LCORE-1262: Use read-only and mutable request copies#1593
tisnik merged 1 commit into
lightspeed-core:mainfrom
asimurka:refactor_responses_endpoint

Conversation

@asimurka
Copy link
Copy Markdown
Contributor

@asimurka asimurka commented Apr 27, 2026

Description

Refactors responses endpoint to use 2 requests - read-only copy of input request and mutable copy with attributes being updated during request handling.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Refactor

    • Refactored internal request processing to better manage separation between original and server-modified request data, improving code organization and data flow clarity.
    • Streamlined model selection logic with more efficient parameter passing.
    • Enhanced response sanitization behavior for greater consistency in data handling.
  • Tests

    • Updated unit tests to align with new internal request handling architecture.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 27, 2026

Warning

Rate limit exceeded

@asimurka has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 38 minutes and 1 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 69114c00-e5a5-49be-b143-74c31bce5507

📥 Commits

Reviewing files that changed from the base of the PR and between 55085a6 and 62d265e.

📒 Files selected for processing (4)
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/app/endpoints/test_responses_splunk.py

Walkthrough

This pull request refactors the response endpoint handler to use a two-request pattern: preserving an immutable original_request and applying all server-side modifications (model selection, tool resolution, RAG context injection, system prompts) to an updated_request copy. Helper functions and sanitization logic are updated to reference both requests accordingly.

Changes

Cohort / File(s) Summary
Request Handler Refactoring
src/app/endpoints/responses.py
Introduces two-request pattern throughout handler flow. Model selection, tool-choice resolution, RAG context building, and system-prompt injection now write to updated_request. Response sanitization logic updated to infer substitution behavior from presence/absence in original_request rather than using explicit boolean flags. Streaming and non-streaming flows updated to accept both requests.
Model Selection Logic
src/utils/responses.py
Function signature of select_model_for_responses updated to accept optional request_model parameter. When provided, function short-circuits and returns it directly, bypassing conversation history and default selection logic.
Handler Test Updates
tests/unit/app/endpoints/test_responses.py
Test assertions updated to inspect call_kwargs["updated_request"] instead of call_kwargs["request"]. Sanitizer tests refactored to construct original_request/updated_request explicitly using new test constants, removing instructions_substituted/model_substituted boolean parameters from calls. One obsolete test case removed.
Telemetry Test Updates
tests/unit/app/endpoints/test_responses_splunk.py
All handler calls updated to pass original_request and updated_request arguments (both pointing to same request object) instead of single request parameter.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly describes the main refactoring change: introducing read-only and mutable request copies to the responses endpoint handling logic.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 316-325: The call to the resolver is passing the wrong identity
value (auth[1] which is the username) so MCP auth headers are built incorrectly;
change the resolver invocation to pass the bearer token variable (token) instead
of auth[1] when calling resolve_tool_choice or resolve_client_tool_choice so
that resolve_client_tool_choice / resolve_tool_choice receive the actual token
used to construct MCP headers and update updated_request.tools and
updated_request.tool_choice correctly with original_request.tools and
original_request.tool_choice.
- Around line 404-407: The blocked-stream path is passing
updated_request.echoed_params() straight into shield_violation_generator
(creating generator) and thus can leak server-resolved prompts, model, or MCP
tools; update the blocked streaming flow to run the echoed params through the
same sanitizer used by the normal path (call _sanitize_response_dict or reuse
the existing sanitization helper) before passing the result into
shield_violation_generator, or alternatively make shield_violation_generator
accept and apply sanitization internally so both paths use the sanitized dict
consistently.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 355763e2-00fa-415c-af35-13ac83951d93

📥 Commits

Reviewing files that changed from the base of the PR and between bc0c36c and 55085a6.

📒 Files selected for processing (4)
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/app/endpoints/test_responses_splunk.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: Pylinter
  • GitHub Check: build-pr
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

  • src/utils/responses.py
  • tests/unit/app/endpoints/test_responses_splunk.py
  • src/app/endpoints/responses.py
  • tests/unit/app/endpoints/test_responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

  • src/utils/responses.py
  • src/app/endpoints/responses.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use pytest-mock for AsyncMock objects in tests
Use marker pytest.mark.asyncio for async tests
Unit tests require 60% coverage, integration tests 10%

Files:

  • tests/unit/app/endpoints/test_responses_splunk.py
  • tests/unit/app/endpoints/test_responses.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoints

Files:

  • src/app/endpoints/responses.py
🧠 Learnings (6)
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

  • src/utils/responses.py
  • src/app/endpoints/responses.py
📚 Learning: 2026-02-23T14:11:46.950Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-188
Timestamp: 2026-02-23T14:11:46.950Z
Learning: The query request validator in the Responses API flow requires that `query_request.model` and `query_request.provider` must either both be specified or both be absent. The concatenated format (e.g., `model="provider/model"` with `provider=None`) is not permitted by the validator.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-04-16T19:08:31.451Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:501-529
Timestamp: 2026-04-16T19:08:31.451Z
Learning: In `src/app/endpoints/responses.py`, `_sanitize_response_dict(response_dict, configured_mcp_labels)` intentionally mutates `response_dict` in-place. The dict is always a fresh throwaway produced by `model_dump()` on the immediately preceding line (both in the streaming `async for` loop and in the non-streaming path); no other reference to it exists. The AGENTS.md guideline "avoid in-place parameter modification" applies to mutating a caller's long-lived/shared data structures, not to ephemeral serialization dicts. Do not flag this as an anti-pattern in future reviews.

Applied to files:

  • src/app/endpoints/responses.py
  • tests/unit/app/endpoints/test_responses.py
📚 Learning: 2026-04-16T19:08:38.217Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:523-529
Timestamp: 2026-04-16T19:08:38.217Z
Learning: In lightspeed-stack (`src/app/endpoints/responses.py`), the predicate `server_label in configured_mcp_labels` is the established, intentional pattern for identifying server-deployed MCP tools across `_sanitize_response_dict`, `_is_server_mcp_output_item`, and `_should_filter_mcp_chunk`. Client-supplied tools cannot collide with configured server labels because `server_label` is a server-side field set by lightspeed-stack during tool injection; clients send `function` tools or MCP tools pointing at their own servers with different labels. Do not flag this predicate as a false-positive collision risk in code review.

Applied to files:

  • src/app/endpoints/responses.py
  • tests/unit/app/endpoints/test_responses.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/responses.py

Comment thread src/app/endpoints/responses.py
Comment thread src/app/endpoints/responses.py
@asimurka asimurka marked this pull request as draft April 27, 2026 07:31
@asimurka asimurka force-pushed the refactor_responses_endpoint branch from 55085a6 to 62d265e Compare April 27, 2026 07:41
@asimurka asimurka marked this pull request as ready for review April 27, 2026 07:42
responses_request.shield_ids,
)

# Extract vector store IDs for Inline RAG context before resolving tool choice.
Copy link
Copy Markdown
Contributor Author

@asimurka asimurka Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done after tools resolution now (see L327-332)

Comment thread src/utils/responses.py
@@ -1330,6 +1332,7 @@ async def check_model_configured(


Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method generalized and added early return statement.

Copy link
Copy Markdown
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tisnik tisnik merged commit c0fd451 into lightspeed-core:main Apr 28, 2026
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants