LCORE-1262: Use read-only and mutable request copies by asimurka · Pull Request #1593 · lightspeed-core/lightspeed-stack

asimurka · 2026-04-27T07:20:48Z

Description

Refactors responses endpoint to use 2 requests - read-only copy of input request and mutable copy with attributes being updated during request handling.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Cursor

Related Tickets & Documents

Related Issue # LCORE-1262
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Refactor
- Refactored internal request processing to better manage separation between original and server-modified request data, improving code organization and data flow clarity.
- Streamlined model selection logic with more efficient parameter passing.
- Enhanced response sanitization behavior for greater consistency in data handling.
Tests
- Updated unit tests to align with new internal request handling architecture.

coderabbitai · 2026-04-27T07:21:04Z

Warning

Rate limit exceeded

@asimurka has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 38 minutes and 1 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 69114c00-e5a5-49be-b143-74c31bce5507

📥 Commits

Reviewing files that changed from the base of the PR and between 55085a6 and 62d265e.

📒 Files selected for processing (4)

src/app/endpoints/responses.py
src/utils/responses.py
tests/unit/app/endpoints/test_responses.py
tests/unit/app/endpoints/test_responses_splunk.py

Walkthrough

This pull request refactors the response endpoint handler to use a two-request pattern: preserving an immutable original_request and applying all server-side modifications (model selection, tool resolution, RAG context injection, system prompts) to an updated_request copy. Helper functions and sanitization logic are updated to reference both requests accordingly.

Changes

Cohort / File(s)	Summary
Request Handler Refactoring `src/app/endpoints/responses.py`	Introduces two-request pattern throughout handler flow. Model selection, tool-choice resolution, RAG context building, and system-prompt injection now write to `updated_request`. Response sanitization logic updated to infer substitution behavior from presence/absence in `original_request` rather than using explicit boolean flags. Streaming and non-streaming flows updated to accept both requests.
Model Selection Logic `src/utils/responses.py`	Function signature of `select_model_for_responses` updated to accept optional `request_model` parameter. When provided, function short-circuits and returns it directly, bypassing conversation history and default selection logic.
Handler Test Updates `tests/unit/app/endpoints/test_responses.py`	Test assertions updated to inspect `call_kwargs["updated_request"]` instead of `call_kwargs["request"]`. Sanitizer tests refactored to construct `original_request`/`updated_request` explicitly using new test constants, removing `instructions_substituted`/`model_substituted` boolean parameters from calls. One obsolete test case removed.
Telemetry Test Updates `tests/unit/app/endpoints/test_responses_splunk.py`	All handler calls updated to pass `original_request` and `updated_request` arguments (both pointing to same request object) instead of single `request` parameter.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and clearly describes the main refactoring change: introducing read-only and mutable request copies to the responses endpoint handling logic.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 316-325: The call to the resolver is passing the wrong identity
value (auth[1] which is the username) so MCP auth headers are built incorrectly;
change the resolver invocation to pass the bearer token variable (token) instead
of auth[1] when calling resolve_tool_choice or resolve_client_tool_choice so
that resolve_client_tool_choice / resolve_tool_choice receive the actual token
used to construct MCP headers and update updated_request.tools and
updated_request.tool_choice correctly with original_request.tools and
original_request.tool_choice.
- Around line 404-407: The blocked-stream path is passing
updated_request.echoed_params() straight into shield_violation_generator
(creating generator) and thus can leak server-resolved prompts, model, or MCP
tools; update the blocked streaming flow to run the echoed params through the
same sanitizer used by the normal path (call _sanitize_response_dict or reuse
the existing sanitization helper) before passing the result into
shield_violation_generator, or alternatively make shield_violation_generator
accept and apply sanitization internally so both paths use the sanitized dict
consistently.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 355763e2-00fa-415c-af35-13ac83951d93

📥 Commits

Reviewing files that changed from the base of the PR and between bc0c36c and 55085a6.

📒 Files selected for processing (4)

src/app/endpoints/responses.py
src/utils/responses.py
tests/unit/app/endpoints/test_responses.py
tests/unit/app/endpoints/test_responses_splunk.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)

GitHub Check: integration_tests (3.12)
GitHub Check: unit_tests (3.12)
GitHub Check: unit_tests (3.13)
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: Pylinter
GitHub Check: build-pr
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 3

🧰 Additional context used

📓 Path-based instructions (4)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

src/utils/responses.py
tests/unit/app/endpoints/test_responses_splunk.py
src/app/endpoints/responses.py
tests/unit/app/endpoints/test_responses.py

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

src/utils/responses.py
src/app/endpoints/responses.py

tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use pytest-mock for AsyncMock objects in tests
Use marker pytest.mark.asyncio for async tests
Unit tests require 60% coverage, integration tests 10%

Files:

tests/unit/app/endpoints/test_responses_splunk.py
tests/unit/app/endpoints/test_responses.py

src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoints

Files:

src/app/endpoints/responses.py

🧠 Learnings (6)

📚 Learning: 2026-02-25T07:46:39.608Z

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

src/utils/responses.py
src/app/endpoints/responses.py

📚 Learning: 2026-02-23T14:11:46.950Z

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-188
Timestamp: 2026-02-23T14:11:46.950Z
Learning: The query request validator in the Responses API flow requires that `query_request.model` and `query_request.provider` must either both be specified or both be absent. The concatenated format (e.g., `model="provider/model"` with `provider=None`) is not permitted by the validator.

Applied to files:

src/utils/responses.py

📚 Learning: 2026-02-23T14:56:59.186Z

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

src/utils/responses.py

📚 Learning: 2026-04-16T19:08:31.451Z

Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:501-529
Timestamp: 2026-04-16T19:08:31.451Z
Learning: In `src/app/endpoints/responses.py`, `_sanitize_response_dict(response_dict, configured_mcp_labels)` intentionally mutates `response_dict` in-place. The dict is always a fresh throwaway produced by `model_dump()` on the immediately preceding line (both in the streaming `async for` loop and in the non-streaming path); no other reference to it exists. The AGENTS.md guideline "avoid in-place parameter modification" applies to mutating a caller's long-lived/shared data structures, not to ephemeral serialization dicts. Do not flag this as an anti-pattern in future reviews.

Applied to files:

src/app/endpoints/responses.py
tests/unit/app/endpoints/test_responses.py

📚 Learning: 2026-04-16T19:08:38.217Z

Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:523-529
Timestamp: 2026-04-16T19:08:38.217Z
Learning: In lightspeed-stack (`src/app/endpoints/responses.py`), the predicate `server_label in configured_mcp_labels` is the established, intentional pattern for identifying server-deployed MCP tools across `_sanitize_response_dict`, `_is_server_mcp_output_item`, and `_should_filter_mcp_chunk`. Client-supplied tools cannot collide with configured server labels because `server_label` is a server-side field set by lightspeed-stack during tool injection; clients send `function` tools or MCP tools pointing at their own servers with different labels. Do not flag this predicate as a false-positive collision risk in code review.

Applied to files:

src/app/endpoints/responses.py
tests/unit/app/endpoints/test_responses.py

📚 Learning: 2026-04-06T20:18:07.852Z

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

src/app/endpoints/responses.py

asimurka · 2026-04-27T07:44:04Z

-        responses_request.shield_ids,
-    )
-
-    # Extract vector store IDs for Inline RAG context before resolving tool choice.


This can be done after tools resolution now (see L327-332)

asimurka · 2026-04-27T09:05:52Z

@@ -1330,6 +1332,7 @@ async def check_model_configured(




Method generalized and added early return statement.

tisnik

LGTM

coderabbitai Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread src/app/endpoints/responses.py

Comment thread src/app/endpoints/responses.py

asimurka marked this pull request as draft April 27, 2026 07:31

Use read-only and mutable request copies

62d265e

asimurka force-pushed the refactor_responses_endpoint branch from 55085a6 to 62d265e Compare April 27, 2026 07:41

asimurka marked this pull request as ready for review April 27, 2026 07:42

asimurka commented Apr 27, 2026

View reviewed changes

tisnik approved these changes Apr 28, 2026

View reviewed changes

tisnik merged commit c0fd451 into lightspeed-core:main Apr 28, 2026
29 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LCORE-1262: Use read-only and mutable request copies#1593

LCORE-1262: Use read-only and mutable request copies#1593
tisnik merged 1 commit into
lightspeed-core:mainfrom
asimurka:refactor_responses_endpoint

asimurka commented Apr 27, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading

Rate limit exceeded

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

asimurka Apr 27, 2026 •

edited

Loading

Uh oh!

asimurka Apr 27, 2026

Uh oh!

tisnik left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

asimurka commented Apr 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

asimurka Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

asimurka Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

tisnik left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

asimurka commented Apr 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading

asimurka Apr 27, 2026 •

edited

Loading