RSPEED-2885: filter internal details from /v1/responses stream#1524
Walkthrough: Adds a new constant and a helper that replaces server-substituted instructions with a placeholder and strips internal details from serialized response objects.
PR Reviewer Guide 🔍 Here are some key observations to aid the review process:
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/app/endpoints/responses.py (2)
675-699: ⚠️ Potential issue | 🟠 Major — Keep persisted turn summaries aligned with the sanitized response.
The client payload is now sanitized unconditionally, but `turn_summary` is still built from the raw `response_object` and remains gated by `filter_server_tools`. With the header absent, server-deployed MCP calls disappear from the returned response yet still flow into `turn_summary` / `store_query_results`, so stored state no longer matches what the client actually saw. Also applies to: 758-764, 921-950
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/responses.py` around lines 675 - 699, The persisted turn summary is still derived from the raw response_object while the outgoing chunk was sanitized (via configured_mcp_labels, server_mcp_output_indices, _should_filter_mcp_chunk and _sanitize_response_dict), so stored state can diverge from what the client saw; update the code paths that build turn_summary and call store_query_results (the places around the response handling where response_object / filter_server_tools are used) to use the same sanitized chunk_dict/response (or re-run the same filtering and _sanitize_response_dict logic) before constructing turn_summary and persisting results so persisted summaries match the sanitized client payload.
675-699: ⚠️ Potential issue | 🔴 Critical — Apply the same sanitization to the shield-blocked SSE path.
These lines only sanitize chunks that flow through `response_generator()`. When moderation blocks a streaming request, `handle_streaming_response()` switches to `shield_violation_generator()`, which still serializes `request.echoed_params()` directly into `response.created` and `response.completed`. That means blocked SSE responses can still leak `instructions`, merged server MCP tools, and unsanitized model IDs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/responses.py` around lines 675 - 699, Shielded/blocked SSE responses from shield_violation_generator are not being sanitized, so ensure the same sanitization applied in response_generator runs for the shield path: where handle_streaming_response/ shield_violation_generator serializes request.echoed_params() into response.created and response.completed, call _sanitize_response_dict(...) with the same configured_mcp_labels (and preserve setting response["conversation"]=normalized_conv_id) and update any model IDs/instructions/mcp tool lists before sending; reuse the existing helpers (_should_filter_mcp_chunk, _sanitize_response_dict, configured_mcp_labels, server_mcp_output_indices, normalized_conv_id) to perform this sanitization so blocked SSE chunks cannot leak instructions, merged server MCP tools, or raw model IDs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 531-537: The current filter removes any tool whose server_label
matches configured_mcp_labels, which can accidentally drop client-supplied
tools; change the predicate in the response_dict "tools" cleanup so it only
excludes tools that both have server_label in configured_mcp_labels and are
explicitly server-injected (e.g., check a marker like tool.get("mcp_injected")
or tool.get("source") == "server"). Update the logic around the tools variable
and response_dict["tools"] so client-provided tools without the injected marker
are preserved while real injected MCP tools are still filtered.
- Around line 501-537: The helper _sanitize_response_dict currently mutates the
input response_dict; change it to be pure by returning a sanitized copy instead:
create a shallow copy of response_dict at the start (e.g., new =
dict(response_dict)), remove "instructions" from the copy, transform the "model"
value on the copy if it's a string with more than one "/", and build a new
"tools" list by filtering and copying each tool dict so callers are not mutated;
update the function signature to return dict[str, Any] and adjust all callers of
_sanitize_response_dict to use the returned sanitized dict instead of relying on
in-place modification.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2c1847a9-af28-472f-85be-df3128ccba58
📒 Files selected for processing (2): src/app/endpoints/responses.py, tests/unit/app/endpoints/test_responses.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py — 📄 CodeRabbit inference engine (AGENTS.md)
- Use absolute imports for internal modules (e.g., `from authentication import get_auth_dependency`)
- Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports
- Check `constants.py` for shared constants before defining new ones
- All modules start with descriptive docstrings explaining purpose
- Use `logger = get_logger(__name__)` from `log.py` for module logging
- Type aliases defined at module level for clarity
- All functions require docstrings with brief descriptions
- Use complete type annotations for function parameters and return types
- Use union types with modern syntax: `str | int` instead of `Union[str, int]`
- Use `Optional[Type]` for optional types in type annotations
- Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
- Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
- Use `async def` for I/O operations and external API calls
- Handle `APIConnectionError` from Llama Stack in error handling
- Use `logger.debug()` for detailed diagnostic information
- Use `logger.info()` for general information about program execution
- Use `logger.warning()` for unexpected events or potential problems
- Use `logger.error()` for serious problems that prevented function execution
- All classes require descriptive docstrings explaining purpose
- Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
- Use complete type annotations for all class attributes; avoid using `Any`
- Follow Google Python docstring conventions for all modules, classes, and functions
- Include `Parameters:`, `Returns:`, `Raises:` sections in function docstrings as needed
Files:
src/app/endpoints/responses.py
src/app/**/*.py — 📄 CodeRabbit inference engine (AGENTS.md)
- Use `from fastapi import APIRouter, HTTPException, Request, status, Depends` for FastAPI dependencies
- Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/responses.py
tests/**/*.py — 📄 CodeRabbit inference engine (AGENTS.md)
- Use pytest for all unit and integration tests; do not use unittest
- Use `pytest.mark.asyncio` marker for async unit tests
Files:
tests/unit/app/endpoints/test_responses.py
tests/unit/**/*.py — 📄 CodeRabbit inference engine (AGENTS.md)
- Use `pytest-mock` for AsyncMock objects in unit tests
Files:
tests/unit/app/endpoints/test_responses.py
🧠 Learnings (2)
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.
Applied to files:
src/app/endpoints/responses.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/responses.py
```python
def _sanitize_response_dict(
    response_dict: dict[str, Any],
    configured_mcp_labels: set[str],
) -> None:
    """Sanitize a serialized response object in-place to remove internal details.

    Strips fields that expose server-side implementation details from the
    response object before it is forwarded to the client:

    - ``instructions``: the server-side system prompt
    - ``model``: deep provider paths (2+ slashes) are truncated to the
      last segment (e.g.
      ``google-vertex/publishers/google/models/gemini`` → ``gemini``);
      simple ``provider/model`` strings are left intact
    - ``tools``: server-deployed MCP tool definitions are removed; client-
      provided tools (those whose ``server_label`` is not in
      ``configured_mcp_labels``) are preserved

    Args:
        response_dict: Mutable dict produced by ``model_dump`` on a response
            object. Modified in-place.
        configured_mcp_labels: Set of ``server_label`` values that identify
            server-deployed MCP servers.
    """
    response_dict.pop("instructions", None)

    model = response_dict.get("model")
    if isinstance(model, str) and model.count("/") > 1:
        response_dict["model"] = model.rsplit("/", 1)[-1]

    tools = response_dict.get("tools")
    if tools is not None:
        response_dict["tools"] = [
            tool
            for tool in tools
            if tool.get("server_label") not in configured_mcp_labels
        ]
```
🛠️ Refactor suggestion | 🟠 Major
Return a sanitized copy instead of mutating the input dict.
Making this helper pure will avoid hidden caller side effects and make the two response paths easier to reason about.
As per coding guidelines "Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters".
In-place mutation is intentional here. The dict is created by model_dump() on the line immediately before the call — it is a throwaway serialization dict, not a shared data structure. Returning a copy would allocate a new dict for every SSE chunk with zero safety benefit, since nothing else holds a reference to the original.
The AGENTS.md guideline about avoiding in-place modification targets a different anti-pattern: functions that silently mutate a caller's long-lived data structure via a parameter. That does not apply here — the dict is born, sanitized, serialized to JSON, and discarded within the same async for iteration.
@Lifto, that's a well-reasoned clarification — you're right. The guideline is aimed at functions that silently mutate a caller's long-lived/shared data structure through a parameter reference, which is a genuine source of bugs. Here the dict is born from model_dump(), passed straight into _sanitize_response_dict, serialized, and discarded — there's no aliasing risk and no wasted allocation. The in-place mutation is appropriate and intentional. I'll withdraw the suggestion.
(ᵔᴥᵔ) 🐇✏️
✏️ Learnings added
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:501-529
Timestamp: 2026-04-16T19:08:26.366Z
Learning: In `src/app/endpoints/responses.py`, `_sanitize_response_dict(response_dict, configured_mcp_labels)` intentionally mutates `response_dict` in-place. The dict is always a fresh throwaway produced by `model_dump()` on the immediately preceding line (both in the streaming `async for` loop and in the non-streaming path); no other reference to it exists. The AGENTS.md guideline "avoid in-place parameter modification" applies to mutating a caller's long-lived/shared data structures, not to ephemeral serialization dicts. Do not flag this as an anti-pattern in future reviews.
Prevent internal implementation details from leaking through the /v1/responses SSE stream:

- Strip 'instructions' field (system prompt) from response objects
- Sanitize 'model' field by removing provider prefix (e.g. google-vertex/publishers/google/models/gemini → gemini)
- Strip server-deployed MCP tool definitions from the 'tools' array in response objects (client-provided tools are preserved)
- Make MCP streaming event filtering unconditional — mcp_call, mcp_list_tools, and mcp_approval_request events for server-deployed tools are now always filtered, not only when X-LCS-Merge-Server-Tools header is present

The X-LCS-Merge-Server-Tools header continues to control tool merging behavior (resolve_client_tool_choice vs resolve_tool_choice) unchanged.

Fixes: https://redhat.atlassian.net/browse/RSPEED-2885

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
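The unconditional event filtering described in this commit message can be sketched roughly as follows. This is an illustrative sketch, not the actual `_should_filter_mcp_chunk` from `responses.py` — the event names come from the commit message; the chunk shape (a dict with a `type` and an `item` carrying `server_label`) is an assumption:

```python
# Illustrative sketch only. Event-type prefixes are taken from the commit
# message above; the chunk layout is an assumption, not the real schema.
SERVER_MCP_EVENT_PREFIXES = (
    "response.mcp_call",
    "response.mcp_list_tools",
    "response.mcp_approval_request",
)


def should_filter_mcp_chunk(chunk: dict, configured_mcp_labels: set[str]) -> bool:
    """Return True when a streaming event belongs to a server-deployed MCP tool."""
    if not str(chunk.get("type", "")).startswith(SERVER_MCP_EVENT_PREFIXES):
        return False
    item = chunk.get("item") or {}
    # Only hide events for server-deployed tools; client MCP tools carry
    # different server_label values and pass through untouched.
    return item.get("server_label") in configured_mcp_labels
```

The key property is that the predicate keys off `server_label`, so client-provided MCP tools are never suppressed.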
Replies to CodeRabbit review comments

Comment 3 — "Keep persisted turn summaries aligned with the sanitized response"

This divergence is intentional. The sanitization controls what the client sees — we strip implementation details from the wire format. The server-side turn summary, metrics, and conversation storage intentionally retain the full context (including server-deployed MCP tool calls) because that data is needed for debugging, audit trails, and accurate token accounting. Aligning the persisted state with the sanitized client view would lose information the server needs.

Comment 4 — "Apply sanitization to shield-blocked SSE path"

This is a valid observation. When moderation blocks a request, `shield_violation_generator()` serializes `request.echoed_params()` directly, bypassing the sanitization. This is a pre-existing leak that exists on `main`:

```python
@pytest.mark.asyncio
async def test_shield_blocked_response_leaks_instructions(mocker, minimal_config):
    """Shield-blocked responses echo server-set instructions via echoed_params."""
    # 1. Create request, let responses_endpoint_handler set instructions via get_system_prompt()
    # 2. Trigger moderation block (mock moderation_result.decision = "blocked")
    # 3. Collect SSE output from shield_violation_generator()
    # 4. Assert "instructions" IS present in response.created / response.completed
    #    → This test PASSES on main, proving the leak is pre-existing
```

That said — this PR makes the leak more visible by establishing the pattern that streamed responses are sanitized. We think the right fix is to either remove or sanitize the shield path's echoed params, tracked as a follow-up.
Follow-up filed: RSPEED-2889 — shield-blocked SSE path sanitization.
Correction: follow-up ticket moved to the correct project. LCORE-1851 — shield-blocked SSE path sanitization.
@Lifto is this for rhel-lightspeed? Initially I think I had the filtering always on, so this seems reasonable, but I would like to review. I'm curious, though — if this is using Goose, are you not using X-LCS-Merge-Server-Tools? (That would leave Goose extensions mostly useless.)
@Lifto Tools filtering is reasonable, as tool definitions are currently leaked to clients in the `tools` attribute.
I agree that echoing huge system prompts is unnecessary. However, removing them from the response entirely can be surprising, as echoing is native Responses API behavior.
What do you think about echoing a fixed-length prefix of the system prompt, with the length as a predefined constant in constants.py (e.g. 100)?
Anyway, please add a short comment to docs/responses.md to document this (there is already a section dedicated to system prompt/instructions resolution). Thanks.
jrobertboos left a comment:
100% agree with @asimurka. I was also wondering about some of the style changes and whether they are necessary for this PR.
```python
    responses_request.tool_choice,
    auth[1],
    mcp_headers,
    request.headers,
```

Is there a reason behind this change?
@dprince Yes, this is for rhel-lightspeed — the filtering prevents leaking the server-side system prompt (`instructions`), deep provider model paths, and server-deployed MCP tool definitions.
On the Goose question: the filtering doesn't disable server tools, it hides the internal events from the client's SSE stream. Goose's own extensions (client-provided tools) are unaffected — their `server_label` values don't match the configured server labels, so they pass through the filter. The two concerns are complementary: `X-LCS-Merge-Server-Tools` continues to control tool merging behavior, while the sanitization controls what reaches the wire.
Assisted by Claude Opus 4.6
When the server substitutes its own system prompt, the response now sets instructions to '<server prompt applied>' instead of stripping the field entirely. Client-provided instructions are echoed back unchanged, preserving native Responses API behavior.

Adds documentation to docs/responses.md covering the filtering behavior for instructions and server-deployed MCP tools.

RSPEED-2885
Signed-off-by: Ellis Low <elow@redhat.com>
@asimurka Thanks for the feedback. Updated the approach — instead of stripping `instructions` entirely, the response now sets it to the placeholder `'<server prompt applied>'` when the server substituted its own prompt.
This preserves native Responses API behavior for clients that send their own `instructions` — those are echoed back unchanged. Also added documentation to `docs/responses.md`.
Assisted by Claude Opus 4.6
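A minimal sketch of the placeholder behavior described above — only the placeholder text comes from the commit message; the helper name and wiring are illustrative:

```python
# Sketch of the placeholder substitution. The placeholder text is from the
# commit message above; the helper name and call shape are hypothetical.
SUBSTITUTED_INSTRUCTIONS_PLACEHOLDER = "<server prompt applied>"


def echo_instructions(response_dict: dict, instructions_substituted: bool = False) -> None:
    """Replace server-substituted instructions with a placeholder, in-place.

    Client-provided instructions are echoed back unchanged.
    """
    if instructions_substituted and "instructions" in response_dict:
        response_dict["instructions"] = SUBSTITUTED_INSTRUCTIONS_PLACEHOLDER
```

With `instructions_substituted=False` (the default), whatever the client sent is left untouched.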
```python
def _sanitize_response_dict(
    response_dict: dict[str, Any],
    configured_mcp_labels: set[str],
    instructions_substituted: bool,
) -> None:
```
This could probably have a default value of False.
```python
tools = response_dict.get("tools")
if tools is not None:
    response_dict["tools"] = [
        tool
        for tool in tools
        if tool.get("server_label") not in configured_mcp_labels
    ]
```

Nit: this could be simplified a bit.

```python
if tools := response_dict.get("tools"):
    response_dict["tools"] = [
        tool
        for tool in tools
        if tool.get("server_label") not in configured_mcp_labels
    ]
```
- Revert unintended tuple unpacking style change (jrobertboos)
- Add default value for instructions_substituted param (samdoran)
- Use walrus operator for tools filtering (samdoran)

RSPEED-2885
Signed-off-by: Ellis Low <elow@redhat.com>
@jrobertboos Unintended style change — not from the formatter. Reverted.
@samdoran Both nits addressed: added a default value for `instructions_substituted` and switched the tools filtering to the walrus operator.
Assisted by Claude Opus 4.6
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 500-537: The _sanitize_response_dict helper is defined but never
used; call it on the serialized response in both streaming and non-streaming
code paths: for the streaming path, after building chunk_dict and before
yielding (i.e., where chunk_dict["response"] is set and alongside existing
_should_filter_mcp_chunk/_is_server_mcp_output_item checks) invoke
_sanitize_response_dict(chunk_dict["response"], configured_mcp_labels,
instructions_substituted=...) and for the non-streaming path, after
api_response_dict = api_response.model_dump(exclude_none=True) and before
translating tool IDs call _sanitize_response_dict(api_response_dict,
configured_mcp_labels, instructions_substituted=...). Also wire the correct
boolean for instructions_substituted (set it where the server decides to
substitute instructions and pass that variable into both calls) so
SUBSTITUTED_INSTRUCTIONS_PLACEHOLDER will be applied when appropriate.
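Under the prompt's assumptions, the streaming wiring would look roughly like the sketch below. The sanitizer here is a reduced stand-in (instructions-stripping only) so the sketch is self-contained; the function names follow the prompt, everything else is illustrative:

```python
import json
from typing import Any, AsyncIterator


def _sanitize_response_dict(
    response_dict: dict[str, Any], configured_mcp_labels: set[str]
) -> None:
    """Reduced stand-in for the real helper: strip the server-side prompt."""
    response_dict.pop("instructions", None)


async def response_generator(
    stream: AsyncIterator[Any], configured_mcp_labels: set[str]
) -> AsyncIterator[str]:
    """Sanitize each chunk's embedded response dict before emitting SSE data."""
    async for chunk in stream:
        chunk_dict = chunk.model_dump(exclude_none=True)  # fresh throwaway dict
        if "response" in chunk_dict:
            _sanitize_response_dict(chunk_dict["response"], configured_mcp_labels)
        yield f"data: {json.dumps(chunk_dict)}\n\n"
```

Per the prompt, the non-streaming path would call the same helper on `api_response.model_dump(exclude_none=True)` before tool-ID translation.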
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 92d5d4bd-7daa-4084-9c61-3c3d1ca57f16
📒 Files selected for processing (1)
src/app/endpoints/responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧠 Learnings (7)
📓 Common learnings
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:523-529
Timestamp: 2026-04-16T19:08:35.441Z
Learning: In lightspeed-stack (`src/app/endpoints/responses.py`), the predicate `server_label in configured_mcp_labels` is the established, intentional pattern for identifying server-deployed MCP tools across `_sanitize_response_dict`, `_is_server_mcp_output_item`, and `_should_filter_mcp_chunk`. Client-supplied tools cannot collide with configured server labels because `server_label` is a server-side field set by lightspeed-stack during tool injection; clients send `function` tools or MCP tools pointing at their own servers with different labels. Do not flag this predicate as a false-positive collision risk in code review.
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:501-529
Timestamp: 2026-04-16T19:08:26.366Z
Learning: In `src/app/endpoints/responses.py`, `_sanitize_response_dict(response_dict, configured_mcp_labels)` intentionally mutates `response_dict` in-place. The dict is always a fresh throwaway produced by `model_dump()` on the immediately preceding line (both in the streaming `async for` loop and in the non-streaming path); no other reference to it exists. The AGENTS.md guideline "avoid in-place parameter modification" applies to mutating a caller's long-lived/shared data structures, not to ephemeral serialization dicts. Do not flag this as an anti-pattern in future reviews.
🔇 Additional comments (2)
src/app/endpoints/responses.py (2)
500-537: Consider returning the sanitized dict rather than mutating in place. Per AGENTS.md ("Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters"), a pure helper is generally preferred. I understand from prior discussion on this PR that the input is always a fresh `model_dump()` throwaway so there's no aliasing risk, and the decision to mutate in-place is intentional — acknowledging that and not asking for a change. Noting here only so future readers aren't surprised by the `-> None` signature on a function named `_sanitize_…`.

Based on learnings: "`_sanitize_response_dict(...)` intentionally mutates `response_dict` in-place. The dict is always a fresh throwaway produced by `model_dump()` on the immediately preceding line … Do not flag this as an anti-pattern in future reviews."
531-536: 🧹 Nitpick | 🔵 Trivial

Tool filtering assumes each `tool` is a dict.

`tool.get("server_label")` will raise `AttributeError` if any entry in `response_dict["tools"]` is not a mapping (e.g., an already-unserialized pydantic model, or an unexpected shape from a future Llama Stack schema change). Since `response_dict` comes from `model_dump()` today that's fine in practice, but a defensive `isinstance(tool, dict)` guard (or `getattr(tool, "get", lambda _: None)`) would make this safer against schema drift and the non-streaming call path where `response_dict` may be assembled differently.

Not blocking — flagging as optional hardening.
⛔ Skipped due to learnings
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:523-529
Timestamp: 2026-04-16T19:08:35.441Z
Learning: In lightspeed-stack (`src/app/endpoints/responses.py`), the predicate `server_label in configured_mcp_labels` is the established, intentional pattern for identifying server-deployed MCP tools across `_sanitize_response_dict`, `_is_server_mcp_output_item`, and `_should_filter_mcp_chunk`. Client-supplied tools cannot collide with configured server labels because `server_label` is a server-side field set by lightspeed-stack during tool injection; clients send `function` tools or MCP tools pointing at their own servers with different labels. Do not flag this predicate as a false-positive collision risk in code review.

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:11.336Z
Learning: In the lightspeed-stack codebase (src/app/endpoints/), `tools: Optional[list[Any]]` for MCP tool definitions is an intentional, consistent pattern used across all inference endpoints (query, responses, streaming_query, rlsapi_v1). Do not flag this as a typing issue — changing it in isolation would break consistency.

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:501-529
Timestamp: 2026-04-16T19:08:26.366Z
Learning: In `src/app/endpoints/responses.py`, `_sanitize_response_dict(response_dict, configured_mcp_labels)` intentionally mutates `response_dict` in-place. The dict is always a fresh throwaway produced by `model_dump()` on the immediately preceding line (both in the streaming `async for` loop and in the non-streaming path); no other reference to it exists. The AGENTS.md guideline "avoid in-place parameter modification" applies to mutating a caller's long-lived/shared data structures, not to ephemeral serialization dicts. Do not flag this as an anti-pattern in future reviews.
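The optional hardening suggested in the nitpick above can be sketched as follows; `is_server_mcp_tool` and `filter_tools` are hypothetical names for illustration, not functions in the PR:

```python
def is_server_mcp_tool(tool: object, configured_mcp_labels: set[str]) -> bool:
    # Only dict entries can carry a server_label; anything else (a pydantic
    # model, a string, None) is treated as a client tool and kept.
    return isinstance(tool, dict) and tool.get("server_label") in configured_mcp_labels


def filter_tools(tools: list, configured_mcp_labels: set[str]) -> list:
    # Return a new list with server-deployed MCP tool entries removed.
    return [t for t in tools if not is_server_mcp_tool(t, configured_mcp_labels)]
```

With this guard, a non-dict entry simply passes through the filter instead of raising `AttributeError`.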
The previous nit-fix commit (3d970d4) accidentally removed:
- _sanitize_response_dict call sites in both streaming and non-streaming paths
- instructions_substituted computation and threading
- unconditional configured_mcp_labels computation

Restored from working commit 68c0057 and reapplied only the three nit fixes (tuple unpacking style, default param value, walrus operator).

RSPEED-2885

Signed-off-by: Ellis Low <elow@redhat.com>
071ec8e to
90b0d0e
Compare
```python
instructions_substituted = (
    client_instructions is None
    or client_instructions != responses_request.instructions
)
```
I think this logic is too complicated. Checking the value of the original instructions before applying get_system_prompt() is sufficient:

- if `instructions` is `None` -> server-side system prompt will be applied
- if `instructions` are specified per-request, they have the highest priority -> no override
Good call — simplified to just `client_instructions is None`. The second condition was always redundant: when the client provides instructions, `get_system_prompt()` returns them unchanged, and when per-request customization is disabled it raises 422 before reaching this line.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…ilter-responses-internals RSPEED-2885: filter internal details from /v1/responses stream
Summary
Prevent internal implementation details from leaking through the `/v1/responses` SSE stream to clients.

What Was Leaking

The `/v1/responses` endpoint was including in every SSE response object:

- `instructions` — the full server-side system prompt (~3KB of guardrails, RHEL lifecycle dates, formatting rules, search instructions)
- `tools` — server-deployed MCP tool definitions including `server_url`, `server_label`, auth headers, and full tool schemas
- `model` — full provider path like `google-vertex/publishers/google/models/gemini-2.5-flash` instead of a clean model name
response.output_item.added,response.mcp_call.*,response.mcp_list_tools.*,response.mcp_approval_request.*) were only filtered when theX-LCS-Merge-Server-Tools: trueheader was present, leaking them in the default case.What This PR Does
- Removes `instructions` from all response objects in the SSE stream via `pop("instructions", None)`
- Sanitizes `model` by extracting the last path segment after `/` (e.g. `gemini-2.5-flash`)
- Filters server-deployed MCP tools from the `tools` array in response objects; client-provided tools are preserved
- `configured_mcp_labels` is now always populated from `configuration.mcp_servers`, so server-deployed MCP streaming events are always filtered regardless of the `X-LCS-Merge-Server-Tools` header
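The sanitization steps above can be sketched as follows. This is an illustrative reconstruction of the behavior described in this PR, not the actual implementation; the function body and the `configured_mcp_labels` handling are assumptions based on the description:

```python
def sanitize_response_dict(response_dict: dict, configured_mcp_labels: set[str]) -> None:
    """Strip server-side details from a model_dump() of a response object (in place)."""
    # Drop the server-side system prompt entirely.
    response_dict.pop("instructions", None)

    # Reduce a full provider path to the bare model name, e.g.
    # "google-vertex/publishers/google/models/gemini-2.5-flash" -> "gemini-2.5-flash".
    model = response_dict.get("model")
    if isinstance(model, str) and "/" in model:
        response_dict["model"] = model.rsplit("/", 1)[-1]

    # Keep client-provided tools; drop server-deployed MCP tool definitions,
    # identified by a server_label matching a configured MCP server.
    tools = response_dict.get("tools")
    if isinstance(tools, list):
        response_dict["tools"] = [
            tool
            for tool in tools
            if not (
                isinstance(tool, dict)
                and tool.get("server_label") in configured_mcp_labels
            )
        ]
```

Because the dict is a fresh `model_dump()` throwaway, mutating it in place is safe here; a pure variant would simply return a new dict instead.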
X-LCS-Merge-Server-Toolsheader continues to control tool merging behavior (resolve_client_tool_choicevsresolve_tool_choice) unchanged.All sanitization is done via a new
_sanitize_response_dict()helper, applied in both the streaming and non-streaming response paths.Testing
Added 13 new unit tests:

- `TestSanitizeResponseDict` (11 tests) — unit tests for each sanitization behavior
- `TestMcpEventsFilteredUnconditionally` (2 tests) — integration tests verifying unconditional MCP event filtering

Jira
RSPEED-2885