
RSPEED-2885: filter internal details from /v1/responses stream #1524

Merged
tisnik merged 7 commits into lightspeed-core:main from Lifto:feat/rspeed-2885-filter-responses-internals on Apr 20, 2026

Conversation

@Lifto (Contributor) commented Apr 16, 2026

Summary

Prevent internal implementation details from leaking through the /v1/responses SSE stream to clients.

What Was Leaking

The /v1/responses endpoint was including the following in every SSE response object:

  1. instructions — the full server-side system prompt (~3KB of guardrails, RHEL lifecycle dates, formatting rules, search instructions)
  2. tools — server-deployed MCP tool definitions including server_url, server_label, auth headers, and full tool schemas
  3. model — full provider path like google-vertex/publishers/google/models/gemini-2.5-flash instead of a clean model name

Additionally, MCP streaming events (response.output_item.added, response.mcp_call.*, response.mcp_list_tools.*, response.mcp_approval_request.*) were only filtered when the X-LCS-Merge-Server-Tools: true header was present, leaking them in the default case.

What This PR Does

  • Strips instructions from all response objects in the SSE stream via pop("instructions", None)
  • Sanitizes model by extracting the last path segment after / (e.g. gemini-2.5-flash)
  • Strips server-deployed MCP tool definitions from the tools array in response objects; client-provided tools are preserved
  • Makes MCP event filtering unconditional: configured_mcp_labels is now always populated from configuration.mcp_servers, so server-deployed MCP streaming events are always filtered regardless of the X-LCS-Merge-Server-Tools header

The X-LCS-Merge-Server-Tools header continues to control tool merging behavior (resolve_client_tool_choice vs resolve_tool_choice) unchanged.

All sanitization is done via a new _sanitize_response_dict() helper, applied in both the streaming and non-streaming response paths.
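The sanitization steps above can be sketched as a minimal, self-contained stand-in. The function name below is hypothetical; the real helper is _sanitize_response_dict in src/app/endpoints/responses.py, and its actual implementation appears in the review thread.

```python
from typing import Any


def sanitize_sketch(
    response_dict: dict[str, Any], configured_mcp_labels: set[str]
) -> None:
    """Simplified sketch of the sanitization steps listed above (in-place)."""
    # 1. Strip the server-side system prompt.
    response_dict.pop("instructions", None)
    # 2. Truncate deep provider paths (2+ slashes) to the last segment.
    model = response_dict.get("model")
    if isinstance(model, str) and model.count("/") > 1:
        response_dict["model"] = model.rsplit("/", 1)[-1]
    # 3. Drop server-deployed MCP tool definitions; keep client tools.
    tools = response_dict.get("tools")
    if tools is not None:
        response_dict["tools"] = [
            t for t in tools if t.get("server_label") not in configured_mcp_labels
        ]
```

For example, a dict with model "google-vertex/publishers/google/models/gemini-2.5-flash" comes out with model "gemini-2.5-flash", no instructions key, and only tools whose server_label is not configured server-side.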

Testing

Added 13 new unit tests:

  • TestSanitizeResponseDict (11 tests) — unit tests for each sanitization behavior
  • TestMcpEventsFilteredUnconditionally (2 tests) — integration tests verifying unconditional MCP event filtering

Jira

RSPEED-2885

Summary by CodeRabbit

  • New Features

    • Responses now replace server-applied system instructions with a literal placeholder and filter out configured server-side tools/events from responses.
  • Bug Fixes

    • Sanitization consistently applied to streamed chunks and final responses to prevent exposure of substituted instructions or server-only tools.
  • Documentation

    • Updated docs to explain placeholder behavior and server tool filtering.
  • Tests

    • Added unit and integration tests for sanitization and event/tool filtering.

@coderabbitai Bot (Contributor) commented Apr 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a new constant and a helper that can replace substituted instructions with a placeholder and remove server-deployed MCP tools from serialized response dicts; adds tests exercising that helper and updates documentation to reflect the new placeholder and tool-filtering behavior.

Changes

  • Responses helper (src/app/endpoints/responses.py): Introduces _sanitize_response_dict(response_dict, configured_mcp_labels, instructions_substituted=False), which mutates a serialized response dict in-place: it replaces instructions with SUBSTITUTED_INSTRUCTIONS_PLACEHOLDER when instructions_substituted=True, and filters tools by removing entries whose server_label is in configured_mcp_labels. No other invocation points are shown in the provided diff.
  • Constants (src/constants.py): Adds SUBSTITUTED_INSTRUCTIONS_PLACEHOLDER = "<server prompt applied>".
  • Tests (tests/unit/app/endpoints/test_responses.py): Adds TestSanitizeResponseDict unit tests for instructions placeholder behavior and tool filtering (handles missing fields, empty configured labels); adds async tests (TestMcpEventsFilteredUnconditionally) covering streaming event filtering when MCP servers are configured vs. not configured; updates imports to include the new helper/constant/types.
  • Docs (docs/responses.md): Updates "System Prompt Resolution" to state that when the server substitutes the system prompt, the response instructions field is replaced with the literal "<server prompt applied>", and that server-deployed MCP tool definitions are filtered out of response tools while client-provided tools are preserved.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ✅ 3 passed
  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed. The title clearly and specifically describes the main change: filtering internal details from the /v1/responses stream, which aligns with the PR's core objective of sanitizing response objects to prevent implementation details from reaching clients.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 88.89%, which meets the required 80.00% threshold.


@Lifto Lifto force-pushed the feat/rspeed-2885-filter-responses-internals branch from 5b571d9 to 31279ad Compare April 16, 2026 18:18
@tisnik tisnik requested a review from asimurka April 16, 2026 18:39
@tisnik (Contributor) commented Apr 16, 2026

@coderabbitai full review

@coderabbitai Bot (Contributor) commented Apr 16, 2026

✅ Actions performed

Full review triggered.

@tisnik (Contributor) commented Apr 16, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 Security concerns: No
✅ Followed all instructions

⚡ No major issues detected

@coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/app/endpoints/responses.py (2)

675-699: ⚠️ Potential issue | 🟠 Major

Keep persisted turn summaries aligned with the sanitized response.

The client payload is now sanitized unconditionally, but turn_summary is still built from the raw response_object and remains gated by filter_server_tools. With the header absent, server-deployed MCP calls disappear from the returned response yet still flow into turn_summary/store_query_results, so stored state no longer matches what the client actually saw.

Also applies to: 758-764, 921-950

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 675 - 699, The persisted turn
summary is still derived from the raw response_object while the outgoing chunk
was sanitized (via configured_mcp_labels, server_mcp_output_indices,
_should_filter_mcp_chunk and _sanitize_response_dict), so stored state can
diverge from what the client saw; update the code paths that build turn_summary
and call store_query_results (the places around the response handling where
response_object / filter_server_tools are used) to use the same sanitized
chunk_dict/response (or re-run the same filtering and _sanitize_response_dict
logic) before constructing turn_summary and persisting results so persisted
summaries match the sanitized client payload.

675-699: ⚠️ Potential issue | 🔴 Critical

Apply the same sanitization to the shield-blocked SSE path.

These lines only sanitize chunks that flow through response_generator(). When moderation blocks a streaming request, handle_streaming_response() switches to shield_violation_generator(), which still serializes request.echoed_params() directly into response.created and response.completed. That means blocked SSE responses can still leak instructions, merged server MCP tools, and unsanitized model IDs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 675 - 699, Shielded/blocked SSE
responses from shield_violation_generator are not being sanitized, so ensure the
same sanitization applied in response_generator runs for the shield path: where
handle_streaming_response/ shield_violation_generator serializes
request.echoed_params() into response.created and response.completed, call
_sanitize_response_dict(...) with the same configured_mcp_labels (and preserve
setting response["conversation"]=normalized_conv_id) and update any model
IDs/instructions/mcp tool lists before sending; reuse the existing helpers
(_should_filter_mcp_chunk, _sanitize_response_dict, configured_mcp_labels,
server_mcp_output_indices, normalized_conv_id) to perform this sanitization so
blocked SSE chunks cannot leak instructions, merged server MCP tools, or raw
model IDs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 531-537: The current filter removes any tool whose server_label
matches configured_mcp_labels, which can accidentally drop client-supplied
tools; change the predicate in the response_dict "tools" cleanup so it only
excludes tools that both have server_label in configured_mcp_labels and are
explicitly server-injected (e.g., check a marker like tool.get("mcp_injected")
or tool.get("source") == "server"). Update the logic around the tools variable
and response_dict["tools"] so client-provided tools without the injected marker
are preserved while real injected MCP tools are still filtered.
- Around line 501-537: The helper _sanitize_response_dict currently mutates the
input response_dict; change it to be pure by returning a sanitized copy instead:
create a shallow copy of response_dict at the start (e.g., new =
dict(response_dict)), remove "instructions" from the copy, transform the "model"
value on the copy if it's a string with more than one "/", and build a new
"tools" list by filtering and copying each tool dict so callers are not mutated;
update the function signature to return dict[str, Any] and adjust all callers of
_sanitize_response_dict to use the returned sanitized dict instead of relying on
in-place modification.

---

Outside diff comments:
In `@src/app/endpoints/responses.py`:
- Around line 675-699: The persisted turn summary is still derived from the raw
response_object while the outgoing chunk was sanitized (via
configured_mcp_labels, server_mcp_output_indices, _should_filter_mcp_chunk and
_sanitize_response_dict), so stored state can diverge from what the client saw;
update the code paths that build turn_summary and call store_query_results (the
places around the response handling where response_object / filter_server_tools
are used) to use the same sanitized chunk_dict/response (or re-run the same
filtering and _sanitize_response_dict logic) before constructing turn_summary
and persisting results so persisted summaries match the sanitized client
payload.
- Around line 675-699: Shielded/blocked SSE responses from
shield_violation_generator are not being sanitized, so ensure the same
sanitization applied in response_generator runs for the shield path: where
handle_streaming_response/ shield_violation_generator serializes
request.echoed_params() into response.created and response.completed, call
_sanitize_response_dict(...) with the same configured_mcp_labels (and preserve
setting response["conversation"]=normalized_conv_id) and update any model
IDs/instructions/mcp tool lists before sending; reuse the existing helpers
(_should_filter_mcp_chunk, _sanitize_response_dict, configured_mcp_labels,
server_mcp_output_indices, normalized_conv_id) to perform this sanitization so
blocked SSE chunks cannot leak instructions, merged server MCP tools, or raw
model IDs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2c1847a9-af28-472f-85be-df3128ccba58

📥 Commits

Reviewing files that changed from the base of the PR and between 0f95aca and 31279ad.

📒 Files selected for processing (2)
  • src/app/endpoints/responses.py
  • tests/unit/app/endpoints/test_responses.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/app/endpoints/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: Use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/responses.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/unit/app/endpoints/test_responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_responses.py
🧠 Learnings (2)
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/responses.py

Comment on lines +501 to +537
```python
def _sanitize_response_dict(
    response_dict: dict[str, Any],
    configured_mcp_labels: set[str],
) -> None:
    """Sanitize a serialized response object in-place to remove internal details.

    Strips fields that expose server-side implementation details from the
    response object before it is forwarded to the client:

    - ``instructions``: the server-side system prompt
    - ``model``: deep provider paths (2+ slashes) are truncated to the
      last segment (e.g.
      ``google-vertex/publishers/google/models/gemini`` → ``gemini``);
      simple ``provider/model`` strings are left intact
    - ``tools``: server-deployed MCP tool definitions are removed; client-
      provided tools (those whose ``server_label`` is not in
      ``configured_mcp_labels``) are preserved

    Args:
        response_dict: Mutable dict produced by ``model_dump`` on a response
            object. Modified in-place.
        configured_mcp_labels: Set of ``server_label`` values that identify
            server-deployed MCP servers.
    """
    response_dict.pop("instructions", None)

    model = response_dict.get("model")
    if isinstance(model, str) and model.count("/") > 1:
        response_dict["model"] = model.rsplit("/", 1)[-1]

    tools = response_dict.get("tools")
    if tools is not None:
        response_dict["tools"] = [
            tool
            for tool in tools
            if tool.get("server_label") not in configured_mcp_labels
        ]
```
@coderabbitai Bot (Contributor) commented Apr 16, 2026


🛠️ Refactor suggestion | 🟠 Major

Return a sanitized copy instead of mutating the input dict.

Making this helper pure will avoid hidden caller side effects and make the two response paths easier to reason about.

As per coding guidelines "Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 501 - 537, The helper
_sanitize_response_dict currently mutates the input response_dict; change it to
be pure by returning a sanitized copy instead: create a shallow copy of
response_dict at the start (e.g., new = dict(response_dict)), remove
"instructions" from the copy, transform the "model" value on the copy if it's a
string with more than one "/", and build a new "tools" list by filtering and
copying each tool dict so callers are not mutated; update the function signature
to return dict[str, Any] and adjust all callers of _sanitize_response_dict to
use the returned sanitized dict instead of relying on in-place modification.

@Lifto (Contributor Author) replied:

In-place mutation is intentional here. The dict is created by model_dump() on the line immediately before the call — it is a throwaway serialization dict, not a shared data structure. Returning a copy would allocate a new dict for every SSE chunk with zero safety benefit, since nothing else holds a reference to the original.

The AGENTS.md guideline about avoiding in-place modification targets a different anti-pattern: functions that silently mutate a caller's long-lived data structure via a parameter. That does not apply here — the dict is born, sanitized, serialized to JSON, and discarded within the same async for iteration.

@coderabbitai Bot (Contributor) replied:

@Lifto, that's a well-reasoned clarification — you're right. The guideline is aimed at functions that silently mutate a caller's long-lived/shared data structure through a parameter reference, which is a genuine source of bugs. Here the dict is born from model_dump(), passed straight into _sanitize_response_dict, serialized, and discarded — there's no aliasing risk and no wasted allocation. The in-place mutation is appropriate and intentional. I'll withdraw the suggestion.

(ᵔᴥᵔ) 🐇✏️


✏️ Learnings added
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:501-529
Timestamp: 2026-04-16T19:08:26.366Z
Learning: In `src/app/endpoints/responses.py`, `_sanitize_response_dict(response_dict, configured_mcp_labels)` intentionally mutates `response_dict` in-place. The dict is always a fresh throwaway produced by `model_dump()` on the immediately preceding line (both in the streaming `async for` loop and in the non-streaming path); no other reference to it exists. The AGENTS.md guideline "avoid in-place parameter modification" applies to mutating a caller's long-lived/shared data structures, not to ephemeral serialization dicts. Do not flag this as an anti-pattern in future reviews.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Comment thread src/app/endpoints/responses.py Outdated
Prevent internal implementation details from leaking through the /v1/responses
SSE stream:

- Strip 'instructions' field (system prompt) from response objects
- Sanitize 'model' field by removing provider prefix
  (e.g. google-vertex/publishers/google/models/gemini → gemini)
- Strip server-deployed MCP tool definitions from the 'tools' array
  in response objects (client-provided tools are preserved)
- Make MCP streaming event filtering unconditional — mcp_call,
  mcp_list_tools, and mcp_approval_request events for server-deployed
  tools are now always filtered, not only when X-LCS-Merge-Server-Tools
  header is present

The X-LCS-Merge-Server-Tools header continues to control tool merging
behavior (resolve_client_tool_choice vs resolve_tool_choice) unchanged.

Fixes: https://redhat.atlassian.net/browse/RSPEED-2885

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@Lifto Lifto force-pushed the feat/rspeed-2885-filter-responses-internals branch from 31279ad to da6ae27 Compare April 16, 2026 18:53
@Lifto (Contributor Author) commented Apr 16, 2026

Replies to CodeRabbit review comments

Comment 3 — "Keep persisted turn summaries aligned with the sanitized response"

This divergence is intentional. The sanitization controls what the client sees — we strip implementation details from the wire format. The server-side turn summary, metrics, and conversation storage intentionally retain the full context (including server-deployed MCP tool calls) because that data is needed for debugging, audit trails, and accurate token accounting. Aligning the persisted state with the sanitized client view would lose information the server needs.

The filter_server_tools flag in build_turn_summary (from PR #1435) already handles the case where the client explicitly requests tool merging — in that mode, the summary is also filtered. Our change does not alter that behavior.

Comment 4 — "Apply sanitization to shield-blocked SSE path"

This is a valid observation. When moderation blocks a request, shield_violation_generator() builds a synthetic response object using **echoed_params from the request. The echoed_params() method (in src/models/requests.py) dumps _ECHOED_FIELDS, which includes "instructions". Because responses_endpoint_handler overwrites responses_request.instructions with get_system_prompt() at line 182 before the moderation check, the shield response will echo back the server-side system prompt.

This is a pre-existing leak that exists on main today, independent of this PR. Our diff does not touch shield_violation_generator(), echoed_params(), or _ECHOED_FIELDS. We could demonstrate this leak with a test against the current main branch:

```python
@pytest.mark.asyncio
async def test_shield_blocked_response_leaks_instructions(mocker, minimal_config):
    """Shield-blocked responses echo server-set instructions via echoed_params."""
    # 1. Create request, let responses_endpoint_handler set instructions via get_system_prompt()
    # 2. Trigger moderation block (mock moderation_result.decision = "blocked")
    # 3. Collect SSE output from shield_violation_generator()
    # 4. Assert "instructions" IS present in response.created / response.completed
    #    → This test PASSES on main, proving the leak is pre-existing
```

That said — this PR makes the leak more visible by establishing the pattern that instructions should be stripped from client-facing responses. Now that response_generator() strips it, it is more obvious that shield_violation_generator() does not.

We think the right fix is to either remove "instructions" from _ECHOED_FIELDS in requests.py, or call _sanitize_response_dict() on the shield response dict before serialization. But that is a separate change to code outside this diff. We will file a follow-up ticket for it.

@Lifto (Contributor Author) commented Apr 16, 2026

Follow-up filed: RSPEED-2889 — shield-blocked /v1/responses SSE leaks server-side system prompt via echoed_params. Pre-existing bug in the moderation path, out of scope for this PR.

@Lifto (Contributor Author) commented Apr 16, 2026

Correction: follow-up ticket moved to the correct project. LCORE-1851 — shield-blocked /v1/responses SSE leaks server-side system prompt via echoed_params. Pre-existing bug in the moderation path, out of scope for this PR. (RSPEED-2889 closed as duplicate.)

@dprince (Contributor) commented Apr 17, 2026

@Lifto is this for rhel-lightspeed? Initially I think I had the filtering always on, so this seems reasonable, but I would like to review. Also curious: if this is using Goose, are you not using X-LCS-Merge-Server-Tools? (That would leave Goose extensions mostly useless.)

@asimurka (Contributor) left a comment

@Lifto Tools filtering is reasonable, as currently client tools are leaked in the tools attribute.
I agree that echoing huge system prompts is unnecessary. However, removing them from the response entirely can be surprising, as echoing is native Responses API behavior.
Wdyt about echoing a fixed-length prefix of the system prompt, with a predefined constant (e.g. 100) in constants.py?
Anyway, please add a short comment to docs/responses.md to document this (there is already a section dedicated to system prompt/instructions resolution). Thanks.

@jrobertboos (Contributor) left a comment

100% agree with @asimurka. Also, I was wondering about some style changes and whether they are necessary for this PR.

Comment thread src/app/endpoints/responses.py Outdated
```python
responses_request.tool_choice,
auth[1],
mcp_headers,
request.headers,
```

is there a reason behind this change?

Comment thread src/app/endpoints/responses.py Outdated
```python
responses_request.tool_choice,
auth[1],
mcp_headers,
request.headers,
```

ditto

@Lifto (Contributor Author) commented Apr 17, 2026

@dprince Yes, this is for rhel-lightspeed — the filtering prevents leaking the server-side system prompt (instructions) and server-deployed MCP tool definitions to the client.

On the Goose question: the filtering doesn't disable server tools, it hides the internal events from the client's SSE stream. When X-LCS-Merge-Server-Tools: true is set, the server still merges its tools, the model still calls them, and the server still executes them internally — the model's response still benefits from the server tools. The filtering just strips the server tool definitions and call events from what the client sees, since the client doesn't execute those anyway.

Goose's own extensions (client-provided tools) are unaffected — their server_label doesn't match configured_mcp_labels, so they pass through the filter.

The two concerns are complementary: X-LCS-Merge-Server-Tools controls what the model has access to; the filtering controls what the client sees.

Assisted by Claude Opus 4.6

When the server substitutes its own system prompt, the response now
sets instructions to '<server prompt applied>' instead of stripping
the field entirely. Client-provided instructions are echoed back
unchanged, preserving native Responses API behavior.

Adds documentation to docs/responses.md covering the filtering
behavior for instructions and server-deployed MCP tools.

RSPEED-2885

Signed-off-by: Ellis Low <elow@redhat.com>
@Lifto (Contributor Author) commented Apr 17, 2026

@asimurka Thanks for the feedback. Updated the approach — instead of stripping instructions entirely or echoing a truncated prefix:

  • When the server substituted its own system prompt (client sent None), the response sets instructions to the placeholder <server prompt applied>. This avoids leaking any part of the server prompt while still signaling that a prompt was applied.
  • When the client provides their own instructions and per-request customization is allowed, they are echoed back unchanged.

This preserves native Responses API behavior for clients that send instructions, while preventing the leak when the server injects its own.

Also added documentation to docs/responses.md in the existing System Prompt Resolution section.

Assisted by Claude Opus 4.6

Comment thread src/app/endpoints/responses.py Outdated
def _sanitize_response_dict(
response_dict: dict[str, Any],
configured_mcp_labels: set[str],
instructions_substituted: bool,
Contributor

This could probably have a default value of False.

Comment thread src/app/endpoints/responses.py Outdated
Comment on lines +542 to +548
tools = response_dict.get("tools")
if tools is not None:
response_dict["tools"] = [
tool
for tool in tools
if tool.get("server_label") not in configured_mcp_labels
]
Contributor

Nit: this could be simplified a bit.

Suggested change
tools = response_dict.get("tools")
if tools is not None:
response_dict["tools"] = [
tool
for tool in tools
if tool.get("server_label") not in configured_mcp_labels
]
if tools := response_dict.get("tools"):
response_dict["tools"] = [
tool
for tool in tools
if tool.get("server_label") not in configured_mcp_labels
]

- Revert unintended tuple unpacking style change (jrobertboos)
- Add default value for instructions_substituted param (samdoran)
- Use walrus operator for tools filtering (samdoran)

RSPEED-2885

Signed-off-by: Ellis Low <elow@redhat.com>
@Lifto
Contributor Author

Lifto commented Apr 17, 2026

@jrobertboos Unintended style change — not from the formatter. Reverted.

@samdoran Both nits addressed: added default value for instructions_substituted, and switched to walrus operator for tools filtering.

Assisted by Claude Opus 4.6

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 500-537: The _sanitize_response_dict helper is defined but never
  used; call it on the serialized response in both streaming and non-streaming
  code paths:
  • Streaming path: after building chunk_dict and before yielding (i.e., where
    chunk_dict["response"] is set, alongside the existing
    _should_filter_mcp_chunk/_is_server_mcp_output_item checks), invoke
    _sanitize_response_dict(chunk_dict["response"], configured_mcp_labels,
    instructions_substituted=...).
  • Non-streaming path: after api_response_dict =
    api_response.model_dump(exclude_none=True) and before translating tool IDs,
    call _sanitize_response_dict(api_response_dict, configured_mcp_labels,
    instructions_substituted=...).
  • Also wire the correct boolean for instructions_substituted (set it where
    the server decides to substitute instructions and pass that variable into
    both calls) so SUBSTITUTED_INSTRUCTIONS_PLACEHOLDER is applied when
    appropriate.
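A self-contained sketch of the streaming-path wiring described here, with simplified stand-ins for the real helper and chunk shapes (the PR's helper and event handling are more involved):

```python
import asyncio
from typing import Any, AsyncIterator

SUBSTITUTED_INSTRUCTIONS_PLACEHOLDER = "<server prompt applied>"


def _sanitize_response_dict(
    response_dict: dict[str, Any],
    configured_mcp_labels: set[str],
    instructions_substituted: bool = False,
) -> None:
    """Simplified stand-in: hide server internals from a response dict."""
    if instructions_substituted:
        response_dict["instructions"] = SUBSTITUTED_INSTRUCTIONS_PLACEHOLDER
    if tools := response_dict.get("tools"):
        response_dict["tools"] = [
            t for t in tools
            if t.get("server_label") not in configured_mcp_labels
        ]


async def stream_filtered(
    upstream: AsyncIterator[dict[str, Any]],
    configured_mcp_labels: set[str],
    instructions_substituted: bool,
) -> AsyncIterator[dict[str, Any]]:
    """Streaming path: sanitize each chunk's response before yielding."""
    async for chunk_dict in upstream:
        if response := chunk_dict.get("response"):
            _sanitize_response_dict(
                response, configured_mcp_labels, instructions_substituted
            )
        yield chunk_dict


async def _demo() -> list[dict[str, Any]]:
    async def upstream():
        yield {"response": {"instructions": "server prompt",
                            "tools": [{"server_label": "rhel-docs"}]}}

    return [c async for c in stream_filtered(upstream(), {"rhel-docs"}, True)]


chunks = asyncio.run(_demo())
```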
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 92d5d4bd-7daa-4084-9c61-3c3d1ca57f16

📥 Commits

Reviewing files that changed from the base of the PR and between 68c0057 and 3d970d4.

📒 Files selected for processing (1)
  • src/app/endpoints/responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/app/endpoints/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: Use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/responses.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:523-529
Timestamp: 2026-04-16T19:08:35.441Z
Learning: In lightspeed-stack (`src/app/endpoints/responses.py`), the predicate `server_label in configured_mcp_labels` is the established, intentional pattern for identifying server-deployed MCP tools across `_sanitize_response_dict`, `_is_server_mcp_output_item`, and `_should_filter_mcp_chunk`. Client-supplied tools cannot collide with configured server labels because `server_label` is a server-side field set by lightspeed-stack during tool injection; clients send `function` tools or MCP tools pointing at their own servers with different labels. Do not flag this predicate as a false-positive collision risk in code review.
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:501-529
Timestamp: 2026-04-16T19:08:26.366Z
Learning: In `src/app/endpoints/responses.py`, `_sanitize_response_dict(response_dict, configured_mcp_labels)` intentionally mutates `response_dict` in-place. The dict is always a fresh throwaway produced by `model_dump()` on the immediately preceding line (both in the streaming `async for` loop and in the non-streaming path); no other reference to it exists. The AGENTS.md guideline "avoid in-place parameter modification" applies to mutating a caller's long-lived/shared data structures, not to ephemeral serialization dicts. Do not flag this as an anti-pattern in future reviews.
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports

Applied to files:

  • src/app/endpoints/responses.py
🔇 Additional comments (2)
src/app/endpoints/responses.py (2)

500-537: Consider returning the sanitized dict rather than mutating in place.

Per AGENTS.md ("Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters"), a pure helper is generally preferred. I understand from prior discussion on this PR that the input is always a fresh model_dump() throwaway so there's no aliasing risk, and the decision to mutate in-place is intentional — acknowledging that and not asking for a change. Noting here only so future readers aren't surprised by the -> None signature on a function named _sanitize_….

Based on learnings: "_sanitize_response_dict(...) intentionally mutates response_dict in-place. The dict is always a fresh throwaway produced by model_dump() on the immediately preceding line … Do not flag this as an anti-pattern in future reviews."


531-536: 🧹 Nitpick | 🔵 Trivial

Tool filtering assumes each tool is a dict.

tool.get("server_label") will raise AttributeError if any entry in response_dict["tools"] is not a mapping (e.g., already-unserialized pydantic model, or an unexpected shape from a future Llama Stack schema change). Since response_dict comes from model_dump() today that's fine in practice, but a defensive isinstance(tool, dict) guard (or getattr(tool, "get", lambda _ : None)) would make this safer against schema drift and the non-streaming call path where response_dict may be assembled differently.

Not blocking — flagging as optional hardening.
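The optional hardening could look like this sketch (not the PR's code; tool shapes are assumed):

```python
from typing import Any


def filter_tools_defensive(
    tools: list[Any], configured_mcp_labels: set[str]
) -> list[Any]:
    """Pass through anything that isn't a dict (unknown future shapes),
    and dicts whose server_label isn't a configured server label."""
    return [
        tool
        for tool in tools
        if not isinstance(tool, dict)
        or tool.get("server_label") not in configured_mcp_labels
    ]


tools = [{"server_label": "rhel-docs"}, {"type": "function"}, "unexpected"]
filtered = filter_tools_defensive(tools, {"rhel-docs"})
# filtered drops only the server-deployed dict entry
```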

⛔ Skipped due to learnings

Comment thread src/app/endpoints/responses.py
The previous nit-fix commit (3d970d4) accidentally removed:
- _sanitize_response_dict call sites in both streaming and non-streaming paths
- instructions_substituted computation and threading
- unconditional configured_mcp_labels computation

Restored from working commit 68c0057 and reapplied only the three
nit fixes (tuple unpacking style, default param value, walrus operator).

RSPEED-2885

Signed-off-by: Ellis Low <elow@redhat.com>
@Lifto Lifto force-pushed the feat/rspeed-2885-filter-responses-internals branch from 071ec8e to 90b0d0e on April 17, 2026 at 20:20
Comment thread src/app/endpoints/responses.py Outdated
instructions_substituted = (
client_instructions is None
or client_instructions != responses_request.instructions
)
Contributor

I think this logic is too complicated. Checking the value of the original instructions before applying get_system_prompt() is sufficient:

  • if instructions is None -> the server-side system prompt will be applied
  • if instructions are specified per-request, they have the highest priority -> no override

Contributor Author

Good call — simplified to just client_instructions is None. The second condition was always redundant: when the client provides instructions, get_system_prompt() returns them unchanged, and when per-request customization is disabled it raises 422 before reaching this line.

Lifto and others added 2 commits April 20, 2026 10:37
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Contributor

@asimurka asimurka left a comment


LGTM

Contributor

@tisnik tisnik left a comment


LGTM

@tisnik tisnik merged commit 95fced9 into lightspeed-core:main Apr 20, 2026
26 of 27 checks passed
major pushed a commit to major/lightspeed-stack that referenced this pull request Apr 20, 2026
…ilter-responses-internals

RSPEED-2885: filter internal details from /v1/responses stream
Anxhela21 pushed a commit to Anxhela21/lightspeed-stack that referenced this pull request Apr 27, 2026
…ilter-responses-internals

RSPEED-2885: filter internal details from /v1/responses stream