RSPEED-2875: bump /v1/infer question limit to 32KB and add /v1/responses body size validator#1510
Conversation
|
No actionable comments were generated in the recent review. 🎉
Walkthrough
Adds a JSON-serialized request-body size validator (65,536 chars) to ResponsesRequest and raises the rlsapi v1 `/infer` `question` limit from 10,240 to 32,768 characters.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/models/requests.py`:
- Around line 769-773: Extract the hardcoded 65_536/65536 into a shared constant
(e.g., MAX_REQUEST_BODY_CHARS = 65536) in constants.py, import that constant
into src.models.requests and replace both the numeric literal in the conditional
(len(serialized) > MAX_REQUEST_BODY_CHARS) and the message interpolation so the
message uses the constant value, and update any tests to reference the same
constant to prevent drift.
- Around line 742-769: The validate_body_size model_validator currently
serializes the input with json.dumps defaults which escapes non-ASCII and adds
spaces, inflating the character count; update the validate_body_size classmethod
to import a new constant (e.g., REQUEST_BODY_MAX_CHARS) from constants.py and
use json.dumps(values, ensure_ascii=False, separators=(",", ":")) to get a
compact, accurate wire-format length check, then compare len(serialized) against
REQUEST_BODY_MAX_CHARS and raise the same ValueError when exceeded; add the
constant in constants.py (set to 65536) and update the imports in
src/models/requests.py accordingly.
In `@src/models/rlsapi/requests.py`:
- Line 180: The OpenAPI schema still documents question.maxLength as 10240 while
the model RlsapiV1InferRequest.question now allows 32768; update the generated
docs/openapi.json so the question schema's maxLength is 32768 (or regenerate the
OpenAPI spec from the updated model/schema definitions) to keep clients and UI
validation consistent with RlsapiV1InferRequest.question.
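As a sketch of what the suggested constant extraction could look like (the name `MAX_REQUEST_BODY_CHARS` is illustrative, not an existing identifier in the repo; in the real code the constant would live in `constants.py` and the function body would sit inside the `@model_validator` on `ResponsesRequest`):

```python
import json
from typing import Any

# Would live in constants.py per the repo guideline on shared constants
MAX_REQUEST_BODY_CHARS = 65_536


def validate_body_size(values: Any) -> Any:
    """Reject payloads whose JSON serialization exceeds the shared limit."""
    try:
        serialized = json.dumps(values)
    except (TypeError, ValueError):
        # Non-JSON-serializable input: the wire-format guard does not apply
        return values
    if len(serialized) > MAX_REQUEST_BODY_CHARS:
        raise ValueError(
            f"Request body exceeds {MAX_REQUEST_BODY_CHARS} characters"
        )
    return values
```

Using the constant in both the comparison and the error message keeps the limit and the message from drifting apart, which is the point of the review comment.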
ℹ️ Review info
⚙️ Run configuration: Path: .coderabbit.yaml · Review profile: ASSERTIVE · Plan: Pro · Run ID: d58987c7-f102-47e9-9bf5-78c11f847d72
📒 Files selected for processing (5)
- src/models/requests.py
- src/models/rlsapi/requests.py
- tests/integration/endpoints/test_rlsapi_v1_integration.py
- tests/unit/models/requests/test_responses_request.py
- tests/unit/models/rlsapi/test_requests.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: bandit
- GitHub Check: radon
- GitHub Check: mypy
- GitHub Check: Pylinter
- GitHub Check: integration_tests (3.12)
- GitHub Check: integration_tests (3.13)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
🧰 Additional context used
📓 Path-based instructions (4)

`tests/**/*.py` — 📄 CodeRabbit inference engine (AGENTS.md)
- Use pytest for all unit and integration tests; do not use unittest
- Use `pytest.mark.asyncio` marker for async unit tests

Files: `tests/integration/endpoints/test_rlsapi_v1_integration.py`, `tests/unit/models/rlsapi/test_requests.py`, `tests/unit/models/requests/test_responses_request.py`

`tests/unit/**/*.py` — 📄 CodeRabbit inference engine (AGENTS.md)
- Use `pytest-mock` for AsyncMock objects in unit tests

Files: `tests/unit/models/rlsapi/test_requests.py`, `tests/unit/models/requests/test_responses_request.py`

`src/**/*.py` — 📄 CodeRabbit inference engine (AGENTS.md)
- Use absolute imports for internal modules (e.g., `from authentication import get_auth_dependency`)
- Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports
- Check `constants.py` for shared constants before defining new ones
- All modules start with descriptive docstrings explaining purpose
- Use `logger = get_logger(__name__)` from `log.py` for module logging
- Type aliases defined at module level for clarity
- All functions require docstrings with brief descriptions
- Use complete type annotations for function parameters and return types
- Use union types with modern syntax: `str | int` instead of `Union[str, int]`
- Use `Optional[Type]` for optional types in type annotations
- Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
- Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
- Use `async def` for I/O operations and external API calls
- Handle `APIConnectionError` from Llama Stack in error handling
- Use `logger.debug()` for detailed diagnostic information
- Use `logger.info()` for general information about program execution
- Use `logger.warning()` for unexpected events or potential problems
- Use `logger.error()` for serious problems that prevented function execution
- All classes require descriptive docstrings explaining purpose
- Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
- Use complete type annotations for all class attributes; avoid using `Any`
- Follow Google Python docstring conventions for all modules, classes, and functions
- Include `Parameters:`, `Returns:`, `Raises:` sections in function docstrings as needed

Files: `src/models/rlsapi/requests.py`, `src/models/requests.py`

`src/models/**/*.py` — 📄 CodeRabbit inference engine (AGENTS.md)
- Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
- Use `typing_extensions.Self` for model validators in type annotations
- Pydantic data models should extend `BaseModel`
- Include `Attributes:` section in Pydantic model docstrings

Files: `src/models/rlsapi/requests.py`, `src/models/requests.py`
🧠 Learnings (4)
📓 Common learnings
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/rlsapi/requests.py, src/models/requests.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/rlsapi/requests.py, src/models/requests.py
📚 Learning: 2026-02-23T14:11:46.950Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-188
Timestamp: 2026-02-23T14:11:46.950Z
Learning: The query request validator in the Responses API flow requires that `query_request.model` and `query_request.provider` must either both be specified or both be absent. The concatenated format (e.g., `model="provider/model"` with `provider=None`) is not permitted by the validator.
Applied to files:
src/models/requests.py
🔇 Additional comments (3)

tests/unit/models/requests/test_responses_request.py (1)
20-58: Good boundary coverage for the new body-size guard. These tests validate the critical accept/reject edges (normal, exact limit, one-over, oversized structured input) and assert useful error messaging.

tests/integration/endpoints/test_rlsapi_v1_integration.py (1)
519-535: Boundary update is correct and aligned with the new infer question limit. Using 32,769 chars for the `question` case correctly tests the one-over rejection path for `max_length=32,768`.

tests/unit/models/rlsapi/test_requests.py (1)
639-646: Test parameter update is correct and keeps boundary validation intact. The `question` max-length case now matches the new `32_768` constraint and is exercised through the existing at-limit/over-limit test pattern.
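The at-limit/over-limit edges these tests exercise can be built by sizing the input string against the serialized envelope (a standalone sketch; the actual tests may construct their payloads differently):

```python
import json

LIMIT = 65_536


def payload_of_exact_size(limit: int = LIMIT) -> dict:
    """Build {"input": "xx...x"} whose json.dumps length is exactly `limit`."""
    envelope = len(json.dumps({"input": ""}))  # braces, key, quotes, separator
    return {"input": "x" * (limit - envelope)}


at_limit = payload_of_exact_size()            # 65,536 chars: should be accepted
one_over = payload_of_exact_size(LIMIT + 1)   # 65,537 chars: should be rejected
print(len(json.dumps(at_limit)), len(json.dumps(one_over)))
```

Measuring the envelope instead of hard-coding its width keeps the boundary tests valid even if the wrapping structure changes.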
```python
@model_validator(mode="before")
@classmethod
def validate_body_size(cls, values: Any) -> Any:
    """Validate that the request body does not exceed the maximum allowed size.

    Serializes the raw request payload to JSON and checks the total character
    count against the 65,536-character limit. This guard runs before field
    coercion so that the limit reflects only what the client actually sent,
    not the expanded representation produced by Pydantic's defaults.

    Parameters:
        values: The raw input dict (or other object) passed to the model.

    Returns:
        Any: ``values`` unchanged when the size check passes.

    Raises:
        ValueError: If the JSON-serialized size of ``values`` exceeds
            65,536 characters.
    """
    try:
        serialized = json.dumps(values)
    except (TypeError, ValueError):
        # Non-JSON-serializable payload (e.g. programmatic use with Pydantic
        # model instances). The size guard only applies to wire-format HTTP
        # requests which FastAPI always parses into JSON-compatible dicts.
        return values
    if len(serialized) > 65_536:
```
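Why running the guard in `mode="before"` matters can be illustrated without Pydantic: after field coercion a model carries defaulted fields the client never sent, so measuring the expanded form would overcount (the field names below are illustrative, not the actual ResponsesRequest schema):

```python
import json

# What the client actually sent over the wire
raw = {"input": "hello"}

# What the same request might look like after Pydantic applies defaults
# (illustrative fields only)
expanded = {"input": "hello", "model": None, "temperature": 1.0, "tools": []}

raw_len = len(json.dumps(raw))
expanded_len = len(json.dumps(expanded))
print(raw_len, expanded_len)  # the expanded form is strictly larger here
```

Checking `raw` keeps the 65,536-char budget tied to client input rather than to the model's default-populated shape.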
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
python - <<'PY'
import json
samples = [
    '{"input":"é"}',  # non-ASCII character
    '{"input":"' + ("x" * 20) + '"}'  # compact ASCII payload
]
for raw in samples:
    obj = json.loads(raw)
    current = json.dumps(obj)  # current implementation
    compact = json.dumps(obj, separators=(",", ":"), ensure_ascii=False)
    print("raw_len=", len(raw), "current_len=", len(current), "compact_len=", len(compact))
    print("current:", current)
    print("compact:", compact)
    print("---")
PY
```

Repository: lightspeed-core/lightspeed-stack
Length of output: 308

🏁 Script executed:

```bash
# Verify the file exists and check the relevant lines
head -n 775 src/models/requests.py | tail -n 35
```

Length of output: 1508

🏁 Script executed:

```bash
# Check if a constant is defined for the 65536 limit in constants.py
find . -name "constants.py" -type f -exec grep -l "65536\|65_536" {} \;
```

Length of output: 58

🏁 Script executed:

```bash
# Search for any reference to body size limits in the codebase
rg "65.?536" --type py
```

Length of output: 2249
Body-size validation overcounts non-ASCII and compact JSON payloads.
The validator uses `json.dumps(values)` with default parameters, which applies ASCII escaping and spaced separators. This inflates the character count compared to the actual client payload, particularly for non-ASCII characters (each escapes to a six-character `\uXXXX` sequence) and even for compact ASCII payloads (which gain a space after every colon and comma). Requests near the 65,536-character boundary will be incorrectly rejected.
Additionally, the magic number 65_536 should be extracted to constants.py per the coding guidelines.
💡 Suggested fix

```diff
- serialized = json.dumps(values)
+ serialized = json.dumps(values, separators=(",", ":"), ensure_ascii=False)
```

Also extract the limit to constants.py and import it here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/models/requests.py` around lines 742 - 769, The validate_body_size
model_validator currently serializes the input with json.dumps defaults which
escapes non-ASCII and adds spaces, inflating the character count; update the
validate_body_size classmethod to import a new constant (e.g.,
REQUEST_BODY_MAX_CHARS) from constants.py and use json.dumps(values,
ensure_ascii=False, separators=(",", ":")) to get a compact, accurate
wire-format length check, then compare len(serialized) against
REQUEST_BODY_MAX_CHARS and raise the same ValueError when exceeded; add the
constant in constants.py (set to 65536) and update the imports in
src/models/requests.py accordingly.
Verified — no change needed.
The default json.dumps separators add ~0.2% overhead (a few hundred chars on a 64KB payload). This makes the validator slightly more conservative (rejects marginally earlier), which is the safe direction. The ensure_ascii=True default similarly inflates non-ASCII — again, conservative.
An exact wire-format match is impossible regardless of separator/encoding choices, because Python's json.dumps output is never byte-identical to what the client sent (key ordering, whitespace, encoding all vary by client). Adding ensure_ascii=False, separators=(",",":") would create a false sense of precision without meaningfully changing the behavior.
The current implementation is correct and intentionally simple.
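The overhead being weighed here is easy to measure directly (a standalone sketch, not code from the repo):

```python
import json

# 1,000 short fields: default separators add one space per comma and colon
obj = {f"k{i}": "v" for i in range(1000)}
default_len = len(json.dumps(obj))                         # ", " and ": "
compact_len = len(json.dumps(obj, separators=(",", ":")))
extra = default_len - compact_len  # 999 commas + 1,000 colons = 1,999 chars

# Non-ASCII: each character escapes to a six-char \uXXXX sequence by default
escaped = len(json.dumps("é"))                       # '"\\u00e9"' -> 8 chars
verbatim = len(json.dumps("é", ensure_ascii=False))  # '"é"'       -> 3 chars
print(extra, escaped, verbatim)
```

Both effects only shrink the effective budget, so the check errs toward rejecting early rather than accepting an over-limit payload.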
@Lifto, that's a well-reasoned explanation — the conservative overcount is clearly the right trade-off here, and the "false precision" point about separators/ensure_ascii is a good one. I'll withdraw the suggestion.
(╯°□°)╯ ← ┻━┻
✏️ Learnings added
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:54.785Z
Learning: In lightspeed-core/lightspeed-stack `src/models/requests.py`, the `ResponsesRequest.validate_body_size` validator uses `json.dumps(values)` with default parameters (ASCII escaping, spaced separators) intentionally. The slight overcount (~0.2% overhead) makes the 65,536-character limit conservatively strict, which is the desired safe direction. An exact wire-format match is impossible since Python's json.dumps output is never byte-identical to the client payload. Do not suggest switching to `ensure_ascii=False` or compact separators for this validator.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
Force-pushed from 4e8d0a0 to 7ef053a (Compare)
/ok-to-test
tisnik left a comment:
Please rebase; the OpenAPI spec probably should not be there.
asimurka left a comment:
LGTM overall. Please add those limits to constants.py to avoid magic numbers in code.
…y size validator

- Raise question field max_length in RlsapiV1InferRequest from 10,240 to 32,768 chars to match the CLA client's 32KB input limit. Previously, valid CLA inputs would be rejected with a spurious 422.
- Add @model_validator(mode='before') to ResponsesRequest that serialises the raw request body with json.dumps and rejects payloads whose total character count exceeds 65,536. The /v1/responses endpoint previously accepted arbitrarily large payloads without any guard.
- Update the existing parametrised unit test and integration size-limit test to match the new 32,768 threshold.
- Add tests/unit/models/requests/test_responses_request.py with four boundary tests: normal request accepted, at-limit (65,536 chars) accepted, one-over (65,537) rejected with clear error message, and large list input rejected.

Relates-to: RSPEED-2875
Force-pushed from 7ef053a to fb3efda (Compare)
Description
Two targeted input-size guardrails:
1. `/v1/infer` — raise `question` limit from 10 KB → 32 KB

   The CLA client enforces a 32,000-character input limit (`MAX_QUESTION_SIZE` in `command_line_assistant/commands/chat.py`). The previous 10,240-character cap on the `question` field caused legitimate CLA requests to be rejected with a spurious 422. Raised `max_length` in `RlsapiV1InferRequest` to 32,768 to comfortably accommodate the CLA limit with headroom.

2. `/v1/responses` — add 64 KB whole-body size guard

   The `/responses` endpoint previously accepted arbitrarily-sized payloads. A new `@model_validator(mode='before')` on `ResponsesRequest` serialises the raw request with `json.dumps` and rejects anything whose total character count exceeds 65,536. Using `mode='before'` ensures the limit applies to the actual client payload, not the expanded form Pydantic produces after applying defaults. Returns 422 (not 413, which is reserved for LLM context-window exceeded).

Type of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
Unit tests — run with `uv run make test-unit`:
- `tests/unit/models/requests/test_responses_request.py` — 4 new boundary tests: normal request (`{"input": "hello"}`) accepted, at-limit (65,536 chars) accepted, one-over (65,537) rejected with a clear error message, and oversized list input rejected.

Integration tests — run with `uv run make test-integration`:
- `tests/integration/endpoints/test_rlsapi_v1_integration.py::test_infer_size_limit[question]` — confirms HTTP 422 for a 32,769-char question (one over the new limit); the other three parametrised cases (stdin, attachment, terminal) remain at 65,536.

Linters — `uv run make verify` passes clean: black, ruff, pylint 10/10, pyright 0 errors, pydocstyle, mypy.

Files changed
- `src/models/rlsapi/requests.py` — `max_length=10_240` → `max_length=32_768` on the `question` field
- `src/models/requests.py` — `import json` + new `validate_body_size` model validator on `ResponsesRequest`
- `tests/unit/models/rlsapi/test_requests.py` — updated parametrised max-length case
- `tests/integration/endpoints/test_rlsapi_v1_integration.py` — `"?" * 32_769`
- `tests/unit/models/requests/test_responses_request.py` — new boundary tests

Summary by CodeRabbit
New Features
Improvements