
RSPEED-2875: bump /v1/infer question limit to 32KB and add /v1/responses body size validator#1510

Merged
tisnik merged 1 commit into lightspeed-core:main from Lifto:feat/rspeed-2875-input-size-limits on Apr 17, 2026
Conversation

Contributor

@Lifto Lifto commented Apr 15, 2026

Description

Two targeted input-size guardrails:

  1. /v1/infer — raise question limit from 10 KB → 32 KB
    The CLA client enforces a 32,000-character input limit (MAX_QUESTION_SIZE in command_line_assistant/commands/chat.py). The previous 10,240-character cap on the question field caused legitimate CLA requests to be rejected with a spurious 422. Raised max_length in RlsapiV1InferRequest to 32,768 to comfortably accommodate the CLA limit with headroom.

  2. /v1/responses — add 64 KB whole-body size guard
    The /responses endpoint previously accepted arbitrarily-sized payloads. A new @model_validator(mode='before') on ResponsesRequest serialises the raw request with json.dumps and rejects anything whose total character count exceeds 65,536. Using mode='before' ensures the limit applies to the actual client payload, not the expanded form Pydantic produces after applying defaults. Returns 422 (not 413, which is reserved for LLM context-window exceeded).

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by: Claude (Anthropic)
  • Generated by: opencode / Atlas orchestrator

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

Unit tests — run with uv run make test-unit:

  • tests/unit/models/requests/test_responses_request.py — 4 new boundary tests:
    • Normal request ({"input": "hello"}) accepted
    • At-limit request (exactly 65,536 chars serialised) accepted
    • One-over-limit request (65,537 chars) rejected with clear error message containing actual and max size
    • Large list input (>64 KB) rejected
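The at-limit/one-over arithmetic behind these boundary tests can be sketched as follows. The envelope-overhead derivation mirrors the _OVERHEAD idea the tests use; exact names are illustrative, and the real tests live in tests/unit/models/requests/test_responses_request.py:

```python
import json

LIMIT = 65_536
# Serialization overhead of the JSON envelope around the input string itself,
# i.e. len('{"input": ""}') with json.dumps defaults (13 characters).
OVERHEAD = len(json.dumps({"input": ""}))

# Exactly at the limit: serialized length == 65,536, so the validator accepts.
at_limit = {"input": "x" * (LIMIT - OVERHEAD)}
assert len(json.dumps(at_limit)) == LIMIT

# One character over: serialized length == 65,537, so the validator rejects.
one_over = {"input": "x" * (LIMIT - OVERHEAD + 1)}
assert len(json.dumps(one_over)) == LIMIT + 1
```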

Integration tests — run with uv run make test-integration:

  • tests/integration/endpoints/test_rlsapi_v1_integration.py::test_infer_size_limit[question] — confirms HTTP 422 for a 32,769-char question (one over the new limit); the other three parametrised cases (stdin, attachment, terminal) remain at 65,536.

Linters — uv run make verify passes clean: black, ruff, pylint 10/10, pyright 0 errors, pydocstyle, mypy.

Files changed

File Change
src/models/rlsapi/requests.py max_length=10_240 → max_length=32_768 on question field
src/models/requests.py import json + new validate_body_size model validator on ResponsesRequest
tests/unit/models/rlsapi/test_requests.py Parametrised value updated to 32,768
tests/integration/endpoints/test_rlsapi_v1_integration.py Payload changed to "?" * 32_769
tests/unit/models/requests/test_responses_request.py New — 4 boundary tests for the body-size validator

Summary by CodeRabbit

  • New Features

    • Added request body size validation for /v1/responses endpoint with a maximum limit of 65,536 characters.
  • Improvements

    • Increased maximum question field length for /v1/infer endpoint to 32,768 characters.

Contributor

coderabbitai Bot commented Apr 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 95749187-e951-4fd0-91b7-e0b0032ae848

📥 Commits

Reviewing files that changed from the base of the PR and between 4e8d0a0 and fb3efda.

📒 Files selected for processing (7)
  • docs/openapi.json
  • src/constants.py
  • src/models/requests.py
  • src/models/rlsapi/requests.py
  • tests/integration/endpoints/test_rlsapi_v1_integration.py
  • tests/unit/models/requests/test_responses_request.py
  • tests/unit/models/rlsapi/test_requests.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Pyright
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: Pylinter
  • GitHub Check: build-pr
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/unit/models/rlsapi/test_requests.py
  • tests/integration/endpoints/test_rlsapi_v1_integration.py
  • tests/unit/models/requests/test_responses_request.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/models/rlsapi/test_requests.py
  • tests/unit/models/requests/test_responses_request.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/constants.py
  • src/models/rlsapi/requests.py
  • src/models/requests.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Use typing_extensions.Self for model validators in type annotations
Pydantic data models should extend BaseModel
Include Attributes: section in Pydantic model docstrings

Files:

  • src/models/rlsapi/requests.py
  • src/models/requests.py
🧠 Learnings (12)
📓 Common learnings
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:57.901Z
Learning: In lightspeed-core/lightspeed-stack `src/models/requests.py`, the `ResponsesRequest.validate_body_size` validator uses `json.dumps(values)` with default parameters (ASCII escaping, spaced separators) intentionally. The slight overcount (~0.2% overhead) makes the 65,536-character limit conservatively strict, which is the desired safe direction. An exact wire-format match is impossible since Python's json.dumps output is never byte-identical to the client payload. Do not suggest switching to `ensure_ascii=False` or compact separators for this validator.
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.

Applied to files:

  • tests/unit/models/rlsapi/test_requests.py
  • src/constants.py
  • src/models/rlsapi/requests.py
  • tests/integration/endpoints/test_rlsapi_v1_integration.py
  • tests/unit/models/requests/test_responses_request.py
📚 Learning: 2026-04-15T18:53:57.901Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:57.901Z
Learning: In lightspeed-core/lightspeed-stack `src/models/requests.py`, the `ResponsesRequest.validate_body_size` validator uses `json.dumps(values)` with default parameters (ASCII escaping, spaced separators) intentionally. The slight overcount (~0.2% overhead) makes the 65,536-character limit conservatively strict, which is the desired safe direction. An exact wire-format match is impossible since Python's json.dumps output is never byte-identical to the client payload. Do not suggest switching to `ensure_ascii=False` or compact separators for this validator.

Applied to files:

  • src/constants.py
  • src/models/rlsapi/requests.py
  • tests/integration/endpoints/test_rlsapi_v1_integration.py
  • tests/unit/models/requests/test_responses_request.py
  • src/models/requests.py
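The conservative overcount described in the learning above can be demonstrated directly. This illustrative snippet compares json.dumps defaults (ASCII escaping, spaced separators) against the compact wire-like form the review considered and rejected:

```python
import json

payload = {"input": "héllo, wörld"}

# Defaults: ensure_ascii=True escapes é as \u00e9 (6 chars instead of 1)
# and separators include a space after each colon and comma.
default = json.dumps(payload)

# Compact form closer to a typical client wire payload.
compact = json.dumps(payload, ensure_ascii=False, separators=(",", ":"))

# The default serialization is never shorter than the compact one, so
# comparing it against the 65,536 limit errs on the strict (safe) side.
assert len(default) >= len(compact)
```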
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.

Applied to files:

  • src/models/rlsapi/requests.py
  • src/models/requests.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • src/models/rlsapi/requests.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/rlsapi/requests.py
  • src/models/requests.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/rlsapi/requests.py
  • src/models/requests.py
📚 Learning: 2026-04-15T18:54:07.540Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:07.540Z
Learning: In lightspeed-core/lightspeed-stack request models, keep schema-level size limits defined as inline numeric literals (e.g., Pydantic Field max_length values used for validation) rather than extracting them into constants.py. Only extract values intended as configurable runtime defaults (e.g., DEFAULT_* settings like header/file upload sizes). Do not flag inline numeric literals used directly in Pydantic Field constraints or field validators as an extraction-to-constants issue.

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-02-23T14:11:46.950Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-188
Timestamp: 2026-02-23T14:11:46.950Z
Learning: The query request validator in the Responses API flow requires that `query_request.model` and `query_request.provider` must either both be specified or both be absent. The concatenated format (e.g., `model="provider/model"` with `provider=None`) is not permitted by the validator.

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-04-06T20:18:11.336Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:11.336Z
Learning: In the lightspeed-stack codebase (src/app/endpoints/), `tools: Optional[list[Any]]` for MCP tool definitions is an intentional, consistent pattern used across all inference endpoints (query, responses, streaming_query, rlsapi_v1). Do not flag this as a typing issue — changing it in isolation would break consistency.

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-04-16T19:08:26.366Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:501-529
Timestamp: 2026-04-16T19:08:26.366Z
Learning: In `src/app/endpoints/responses.py`, `_sanitize_response_dict(response_dict, configured_mcp_labels)` intentionally mutates `response_dict` in-place. The dict is always a fresh throwaway produced by `model_dump()` on the immediately preceding line (both in the streaming `async for` loop and in the non-streaming path); no other reference to it exists. The AGENTS.md guideline "avoid in-place parameter modification" applies to mutating a caller's long-lived/shared data structures, not to ephemeral serialization dicts. Do not flag this as an anti-pattern in future reviews.

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-04-16T19:08:35.441Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:523-529
Timestamp: 2026-04-16T19:08:35.441Z
Learning: In lightspeed-stack (`src/app/endpoints/responses.py`), the predicate `server_label in configured_mcp_labels` is the established, intentional pattern for identifying server-deployed MCP tools across `_sanitize_response_dict`, `_is_server_mcp_output_item`, and `_should_filter_mcp_chunk`. Client-supplied tools cannot collide with configured server labels because `server_label` is a server-side field set by lightspeed-stack during tool injection; clients send `function` tools or MCP tools pointing at their own servers with different labels. Do not flag this predicate as a false-positive collision risk in code review.

Applied to files:

  • src/models/requests.py
🔇 Additional comments (7)
src/constants.py (1)

232-236: LGTM!

New constants are well-named, documented with KiB units, and values match the PR's stated limits (32 KiB / 64 KiB).

src/models/rlsapi/requests.py (1)

5-5: LGTM!

Swapping the literal 10_240 for RLSAPI_V1_QUESTION_MAX_LENGTH (32_768) aligns /v1/infer with the CLA client's 32,000-char limit. The absolute import follows the repository convention, and the OpenAPI schema was already updated per prior review.

Also applies to: 181-181

docs/openapi.json (1)

14679-14679: OpenAPI limit update is consistent with the API change.

Line 14679 correctly updates question.maxLength to 32768, matching the intended /v1/infer contract increase.

tests/unit/models/rlsapi/test_requests.py (1)

10-10: LGTM — constant-driven boundary test.

Importing RLSAPI_V1_QUESTION_MAX_LENGTH and using it in the test_value_max_length parametrization correctly keeps the boundary assertion aligned with the production max_length on RlsapiV1InferRequest.question, and the existing 65_536 literals for the other three fields remain intentionally inline per the established convention.

Also applies to: 640-680

src/models/requests.py (1)

743-775: LGTM — body-size guard is sound and well-scoped.

mode="before" correctly measures the client-submitted payload rather than Pydantic-expanded defaults, and the TypeError/ValueError escape hatch cleanly handles programmatic/model-instance construction where the size guard is not meaningful. The error message embeds both the actual and the limit, which downstream tests rely on. Per previously captured learnings, the json.dumps default parameters (conservative overcount) and use of RESPONSES_REQUEST_MAX_SIZE from constants.py are intentional here.

tests/unit/models/requests/test_responses_request.py (1)

1-57: LGTM — comprehensive boundary coverage.

The _OVERHEAD derivation correctly anticipates that mode="before" sees only the constructor kwargs (not the full model with defaults), so {"input": ""} is the right reference shape for the at-limit calculation. The four cases (normal, at-limit, one-over, list-shaped oversize) cover the validator's behavior well, and asserting both "65536" and str(_LIMIT + 1) pins the error-message contract without being over-strict.

tests/integration/endpoints/test_rlsapi_v1_integration.py (1)

539-555: LGTM — question boundary now tracks the shared constant.

Switching the question over-limit payload to "?" * (constants.RLSAPI_V1_QUESTION_MAX_LENGTH + 1) removes the last hardcoded 10_241 and keeps this integration test in lockstep with the model constraint. The other three parametrized cases correctly retain their independent 65_537 (the context-field max_length=65_536 boundary) since they target different fields.


Walkthrough

Adds a JSON-serialized request-body size validator (65,536 chars) to ResponsesRequest and raises the Rlsapi v1 question max length from 10,240 to 32,768 characters; tests and OpenAPI schema updated accordingly.

Changes

Cohort / File(s) Summary
Request Model Validation
src/models/requests.py
Added @model_validator(mode="before") to ResponsesRequest to JSON-serialize incoming values and enforce RESPONSES_REQUEST_MAX_SIZE (65,536). Serialization errors skip the size check.
RLSAPI Request Constraint
src/models/rlsapi/requests.py
Replaced literal 10_240 max length with RLSAPI_V1_QUESTION_MAX_LENGTH constant (32,768) for RlsapiV1InferRequest.question.
Constants
src/constants.py
Introduced RLSAPI_V1_QUESTION_MAX_LENGTH = 32_768 and RESPONSES_REQUEST_MAX_SIZE = 65_536.
OpenAPI Schema
docs/openapi.json
Updated RlsapiV1InferRequest.question schema maxLength from 10240 to 32768.
Tests — Unit & Integration
tests/unit/models/requests/test_responses_request.py, tests/unit/models/rlsapi/test_requests.py, tests/integration/endpoints/test_rlsapi_v1_integration.py
Added unit tests covering ResponsesRequest JSON-size limit (exact limit, over-limit, list-shaped input). Updated unit and integration tests to use RLSAPI_V1_QUESTION_MAX_LENGTH for boundary cases.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the two main changes: increasing the /v1/infer question limit to 32KB and adding a body size validator to /v1/responses.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/models/requests.py`:
- Around line 769-773: Extract the hardcoded 65_536/65536 into a shared constant
(e.g., MAX_REQUEST_BODY_CHARS = 65536) in constants.py, import that constant
into src.models.requests and replace both the numeric literal in the conditional
(len(serialized) > MAX_REQUEST_BODY_CHARS) and the message interpolation so the
message uses the constant value, and update any tests to reference the same
constant to prevent drift.
- Around line 742-769: The validate_body_size model_validator currently
serializes the input with json.dumps defaults which escapes non-ASCII and adds
spaces, inflating the character count; update the validate_body_size classmethod
to import a new constant (e.g., REQUEST_BODY_MAX_CHARS) from constants.py and
use json.dumps(values, ensure_ascii=False, separators=(",", ":")) to get a
compact, accurate wire-format length check, then compare len(serialized) against
REQUEST_BODY_MAX_CHARS and raise the same ValueError when exceeded; add the
constant in constants.py (set to 65536) and update the imports in
src/models/requests.py accordingly.

In `@src/models/rlsapi/requests.py`:
- Line 180: The OpenAPI schema still documents question.maxLength as 10240 while
the model RlsapiV1InferRequest.question now allows 32768; update the generated
docs/openapi.json so the question schema's maxLength is 32768 (or regenerate the
OpenAPI spec from the updated model/schema definitions) to keep clients and UI
validation consistent with RlsapiV1InferRequest.question.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d58987c7-f102-47e9-9bf5-78c11f847d72

📥 Commits

Reviewing files that changed from the base of the PR and between 7702284 and 4e8d0a0.

📒 Files selected for processing (5)
  • src/models/requests.py
  • src/models/rlsapi/requests.py
  • tests/integration/endpoints/test_rlsapi_v1_integration.py
  • tests/unit/models/requests/test_responses_request.py
  • tests/unit/models/rlsapi/test_requests.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: bandit
  • GitHub Check: radon
  • GitHub Check: mypy
  • GitHub Check: Pylinter
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
🧰 Additional context used
📓 Path-based instructions (4)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/integration/endpoints/test_rlsapi_v1_integration.py
  • tests/unit/models/rlsapi/test_requests.py
  • tests/unit/models/requests/test_responses_request.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/models/rlsapi/test_requests.py
  • tests/unit/models/requests/test_responses_request.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/models/rlsapi/requests.py
  • src/models/requests.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Use typing_extensions.Self for model validators in type annotations
Pydantic data models should extend BaseModel
Include Attributes: section in Pydantic model docstrings

Files:

  • src/models/rlsapi/requests.py
  • src/models/requests.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/rlsapi/requests.py
  • src/models/requests.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/rlsapi/requests.py
  • src/models/requests.py
📚 Learning: 2026-02-23T14:11:46.950Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-188
Timestamp: 2026-02-23T14:11:46.950Z
Learning: The query request validator in the Responses API flow requires that `query_request.model` and `query_request.provider` must either both be specified or both be absent. The concatenated format (e.g., `model="provider/model"` with `provider=None`) is not permitted by the validator.

Applied to files:

  • src/models/requests.py
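The both-or-neither constraint described in that learning can be sketched as a plain predicate (the function name and model names below are illustrative, not the real validator code):

```python
from typing import Optional

def model_provider_pair_valid(model: Optional[str], provider: Optional[str]) -> bool:
    """Both fields set, or both absent; a concatenated "provider/model" packed
    into one field while the other is None is rejected."""
    return (model is None) == (provider is None)

# Hypothetical values for illustration only.
assert model_provider_pair_valid("granite-3", "watsonx")   # both specified: OK
assert model_provider_pair_valid(None, None)               # both absent: OK
assert not model_provider_pair_valid("watsonx/granite-3", None)  # concatenated form: rejected
```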
🔇 Additional comments (3)
tests/unit/models/requests/test_responses_request.py (1)

20-58: Good boundary coverage for the new body-size guard.

These tests validate the critical accept/reject edges (normal, exact limit, one-over, oversized structured input) and assert useful error messaging.

tests/integration/endpoints/test_rlsapi_v1_integration.py (1)

519-535: Boundary update is correct and aligned with the new infer question limit.

Using 32,769 chars for the question case correctly tests the one-over rejection path for max_length=32,768.

tests/unit/models/rlsapi/test_requests.py (1)

639-646: Test parameter update is correct and keeps boundary validation intact.

The question max-length case now matches the new 32_768 constraint and is exercised through the existing at-limit/over-limit test pattern.
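The accept/reject edges these tests exercise can be reproduced with a standalone sketch of the validator's core check (`MAX_BODY_CHARS` and `body_within_limit` are illustrative names, not the real implementation):

```python
import json
from typing import Any

MAX_BODY_CHARS = 65_536  # whole-body limit added in this PR

def body_within_limit(values: Any) -> bool:
    """Standalone mirror of the size guard: serialize with json.dumps
    defaults and compare the character count against the limit."""
    try:
        serialized = json.dumps(values)
    except (TypeError, ValueError):
        # Non-JSON-serializable input skips the guard, as in the validator.
        return True
    return len(serialized) <= MAX_BODY_CHARS

# json.dumps({"input": "x" * n}) serializes to '{"input": "xxx..."}',
# which is n + 13 characters, so n = 65_523 hits the limit exactly.
assert body_within_limit({"input": "x" * 65_523})      # at-limit: accepted
assert not body_within_limit({"input": "x" * 65_524})  # one over: rejected
```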

Comment thread src/models/requests.py Outdated
Comment on lines +742 to +769
@model_validator(mode="before")
@classmethod
def validate_body_size(cls, values: Any) -> Any:
    """Validate that the request body does not exceed the maximum allowed size.

    Serializes the raw request payload to JSON and checks the total character
    count against the 65,536-character limit. This guard runs before field
    coercion so that the limit reflects only what the client actually sent,
    not the expanded representation produced by Pydantic's defaults.

    Parameters:
        values: The raw input dict (or other object) passed to the model.

    Returns:
        Any: ``values`` unchanged when the size check passes.

    Raises:
        ValueError: If the JSON-serialized size of ``values`` exceeds
            65,536 characters.
    """
    try:
        serialized = json.dumps(values)
    except (TypeError, ValueError):
        # Non-JSON-serializable payload (e.g. programmatic use with Pydantic
        # model instances). The size guard only applies to wire-format HTTP
        # requests which FastAPI always parses into JSON-compatible dicts.
        return values
    if len(serialized) > 65_536:
Contributor

@coderabbitai coderabbitai Bot Apr 15, 2026
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import json

samples = [
    '{"input":"é"}',                 # non-ASCII character
    '{"input":"' + ("x" * 20) + '"}' # compact ASCII payload
]

for raw in samples:
    obj = json.loads(raw)
    current = json.dumps(obj)  # current implementation
    compact = json.dumps(obj, separators=(",", ":"), ensure_ascii=False)

    print("raw_len=", len(raw), "current_len=", len(current), "compact_len=", len(compact))
    print("current:", current)
    print("compact:", compact)
    print("---")
PY

Repository: lightspeed-core/lightspeed-stack

Length of output: 308


🏁 Script executed:

# Verify the file exists and check the relevant lines
head -n 775 src/models/requests.py | tail -n 35

Repository: lightspeed-core/lightspeed-stack

Length of output: 1508


🏁 Script executed:

# Check if a constant is defined for the 65536 limit in constants.py
find . -name "constants.py" -type f -exec grep -l "65536\|65_536" {} \;

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

# Search for any reference to body size limits in the codebase
rg "65.?536" --type py

Repository: lightspeed-core/lightspeed-stack

Length of output: 2249


Body-size validation overcounts non-ASCII and compact JSON payloads.

The validator uses json.dumps(values) with default parameters, which applies ASCII escaping and spaced separators. This inflates the character count compared to the actual client payload—particularly for non-ASCII characters (which expand 40%+ when escaped as \uXXXX) and even ASCII payloads (which gain spaces after colons and commas). Requests near the 65,536-character boundary will be incorrectly rejected.

Additionally, the magic number 65_536 should be extracted to constants.py per the coding guidelines.

💡 Suggested fix
-            serialized = json.dumps(values)
+            serialized = json.dumps(values, separators=(",", ":"), ensure_ascii=False)

Also extract the limit to constants.py and import it here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/requests.py` around lines 742 - 769, The validate_body_size
model_validator currently serializes the input with json.dumps defaults which
escapes non-ASCII and adds spaces, inflating the character count; update the
validate_body_size classmethod to import a new constant (e.g.,
REQUEST_BODY_MAX_CHARS) from constants.py and use json.dumps(values,
ensure_ascii=False, separators=(",", ":")) to get a compact, accurate
wire-format length check, then compare len(serialized) against
REQUEST_BODY_MAX_CHARS and raise the same ValueError when exceeded; add the
constant in constants.py (set to 65536) and update the imports in
src/models/requests.py accordingly.

Contributor Author

Verified — no change needed.

The default json.dumps separators add ~0.2% overhead (a few hundred chars on a 64KB payload). This makes the validator slightly more conservative (rejects marginally earlier), which is the safe direction. The ensure_ascii=True default similarly inflates non-ASCII — again, conservative.

An exact wire-format match is impossible regardless of separator/encoding choices, because Python's json.dumps output is never byte-identical to what the client sent (key ordering, whitespace, encoding all vary by client). Adding ensure_ascii=False, separators=(",",":") would create a false sense of precision without meaningfully changing the behavior.

The current implementation is correct and intentionally simple.
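The magnitude of that overcount is easy to measure with the standard library alone:

```python
import json

# A small payload with non-ASCII content (illustrative, not from the PR's tests).
payload = {"input": "é" * 10, "stream": False}

default_len = len(json.dumps(payload))  # ASCII-escaped, spaced separators
compact_len = len(json.dumps(payload, ensure_ascii=False, separators=(",", ":")))

# Default serialization is never shorter than the compact wire form, so the
# guard only errs toward rejecting slightly early -- the conservative direction.
assert default_len >= compact_len
```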

Contributor

@Lifto, that's a well-reasoned explanation — the conservative overcount is clearly the right trade-off here, and the "false precision" point about separators/ensure_ascii is a good one. I'll withdraw the suggestion.

(╯°□°)╯ ← ┻━┻


✏️ Learnings added
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:54.785Z
Learning: In lightspeed-core/lightspeed-stack `src/models/requests.py`, the `ResponsesRequest.validate_body_size` validator uses `json.dumps(values)` with default parameters (ASCII escaping, spaced separators) intentionally. The slight overcount (~0.2% overhead) makes the 65,536-character limit conservatively strict, which is the desired safe direction. An exact wire-format match is impossible since Python's json.dumps output is never byte-identical to the client payload. Do not suggest switching to `ensure_ascii=False` or compact separators for this validator.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.

Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Comment thread src/models/requests.py Outdated
Comment thread src/models/rlsapi/requests.py Outdated
@Lifto Lifto force-pushed the feat/rspeed-2875-input-size-limits branch from 4e8d0a0 to 7ef053a Compare April 15, 2026 18:37
@tisnik
Contributor

tisnik commented Apr 16, 2026

/ok-to-test

@tisnik tisnik requested a review from asimurka April 17, 2026 11:50
Contributor

@tisnik tisnik left a comment

Please rebase; the OpenAPI spec probably should not be there.

Contributor

@asimurka asimurka left a comment

LGTM overall. Please add those limits to constants.py to avoid magic numbers in the code.

…y size validator

- Raise question field max_length in RlsapiV1InferRequest from 10,240 to
  32,768 chars to match the CLA client's 32KB input limit. Previously,
  valid CLA inputs would be rejected with a spurious 422.

- Add @model_validator(mode='before') to ResponsesRequest that serialises
  the raw request body with json.dumps and rejects payloads whose total
  character count exceeds 65,536. The /v1/responses endpoint previously
  accepted arbitrarily large payloads without any guard.

- Update the existing parametrised unit test and integration size-limit
  test to match the new 32,768 threshold.

- Add tests/unit/models/requests/test_responses_request.py with four
  boundary tests: normal request accepted, at-limit (65,536 chars)
  accepted, one-over (65,537) rejected with clear error message, and
  large list input rejected.

Relates-to: RSPEED-2875
@Lifto Lifto force-pushed the feat/rspeed-2875-input-size-limits branch from 7ef053a to fb3efda Compare April 17, 2026 15:53
Contributor

@tisnik tisnik left a comment

LGTM

@tisnik tisnik requested a review from asimurka April 17, 2026 15:59
Contributor

@asimurka asimurka left a comment

LGTM

@tisnik tisnik merged commit 3c291f8 into lightspeed-core:main Apr 17, 2026
27 of 28 checks passed